~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ ~~ Executable file format ~~
~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~
~ Before we do anything specific to the actual program we're building, we
~ do a lot of ELF-specific stuff to ensure that our output is in a format
~ Linux knows how to run.
~
~ This relies on the label facility defined in labels.e. Make sure to load
~ that first.


~ ~~~~~~~~~~~~~~~~~~~~~
~ ~~ ELF file header ~~
~ ~~~~~~~~~~~~~~~~~~~~~
~
~ First, we output ELF's top-level file header. This header describes the
~ entire file. An ELF file always has exactly one such header, and it is
~ always at the start of the file.
~
~ The program we're building should call this word as the first output it
~ generates.
~
~ The only interesting thing here is the entry point.
: elf-file-header
  ~ * denotes mandatory fields according to breadbox
  current-offset 3unroll
  0x7f pack8 s" ELF" pack-raw-string      ~ *magic number
  2 pack8                                 ~ 64-bit
  1 pack8                                 ~ little-endian
  1 pack8                                 ~ ELF header format v1
  0 pack8                                 ~ System-V ABI
  0 pack64                                ~ (padding)

  2 pack16                                ~ *executable
  0x3e pack16                             ~ *Intel x86-64
  1 pack32                                ~ ELF format version

  L@' cold-start L@' origin + pack64      ~ *entry point
    ~ This includes the origin, intentionally.
  L@' elf-program-header pack64           ~ *program header offset
    ~ We place the program header immediately after the ELF header. This
    ~ offset is from the start of the file.
  0 pack64                                ~ section header offset
  0 pack32                                ~ processor flags
  L@' elf-header-size pack16              ~ ELF header size
  L@' elf-program-header-size pack16      ~ *program header entry size
  1 pack16                                ~ *number of program header entries
  0 pack16                                ~ section header entry size
  0 pack16                                ~ number of section header entries
  0 pack16                                ~ section name string table index

  ~ Though hardcoding the size of this header would work fine, it's easier
  ~ to use the label system to keep track of its size. The only place this is
  ~ actually referenced is right here in the header.
  current-offset 4 roll - L!' elf-header-size ;


~ ~~~~~~~~~~~~~~~~~~~~~~~~
~ ~~ ELF program header ~~
~ ~~~~~~~~~~~~~~~~~~~~~~~~
~
~ Second, we output ELF's program header, which lists the memory regions
~ ("segments") we want to have and where we want them to come from. There may
~ be any number of these entries, one per segment, and they may be anywhere
~ in the file as long as they're consecutive.
~
~ We list just a single region, which is the entire contents of the ELF file
~ from disk, and we put the program header immediately after the file header.
~ The program we're building should call this word as the second output it
~ generates.
~
~ It would be more typical to use this header to ask the loader to give us
~ separate code and data segments, and perhaps a stack or heap, but this keeps
~ things simple, and we can create those things for ourselves later.
~
~ We do have a little stack space available, though we don't explicitly
~ request any; the kernel allocates it for us as part of exec() so that it can
~ pass us argc and argv (which we ignore). That stack space will be at a
~ random address, different every time, because of ASLR; that's a neat
~ security feature, so we leave it as-is. Note that ASLR doesn't happen when
~ you run under gdb, so if you aren't seeing it, that's probably why.
~
~ ~~
: elf-program-header
  ~ * denotes mandatory fields according to breadbox
  current-offset L!' elf-program-header
  current-offset 3unroll
  1 pack32                                ~ *"loadable" segment type
  0x05 pack32                             ~ *read+execute permission
  0 pack64                                ~ *offset in file
  L@' origin pack64                       ~ *virtual address
    ~ required, but can be anything, subject to alignment
  0 pack64                                ~ physical address (ignored)
  L@' total-size pack64                   ~ *size in file
  L@' total-size pack64                   ~ *size in memory
  0 pack64                                ~ segment alignment
    ~ for relocation, but this doesn't apply to us

  ~ As with the file header, we use the label system to keep track of the
  ~ program header's size.
  current-offset 4 roll - L!' elf-program-header-size ;


~ ~~~~~~~~~~~~~~~~
~ ~~ That's it! ~~
~ ~~~~~~~~~~~~~~~~
~
~ ELF is a simple format, really. Now you can output your own machine code
~ that you generate however you want; make sure to define the labels "origin"
~ and "cold-start". Origin will control the address the code loads at;
~ cold-start will be the first thing that runs. The origin is arbitrary, but
~ can't be zero.
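
~ As a cross-check on the layout above, here is a minimal sketch in Python
~ (not part of this codebase) that packs the same fields in the same order
~ with the standard struct module. It shows that the file header comes to 64
~ bytes and the program header to 56, and that the entry point is the origin
~ plus the file offset of cold-start. The ORIGIN value, the helper names, and
~ the tiny exit(0) payload are assumptions made for illustration only; the
~ file above leaves origin, cold-start, and total-size to whoever calls it.

```python
import struct

# Assumed values for illustration; the source only requires a nonzero origin.
ORIGIN = 0x08000000
EHDR_SIZE = 64   # what elf-header-size should work out to
PHDR_SIZE = 56   # what elf-program-header-size should work out to

# Hypothetical payload: mov eax, 60 ; xor edi, edi ; syscall  (exit(0))
CODE = bytes([0xB8, 0x3C, 0x00, 0x00, 0x00, 0x31, 0xFF, 0x0F, 0x05])


def elf_file_header(entry, phoff):
    """Pack the ELF64 file header, field by field as in elf-file-header."""
    ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)  # magic, class, ...
    rest = struct.pack(
        "<HHIQQQIHHHHHH",
        2,          # executable
        0x3E,       # x86-64
        1,          # ELF format version
        entry,      # entry point (virtual address)
        phoff,      # program header offset (from start of file)
        0,          # section header offset
        0,          # processor flags
        EHDR_SIZE,  # ELF header size
        PHDR_SIZE,  # program header entry size
        1,          # number of program header entries
        0, 0, 0,    # section header entry size, count, string table index
    )
    return ident + rest


def elf_program_header(origin, total_size):
    """Pack one loadable segment covering the whole file."""
    return struct.pack(
        "<IIQQQQQQ",
        1,           # loadable segment
        0x05,        # read + execute
        0,           # offset in file
        origin,      # virtual address
        0,           # physical address (ignored)
        total_size,  # size in file
        total_size,  # size in memory
        0,           # alignment
    )


def build_image(code=CODE, origin=ORIGIN):
    """Headers first, then code; cold-start is the offset where code begins."""
    cold_start = EHDR_SIZE + PHDR_SIZE
    total_size = cold_start + len(code)
    header = elf_file_header(entry=cold_start + origin, phoff=EHDR_SIZE)
    return header + elf_program_header(origin, total_size) + code


if __name__ == "__main__":
    image = build_image()
    print(len(image), "bytes, entry at", hex(ORIGIN + EHDR_SIZE + PHDR_SIZE))
```

~ In this sketch the sizes are hardcoded where the code above uses labels;
~ that is only to keep the example short. The label approach has the
~ advantage that the sizes can't drift out of sync with the fields.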