From 1fdeeb54b127fcd600d16c48c3b1b90e91f2ca28 Mon Sep 17 00:00:00 2001
From: Irene Knapp
Date: Thu, 7 May 2026 18:56:23 -0700
Subject: document labels.e; also clean up elf.e

the documentation in labels.e is entirely new, synthesized from informal
private discussions. this is also intended as a final pass to make sure all
the comments and nuances in the ELF code from quine.asm are incorporated in
elf.e. also this uses the new `L@'` and `L!'` facilities for terseness

Force-Push: yes
Change-Id: Ieabb2bb26f4b83260f0072dcdcd0950f9aa9fab2
---
 elf.e | 159 ++++++++++++++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 115 insertions(+), 44 deletions(-)

(limited to 'elf.e')

diff --git a/elf.e b/elf.e
index 4224b50..c38f740 100644
--- a/elf.e
+++ b/elf.e
@@ -1,57 +1,128 @@
-~ ~~
-~ ~~ ELF header
-~ ~~
-~ ~~ This is the top-level ELF header, for the entire file. An ELF always
-~ ~~ has exactly one of this header, which is always at the start of the file.
-~ ~~
+~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~ ~~ Executable file format ~~
+~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~
+~ Before we do anything specific to the actual program we're building, we
+~ do a lot of ELF-specific stuff to ensure that our output is in a format
+~ Linux knows how to run.
+~
+~ This relies on the label facility defined in labels.e. Make sure to load
+~ that first.
+
+~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~ ~~ Runtime memory origin ~~
+~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~
+~ First, we pick an origin to load at. This is arbitrary, but it can't be
+~ zero. We define a constant word for it so the body of the program can use
+~ it in label calculations in whatever ways it needs to.
+
+: origin 0x08000000 ;
+
+
+~ ~~~~~~~~~~~~~~~~~~~~~
+~ ~~ ELF file header ~~
+~ ~~~~~~~~~~~~~~~~~~~~~
+~
+~ Second, we output ELF's top-level file header. This header describes the
+~ entire file. An ELF always has exactly one of this header, which is always
+~ at the start of the file.
+~
+~ The program we're building should call this word as the first output it
+~ generates.
+~
+~ The only interesting thing here is the entry pointer.
+
 : elf-file-header
-  0x7f pack8 s" ELF" pack-raw-string ~ magic number
-  2 pack8 ~ 64-bit
-  1 pack8 ~ little-endian
-  1 pack8 ~ ELF header format v1
-  0 pack8 ~ System-V ABI
-  0 pack64 ~ (padding)
-
-  2 pack16 ~ executable
-  0x3e pack16 ~ Intel x86-64
-  1 pack32 ~ ELF format version
-
-  L' start use-label origin + pack64 ~ entry point
+  ~ * denotes mandatory fields according to breadbox
+  current-offset 3unroll
+
+  0x7f pack8 s" ELF" pack-raw-string ~ *magic number
+  2 pack8 ~ 64-bit
+  1 pack8 ~ little-endian
+  1 pack8 ~ ELF header format v1
+  0 pack8 ~ System-V ABI
+  0 pack64 ~ (padding)
+
+  2 pack16 ~ *executable
+  0x3e pack16 ~ *Intel x86-64
+  1 pack32 ~ ELF format version
+
+  L@' cold-start origin + pack64 ~ *entry point
   ~ This includes the origin, intentionally.
-  L' program-header use-label pack64 ~ program header offset
+  L@' elf-program-header pack64 ~ *program header offset
   ~ We place the program header immediately after the ELF header. This
   ~ offset is from the start of the file.
-  0 pack64 ~ section header offset
-  0 pack32 ~ processor flags
-  64 pack16 ~ ELF header size
-  56 pack16 ~ program header entry size
-  1 pack16 ~ number of program header entries
-  0 pack16 ~ section header entry size
-  0 pack16 ~ number of section header entries
-  0 pack16 ~ section name string table index
-  ;
+  0 pack64 ~ section header offset
+  0 pack32 ~ processor flags
+
+  L@' elf-header-size pack16 ~ ELF header size
+  L@' elf-program-header-size pack16 ~ *program header entry size
+  1 pack16 ~ *number of program header entries
+  0 pack16 ~ section header entry size
+  0 pack16 ~ number of section header entries
+  0 pack16 ~ section name string table index
+
+  ~ Though hardcoding the size of this header would work fine, it's easier
+  ~ to use the label system to keep track of its size. The only place this is
+  ~ actually referenced is right here in the header.
+  current-offset 4 roll - L!' elf-header-size ;
+
+
+~ ~~~~~~~~~~~~~~~~~~~~~~~~
+~ ~~ ELF program header ~~
+~ ~~~~~~~~~~~~~~~~~~~~~~~~
+~
+~ Third, we output ELF's program header, which lists the memory regions
+~ ("segments") we want to have and where we want them to come from. There may
+~ be any number of these entries, one per segment, and they may be anywhere
+~ in the file as long as they're consecutive.
+~
+~ We list just a single region, which is the entire contents of the ELF file
+~ from disk, and we put the program header immediately after the file header.
+~ The program we're building should call this word as the second output it
+~ generates.
+~
+~ It would be more typical to use this header to ask the loader to give us
+~ separate code and data segments, and perhaps a stack or heap, but this keeps
+~ things simple, and we can create those things for ourselves later.
+~
+~ We do have a little stack space available, though we don't explicitly
+~ request any; the kernel allocates it for us as part of exec() so that it can
+~ pass us argc and argv (which we ignore). That stack space will be at a
+~ random address, different every time, because of ASLR; that's a neat
+~ security feature, so we leave it as-is. Note that ASLR doesn't happen when
+~ you run under gdb, so if you aren't seeing it, that's probably why.
 
 
 
-~ ~~
-~ ~~ Program header
-~ ~~
-~ ~~ An ELF program header consists of any number of these entries; they are
-~ ~~ always consecutive, but may be anywhere in the file. We always have
-~ ~~ exactly one, and it's always right after the ELF file header.
 ~ ~~
 : elf-program-header
-  current-offset L' program-header set-label
-  1 pack32 ~ "loadable" segment type
-  0x05 pack32 ~ read+execute permission
-  0 pack64 ~ offset in file
-  origin pack64 ~ virtual address
+  ~ * denotes mandatory fields according to breadbox
+  current-offset L!' elf-program-header
+  current-offset 3unroll
+
+  1 pack32 ~ *"loadable" segment type
+  0x05 pack32 ~ *read+execute permission
+  0 pack64 ~ *offset in file
+  origin pack64 ~ *virtual address
   ~ required, but can be anything, subject to alignment
-  0 pack64 ~ physical address (ignored)
+  0 pack64 ~ physical address (ignored)
 
-  L' total-size use-label pack64 ~ size in file
-  L' total-size use-label pack64 ~ size in memory
+  L@' total-size pack64 ~ *size in file
+  L@' total-size pack64 ~ *size in memory
 
-  0 pack64 ~ segment alignment
+  0 pack64 ~ segment alignment
   ~ for relocation, but this doesn't apply to us
-  ;
+
+  ~ As with the file header, we use the label system to keep track of the
+  ~ program header's size.
+  current-offset 4 roll - L!' elf-program-header-size ;
+
+~ ~~~~~~~~~~~~~~~~
+~ ~~ That's it! ~~
+~ ~~~~~~~~~~~~~~~~
+~
+~ ELF is a simple format, really. Now you can output your own machine code
+~ that you generate however you want; make sure to define the label
+~ cold-start, which will be the first thing that runs.
-- 
cgit 1.4.1