~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ ~~ Executable file format ~~
~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~
~   Before we do anything specific to the actual program we're building, we
~ do a lot of ELF-specific stuff to ensure that our output is in a format
~ Linux knows how to run.
~
~   This relies on the label facility defined in labels.e. Make sure to load
~ that first.

~ ~~~~~~~~~~~~~~~~~~~~~
~ ~~ ELF file header ~~
~ ~~~~~~~~~~~~~~~~~~~~~
~
~   First, we output ELF's top-level file header. This header describes the
~ entire file. An ELF file always has exactly one such header, and it is
~ always at the start of the file.
~
~   The program we're building should call this word as the first output it
~ generates.
~
~   The only interesting thing here is the entry point.

: elf-file-header
  ~ * denotes mandatory fields according to breadbox
  current-offset 3unroll

  0x7f pack8 s" ELF" pack-raw-string    ~ *magic number
  2 pack8                               ~ 64-bit
  1 pack8                               ~ little-endian
  1 pack8                               ~ ELF header format v1
  0 pack8                               ~ System-V ABI
  0 pack64                              ~ (padding)

  2 pack16                              ~ *executable
  0x3e pack16                           ~ *Intel x86-64
  1 pack32                              ~ ELF format version

  L@' cold-start L@' origin + pack64    ~ *entry point
    ~ This includes the origin, intentionally.

  L@' elf-program-header pack64         ~ *program header offset
    ~ We place the program header immediately after the ELF header. This
    ~ offset is from the start of the file.
  0 pack64                              ~ section header offset
  0 pack32                              ~ processor flags

  L@' elf-header-size pack16            ~ ELF header size
  L@' elf-program-header-size pack16    ~ *program header entry size
  1 pack16                              ~ *number of program header entries
  0 pack16                              ~ section header entry size
  0 pack16                              ~ number of section header entries
  0 pack16                              ~ section name string table index

  ~   Though hardcoding the size of this header would work fine, it's easier
  ~ to use the label system to keep track of its size. The only place this is
  ~ actually referenced is right here in the header.
  current-offset 4 roll - L!' elf-header-size ;
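
~   As a sanity check on the label arithmetic, the fields above can be tallied
~ by hand. This assumes pack8, pack16, pack32, and pack64 emit 1, 2, 4, and 8
~ bytes respectively, and that pack-raw-string emits just the three characters
~ with no length prefix or terminator; neither is confirmed by this file.
~
~     identification     1 + 3 + 1 + 1 + 1 + 1 + 8  = 16
~     type, machine, version             2 + 2 + 4  =  8
~     entry, program and section offsets     3 * 8  = 24
~     processor flags                               =  4
~     six 16-bit size and count fields       6 * 2  = 12
~                                             total = 64  (0x40)
~
~   Under those assumptions elf-header-size should resolve to 64, which is the
~ size of a 64-bit ELF file header.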


~ ~~~~~~~~~~~~~~~~~~~~~~~~
~ ~~ ELF program header ~~
~ ~~~~~~~~~~~~~~~~~~~~~~~~
~
~   Second, we output ELF's program header, which lists the memory regions
~ ("segments") we want to have and where we want them to come from. There may
~ be any number of these entries, one per segment, and they may be anywhere
~ in the file as long as they're consecutive.
~
~   We list just a single region, which is the entire contents of the ELF file
~ from disk, and we put the program header immediately after the file header.
~ The program we're building should call this word as the second output it
~ generates.
~
~   It would be more typical to use this header to ask the loader to give us
~ separate code and data segments, and perhaps a stack or heap, but this keeps
~ things simple, and we can create those things for ourselves later.
~
~   We do have a little stack space available, though we don't explicitly
~ request any; the kernel allocates it for us as part of exec() so that it can
~ pass us argc and argv (which we ignore). That stack space will be at a
~ random address, different every time, because of ASLR; that's a neat
~ security feature, so we leave it as-is. Note that gdb disables ASLR by
~ default, so if you aren't seeing it under gdb, that's probably why.

~ ~~
: elf-program-header
  ~ * denotes mandatory fields according to breadbox
  current-offset L!' elf-program-header
  current-offset 3unroll

  1 pack32                              ~ *"loadable" segment type
  0x05 pack32                           ~ *read+execute permission
  0 pack64                              ~ *offset in file
  L@' origin pack64                     ~ *virtual address
    ~ required, but can be anything, subject to alignment
  0 pack64                              ~ physical address (ignored)

  L@' total-size pack64                 ~ *size in file
  L@' total-size pack64                 ~ *size in memory

  0 pack64                              ~ segment alignment
    ~ for relocation, but this doesn't apply to us

  ~   As with the file header, we use the label system to keep track of the
  ~ program header's size.
  current-offset 4 roll - L!' elf-program-header-size ;
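
~   The program header's size can be tallied by hand in the same way, assuming
~ pack32 and pack64 emit 4 and 8 bytes respectively (not confirmed by this
~ file):
~
~     segment type, permission flags         2 * 4  =  8
~     file offset, virtual address, physical address,
~     size in file, size in memory, alignment  6 * 8  = 48
~                                             total = 56  (0x38)
~
~   Under that assumption elf-program-header-size should resolve to 56, the
~ size of one 64-bit ELF program header entry.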

~ ~~~~~~~~~~~~~~~~
~ ~~ That's it! ~~
~ ~~~~~~~~~~~~~~~~
~
~   ELF is a simple format, really. Now you can output your own machine code
~ that you generate however you want; make sure to define the labels "origin"
~ and "cold-start". Origin will control the address the code loads at;
~ cold-start will be the first thing that runs. The origin is arbitrary, but
~ can't be zero. The program header also reads a "total-size" label, which
~ this file doesn't define, so that needs to be set as well.
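~
~   A minimal sketch of what a caller might look like, using only words that
~ appear in this file. Everything about it is an assumption: the origin value
~ is made up, and whether L!' may be used like this outside a definition, or
~ in this order, depends on labels.e. Setting total-size at the end is
~ likewise a guess at how that label is meant to be defined. The nine bytes
~ are x86-64 machine code for "mov eax, 60; xor edi, edi; syscall", which is
~ exit(0) on Linux.
~
~     0x400000 L!' origin
~     elf-file-header
~     elf-program-header
~     current-offset L!' cold-start
~     0xb8 pack8 0x3c pack8 0 pack8 0 pack8 0 pack8
~     0x31 pack8 0xff pack8
~     0x0f pack8 0x05 pack8
~     current-offset L!' total-size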