summary refs log tree commit diff
path: root/labels.e
blob: 4e315f76a41c0b0bb06b3580a7847ef3d0d580b1 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ ~~ Machine label facility ~~
~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~
~   Compilers and assemblers always have a need to resolve symbolic names for
~ parts of their code to numeric values. Usually, these symbolic names are
~ called labels.
~
~   It's a surprisingly deep problem. The output itself can end up depending
~ on the actual values, for a variety of reasons. For example, the size of
~ a relative-jump instruction may vary depending on how far away its target
~ is, which changes the position of everything after. Or, there may be a size
~ limit on some particular segment and code may have to be entirely
~ reorganized to make things fit, splitting it into pieces or adding
~ trampolines.
~
~   Often, this winds up being a hidden layer with immense complexity,
~ implemented as part of a linker, that even compiler maintainers find obscure
~ and confusing.
~
~   It's Irenes' position that we should engage with complexity rather than
~ encysting it. So, in Evocation we take a different approach, heavily
~ inspired by the semantics of a tool called flatassembler. We provide
~ downstream code with the basic operations set-label and use-label which it
~ can combine in whatever order it wants. We also provide a harness,
~ label-loop, which takes an execution token for the downstream code
~ generator and runs it over and over, computing the mathematical fixed point
~ of the label assignments.
~
~   On the first pass, label-loop guesses a value of zero for any label that's
~ used before it's set. On subsequent passes, each label starts with the value
~ it had on the previous pass. When we have a pass where labels end with the
~ same value they started with, we say they've converged, and we announce
~ success. If a hundred passes go by without convergence, we fail instead.
~
~   Importantly, the label facility does not have an opinion on where or how
~ the values of labels should be output, nor does it have concepts of
~ dependency tracking or topological sorting. It only cares about the sequence
~ of set- and use-operations. Code relying on the facility is free to make
~ arbitrarily complex decisions about the layout, size, contents etc of its
~ output, and to use label values as part of those decisions. It is the
~ responsibility of this relying code to ensure the process converges, by
~ whatever means it wishes.
~
~   Since the label facility is not directly involved in output, it is also
~ permissible for code relying on it to make retroactive changes to the
~ output, as long as all such changes are completed before returning control
~ to label-loop.
~
~   Most of the time, the shorthand words L@' and L!' will be all you need to
~ use from your own code.

~ TODO this should go somewhere else :)
~ (current output point, string pointer --)
: pack-raw-string
  { unpack8 dup } { 3roll swap pack8 swap } while drop drop ;

~   The labels data structure is a linked list of dictionary entries, with
~ the same header format as the main Evocation dictionary, but instead of
~ being executable, each holds two words of data: status bits, and a value,
~ in that order. There's no "docol" pointer or anything of the sort.
~
~   Status is a bit field:
~    bit zero is whether it's been used
~    bit one is whether it's been defined
~    bit two is whether it was used before being defined
~    bit three is whether the guessed value wound up equaling the actual value
~
~   Just as the Evocation dictionary uses the global variable "latest" as a
~ handle (a pointer to a pointer) beginning a linked list of entries, so the
~ label dictionary also uses a handle variable, named "labels". We initialize
~ it here.
8 allocate s" labels" variable 0 labels !

~   This is analogous to word-heading, but prints label information.
~ (entry pointer --)
: label-heading
  dup entry-to-name dup emitstring space
  stringlen 1+ 54 swap - 0 max indent
  entry-to-execution-token dup 8 + @ .hex64 space @ .hex8
  newline ;

~ TODO this should go elsewhere
~ (dictionary handle)
: oldest-entry-in
  dup
  dup { { dup @ } { @ } while } if
  dup 3roll = { drop 0 } if
  ;

~ TODO this should go elsewhere
~ (entry pointer, dictionary handle)
: next-newer-entry-in
  @
  2dup = { 2drop 0 exit } if
  { dup { 2dup @ != } if }
  { @ } while swap drop ;

~   This prints the headings for all the labels that have been created. Note
~ that labels that have been created once stay in the dictionary forever, even
~ if subsequent passes neither use nor define them. That's because the control
~ loop has no way to know if they're still important; it's up to downstream
~ code to decide that. Everything has been carefully set up so that disused
~ labels don't hurt anything.
: list-labels
  labels oldest-entry-in { dup }
  { dup label-heading labels next-newer-entry-in } while drop ;

~   This looks up a label by name if it exists, or returns 0 if it doesn't.
~ The return value if it's found is an entry pointer.
~
~   Keep in mind that this is a pointer to the entry in the label dictionary,
~ not the label's value. It's common for a label's value to itself be a
~ pointer to a dictionary entry, in the program being compiled, so it's easy
~ to confuse that.
~
~ (name string pointer -- label entry pointer)
: find-label labels swap find-in ;

~   This creates a new label given a name for it, initializing its value and
~ status to zero and adding it to the label dictionary. This is responsible
~ for the initial guess of zero on the first pass.
~
~ (name string pointer -- )
: new-label labels create-in 0 , 0 , ;

~   These helpers take a label entry pointer and return pointers to the status
~ and value fields.
: label-status entry-to-execution-token ;
: label-value entry-to-execution-token 8 + ;

~   This looks up a label by name if it exists, or creates it if it doesn't.
~ Either way, it returns an entry pointer. It's named after the function
~ "intern" that many Lisp dialects have.
~
~ (name string pointer -- label entry pointer)
: intern-label
  dup find-label
  dup { swap drop }
      { drop dup new-label labels swap find-in } if-else ;

~   This returns the value of a label, also doing all necessary status checks
~ and updates to keep track of the circumstances under which it was used. The
~ label loop relies on all read accesses going through this word.
~
~ (label entry pointer -- label value)
: use-label
  ~ If it hasn't been defined yet, mark it used-before-set.
  dup label-status dup @ dup 0x02 & not
  { 0x04 | swap ! } { drop drop } if-else

  ~ Mark it used. It's faster to just do this than to check if it's needed.
  dup label-status dup @ 0x01 | swap !

  ~ Return the guessed value.
  label-value @ ;
  ;

~   This overwrites the value of a label, also doing all necessary status
~ checks and jupdates to keep track of the cirumstances under which it was
~ set. The label loop relies on all write accesses going through this word.
~
~ (new label value, label entry pointer --)
: set-label
  dup label-status @
  ~ (new value, entry, initial status)

  ~   Exit, and print a diagnostic, if the label was already set on this pass.
  ~
  ~   Without cooperation from user code, there's no great way to communicate
  ~ up the call stack that execution should stop, so we don't try to.
  dup 0x02 & { drop swap drop
               ." Failed by attempting to redefine "
               entry-to-name emitstring newline exit } if

  ~ We always set the defined bit to true. We leave the other status bits
  ~ as-is, except that we check whether it's unused up to now, and whether
  ~ the previous value equals the new value. If either of those holds, we set
  ~ the guessed-equals-actual bit to true.
  0x02 |
  ~ (new value, entry, updated status)
  3unroll dup 4 unroll label-value @ swap dup 3unroll =
  ~ (entry, updated status, new value, equality)
  3roll dup 4 unroll 0x01 & not
  ~ (entry, updated status, new value, equality, unused)
  || { swap 0x08 | swap } if
  ~ (entry, updated status, new value)
  3roll dup 3unroll
  ~ (updated status, entry, new value, entry)
  label-value ! label-status !
  ;

~   This is a convenience helper which downstream code can use to check how
~ many bytes it has output thus-far.
~ (output memory start, current output point
~  -- output memory start, current output point, offset)
: current-offset 2dup swap - ;

~   This is a concise syntax for referencing the entry pointer of a label,
~ when you know its name statically. It reads a word of text input and calls
~ intern-label on it.
: L'
  word value@
  interpreter-flags @ 0x01 &
  { ' litstring entry-to-execution-token ,
    here @ swap packstring 8 packalign here !
    ' intern-label entry-to-execution-token ,
    dropstring }
  { intern-label dropstring-with-result } if-else
  ; make-immediate

~   This is a concise syntax for calling use-label with a label whose name
~ you know statically. It performs L', then calls use-label.
: L@'
  ' L' entry-to-execution-token execute
  interpreter-flags @ 1 &
    { ' use-label entry-to-execution-token , }
    { use-label } if-else
  ; make-immediate

~   This is a concise syntax for calling set-label with a label whose name
~ you know statically. It performs L', then calls set-label.
: L!'
  ' L' entry-to-execution-token execute
  interpreter-flags @ 1 &
    { ' set-label entry-to-execution-token , }
    { set-label } if-else
  ; make-immediate

~   This is an internal helper that label-loop uses to check if a specific
~ label entry has "converged", based on its status bits. Downstream code won't
~ need to call this directly.
~
~   For a label to have "converged", at least one of the following must be
~ true:
~
~ 1. The label must never have been used (bit zero clear);
~ 2. The label was both used and defined, but not used before it was defined
~    (bits zero and one set; bit two clear);
~ 3. The label was both used and defined, and the guessed value equalled the
~    actual value (bits zero, one, and three set).
~
~ (label entry pointer -- boolean)
: check-label-converged
  label-status @
  dup 0x01 & not swap
  dup 0x03 & 0x03 = swap dup 0x04 & not swap 3unroll && swap
  0x0b & 0x0b =
  || || ;

~   This is an internal helper that label-loop uses to check if the overall
~ label assignments have converged, based on their status bits. Downstream
~ code won't need to call this directly.
~
~   This returns true if and only if check-label-converged returns true for
~ all the labels that have been created.
~
~ (-- boolean)
: check-labels-converged
  1
  labels @ { dup }
  { dup check-label-converged 3roll && swap
    @ } while drop ;

~   This is an internal helper that label-loop calls between passes, to update
~ the status bits and get ready for the next one. Downstream code won't need
~ to call this directly.
~
: reset-labels
  labels @ { dup }
  { dup label-status 0 swap !
    @ } while drop ;

~   This is the top-level word that invokes the entire label system. All the
~ code generation happens inside it.
~
~   The execution token it's passed should have the interface:
~
~     (output start, current output pointer
~      -- output start, current output pointer)
~
~   In general, this is the same interface that code generation words should
~ use to communicate with each other. For example, all the words in elf.e
~ use this interface.
~
~ (execution token -- output start, output length)
: label-loop
  0 swap
  0x1000 allocate dup
  ~ (iteration count, execution token, output start, output point)
  { 3 pick 100 > }
  { 2 pick execute 4 roll 1+ 4 unroll

    check-labels-converged
    { 4 roll drop
      3 roll drop
      current-offset swap drop
      exit } if
    drop dup
    reset-labels } while
  drop drop drop ." Failed after " . ."  iterations." newline ;