~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ ~~ Code transformation facility ~~
~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~
~ TODO explain what problem this is solving and why
~
~ The label transform operates on code that compiles itself, and ensures
~ that the result of the compilation is suitable to be included in an
~ executable binary. To achieve this, it makes several changes to the
~ semantics of that code. The transform relies on the label facility, and
~ expects to run from within label-loop.
~
~ The most fundamental change is that the label transform separates words
~ that run in compile mode from words that run immediately. There is no
~ distinction made between words running in immediate mode and words declared
~ as immediate. Immediate words are looked up and executed based on their
~ "real", currently-executing definitions. Compiled words, including
~ literals, are looked up via the label facility.
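~
~ As a hedged walk-through of these lookup paths, consider a hypothetical
~ input (the word "double" is invented for illustration, and we assume that
~ labels for "dup" and "+" have already been set by earlier code):
~
~     : double dup + ;
~     pyrzqxgl
~
~ Here ":" is read in interpret mode and matched by name, so "L:" runs and
~ consumes "double" as the name of the new word. In compile mode, "dup" and
~ "+" are looked up via the label facility and appended to the output. ";"
~ is an immediate word, so "L;" runs rather than being compiled. Finally, the
~ magic word "pyrzqxgl" ends the transformation.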
~
~ Since the label facility is able to resolve forward references, there is
~ no hard requirement that everything in the file be topologically sorted.
~ However, the transform will refuse to create forward references to compiled
~ words. If you want them, you can create them by hand by calling use-label
~ yourself. This restriction is in place because allowing forward references
~ would be a significant difference from un-transformed code that could easily
~ become confusing, and because it simplifies the implementation a bit.
~
~ Compilation words do make extensive reference to the global variables
~ "here" and "latest". In particular, flow-control words such as if-else
~ expect the log to have recent compilation outputs on it, and to be able to
~ mutate them in-place. In order to make this work, we provide temporary
~ values of these two variables which point to the location of the output
~ buffer. This allows pointer resolution to work correctly without additional
~ effort, but notice that the buffer's address will differ from the address
~ the resulting program loads itself at. There's no simple way to avoid this
~ concern, since the variables must point to one of those addresses or the
~ other, not both.
~
~ We resolve the issue by running our own, alternate versions of the words
~ "create", ":", ";", and ";asm" which use the label facility to compute the
~ addresses that will be needed at runtime. These alternates run instead of
~ the normal versions of these words. The code being compiled is responsible
~ for not doing anything else that would rely on "here" and "latest" matching
~ their runtime addresses, though it is otherwise allowed to modify and rely
~ on them in all the usual ways. The alternate versions are defined in this
~ file as their own words, "Lcreate", "L:", "L;", and "L;asm".
~
~ Note that these alternates are applied via a purely lexical
~ transformation: when a word would be looked up in the dictionary to
~ interpret, first check if it's one of these. That means the transformation
~ won't apply to indirect callers of these words, nor to tick-quotes of them.
~ The code being compiled is responsible for not doing either of those things.
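~
~ As a hedged illustration of that limit (the name "foo" is hypothetical, and
~ we assume "create" takes a name string pointer, as "Lcreate" does):
~
~     s" foo" create    (replaced: the name "create" is seen, "Lcreate" runs)
~     ' create          (not replaced: the tick-quote reads the name itself)
~
~ Likewise, any existing word whose definition calls "create" internally
~ still calls the regular "create" when the transformed code invokes it.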
~
~ Notably, the transformation uses the same "interpreter-flags" variable as
~ the rest of Evocation. There's no need to keep it separate like there is
~ with the other variables. This makes it easy to change modes.
~
~ The transformation and the alternates rely on various labels, all of which
~ must be defined elsewhere, lest the label loop fail to converge: "lit",
~ "origin", "docol", "exit", ":", ";", and ";asm".
~
~ All of these limitations result in the compiled code being, in effect,
~ written in a dialect which is like Evocation, but more restricted. This is
~ acceptable, because the label transform is intended for compiling code that
~ is an early part of Evocation itself, and the necessary code has all been
~ written to follow these restrictions.
~ TODO all this buffer stuff should be in its own file
~ (buffer size -- buffer address)
: read-to-buffer
dup allocate dup dup
~ (buffer size, buffer address, word start, output point)
{ key
~ Exit if it's a zero byte.
dup not {
~ Make sure to pack the zero to serve as a null terminator.
pack8
drop drop swap drop exit } if
dup is-space
{ ~ (buffer size, buffer address, word start, output point, key)
~ Tuck the key out of the way until we've done some stuff.
3unroll
~ If it's a space character, first check if we just consumed the magic
~ word...
2dup swap - 8 = dup {
drop
~ Add a null terminator so we can use stringcmp
dup 0 swap !
~ Check for the magic word
over s" pyrzqxgl" stringcmp 0 =
} if
{ ~ It's magic, so exit.
~ Make sure to pack a zero to serve as a null terminator.
0 pack8
drop drop drop swap drop exit }
{ ~ It's not magic, so reset the word start. Of course whitespace is
~ not a word but this will help us keep track of things.
3roll pack8
swap drop dup } if-else }
{ ~ (buffer size, buffer address, word start, output point, key)
~ Tuck the key out of the way again.
3unroll
~ Check if the word just started and the previous character is space.
2dup = dup { drop dup @ is-space } if
{ ~ If so, this is the actual first character of the word.
drop swap pack8 dup }
{ ~ If not, leave the word start alone.
3roll pack8 } if-else } if-else } forever ;
~ In logical terms, this modifies an input buffer metadata structure
~ in-place to push a new, zeroed one into the start of the linked list formed
~ through the next-source field.
~
~ In physical terms, it works by allocating a new structure, copying the
~ fields of the existing one into it, and zeroing the existing one. That's
~ necessary because otherwise we'd need a mutable handle (a pointer to a
~ pointer) to update the start of the list, and there's no way to do that with
~ the main-input-buffer variable working the way it presently does.
~
~ (input buffer metadata pointer --)
: push-input-buffer
allocate-input-buffer-metadata
~ (original metadata pointer, new metadata pointer)
2dup swap 6 8 * memcopy
~ (original metadata pointer, new metadata pointer)
swap dup zero-input-buffer-metadata
input-buffer-next-source ! ;
~ This does the inverse of push-input-buffer. In the event that the
~ next-source field is null, it zeroes the buffer.
~
~ Note, however, that it doesn't deallocate the memory, because that's not
~ how memory allocation on the log works. If necessary, it can be deallocated
~ with "forget", though as usual that requires careful planning.
~
~ (input buffer metadata pointer --)
: pop-input-buffer
dup input-buffer-next-source @
~ (original metadata pointer, next source metadata pointer)
dup { 6 8 * memcopy }
{ drop zero-input-buffer-metadata } if-else ;
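~ The transform state is a structure of three 8-byte fields: the saved
~ "here", the saved "latest", and the start address of the output buffer.
~ Each accessor below takes a pointer to the structure and returns a pointer
~ to the field.
: transform-state-saved-here ;
: transform-state-saved-latest 8 + ;
: transform-state-output-buffer-start 2 8 * + ;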
: allocate-transform-state
3 8 * allocate
dup transform-state-saved-here 0 swap !
dup transform-state-saved-latest 0 swap !
dup transform-state-output-buffer-start 0 swap ! ;
allocate-transform-state s" transform-state" variable
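~ When calling the label facility during a transformation, it's necessary
~ to use the real, non-wrapped "here" and "latest". This word exchanges the
~ current values of those variables with the saved ones, so calling it a
~ second time undoes the first call.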
: swap-transform-variables
here @ transform-state transform-state-saved-here @
here ! transform-state transform-state-saved-here !
latest @ transform-state transform-state-saved-latest @
latest ! transform-state transform-state-saved-latest ! ;
~ (address within the output buffer -- address at generated binary's runtime)
: transform-offset
~ Don't transform null pointers.
dup { transform-state transform-state-output-buffer-start @ -
swap-transform-variables L@' origin swap-transform-variables
+ } if ;
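~ As a worked example of transform-offset, under assumed addresses (the
~ numbers are illustrative, not taken from a real build): if the output
~ buffer starts at 0x10000 and the "origin" label resolves to 0x400000, then
~
~     0x10040 transform-offset    leaves 0x400040
~     0x10000 transform-offset    leaves 0x400000
~     0 transform-offset          leaves 0 (null pointers pass through)
~
~ since each non-null result is the address, minus the buffer start, plus
~ the origin.
~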
~ This is the alternate version of "create" for use with the label
~ transform. Its code is the same as the regular "create" except as noted
~ below. It is likely to be extremely useful to read and understand "create"
~ in interpret.e before attempting to understand "Lcreate".
: Lcreate
dup stringlen 1 + dup 3unroll
here @ 10 + 3unroll memmove
here @
~ This value of "latest" is going into the generated output, so call
~ transform-offset on it first.
latest @ transform-offset pack64
0 pack8
0 pack8
+
8 packalign
here @ latest !
~ Now we're immediately after the word header, which is where the codeword
~ will be. This is the value the label should take on, so we set it.
dup here @ 10 +
swap-transform-variables
intern-label set-label
swap-transform-variables
here ! ;
~ This is the alternate version of ":" for use with the label transform. Its
~ code is the same as the regular ":" except as noted below. It is likely
~ to be extremely useful to read and understand ":" in interpret.e before
~ attempting to understand "L:".
: L:
~ This calls "Lcreate" instead of "create".
word value@ Lcreate dropstring
~ This looks up "docol" by label.
swap-transform-variables
L@' docol
L@' origin
swap-transform-variables
+ ,
latest @ hide-entry ] ;
~ This is the alternate version of ";" for use with the label transform. Its
~ code is the same as the regular ";" except as noted below. It is likely
~ to be extremely useful to read and understand ";" in interpret.e before
~ attempting to understand "L;".
: L;
~ This looks up "exit" by label.
swap-transform-variables
L@' exit L@' origin
swap-transform-variables
+ ,
latest @ unhide-entry
~ Since [ is an immediate word, we have to go to extra trouble to compile
~ it as part of ;.
[ ' [ entry-to-execution-token , ]
; make-immediate
~ This is the alternate version of ";asm" for use with the label transform.
~ Its code is the same as the regular ";asm" except as noted below. It is
~ likely to be extremely useful to read and understand ";asm" in interpret.e
~ before attempting to understand "L;asm".
: L;asm
here @ pack-next 8 packalign here !
latest @ dup unhide-entry entry-to-execution-token dup 8 + swap !
~ Since [ is an immediate word, we have to go to extra trouble to compile
~ it as part of ;asm.
[ ' [ entry-to-execution-token , ]
; make-immediate
~ This implements the label transform for a single word. It is directly
~ analogous to "interpret", and reading interpret.e may help in understanding
~ it, though it's meant to still make sense on its own.
~
~ It expects to be called from "transform", below, which loops.
~
~ (-- done)
: transform-one
word
~ If no word was returned, exit.
dup 0 = { drop 0 exit } if
~ The string is on the top of the stack, so to get a pointer to it we get
~ the stack address.
~ (string)
value@
~ If it's the magic word, end the transformation.
dup s" pyrzqxgl" stringcmp 0 = { drop dropstring 1 exit } if
~ Check whether it's one of the words we have alternates for, and look up
~ the alternate if so.
dup 0 swap
~ (name as stack string, name pointer, placeholder, name pointer)
dup s" create" stringcmp 0 = { swap drop ' Lcreate swap } if
dup s" :" stringcmp 0 = { swap drop ' L: swap } if
dup s" ;" stringcmp 0 = { swap drop ' L; swap } if
dup s" ;asm" stringcmp 0 = { swap drop ' L;asm swap } if
drop swap
~ (name as stack string, 0 or alternate entry pointer, name pointer)
~ If an alternate was found, the alternate will be used in immediate mode.
~ If not, we look up the word in the regular, non-transformed dictionary
~ and use that for immediate mode.
over { dup
transform-state transform-state-saved-latest @ swap find-in
3roll drop swap } unless
~ (name as stack string, immediate entry pointer, name pointer)
~ In regular "interpret", we would check whether we found the word before
~ checking the mode. However, we have three different places words could
~ come from, so that's not a simple notion. So, we check the mode first.
interpreter-flags @ 0x01 & {
~ If we're in compile mode, there's still a chance it's an immediate
~ word. First check whether we have an immediate entry, then if so, check
~ that entry's flags. Notice that this means the generated code can't
~ override an immediate word with a non-immediate word of the same name.
over dup { entry-flags@ 0x01 & not } if
{
~ Either there was no immediate entry, or the immediate entry wasn't
~ flagged as an immediate word. So we check whether this could be a
~ compilation.
~
~ To do this, we need to look the word up in the output buffer. We
~ can't easily traverse the next-entry pointers in the output buffer's
~ dictionary, so we check the label. Since we don't know the word's name
~ statically, this is a rare scenario where we can't use the abbreviated
~ label syntax, but that's easy enough.
~
~ Even though we've ruled out the possibility that the word is only
~ ever used immediately, it is still possible that there's some reason
~ the word doesn't exist. In particular, it could be an integer literal.
~ If we were to call use-label first, that would count as a requirement
~ that the label must eventually be set. We don't want to require that
~ quite yet, so we call find-label.
~
~ This check is the means by which forward references are disallowed:
~ On the very first pass, a forward-referenced label won't exist yet, so
~ transform will give a "no such word" error, which in an ideal world
~ would prevent there from being a subsequent pass, but at the very
~ least it will ensure the output isn't a valid ELF.
dup
swap-transform-variables
find-label
swap-transform-variables
{
~ It exists, so we declare our use of it (that's also the only way to
~ get a value for it).
swap-transform-variables
intern-label use-label
swap-transform-variables
~ Labels point to codewords (because that's what "Lcreate" does),
~ which is already what we want to output.
~
~ An important caveat: Though it would require something weird to be
~ happening, such as a forced forward reference, the label may be zero!
~ We need to allow for that possibility by not examining the contents of
~ a nonexistent entry.
~
~ Fortunately we don't have to look at it, just append it to the heap
~ and clean up.
drop , dropstring 0 exit
} if
} if
} if
~ (name as stack string, immediate entry pointer, name pointer)
~ If we got here, one of three things is true: We're in interpret mode;
~ the word is immediate; or no word was found. If the immediate entry
~ pointer is non-zero, run it.
over {
drop dropstring-with-result entry-to-execution-token execute
0 exit
} if
~ If we're still here, it wasn't in the dictionary. We no longer need the
~ name pointer or the immediate entry pointer.
drop drop
~ (name as stack string)
~ If it's not in the dictionary, check whether it's an integer literal. As
~ before, we get the stack address and use it as a string pointer.
value@ read-integer 0 = {
~ It's a number.
interpreter-flags @ 0x01 & {
~ We're in compile mode; append first "lit", then the number, to the
~ heap. The version of "lit" we use is the one that's current when we
~ ourselves are compiled, hardcoded; doing a dynamic lookup would
~ require dealing with what happens if it's not found.
~ TODO this is wrong
dropstring-with-result
~ We look up "lit" as a label.
swap-transform-variables L@' lit swap-transform-variables
transform-offset
, ,
0 exit
} if
~ We're in interpret mode; push the number to the stack. Or at least, that's
~ what the code we're interpreting will see. Really it's already on the
~ stack, so we just clean everything else up and leave it there.
dropstring-with-result
0 exit
} if
~ If it's neither in the dictionary nor a number, just print an error.
s" No such word: " emitstring value@ emitstring dropstring 0 ;
~ This implements the label transform for all words in a region given as an
~ input string. It is directly analogous to "quit", in interpret.e, but is far
~ more complex.
~
~ (output buffer start, output point, input string pointer
~ -- output buffer start, output point)
: transform
main-input-buffer dup push-input-buffer
~ TODO the arguments for this seem to be backwards from the documentation
swap attach-string-to-input-buffer
~ Save the old values of "here" and "latest", and set the initial values
~ of the internal ones. These values need to persist across iterations,
~ since client code will make its own updates to them and then rely on those
~ updates having taken effect. So we do the swap just once, here outside the
~ loop, and set it back when the loop ends.
here @ transform-state transform-state-saved-here !
latest @ transform-state transform-state-saved-latest !
over transform-state transform-state-output-buffer-start !
here !
0 latest !
~ Now the only thing of ours left on the stack is the output buffer start,
~ which stays beneath anything client code adds, so client code can do its
~ thing. It's important that nothing else of ours persists on the stack
~ across iterations, so that client code can add and remove stuff there as
~ it sees fit.
{ transform-one
~ (done)
~ When the loop is done, get the real values of "here" and "latest"
~ back. The internal "here" is also the output point, and will become our
~ return value. The internal "latest" is discarded.
{ here @
transform-state transform-state-saved-here @ here !
transform-state transform-state-saved-latest @ latest !
~ (output point)
~ Though we don't actually use transform-state outside of this
~ invocation, for tidiness we zero it out.
0 transform-state transform-state-saved-here !
0 transform-state transform-state-saved-latest !
0 transform-state transform-state-output-buffer-start !
~ Also put the input source back how it was.
main-input-buffer pop-input-buffer
exit } if } forever ;