~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ ~~ Bootstrapping the log ~~
~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
~
~ The log is the main region of memory within which most dynamic allocation
~ happens. It's a single contiguous segment of virtual memory, which is
~ requested from the kernel when Evocation starts up. Almost all of
~ Evocation's dynamic data is kept in the log, including the main dictionary;
~ several important global variables which make it possible to find and
~ allocate other data structures; and the control stack.
~
~ This file has the task of providing words which are useful for working
~ with the log, and more specifically which are useful for helping to bring
~ the log into existence. Once the log exists, it can be used to manage
~ itself, but there's a bootstrapping challenge in getting there. That
~ challenge is solved by the warm-start routine in execution.e, which relies
~ on the words in this file and should load after it.
~
~ Some modern Forths, including Jonesforth, refer to the log as the heap.
~ This is a misnomer; a heap is a data structure that allows non-contiguous
~ allocation. Although there are Forths that have true heaps, Evocation is not
~ one of them. Space in the log is allocated by incrementing the "here"
~ variable (one of those important globals), which necessarily can only
~ allocate contiguous blocks; there is no way to compact allocations to
~ reclaim fragmented, unused space in between them. Evocation does allow
~ deallocation using "forget", but this is done by resetting "here" and
~ "latest" to older values, unwinding every allocation that's been done since
~ the point in time they return to.
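~
~ A minimal sketch of what such an allocation amounts to, assuming "here"
~ returns the address of the variable in the same way that "latest" does.
~ Reserving sixteen bytes at the end of the log would be:
~
~   here @ 16 + here !
~
~ The bump itself keeps no record of the block, which is why the only way to
~ free it is to move "here" back past it.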
~
~ It would be a mistake to confuse this allocation strategy with the
~ more-general facilities for allocation, reallocation, and deallocation of
~ individual memory blocks that many other languages have. To avoid confusion,
~ we stay away from the name "heap", though it may still occasionally be used
~ colloquially because it's familiar from other Forths, and because most
~ programming languages have a heap as the main memory segment they request
~ from the kernel.
~
~ In the strictest technical sense, the log is a stack: Things are added
~ to the end of it, and removed from that same end. However, Evocation already
~ has two other stacks, the control and value stacks. Adding to the potential
~ confusion, the control stack is actually stored inside the log (as a
~ fixed-size chunk at the bottom). Besides, the log isn't really that much
~ like a stack when you look at how it's actually used. Unlike the contents
~ of the control and value stacks, data structures on the log tend to be rich
~ and complex, interlinked in various ways through the use of pointers. They
~ are also usually long-lived, so the log grows over time, whereas the
~ control and value stacks stay roughly the same size through cycles of
~ growth and shrinking. In order to be able to speak precisely about what
~ we're doing, we introduce the name "log" to refer to the entire memory
~ segment and everything stored within it.
~
~ Another linguistic choice we make is to be clear about dictionaries. A
~ dictionary is a linked list of word entries. Each dictionary has a specific
~ handle, a pointer to a pointer, which is the root of the list. Each
~ word entry begins with a specific data structure, which among other things
~ includes a next-entry pointer, a flags byte, and a string that serves as
~ the entry's name. Older entries in a dictionary seldom change; newer entries
~ are added at the beginning of it, with their next-entry pointers leading to
~ the older entries. It is possible for several dictionaries to exist at once,
~ each with its own dictionary handle.
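~
~ As a sketch of that layout, inferred from the arithmetic in find-in and
~ log-load-create later in this file rather than from an authoritative
~ definition, the start of an entry looks like this:
~
~   offset 0    next-entry pointer, 8 bytes
~   offset 8    two bytes, both zero in a new entry; one is presumably the
~               flags byte, in which find-in treats 0x80 as "hidden"
~   offset 10   name string
~
~ The name field is sized as the string's length plus one, presumably to
~ leave room for a terminator, and judging by the name of packalign,
~ whatever follows it starts at the next 8-byte boundary.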
~
~ Since dictionaries are managed using pointers to individual entries, there
~ is no specific requirement about the order in which those entries occur in
~ memory or where they are allocated, but usually a new entry is allocated at
~ the end of the log, by incrementing the variable "here", in the same manner
~ as any other allocation. There is one particular dictionary, the main
~ dictionary, whose handle is the variable "latest". The main dictionary holds
~ every executable word that can be used normally via Evocation's interpreter.
~
~ Since the main dictionary is by far the most important thing in the log,
~ it can be tempting to conflate the log with the main dictionary. This is
~ accurate enough for some purposes, but note that other dictionaries are
~ often interleaved with it, their allocations entwining like grape vines even
~ while each remains separate, reachable only via its own root. See the
~ machine label facility, in labels.e, for an example of how a secondary
~ dictionary can be useful.
~
~ This may feel tangential, but it's important background and there's no
~ better place to explain it: A handle is a pointer to a pointer. The variable
~ "latest" returns a handle, a fixed address which always holds the pointer to
~ the root entry of the main dictionary. Dereferencing that handle gives you
~ the dictionary pointer, the address of the root entry, which is suitable to
~ pass to find-in and similar words that read the dictionary's contents. When
~ you want to add a new entry to a dictionary, you need the dictionary's
~ handle, so that the root pointer can be changed. When you only want to read
~ it, you only need the regular single pointer.
~
~ When reading the documentation of words that work with dictionaries, pay
~ close attention to whether their parameters include a dictionary handle, or
~ a dictionary pointer.
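~
~ A minimal sketch of the distinction, using only words that appear in this
~ file, and assuming "latest" is already usable (which it is not during early
~ startup):
~
~   latest                  (-- dictionary handle)
~   latest @                (-- dictionary pointer)
~   latest @ swap find-in   (string pointer -- entry pointer or 0)
~
~ The last line reads through the handle once and passes the plain pointer
~ along, because find-in only reads the dictionary.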
~
~ The term "handle" was widely known in the early days of microcomputing,
~ when memory-safe languages without direct pointer access were less common.
~ Today it is usually considered specific to systems programming, the type of
~ programming which lies beneath other software and deals with topics such as
~ memory management and processes. Evocation is a systems-programming
~ language, in the sense that it takes pains to not introduce mandatory
~ abstractions which would make it difficult or inefficient to work directly
~ with these topics. So, in understanding Evocation, it's important to know
~ about handles.


~ Find-in is the main word that provides the capability to look up words by
~ name, though it's usually used via "find" rather than being called directly.
~
~ Find-in traverses the linked list formed by a particular dictionary's
~ next-entry pointers, looking for an entry that matches a given name. The
~ dictionary pointer is the pointer (not handle) to the root of the list,
~ which runs from newest to oldest. For example, dereferencing the value of
~ "latest" gives the pointer to the main dictionary, which can be passed to
~ find-in.
~
~ Having find-in separated out is convenient when working with alternate
~ dictionaries, but the main reason for having it is not convenience but
~ necessity: During Evocation's startup, there is a period before global
~ variables are easily accessible, so there would be no way to implement
~ "find". The warm-start routine (see execution.e and transform.e) has the
~ job of fixing that, and it makes extensive use of find-in to do so.
~
~ (dictionary pointer, string pointer -- entry pointer or 0)
: find-in
  ~ It will be more convenient to have the entry pointer on top.
  swap
  {
    ~ If the entry pointer is null, exit.
    ~ (name pointer to find, current entry pointer)
    dup 0 = { swap drop exit } if
    ~ Check this entry's "hidden" flag.
    ~ (name pointer to find, current entry pointer)
    dup entry-flags@ 0x80 & 0x80 != {
      ~ Test whether this entry is a match.
      ~ (name pointer to find, current entry pointer)
      2dup 10 + stringcmp 0 = {
        ~ If we're here, it's a match. Clean up our working state and exit.
        ~ (name pointer to find, current entry pointer)
        swap drop exit
      } if
    } if
    ~ If we're here, it's not a match; traverse the pointer and repeat.
    ~ (name pointer to find, current entry pointer)
    @
  } forever ;


~ This has the same value as the constant control-stack-size, which is
~ defined in execution.e. Everything will break if it doesn't.
~
~ TODO: remove one of them. Probably the other one.
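~
~ A sketch of the layout this implies, worked out from the arithmetic in
~ log-load-latest and log-load-here below. Other slots are not described in
~ this file, so they are left out:
~
~   log address + 0x00000   control stack, 0x10000 bytes
~   log address + 0x10018   the variable "latest" (log-offset + 3 * 8)
~   log address + 0x10020   the variable "here" (log-offset + 4 * 8)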
: log-offset 0x10000 ; ~ 64 KiB


~ (log address -- log address, "latest" pointer)
: log-load-latest
  dup log-offset + 3 8 * + ;


~ (log address -- log address, "here" pointer)
: log-load-here
  dup log-offset + 4 8 * + ;


~ This is a helper used by warm-start, which invokes find-in using "latest".
~ It relies on being passed the root address of the log, which is used to find
~ the global variable "latest". It's inconvenient to keep a log pointer around
~ all the time, which is why we stop doing it as soon as possible, but during
~ Evocation's startup there's no alternative. This word is used extensively
~ by code that's been compiled via the log-load transform; see transform.e for
~ details.
~
~ It would be possible to unload this word after the log is created, but
~ there are rare situations in which it's still useful, such as injecting
~ Evocation into another process's address space. Plus, it's small. So, we
~ keep it around.
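~
~ For readers following along, here is the stack after each step of the
~ definition below, traced by hand from the stack comments of the words
~ involved. It assumes 3unroll moves the top item beneath the two below it,
~ which is how the stack comments in log-load-create read:
~
~   (start)           log address, string pointer
~   swap              string pointer, log address
~   log-load-latest   string pointer, log address, "latest" handle
~   @                 string pointer, log address, dictionary pointer
~   swap              string pointer, dictionary pointer, log address
~   3unroll           log address, string pointer, dictionary pointer
~   swap              log address, dictionary pointer, string pointer
~   find-in           log address, entry pointer or 0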
~
~ (log address, string pointer -- log address, entry pointer or 0)
: log-load-find
  swap log-load-latest @ swap 3unroll swap find-in ;


~ In the code generated by the log-load transform, it's convenient to have
~ only a single step needed to look up a word's execution token. This helper
~ does log-load-find, then gets the execution token if an entry is found.
~
~ (log address, string pointer -- log address, execution token or 0)
: log-load-find-execution-token
  log-load-find dup { entry-to-execution-token } if ;


~ This is the same as "create", from interpret.e, except that it takes the
~ log's address as a parameter rather than hardcoding it, so that it can be
~ used in situations where the normal compilation process isn't yet available.
~
~ The requisite stack juggling is kind of finicky, sorry if it's hard to
~ read, but it's doing the same steps in the same order as the regular
~ "create".
~
~ (log address, string pointer -- log address)
: log-load-create
  dup stringlen 1 + dup 3unroll
  ~ (log address, name field length, string pointer, name field length)
  3 pick log-load-here swap drop @ 10 + 3unroll memmove
  ~ (log address, name field length)
  over log-load-here swap drop @
  ~ (log address, name field length, output point)
  2 pick log-load-latest swap drop @ pack64
  ~ (log address, name field length, output point)
  0 pack8
  0 pack8
  +
  ~ (log address, output point)
  8 packalign
  ~ (log address, output point)
  over log-load-here swap drop @
  ~ (log address, output point, old here value)
  2 pick log-load-latest swap drop !
  ~ (log address, output point)
  over log-load-here swap drop ! ;


~ This is the same as ",", from interpret.e, except that it takes the log's
~ address as a parameter rather than hardcoding it, so that it can be used in
~ situations where the normal compilation process isn't yet available.
~
~ Again, the stack juggling is kind of a lot, sorry about that.
~
~ (log address, value -- log address)
: log-load-comma
  swap log-load-here swap 3unroll
  ~ (log address, value, here)
  @ swap pack64
  ~ (log address, updated here value)
  swap log-load-here swap 3unroll
  ~ (log address, updated here value, here)
  ! ;