log-load.e



~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ ~~ Bootstrapping the log ~~
~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
~
~   The log is the main region of memory within which most dynamic allocation
~ happens. It's a single contiguous segment of virtual memory, which is
~ requested from the kernel when Evocation starts up. Almost all of
~ Evocation's dynamic data is kept in the log, including the main dictionary;
~ several important global variables which make it possible to find and
~ allocate other data structures; and the control stack.
~
~   This file has the task of providing words which are useful for working
~ with the log, and more specifically which are useful for helping to bring
~ the log into existence. Once the log exists, it can be used to manage
~ itself, but there's a bootstrapping challenge in getting there. That
~ challenge is solved by the warm-start routine in execution.e, which relies
~ on the words in this file; execution.e should therefore load after this one.
~
~   Some modern Forths, including Jonesforth, refer to the log as the heap.
~ This is a misnomer; a heap is a data structure that allows non-contiguous
~ allocation. Although there are Forths that have true heaps, Evocation is not
~ one of them. Space in the log is allocated by incrementing the "here"
~ variable (one of those important globals), which necessarily can only
~ allocate contiguous blocks; there is no way to compact allocations to
~ reclaim fragmented, unused space in between them. Evocation does allow
~ deallocation using "forget", but this is done by resetting "here" and
~ "latest" to older values, unwinding every allocation that's been done since
~ the point in time they return to.
~
~   It would be a mistake to confuse this allocation strategy with the
~ more-general facilities for allocation, reallocation, and deallocation of
~ individual memory blocks that many other languages have. To avoid confusion,
~ we stay away from the name "heap", though it may still occasionally be used
~ colloquially because it's familiar from other Forths, and because most
~ programming languages have a heap as the main memory segment they request
~ from the kernel.
~
~   In the strictest technical sense, the log is a stack: Things are added
~ to the end of it, and removed from that same end. However, Evocation already
~ has two other stacks, the control and value stacks. Adding to the potential
~ confusion, the control stack is actually stored inside the log (as a
~ fixed-size chunk at the bottom). However, the log isn't really that much
~ like a stack when you look at how it's actually used. Unlike Evocation's
~ control and value stacks, data structures on the log tend to be rich and
~ complex, interlinked in various ways through the use of pointers. They also
~ tend to be long-lived, so the log grows over time, whereas the control and
~ value stacks remain roughly the same size through cycles of growth and
~ shrinking. In order to be able to speak precisely about what we're doing, we
~ introduce the name "log" to refer to the entire memory segment and
~ everything stored within it.
~
~   Another linguistic choice we make is to be clear about dictionaries. A
~ dictionary is a linked list of word entries. Each dictionary has a specific
~ handle, a pointer to the pointer that leads to the root of the list. Each
~ word entry begins with a specific data structure, which among other things
~ includes a next-entry pointer, a flags byte, and a string that serves as
~ the entry's name. Older entries in a dictionary seldom change; newer entries
~ are added at the beginning of it, with their next-entry pointers leading to
~ the older entries. It is possible for several dictionaries to exist at once,
~ each with its own dictionary handle.
~
~   Since dictionaries are managed using pointers to individual entries, there
~ is no specific requirement about the order in which those entries occur in
~ memory or where they are allocated, but usually a new entry is allocated at
~ the end of the log, by incrementing the variable "here", in the same manner
~ as any other allocation. There is one particular dictionary, the main
~ dictionary, whose handle is the variable "latest". The main dictionary holds
~ every executable word that can be used normally via Evocation's interpreter.
~
~   Since the main dictionary is by far the most important thing in the log,
~ it can be tempting to conflate the log with the main dictionary. This is
~ accurate enough for some purposes, but note that other dictionaries are
~ often interleaved with it, their allocations entwining like grape vines even
~ while each remains separate, reachable only via its own root. See the
~ machine label facility, in labels.e, for an example of how a secondary
~ dictionary can be useful.
~
~   This may feel tangential, but it's important background and there's no
~ better place to explain it: A handle is a pointer to a pointer. The variable
~ "latest" returns a handle, a fixed address which always holds the pointer to
~ the root entry of the main dictionary. Dereferencing that handle gives you
~ the dictionary pointer, the address of the root entry, which is suitable to
~ pass to find-in and similar words that read the dictionary's contents. When
~ you want to add a new entry to a dictionary, you need the dictionary's
~ handle, so that the root pointer can be changed. When you only want to read
~ it, you only need the regular single pointer.
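~
~   As a small sketch of the distinction, going only by the description
~ above and assuming the globals are already reachable:
~
~     latest        leaves the dictionary handle
~     latest @      leaves the dictionary pointer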
~
~   When reading the documentation of words that work with dictionaries, pay
~ close attention to whether their parameters include a dictionary handle, or
~ a dictionary pointer.
~
~   The term "handle" was widely known in the early days of microcomputing,
~ when memory-safe languages without direct pointer access were less common.
~ Today it is usually considered specific to systems programming, the type of
~ programming which lies beneath other software and deals with topics such as
~ memory management and processes. Evocation is a systems-programming
~ language, in the sense that it takes pains to not introduce mandatory
~ abstractions which would make it difficult or inefficient to work directly
~ with these topics. So, in understanding Evocation, it's important to know
~ about handles.


~   Find-in is the main word that provides the capability to look up words by
~ name, though it's usually used via "find" rather than being called directly.
~
~   Find-in traverses the linked list formed by a particular dictionary's
~ next-entry pointers, looking for an entry that matches a given name. The
~ dictionary pointer is the pointer (not handle) to the root of the list,
~ which runs from newest to oldest. For example, dereferencing the value of
~ "latest" gives the pointer to the main dictionary, which can be passed to
~ find-in.
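~
~   As a usage sketch, assuming a string pointer for the name is already on
~ the value stack and that "latest" is usable (that is, after startup):
~
~     latest @ swap find-in
~
~ The swap puts the arguments in the order find-in expects, with the string
~ pointer on top. The result is an entry pointer, or 0 if nothing matched.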
~
~   Having find-in separated out is convenient when working with alternate
~ dictionaries, but the main reason for having it is not convenience but
~ necessity: During Evocation's startup, there is a period before global
~ variables are easily accessible, so there would be no way to implement
~ "find". The warm-start routine (see execution.e and transform.e) has the
~ job of fixing that, and it makes extensive use of find-in to do so.
~
~ (dictionary pointer, string pointer -- entry pointer or 0)
: find-in
  ~ It will be more convenient to have the entry pointer on top.
  swap

  {
    ~ If the entry pointer is null, exit.
    ~ (name pointer to find, current entry pointer)
    dup 0 = { swap drop exit } if

    ~ Check this entry's "hidden" flag (0x80); hidden entries never match.
    ~ (name pointer to find, current entry pointer)
    dup entry-flags@ 0x80 & 0x80 != {
      ~ Test whether this entry is a match; its name starts at offset 10.
      ~ (name pointer to find, current entry pointer)
      2dup 10 + stringcmp 0 = {
        ~ If we're here, it's a match. Clean up our working state and exit.
        ~ (name pointer to find, current entry pointer)
        swap drop exit
      } if
    } if

    ~ If we're here, it's not a match; traverse the pointer and repeat.
    ~ (name pointer to find, current entry pointer)
    @
  } forever ;


~   This has the same value as the constant control-stack-size, which is
~ defined in execution.e. Everything will break if it doesn't.
~
~ TODO: remove one of them. Probably the other one.
: log-offset                        0x10000 ; ~ 64 KiB
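
~   As worked arithmetic for the two words below, taking only the constants
~ in this file: 3 8 * is 24 (0x18) and 4 8 * is 32 (0x20), so for a log at
~ address A, "latest" is found at A + 0x10018 and "here" at A + 0x10020.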

~ (log address -- log address, "latest" pointer)
: log-load-latest
  dup log-offset + 3 8 * + ;
~ (log address -- log address, "here" pointer)
: log-load-here
  dup log-offset + 4 8 * + ;


~   This is a helper used by warm-start, which invokes find-in using "latest".
~ It relies on being passed the root address of the log, which is used to find
~ the global variable "latest". It's inconvenient to keep a log pointer around
~ all the time, which is why we stop doing it as soon as possible, but during
~ Evocation's startup there's no alternative. This word is used extensively
~ by code that's been compiled via the log-load transform; see transform.e for
~ details.
~
~   It would be possible to unload this word after the log is created, but
~ there are rare situations in which it's still useful, such as injecting
~ Evocation into another process's address space. Plus, it's small. So, we
~ keep it around.
~
~ (log address, string pointer -- log address, entry pointer or 0)
: log-load-find
  swap log-load-latest @ swap 3unroll swap find-in ;