| author | Irene Knapp <ireneista@irenes.space> | 2026-05-16 13:52:40 -0700 |
|---|---|---|
| committer | Irene Knapp <ireneista@irenes.space> | 2026-05-16 13:52:40 -0700 |
| commit | f4112a05de8bf4c69a7abb9817c7ca70be9f7fb5 (patch) | |
| tree | 2393e83159efb73b71792d28e726dd30a0add2cf /log-load.e | |
| parent | 32f1fde313ce07e086b503b08babffe60a7be05d (diff) | |
add a stub for the log-load transform, and a ton of documentation
Force-Push: yes
Change-Id: Ia1fe0e6aefaf6776bd69bca4a26ee0df0b555832
Diffstat (limited to 'log-load.e')
| -rw-r--r-- | log-load.e | 178 |
1 files changed, 178 insertions, 0 deletions
diff --git a/log-load.e b/log-load.e
new file mode 100644
index 0000000..029b9da
--- /dev/null
+++ b/log-load.e
@@ -0,0 +1,178 @@
+~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~ ~~ Bootstrapping the log ~~
+~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~
+~ The log is the main region of memory within which most dynamic allocation
+~ happens. It's a single contiguous segment of virtual memory, which is
+~ requested from the kernel when Evocation starts up. Almost all of
+~ Evocation's dynamic data is kept in the log, including the main dictionary;
+~ several important global variables which make it possible to find and
+~ allocate other data structures; and the control stack.
+~
+~ This file has the task of providing words which are useful for working
+~ with the log, and more specifically which are useful for helping to bring
+~ the log into existence. Once the log exists, it can be used to manage
+~ itself, but there's a bootstrapping challenge in getting there. That
+~ challenge is solved by the warm-start routine in execution.e, which relies
+~ on the words in this file and should load after it.
+~
+~ Some modern Forths, including Jonesforth, refer to the log as the heap.
+~ This is a misnomer; a heap is a data structure that allows non-contiguous
+~ allocation. Although there are Forths that have true heaps, Evocation is not
+~ one of them. Space in the log is allocated by incrementing the "here"
+~ variable (one of those important globals), which necessarily can only
+~ allocate contiguous blocks; there is no way to compact allocations to
+~ reclaim fragmented, unused space in between them. Evocation does allow
+~ deallocation using "forget", but this is done by resetting "here" and
+~ "latest" to older values, unwinding every allocation that's been done since
+~ the point in time they return to.
+~
+~ It would be a mistake to confuse this allocation strategy with the
+~ more-general facilities for allocation, reallocation, and deallocation of
+~ individual memory blocks that many other languages have. To avoid confusion,
+~ we stay away from the name "heap", though it may still occasionally be used
+~ colloquially because it's familiar from other Forths, and because most
+~ programming languages have a heap as the main memory segment they request
+~ from the kernel.
+~
+~ In the strictest technical sense, the log is a stack: Things are added
+~ to the end of it, and removed from that same end. However, Evocation already
+~ has two other stacks, the control and value stacks. Adding to the potential
+~ confusion, the control stack is actually stored inside the log (as a
+~ fixed-size chunk at the bottom). However, the log isn't really that much
+~ like a stack when you look at how it's actually used. Unlike Evocation's
+~ control and value stacks, data structures on the log tend to be rich and
+~ complex, interlinked in various ways through the use of pointers. They also
+~ tend to be long-lived, with the log tending to grow over time, whereas the
+~ control and value stacks tend to remain roughly the same size through cycles
+~ of growth and shrinking. In order to be able to speak precisely about what
+~ we're doing, we introduce the name "log" to refer to the entire memory
+~ segment and everything stored within it.
+~
+~ Another linguistic choice we make is to be clear about dictionaries. A
+~ dictionary is a linked list of word entries. Each dictionary has a specific
+~ handle, a pointer to a pointer, which is the root of the list. Each
+~ word entry begins with a specific data structure, which among other things
+~ includes a next-entry pointer, a flags byte, and a string that serves as
+~ the entry's name. Older entries in a dictionary seldom change; newer entries
+~ are added at the beginning of it, with their next-entry pointers leading to
+~ the older entries. It is possible for several dictionaries to exist at once,
+~ each with its own dictionary handle.
+~
+~ Since dictionaries are managed using pointers to individual entries, there
+~ is no specific requirement about the order in which those entries occur in
+~ memory or where they are allocated, but usually a new entry is allocated at
+~ the end of the log, by incrementing the variable "here", in the same manner
+~ as any other allocation. There is one particular dictionary, the main
+~ dictionary, whose handle is the variable "latest". The main dictionary holds
+~ every executable word that can be used normally via Evocation's interpreter.
+~
+~ Since the main dictionary is by far the most important thing in the log,
+~ it can be tempting to conflate the log with the main dictionary. This is
+~ accurate enough for some purposes, but note that other dictionaries are
+~ often interleaved with it, their allocations entwining like grape vines even
+~ while each remains separate, reachable only via its own root. See the
+~ machine label facility, in labels.e, for an example of how a secondary
+~ dictionary can be useful.
+~
+~ This may feel tangential, but it's important background and there's no
+~ better place to explain it: A handle is a pointer to a pointer. The variable
+~ "latest" returns a handle, a fixed address which always holds the pointer to
+~ the root entry of the main dictionary. Dereferencing that handle gives you
+~ the dictionary pointer, the address of the root entry, which is suitable to
+~ pass to find-in and similar words that read the dictionary's contents. When
+~ you want to add a new entry to a dictionary, you need the dictionary's
+~ handle, so that the root pointer can be changed. When you only want to read
+~ it, you only need the regular single pointer.
+~
+~ When reading the documentation of words that work with dictionaries, pay
+~ close attention to whether their parameters include a dictionary handle, or
+~ a dictionary pointer.
+~
+~ The term "handle" was widely known in the early days of microcomputing,
+~ when memory-safe languages without direct pointer access were less common.
+~ Today it is usually considered specific to systems programming, the type of
+~ programming which lies beneath other software and deals with topics such as
+~ memory management and processes. Evocation is a systems-programming
+~ language, in the sense that it takes pains to not introduce mandatory
+~ abstractions which would make it difficult or inefficient to work directly
+~ with these topics. So, in understanding Evocation, it's important to know
+~ about handles.
+
+
+~ Find-in is the main word that provides the capability to look up words by
+~ name, though it's usually used via "find" rather than being called directly.
+~
+~ Find-in traverses the linked list formed by a particular dictionary's
+~ next-entry pointers, looking for an entry that matches a given name. The
+~ dictionary pointer is the pointer (not handle) to the root of the list,
+~ which runs from newest to oldest. For example, dereferencing the value of
+~ "latest" gives the pointer to the main dictionary, which can be passed to
+~ find-in.
+~
+~ Having find-in separated out is convenient when working with alternate
+~ dictionaries, but the main reason for having it is not convenience but
+~ necessity: During Evocation's startup, there is a period before global
+~ variables are easily accessible, so there would be no way to implement
+~ "find". The warm-start routine (see execution.e and transform.e) has the
+~ job of fixing that, and it makes extensive use of find-in to do so.
+~
+~ (dictionary pointer, string pointer -- entry pointer or 0)
+: find-in
+  ~ It will be more convenient to have the entry pointer on top.
+  swap
+
+  {
+    ~ If the entry pointer is null, exit.
+    ~ (name pointer to find, current entry pointer)
+    dup 0 = { swap drop exit } if
+
+    ~ Check this entry's "hidden" flag.
+    ~ (name pointer to find, current entry pointer)
+    dup entry-flags@ 0x80 & 0x80 != {
+      ~ Test whether this entry is a match.
+      ~ (name pointer to find, current entry pointer)
+      2dup 10 + stringcmp 0 = {
+        ~ If we're here, it's a match. Clean up our working state and exit.
+        ~ (name pointer to find, current entry pointer)
+        swap drop exit
+      } if
+    } if
+
+    ~ If we're here, it's not a match; traverse the pointer and repeat.
+    ~ (name pointer to find, current entry pointer)
+    @
+  } forever ;
+
+
+~ This has the same value as the constant control-stack-size, which is
+~ defined in execution.e. Everything will break if it doesn't.
+~
+~ TODO: remove one of them. Probably the other one.
+: log-offset 0x10000 ; ~ 64 KiB
+
+~ (log address -- log address, "latest" pointer)
+: log-load-latest
+  dup log-offset + 3 8 * + ;
+~ (log address -- log address, "here" pointer)
+: log-load-here
+  dup log-offset + 4 8 * + ;
+
+
+~ This is a helper used by warm-start, which invokes find-in using "latest".
+~ It relies on being passed the root address of the log, which is used to find
+~ the global variable "latest". It's inconvenient to keep a log pointer around
+~ all the time, which is why we stop doing it as soon as possible, but during
+~ Evocation's startup there's no alternative. This word is used extensively
+~ by code that's been compiled via the log-load transform; see transform.e for
+~ details.
+~
+~ It would be possible to unload this word after the log is created, but
+~ there are rare situations in which it's still useful, such as injecting
+~ Evocation into another process's address space. Plus, it's small. So, we
+~ keep it around.
+~
+~ (log address, string pointer -- log address, entry pointer or 0)
+: log-load-find
+  swap log-load-latest @ swap 3unroll swap find-in ;
+
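For readers who don't know Evocation's syntax, the ideas documented in log-load.e can be modelled in a few lines of Python. This is only an illustrative sketch, not Evocation's implementation. The names `Log`, `Entry`, `allot`, `create` and `CELL` are hypothetical. A one-element list stands in for the "latest" handle, a boolean stands in for the 0x80 "hidden" bit, and the 8-byte cell size is an assumption read off the `8 *` in log-load-latest.

```python
from dataclasses import dataclass
from typing import Optional

LOG_OFFSET = 0x10000  # mirrors log-offset (64 KiB)
CELL = 8              # assumed cell size in bytes, inferred from "8 *"


@dataclass
class Entry:
    """Hypothetical stand-in for a word entry's header."""
    name: str
    next: Optional["Entry"]  # next-entry pointer, leading to the older entry
    hidden: bool = False     # stands in for the 0x80 bit of the flags byte
    start: int = 0           # value of "here" just before this entry was allocated


class Log:
    """Toy model of the log: one contiguous region and a bump pointer."""

    def __init__(self, size):
        self.size = size
        self.here = 0         # next free offset, like the "here" variable
        self.latest = [None]  # handle: a fixed cell that holds the root pointer

    def allot(self, n):
        """Allocate n contiguous bytes by incrementing "here"."""
        if self.here + n > self.size:
            raise MemoryError("log exhausted")
        start = self.here
        self.here += n
        return start

    def create(self, name, size, hidden=False):
        """Add an entry at the front of the main dictionary.

        Adding needs the handle, because the root pointer itself changes.
        """
        start = self.allot(size)
        entry = Entry(name, self.latest[0], hidden, start)
        self.latest[0] = entry
        return entry

    def forget(self, entry):
        """Reset "here" and "latest" to their values before entry existed."""
        self.here = entry.start
        self.latest[0] = entry.next


def find_in(dictionary, name):
    """Walk from newest to oldest; return the first visible match or None.

    Reading only needs the dictionary pointer, not the handle.
    """
    entry = dictionary
    while entry is not None:
        if not entry.hidden and entry.name == name:
            return entry
        entry = entry.next
    return None


def log_load_latest(log_address):
    """Address of the "latest" variable, per the arithmetic in the diff."""
    return log_address + LOG_OFFSET + 3 * CELL


def log_load_here(log_address):
    """Address of the "here" variable, per the arithmetic in the diff."""
    return log_address + LOG_OFFSET + 4 * CELL
```

In this model, `find_in(log.latest[0], "dup")` corresponds to dereferencing "latest" and passing the result to find-in. `log.forget(entry)` discards that entry and everything allocated after it, which is the only form of deallocation the log supports.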