author Irene Knapp <ireneista@irenes.space> 2026-05-16 13:52:40 -0700
committer Irene Knapp <ireneista@irenes.space> 2026-05-16 13:52:40 -0700
commit f4112a05de8bf4c69a7abb9817c7ca70be9f7fb5
tree 2393e83159efb73b71792d28e726dd30a0add2cf /log-load.e
parent 32f1fde313ce07e086b503b08babffe60a7be05d
add a stub for the log-load transform, and a ton of documentation
Force-Push: yes
Change-Id: Ia1fe0e6aefaf6776bd69bca4a26ee0df0b555832
Diffstat (limited to 'log-load.e')
-rw-r--r-- log-load.e 178
1 files changed, 178 insertions, 0 deletions
diff --git a/log-load.e b/log-load.e
new file mode 100644
index 0000000..029b9da
--- /dev/null
+++ b/log-load.e
@@ -0,0 +1,178 @@
+~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~ ~~ Bootstrapping the log ~~
+~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~
+~   The log is the main region of memory within which most dynamic allocation
+~ happens. It's a single contiguous segment of virtual memory, which is
+~ requested from the kernel when Evocation starts up. Almost all of
+~ Evocation's dynamic data is kept in the log, including the main dictionary;
+~ several important global variables which make it possible to find and
+~ allocate other data structures; and the control stack.
+~
+~   This file has the task of providing words which are useful for working
+~ with the log, and more specifically which are useful for helping to bring
+~ the log into existence. Once the log exists, it can be used to manage
+~ itself, but there's a bootstrapping challenge in getting there. That
+~ challenge is solved by the warm-start routine in execution.e, which relies
+~ on the words in this file and should load after it.
+~
+~   Some modern Forths, including Jonesforth, refer to the log as the heap.
+~ This is a misnomer; a heap is a data structure that allows non-contiguous
+~ allocation. Although there are Forths that have true heaps, Evocation is not
+~ one of them. Space in the log is allocated by incrementing the "here"
+~ variable (one of those important globals), which necessarily can only
+~ allocate contiguous blocks; there is no way to compact allocations to
+~ reclaim fragmented, unused space in between them. Evocation does allow
+~ deallocation using "forget", but this is done by resetting "here" and
+~ "latest" to older values, unwinding every allocation that's been done since
+~ the point in time they return to.
+~
+~   It would be a mistake to confuse this allocation strategy with the
+~ more-general facilities for allocation, reallocation, and deallocation of
+~ individual memory blocks that many other languages have. To avoid confusion,
+~ we stay away from the name "heap", though it may still occasionally be used
+~ colloquially because it's familiar from other Forths, and because most
+~ programming languages have a heap as the main memory segment they request
+~ from the kernel.
+~
+~   In the strictest technical sense, the log is a stack: Things are added
+~ to the end of it, and removed from that same end. However, Evocation already
+~ has two other stacks, the control and value stacks. Adding to the potential
+~ confusion, the control stack is actually stored inside the log (as a
+~ fixed-size chunk at the bottom). However, the log isn't really that much
+~ like a stack when you look at how it's actually used. Unlike Evocation's
+~ control and value stacks, data structures on the log tend to be rich and
+~ complex, interlinked in various ways through the use of pointers. They also
+~ tend to be long-lived, with the log tending to grow over time, whereas the
+~ control and value stacks tend to remain roughly the same size through cycles
+~ of growth and shrinking. In order to be able to speak precisely about what
+~ we're doing, we introduce the name "log" to refer to the entire memory
+~ segment and everything stored within it.
+~
+~   Another linguistic choice we make is to be clear about dictionaries. A
+~ dictionary is a linked list of word entries. Each dictionary has a specific
+~ handle, a pointer to a pointer, which is the root of the list. Each
+~ word entry begins with a specific data structure, which among other things
+~ includes a next-entry pointer, a flags byte, and a string that serves as
+~ the entry's name. Older entries in a dictionary seldom change; newer entries
+~ are added at the beginning of it, with their next-entry pointers leading to
+~ the older entries. It is possible for several dictionaries to exist at once,
+~ each with its own dictionary handle.
+~
+~   Since dictionaries are managed using pointers to individual entries, there
+~ is no specific requirement about the order in which those entries occur in
+~ memory or where they are allocated, but usually a new entry is allocated at
+~ the end of the log, by incrementing the variable "here", in the same manner
+~ as any other allocation. There is one particular dictionary, the main
+~ dictionary, whose handle is the variable "latest". The main dictionary holds
+~ every executable word that can be used normally via Evocation's interpreter.
+~
+~   Since the main dictionary is by far the most important thing in the log,
+~ it can be tempting to conflate the log with the main dictionary. This is
+~ accurate enough for some purposes, but note that other dictionaries are
+~ often interleaved with it, their allocations entwining like grape vines even
+~ while each remains separate, reachable only via its own root. See the
+~ machine label facility, in labels.e, for an example of how a secondary
+~ dictionary can be useful.
+~
+~   This may feel tangential, but it's important background and there's no
+~ better place to explain it: A handle is a pointer to a pointer. The variable
+~ "latest" returns a handle, a fixed address which always holds the pointer to
+~ the root entry of the main dictionary. Dereferencing that handle gives you
+~ the dictionary pointer, the address of the root entry, which is suitable to
+~ pass to find-in and similar words that read the dictionary's contents. When
+~ you want to add a new entry to a dictionary, you need the dictionary's
+~ handle, so that the root pointer can be changed. When you only want to read
+~ it, you only need the regular single pointer.
+~
+~   When reading the documentation of words that work with dictionaries, pay
+~ close attention to whether their parameters include a dictionary handle, or
+~ a dictionary pointer.
+~
+~   The term "handle" was widely known in the early days of microcomputing,
+~ when memory-safe languages without direct pointer access were less common.
+~ Today it is usually considered specific to systems programming, the type of
+~ programming which lies beneath other software and deals with topics such as
+~ memory management and processes. Evocation is a systems-programming
+~ language, in the sense that it takes pains to not introduce mandatory
+~ abstractions which would make it difficult or inefficient to work directly
+~ with these topics. So, in understanding Evocation, it's important to know
+~ about handles.
+
+
+~   Find-in is the main word that provides the capability to look up words by
+~ name, though it's usually used via "find" rather than being called directly.
+~
+~   Find-in traverses the linked list formed by a particular dictionary's
+~ next-entry pointers, looking for an entry that matches a given name. The
+~ dictionary pointer is the pointer (not handle) to the root of the list,
+~ which runs from newest to oldest. For example, dereferencing the value of
+~ "latest" gives the pointer to the main dictionary, which can be passed to
+~ find-in.
+~
+~   Having find-in separated out is convenient when working with alternate
+~ dictionaries, but the main reason for having it is not convenience but
+~ necessity: During Evocation's startup, there is a period before global
+~ variables are easily accessible, so there would be no way to implement
+~ "find". The warm-start routine (see execution.e and transform.e) has the
+~ job of fixing that, and it makes extensive use of find-in to do so.
+~
+~ (dictionary pointer, string pointer -- entry pointer or 0)
+: find-in
+  ~ It will be more convenient to have the entry pointer on top.
+  swap
+
+  {
+    ~ If the entry pointer is null, exit.
+    ~ (name pointer to find, current entry pointer)
+    dup 0 = { swap drop exit } if
+
+    ~ Check this entry's "hidden" flag.
+    ~ (name pointer to find, current entry pointer)
+    dup entry-flags@ 0x80 & 0x80 != {
+      ~ Test whether this entry is a match.
+      ~ (name pointer to find, current entry pointer)
+      2dup 10 + stringcmp 0 = {
+        ~ If we're here, it's a match. Clean up our working state and exit.
+        ~ (name pointer to find, current entry pointer)
+        swap drop exit
+      } if
+    } if
+
+    ~ If we're here, it's not a match; traverse the pointer and repeat.
+    ~ (name pointer to find, current entry pointer)
+    @
+  } forever ;
+
+
+~   This has the same value as the constant control-stack-size, which is
+~ defined in execution.e. Everything will break if it doesn't.
+~
+~ TODO: remove one of them. Probably the other one.
+: log-offset                        0x10000 ; ~ 64 KiB
+
+~ (log address -- log address, "latest" pointer)
+: log-load-latest
+  dup log-offset + 3 8 * + ;
+~ (log address -- log address, "here" pointer)
+: log-load-here
+  dup log-offset + 4 8 * + ;
+
+
+~   This is a helper used by warm-start, which invokes find-in using "latest".
+~ It relies on being passed the root address of the log, which is used to find
+~ the global variable "latest". It's inconvenient to keep a log pointer around
+~ all the time, which is why we stop doing it as soon as possible, but during
+~ Evocation's startup there's no alternative. This word is used extensively
+~ by code that's been compiled via the log-load transform; see transform.e for
+~ details.
+~
+~   It would be possible to unload this word after the log is created, but
+~ there are rare situations in which it's still useful, such as injecting
+~ Evocation into another process's address space. Plus, it's small. So, we
+~ keep it around.
+~
+~ (log address, string pointer -- log address, entry pointer or 0)
+: log-load-find
+  swap log-load-latest @ swap 3unroll swap find-in ;
+
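
Reader's note, not part of the patch: the handle-versus-pointer distinction and the find-in traversal described in the comments above can be modeled in a few lines of Python. This is only a rough sketch under stated assumptions, not Evocation's implementation. The names Entry, Handle, add_entry, latest_address, and here_address are hypothetical; the 8-byte cell size is inferred from the "8 *" in log-load-latest and log-load-here; and the real entry layout (flags byte, name offset) is not modeled.

```python
from dataclasses import dataclass
from typing import Optional

HIDDEN = 0x80  # the flag bit that find-in masks and tests in the source


@dataclass
class Entry:
    """Hypothetical stand-in for a word entry header."""
    name: str
    flags: int = 0
    next: Optional["Entry"] = None  # next-entry pointer, toward older entries


class Handle:
    """A pointer to a pointer: a fixed cell that holds the root entry pointer."""

    def __init__(self) -> None:
        self.root: Optional[Entry] = None


def add_entry(handle: Handle, name: str, flags: int = 0) -> Entry:
    """Adding needs the handle, because the root pointer itself is rewritten."""
    entry = Entry(name, flags, handle.root)
    handle.root = entry
    return entry


def find_in(dictionary: Optional[Entry], name: str) -> Optional[Entry]:
    """Reading only needs the dictionary pointer (the dereferenced handle)."""
    entry = dictionary
    while entry is not None:
        # Skip hidden entries; otherwise compare names, newest entry first.
        if not (entry.flags & HIDDEN) and entry.name == name:
            return entry
        entry = entry.next
    return None  # plays the role of the 0 result in the stack comment


LOG_OFFSET = 0x10000  # same value as log-offset in the source (64 KiB)
CELL = 8              # assumed cell size in bytes


def latest_address(log_address: int) -> int:
    """Mirrors the arithmetic in log-load-latest: base + offset + 3 cells."""
    return log_address + LOG_OFFSET + 3 * CELL


def here_address(log_address: int) -> int:
    """Mirrors the arithmetic in log-load-here: base + offset + 4 cells."""
    return log_address + LOG_OFFSET + 4 * CELL
```

In this model, a lookup through the main dictionary would be written find_in(latest.root, "dup"), where latest is a Handle: the handle is dereferenced once to get the dictionary pointer, which is the same step that log-load-find performs with "@" before calling find-in. A hidden entry that shadows an older visible entry of the same name is skipped, so the older entry is returned.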