diff options
| -rw-r--r-- | log-load.e | 178 | ||||
| -rw-r--r-- | transform.e | 364 |
2 files changed, 532 insertions, 10 deletions
diff --git a/log-load.e b/log-load.e new file mode 100644 index 0000000..029b9da --- /dev/null +++ b/log-load.e @@ -0,0 +1,178 @@ +~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~ +~ ~~ Bootstrapping the log ~~ +~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~ +~ +~ The log is the main region of memory within which most dynamic allocation +~ happens. It's a single contiguous segment of virtual memory, which is +~ requested from the kernel when Evocation starts up. Almost all of +~ Evocation's dynamic data is kept in the log, including the main dictionary; +~ several important global variables which make it possible to find and +~ allocate other data structures; and the control stack. +~ +~ This file has the task of providing words which are useful for working +~ with the log, and more specifically which are useful for helping to bring +~ the log into existence. Once the log exists, it can be used to manage +~ itself, but there's a bootstrapping challenge in getting there. That +~ challenge is solved by the warm-start routine in execution.e, which relies +~ on the words in this file and should load after it. +~ +~ Some modern Forths, including Jonesforth, refer to the log as the heap. +~ This is a misnomer; a heap is a data structure that allows non-contiguous +~ allocation. Although there are Forths that have true heaps, Evocation is not +~ one of them. Space in the log is allocated by incrementing the "here" +~ variable (one of those important globals), which necessarily can only +~ allocate contiguous blocks; there is no way to compact allocations to +~ reclaim fragmented, unused space in between them. Evocation does allow +~ deallocation using "forget", but this is done by resetting "here" and +~ "latest" to older values, unwinding every allocation that's been done since +~ the point in time they return to. +~ +~ It would be a mistake to confuse this allocation strategy with the +~ more-general facilities for allocation, reallocation, and deallocation of +~ individual memory blocks that many other languages have. To avoid confusion, +~ we stay away from the name "heap", though it may still occasionally be used +~ colloquially because it's familiar from other Forths, and because most +~ programming languages have a heap as the main memory segment they request +~ from the kernel. +~ +~ In the strictest technical sense, the log is a stack: Things are added +~ to the end of it, and removed from that same end. However, Evocation already +~ has two other stacks, the control and value stacks. Adding to the potential +~ confusion, the control stack is actually stored inside the log (as a +~ fixed-size chunk at the bottom). However, the log isn't really that much +~ like a stack when you look at how it's actually used. Unlike Evocation's +~ control and value stacks, data structures on the log tend to be rich and +~ complex, interlinked in various ways through the use of pointers. They also +~ tend to be long-lived, with the log tending to grow over time, whereas the +~ control and value stacks tend to remain roughly the same size through cycles +~ of growth and shrinking. In order to be able to speak precisely about what +~ we're doing, we introduce the name "log" to refer to the entire memory +~ segment and everything stored within it. +~ +~ Another linguistic choice we make is to be clear about dictionaries. A +~ dictionary is a linked list of word entries. Each dictionary has a specific +~ handle, a pointer to a pointer, which is the root of the list. Each +~ word entry begins with a specific data structure, which among other things +~ includes a next-entry pointer, a flags byte, and a string that serves as +~ the entry's name. Older entries in a dictionary seldom change; newer entries +~ are added at the beginning of it, with their next-entry pointers leading to +~ the older entries. It is possible for several dictionaries to exist at once, +~ each with its own dictionary handle. +~ +~ Since dictionaries are managed using pointers to individual entries, there +~ is no specific requirement about the order in which those entries occur in +~ memory or where they are allocated, but usually a new entry is allocated at +~ the end of the log, by incrementing the variable "here", in the same manner +~ as any other allocation. There is one particular dictionary, the main +~ dictionary, whose handle is the variable "latest". The main dictionary holds +~ every executable word that can be used normally via Evocation's interpreter. +~ +~ Since the main dictionary is by far the most important thing in the log, +~ it can be tempting to conflate the log with the main dictionary. This is +~ accurate enough for some purposes, but note that other dictionaries are +~ often interleaved with it, their allocations entwining like grape vines even +~ while each remains separate, reachable only via its own root. See the +~ machine label facility, in labels.e, for an example of how a secondary +~ dictionary can be useful. +~ +~ This may feel tangential, but it's important background and there's no +~ better place to explain it: A handle is a pointer to a pointer. The variable +~ "latest" returns a handle, a fixed address which always holds the pointer to +~ the root entry of the main dictionary. Dereferencing that handle gives you +~ the dictionary pointer, the address of the root entry, which is suitable to +~ pass to find-in and similar words that read the dictionary's contents. When +~ you want to add a new entry to a dictionary, you need the dictionary's +~ handle, so that the root pointer can be changed. When you only want to write +~ it, you only need the regular single pointer. +~ +~ When reading the documentation of words that work with dictionaries, pay +~ close attention to whether their parameters include a dictionary handle, or +~ a dictionary pointer. +~ +~ The term "handle" was widely known in the early days of microcomputing, +~ when memory-safe languages without direct pointer access were less common. +~ Today it is usually considered specific to systems programming, the type of +~ programming which lies beneath other software and deals with topics such as +~ memory management and processes. Evocation is a systems-programming +~ language, in the sense that it takes pains to not introduce mandatory +~ abstractions which would make it difficult or inefficient to work directly +~ with these topics. So, in understanding Evocation, it's important to know +~ about handles. + + +~ Find-in is the main word that provides the capability to look up words by +~ name, though it's usually used via "find" rather than being called directly. +~ +~ Find-in traverses the linked list formed by a particular dictionary's +~ next-entry pointers, looking for an entry that matches a given name. The +~ dictionary pointer is the pointer (not handle) to the root of the list, +~ which runs from newest to oldest. For example, dereferencing the value of +~ "latest" gives the pointer to the main dictionary, which can be passed to +~ find-in. +~ +~ Having find-in separated out is convenient when working with alternate +~ dictionaries, but the main reason for having it is not convenience but +~ necessity: During Evocation's startup, there is a period before global +~ variables are easily accessible, so there would be no way to implement +~ "find". The warm-start routine (see execution.e and transform.e) has the +~ job of fixing that, and it makes extensive use of find-in to do so. +~ +~ (dictionary pointer, string pointer -- entry pointer or 0) +: find-in + ~ It will be more convenient to have the entry pointer on top. + swap + + { + ~ If the entry pointer is null, exit. + ~ (name pointer to find, current entry pointer) + dup 0 = { swap drop exit } if + + ~ Check this entry's "hidden" flag. + ~ (name pointer to find, current entry pointer) + dup entry-flags@ 0x80 & 0x80 != { + ~ Test whether this entry is a match. + ~ (name pointer to find, current entry pointer) + 2dup 10 + stringcmp 0 = { + ~ If we're here, it's a match. Clean up our working state and exit. + ~ (name pointer to find, current entry pointer) + swap drop exit + } if + } if + + ~ If we're here, it's not a match; traverse the pointer and repeat. + ~ (name pointer to find, current entry pointer) + @ + } forever ; + + +~ This has the same value as the constant control-stack-size, which is +~ defined in execution.e. Everything will break if it doesn't. +~ +~ TODO: remove one of them. Probably the other one. +: log-offset 0x10000 ; ~ 64 KiB + +~ (log address -- log address, "latest" pointer) +: log-load-latest + dup log-offset + 3 8 * + ; +~ (log address -- log address, "latest" pointer) +: log-load-here + dup log-offset + 4 8 * + ; + + +~ This is a helper used by warm-start, which invokes find-in using "latest". +~ It relies on being passed the root address of the log, which is used to find +~ the global variable "latest". It's inconvenient to keep a log pointer around +~ all the time, which is why we stop doing it as soon as possible, but during +~ Evocation's startup there's no alternative. This word is used extensively +~ by code that's been compiled via the log-load transform; see transform.e for +~ details. +~ +~ It would be possible to unload this word after the log is created, but +~ there are rare situations in which it's still useful, such as injecting +~ Evocation into another process's address space. Plus, it's small. So, we +~ keep it around. +~ +~ (log address, string pointer -- log address, entry pointer or 0) +: log-load-find + swap log-load-latest @ swap 3unroll swap find-in ; + diff --git a/transform.e b/transform.e index fc6e69e..ac85e14 100644 --- a/transform.e +++ b/transform.e @@ -2,17 +2,52 @@ ~ ~~ Code transformation facility ~~ ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~ -~ TODO explain what problem this is solving and why -~ The process of +~ The process of producing an executable binary out of Evocation involves +~ various bootstrapping phases during which code operates under different +~ constraints, and must be written with different styles. In some cases, +~ substantially the same code must be output multiple times in slightly +~ different ways, and it would be both arduous and verbose to write each of +~ these directly. +~ +~ To solve this problem, this file implements a concept of code +~ transformation. There are two transforms, the label transform and the +~ log-load transform, each of which takes a string containing Evocation source +~ code and produces compiled code that has been modified to operate in a +~ specific way. The transforms rely on the label facility provided by +~ labels.e, and expect to run from within label-loop. ~ ~ The label transform operates on code that compiles itself, and ensures ~ that the result of the compilation is suitable to be included in an -~ executable binary. To achieve this, it makes several changes to the -~ semantics of that code. The transform relies on the label facility, and -~ expects to run from within label-loop. +~ executable binary as words that are statically referenced by their +~ addresses. To achieve this, it causes each newly-defined word to have a +~ corresponding label whose value is the offset of its codeword, and it causes +~ all compiled invocations of other words to be resolved by using these labels. +~ The label transform is suitable for code that must be directly invoked by +~ the warm-start routine provided by execution.e. +~ +~ The log-load transform also operates on code that compiles itself; it +~ produces a compiled routine which, when run, appends the original code to +~ the log. As the routine is run, each reference to another word is resolved +~ by looking up the name of the target word in the log. Furthermore, these +~ lookups are done using log-load-find, defined in log-load.e, which accepts +~ a pointer to the log's base address as a parameter. See that file for more +~ explanation of what the log is and why it's important. Thus, unlike normal +~ accesses to the log, this routine doesn't rely on already having the log's +~ base address hardcoded into it at the time of its own compilation. The +~ log-load transform is suitable for implementing the core responsibilities of +~ the warm-start routine provided by execution.e, relying on only a few +~ specific words that it statically references via labels. +~ +~ The log-load transform may also be useful for experimental tasks such as +~ creating additional, independent logs, or injecting Evocation into another +~ process's address space. +~ +~ +~ About the label transform +~ ~~~~~~~~~~~~~~~~~~~~~~~~~ ~ -~ The most fundamental change is that the label transform separates words -~ that run in compile mode from words that run immediately. There is no +~ The most fundamental technique the label transform performs is to separate +~ words that run in compile mode from words that run immediately. There is no ~ distinction made between words running in immediate mode, and words declared ~ as immediate. Immediate words are looked up and executed based on their ~ "real", currently-executing definitions. Compiled words, including @@ -56,17 +91,34 @@ ~ the rest of Evocation. There's no need to keep it separate like there is ~ with the other variables. This makes it easy to change modes. ~ -~ The transformation and the alternates rely on various labels, all of which -~ must be defined elsewhere, lest the label loop fail to converge: "lit", -~ "origin", "docol", "exit", ":", ";", and ";asm". +~ The label transformation and its alternates rely on various labels, all of +~ which must be defined elsewhere, lest the label loop fail to converge: +~ "lit", "origin", "docol", "exit", ":", ";", and ";asm". ~ ~ All of these limitations result in the compiled code being, in effect, ~ written in a dialect which is like Evocation, but more restricted. This is ~ acceptable, because the label transform is intended for compiling code that ~ is an early part of Evocation itself, and the necessary code has all been ~ written to follow these restrictions. +~ +~ +~ About the log-load transform +~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +~ +~ Much like the label transform, the log-load transform provides alternate +~ versions of certain immediate words used in word definition. Also like the +~ label transform, it provides its own copies of "here" and "latest". +~ +~ The log-load transformation and its alternates rely on the following +~ labels, all of which must be defined elsewhere: TODO +~ Buffer- and address-management helpers +~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +~ +~ The facilities in this section are used as helper code in the +~ implementations of both transforms. + ~ TODO all this buffer stuff should be in its own file ~ (buffer size -- buffer address) : read-to-buffer @@ -212,6 +264,13 @@ allocate-transform-state s" transform-state" variable target-address-space-to-offset offset-to-host-address-space ; +~ Label transform implementation +~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +~ +~ The following code is all part of implementing the label transform. For +~ conceptual overview, see the top of this file. + + ~ This is the alternate version of "create" for use with the label ~ transform. Its code is the same as the regular "create" except as noted ~ below. It is likely to be extremely useful to read and understand "create" @@ -488,3 +547,288 @@ allocate-transform-state s" transform-state" variable exit } if } forever ; + +~ Log-load transform implementation +~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +~ +~ The following code is all part of implementing the log-load transform. +~ For conceptual overview, see the top of this file. + + +~ This is the alternate version of "create" for use with the log-load +~ transform. Its code is the same as the regular "create" except as noted +~ below. It is likely to be extremely useful to read and understand "create" +~ in interpret.e before attempting to understand log-load-create. +: log-load-create + dup stringlen 1 + dup 3unroll + here @ 10 + 3unroll memmove + here @ + + ~ This value of "latest" is going into the generated output, so we need + ~ to map it to the target address space. It's stored in the host address + ~ space to make immediate words work as expected, so the appropriate + ~ conversion is host-address-space-to-target. + latest @ host-address-space-to-target pack64 + 0 pack8 + 0 pack8 + + + 8 packalign + here @ latest ! + + ~ Now we're immediately after the word header, which is where the codeword + ~ will be. This is the value the label should taken on, so we set it. + dup host-address-space-to-offset + here @ 10 + + swap-transform-variables + intern-label set-label + swap-transform-variables + + here ! ; + + +~ This is the alternate version of ":" for use with the log-load transform. +~ Its code is the same as the regular "create" except as noted below. It is +~ likely to be extremely useful to read and understand ":" in interpret.e +~ before attempting to understand "log-load:". +: log-load: + ~ This calls "log-load-create" instead of "create". + word value@ log-load-create dropstring + + ~ This looks up "docol" by label. + swap-transform-variables + L@' docol + L@' origin + swap-transform-variables + + , + + latest @ hide-entry ] ; + + +~ This is the alternate version of ";" for use with the log-load transform. +~ Its code is the same as the regular "create" except as noted below. It is +~ likely to be extremely useful to read and understand ";" in interpret.e +~ before attempting to understand "log-load;". +: log-load; + ~ This looks up "exit" by label. + swap-transform-variables + L@' exit + swap-transform-variables + offset-to-target-address-space , + + latest @ unhide-entry + + ~ Since [ is an immediate word, we have to go to extra trouble to compile + ~ it as part of ;. + [ ' [ entry-to-execution-token , ] + ; make-immediate + + +~ This is the alternate version of ";asm" for use with the log-load +~ transform. Its code is the same as the regular "create" except as noted +~ below. It is likely to be extremely useful to read and understand ";asm" in +~ interpret.e before attempting to understand "log-load;asm". +: log-load;asm + here @ pack-next 8 packalign here ! + latest @ dup unhide-entry entry-to-execution-token + ~ The codeword needs to be transformed to the target address space. + dup 8 + host-address-space-to-target + swap ! + + ~ Since [ is an immediate word, we have to go to extra trouble to compile + ~ it as part of ;asm. + [ ' [ entry-to-execution-token , ] + ; make-immediate + +~ This implements the log-load transform for a single word. It is directly +~ analogous to "interpret", and reading interpret.e may help in understanding +~ it, though it's meant to still make sense on its own. +~ +~ It expects to be called from "log-load-transform", below, which loops. +~ +~ (-- done) +: log-load-transform-one + word + + ~ If no word was returned, exit. + dup 0 = { drop 0 exit } if + + ~ The string is on the top of the stack, so to get a pointer to it we get + ~ the stack address. + ~ (string) + value@ + + ~ If it's the magic word, end the transformation. + dup s" pyrzqxgl" stringcmp 0 = { drop dropstring 1 exit } if + + ~ Check whether it's one of the words we have alternates for, and look up + ~ the alternate if so. + dup 0 swap + ~ (name as stack string, name pointer, placeholder, name pointer) + dup s" create" stringcmp 0 = { swap drop ' log-load-create swap } if + dup s" :" stringcmp 0 = { swap drop ' log-load: swap } if + dup s" ;" stringcmp 0 = { swap drop ' log-load; swap } if + dup s" ;asm" stringcmp 0 = { swap drop ' log-load;asm swap } if + drop swap + ~ (name as stack string, 0 or alternate entry pointer, name pointer) + + ~ If an alternate was found, the alternate will be used in immediate mode. + ~ If not, we look up the word in the regular, non-transformed dictionary + ~ and use that for immediate mode. + over { dup + transform-state transform-state-saved-latest @ swap find-in + 3roll drop swap } unless + ~ (name as stack string, immediate entry pointer, name pointer) + + ~ In regular "interpret", we would check whether we found the word before + ~ checking the mode. However, we have three different places words could + ~ come from, so that's not a simple notion. So, we check the mode first. + interpreter-flags @ 0x01 & { + ~ If we're in compile mode, there's still a chance it's an immediate + ~ word. First check whether we have an immediate entry, then if so, check + ~ that entry's flags. Notice that this means the generated code can't + ~ override an immediate word with a non-immediate word of the same name. + over dup { entry-flags@ 0x01 & not } if + + { + ~ Either there was no immediate entry, or the immediate entry wasn't + ~ flagged as an immediate word. So we check whether this could be a + ~ compilation. + ~ + ~ To do this, we need to look the word up in the output buffer. We + ~ can't easily traverse the next-entry pointers in the output buffer's + ~ dictionary, so we check the label. Since we don't know the word's name + ~ statically, this is a rare scenario where we can't use the abbreviated + ~ label syntax, but that's easy enough. + ~ + ~ Even though we've ruled out the possibility that the word is only + ~ ever used immediately, it is still possible that there's some reason + ~ the word doesn't exist. In particular, it could be an integer literal. + ~ If we were to call use-label first, that would count as a requirement + ~ that the label must eventually be set. We don't want to require that + ~ quite yet, so we call find-label. + ~ + ~ This check is the means by which forward references are disallowed: + ~ On the very first pass, a forward-referenced label won't exist yet, so + ~ transform will give a "no such word" error, which in an ideal world + ~ would prevent there from being a subsequent pass, but at the very + ~ least it will ensure the output isn't a valid ELF. + dup + swap-transform-variables + find-label + swap-transform-variables + { + ~ It exists, so we declare our use of it (that's also the only way to + ~ get a value for it). + swap-transform-variables + intern-label use-label + swap-transform-variables + + ~ Labels point to codewords (because that's what "Lcreate" does), + ~ which is already what we want to output. + ~ + ~ An important caveat: Though it would require something weird to be + ~ happening, such as a forced forward reference, the label may be zero! + ~ We need to allow for that possibility by not examining the contents of + ~ a nonexistent entry. + ~ + ~ Fortunately we don't have to look at it, just append it to the heap + ~ and clean up. + offset-to-target-address-space , drop dropstring 0 exit + } if + } if + } if + ~ (name as stack string, immediate entry pointer, name pointer) + + ~ If we got here, one of three things is true: We're in interpret mode; + ~ the word is immediate; or no word was found. If the immediate entry + ~ pointer is non-zero, run it. + over { + drop dropstring-with-result entry-to-execution-token execute + 0 exit + } if + + ~ If we're still here, it wasn't in the dictionary. Also, we don't need + ~ the immediate entry pointer, either. + drop drop + ~ (name as stack string) + + ~ If it's not in the dictionary, check whether it's an integer literal. As + ~ before, we get the stack address and use it as a string pointer. + value@ read-integer 0 = { + ~ It's a number. + interpreter-flags @ 0x01 & { + ~ We're in compile mode; append first "lit", then the number, to the + ~ heap. The version of "lit" we use is the one that's current when we + ~ ourselves are compiled, hardcoded; doing a dynamic lookup would + ~ require dealing with what happens if it's not found. + ~ TODO this is wrong + dropstring-with-result + + ~ We look up "lit" as a label. + swap-transform-variables L@' lit swap-transform-variables + offset-to-target-address-space + , , + 0 exit + } if + + ~ We're in interpret mode; push the number to the stack. Or at least, that's + ~ what the code we're interpreting will see. Really it's already on the + ~ stack, just clean everything else up and leave it there. + dropstring-with-result + 0 exit + } if + + ~ If it's neither in the dictionary nor a number, just print an error. + s" No such word: " emitstring value@ emitstring dropstring 0 ; + + +~ This implements the log-load transform for all words in a region given as +~ an input string. It is directly analogous to "quit", in interpret.e, but is +~ far more complex. +~ +~ TODO TODO TODO this is just a stub, right now it's just a copy of the label +~ transform +~ (output buffer start, output point, input string pointer +~ -- output buffer start, output point) +: log-load-transform + main-input-buffer dup push-input-buffer + ~ TODO the arguments for this seem to be backwards from the documentation + swap attach-string-to-input-buffer + + ~ Save the old values of "here" and "latest", and set the initial values + ~ of the internal ones. These values need to persist across iterations, + ~ since client code will make its own updates to them and then rely on those + ~ updates having taken effect. So we do the swap just once, here outside the + ~ loop, and set it back when the loop ends. + here @ transform-state transform-state-saved-here ! + latest @ transform-state transform-state-saved-latest ! + over transform-state transform-state-output-buffer-start ! + here ! + 0 latest ! + ~ Now the stack has nothing of ours on it, so client code can do its thing. + + ~ It's important that the stack has nothing of ours on it that persists + ~ across iterations, so that client code can add and remove stuff there as + ~ it sees fit. + { log-load-transform-one + ~ (done) + + ~ When the loop is done, get the real values of "here" and "latest" + ~ back. The internal "here" is also the output point, and will become our + ~ return value. The internal "latest" is discarded. + { here @ + transform-state transform-state-saved-here @ here ! + transform-state transform-state-saved-latest @ latest ! + ~ (output point) + + ~ Though we don't actually use transform-state outside of this + ~ invocation, for tidiness we zero it out. + 0 transform-state transform-state-saved-here ! + 0 transform-state transform-state-saved-latest ! + 0 transform-state transform-state-output-buffer-start ! + + ~ Also put the input source back how it was. + main-input-buffer pop-input-buffer + + exit } if } forever ; + |