~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~ ~~ Code transformation facility ~~ ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~ ~ The process of producing an executable binary out of Evocation involves ~ various bootstrapping phases during which code operates under different ~ constraints, and must be written with different styles. In some cases, ~ substantially the same code must be output multiple times in slightly ~ different ways, and it would be both arduous and verbose to write each of ~ these directly. ~ ~ To solve this problem, this file implements a concept of code ~ transformation. There are two transforms, the label transform and the ~ log-load transform, each of which takes a string containing Evocation source ~ code and produces compiled code that has been modified to operate in a ~ specific way. The transforms rely on the label facility provided by ~ labels.e, and expect to run from within label-loop. ~ ~ The label transform produces code that uses one label per word it defines, ~ to statically reference everything. Thus, when output to an executable ~ binary, this code will function without external dependencies. The tradeoff ~ is that it has no way to reference data that exists only at runtime. ~ ~ The log-load transform relies on labels, but doesn't add any of its own. ~ It produces a compiled routine which, when run, dynamically looks up all the ~ references in the log, and appends the original code to the log. This adds ~ work that must be done when the runtime starts up, but the benefit is that ~ it can reference data that doesn't exist at compile-time. Most crucially, ~ it can reference the "here" and "latest" pointers in the log, which are ~ required for all the usual word-definition stuff to work, and whose ~ addresses are not known until runtime. ~ ~ The log-load transform may also be useful for experimental tasks such as ~ creating additional, independent logs, or injecting Evocation into another ~ process's address space. ~ ~ Please notice that both these transforms, in different ways, navigate the ~ same underlying design tension: The Forth compilation model hardcodes ~ references at the time compilation happens, and Evocation makes the choice ~ to not decide the address of the log until runtime. Thus the label transform ~ can't be sufficient on its own. Other Forths avoid this problem by ~ hardcoding an address for the log, or by using OS-provided load-time ~ symbol relocation. Evocation, however, does it on hard mode, mostly for fun. ~ ~ Because it was clear from early on that the label transform couldn't stand ~ alone, and that another one would be necessary, we've refrained from adding ~ too many features to it. Since we have multiple transforms, they should each ~ be kept simple and well-defined, so that they can be composed in creative ~ new ways down the line. When adding additional behavior, always give thought ~ to whether it belongs in an existing transform or a new one. ~ ~ ~ About the label transform ~ ~~~~~~~~~~~~~~~~~~~~~~~~~ ~ ~ The label transform operates on code that compiles itself, and ensures ~ that the result of the compilation is suitable to be included in an ~ executable binary as words that are statically referenced by their ~ addresses. To achieve this, it causes each newly-defined word to have a ~ corresponding label whose value is the offset of its codeword, and it causes ~ all compiled invocations of other words to be resolved by using these labels. ~ The label transform is suitable for code that must be directly invoked by ~ the warm-start routine provided by execution.e. ~ ~ The most fundamental technique the label transform performs is to separate ~ words that run in compile mode from words that run immediately. There is no ~ distinction made between words running in immediate mode, and words declared ~ as immediate. Immediate words are looked up and executed based on their ~ "real", currently-executing definitions. Compiled words, including ~ literals, are looked up via the label facility. ~ ~ Since the label facility is able to resolve forward references, there is ~ no hard requirement that everything in the file be topologically sorted. ~ However, the transform will refuse to create forward references to compiled ~ words. If you want them, you can create them by hand by calling use-label ~ yourself. This restriction is in place because allowing forward references ~ would be a significant difference from un-transformed code that could easily ~ become confusing, and because it simplifies the implementation a bit. ~ ~ Compilation words do make extensive reference to the global variables ~ "here" and "latest". In particular, flow-control words such as if-else ~ expect the log to have recent compilation outputs on it, and to be able to ~ mutate them in-place. In order to make this work, we provide temporary ~ values of these two variables which point to the location of the output ~ buffer. This allows pointer resolution to work correctly without additional ~ effort, but notice that the buffer's address will differ from the address ~ the resulting program loads itself at. There's no simple way to avoid this ~ concern, since the variables must point to one of those addresses or the ~ other, not both. ~ ~ We resolve the issue by running our own, alternate versions of the words ~ "create", ":", ";", and ";asm" which use the label facility to compute the ~ addresses that will be needed at runtime. These alternates run instead of ~ the normal versions of these words. The code being compiled is responsible ~ for not doing anything else that would rely on "here" and "latest" matching ~ their runtime addresses, though it is otherwise allowed to modify and rely ~ on them in all the usual ways. The alternate versions are defined in this ~ file as their own words, "Lcreate", "L:", "L;", and "L;asm". ~ TODO note L@' and L!' ~ ~ Note that these alternates are applied via a purely lexical ~ transformation: when a word would be looked up in the dictionary to ~ interpret, first check if it's one of these. That means the transformation ~ won't apply to indirect callers of these words, nor to tick-quotes of them. ~ The code being compiled is responsible for not doing either of those things. ~ ~ Notably, the transformation uses the same "interpreter-flags" variable as ~ the rest of Evocation. There's no need to keep it separate like there is ~ with the other variables. This makes it easy to change modes. ~ ~ The label transformation and its alternates rely on various labels, all of ~ which must be defined elsewhere, lest the label loop fail to converge: ~ "lit", "origin", "docol", "exit", ":", ";", and ";asm". ~ ~ All of these limitations result in the compiled code being, in effect, ~ written in a dialect which is like Evocation, but more restricted. This is ~ acceptable, because the label transform is intended for compiling code that ~ is an early part of Evocation itself, and the necessary code has all been ~ written to follow these restrictions. ~ ~ ~ About the log-load transform ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~ ~ The log-load transform also operates on code that compiles itself; it ~ produces a compiled routine which, when run, appends the original code to ~ the log. As the routine is run, each reference to another word is resolved ~ by looking up the name of the target word in the log. Furthermore, these ~ lookups are done using log-load-find, defined in log-load.e, which accepts ~ a pointer to the log's base address as a parameter. See that file for more ~ explanation of what the log is and why it's important. Thus, unlike normal ~ accesses to the log, this routine doesn't rely on already having the log's ~ base address hardcoded into it at the time of its own compilation. The ~ log-load transform is suitable for implementing the core responsibilities of ~ the warm-start routine provided by execution.e, relying on only a few ~ specific words that it statically references via labels. ~ ~ Much like the label transform, the log-load transform provides alternate ~ versions of certain immediate words used in word definition. Also like the ~ label transform, it provides its own copies of "here" and "latest". ~ ~ The log-load transform provides alternates for a significantly broader set ~ of words than the label transform, including all the flow-control words such ~ as if-else. It runs its own alternates immediately, but unlike the label ~ transform, immediate execution for the log-load transform is not actually ~ immediate; it is compiled into words which will have those immediate effects ~ at the time the generated routine is run. The generated routine can itself ~ be thought of as a compilation process, producing its output on the log, so ~ doing things later for us still means doing them immediately during the ~ routine. ~ ~ The log-load transform does impose a no-forward-references requirement, ~ though it is applied at the time the routine is run, rather than at the time ~ of the transformation. ~ ~ The log-load transformation and its alternates rely on the following ~ labels, all of which must be defined elsewhere: TODO ~ Buffer- and address-management helpers ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~ ~ The facilities in this section are used as helper code in the ~ implementations of both transforms. ~ TODO all this buffer stuff should be in its own file ~ (buffer size -- buffer address) : read-to-buffer dup allocate dup dup ~ (buffer size, buffer address, word start, output point) { key ~ Exit if it's a zero byte. dup not { ~ Make sure to pack the zero to serve as a null terminator. pack8 drop drop swap drop exit } if dup is-space { ~ (buffer size, buffer address, word start, output point, key) ~ Tuck the key out of the way until we've done some stuff. 3unroll ~ If it's a space character, first check if we just consumed the magic ~ word... 2dup swap - 8 = dup { drop ~ Add a null terminator so we can use stringcmp dup 0 swap ! ~ Check for the magic word over s" pyrzqxgl" stringcmp 0 = } if { ~ It's magic, so exit. ~ Make sure to pack a zero to serve as a null terminator. 0 pack8 drop drop drop swap drop exit } { ~ It's not magic, so reset the word start. Of course whitespace is ~ not a word but this will help us keep track of things. 3roll pack8 swap drop dup } if-else } { ~ (buffer size, buffer address, word start, output point, key) ~ Tuck the key out of the way again. 3unroll ~ Check if the word just started and the previous character is space. 2dup = dup { drop dup @ is-space } if { ~ If so, this is the actual first character of the word. drop swap pack8 dup } { ~ If not, leave the word start alone. 3roll pack8 } if-else } if-else } forever ; ~ In logical terms, this modifies an input buffer metadata structure ~ in-place to push a new, zeroed one into the start of the linked list formed ~ through the next-source field. ~ ~ In physical terms, it works by allocating a new structure, copying the ~ fields of the existing one into it, and zeroing the existing one. That's ~ necessary because otherwise we'd need a mutable handle (a pointer to a ~ pointer) to update the start of the list, and there's no way to do that with ~ the main-input-buffer variable working the way it presently does. ~ ~ (input buffer metadata pointer --) : push-input-buffer allocate-input-buffer-metadata ~ (original metadata pointer, new metadata pointer) 2dup swap 6 8 * memcopy ~ (original metadata pointer, new metadata pointer) swap dup zero-input-buffer-metadata input-buffer-next-source ! ; ~ This does the inverse of push-input-buffer. In the event that the ~ next-source field is null, it zeroes the buffer. ~ ~ Note, however, that it doesn't deallocate the memory, because that's not ~ how memory allocation on the log works. If necessary, it can be deallocated ~ with "forget", though as usual that requires careful planning. ~ ~ (input buffer metadata pointer --) : pop-input-buffer dup input-buffer-next-source @ ~ (original metadata pointer, next source metadata pointer) dup { 6 8 * memcopy } { drop zero-input-buffer-metadata } if-else ; ~ TODO rename this to transformation-state : transform-state-saved-here ; : transform-state-saved-latest 8 + ; : transform-state-output-buffer-start 2 8 * + ; : allocate-transform-state 3 8 * allocate dup transform-state-saved-here 0 swap ! dup transform-state-saved-latest 0 swap ! dup transform-state-output-buffer-start 0 swap ! ; allocate-transform-state s" transform-state" variable ~ When calling the label facility during a transformation, it's necessary ~ to use the real, non-wrapped "here" and "latest". : swap-transform-variables here @ transform-state transform-state-saved-here @ here ! transform-state transform-state-saved-here ! latest @ transform-state transform-state-saved-latest @ latest ! transform-state transform-state-saved-latest ! ; ~ We deal with a few address spaces. There's the "host" address space, the ~ space this process performing the compilation is using for itself. There's ~ the "target" address space, the address space that will exist later, when ~ the program we've compiled is running. ~ ~ Then there's "offsets", which are relative to the start of the output ~ buffer. For clarity's sake, we always refer to these as offsets, rather than ~ as addresses. ~ ~ When we define labels for compiled words, we set their values to be ~ offsets pointing to the generated codeword. This is done by "Lcreate". We ~ then need to convert them either to the host or the target address space, ~ depending on how we're using them. ~ ~ There's no approach here that isn't confusing, but the hope is that by ~ using offsets, so that we always have to convert them regardless of what ~ we're doing with them, we won't miss a spot where conversion needs to ~ happen. ~ ~ (output offset -- target address) : offset-to-target-address-space ~ Don't transform null pointers. dup { swap-transform-variables L@' origin swap-transform-variables + } if ; ~ (target address -- output offset) : target-address-space-to-offset ~ Don't transform null pointers. dup { swap-transform-variables L@' origin swap-transform-variables - } if ; ~ (output offset -- host address) : offset-to-host-address-space ~ Don't transform null pointers dup { transform-state transform-state-output-buffer-start @ + } if ; ~ (host address --output offset) : host-address-space-to-offset ~ Don't transform null pointers dup { transform-state transform-state-output-buffer-start @ - } if ; ~ (host address inside the output buffer -- target address) : host-address-space-to-target host-address-space-to-offset offset-to-target-address-space ; ~ (target address -- host address) : target-address-space-to-host target-address-space-to-offset offset-to-host-address-space ; : describe-transformation ." active here " here @ .hex64 space ." latest " latest @ .hex64 newline ." saved here " transform-state transform-state-saved-here @ .hex64 space ." latest " transform-state transform-state-saved-latest @ .hex64 newline ." output start " transform-state transform-state-output-buffer-start @ .hex64 newline ; ~ Label transform implementation ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~ ~ The following code is all part of implementing the label transform. For ~ conceptual overview, see the top of this file. ~ This is the alternate version of "create" for use with the label ~ transform. Its code is the same as the regular "create" except as noted ~ below. It is likely to be extremely useful to read and understand "create" ~ in interpret.e before attempting to understand "Lcreate". : Lcreate dup stringlen 1 + dup 3unroll here @ 10 + 3unroll memmove here @ ~ This value of "latest" is going into the generated output, so we need ~ to map it to the target address space. It's stored in the host address ~ space to make immediate words work as expected, so the appropriate ~ conversion is host-address-space-to-target. latest @ host-address-space-to-target pack64 0 pack8 0 pack8 + 8 packalign here @ latest ! ~ Now we're immediately after the word header, which is where the codeword ~ will be. This is the value the label should taken on, so we set it. dup host-address-space-to-offset here @ 10 + swap-transform-variables intern-label set-label swap-transform-variables here ! ; ~ This is the alternate version of ":" for use with the label transform. Its ~ code is the same as the regular "create" except as noted below. It is likely ~ to be extremely useful to read and understand ":" in interpret.e before ~ attempting to understand "L:". : L: ~ This calls "Lcreate" instead of "create". word value@ Lcreate dropstring ~ This looks up "docol" by label. swap-transform-variables L@' docol-codeword-value L@' origin swap-transform-variables + , latest @ hide-entry ] ; ~ This is the alternate version of ";" for use with the label transform. Its ~ code is the same as the regular "create" except as noted below. It is likely ~ to be extremely useful to read and understand ";" in interpret.e before ~ attempting to understand "L;". : L; ~ This looks up "exit" by label. swap-transform-variables L@' exit swap-transform-variables offset-to-target-address-space , latest @ unhide-entry ~ Since [ is an immediate word, we have to go to extra trouble to compile ~ it as part of ;. [ ' [ entry-to-execution-token , ] ; make-immediate ~ This is the alternate version of ";asm" for use with the label transform. ~ Its code is the same as the regular "create" except as noted below. It is ~ likely to be extremely useful to read and understand ";asm" in interpret.e ~ before attempting to understand "L;asm". : L;asm here @ pack-next 8 packalign here ! latest @ dup unhide-entry entry-to-execution-token ~ The codeword needs to be transformed to the target address space. dup 8 + host-address-space-to-target swap ! ~ Since [ is an immediate word, we have to go to extra trouble to compile ~ it as part of ;asm. [ ' [ entry-to-execution-token , ] ; make-immediate ~ TODO there should really be an actual word that this alternate is replacing ~ ~ (string pointer --) : label-keyword-alternate Lcreate here @ dup ~ (self execution token, output point) dup 8 + pack64 swap :rax mov-reg64-imm64 ~ (output point) :rax push-reg64 pack-next 8 packalign here ! ; make-immediate ~ Because docol requires it, we provide a special mini-version of the label ~ system. We only do L@' and L!', because that's all we need. These are real ~ labels; there can be arbitrarily many of them, and they can have forward ~ references. ~ ~ The value that's accepted is in the host address space; the label is set ~ to an offset; and the value that's returned is in the target address space. ~ ~ (-- value) : label-L@'-alternate word value@ swap-transform-variables intern-label use-label swap-transform-variables dropstring-with-result offset-to-target-address-space ; make-immediate ~ (value --) : label-L!'-alternate host-address-space-to-offset word value@ swap-transform-variables intern-label swap-transform-variables dropstring-with-result swap-transform-variables set-label swap-transform-variables ; make-immediate ~ This implements the label transform for a single word. It is directly ~ analogous to "interpret", and reading interpret.e may help in understanding ~ it, though it's meant to still make sense on its own. ~ ~ It expects to be called from "label-transform", below, which loops. ~ ~ (-- done) : label-transform-one word ~ If no word was returned, exit. dup 0 = { drop 0 exit } if ~ The string is on the top of the stack, so to get a pointer to it we get ~ the stack address. ~ (string) value@ ~ If it's the magic word, end the transformation. dup s" pyrzqxgl" stringcmp 0 = { drop dropstring 1 exit } if ~ Check whether it's one of the words we have alternates for, and look up ~ the alternate if so. dup 0 swap ~ (name as stack string, name pointer, placeholder, name pointer) dup s" create" stringcmp 0 = { swap drop ' Lcreate swap } if dup s" :" stringcmp 0 = { swap drop ' L: swap } if dup s" ;" stringcmp 0 = { swap drop ' L; swap } if dup s" ;asm" stringcmp 0 = { swap drop ' L;asm swap } if dup s" L@'" stringcmp 0 = { swap drop ' label-L@'-alternate swap } if dup s" L!'" stringcmp 0 = { swap drop ' label-L!'-alternate swap } if dup s" keyword" stringcmp 0 = { swap drop ' label-keyword-alternate swap } if drop swap ~ (name as stack string, 0 or alternate entry pointer, name pointer) ~ If an alternate was found, the alternate will be used in immediate mode. ~ If not, we look up the word in the regular, non-transformed dictionary ~ and use that for immediate mode. over { dup transform-state transform-state-saved-latest @ swap find-in 3roll drop swap } unless ~ (name as stack string, immediate entry pointer, name pointer) ~ In regular "interpret", we would check whether we found the word before ~ checking the mode. However, we have three different places words could ~ come from, so that's not a simple notion. So, we check the mode first. interpreter-flags @ 0x01 & { ~ If we're in compile mode, there's still a chance it's an immediate ~ word. First check whether we have an immediate entry, then if so, check ~ that entry's flags. Notice that this means the generated code can't ~ override an immediate word with a non-immediate word of the same name. over dup { entry-flags@ 0x01 & not } { not } if-else { ~ Either there was no immediate entry, or the immediate entry wasn't ~ flagged as an immediate word. So we check whether this could be a ~ compilation. ~ ~ To do this, we need to look the word up in the output buffer. We ~ can't easily traverse the next-entry pointers in the output buffer's ~ dictionary, so we check the label. Since we don't know the word's name ~ statically, this is a rare scenario where we can't use the abbreviated ~ label syntax, but that's easy enough. ~ ~ Even though we've ruled out the possibility that the word is only ~ ever used immediately, it is still possible that there's some reason ~ the word doesn't exist. In particular, it could be an integer literal. ~ If we were to call use-label first, that would count as a requirement ~ that the label must eventually be set. We don't want to require that ~ quite yet, so we call find-label. ~ ~ This check is the means by which forward references are disallowed: ~ On the very first pass, a forward-referenced label won't exist yet, so ~ transform will give a "no such word" error, which in an ideal world ~ would prevent there from being a subsequent pass, but at the very ~ least it will ensure the output isn't a valid ELF. dup swap-transform-variables find-label swap-transform-variables { ~ It exists, so we declare our use of it (that's also the only way to ~ get a value for it). swap-transform-variables intern-label use-label swap-transform-variables ~ Labels point to codewords (because that's what "Lcreate" does), ~ which is already what we want to output. ~ ~ An important caveat: Though it would require something weird to be ~ happening, such as a forced forward reference, the label may be ~ zero! We need to allow for that possibility by not examining the ~ contents of a nonexistent entry. ~ ~ Fortunately we don't have to look at it, just append it to the log ~ and clean up. offset-to-target-address-space , drop dropstring 0 exit } if ~ If we got here, we're in compile mode, no label was found, and even ~ if there was a candidate for an immediate word it wasn't flagged as ~ immediate. There are two possibilities: It's genuinely missing, or it's ~ an integer literal. We decline to run the candidate immediate entry, ~ even if it exists, because that's not the correct semantics. ~ ~ If the word is genuinely missing, we want to make sure we make it ~ all the way to the not-found error-handling code at the end, because ~ that will be way easier to debug than doing the wrong thing will. Way, ~ way easier. Far less staring at numbers. ~ ~ Anyway, we no longer need the immediate entry pointer, so we drop ~ it. drop drop } { ~ If we get here, we're in compile mode, but there was a candidate ~ entry for an immediate word, and it was indeed flagged as immediate. ~ So, we run it and exit. drop dropstring-with-result entry-to-execution-token execute 0 exit } if-else ~ This is the end of the compile-mode branch. As you can see by tracing ~ through all the above cases, if we got here, the two possibilities are ~ that the word is genuinely missing, or it's an integer literal. ~ ~ Please notice that these are the same two possibilities remaining at ~ the end of the immediate-mode branch, below. } { ~ If we got here, we're in interpret mode. There are three ~ possibilities: there's an immediate word which we should run; it's an ~ integer literal; or the word is genuinely missing. ~ ~ If the immediate entry pointer is non-zero, run it and exit. over { drop dropstring-with-result entry-to-execution-token execute 0 exit } if ~ There was no immediate word, so either it's an integer literal or ~ the word is genuinely missing. Please notice that these are the same two ~ possibilities remaining at the end of the compile-mode branch, above. ~ ~ We no longer need the immediate-mode pointer, so drop it. drop drop } if-else ~ (name as stack string) ~ If we got here, one of two things is true: the word is an integer ~ literal, or it's genuinely missing. We know this because both the mode ~ cases above end with these as the only two remaining possibilities. So ~ ~ Check whether it's an integer literal. As before, we get the stack ~ address and use it as a string pointer. value@ read-integer 0 = { ~ It's a number. interpreter-flags @ 0x01 & { ~ We're in compile mode; append first "lit", then the number, to the ~ log. The version of "lit" we use is found by label, so it'll be the ~ one that exists when this code is ultimately run. dropstring-with-result ~ We look up "lit" as a label. swap-transform-variables L@' lit swap-transform-variables offset-to-target-address-space , , 0 exit } if ~ We're in interpret mode; push the number to the stack. Or at least, that's ~ what the code we're interpreting will see. Really it's already on the ~ stack, just clean everything else up and leave it there. dropstring-with-result 0 exit } if ~ If it's neither in the dictionary nor a number, just print an error. ~ ~ It's really important, when maintaining this code, to make sure that all ~ the possible ways the word can fail to exist, end up here. Doing anything ~ else is going to result in many hours of trying to untangle the ~ consequences of incorrect behavior, after-the-fact. s" No such word: " emitstring value@ emitstring newline dropstring 0 ; ~ This implements the label transform for all words in a region given as an ~ input string. It is directly analogous to "quit", in interpret.e, but is far ~ more complex. ~ ~ (output buffer start, output point, input string pointer ~ -- output buffer start, output point) : label-transform main-input-buffer dup push-input-buffer ~ TODO the arguments for this seem to be backwards from the documentation swap attach-string-to-input-buffer ~ Save the old values of "here" and "latest", and set the initial values ~ of the internal ones. These values need to persist across iterations, ~ since client code will make its own updates to them and then rely on those ~ updates having taken effect. So we do the swap just once, here outside the ~ loop, and set it back when the loop ends. here @ transform-state transform-state-saved-here ! latest @ transform-state transform-state-saved-latest ! over transform-state transform-state-output-buffer-start ! here ! 0 latest ! ~ Now the stack has nothing of ours on it, so client code can do its thing. ~ It's important that the stack has nothing of ours on it that persists ~ across iterations, so that client code can add and remove stuff there as ~ it sees fit. { label-transform-one ~ (done) ~ When the loop is done, get the real values of "here" and "latest" ~ back. The internal "here" is also the output point, and will become our ~ return value. The internal "latest" is discarded. { here @ transform-state transform-state-saved-here @ here ! transform-state transform-state-saved-latest @ latest ! ~ (output point) ~ Though we don't actually use transform-state outside of this ~ invocation, for tidiness we zero it out. 0 transform-state transform-state-saved-here ! 0 transform-state transform-state-saved-latest ! 0 transform-state transform-state-output-buffer-start ! ~ Also put the input source back how it was. main-input-buffer pop-input-buffer exit } if } forever ; ~ Log-load transform implementation ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~ ~ The following code is all part of implementing the log-load transform. ~ For conceptual overview, see the top of this file. ~ This is the alternate version of "create" for use with the log-load ~ transform. This one is quite unlike the regular "create"; rather than ~ creating an entry on the log directly, its job is to output words which, ~ when they're later executed, will do create's job. ~ ~ The implementations of log-load-find-execution-token and log-load-create ~ are in log-load.e. ~ ~ (string pointer --) : log-load-create-alternate swap-transform-variables ~ Looking these up in reverse order saves us some stack juggling. Does it ~ help readability, or hurt it? Who can say... L@' log-load-create L@' litstring swap-transform-variables offset-to-target-address-space , ~ litstring swap here @ swap packstring 8 packalign here ! offset-to-target-address-space , ~ log-load-create ; ~ This is the alternate version of ":" for use with the log-load transform. ~ Its code is the same as the regular ":" except as noted below. It is likely ~ to be extremely useful to read and understand ":" in interpret.e before ~ attempting to understand "log-load-colon-alternate". : log-load-colon-alternate ~ This calls "log-load-create" instead of "create". word value@ log-load-create-alternate dropstring ~ We generate code that looks up "docol" by name, runs it to get the ~ codeword pointer, then finally appends it to the entry. swap-transform-variables ~ As usual, we do these in reverse. L@' log-load-comma L@' execute L@' log-load-find-execution-token L@' litstring swap-transform-variables offset-to-target-address-space , ~ litstring here @ s" docol" packstring 8 packalign here ! offset-to-target-address-space , ~ log-load-find-execution-token offset-to-target-address-space , ~ execute offset-to-target-address-space , ~ log-load-comma ~ This is where we would mark the entry hidden, but we don't do that. It ~ won't shadow anything and it won't be called until the entire log-load ~ routine has finished. ~ Switching between immediate and compile mode is one of the very few ~ things that happens NOW, while the log-load transform is actually running. ] ; ~ This is the alternate version of ";" for use with the log-load transform. ~ Its code is the same as the regular ";" except as noted below. It is ~ likely to be extremely useful to read and understand ";" in interpret.e ~ before attempting to understand "log-load-semicolon-alternate". : log-load-semicolon-alternate ~ We generate code that looks up "exit" by name and appends it to the ~ entry. swap-transform-variables ~ As usual, we do these in reverse. L@' log-load-comma L@' log-load-find-execution-token L@' litstring swap-transform-variables offset-to-target-address-space , ~ litstring here @ s" exit" packstring 8 packalign here ! offset-to-target-address-space , ~ log-load-find-execution-token offset-to-target-address-space , ~ log-load-comma ~ This is where we would unhide the entry, but again, we don't do that. ~ Since [ is an immediate word, we have to go to extra trouble to compile ~ it as part of ;. [ ' [ entry-to-execution-token , ] ; make-immediate ~ This is the alternate version of ";asm" for use with the log-load ~ transform. Its code is the same as the regular "create" except as noted ~ below. It is likely to be extremely useful to read and understand ";asm" in ~ interpret.e before attempting to understand "log-load;asm". : log-load-semicolon-assembly-alternate ~ here @ pack-next 8 packalign here ! ~ latest @ dup unhide-entry entry-to-execution-token ~ ~ The codeword needs to be transformed to the target address space. ~ dup 8 + host-address-space-to-target ~ swap ! ~ ~ Since [ is an immediate word, we have to go to extra trouble to compile ~ ~ it as part of ;asm. ~ [ ' [ entry-to-execution-token , ] ; make-immediate ~ Because docol requires it, we provide a special mini-version of the label ~ system. We only do L@' and L!', because that's all we need. Unlike the ~ version of this feature for the label transform, for the log-load transform, ~ we heavily restrict the use-case. ~ ~ The implementation strategy is that we ignore the label name, and store ~ the value on the stack when the generated log-load routine runs. So, each ~ instance of L@' must be closely followed by a matching instance of L!'. Each ~ label can only ever be used exactly once, and it must be a backward ~ reference. Furthermore, there is a very tight restriction on what can be ~ on the stack. The easiest way to explain it is by showing the interface of ~ these words from the transformed code's perspective: ~ ~ L!' is (preserved value, value of label ~ -- value of label, preserved value) ~ L@' is (value of label, preserved value ~ -- preserved value, value of label) ~ ~ The preserved value is simply another item on the stack, which the label ~ takes pains not to interfere with. ~ ~ There is no adjustment done on the saved value, since it's created in the ~ target address space and then also used in the target address space. It ~ wouldn't actually be necessary to use this at all, since checking "here" ~ would be sufficient, but then the code would have to do something different ~ depending on which transform it's running under, and there'd have to be a ~ mechanism for that. ~ ~ If that sounds super complex: All we actually do is read a label name, ~ ignore it, and output a call to swap. ~ ~ This is sufficient to implement docol, and that's probably the only thing ~ it should be used for. : log-load-L@'-alternate word dropstring swap-transform-variables L@' swap swap-transform-variables offset-to-target-address-space , ~ swap ; make-immediate : log-load-L!'-alternate word dropstring swap-transform-variables L@' swap swap-transform-variables offset-to-target-address-space , ~ swap ; make-immediate ~ This implements the log-load transform for a single word. It is directly ~ analogous to "interpret", and reading interpret.e may help in understanding ~ it, though it's meant to still make sense on its own. ~ ~ It expects to be called from "log-load-transform", below, which loops. ~ ~ (-- done) : log-load-transform-one word ~ If no word was returned, exit. dup 0 = { drop 0 exit } if ~ The string is on the top of the stack, so to get a pointer to it we get ~ the stack address. ~ (string) value@ ~ If it's the magic word, end the transformation. dup s" pyrzqxgl" stringcmp 0 = { drop dropstring 1 exit } if ~ Check whether it's one of the words we have alternates for, and look up ~ the alternate if so. 0 swap ~ (name as stack string, placeholder, name pointer) dup s" create" stringcmp 0 = { swap drop ' log-load-create-alternate swap } if dup s" :" stringcmp 0 = { swap drop ' log-load-colon-alternate swap } if dup s" ;" stringcmp 0 = { swap drop ' log-load-semicolon-alternate swap } if dup s" ;asm" stringcmp 0 = { swap drop ' log-load-semicolon-assembly-alternate swap } if dup s" L@'" stringcmp 0 = { swap drop ' log-load-L@'-alternate swap } if dup s" L!'" stringcmp 0 = { swap drop ' log-load-L!'-alternate swap } if drop ~ (name as stack string, 0 or alternate entry pointer) ~ If we have an alternate, we want to run that now, regardless of what ~ mode we're in. They're all flagged as immediate, but we don't even bother ~ checking, because it doesn't fully describe their behavior anyway. With ~ this transform there's three potential times at which we might execute ~ things, not two. The alternates are more immediate than immediate; they ~ run NOW, during the transformation. dup { dropstring-with-result entry-to-execution-token execute 0 exit } if drop ~ (name as stack string) ~ Now we might have a compiled word, an immediate word, or an integer ~ literal. Recall that the word won't actually be looked up until the ~ routine we're producing is run - that's the whole point - so there's no ~ check we can perform now that will tell us whether the word we have exists ~ in the eventual log. Instead, we invert the usual fallback order and ~ check whether the word could be an integer literal. If it is, we'll ~ handle that; if not, we'll assume it'll eventually exist. ~ ~ This means that code that's run with the log-load transform can't ~ shadow an integer literal with a word definition. Oh, so limiting. value@ read-integer 0 = { ~ It's a number. dropstring-with-result interpreter-flags @ 0x01 & { ~ We're in compile mode, so we want to generate code which will compile ~ the number. swap-transform-variables ~ Just like in log-load-create-alternate, we do these in reverse. L@' log-load-comma L@' log-load-comma L@' litstring swap-transform-variables offset-to-target-address-space , ~ litstring here @ s" lit" packstring 8 packalign here ! offset-to-target-address-space , ~ log-load-comma swap , ~ the value offset-to-target-address-space , ~ log-load-comma 0 exit } if ~ We're in interpret mode, so we want to generate code which will push the ~ number to the stack. swap-transform-variables L@' lit swap-transform-variables offset-to-target-address-space , , 0 exit } if ~ (name as stack string) ~ We know it's a regular word, and we're assuming it will exist at ~ runtime. We of course have no way to check what flags it will have, which ~ means immediate words don't work with this transform. We still treat it ~ differently based on whether we're in compile mode. interpreter-flags @ 0x01 & { ~ We're in compile mode. We compile code that compiles the word. value@ swap-transform-variables ~ Just like in log-load-create-alternate, we do these in reverse. L@' log-load-comma L@' log-load-find-execution-token L@' litstring swap-transform-variables offset-to-target-address-space , ~ litstring 3roll here @ swap packstring 8 packalign here ! offset-to-target-address-space , ~ log-load-find-execution-token offset-to-target-address-space , ~ log-load-comma dropstring 0 exit } if ~ (name as stack string) ~ We're in immediate mode. We compile code that runs the word immediately. ~ We check whether there's a label for the word; if there is, we'll output ~ that. Otherwise we'll output code that looks it up in the log and runs it. ~ ~ Just like in label-transform, we use find-label to check whether a label ~ exists without declaring a dependency on it, then if it does, we do ~ use-label to ask for its value. ~ ~ There's one additional wrinkle to remember here: We're running inside ~ the label loop, and warm-start appears before all the normal words in the ~ executable. So all the labels we'll be checking are forwared references, ~ and on the very first pass they definitely won't be defined. That's fine ~ though, they will exist on all subsequent passes, so things will ~ definitely still converge. ~ ~ The first pass will never accidentally think it succeeded, because even ~ the reference to L' cold-start from the ELF header is a forward reference ~ and won't exist on the first pass. value@ swap-transform-variables find-label swap-transform-variables { ~ Again just like in label-transform, we declare our use of the label ~ and get a value for it. value@ swap-transform-variables intern-label use-label swap-transform-variables ~ Like in label-transform, this is a codeword pointer, so we just output ~ it directly. Also as before, because we don't have to examine it, we ~ don't have to do anything special in the case where it's zero due to the ~ way the label loop works. offset-to-target-address-space , dropstring 0 exit dropstring 0 exit } if ~ There's no label for the word; that means it wasn't statically ~ compiled-in to the target executable. So we output code that looks up the ~ word by name on the log, then calls it. value@ swap-transform-variables ~ This is reverse order again. L@' execute L@' log-load-find-execution-token L@' litstring swap-transform-variables offset-to-target-address-space , ~ litstring 3roll here @ swap packstring 8 packalign here ! offset-to-target-address-space , ~ log-load-find-execution-token offset-to-target-address-space , ~ execute ~ There's no such thing as not finding the word, with this transform. So ~ we just exit. dropstring 0 ; ~ This implements the log-load transform for all words in a region given as ~ an input string. It is directly analogous to "quit", in interpret.e, but is ~ far more complex. ~ ~ (output buffer start, output point, input string pointer ~ -- output buffer start, output point) : log-load-transform main-input-buffer dup push-input-buffer ~ TODO the arguments for this seem to be backwards from the documentation swap attach-string-to-input-buffer ~ Save the old values of "here" and "latest", and set the initial values ~ of the internal ones. These values need to persist across iterations, ~ since client code will make its own updates to them and then rely on those ~ updates having taken effect. So we do the swap just once, here outside the ~ loop, and set it back when the loop ends. here @ transform-state transform-state-saved-here ! latest @ transform-state transform-state-saved-latest ! over transform-state transform-state-output-buffer-start ! here ! 0 latest ! ~ Now the stack has nothing of ours on it, so client code can do its thing. ~ It's important that the stack has nothing of ours on it that persists ~ across iterations, so that client code can add and remove stuff there as ~ it sees fit. { log-load-transform-one ~ (done) ~ When the loop is done, get the real values of "here" and "latest" ~ back. The internal "here" is also the output point, and will become our ~ return value. The internal "latest" is discarded. { here @ transform-state transform-state-saved-here @ here ! transform-state transform-state-saved-latest @ latest ! ~ (output point) ~ Though we don't actually use transform-state outside of this ~ invocation, for tidiness we zero it out. 0 transform-state transform-state-saved-here ! 0 transform-state transform-state-saved-latest ! 0 transform-state transform-state-output-buffer-start ! ~ Also put the input source back how it was. main-input-buffer pop-input-buffer exit } if } forever ;