~ (buffer size -- buffer address) : read-to-buffer dup allocate dup dup ~ (buffer size, buffer address, word start, output point) { key ~ Exit if it's a zero byte. dup not { ~ Make sure to pack the zero to serve as a null terminator. pack8 drop drop swap drop exit } if dup is-space { ~ (buffer size, buffer address, word start, output point, key) ~ Tuck the key out of the way until we've done some stuff. 3unroll ~ If it's a space character, first check if we just consumed the magic ~ word... 2dup swap - 8 = dup { drop ~ Add a null terminator so we can use stringcmp dup 0 swap ! ~ Check for the magic word over s" pyrzqxgl" stringcmp 0 = } if { ~ It's magic, so exit. ~ Make sure to pack a zero to serve as a null terminator. 0 pack8 drop drop drop swap drop exit } { ~ It's not magic, so reset the word start. Of course whitespace is ~ not a word but this will help us keep track of things. 3roll pack8 swap drop dup } if-else } { ~ (buffer size, buffer address, word start, output point, key) ~ Tuck the key out of the way again. 3unroll ~ Check if the word just started and the previous character is space. 2dup = dup { drop dup @ is-space } if { ~ If so, this is the actual first character of the word. drop swap pack8 dup } { ~ If not, leave the word start alone. 3roll pack8 } if-else } if-else } forever ; ~ In logical terms, this modifies an input buffer metadata structure ~ in-place to push a new, zeroed one into the start of the linked list formed ~ through the next-source field. ~ ~ In physical terms, it works by allocating a new structure, copying the ~ fields of the existing one into it, and zeroing the existing one. That's ~ necessary because otherwise we'd need a mutable handle (a pointer to a ~ pointer) to update the start of the list, and there's no way to do that with ~ the main-input-buffer variable working the way it presently does. ~ ~ (input buffer metadata pointer --) : push-input-buffer allocate-input-buffer-metadata ~ (original metadata pointer, new metadata pointer) 2dup swap 6 8 * memcopy ~ (original metadata pointer, new metadata pointer) swap dup zero-input-buffer-metadata input-buffer-next-source ! ; ~ This does the inverse of push-input-buffer. In the event that the ~ next-source field is null, it zeroes the buffer. ~ ~ Note, however, that it doesn't deallocate the memory, because that's not ~ how memory allocation on the log works. If necessary, it can be deallocated ~ with "forget", though as usual that requires careful planning. ~ ~ (input buffer metadata pointer --) : pop-input-buffer dup input-buffer-next-source @ ~ (original metadata pointer, next source metadata pointer) dup { 6 8 * memcopy } { drop zero-input-buffer-metadata } if-else ; : L: ' L' entry-to-execution-token execute { ' set-label entry-to-execution-token , } { set-label } if-else ; make-immediate ~ TODO probably needs to do more : transform-state-saved-here ; : transform-state-saved-latest 8 + ; : allocate-transform-state 2 8 * allocate dup transform-state-saved-here 0 swap ! dup transform-state-saved-latest 0 swap ! ; allocate-transform-state s" transform-state" variable ~ The label transform operates on code that compiles itself, and ensures ~ that the result of the compilation is suitable to be included in an ~ executable binary. To achieve this, it makes several changes to the ~ semantics of that code. The transform relies on the label facility, and ~ expects to run from within label-loop. ~ ~ The most fundamental change is that the label transform separates words ~ that run in compile mode from words that run immediately. There is no ~ distinction made between words running in immediate mode, and words declared ~ as immediate. Immediate words are looked up and executed based on their ~ "real", currently-executing definitions. Compiled words, including ~ literals, are looked up via the label facility. ~ ~ Since the label facility is able to resolve forward references, there is ~ no hard requirement that words be topologically sorted, but forward ~ references should still be kept to a minimum, since that's a significant ~ difference from un-transformed code that could easily become confusing. ~ ~ Compilation words do make extensive reference to the global variables ~ "here" and "latest". In particular, flow-control words such as if-else ~ expect the log to have recent compilation outputs on it, and to be able to ~ mutate them in-place. In order to make this work, we provide temporary ~ values of these two variables which point to the location of the output ~ buffer. This allows pointer resolution to work correctly without additional ~ effort, but notice that the buffer's address will differ from the address ~ the resulting program loads itself at. There's no simple way to avoid this ~ concern, since the variables must point to one of those addresses or the ~ other, not both. ~ ~ We resolve the issue by running our own, alternate versions of the words ~ "create", ":", ";", and ";asm" which use the label facility to compute the ~ addresses that will be needed at runtime. These alternates run instead of ~ the normal versions of these words. The code being compiled is responsible ~ for not doing anything else that would rely on "here" and "latest" matching ~ their runtime addresses, though it is otherwise allowed to modify and rely ~ on them in all the usual ways. The alternate versions are defined in this ~ file as their own words, "Lcreate", "L:", "L;", and "L;asm". ~ ~ (-- done) : transform-one word ~ If no word was returned, exit. dup 0 = { drop 0 exit } if ~ The string is on the top of the stack, so to get a pointer to it we get ~ the stack address. ~ (string) value@ ~ If it's the magic word, end the transformation. dup s" pyrzqxgl" stringcmp 0 = { drop dropstring 1 exit } if transform-state transform-state-saved-latest @ swap find-in ~ Check whether the word was found in the dictionary. dup 0 != { ~ If the word is in the dictionary, check what mode we're in, then... dropstring-with-result ~ (entry pointer) interpreter-flags @ 0x01 & { ~ ... if we're in compile mode, there's still a chance it's an immediate ~ word, in which case we fall through to interpret mode... dup entry-flags@ 1 & 0 = ~ ... but it's a regular word, so append it to the heap. ~ TODO why is ; being treated as a regular word { entry-to-execution-token , 0 exit } if } if ~ ... if we're in interpret mode, or the word is immediate, run it. entry-to-execution-token execute 0 exit } if ~ If it's not in the dictionary, check whether it's a decimal number. drop ~ As before, we get the stack address and use it as a string pointer. ~ (string) value@ read-integer 0 = { ~ It's a number. interpreter-flags @ 0x01 & { ~ We're in compile mode; append first "lit", then the number, to the ~ heap. The version of "lit" we use is the one that's current when we ~ ourselves are compiled, hardcoded; doing a dynamic lookup would ~ require dealing with what happens if it's not found. ~ TODO this is wrong dropstring-with-result [ ' lit entry-to-execution-token literal ] , , 0 exit } if ~ We're in interpret mode; push the number to the stack. Or at least, that's ~ what the code we're interpreting will see. Really it's already on the ~ stack, just clean everything else up and leave it there. dropstring-with-result 0 exit } if ~ If it's neither in the dictionary nor a number, just print an error. s" No such word: " emitstring value@ emitstring dropstring 0 ; ~ ." input " main-input-buffer dup .hex64 newline dup hexdump @ dup .hex64 newline bye ~ 1024 read-to-buffer ~ foo bar baz biff ~ pyrzqxgl ~ stackhex dup hexdump emitstring bye ~ ~ : breakza ~ ." original" newline ~ main-input-buffer dup 6 8 * hexdump-from ~ dup push-input-buffer ~ ." updated original" newline ~ dup 6 8 * hexdump-from ~ ." copy" newline ~ dup input-buffer-next-source @ 6 8 * hexdump-from ~ newline ~ stackhex ~ dup pop-input-buffer ~ ." copied back" newline ~ 6 8 * hexdump-from ~ stackhex ~ bye ; ~ (output point, input string pointer -- output point) : transform main-input-buffer dup push-input-buffer ~ TODO the arguments for this seem to be backwards from the documentation swap attach-string-to-input-buffer ~ Save the old values of "here" and "latest", and set the initial values ~ of the internal ones. These values need to persist across iterations, ~ since client code will make its own updates to them and then rely on those ~ updates having taken effect. So we do the swap just once, here outside the ~ loop, and set it back when the loop ends. here @ transform-state transform-state-saved-here ! latest @ transform-state transform-state-saved-latest ! here ! 0 latest ! ~ Now the stack has nothing of ours on it, so client code can do its thing. ~ It's important that the stack has nothing of ours on it that persists ~ across iterations, so that client code can add and remove stuff there as ~ it sees fit. { transform-one ~ (done) ~ When the loop is done, get the real values of "here" and "latest" ~ back. The internal "here" is also the output point, and will become our ~ return value. The internal "latest" is discarded. { here @ transform-state transform-state-saved-here @ here ! transform-state transform-state-saved-latest @ latest ! ~ (output point) ~ Though we don't actually use transform-state outside of this ~ invocation, for tidiness we zero it out. 0 transform-state transform-state-saved-here ! 0 transform-state transform-state-saved-latest ! ~ Also put the input source back how it was. main-input-buffer pop-input-buffer exit } if } forever ; ~ 1024 allocate dup ~ ." compilation output buffer" newline dup hexdump ~ transform ~ : za ." ZA" 12 13 - . ; ~ : ' word value@ find dropstring-with-result ~ interpreter-flags @ 1 & { literal } if ; make-immediate ~ ~ ' za . newline ~ pyrzqxgl ~ ." back back back " here @ .hex64 newline ~ ~ ." stack after " stackhex ~ ~ 2dup swap hexdump-between ~ ~ : piz ." PIZ" ' za . newline ; piz ~ bye