~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ ~~ Bootstrapping the log ~~
~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
~
~ The log is the main region of memory within which most dynamic allocation
~ happens. It's a single contiguous segment of virtual memory, which is
~ requested from the kernel when Evocation starts up. Almost all of
~ Evocation's dynamic data is kept in the log, including the main dictionary;
~ several important global variables which make it possible to find and
~ allocate other data structures; and the control stack.
~
~ This file provides words that are useful for working with the log, and
~ more specifically for bringing the log into existence. Once the log
~ exists, it can be used to manage itself, but there's a bootstrapping
~ challenge in getting there. That challenge is solved by the warm-start
~ routine in execution.e, which relies on the words in this file and should
~ load after it.
~
~ Some modern Forths, including Jonesforth, refer to the log as the heap.
~ This is a misnomer; a heap is a data structure that allows non-contiguous
~ allocation. Although there are Forths that have true heaps, Evocation is not
~ one of them. Space in the log is allocated by incrementing the "here"
~ variable (one of those important globals), which necessarily can only
~ allocate contiguous blocks; there is no way to compact allocations to
~ reclaim fragmented, unused space in between them. Evocation does allow
~ deallocation using "forget", but this is done by resetting "here" and
~ "latest" to older values, unwinding every allocation made since the point
~ in time they return to.
~
~ It would be a mistake to confuse this allocation strategy with the more
~ general facilities for allocation, reallocation, and deallocation of
~ individual memory blocks that many other languages have. To avoid confusion,
~ we stay away from the name "heap", though it may still occasionally be used
~ colloquially because it's familiar from other Forths, and because most
~ programming languages have a heap as the main memory segment they request
~ from the kernel.
~
~ In the strictest technical sense, the log is a stack: Things are added
~ to the end of it, and removed from that same end. However, Evocation already
~ has two other stacks, the control and value stacks. Adding to the potential
~ confusion, the control stack is actually stored inside the log (as a
~ fixed-size chunk at the bottom). Still, the log isn't really that much
~ like a stack when you look at how it's actually used. Unlike Evocation's
~ control and value stacks, data structures on the log tend to be rich and
~ complex, interlinked in various ways through the use of pointers. They also
~ tend to be long-lived, so the log grows over time, whereas the control and
~ value stacks stay roughly the same size through cycles of growth and
~ shrinking. In order to be able to speak precisely about what we're doing,
~ we introduce the name "log" to refer to the entire memory segment and
~ everything stored within it.
~
~ Another linguistic choice we make is to be clear about dictionaries. A
~ dictionary is a linked list of word entries. Each dictionary has a specific
~ handle, a pointer to a pointer, which leads to the root of the list. Each
~ word entry begins with a specific data structure, which among other things
~ includes a next-entry pointer, a flags byte, and a string that serves as
~ the entry's name. Older entries in a dictionary seldom change; newer entries
~ are added at the beginning of it, with their next-entry pointers leading to
~ the older entries. It is possible for several dictionaries to exist at once,
~ each with its own dictionary handle.
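~
~ To illustrate the allocation strategy and dictionary structure described
~ above, here is a minimal sketch in Python rather than Evocation. Everything
~ in it is hypothetical: ToyLog, allot, create, names, mark, and forget are
~ made-up names. The layout is simplified (no flags byte, no alignment, and a
~ -1 sentinel instead of a null link), and this is not how Evocation is
~ implemented. It shows only three things: bumping "here" allocates
~ contiguously, new entries are prepended through "latest", and "forget"
~ amounts to restoring both globals.

```python
class ToyLog:
    """Hypothetical model of the log as a bump allocator plus a dictionary."""

    def __init__(self, size):
        self.memory = bytearray(size)
        self.here = 0        # next free byte; every allocation starts here
        self.latest = None   # newest main-dictionary entry (None: empty)

    def allot(self, nbytes):
        """Allocate by bumping 'here', so blocks are always contiguous."""
        if self.here + nbytes > len(self.memory):
            raise MemoryError("toy log is full")
        start = self.here
        self.here += nbytes
        return start

    def create(self, name):
        """Prepend an entry: an 8-byte link to the old root, then the name."""
        encoded = name.encode() + b"\0"
        start = self.allot(8 + len(encoded))
        # -1 marks "no older entry", since address 0 is valid in this toy.
        link = -1 if self.latest is None else self.latest
        self.memory[start:start + 8] = link.to_bytes(8, "little", signed=True)
        self.memory[start + 8:start + 8 + len(encoded)] = encoded
        self.latest = start
        return start

    def names(self):
        """Walk the dictionary from the newest entry to the oldest."""
        result, entry = [], self.latest
        while entry is not None:
            end = self.memory.index(b"\0", entry + 8)
            result.append(self.memory[entry + 8:end].decode())
            link = int.from_bytes(self.memory[entry:entry + 8], "little", signed=True)
            entry = None if link < 0 else link
        return result

    def mark(self):
        """Snapshot both globals: a point in time to forget back to."""
        return (self.here, self.latest)

    def forget(self, saved):
        """Unwind every allocation made since 'saved' by restoring both globals."""
        self.here, self.latest = saved
```

~ In this toy, after a forget the bytes of a forgotten entry are still
~ physically present. They are merely unreachable, and the next create
~ reuses the same space.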
~
~ Since dictionaries are managed using pointers to individual entries, there
~ is no specific requirement about the order in which those entries occur in
~ memory or where they are allocated, but usually a new entry is allocated at
~ the end of the log, by incrementing the variable "here", in the same manner
~ as any other allocation. There is one particular dictionary, the main
~ dictionary, whose handle is the variable "latest". The main dictionary holds
~ every executable word that can be used normally via Evocation's interpreter.
~
~ Since the main dictionary is by far the most important thing in the log,
~ it can be tempting to conflate the log with the main dictionary. This is
~ accurate enough for some purposes, but note that other dictionaries are
~ often interleaved with it, their allocations entwining like grape vines even
~ while each remains separate, reachable only via its own root. See the
~ machine label facility, in labels.e, for an example of how a secondary
~ dictionary can be useful.
~
~ This may feel tangential, but it's important background and there's no
~ better place to explain it: A handle is a pointer to a pointer. The variable
~ "latest" returns a handle, a fixed address which always holds the pointer to
~ the root entry of the main dictionary. Dereferencing that handle gives you
~ the dictionary pointer, the address of the root entry, which is suitable to
~ pass to find-in and similar words that read the dictionary's contents. When
~ you want to add a new entry to a dictionary, you need the dictionary's
~ handle, so that the root pointer can be changed. When you only want to read
~ it, you only need the regular single pointer.
~
~ When reading the documentation of words that work with dictionaries, pay
~ close attention to whether their parameters include a dictionary handle or
~ a dictionary pointer.
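~
~ To make the handle/pointer distinction concrete, here is a hypothetical
~ Python sketch (not Evocation code) that models memory as a dict from
~ addresses to cells. find_in and add_entry are illustrative stand-ins, not
~ the real words. Lookup takes only the dictionary pointer. Adding an entry
~ takes the handle, because it must rewrite the root pointer stored there.
~ Each toy entry uses two cells, a next-entry pointer and a name, and 0
~ means "no entry".

```python
# Hypothetical toy memory: address -> cell value. 0 means "no entry".
memory = {}


def find_in(dictionary_pointer, name):
    """Look up 'name' starting at the root entry; only a pointer is needed."""
    entry = dictionary_pointer
    while entry != 0:
        if memory[entry + 1] == name:   # cell 1 of an entry: its name
            return entry
        entry = memory[entry]           # cell 0 of an entry: next-entry pointer
    return 0


def add_entry(dictionary_handle, entry, name):
    """Prepend an entry; the handle is needed so the root pointer can change."""
    memory[entry] = memory[dictionary_handle]  # link the new entry to the old root
    memory[entry + 1] = name
    memory[dictionary_handle] = entry          # the root is now the new entry
```

~ A pointer read through the handle before an addition is a snapshot. It
~ still works for lookups, but it cannot see entries added afterward, which
~ is why anything that adds entries has to go through the handle.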
~
~ The term "handle" was widely known in the early days of microcomputing,
~ when memory-safe languages without direct pointer access were less common.
~ Today it is usually considered specific to systems programming, the type of
~ programming which lies beneath other software and deals with topics such as
~ memory management and processes. Evocation is a systems-programming
~ language, in the sense that it takes pains not to introduce mandatory
~ abstractions which would make it difficult or inefficient to work directly
~ with these topics. So, in understanding Evocation, it's important to know
~ about handles.
~
~ Some of these bootstrap words rely on being able to invoke assembler words
~ that output machine code. Therefore, those assembler words must be
~ available at runtime. Since nothing can be dynamically available at runtime
~ until after we've already run the log-load routine, which relies on the
~ stuff in this file, the assembler words must be statically available via
~ the label transform. That means their definitions in arm64.e must be loaded
~ before this file.

~ This has the same value as the constant control-stack-size, which is
~ defined in execution.e. Everything will break if it doesn't.
~
~ TODO: remove one of them. Probably the other one.
: log-offset 0x10000 ; ~ 64 KiB

~ (log address -- log address, "log" pointer)
: log-load-log dup log-offset + ;

~ (log address -- log address, "s0" pointer)
: log-load-s0 dup log-offset + 8 + ;

~ (log address -- log address, "r0" pointer)
: log-load-r0 dup log-offset + 2 8 * + ;

~ (log address -- log address, "latest" pointer)
: log-load-latest dup log-offset + 3 8 * + ;

~ (log address -- log address, "here" pointer)
: log-load-here dup log-offset + 4 8 * + ;

~ This is a helper used by warm-start, which invokes find-in using "latest".
~ It relies on being passed the root address of the log, which is used to find
~ the global variable "latest". It's inconvenient to keep a log pointer around
~ all the time, which is why we stop doing it as soon as possible, but during
~ Evocation's startup there's no alternative. This word is used extensively
~ by code that's been compiled via the log-load transform; see transform.e for
~ details.
~
~ It would be possible to unload this word after the log is created, but
~ there are rare situations in which it's still useful, such as injecting
~ Evocation into another process's address space. Plus, it's small. So, we
~ keep it around.
~
~ (log address, string pointer -- log address, entry pointer or 0)
: log-load-find swap log-load-latest @ swap 3unroll swap find-in ;

~ In the code generated by the log-load transform, it's convenient to have
~ only a single step needed to look up a word's execution token. This helper
~ does log-load-find, then gets the execution token if an entry is found. If
~ no entry is found, it prints an error naming the missing word and returns 0.
~
~ (log address, string pointer -- log address, execution token or 0)
: log-load-find-execution-token
  dup 3unroll log-load-find dup
  { 3roll drop entry-to-execution-token }
  { drop swap ." No such word: " emitstring newline 0 }
  if-else ;

~ This is the same as "create", from dynamic.e, except that it takes the
~ log's address as a parameter rather than hardcoding it, so that it can be
~ used in situations where the normal compilation process isn't yet available.
~
~ The requisite stack juggling is kind of finicky; sorry if it's hard to
~ read, but it's doing the same steps in the same order as the regular
~ "create".
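~
~ The entry header that log-load-create packs, and the way it reaches "here"
~ and "latest" through the log address, can be sketched in Python. This is a
~ hypothetical model, not Evocation code. Memory is a bytearray indexed from
~ a log base of 0, and cells are assumed to be little-endian 64-bit values.
~ The demo also assumes that the first free address lies just past the five
~ global cells. The layout itself follows the code in this file: an 8-byte
~ link to the old root, two zeroed bytes (the flags byte and one more), the
~ NUL-terminated name, and padding to an 8-byte boundary. The real word
~ copies the name into place before writing the header, which makes no
~ difference to the result here.

```python
import struct

LOG_OFFSET = 0x10000  # log-offset: the control stack occupies the bottom 64 KiB
# Cell index of each global past log-offset, as in log-load-log ... log-load-here.
GLOBAL_CELLS = {"log": 0, "s0": 1, "r0": 2, "latest": 3, "here": 4}


def global_address(log_base, name):
    return log_base + LOG_OFFSET + 8 * GLOBAL_CELLS[name]


def load(mem, addr):
    return struct.unpack_from("<Q", mem, addr)[0]


def store(mem, addr, value):
    struct.pack_into("<Q", mem, addr, value)


def align8(n):
    return (n + 7) & ~7


def log_load_create(mem, log_base, name):
    """Hypothetical mirror of log-load-create: append a header, make it the root."""
    here_addr = global_address(log_base, "here")
    latest_addr = global_address(log_base, "latest")
    entry = load(mem, here_addr)
    field = name + b"\0"                       # name field length = stringlen + 1
    store(mem, entry, load(mem, latest_addr))  # next-entry pointer = old root
    mem[entry + 8] = 0                         # flags byte
    mem[entry + 9] = 0                         # second byte, also zeroed
    mem[entry + 10:entry + 10 + len(field)] = field
    store(mem, latest_addr, entry)             # latest = old here
    store(mem, here_addr, align8(entry + 10 + len(field)))
    return entry
```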
~
~ (log address, string pointer -- log address)
: log-load-create
  dup stringlen 1 + dup 3unroll
  ~ (log address, name field length, string pointer, name field length)
  3 pick log-load-here swap drop @ 10 + 3unroll memmove
  ~ (log address, name field length)
  over log-load-here swap drop @
  ~ (log address, name field length, output point)
  2 pick log-load-latest swap drop @ pack64
  ~ (log address, name field length, output point)
  0 pack8 0 pack8 +
  ~ (log address, output point)
  8 packalign
  ~ (log address, output point)
  over log-load-here swap drop @
  ~ (log address, output point, old here value)
  2 pick log-load-latest swap drop !
  ~ (log address, output point)
  over log-load-here swap drop ! ;

~ This is the same as ",", from dynamic.e, except that it takes the log's
~ address as a parameter rather than hardcoding it, so that it can be used in
~ situations where the normal compilation process isn't yet available.
~
~ Again, the stack juggling is kind of a lot; sorry about that.
~
~ (log address, value -- log address)
: log-load-comma
  swap log-load-here swap 3unroll
  ~ (log address, value, here)
  @ swap pack64
  ~ (log address, updated here value)
  swap log-load-here swap 3unroll
  ~ (log address, updated here value, here)
  ! ;

~ This is the same as `;asm`, from dynamic.e, except that it takes the
~ log's address as a parameter rather than hardcoding it, so that it can be
~ used in situations where the normal compilation process isn't yet available.
~
~ Its two main responsibilities are to call `pack-next`, from
~ execution-support.e, and to overwrite the codeword so that it points at the
~ machine code immediately following it. It also deals with alignment.
~ (log address -- log address)
: log-load-semicolon-assembly
  log-load-here @
  ~ (log address, output point)
  pack-next 8 packalign
  ~ (log address, output point)
  swap log-load-here swap 3unroll !
  ~ (log address)
  log-load-latest @
  ~ (log address, entry pointer)
  entry-to-execution-token dup 8 + swap !
  ;

~ This is the same as "variable", from dynamic.e, except that it takes the
~ log's address as a parameter rather than hardcoding it, so that it can be
~ used in situations where the normal compilation process isn't yet available.
~
~ (log address, address for new variable word, string pointer -- log address)
: log-load-variable
  3roll swap log-load-create
  ~ (address for new variable word, log address)
  log-load-here swap 3unroll
  ~ (log address, address for new variable word, here)
  dup @
  ~ (log address, address for new variable word, here, output point)
  dup 8 + pack64 3roll :rax mov-reg64-imm64
  ~ (log address, here, output point)
  :rax push-reg64 pack-next 8 packalign swap ! ;

~ A keyword is a word that evaluates to its own address, which makes it
~ suitable for use as a constant. See more detail on that in dynamic.e,
~ where "keyword" is defined.
~
~ Unlike in Common Lisp, the lexer doesn't create keywords for us; we have to
~ do it explicitly. If that were to change someday, the log-load routine
~ would still need a way to do it, which is this.
~
~ It's kind of a pain to look up the appropriate "docol" from here, so we
~ do it in assembler instead.
~
~ (log address, string pointer -- log address)
: log-load-keyword
  log-load-create
  ~ (log address)
  log-load-here @ dup
  ~ (log address, self execution token, output point)
  dup 8 + pack64
  ~ (log address, self execution token, output point)
  swap :rax mov-reg64-imm64
  ~ (log address, output point)
  :rax push-reg64 pack-next 8 packalign
  ~ (log address, output point)
  swap log-load-here
  ~ (output point, log address, here)
  swap 3unroll
  ~ (log address, output point, here)
  ! ;

~ This is a helper used by log-load-string-alternate. It does the usual
~ string packing thing, but at one more layer of indirection than usual.
~ Unlike packstring, it is also responsible for alignment.
~
~ (log address, string pointer -- log address)
: log-load-comma-string
  swap log-load-here @ 3roll
  ~ (log address, output point, string pointer)
  packstring 8 packalign
  ~ (log address, output point)
  swap log-load-here 3roll swap
  ~ (log address, output point, here)
  ! ;

~ Now we have a bunch of words that are the back ends for the log-load
~ transform's high-level flow control alternates. These implementations
~ closely parallel the non-transformed versions in flow-control.e, which are
~ worth reading alongside them.
~
~ These variants are a bit unusual in their interfaces: They end with the
~ log address at the top of the stack, even when they have values to return.
~ That's because they're really just "talking" to each other; they don't need
~ to interact with anything else, and doing it this way saves the alternates
~ the work of swapping things around afterward.
~
~ Notice also that, because these run entirely at log-load time, they are
~ always dealing with target pointers and don't have to convert address
~ spaces.
~
~ (log address -- start pointer, log address)
: log-load-left-curly-brace log-load-here @ swap ;

~ (start pointer, log address -- start pointer, length, log address)
: log-load-right-curly-brace
  swap dup 3roll
  ~ (start pointer, start pointer, log address)
  log-load-here @ swap
  ~ (start pointer, start pointer, end pointer, log address)
  3unroll swap - swap ;

~ (start, length, log address -- log address)
: log-load-if
  3unroll
  ~ (log address, start, length)
  2dup swap dup
  ~ (log address, start, length, length, start, start)
  5 8 * +
  ~ (log address, start, length, length, start, adjusted start)
  3unroll swap
  ~ (log address, start, length, adjusted start, start, length)
  memmove
  ~ (log address, start, length)
  swap 3roll log-load-here dup @
  ~ (length, start, log address, here pointer, old here)
  swap 4 roll swap
  ~ (length, log address, old here, start, here pointer)
  !
  ~ (length, log address, old here)
  3unroll
  ~ (old here, length, log address)
  s" lit" log-load-find entry-to-execution-token log-load-comma
  0 log-load-comma
  s" !=" log-load-find entry-to-execution-token log-load-comma
  s" 0branch" log-load-find entry-to-execution-token log-load-comma
  ~ (old here, length, log address)
  swap dup 3unroll
  ~ (old here, length, log address, length)
  8 + log-load-comma
  ~ (old here, length, log address)
  3unroll
  ~ (log address, old here, length)
  drop 5 8 * +
  ~ (log address, new here)
  swap log-load-here
  ~ (new here, log address, here pointer)
  3roll swap ! ;

~ (start, length, log address -- log address)
: log-load-unless
  3unroll
  ~ (log address, start, length)
  2dup swap dup
  ~ (log address, start, length, length, start, start)
  5 8 * +
  ~ (log address, start, length, length, start, adjusted start)
  3unroll swap
  ~ (log address, start, length, adjusted start, start, length)
  memmove
  ~ (log address, start, length)
  swap 3roll log-load-here dup @
  ~ (length, start, log address, here pointer, old here)
  swap 4 roll swap
  ~ (length, log address, old here, start, here pointer)
  !
  ~ (length, log address, old here)
  3unroll
  ~ (old here, length, log address)
  s" lit" log-load-find entry-to-execution-token log-load-comma
  0 log-load-comma
  s" =" log-load-find entry-to-execution-token log-load-comma
  s" 0branch" log-load-find entry-to-execution-token log-load-comma
  ~ (old here, length, log address)
  swap dup 3unroll
  ~ (old here, length, log address, length)
  8 + log-load-comma
  ~ (old here, length, log address)
  3unroll
  ~ (log address, old here, length)
  drop 5 8 * +
  ~ (log address, new here)
  swap log-load-here
  ~ (new here, log address, here pointer)
  3roll swap !
  ;

~ (true start, true length, false start, false length, log address
~   -- log address)
: log-load-if-else
  5 unroll 2dup
  ~ (log address, true start, true length, false start, false length,
  ~   false start, false length)
  swap dup 7 8 * + swap 3roll
  ~ (log address, true start, true length, false start, false length,
  ~   adjusted false start, false start, false length)
  memmove
  ~ (log address, true start, true length, false start, false length)
  4 roll dup 5 unroll
  ~ (log address, true start, true length, false start, false length,
  ~   true start)
  4 roll dup 5 unroll
  ~ (log address, true start, true length, false start, false length,
  ~   true start, true length)
  swap dup 5 8 * +
  ~ (log address, true start, true length, false start, false length,
  ~   true length, true start, adjusted true start)
  swap 3roll
  ~ (log address, true start, true length, false start, false length,
  ~   adjusted true start, true start, true length)
  memmove
  ~ (log address, true start, true length, false start, false length)
  4 roll dup 5 unroll
  ~ (log address, true start, true length, false start, false length,
  ~   true start)
  6 roll log-load-here @ 7 unroll
  ~ (old here, true start, true length, false start, false length, true start,
  ~   log address)
  log-load-here 3roll swap !
  ~ (old here, true start, true length, false start, false length,
  ~   log address)
  s" lit" log-load-find entry-to-execution-token log-load-comma
  0 log-load-comma
  s" !=" log-load-find entry-to-execution-token log-load-comma
  s" 0branch" log-load-find entry-to-execution-token log-load-comma
  ~ (old here, true start, true length, false start, false length,
  ~   log address)
  4 roll dup 5 unroll
  ~ (old here, true start, true length, false start, false length,
  ~   log address, true length)
  3 8 * + log-load-comma
  ~ (old here, true start, true length, false start, false length,
  ~   log address)
  3unroll
  ~ (old here, true start, true length, log address,
  ~   false start, false length)
  swap dup 3unroll
  ~ (old here, true start, true length, log address,
  ~   false start, false length, false start)
  5 8 * +
  ~ (old here, true start, true length, log address,
  ~   false start, false length, adjusted false start)
  4 roll log-load-here 3roll swap !
  ~ (old here, true start, true length,
  ~   false start, false length, log address)
  s" branch" log-load-find entry-to-execution-token log-load-comma
  swap 8 + log-load-comma
  ~ (old here, true start, true length, false start, log address)
  4 unroll
  ~ (old here, log address, true start, true length, false start)
  drop drop drop
  ~ (old here, log address)
  log-load-here 3roll 7 8 * + swap !
  ;

~ (start, length, log address -- log address)
: log-load-forever
  s" branch" log-load-find entry-to-execution-token log-load-comma
  swap 8 + -1 * log-load-comma swap drop ;

~ (test start, test length, body start, body length, log address
~   -- log address)
: log-load-while
  5 unroll 2dup
  ~ (log address, test start, test length, body start, body length,
  ~   body start, body length)
  swap dup 5 8 * + swap 3roll
  ~ (log address, test start, test length, body start, body length,
  ~   adjusted body start, body start, body length)
  memmove
  ~ (log address, test start, test length, body start, body length)
  5 roll log-load-here @ 6 unroll
  ~ (old here, test start, test length, body start, body length, log address)
  log-load-here 4 roll dup 5 unroll swap !
  ~ (old here, test start, test length, body start, body length, log address)
  s" lit" log-load-find entry-to-execution-token log-load-comma
  0 log-load-comma
  s" !=" log-load-find entry-to-execution-token log-load-comma
  s" 0branch" log-load-find entry-to-execution-token log-load-comma
  swap dup 3unroll 3 8 * + log-load-comma
  ~ (old here, test start, test length, body start, body length, log address)
  log-load-here 5 8 * 8 roll + swap !
  ~ (test start, test length, body start, body length, log address)
  s" branch" log-load-find entry-to-execution-token log-load-comma
  5 unroll
  ~ (log address, test start, test length, body start, body length)
  6 8 * + swap drop + swap drop -1 * log-load-comma ;
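
~ The branch offsets computed by the flow-control words are easy to get
~ wrong, so here is a hypothetical Python model for checking the layouts. It
~ is not Evocation code. Compiled code is a list of cells, each transform
~ rewrites the list the way the matching log-load word rewrites the log, and
~ run is a toy threaded-code interpreter. As the arithmetic in
~ log-load-forever and log-load-while implies, offsets are in bytes and are
~ measured from the offset cell itself. The words dup, drop, 1-, and out,
~ and the use of 1 as the true flag, are assumptions made only for the demo.

```python
CELL = 8  # bytes per cell; branch offsets below are in bytes


def transform_if(code, start):
    """Put `lit 0 != 0branch <offset>` in front of the block at code[start:]."""
    body = code[start:]
    code[start:] = ["lit", 0, "!=", "0branch", CELL * len(body) + CELL] + body


def transform_if_else(code, true_start, false_start):
    true_body, false_body = code[true_start:false_start], code[false_start:]
    tl, fl = CELL * len(true_body), CELL * len(false_body)
    code[true_start:] = (["lit", 0, "!=", "0branch", tl + 3 * CELL] + true_body
                         + ["branch", fl + CELL] + false_body)


def transform_forever(code, start):
    """Append a backward branch to the start of the block."""
    length = CELL * len(code[start:])
    code.extend(["branch", -(length + CELL)])


def transform_while(code, test_start, body_start):
    test, body = code[test_start:body_start], code[body_start:]
    tl, bl = CELL * len(test), CELL * len(body)
    code[test_start:] = (test + ["lit", 0, "!=", "0branch", bl + 3 * CELL] + body
                         + ["branch", -(tl + bl + 6 * CELL)])


def run(code, stack):
    """Toy interpreter; a taken branch jumps relative to its offset cell."""
    out, ip = [], 0
    simple = {
        "dup": lambda s: s.append(s[-1]),
        "drop": lambda s: s.pop(),
        "1-": lambda s: s.append(s.pop() - 1),
        "!=": lambda s: s.append(int(s.pop() != s.pop())),
        "out": lambda s: out.append(s.pop()),
    }
    while ip < len(code):
        word = code[ip]
        if word == "lit":
            stack.append(code[ip + 1])
            ip += 2
        elif word in ("branch", "0branch"):
            taken = word == "branch" or stack.pop() == 0
            ip = ip + 1 + code[ip + 1] // CELL if taken else ip + 2
        else:
            simple[word](stack)
            ip += 1
    return out
```

~ For example, a countdown compiled as `3 { dup } { dup out 1- } while drop`
~ becomes a test, the conditional-exit header, the body, and a backward
~ branch whose offset points at the first cell of the test.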