~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ ~~ Bootstrapping the log ~~
~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
~
~ The log is the main region of memory within which most dynamic allocation
~ happens. It's a single contiguous segment of virtual memory, which is
~ requested from the kernel when Evocation starts up. Almost all of
~ Evocation's dynamic data is kept in the log, including the main dictionary;
~ several important global variables which make it possible to find and
~ allocate other data structures; and the control stack.
~
~ This file has the task of providing words which are useful for working
~ with the log, and more specifically which are useful for helping to bring
~ the log into existence. Once the log exists, it can be used to manage
~ itself, but there's a bootstrapping challenge in getting there. That
~ challenge is solved by the warm-start routine in execution.e, which relies
~ on the words in this file and should load after it.
~
~ Some modern Forths, including Jonesforth, refer to the log as the heap.
~ This is a misnomer; a heap is a data structure that allows non-contiguous
~ allocation. Although there are Forths that have true heaps, Evocation is not
~ one of them. Space in the log is allocated by incrementing the "here"
~ variable (one of those important globals), which necessarily can only
~ allocate contiguous blocks; there is no way to compact allocations to
~ reclaim fragmented, unused space in between them. Evocation does allow
~ deallocation using "forget", but this is done by resetting "here" and
~ "latest" to older values, unwinding every allocation that's been done since
~ the point in time they return to.
~
~ It would be a mistake to confuse this allocation strategy with the
~ more-general facilities for allocation, reallocation, and deallocation of
~ individual memory blocks that many other languages have.
~ To avoid confusion,
~ we stay away from the name "heap", though it may still occasionally be used
~ colloquially because it's familiar from other Forths, and because most
~ programming languages have a heap as the main memory segment they request
~ from the kernel.
~
~ In the strictest technical sense, the log is a stack: Things are added
~ to the end of it, and removed from that same end. However, Evocation already
~ has two other stacks, the control and value stacks. Adding to the potential
~ confusion, the control stack is actually stored inside the log (as a
~ fixed-size chunk at the bottom). However, the log isn't really that much
~ like a stack when you look at how it's actually used. Unlike Evocation's
~ control and value stacks, data structures on the log tend to be rich and
~ complex, interlinked in various ways through the use of pointers. They also
~ tend to be long-lived, with the log tending to grow over time, whereas the
~ control and value stacks tend to remain roughly the same size through cycles
~ of growth and shrinking. In order to be able to speak precisely about what
~ we're doing, we introduce the name "log" to refer to the entire memory
~ segment and everything stored within it.
~
~ Another linguistic choice we make is to be clear about dictionaries. A
~ dictionary is a linked list of word entries. Each dictionary has a specific
~ handle, a pointer to a pointer, which is the root of the list. Each
~ word entry begins with a specific data structure, which among other things
~ includes a next-entry pointer, a flags byte, and a string that serves as
~ the entry's name. Older entries in a dictionary seldom change; newer entries
~ are added at the beginning of it, with their next-entry pointers leading to
~ the older entries. It is possible for several dictionaries to exist at once,
~ each with its own dictionary handle.
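~
~ As a rough sketch of that entry header, inferred from how find-in and
~ log-load-create (both later in this file) read and write entries rather
~ than from an authoritative definition, the start of each entry appears to
~ be laid out like this:
~
~     offset 0     next-entry pointer, 8 bytes (0 marks the end of the list)
~     offset 8     two single bytes, one of which is presumably the flags byte
~     offset 10    the name string, followed by one extra byte
~
~ The extra byte after the name is assumed to be a terminator, and the entry
~ header is then padded so that whatever follows starts on an 8-byte
~ boundary. Treat the exact meaning of the two bytes at offset 8 as an
~ assumption; only the offsets 0 and 10 are used directly in this file.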
~
~ Since dictionaries are managed using pointers to individual entries, there
~ is no specific requirement about the order in which those entries occur in
~ memory or where they are allocated, but usually a new entry is allocated at
~ the end of the log, by incrementing the variable "here", in the same manner
~ as any other allocation. There is one particular dictionary, the main
~ dictionary, whose handle is the variable "latest". The main dictionary holds
~ every executable word that can be used normally via Evocation's interpreter.
~
~ Since the main dictionary is by far the most important thing in the log,
~ it can be tempting to conflate the log with the main dictionary. This is
~ accurate enough for some purposes, but note that other dictionaries are
~ often interleaved with it, their allocations entwining like grape vines even
~ while each remains separate, reachable only via its own root. See the
~ machine label facility, in labels.e, for an example of how a secondary
~ dictionary can be useful.
~
~ This may feel tangential, but it's important background and there's no
~ better place to explain it: A handle is a pointer to a pointer. The variable
~ "latest" returns a handle, a fixed address which always holds the pointer to
~ the root entry of the main dictionary. Dereferencing that handle gives you
~ the dictionary pointer, the address of the root entry, which is suitable to
~ pass to find-in and similar words that read the dictionary's contents. When
~ you want to add a new entry to a dictionary, you need the dictionary's
~ handle, so that the root pointer can be changed. When you only want to read
~ it, you only need the regular single pointer.
~
~ When reading the documentation of words that work with dictionaries, pay
~ close attention to whether their parameters include a dictionary handle, or
~ a dictionary pointer.
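~
~ As a minimal sketch of the distinction, assume a string pointer is already
~ on the value stack, and that "latest" is reachable as an ordinary variable
~ (which is not the case during early startup). A lookup in the main
~ dictionary could then be written like this:
~
~     (string pointer -- entry pointer or 0)
~     latest @ swap find-in
~
~ Here "latest" on its own leaves the handle on the stack, and "latest @"
~ leaves the dictionary pointer, which is what find-in expects. The "swap"
~ only puts the two parameters in the order find-in documents.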
~
~ The term "handle" was widely known in the early days of microcomputing,
~ when memory-safe languages without direct pointer access were less common.
~ Today it is usually considered specific to systems programming, the type of
~ programming which lies beneath other software and deals with topics such as
~ memory management and processes. Evocation is a systems-programming
~ language, in the sense that it takes pains to not introduce mandatory
~ abstractions which would make it difficult or inefficient to work directly
~ with these topics. So, in understanding Evocation, it's important to know
~ about handles.


~ Find-in is the main word that provides the capability to look up words by
~ name, though it's usually used via "find" rather than being called directly.
~
~ Find-in traverses the linked list formed by a particular dictionary's
~ next-entry pointers, looking for an entry that matches a given name. The
~ dictionary pointer is the pointer (not handle) to the root of the list,
~ which runs from newest to oldest. For example, dereferencing the value of
~ "latest" gives the pointer to the main dictionary, which can be passed to
~ find-in.
~
~ Having find-in separated out is convenient when working with alternate
~ dictionaries, but the main reason for having it is not convenience but
~ necessity: During Evocation's startup, there is a period before global
~ variables are easily accessible, so there would be no way to implement
~ "find". The warm-start routine (see execution.e and transform.e) has the
~ job of fixing that, and it makes extensive use of find-in to do so.
~
~ (dictionary pointer, string pointer -- entry pointer or 0)
: find-in
  ~ It will be more convenient to have the entry pointer on top.
  swap

  {
    ~ If the entry pointer is null, exit.
    ~ (name pointer to find, current entry pointer)
    dup 0 = { swap drop exit } if

    ~ Check this entry's "hidden" flag.
    ~ (name pointer to find, current entry pointer)
    dup entry-flags@ 0x80 & 0x80 != {
      ~ Test whether this entry is a match.
      ~ (name pointer to find, current entry pointer)
      2dup 10 + stringcmp 0 = {
        ~ If we're here, it's a match. Clean up our working state and exit.
        ~ (name pointer to find, current entry pointer)
        swap drop exit
      } if
    } if

    ~ If we're here, it's not a match; traverse the pointer and repeat.
    ~ (name pointer to find, current entry pointer)
    @
  } forever ;


~ This has the same value as the constant control-stack-size, which is
~ defined in execution.e. Everything will break if it doesn't.
~
~ TODO: remove one of them. Probably the other one.
: log-offset 0x10000 ;  ~ 64 KiB

~ (log address -- log address, "log" pointer)
: log-load-log dup log-offset + ;

~ (log address -- log address, "s0" pointer)
: log-load-s0 dup log-offset + 8 + ;

~ (log address -- log address, "r0" pointer)
: log-load-r0 dup log-offset + 2 8 * + ;

~ (log address -- log address, "latest" pointer)
: log-load-latest dup log-offset + 3 8 * + ;

~ (log address -- log address, "here" pointer)
: log-load-here dup log-offset + 4 8 * + ;


~ This is a helper used by warm-start, which invokes find-in using "latest".
~ It relies on being passed the root address of the log, which is used to find
~ the global variable "latest". It's inconvenient to keep a log pointer around
~ all the time, which is why we stop doing it as soon as possible, but during
~ Evocation's startup there's no alternative. This word is used extensively
~ by code that's been compiled via the log-load transform; see transform.e for
~ details.
~
~ It would be possible to unload this word after the log is created, but
~ there are rare situations in which it's still useful, such as injecting
~ Evocation into another process's address space. Plus, it's small. So, we
~ keep it around.
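~
~ To make the stack juggling in the definition below easier to follow, here
~ is a step-by-step trace of the value stack. It assumes that "3unroll"
~ moves the top item beneath the next two, which is how it behaves
~ everywhere else in this file:
~
~     (start)            (log address, string pointer)
~     swap               (string pointer, log address)
~     log-load-latest    (string pointer, log address, "latest" pointer)
~     @                  (string pointer, log address, dictionary pointer)
~     swap               (string pointer, dictionary pointer, log address)
~     3unroll            (log address, string pointer, dictionary pointer)
~     swap               (log address, dictionary pointer, string pointer)
~     find-in            (log address, entry pointer or 0)
~
~ The net effect is that the log address is tucked out of the way at the
~ bottom, so that it survives the call to find-in.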
~
~ (log address, string pointer -- log address, entry pointer or 0)
: log-load-find swap log-load-latest @ swap 3unroll swap find-in ;


~ In the code generated by the log-load transform, it's convenient to have
~ only a single step needed to look up a word's execution token. This helper
~ does log-load-find, then gets the execution token if an entry is found.
~
~ (log address, string pointer -- log address, execution token or 0)
: log-load-find-execution-token
  log-load-find dup { entry-to-execution-token } if ;


~ This is the same as "create", from interpret.e, except that it takes the
~ log's address as a parameter rather than hardcoding it, so that it can be
~ used in situations where the normal compilation process isn't yet available.
~
~ The requisite stack juggling is kind of finicky, sorry if it's hard to
~ read, but it's doing the same steps in the same order as the regular
~ "create".
~
~ (log address, string pointer -- log address)
: log-load-create
  dup stringlen 1 + dup 3unroll
  ~ (log address, name field length, string pointer, name field length)
  3 pick log-load-here swap drop @ 10 + 3unroll memmove
  ~ (log address, name field length)
  over log-load-here swap drop @
  ~ (log address, name field length, output point)
  2 pick log-load-latest swap drop @ pack64
  ~ (log address, name field length, output point)
  0 pack8 0 pack8 +
  ~ (log address, output point)
  8 packalign
  ~ (log address, output point)
  over log-load-here swap drop @
  ~ (log address, output point, old here value)
  2 pick log-load-latest swap drop !
  ~ (log address, output point)
  over log-load-here swap drop ! ;


~ This is the same as ",", from interpret.e, except that it takes the log's
~ address as a parameter rather than hardcoding it, so that it can be used in
~ situations where the normal compilation process isn't yet available.
~
~ Again, the stack juggling is kind of a lot, sorry about that.
~
~ (log address, value -- log address)
: log-load-comma
  swap log-load-here swap 3unroll
  ~ (log address, value, here)
  @ swap pack64
  ~ (log address, updated here value)
  swap log-load-here swap 3unroll
  ~ (log address, updated here value, here)
  ! ;


~ This is the same as "variable", from interpret.e, except that it takes the
~ log's address as a parameter rather than hardcoding it, so that it can be
~ used in situations where the normal compilation process isn't yet available.
~
~ (log address, address for new variable word, string pointer -- log address)
: log-load-variable
  3roll swap log-load-create
  ~ (address for new variable word, log address)
  log-load-here swap 3unroll
  ~ (log address, address for new variable word, here)
  dup @
  ~ (log address, address for new variable word, here, output point)
  dup 8 + pack64 3roll :rax mov-reg64-imm64
  ~ (log address, here, output point)
  ~
  :rax push-reg64 pack-next 8 packalign swap ! ;