From eec336dea3d86e176c4bd86c435e6be35fec64e2 Mon Sep 17 00:00:00 2001
From: Irene Knapp
Date: Mon, 18 May 2026 16:36:08 -0700
Subject: okay there's a strat for making the here/latest variables now

it doesn't work yet, but it's gonna

the reason it doesn't work is that the new helper log-load-variable relies on
the assembly-definition words being statically available, and they aren't yet

that's fine though, this is still a huge change, worth checking in. why? well,
it represents like 16 hours of debugging which culminated in some very minor
changes to the semantics of the label transform, in order to make missing words
easier to notice and debug. see comments for details.

woooooo :D

Force-Push: yes
Change-Id: Id8334819d165ba9e3156ef2bf32008af748eac29
---
 execution.e | 143 ++++++++++++++++++++-----------------------------------------
 1 file changed, 47 insertions(+), 96 deletions(-)

(limited to 'execution.e')

diff --git a/execution.e b/execution.e
index daacddb..1b9e84d 100644
--- a/execution.e
+++ b/execution.e
@@ -484,16 +484,11 @@
   3unroll
   8 packalign
   current-offset L!' warm-start
-  3roll
-
-  log-load-transform
-
-  ~ TODO this is tied to the specific example in evoke
-  ~ L@' happy-path L@' origin + pack64
+  ~ (input string pointer, output buffer start, output point)
 
   ~ Before handing off to us, cold-start pushed a single value onto the
-  ~ stack, a pointer to the beginning of the heap. Now, we load our entire
-  ~ Forth implementation onto that heap, beginning with the minimal set of
+  ~ stack, a pointer to the beginning of the log. Now, we load our entire
+  ~ Forth implementation onto that log, beginning with the minimal set of
   ~ words needed to define more words. We do this because we need variables as
   ~ infrastructure so we can eventually have dynamic definitions.
   ~
@@ -514,117 +509,73 @@
   ~ That choice does mean we have the hard version of this bootstrapping
   ~ problem, and copying ourselves to the heap is how we solve it.
   ~
-  ~ We do have the heap address right now, though that won't last. In case
+  ~ We do have the log address right now, though that won't last. In case
   ~ it's unclear why not: keeping it on the stack would require all future
   ~ references to walk the stack, and somehow know when they've reached the
   ~ bottom. The stack is a good place to keep things with clearly delimited
   ~ lifetimes and visibility, but when we want something to live for our
   ~ entire program and be easy to find from any code within it, we need to
   ~ do something else. Anyway, since we have the address, we can use it for
-  ~ the next little bit of setup.
+  ~ the next little bit of setup. We have a bunch of helper words, from
+  ~ log-load.e, which make this easier.
   ~
   ~ The first few words we define are our variables, which hardcode the
   ~ addresses they will return - but since we're doing this at runtime,
-  ~ "hardcoding" can reflect where our heap is. This is the fundamental
-  ~ trick that makes the heap usable.
+  ~ "hardcoding" can reflect where our log is. This is the fundamental
+  ~ trick that makes the log usable.
   ~
   ~ One more thing to notice: We already allocated the backing stores of
-  ~ these variables, and populated their initial values, in _start. The
+  ~ these variables, and populated their initial values, in cold-start. The
   ~ words we're defining return those same addresses for the same backing
   ~ stores. So, we have continuity: Stuff defined in terms of the
   ~ variable-words we're defining now will interoperate with the stuff that
-  ~ we define in the "early" way, which includes those very words. Both the
-  ~ early code and the later code are dealing with the same data structures,
-  ~ they're just using a different technique to find them.
+  ~ we define using the log-load helpers, which includes those very words.
+  ~ Both the log-load code and the later code are dealing with the same data
+  ~ structures, they're just using a different technique to find them.
   ~
   ~ This is the only hardcoding we need to do; by building on top of it,
   ~ we will soon reach a point where the rest of the system can be defined
   ~ within itself.
 
-  ~ TODO These need to, like, exist first. Also they need to be referenced
-  ~ as labels.
-  ~ dq early_heap, litstring, "heap", early_variable
-  ~ dq early_s0, litstring, "s0", early_variable
-  ~ dq early_r0, litstring, "r0", early_variable
-  ~ dq early_latest, litstring, "latest", early_variable
-  ~ dq early_here, litstring, "here", early_variable
-  ;
+  L@' log-load-log offset-to-target-address-space pack64
+  L@' litstring offset-to-target-address-space pack64
+  s" log" packstring 8 packalign
+  L@' log-load-variable offset-to-target-address-space pack64
 
-~ (previous entry address, output point, name string pointer
-~ -- new entry address, output point)
-: output-create
-  3roll dup 4 roll swap pack64
-  ~ (string pointer, new entry address, output point)
-  0 pack8
-  0 pack8
-  roll3 packstring
-  ~ (new entry address, output point)
-  8 packalign
-  ;
+  L@' log-load-s0 offset-to-target-address-space pack64
+  L@' litstring offset-to-target-address-space pack64
+  s" s0" packstring 8 packalign
+  L@' log-load-variable offset-to-target-address-space pack64
 
+  L@' log-load-r0 offset-to-target-address-space pack64
+  L@' litstring offset-to-target-address-space pack64
+  s" r0" packstring 8 packalign
+  L@' log-load-variable offset-to-target-address-space pack64
 
-~ Routine docol
-~ ~~~~~~~~~~~~~
-~
-~ Reference this via its label as the codeword of a word to make it an
-~ "interpreted" word. Concretely, it saves rsi (the "instruction pointer")
-~ to the control stack, takes the address of the codeword from rax and
-~ increments it in-place to form the new instruction pointer, and copies
-~ that to rsi.
-~
-~ Having then done this, we're now in the state that normal execution
-~ expects, so docol ends by it using "next" to begin the callee's execution,
-~ kicking off a nested call.
-~
-~ The name is said to be short for "do colon", because Forth high-level
-~ code begins word definitions with a colon.
-~
-~ Registers in:
-~
-~ * rsi is the caller's instruction pointer
-~ * rbp is the control stack pointer
-~ * rax is the address of the callee's codeword
-~
-~ Registers out:
-~
-~ * rsi is the callee's instruction pointer
-~ * rbp is the control stack pointer
-~
-~ (previous entry address, output point)
-: output-docol
-  s" docol" output-create
+  L@' log-load-latest offset-to-target-address-space pack64
+  L@' litstring offset-to-target-address-space pack64
+  s" latest" packstring 8 packalign
+  L@' log-load-variable offset-to-target-address-space pack64
 
-  ~ Evaluated as a word, docol is a constant which returns a pointer.
-  L@' docol :rax mov-reg64-imm64
-  :rax push-reg64
-  pack-next
-  8 packalign
+  L@' log-load-here offset-to-target-address-space pack64
+  L@' litstring offset-to-target-address-space pack64
+  s" here" packstring 8 packalign
+  L@' log-load-variable offset-to-target-address-space pack64
 
-  ~ Since docol is not a normal word, the label points to the value we care
-  ~ about from the assembly side of things, which is the address we use as the
-  ~ codeword.
-  current-offset L!' docol
-  :rsi pack-pushcontrol
-  8 :rax add-reg64-imm8
-  :rax :rsi mov-reg64-reg64
-  pack-next
-  8 packalign
-  ;
+  ~ Having done that, nothing else needs to be defined in an unusual way, so
+  ~ we can go ahead and dispatch to the log-load transform, and do the rest of
+  ~ the code through that.
+
+  ~ (input string pointer, output buffer start, output point)
+  3roll
+  log-load-transform ;
 
-~ This is the mechanism to "return" from a word interpreted by docol.
-~ We pop the control stack, and then, since this is threaded execution, we
-~ do the next thing the caller wants to do, by inlining "next".
-~
-~ This word would work fine with the label transformation, so we could put
-~ it in core.e, but we choose to define it here because it's easier to
-~ understand when it's close to the rest of the execution stuff.
-~
-~ (previous entry address, output point
-~ -- new entry address, output point)
-: output-exit
-  s" exit" output-create
-  current-offset L!' exit
-  :rsi pack-popcontrol
-  pack-next
-  ;
+
+~ Where next?
+~ ~~~~~~~~~~~
+~
+~ The definitions of "docol" and "exit" are very tightly bound up with the
+~ execution model. They're defined and explained in core.e, because they need
+~ to be part of the build process in two different ways, like the other core
+~ functionality.
 
--
cgit 1.4.1