implement log-load-create; add all the pack/unpack stuff to core.e

Force-Push: yes
Change-Id: I04dd65a9eec71f9b50c8875bdcbe5d4be59888d5
author: Irene Knapp <ireneista@irenes.space> 2026-05-17 17:29:09 -0700
committer: Irene Knapp <ireneista@irenes.space> 2026-05-17 17:29:09 -0700
commit: 3b41dbfa2338c11dd8398026c00922f20f32dc81 (patch)
tree: 1761f2b8f2da972b830cf617d194c7c5be5b1535 /transform.e
parent: 4f1a07da9c87a1560da34b8a96a9de4cdc90f1fc (diff)
1 files changed, 129 insertions, 82 deletions
diff --git a/transform.e b/transform.e
index 5ddac78..42901f4 100644
--- a/transform.e
+++ b/transform.e
@@ -16,36 +16,51 @@
 ~ specific way. The transforms rely on the label facility provided by
 ~ labels.e, and expect to run from within label-loop.
 ~
-~   The label transform operates on code that compiles itself, and ensures
-~ that the result of the compilation is suitable to be included in an
-~ executable binary as words that are statically referenced by their
-~ addresses. To achieve this, it causes each newly-defined word to have a
-~ corresponding label whose value is the offset of its codeword, and it causes
-~ all compiled invocations of other words to be resolved by using these labels.
-~ The label transform is suitable for code that must be directly invoked by
-~ the warm-start routine provided by execution.e.
-~
-~   The log-load transform also operates on code that compiles itself; it
-~ produces a compiled routine which, when run, appends the original code to
-~ the log. As the routine is run, each reference to another word is resolved
-~ by looking up the name of the target word in the log. Furthermore, these
-~ lookups are done using log-load-find, defined in log-load.e, which accepts
-~ a pointer to the log's base address as a parameter. See that file for more
-~ explanation of what the log is and why it's important. Thus, unlike normal
-~ accesses to the log, this routine doesn't rely on already having the log's
-~ base address hardcoded into it at the time of its own compilation. The
-~ log-load transform is suitable for implementing the core responsibilities of
-~ the warm-start routine provided by execution.e, relying on only a few
-~ specific words that it statically references via labels.
+~   The label transform produces code that uses one label per word it defines,
+~ to statically reference everything. Thus, when output to an executable
+~ binary, this code will function without external dependencies. The tradeoff
+~ is that it has no way to reference data that exists only at runtime.
+~
+~   The log-load transform relies on labels, but doesn't add any of its own.
+~ It produces a compiled routine which, when run, dynamically looks up all the
+~ references in the log, and appends the original code to the log. This adds
+~ work that must be done when the runtime starts up, but the benefit is that
+~ it can reference data that doesn't exist at compile-time. Most crucially,
+~ it can reference the "here" and "latest" pointers in the log, which are
+~ required for all the usual word-definition stuff to work.
 ~
 ~   The log-load transform may also be useful for experimental tasks such as
 ~ creating additional, independent logs, or injecting Evocation into another
 ~ process's address space.
 ~
+~   Please notice that both these transforms, in different ways, navigate the
+~ same underlying design tension: The Forth compilation model hardcodes
+~ references at the time compilation happens, and Evocation makes the choice
+~ to not decide the address of the log until runtime. Thus the label transform
+~ can't be sufficient on its own. Other Forths avoid this problem by
+~ hardcoding an address for the log, or by using OS-provided load-time
+~ symbol relocation. Evocation, however, does it on hard mode, mostly for fun.
+~
+~   Because it was clear from early on that the label transform couldn't stand
+~ alone, and that another one would be necessary, we've refrained from adding
+~ too many features to it. Since we have multiple transforms, they should each
+~ be kept simple and well-defined, so that they can be composed in creative
+~ new ways down the line. When adding additional behavior, always give thought
+~ to whether it belongs in an existing transform or a new one.
+~
 ~
 ~ About the label transform
 ~ ~~~~~~~~~~~~~~~~~~~~~~~~~
 ~
+~   The label transform operates on code that compiles itself, and ensures
+~ that the result of the compilation is suitable to be included in an
+~ executable binary as words that are statically referenced by their
+~ addresses. To achieve this, it causes each newly-defined word to have a
+~ corresponding label whose value is the offset of its codeword, and it causes
+~ all compiled invocations of other words to be resolved by using these labels.
+~ The label transform is suitable for code that must be directly invoked by
+~ the warm-start routine provided by execution.e.
+~
 ~   The most fundamental technique the label transform performs is to separate
 ~ words that run in compile mode from words that run immediately.  There is no
 ~ distinction made between words running in immediate mode, and words declared
@@ -105,6 +120,19 @@
 ~ About the log-load transform
 ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 ~
+~   The log-load transform also operates on code that compiles itself; it
+~ produces a compiled routine which, when run, appends the original code to
+~ the log. As the routine is run, each reference to another word is resolved
+~ by looking up the name of the target word in the log. Furthermore, these
+~ lookups are done using log-load-find, defined in log-load.e, which accepts
+~ a pointer to the log's base address as a parameter. See that file for more
+~ explanation of what the log is and why it's important. Thus, unlike normal
+~ accesses to the log, this routine doesn't rely on already having the log's
+~ base address hardcoded into it at the time of its own compilation. The
+~ log-load transform is suitable for implementing the core responsibilities of
+~ the warm-start routine provided by execution.e, relying on only a few
+~ specific words that it statically references via labels.
+~
 ~   Much like the label transform, the log-load transform provides alternate
 ~ versions of certain immediate words used in word definition. Also like the
 ~ label transform, it provides its own copies of "here" and "latest".
@@ -223,7 +251,7 @@ allocate-transform-state s" transform-state" variable
 
 
 ~   When calling the label facility during a transformation, it's necessary
-~ to use the real, non-wrapped "heap" and "latest".
+~ to use the real, non-wrapped "here" and "latest".
 : swap-transform-variables
   here @ transform-state transform-state-saved-here @
   here ! transform-state transform-state-saved-here !
@@ -458,11 +486,11 @@ allocate-transform-state s" transform-state" variable
         ~ which is already what we want to output.
         ~
         ~   An important caveat: Though it would require something weird to be
-        ~ happening, such as a forced forward reference, the label may be zero!
-        ~ We need to allow for that possibility by not examining the contents of
-        ~ a nonexistent entry.
+        ~ happening, such as a forced forward reference, the label may be
+        ~ zero! We need to allow for that possibility by not examining the
+        ~ contents of a nonexistent entry.
         ~
-        ~   Fortunately we don't have to look at it, just append it to the heap
+        ~   Fortunately we don't have to look at it, just append it to the log
         ~ and clean up.
         offset-to-target-address-space , drop dropstring 0 exit
       } if
@@ -489,7 +517,7 @@ allocate-transform-state s" transform-state" variable
     ~ It's a number.
     interpreter-flags @ 0x01 & {
       ~ We're in compile mode; append first "lit", then the number, to the
-      ~ heap. The version of "lit" we use is found by label, so it'll be the
+      ~ log. The version of "lit" we use is found by label, so it'll be the
       ~ one that exists when this code is ultimately run.
       dropstring-with-result
 
@@ -572,30 +600,32 @@ allocate-transform-state s" transform-state" variable
 ~ below. It is likely to be extremely useful to read and understand "create"
 ~ in interpret.e before attempting to understand log-load-create.
 : log-load-create
-  dup stringlen 1 + dup 3unroll
-  here @ 10 + 3unroll memmove
-  here @
-
-  ~   This value of "latest" is going into the generated output, so we need
-  ~ to map it to the target address space. It's stored in the host address
-  ~ space to make immediate words work as expected, so the appropriate
-  ~ conversion is host-address-space-to-target.
-  latest @ host-address-space-to-target pack64
-  0 pack8
-  0 pack8
-  +
-  8 packalign
-  here @ latest !
-
-  ~   Now we're immediately after the word header, which is where the codeword
-  ~ will be. This is the value the label should taken on, so we set it.
-  dup host-address-space-to-offset
-  here @ 10 +
+  ~ dup stringlen 1 + dup 3unroll
+  ~ here @ 10 + 3unroll memmove
+  ~ here @
+
+  ~ ~   This value of "latest" is going into the generated output, so we need
+  ~ ~ to map it to the target address space. It's stored in the host address
+  ~ ~ space to make immediate words work as expected, so the appropriate
+  ~ ~ conversion is host-address-space-to-target.
+  ~ latest @ host-address-space-to-target pack64
+  ~ 0 pack8
+  ~ 0 pack8
+  ~ +
+  ~ 8 packalign
+  ~ here @ latest !
+
+  ~ ~   Now we're immediately after the word header, which is where the codeword
+  ~ ~ will be. This is the value the label should take on, so we set it.
+  ~ dup host-address-space-to-offset
+  ~ here @ 10 +
+  0 swap ~ DO NOT SUBMIT
   swap-transform-variables
   intern-label set-label
   swap-transform-variables
 
-  here ! ;
+  ~ here !
+  ;
 
 
 ~   This is the alternate version of ":" for use with the log-load transform.
@@ -603,17 +633,19 @@ allocate-transform-state s" transform-state" variable
 ~ likely to be extremely useful to read and understand ":" in interpret.e
 ~ before attempting to understand "log-load:".
 : log-load:
-  ~ This calls "log-load-create" instead of "create".
+  ~ ~ This calls "log-load-create" instead of "create".
   word value@ log-load-create dropstring
 
   ~ This looks up "docol" by label.
-  swap-transform-variables
-  L@' docol
-  L@' origin
-  swap-transform-variables
-  + ,
+  ~ swap-transform-variables
+  ~ L@' docol
+  ~ L@' origin
+  ~ swap-transform-variables
+  ~ + ,
 
-  latest @ hide-entry ] ;
+  ~ TODO note no hiding the entry
+  ]
+  ;
 
 
 ~   This is the alternate version of ";" for use with the log-load transform.
@@ -621,16 +653,16 @@ allocate-transform-state s" transform-state" variable
 ~ likely to be extremely useful to read and understand ";" in interpret.e
 ~ before attempting to understand "log-load;".
 : log-load;
-  ~ This looks up "exit" by label.
-  swap-transform-variables
-  L@' exit
-  swap-transform-variables
-  offset-to-target-address-space ,
+  ~ ~ This looks up "exit" by label.
+  ~ swap-transform-variables
+  ~ L@' exit
+  ~ swap-transform-variables
+  ~ offset-to-target-address-space ,
 
-  latest @ unhide-entry
+  ~ latest @ unhide-entry
 
-  ~   Since [ is an immediate word, we have to go to extra trouble to compile
-  ~ it as part of ;.
+  ~ ~   Since [ is an immediate word, we have to go to extra trouble to compile
+  ~ ~ it as part of ;.
   [ ' [ entry-to-execution-token , ]
   ; make-immediate
 
@@ -640,15 +672,15 @@ allocate-transform-state s" transform-state" variable
 ~ below. It is likely to be extremely useful to read and understand ";asm" in
 ~ interpret.e before attempting to understand "log-load;asm".
 : log-load;asm
-  here @ pack-next 8 packalign here !
-  latest @ dup unhide-entry entry-to-execution-token
-  ~ The codeword needs to be transformed to the target address space.
-  dup 8 + host-address-space-to-target
-  swap !
-
-  ~   Since [ is an immediate word, we have to go to extra trouble to compile
-  ~ it as part of ;asm.
-  [ ' [ entry-to-execution-token , ]
+  ~ here @ pack-next 8 packalign here !
+  ~ latest @ dup unhide-entry entry-to-execution-token
+  ~ ~ The codeword needs to be transformed to the target address space.
+  ~ dup 8 + host-address-space-to-target
+  ~ swap !
+
+  ~ ~   Since [ is an immediate word, we have to go to extra trouble to compile
+  ~ ~ it as part of ;asm.
+  ~ [ ' [ entry-to-execution-token , ]
   ; make-immediate
 
 ~   This implements the log-load transform for a single word. It is directly
@@ -669,8 +701,6 @@ allocate-transform-state s" transform-state" variable
   ~ (string)
   value@
 
-  dup emitstring newline
-
   ~ If it's the magic word, end the transformation.
   dup s" pyrzqxgl" stringcmp 0 = { drop dropstring 1 exit } if
 
@@ -695,6 +725,7 @@ allocate-transform-state s" transform-state" variable
     dropstring-with-result entry-to-execution-token execute
     0 exit
   } if
+  drop
   ~ (name as stack string)
 
   ~   Now we might have a compiled word, an immediate word, or an integer
@@ -711,6 +742,8 @@ allocate-transform-state s" transform-state" variable
     ~ It's a number.
     dropstring-with-result
 
+    drop ~ TODO placeholder
+
     interpreter-flags @ 0x01 & {
       ~ We're in compile mode, so we want to generate code which will compile
       ~ the number.
@@ -721,24 +754,40 @@ allocate-transform-state s" transform-state" variable
     ~ We're in interpret mode, so we want to generate code which will push the
     ~ number to the stack.
     ~ TODO
+    swap-transform-variables L@' lit swap-transform-variables
+    offset-to-target-address-space , ,
     0 exit
   } if
-  drop
   ~ (name as stack string)
 
   ~   We know it's a regular word, and we're assuming it will exist at
   ~ runtime. We of course have no way to check what flags it will have, which
   ~ means immediate words don't work with this transform. We still treat it
   ~ differently based on whether we're in compile mode.
-  interpreter-flags @ 0x01 & {
-    ~ We're in compile mode. We compile code that compiles the word.
-    ~ TODO
-    dropstring 0 exit
-  } if
+  ~ interpreter-flags @ 0x01 & {
+  ~   ~ We're in compile mode. We compile code that compiles the word.
+  ~   ~ TODO
+  ~   dropstring 0 exit
+  ~ } if
   ~ (name as stack string)
 
-  ~ We're in immediate mode. We compile code that runs the word immediately.
+  ~   We're in immediate mode. We compile code that runs the word immediately.
+  ~ We check whether there's a label for the word; if there is, we output
+  ~ that. Otherwise we output code that looks it up and runs it.
   ~ TODO
+  value@
+  swap-transform-variables
+  ~   Looking these up in reverse order saves us some stack juggling. Does
+  ~ this help readability, or hurt it? Who can say...
+  L@' execute
+  L@' log-load-find-execution-token
+  L@' litstring
+  swap-transform-variables
+  offset-to-target-address-space ,     ~ litstring
+  3roll here @ swap packstring 8 packalign here !
+  offset-to-target-address-space ,     ~ log-load-find-execution-token
+  offset-to-target-address-space ,     ~ execute
+
 
   ~ There's no such thing as not finding the word, with this transform. So
   ~ we just exit.
@@ -749,8 +798,6 @@ allocate-transform-state s" transform-state" variable
 ~ an input string. It is directly analogous to "quit", in interpret.e, but is
 ~ far more complex.
 ~
-~ TODO TODO TODO this is just a stub, right now it's just a copy of the label
-~ transform
 ~ (output buffer start, output point, input string pointer
 ~  -- output buffer start, output point)
 : log-load-transform
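
The tradeoff described in the header comment of this patch (static references via labels, versus lookups by name when the generated routine runs) can be sketched with a toy model. This is an illustration under stated assumptions, not Evocation's implementation: the Python names `label_transform`, `log_find`, and `log_load_transform`, the list-of-tuples representation of the log, and the one-cell-per-entry addressing are all hypothetical simplifications.

```python
# Toy model only: names and data shapes here are illustrative assumptions,
# not Evocation's actual data structures.

def label_transform(words):
    """Resolve every reference to a fixed offset at compile time.

    `words` is a list of (name, [referenced names]) pairs. Each word gets one
    label; a reference to anything without a label cannot be resolved.
    """
    labels = {}
    offset = 0
    for name, refs in words:
        labels[name] = offset
        offset += 1 + len(refs)  # one cell for the codeword, one per reference
    compiled = [(name, [labels[ref] for ref in refs]) for name, refs in words]
    return labels, compiled


def log_find(log, base, name):
    """Search the log newest-first and return the entry's runtime address."""
    for index in range(len(log) - 1, -1, -1):
        if log[index][0] == name:
            return base + index
    raise KeyError(name)


def log_load_transform(words):
    """Return a routine that appends `words` to a log supplied at runtime."""
    def routine(log, base):
        for name, refs in words:
            # Lookups happen here, when the routine runs, so `base` does not
            # need to be known when the routine is built.
            addresses = [log_find(log, base, ref) for ref in refs]
            log.append((name, addresses))
    return routine


# The same routine can be replayed against logs at different base addresses.
load = log_load_transform([("double", ["dup", "+"])])
first_log = [("dup", []), ("+", [])]
second_log = [("dup", []), ("+", [])]
load(first_log, 0x1000)
load(second_log, 0x2000)
```

In this model, `label_transform` fails on any name it did not define itself, which mirrors the point that the label transform can't reference data that exists only at runtime. The routine built by `log_load_transform` pays for its lookups each time it runs, but works wherever the log ends up.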