From 1fdeeb54b127fcd600d16c48c3b1b90e91f2ca28 Mon Sep 17 00:00:00 2001 From: Irene Knapp Date: Thu, 7 May 2026 18:56:23 -0700 Subject: document labels.e; also clean up elf.e the documentation in labels.e is entirely new, synthesized from informal private discussions. this is also intended as a final pass to make sure all the comments and nuances in the ELF code from quine.asm are incorporated in elf.e. also this uses the new `L@'` and `L!'` facilities for terseness Force-Push: yes Change-Id: Ieabb2bb26f4b83260f0072dcdcd0950f9aa9fab2 --- labels.e | 128 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 121 insertions(+), 7 deletions(-) (limited to 'labels.e') diff --git a/labels.e b/labels.e index 6ece87e..be601a3 100644 --- a/labels.e +++ b/labels.e @@ -1,15 +1,63 @@ -~ current output point, string pointer +~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +~ ~~ Machine label facility ~~ +~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +~ Compilers and assemblers always have a need to resolve symbolic names for +~ parts of their code to numeric values. Usually, these symbolic names are +~ called labels. +~ +~ It's a surprisingly deep problem. The output itself can end up depending +~ on the actual values, for a variety of reasons. For example, the size of +~ a relative-jump instruction may vary depending on how far away its target +~ is, which changes the position of everything after. Or, there may be a size +~ limit on some particular segment and code may have to be entirely +~ reorganized to make things fit, splitting it into pieces or adding +~ trampolines. +~ +~ Often, this winds up being a hidden layer with immense complexity, +~ implemented as part of a linker, that even compiler maintainers find obscure +~ and confusing. +~ +~ It's Irenes' position that we should engage with complexity rather than +~ encysting it. So, in Evocation we take a different approach, heavily +~ inspired by the semantics of a tool called flatassembler. We provide +~ downstream code with the basic operations set-label and use-label which it +~ can combine in whatever order it wants. We also provide a harness, +~ label-loop, which takes an execution token for the downstream code +~ generator and runs it over and over, computing the mathematical fixed point +~ of the label assignments. +~ +~ On the first pass, label-loop guesses a value of zero for any label that's +~ used before it's set. On subsequent passes, each label starts with the value +~ it had on the previous pass. When we have a pass where labels end with the +~ same value they started with, we say they've converged, and we announce +~ success. If a hundred passes go by without convergence, we fail instead. +~ +~ Most of the time, the shorthand words L@' and L!' will be all you need to +~ use from your own code. + +~ TODO this should go somewhere else :) +~ (current output point, string pointer --) : pack-raw-string { unpack8 dup } { 3roll swap pack8 swap } while drop drop ; -: origin 0x08000000 ; - -~ Status is a bit field: +~ The labels data structure is a linked list of dictionary entries, with +~ the same header format as the main Evocation dictionary, but instead of +~ being executable, each holds two words of data: status bits, and a value, +~ in that order. There's no "docol" pointer or anything of the sort. +~ +~ Status is a bit field: ~ bit zero is whether it's been used ~ bit one is whether it's been defined ~ bit two is whether it was used before being defined ~ bit three is whether the guessed value wound up equaling the actual value +~ +~ Just as the Evocation dictionary uses the global variable "latest" as a +~ handle (a pointer to a pointer) beginning a linked list of entries, so the +~ label dictionary also uses a handle variable, named "labels". +~ TODO we should just do this in immediate mode, but right now the word +~ "variable" steps on the same scratch space that s" uses, so we can't. : init-labels 8 allocate s" labels" variable 0 s" labels" find entry-to-execution-token execute ! @@ -17,6 +65,7 @@ ~ This needs to happen now because otherwise the word "labels" won't exist. init-labels +~ This is analogous to word-heading, but prints label information. ~ (entry pointer --) : label-heading dup entry-to-name dup emitstring space @@ -24,6 +73,7 @@ init-labels entry-to-execution-token dup 8 + @ .hex64 space @ .hex8 newline ; +~ TODO this should go elsewhere ~ (dictionary handle) : oldest-entry-in dup @@ -31,6 +81,7 @@ init-labels dup 3roll = { drop 0 } if ; +~ TODO this should go elsewhere ~ (entry pointer, dictionary handle) : next-newer-entry-in @ @@ -38,21 +89,42 @@ init-labels { dup { 2dup @ != } if } { @ } while swap drop ; +~ This prints the headings for all the labels that have been created. Note +~ that labels that have been created once stay in the dictionary forever, even +~ if subsequent passes neither use nor define them. That's because the control +~ loop has no way to know if they're still important; it's up to downstream +~ code to decide that. Everything has been carefully set up so that disused +~ labels don't hurt anything. : list-labels labels oldest-entry-in { dup } { dup label-heading labels next-newer-entry-in } while drop ; +~ This creates a new label given a name for it, initializing its value and +~ status to zero and adding it to the dictionary. This is responsible for the +~ initial guess of zero on the first pass. +~ ~ (name string pointer -- ) : new-label labels create-in 0 , 0 , ; +~ These helpers take a label entry pointer and return pointers to the status +~ and value fields. : label-status entry-to-execution-token ; : label-value entry-to-execution-token 8 + ; +~ This looks up a label by name if it exists, or creates it if it doesn't. +~ Either way, it returns an entry pointer. It's named after the function +~ "intern" that many Lisp dialects have. +~ +~ (name string pointer -- entry pointer) : intern-label dup labels swap find-in dup { swap drop } { drop dup new-label labels swap find-in } if-else ; +~ This returns the value of a label, also doing all necessary status checks +~ and updates to keep track of the circumstances under which it was used. The +~ label loop relies on all read accesses going through this word. +~ ~ (label entry pointer -- label value) : use-label ~ If it hasn't been defined yet, mark it used-before-set. @@ -65,6 +137,10 @@ init-labels label-value @ ; ; +~ This overwrites the value of a label, also doing all necessary status +~ checks and jupdates to keep track of the cirumstances under which it was +~ set. The label loop relies on all write accesses going through this word. +~ ~ (new label value, label entry pointer --) : set-label ~ We always set the defined bit to true. We leave the other status bits @@ -84,10 +160,15 @@ init-labels label-value ! label-status ! ; +~ This is a convenience helper which downstream code can use to check how +~ many bytes it has output thus-far. ~ (output memory start, current output point ~ -- output memory start, current output point, offset) : current-offset 2dup swap - ; +~ This is a concise syntax for referencing the entry pointer of a label, +~ when you know its name statically. It reads a word of text input and calls +~ intern-label on it. : L' word value@ interpreter-flags @ 0x01 & @@ -98,6 +179,8 @@ init-labels { intern-label dropstring-with-result } if-else ; make-immediate +~ This is a concise syntax for calling use-label with a label whose name +~ you know statically. It performs L', then calls use-label. : L@' ' L' entry-to-execution-token execute interpreter-flags @ 1 & @@ -105,6 +188,8 @@ init-labels { use-label } if-else ; make-immediate +~ This is a concise syntax for calling set-label with a label whose name +~ you know statically. It performs L', then calls set-label. : L!' ' L' entry-to-execution-token execute interpreter-flags @ 1 & @@ -112,15 +197,20 @@ init-labels { set-label } if-else ; make-immediate - -~ For a label to have "converged", at least one of the following must be true: +~ This is an internal helper that label-loop uses to check if a specific +~ label entry has "converged", based on its status bits. Downstream code won't +~ need to call this directly. +~ +~ For a label to have "converged", at least one of the following must be +~ true: ~ ~ 1. The label must never have been used (bit zero clear); ~ 2. The label was both used and defined, but not used before it was defined ~ (bits zero and one set; bit two clear); ~ 3. The label was both used and defined, and the guessed value equalled the ~ actual value (bits zero, one, and three set). -~ (label entry pointer) +~ +~ (label entry pointer -- boolean) : check-label-converged label-status @ dup 0x01 & not swap @@ -128,17 +218,41 @@ init-labels 0x0b & 0x0b = || || ; +~ This is an internal helper that label-loop uses to check if the overall +~ label assignments have converged, based on their status bits. Downstream +~ code won't need to call this directly. +~ +~ This returns true if and only if check-label-converged returns true for +~ all the labels that have been created. +~ +~ (-- boolean) : check-labels-converged 1 labels @ { dup } { dup check-label-converged 3roll && swap @ } while drop ; +~ This is an internal helper that label-loop calls between passes, to update +~ the status bits and get ready for the next one. Downstream code won't need +~ to call this directly. +~ : reset-labels labels @ { dup } { dup label-status 0 swap ! @ } while drop ; +~ This is the top-level word that invokes the entire label system. All the +~ code generation happens inside it. +~ +~ The execution token it's passed should have the interface: +~ +~ (output start, current output pointer +~ -- output start, current output pointer) +~ +~ In general, this is the same interface that code generation words should +~ use to communicate with each other. For example, all the words in elf.e +~ use this interface. +~ ~ (execution token -- output start, output length) : label-loop 0 swap -- cgit 1.4.1