1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
|
~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ ~~ More system calls for Linux ~~
~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~
~ Everything that takes a struct is here, because that lets us define the
~ system call next to the allocation stuff, for ease of reference. There are
~ also higher-level facilities, later in the file.
~ Since the only way to know the struct layout is by reading the kernel
~ source, we put that definition here, as well.
~ (-- struct pointer)
: allocate-timespec 2 8 * allocate ;
~ (struct pointer -- field pointer)
: timespec-seconds ;
: timespec-nanoseconds 8 + ;
~ (nanoseconds -- result code)
: nanosleep
1000000000 /%
~ (nanoseconds, seconds)
allocate-timespec
dup timespec-seconds 3roll swap !
dup timespec-nanoseconds 3roll swap !
allocate-timespec
dup timespec-seconds 0 swap !
dup timespec-nanoseconds 0 swap !
dup 3unroll
35 ~ syscall number
syscall-2
drop drop ;
~ The old name was sigaction(), but per the glibc manpages, that was
~ replaced by rt_sigaction() quite some time ago, to allow larger signal set
~ bitmaps for the benefit of realtime signals. Both are still present; we use
~ rt_sigaction(). There are also two versions of rt_sigaction; grep
~ ODD_RT_SIGACTION in the kernel source for more details. Neither x86
~ architecture has the "odd" version of rt_sigaction(), so we use the regular
~ one, not to be confused with the "compat" one. Its definition is in
~ kernel/signal.c, in case you need to check the struct sizes and layouts.
~
~ The parameters are: An integer signal identifier; an optional pointer to a
~ struct describing the new action to bind (or unbind if NULL); an optional
~ pointer to another of the same struct to hold a copy of the old action; and
~ a 64-bit word which must describe the correct size of sigset_t, which is a
~ C type used for the mask field of that struct. Any other value for the size
~ field is an error.
~
~ Mandatory size fields start to make sense when you have a compatibility
~ situation this convoluted... Think of it as the caller promising they know
~ which version of the API they're calling. The unmarked version of
~ rt_sigaction() takes a 64-bit sigset_t, and the parameter wants the size in
~ bytes, so we use a value of 8.
~
~ Anyway, we aren't C and don't have POSIX naming obligations, so we just
~ call it sigaction.
~
~ (signal number, new action pointer, old action pointer -- result code)
: sys-sigaction
8 ~ size of sigset_t in bytes
13 ~ syscall number
syscall-4 ;
~ (new stack pointer, old stack pointer -- result code)
: sys-sigaltstack
131 ~ syscall number
syscall-2 ;
~ Since the only way to know the struct layout is by reading the kernel
~ source, we put that definition here, as well.
~
~ There are MANY versions of this struct; this is the appropriate one for
~ amd64.
~
~ (-- struct pointer)
: allocate-sigaction here @ 128 packalign here ! 4 8 * allocate ;
~ (struct pointer -- field pointer)
: sigaction-action ;
: sigaction-flags 8 + ;
: sigaction-restorer 2 8 * + ;
: sigaction-mask 3 8 * + ;
: allocate-sigaltstack here @ 128 packalign here ! 3 8 * allocate ;
: sigaltstack-pointer ;
: sigaltstack-flags 8 + ;
: sigaltstack-size 2 8 * + ;
~ High-level facilities
~ ~~~~~~~~~~~~~~~~~~~~~
~ It's possible to set up an alternate stack for signal handlers. We don't,
~ though, so it's possible this code has bitrotted. At the very least, it
~ should be more configurable than this.
: prepare-signal-stack
here @ 2048 packalign here !
1024 1024 * 4 * allocate
~ (stack address)
allocate-sigaltstack
~ (stack address, struct pointer)
2dup sigaltstack-pointer !
dup sigaltstack-flags 0 swap !
dup sigaltstack-size 1024 1024 * 32 * swap !
~ (stack address, struct pointer)
allocate-sigaltstack
sys-sigaltstack
drop ;
~ On amd64, and on no other architecture, the Linux kernel requires that
~ the language runtime use a "restorer" when binding signal handlers. This
~ is that restorer. The kernel wants a raw pointer that it can use as a
~ C-style return address on the C stack, which is our value stack. So, this
~ has to be an assembly word, it can't use docol. When the data structure is
~ populated by bind-signal, we'll dereference the codeword and pass that
~ value, but there's no chance to give it the usual callee address in rax, so
~ we can't use a Forth-style interpreter codeword, it has to be a
~ self-codeword.
~
~ The purpose of the trampoline is to be invoked by our signal handler. The
~ manner of that invocation MUST be returning into it with the "ret"
~ instruction; otherwise the C stack won't be right for cleanup to work
~ properly.
~
~ When invoked, the trampoline invokes the syscall "sigreturn", whose sole
~ purpose is to be called from this trampoline. That syscall won't return in
~ a conventional way, so we don't bother handling the scenario where it does.
~
~ According to commentary in the Go compiler's internals[1][2][3], gdb
~ recognizes the trampoline based on its exact byte values, since the intent
~ is only to be compatible with glibc. We intend to be our own debugger
~ anyway, so we don't worry about that. We're not seeking fame and we don't
~ have a corporate image to uphold, so that level of fragility and contortion
~ is just too much. The thing about compatibility constraints is knowing when
~ to work with them and when to walk away.
~
~ Experimentally, it is also possible to avoid using this trampoline by
~ faking it: set the "restorer" bit in the action flags, but pass a null
~ pointer as the restorer, then have the handler pop 8 bytes from the stack
~ and invoke sigreturn directly. The actual requirement seems to be that rsp
~ points at the saved state, at the time of invoking sigreturn. We're not
~ doing that, because signal handling is not intended to be
~ performance-critical and it feels like asking for trouble, but the
~ possibility is noted here against future use.
~
~ This doesn't have an interface definition comment, because it doesn't use
~ the Forth execution model.
~
~ [1] https://go.googlesource.com/go/+/refs/heads/master/src/runtime/sys_linux_amd64.s#472
~ [2] https://go.googlesource.com/go/+/refs/heads/master/src/runtime/os_linux.go#476
~ [3] https://go.googlesource.com/go/+/refs/heads/master/src/runtime/defs_linux_amd64.go#118
: signal-return-trampoline
[ here @
15 :rax mov-reg64-imm64 ~ sigreturn
syscall
here ! ] ;asm
~ This accepts an execution token. It creates a hidden word on the log which
~ wraps that execution token with necessary setup and teardown to run as a
~ Unix signal handler, and returns the execution token of the wrapper.
~
~ Specifically, on invocation, the wrapper ensures that rsi points to its
~ second half and rbp points to the top of the control stack; loads the target
~ execution token into rax; then indirectly calls it. This is the usual
~ interface of a normal call in the Forth execution model, so the wrapped word
~ can be based on docol, on a self-codeword, or on any other interpreter word
~ it wants. It can also freely call whatever Forth things it wants.
~
~ We don't have to do anything about rsp because the invariants for our use
~ of it as the value stack are a subset of the invariants for C's use of it as
~ its only stack. It's already working the way we need it to.
~
~ When the wrapped word returns, it uses the rsi the wrapper provided to do
~ so, which places control in the second half of the wrapper. This second half
~ simply executes a "ret" instruction, which is the necessary invocation of
~ the signal return trampoline (see above). This will transfer control back
~ to the kernel, and will ultimately result in Forth execution resuming where
~ it left off before the signal was delivered.
~
~ Crucially, the wrapper relies on the kernel preserving the value of rbp
~ that existed at the moment before the control transfer began. Signal
~ delivery is an UNCONTROLLED control transfer, meaning that we as the
~ language runtime do not have an opportunity to execute any cleanup before it
~ happens. If it were a controlled transfer, we would be able to save rbp to
~ a global variable somewhere, and restore it in the wrapper. It's not, so we
~ don't have that chance.
~
~ Notionally we could freshly allocate a new control stack somewhere else,
~ and set rbp to point to it, but it would be challenging to do that without
~ relying on the control stack, and inefficient to execute, and the call for
~ now is that that's not worth it.
~
~ The only situation in which this limitation will become a practical
~ concern is if, at the time of signal delivery, something outside the Forth
~ execution model is happening. In that case, the wrapper will likely crash.
~
~ As a long term strategy, the way to mitigate this would be to make sure
~ that all non-signal transfers from within the Forth execution model to
~ outside it are controlled, and that they save global state that can be
~ reconstructed here. For now, we leave this as future work.
~
~ (execution token -- execution token)
: wrap-signal-handler
~ We generate a word entry for the wrapper, and hide it. Since it's
~ hidden, the name doesn't have to be unique. This keeps the log clean, so
~ that all the space on it will always be attributable to some specific
~ word. Remember kids, keeping the log clean is everyone's responsibility!
s" signal-handler-wrapper" create make-hidden
~ This self-codeword will be consumed by bind-signal.
self-codeword
here @ dup
~ (inner execution token, saved location, output point)
~ It's our responsibility as a caller to set rsi to point to the address
~ of an execution token, which will pick up where we left off. That token
~ will be our own second half, whose address we don't yet know, so we
~ output a placeholder opcode here and overwrite it once we do. That's why
~ we've saved the current location.
0 :rsi mov-reg64-imm64
~ We also need to make sure that rbp points to an area that can be treated
~ as the top of the control stack (there's no need to ever unwind past it,
~ so it doesn't have to be the "real" one). Fortunately, it comes to us
~ already valid and we don't have to do anything about that. Plus it even
~ does happen to be the real one, which will let stack tracing code run in
~ a handler, and we do care about that.
~ We also need to set DF = 0, since that's also part of our ABI.
cld
~ Compare this snippet to "execute" in core.e. Instead of taking rax from
~ the stack, we set a hardcoded value picked at the time we generate the
~ wrapper. We then do the same indirect jump via the codeword it points to,
~ which allows the codeword's implementation to take advantage of rax
~ pointing to the callee; that's the property docol cares about.
3roll :rax mov-reg64-imm64
:rax jmp-abs-indirect-reg64
~ (saved location, output point)
8 packalign
here !
~ (saved location)
~ Now we have our second half, which has another codeword that rsi will
~ point to for our callee's benefit. This half runs after the wrapped word,
~ and has the responsibility of cleaning up and returning control to the
~ kernel, which it does by returning to the restorer trampoline. Yes, this
~ is a trampoline which passes control to another trampoline.
~
~ Although we run under the log-load transform, we won't ever actually be
~ invoked until at least log-load time, if not ultimate runtime. Both of
~ those are in the target address space. So there's no address translation
~ going on behind our back. Nonetheless, we avoid directly outputting any
~ address except what we get via self-codeword, which would be recommended
~ practice under the transforms. Yeah, it's a little convoluted, perhaps
~ unnecessarily so.
here @
self-codeword
@ 8 -
~ (saved location, second half execution token)
~ Something subtle here: That above "codeword" was actually a word
~ pointer. See, because we're pretending the wrapper is a Forth word, even
~ though we're writing it in assembly, so "returning" to it means invoking
~ the next word pointer in the word pointer array that is its compiled form.
~ Instead of creating a separate memory area though, we just put the pointer
~ target right here, as another codeword...
self-codeword
~ Now we treat the saved location as an output point, and re-output the
~ mov instruction that we stubbed out above. Because our assembler words
~ always output a specific, exact form of the instruction, we know it will
~ take up the same number of bytes.
:rsi mov-reg64-imm64
drop
~ Having done that, we can get on to the body of our second half. Happily,
~ it's quite short.
here @
ret
8 packalign
here !
~ Our caller wants an execution token that invokes all this. Since we
~ used "create" above, that's easy to get.
latest @ entry-to-execution-token ;
~ This accepts an execution token and a Unix signal number, and binds the
~ token to be the handler for the signal. It also does other necessary setup,
~ including picking appropriate flags for the binding and attaching the
~ return trampoline (see above).
~
~ Typically, you will want this execution token to be one returned by
~ wrap-signal-handler. Doing this will allow the handler to use the Forth
~ execution model in any way it wants, including calling both docol words and
~ assembly words, and working with both the control and value stacks at will.
~
~ There is an important limitation of wrap-signal-handler, described in more
~ detail above: Its wrapper only functions correctly when Forth code was
~ running at the time of the control transfer. For example, if Forth had
~ called into C, and then that C were interrupted by a signal, the signal
~ handler would have no way of finding the top of the control stack.
~
~ If your program involves many callbacks back-and-forth between C and
~ Forth, you may wish to forego the use of the wrapper and provide an
~ execution token meant to be invoked directly by the kernel. In this case,
~ bear in mind that its execution must not use the control stack - that is,
~ it must not rely on having been given sensible values of rsi or rbp. This
~ means it can't call other Forth words (unless it does something about that
~ on its own).
~
~ Regardless of whether the execution token is a copy of the wrapper or not,
~ it must be an assembly word, not a docol word. The kernel wants a raw
~ pointer that it can simulate a C-style call to, so we dereference the
~ codeword and pass that value, just as we do with the return trampoline. Also
~ as with the return trampoline, there is no way to pass the callee in rax,
~ which is the usual interface docol and other interpreter words expect. So,
~ it needs to be a self-codeword.
~
~ (execution token, signal number --)
: bind-signal
allocate-sigaction
dup sigaction-action 4 roll @ swap !
dup sigaction-mask 0 swap !
dup sigaction-flags 0x04000000 swap !
dup sigaction-restorer
' signal-return-trampoline entry-to-execution-token @ swap !
~ (signal number, struct pointer)
allocate-sigaction
sys-sigaction drop ;
: handle-crash ." CRASH" newline list-callers 1 sys-exit ;
: install-crash-handler
' handle-crash entry-to-execution-token wrap-signal-handler
11 bind-signal ;
~ There are scenarios where someone might want to disable this, for example
~ if calling back and forth between C and Evocation, but for now we always
~ enable it.
install-crash-handler
|