Memory Safe Context Switching
Support for ucontext APIs is new since release 0.680. If you want to play with setcontext, getcontext, makecontext, and swapcontext then you have to build from source.
This document describes how Fil-C supports longjmp, setjmp, setcontext, getcontext, makecontext, and swapcontext in a totally memory-safe way. In particular, no misuse of those APIs in Fil-C can lead to stack corruption or any other violation of Fil-C's capability model.
These APIs are widely used:
- longjmp and setjmp are used in C programs to implement exception handling. It's especially common to use them to implement exceptions "thrown" from signal handlers.
- getcontext, setcontext, makecontext, and swapcontext (aka the ucontext APIs) are used to implement coroutines and fibers. For example, Boost uses ucontext as part of its fiber implementation.
The ucontext APIs are less commonly used than longjmp/setjmp and some OSes (like Darwin) have deprecated them. However, they remain well supported in glibc.
Implementing these APIs in a way that preserves memory safety is hard since their misuse can result in restoring a dangling stack. For example, you could either setjmp or getcontext within some function, and then do any of the following things:
Return from that function. At this point, the context that was saved will attempt to restore a stack frame that no longer exists.
Exit from the thread. At this point, the context that was saved will attempt to restore execution on a stack that has been freed.
Even more friendly APIs like makecontext and swapcontext can be straightforwardly misused:
- You can use makecontext to create a context that points to some stack, then free that stack, and then either swapcontext or setcontext to that context. In Yolo-C, this will result in running on a dangling stack. Fil-C makes this not an error.
- You can call swapcontext with the second argument being the context that is currently executing. This might happen if you confuse the first and second arguments. In Yolo-C, in the best case, this will behave like a longjmp; in the worst case, it will result in executing on a dangling stack. In Fil-C, this is a safety error that panics your program.
In Yolo-C, execution on a dangling stack results in the most confusing kinds of crashes, since the debugger won't even be able to print a stack trace! Worse, if the program has subtle bugs in its handling of contexts, then an attacker could exploit those bugs to cause the program to do whatever the attacker likes. In Fil-C, execution on a dangling stack is not possible: all such cases are either panics at the point where you misused longjmp or one of the ucontext APIs, or they are reliably legal execution because of how Fil-C manages stacks.
Fil-C implements setjmp/longjmp and the ucontext APIs quite differently.
Making setjmp/longjmp Memory Safe
There is an impressive amount of depth to the depravity of setjmp. Before going into the details of how Fil-C implements setjmp/longjmp, we need to discuss exactly what makes this function so amazingly evil.
setjmp saves the context as it was at the moment when it was called so that when longjmp is called later, setjmp will return a second time. It is the fact that it returns twice that makes it so vile, and so we need to understand the implications precisely.
An Example
Consider this simple program:
#include <setjmp.h>
#include <stdio.h>
int main(int argc, char** argv)
{
volatile int x = 42;
jmp_buf jb;
if (setjmp(jb)) {
printf("x = %d\n", x);
return 0;
}
x = 666;
longjmp(jb, 1);
printf("Should not get here.\n");
return 1;
}
This program prints:
x = 666
And then exits. The flow is:
- On the first call to setjmp, it returns 0 and saves its caller's context in jb.
- Then we set x to 666 and longjmp to jb with the value 1.
- setjmp returns 1, so we printf and exit.
Note that we have to mark x as volatile for the program to reliably print 666. Otherwise, the compiler is allowed to optimize the access to x and have it return 42 instead. This might happen in the following ways:
- The compiler could constant fold x to 42. This will happen in the example if we remove volatile and use any optimization level above -O0. Then x = 42 gets printed.
- Say that constant folding doesn't happen, maybe because we insert an asm("" : "+r"(x)) right after the definition of x. In that case, the compiler could register-allocate x in a callee-save register, in which case the register ends up saved by setjmp. This also leads to x = 42 being printed.
- Say that we experience register pressure for some reason, and x doesn't make it into a callee-save register, but instead gets spilled. At any optimization level above -O0, the compiler will split x into two variables: one for x = 42 and one for x = 666, and the printf will reference the first one (since x = 42 dominates the printf). Those two variables will almost always get separate spill slots. Hence, when we come out of the setjmp the second time, reading x will still give 42.
Three things to reflect upon:
- To get the property that x's value is observed to be 666 in the printf, we need to make sure that the compiler treats x as a stack allocation rather than a variable. Using volatile achieves this. Also, passing a pointer to x to anywhere is likely to accomplish this.
- Spill slots are not the same as stack allocations. If a variable is stack-allocated, then it will get one stack allocation. If a variable is spilled, it may get multiple spills (often, a separate spill per assignment).
- The compiler is allowed to analyze the lifetime of spill slots and stack allocations. It's allowed to reuse spill slots. How does the compiler know that the x = 42 spill slot should stay alive until the longjmp happens? How come it won't get reused, resulting in x having either 666 or any random garbage when we fall out of the setjmp a second time?
Here's a more diabolical version of the example that triggers spilling of x to two different spill slots (one for 42 and one for 666) in gcc, clang, and filcc.
#include <setjmp.h>
#include <stdio.h>
int main(int argc, char** argv)
{
int x = 42;
asm volatile("" : "+r"(x));
jmp_buf jb;
int a = 1, b = 2, c = 3, d = 4, e = 5, f = 6, g = 7, h = 9, i = 10;
/* Force some spilling */
asm volatile("" : "+r"(a), "+r"(b), "+r"(d), "+r"(e), "+r"(f), "+r"(g), "+r"(h), "+r"(i));
if (setjmp(jb)) {
asm volatile("" : "+r"(a), "+r"(b), "+r"(d), "+r"(e), "+r"(f), "+r"(g), "+r"(h), "+r"(i));
printf("x = %d\n", x);
return 0;
}
x = 666;
void (*jump)(jmp_buf, int) = longjmp;
asm volatile("" : "+r"(x));
asm volatile("" : "+r"(jump), "+r"(a), "+r"(b), "+r"(d), "+r"(e), "+r"(f), "+r"(g), "+r"(h), "+r"(i));
jump(jb, 1);
asm volatile("" : "+r"(jump), "+r"(a), "+r"(b), "+r"(d), "+r"(e), "+r"(f), "+r"(g), "+r"(h), "+r"(i));
asm volatile("" : "+r"(x));
printf("Should not get here.\n");
return 1;
}
This program will print x = 42 even though x is not constant folded or register-allocated.
Note that all of the examples so far work in Fil-C. Even the inline assembly that we're using to obfuscate variable values works in Fil-C, and has the desired effect.
What Is Even Happening
Let's take a look at how simple setjmp is by looking at the musl implementation on x86_64:
__setjmp:
_setjmp:
setjmp:
mov %rbx,(%rdi) /* rdi is jmp_buf, move registers onto it */
mov %rbp,8(%rdi)
mov %r12,16(%rdi)
mov %r13,24(%rdi)
mov %r14,32(%rdi)
mov %r15,40(%rdi)
lea 8(%rsp),%rdx /* this is our rsp WITHOUT current ret addr */
mov %rdx,48(%rdi)
mov (%rsp),%rdx /* save return addr ptr for new rip */
mov %rdx,56(%rdi)
xor %eax,%eax /* always return 0 */
ret
This is only saving the callee-save registers, plus the stack pointer and instruction pointer as they were at the callsite. It's not saving the stack itself.
Later, when longjmp is called, the register state is restored with only one difference: %eax (the return value register) will get the argument passed to longjmp.
Hence, the most basic safety issue with setjmp is that if we call it and then return from the function that had called it, the context saved by setjmp is not valid to longjmp to. Jumping to such a context will result in a torn machine state:
- The callee-save registers, stack pointer, and instruction pointer will be exactly as they had been at the time that setjmp had been called.
- The stack contents - in particular, the frame that the stack pointer is pointing at - will be whatever they were at the time that longjmp had been called.
longjmp is only safe if it's called at a time when the stack frame used by setjmp could not have possibly been overwritten, since that is the only way to guarantee that the register state restored by longjmp matches the stack frame that the stack pointer points to. The easiest way to guarantee this is to ensure that longjmp is only called from within the function that called setjmp, or from some function called by the function that called setjmp (transitively).
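As a minimal sketch of the safe pattern just described, here is a setjmp/longjmp "exception" where the longjmp always comes from a function called (transitively) by the function that called setjmp. The names handler, fail, digit_value, and try_sum_digits and the error codes are hypothetical, made up for this illustration. setjmp is used directly as the controlling expression of a switch, which is one of the contexts the C standard permits.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical error codes for this sketch. */
enum { PARSE_OK = 0, ERR_NOT_DIGIT = 2, ERR_UNKNOWN = 3 };

/* Hypothetical name: the currently active "catch" point. */
static jmp_buf handler;

/* Only ever called below try_sum_digits, so the setjmp frame is still live. */
static void fail(int code)
{
    longjmp(handler, code);
}

static int digit_value(char c)
{
    if (c < '0' || c > '9')
        fail(ERR_NOT_DIGIT);    /* does not return */
    return c - '0';
}

/* Sums the decimal digits of s into *out. Returns PARSE_OK or an error code. */
int try_sum_digits(const char* s, int* out)
{
    assert(s && out);
    switch (setjmp(handler)) {
    case 0:
        break;                  /* first return: go do the work */
    case ERR_NOT_DIGIT:
        return ERR_NOT_DIGIT;   /* second return: arrived here via longjmp */
    default:
        return ERR_UNKNOWN;
    }
    int sum = 0;
    for (; *s; s++)
        sum += digit_value(*s);
    *out = sum;
    return PARSE_OK;
}
```

Calling try_sum_digits("1234", &sum) returns PARSE_OK and stores 10, while try_sum_digits("12x4", &sum) returns ERR_NOT_DIGIT and leaves sum untouched. Note that the code after the longjmp never reads a local that was modified between the setjmp and the longjmp, so it does not depend on the spilling behavior discussed earlier.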
But that's not all!
The compiler has to know that setjmp returns twice to ensure that spill slots are not reused unsoundly. In fact, compilers detect calls to setjmp and treat the functions that call it specially by disabling any optimization that would lead to a reuse of spill slots. This is surfaced a bit to compiler users with the returns_twice attribute.
Let's consider our diabolical example, but with the setjmp call obfuscated:
#include <setjmp.h>
#include <stdio.h>
int main(int argc, char** argv)
{
int x = 42;
asm volatile("" : "+r"(x));
jmp_buf jb;
int a = 1, b = 2, c = 3, d = 4, e = 5, f = 6, g = 7, h = 9, i = 10;
asm volatile("" : "+r"(a), "+r"(b), "+r"(d), "+r"(e), "+r"(f), "+r"(g), "+r"(h), "+r"(i));
int (*setjump)(jmp_buf) = setjmp;
asm volatile("" : "+r"(setjump));
if (setjump(jb)) {
asm volatile("" : "+r"(a), "+r"(b), "+r"(d), "+r"(e), "+r"(f), "+r"(g), "+r"(h), "+r"(i));
printf("x = %d\n", x);
return 0;
}
x = 666;
void (*jump)(jmp_buf, int) = longjmp;
asm volatile("" : "+r"(x));
asm volatile("" : "+r"(jump), "+r"(a), "+r"(b), "+r"(d), "+r"(e), "+r"(f), "+r"(g), "+r"(h), "+r"(i));
jump(jb, 1);
asm volatile("" : "+r"(jump), "+r"(a), "+r"(b), "+r"(d), "+r"(e), "+r"(f), "+r"(g), "+r"(h), "+r"(i));
asm volatile("" : "+r"(x));
printf("Should not get here.\n");
return 1;
}
Now, the results I see are:
- With gcc version 11.4.0, the program prints x = 666.
- With clang version 14.0.0, the program prints garbage like x = -291233296.
- filcc refuses to compile the program. In fact, it ICEs. (Fil-C has a longstanding bug that it emits compile-time diagnostics with internal compiler errors instead of printing something useful.)
The unsafe thing that is happening (and that Fil-C prevents by refusing to compile this program) is that if the compiler compiles a call to setjmp without knowing that it's calling setjmp then the spill slot used by x = 42 might get reused by some other variable in the code after the if (setjmp) { ... }.
Putting It All Together
Fil-C makes longjmp/setjmp memory safe by ensuring that:
- The jmp_buf just contains a pointer to an opaque zjmp_buf object. The contents of this object cannot be accessed from Fil-C. Only the Fil-C runtime can manipulate it. This works because most code never inspects the innards of jmp_buf (and code that does will not work in Fil-C). Note that if you do overwrite jmp_buf, then you'll most likely cause a Fil-C panic when you try to longjmp, because the internal implementation of longjmp will check that it can load a zjmp_buf from the jmp_buf. There is no way to spoof a zjmp_buf from Fil-C.
- It's only possible to mention the setjmp symbol by calling it directly; anything else will ICE the compiler. (And in the future, it might even cause the compiler to emit a proper diagnostic.) This ensures that the compiler's machinery for recognizing setjmp (and inhibiting spill slot reuse) always works.
- The setjmp call is compiled to allocate a new zjmp_buf opaque object and to register the zjmp_buf with the stack frame. Each stack frame can tell you the weak set of zjmp_bufs that are valid jump targets for that frame. (It's weak because if the zjmp_bufs are otherwise unreachable, they are removed from that set.)
- longjmp panics unless the frame it is called from is a frame that considers the zjmp_buf to be valid, or a descendant (transitive callee) of such a frame. This is accomplished by walking the stack and asking each stack frame: do you have my zjmp_buf in your set? Note that repeated setjmps on the same jmp_buf create new zjmp_bufs, and the zjmp_buf is immutable. Hence, membership in the weak set really means that the frame that called setjmp is still live and that the longjmp call comes from that frame or one of its descendants.
- The Fil-C implementations of longjmp and setjmp save and restore a lot of internal runtime state, including the state needed to track GC roots. In particular, zjmp_buf holds a copy of the GC roots of the frame at the time that setjmp was called. So long as the zjmp_buf is live, we'll continue to mark those roots.
The sketchiest part of this is that the Fil-C runtime strongly assumes that if a pointer variable was materialized as an SSA value in LLVM IR at the time that the FilPizlonator runs, then longjmping restores that value to the state it had at the time of the setjmp, so long as the setjmp is flagged as returning twice. I have so far confirmed that this is the case, but it's extremely confusing - if there was a bug in my longjmp/setjmp, this is where it would be, and it would manifest as follows: after the longjmp, the GC's view of the stack frame's roots is as if all of the local pointers were restored to their values before the setjmp, but some pointer's value was not restored and has a new value from after the setjmp call but before the longjmp call. Note that you cannot trigger the bug with something like making a pointer volatile, since that causes the pointer to be a stack allocation, not an SSA value - and in that case, my transformation does the right thing (the "pointer" really ends up being an object in the heap, and the SSA value is a pointer to that pointer box).
Assuming my analysis of this hideous abomination is right, these rules are sufficient to allow almost all safe uses of longjmp/setjmp while prohibiting any possible use that corrupts the stack or causes any possible violation of the Fil-C capability model.
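To make the stack-walk rule concrete, here is a toy model of the check that longjmp performs. This is only a sketch: toy_frame, toy_register, and toy_find_target are hypothetical names, the fixed-size array stands in for the weak set, and none of this reflects the real layout of Fil-C's frames or zjmp_buf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_TARGETS 4

/* Toy stand-in for a stack frame; NOT the Fil-C runtime's real layout. */
struct toy_frame {
    struct toy_frame* caller;                /* next frame up the stack, or NULL */
    const void* targets[TOY_MAX_TARGETS];    /* buffers registered by setjmp here */
    size_t num_targets;
};

/* Model of setjmp: register a fresh buffer identity with the calling frame. */
static bool toy_register(struct toy_frame* frame, const void* buf)
{
    assert(frame && buf);
    if (frame->num_targets == TOY_MAX_TARGETS)
        return false;
    frame->targets[frame->num_targets++] = buf;
    return true;
}

/* Model of the longjmp check: walk from the calling frame toward the root and
   ask each frame whether it lists buf. NULL models "panic". */
static struct toy_frame* toy_find_target(struct toy_frame* from, const void* buf)
{
    for (struct toy_frame* f = from; f; f = f->caller) {
        for (size_t i = 0; i < f->num_targets; i++) {
            if (f->targets[i] == buf)
                return f;
        }
    }
    return NULL;
}
```

In this model, a longjmp issued from a callee of the registering frame finds its target, while one issued from a caller of that frame (that is, after the setjmp frame has returned) walks off the top of the stack and gets NULL.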
Making ucontext Memory Safe
It's almost possible to use setjmp/longjmp to implement fibers in Yolo-C. But two problems arise if we try to do this:
- Fibers need a context switch that simultaneously restores some state (the longjmp) while saving the current state (the setjmp). It's extremely confusing to write this in terms of setjmp/longjmp.
- It's not obvious how to bootstrap when we start a new fiber. We want to allocate a stack and produce a jmp_buf that we can longjmp to so that we start running the main function of the newly created fiber.
It turns out you can do this with the sigaltstack hack, but as brilliant as this hack is, folks usually prefer to use the much nicer ucontext APIs:
getcontext snapshots the current state into a context. This is like setjmp, though it's rarely used that way; it's mostly used for prepopulating a ucontext_t before calling makecontext.
setcontext is a one-way context switch to a context (it does not save the state before switching). This is mostly just used for exiting a fiber.
makecontext creates a new context that is bootstrapped to call some main function. In a bizarre twist of history, this function's contract requires a prior call to getcontext even though it mostly overwrites all of the state snapshotted by getcontext. Most modern uses of getcontext are just due to this twist.
swapcontext is a context switch that simultaneously saves the current context to one ucontext_t and switches to another ucontext_t.
Here's an example of how to use this API from the Linux man pages (I made some small changes to reduce its size):
#include <ucontext.h>
#include <stdio.h>
#include <stdlib.h>
static ucontext_t uctx_main, uctx_func1, uctx_func2;
static void func1(void)
{
printf("func1: swapcontext(&uctx_func1, &uctx_func2)\n");
swapcontext(&uctx_func1, &uctx_func2);
printf("func1: returning\n");
}
static void func2(void)
{
printf("func2: swapcontext(&uctx_func2, &uctx_func1)\n");
swapcontext(&uctx_func2, &uctx_func1);
printf("func2: returning\n");
}
int main()
{
char func1_stack[16384];
char func2_stack[16384];
getcontext(&uctx_func1);
uctx_func1.uc_stack.ss_sp = func1_stack;
uctx_func1.uc_stack.ss_size = sizeof(func1_stack);
uctx_func1.uc_link = &uctx_main;
makecontext(&uctx_func1, func1, 0);
getcontext(&uctx_func2);
uctx_func2.uc_stack.ss_sp = func2_stack;
uctx_func2.uc_stack.ss_size = sizeof(func2_stack);
uctx_func2.uc_link = &uctx_func1;
makecontext(&uctx_func2, func2, 0);
printf("main: swapcontext(&uctx_main, &uctx_func2)\n");
swapcontext(&uctx_main, &uctx_func2);
printf("main: exiting\n");
return 0;
}
This program prints:
main: swapcontext(&uctx_main, &uctx_func2)
func2: swapcontext(&uctx_func2, &uctx_func1)
func1: swapcontext(&uctx_func1, &uctx_func2)
func2: returning
func1: returning
main: exiting
Some notes:
- We aren't using setcontext. Lots of users of this API never use setcontext. setcontext is only useful if you're using getcontext/setcontext as replacements for longjmp/setjmp (which I've never seen any code do in the wild), or if you're doing a final context switch away from a context, and so you don't need to save your context.
- getcontext only serves one purpose: to initialize those parts of the ucontext_t that makecontext doesn't initialize. The contract here is bizarre. If we don't call getcontext, then we would have to somehow initialize all of the parts of ucontext_t that we're not supposed to know about (fields that are defined in the system header, but that aren't part of the published API). Also, we'd have to remember to initialize uc_sigmask (getcontext initializes it to the current sigmask). On Linux/X86_64, the other fields that getcontext initializes are mostly to do with FPU exception state. The fact that this contract is so opaque is going to help us make this API memory-safe.
- Once we start using func1_stack and func2_stack in the contexts, we aren't supposed to rely on their contents anymore. We could read it, but then we aren't guaranteed anything about what is in there. And if we write to it, then all bets are off. In this example, we aren't giving these any guard pages, so these stacks are strictly less secure than the normal kind of stack that a thread gets.
- It's purely convention that the first argument of swapcontext is the context that we were running at the time we called it. We could have passed any context that is safe to overwrite. We can see this in the example with the initial swapcontext(&uctx_main, &uctx_func2) call. Here, uctx_main has not been initialized prior to that call, and we're picking it as the context to hold the main thread's context. This is fine.
- Note the use of uc_link - this is the context to setcontext to when a context's main function returns.
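For completeness, here is a small sketch of the one setcontext use mentioned in the notes: a final, one-way switch away from a fiber that has nothing left to save. The names main_ctx, fiber_ctx, fiber_main, run_fiber_once, and the 64 KiB stack size are assumptions made for this illustration. The stack is heap-allocated here rather than a local array, which is another common way of using the API.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

enum { TOY_STACK_SIZE = 64 * 1024 };   /* assumed size, plenty for this fiber */

static ucontext_t main_ctx, fiber_ctx;
static int steps;

static void fiber_main(void)
{
    steps++;
    /* Final switch away: this fiber is never resumed, so there is no need to
       save its state the way swapcontext would. */
    setcontext(&main_ctx);
    steps = -1;                         /* not reached */
}

/* Runs fiber_main once on its own stack. Returns the step count, or -1 on
   allocation failure. */
int run_fiber_once(void)
{
    void* stack = malloc(TOY_STACK_SIZE);
    if (!stack)
        return -1;
    steps = 0;

    getcontext(&fiber_ctx);             /* only used to initialize fiber_ctx */
    fiber_ctx.uc_stack.ss_sp = stack;
    fiber_ctx.uc_stack.ss_size = TOY_STACK_SIZE;
    fiber_ctx.uc_link = &main_ctx;      /* fallback if fiber_main ever returned */
    makecontext(&fiber_ctx, fiber_main, 0);

    /* Saves the caller's state in main_ctx, then starts the fiber. */
    swapcontext(&main_ctx, &fiber_ctx);

    assert(steps == 1);                 /* we only get here via setcontext */
    free(stack);                        /* the fiber will never run again */
    return steps;
}
```

The swapcontext in run_fiber_once returns exactly once, when fiber_main does its setcontext(&main_ctx), so run_fiber_once() returns 1.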
For making this API memory-safe, we'll focus on the idiom above where getcontext is only for initializing ucontext before a call to makecontext.
Laws For Safe ucontext
Let's enumerate the laws we will enforce for ucontext. Note that these laws are more restrictive than what is strictly necessary to make ucontext memory-safe, but I wanted to start with the most conservative possible implementation that is useful to real users of the API.
Opaque state. We'll repeat the trick we used to make jmp_buf safe: inside the ucontext_t, we'll have a pointer to an opaque zfiber_context object. Fil-C code cannot access zfiber_context except by calling its API in pizlonated_runtime.h.
ss_sp doesn't matter. The implementation completely ignores the stack you provided in the ss_sp field. Internally, zfiber_context will allocate a stack that you cannot see. It will use your ss_size as the size of that stack (but it will add some padding that's necessary for Fil-C's stack overflow handling to work). The stack is allocated when you call makecontext.
zfiber_context has a restricted state machine. The states are:
- uninitialized - this is the initial state after a zfiber_context is allocated. It's only legal to call zfiber_context_getcontext or use the context as the from argument (the first argument) to zfiber_context_swapcontext.
- after_getcontext - this is the state after getcontext returns. In this state, it's only legal to call zfiber_context_makecontext or use the context as the from argument to zfiber_context_swapcontext.
- runnable - this state is the result of calling makecontext or of passing the context as the from argument to swapcontext. When in this state, it's only legal to call setcontext or use the context as the to argument to swapcontext.
- running - this state happens after you start running the context using setcontext or swapcontext. It's only legal to pass the context as the from argument to swapcontext provided that this is the currently running context on the calling thread (each thread tracks the currently running context). Note that some context could be running but we switch away from it either using setcontext or swapcontext with the from argument being some other uninitialized context; in that case the context will remain in the running state forever, since the only thing you can do with a running context is swap from it and that only works if the running context is the currently running one according to the calling thread.
This state machine forbids using ucontext for longjmp/setjmp because you cannot switch to an after_getcontext context. You can only switch to a runnable context, and the only way to get one is to either makecontext a new one or to save the current context using swapcontext.
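The state machine above can be written down as a small table of legal operations. The following is a sketch under my own naming (toy_state, toy_op, toy_op_is_legal, toy_next_state are all hypothetical); it mirrors the rules listed above and is not the actual zfiber_context code.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_state {
    TOY_UNINITIALIZED,
    TOY_AFTER_GETCONTEXT,
    TOY_RUNNABLE,
    TOY_RUNNING
};

enum toy_op {
    TOY_OP_GETCONTEXT,
    TOY_OP_MAKECONTEXT,
    TOY_OP_SETCONTEXT_TO,   /* context is the target of setcontext */
    TOY_OP_SWAP_FROM,       /* context is the first argument of swapcontext */
    TOY_OP_SWAP_TO          /* context is the second argument of swapcontext */
};

/* Returns true if op is allowed on a context in the given state. is_current
   says whether the context is the one currently running on the calling thread.
   A false result models a Fil-C panic. */
static bool toy_op_is_legal(enum toy_state state, enum toy_op op, bool is_current)
{
    switch (state) {
    case TOY_UNINITIALIZED:
        return op == TOY_OP_GETCONTEXT || op == TOY_OP_SWAP_FROM;
    case TOY_AFTER_GETCONTEXT:
        return op == TOY_OP_MAKECONTEXT || op == TOY_OP_SWAP_FROM;
    case TOY_RUNNABLE:
        return op == TOY_OP_SETCONTEXT_TO || op == TOY_OP_SWAP_TO;
    case TOY_RUNNING:
        return op == TOY_OP_SWAP_FROM && is_current;
    }
    return false;
}

/* State a context ends up in after a legal op. */
static enum toy_state toy_next_state(enum toy_op op)
{
    switch (op) {
    case TOY_OP_GETCONTEXT:
        return TOY_AFTER_GETCONTEXT;
    case TOY_OP_MAKECONTEXT:
    case TOY_OP_SWAP_FROM:
        return TOY_RUNNABLE;
    case TOY_OP_SETCONTEXT_TO:
    case TOY_OP_SWAP_TO:
        return TOY_RUNNING;
    }
    assert(!"unknown op");
    return TOY_UNINITIALIZED;
}
```

For example, toy_op_is_legal(TOY_AFTER_GETCONTEXT, TOY_OP_SETCONTEXT_TO, false) is false, which is the longjmp/setjmp-style use that the state machine rules out, and toy_op_is_legal(TOY_RUNNING, TOY_OP_SWAP_TO, true) is false, which is the swapped-arguments mistake from the introduction.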
Thread affinity. The Fil-C ABI threads the filc_thread* through every function call, and the compiler is allowed to expect that no function call can ever change the filc_thread* that we're running on. This means that we cannot allow ucontext to cause a stack that had run on one thread to run on any other thread. Hence, zfiber_context tracks which zthread it was created on and disallows any calls into any zfiber_context API from other threads.
GC Integration
When the GC asks the zfiber_context object to mark its outgoing pointers during the mark phase and the context is runnable, the zfiber_context has to do the equivalent of what threads do when a stack scan is requested during a soft handshake.
But what if the following happens during a single GC mark phase:
- The GC marks a runnable zfiber_context and puts it on the mark stack.
- The GC pops the zfiber_context from the mark stack and marks its outgoing pointers. Let's say that the context is still runnable. So, we scan its stack.
- Mutator switches to that zfiber_context using either setcontext or swapcontext. Now, the zfiber_context is running, so its stack is not visible to the GC. This is fine, since the stack is owned by a thread, and the GC uses grey stacks; i.e. it will always rescan the stacks before declaring termination.
- Mutator switches away from that zfiber_context using swapcontext, making the zfiber_context runnable again.
Now we have a problem! The GC was expecting that whatever was on the stack doesn't need to be actively tracked by any barriers because we'll just rescan the grey stacks before termination. But now, that stack is no longer owned by any thread; instead it's owned by a runnable zfiber_context. Worse, that zfiber_context is black: we not only set its mark bit but we already popped it off the mark stack and marked its outgoing pointers - so the GC will not visit it again!
The way we solve this is by tracking grey zfiber_contexts. When we swapcontext from a context during marking, if the context is not already grey, then we add it to the current thread's grey_fibers list and set its grey bit. Whenever a thread is asked to rescan its stack, it reruns the stack walk of every grey fiber in its list, clears the grey bits of those fibers, and clears the list.
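Here is a toy model of that bookkeeping, just to pin down the invariants. The names toy_fiber, toy_thread, toy_note_swap_from, and toy_rescan_grey_fibers are hypothetical, the intrusive singly linked list is my own choice of representation, and the scans counter merely stands in for rerunning the stack walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_fiber {
    bool grey;                      /* already on some thread's grey list? */
    int scans;                      /* stand-in: how often the stack was walked */
    struct toy_fiber* next_grey;
};

struct toy_thread {
    struct toy_fiber* grey_fibers;  /* fibers swapped from during marking */
};

/* Called when the thread swaps away from fiber. Outside of marking, or if the
   fiber is already grey, there is nothing to do. */
static void toy_note_swap_from(struct toy_thread* thread, struct toy_fiber* fiber,
                               bool gc_is_marking)
{
    assert(thread && fiber);
    if (!gc_is_marking || fiber->grey)
        return;
    fiber->grey = true;
    fiber->next_grey = thread->grey_fibers;
    thread->grey_fibers = fiber;
}

/* Called when the thread is asked to rescan its stack: walk every grey fiber,
   clear its grey bit, and empty the list. Returns how many were rescanned. */
static size_t toy_rescan_grey_fibers(struct toy_thread* thread)
{
    size_t count = 0;
    struct toy_fiber* fiber = thread->grey_fibers;
    thread->grey_fibers = NULL;
    while (fiber) {
        struct toy_fiber* next = fiber->next_grey;
        fiber->scans++;
        fiber->grey = false;
        fiber->next_grey = NULL;
        fiber = next;
        count++;
    }
    return count;
}
```

The grey bit is what keeps a fiber from being added twice when the mutator swaps from it repeatedly within a single mark phase.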
There's a fun almost-race at termination that may happen due to the use of soft handshakes. In an on-the-fly GC, we may have the following sequence of events:
- The GC runs out of work, so it triggers a soft handshake to scan all stacks.
- Thread 1 performs a stack scan, including walking and resetting its grey fibers, and finds no new objects.
- Thread 1 swapcontexts to a different context, causing its grey fibers to be nonempty.
- Thread 2 performs a stack scan and finds no new objects.
- There are no other threads, and since none of the threads found any new objects, the GC declares termination even though thread 1 has a grey fiber.
Currently, we just reset the grey fiber lists after termination. Reason: if thread 1 found no new objects to mark in step 2, thread 2 also found no new live objects, and the GC was out of work in step 1, then there's no way that any other unmarked objects could have been introduced into the context before we swapped from it. This is true because:
- Newly allocated objects are marked.
- Objects stored into the heap are marked by the barrier at time of store.
- The GC having run out of work means all objects that had been marked had all of their outgoing pointers marked.
There's simply no way that thread 1 could have loaded an unmarked pointer from the heap in this scenario!
That being said, if there was a bug in my ucontext implementation, this is where it would be.
Putting It All Together
We only support ucontext APIs in the glibc build of Fil-C. Hence, you get it in /opt/fil and Pizlix, but not in the pizfix. It's implemented as follows:
getcontext allocates a new zfiber_context with zfiber_context_new and calls zfiber_context_bind_sigset (to cause zfiber_context to replicate its internal sigmask with the user-visible uc_sigmask) and then zfiber_context_getcontext.
setcontext calls zfiber_context_setcontext.
makecontext creates its own trampoline that manages passing arguments to the user-passed main function. It also manages handling fiber exit (switching to uc_link or calling exit). Other than that, it just calls zfiber_context_makecontext.
swapcontext mostly just does a zfiber_context_swapcontext. If the from context (aka the oucp) was not initialized, then it allocates a zfiber_context for it using zfiber_context_new and calls zfiber_context_bind_sigset.
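From the user's side, the pieces above combine as in the following sketch: a tiny generator whose main function receives int arguments through makecontext, yields with swapcontext, and finishes by returning so that uc_link takes over. The names count_up, sum_generated, caller_ctx, gen_ctx, and the stack size are assumptions for this illustration. It is written against the standard glibc API, and I have not verified it under Fil-C.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

enum { GEN_STACK_SIZE = 64 * 1024 };   /* assumed size */

static ucontext_t caller_ctx, gen_ctx;
static int yielded;                     /* value handed to the caller */
static int done;                        /* set when the generator finishes */

/* Fiber main function; its int arguments arrive via makecontext. */
static void count_up(int first, int count)
{
    for (int i = 0; i < count; i++) {
        yielded = first + i;
        swapcontext(&gen_ctx, &caller_ctx);   /* yield to the caller */
    }
    done = 1;
    /* Returning here switches to uc_link, i.e. caller_ctx. */
}

/* Sums first, first+1, ..., first+count-1 by pulling them from the fiber.
   Returns -1 on allocation failure. */
int sum_generated(int first, int count)
{
    void* stack = malloc(GEN_STACK_SIZE);
    if (!stack)
        return -1;

    getcontext(&gen_ctx);
    gen_ctx.uc_stack.ss_sp = stack;
    gen_ctx.uc_stack.ss_size = GEN_STACK_SIZE;
    gen_ctx.uc_link = &caller_ctx;
    makecontext(&gen_ctx, (void (*)(void))count_up, 2, first, count);

    int sum = 0;
    done = 0;
    for (;;) {
        swapcontext(&caller_ctx, &gen_ctx);   /* resume the generator */
        if (done)
            break;
        sum += yielded;
    }
    assert(done == 1);
    free(stack);                              /* the fiber has exited */
    return sum;
}
```

With first = 10 and count = 3 the generator yields 10, 11, and 12, so sum_generated(10, 3) returns 33. Every switch here goes to a context that was produced either by makecontext or by a previous swapcontext, which matches the runnable-only rule described earlier.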
Note that Fil-C does not allow using longjmp/setjmp as an alternate context switch path for ucontext. Some software mixes ucontext with longjmp/setjmp, though all cases of this that I've found (OpenSSL, I'm looking at you) have flags to disable the mixing, because Fil-C isn't the only security technology that breaks if you do that.
Conclusion
Fil-C supports memory-safe context switches using either the longjmp/setjmp style or the ucontext style. The ucontext style is new since release 0.680, so you'll need to build from source to play with it (and it's not yet thoroughly tested). The longjmp/setjmp implementation is older, and probably quite rugged by now.
As this document shows, it's possible to have memory-safe C even when you go to the effort of supporting the most depraved features!