Memory Safe Context Switching

Support for ucontext APIs is new since release 0.680. If you want to play with setcontext, getcontext, makecontext, and swapcontext then you have to build from source.

This document describes how Fil-C supports longjmp, setjmp, setcontext, getcontext, makecontext, and swapcontext in a totally memory-safe way. In particular, no misuse of those APIs in Fil-C can lead to stack corruption or any other violation of Fil-C's capability model.

These APIs are widely used:

longjmp and setjmp are used in C programs to implement exception handling. It's especially common to use them to implement exceptions "thrown" from signal handlers.
getcontext, setcontext, makecontext, and swapcontext (aka the ucontext APIs) are used to implement coroutines and fibers. For example, Boost uses ucontext as part of its fiber implementation.

The ucontext APIs are less commonly used than longjmp/setjmp and some OSes (like Darwin) have deprecated them. However, they remain well supported in glibc.

Implementing these APIs in a way that preserves memory safety is hard since their misuse can result in restoring a dangling stack. For example, you could either setjmp or getcontext within some function, and then do any of the following things:

Return from that function. At this point, the context that was saved will attempt to restore a stack frame that no longer exists.
Exit from the thread. At this point, the context that was saved will attempt to restore execution on a stack that has been freed.

Even more friendly APIs like makecontext and swapcontext can be straightforwardly misused:

You can use makecontext to create a context that points to some stack, then free that stack, and then either swapcontext or setcontext to that context. In Yolo-C, this will result in running on a dangling stack. In Fil-C, this is not an error, because the runtime ignores the stack you supply and allocates its own.
You can call swapcontext with the second argument being the context that is currently executing. This might happen if you confuse the first and second arguments. In Yolo-C, in the best case, this will behave like a longjmp; in the worst case, it will result in executing on a dangling stack. In Fil-C, this is a safety error that panics your program.

In Yolo-C, execution on a dangling stack results in the most confusing kinds of crashes, since the debugger won't even be able to print a stack trace! Worse, if the program has subtle bugs in its handling of contexts, then an attacker could exploit those bugs to cause the program to do whatever the attacker likes. In Fil-C, execution on a dangling stack is not possible: all such cases are either panics at the point where you misused longjmp or one of the ucontext APIs, or they are reliably legal execution because of how Fil-C manages stacks.

Fil-C implements setjmp/longjmp and the ucontext APIs quite differently.

Making `setjmp`/`longjmp` Memory Safe

There is an impressive amount of depth to the depravity of setjmp. Before going into the details of how Fil-C implements setjmp/longjmp, we need to discuss exactly what makes this function so amazingly evil.

setjmp saves the context as it was at the moment when it was called so that when longjmp is called later, setjmp will return a second time. It is the fact that it returns twice that makes it so vile, and so we need to understand the implications precisely.

An Example

Consider this simple program:

#include <setjmp.h>
#include <stdio.h>

int main(int argc, char** argv)
{
    volatile int x = 42;
    jmp_buf jb;
    if (setjmp(jb)) {
        printf("x = %d\n", x);
        return 0;
    }
    x = 666;
    longjmp(jb, 1);
    printf("Should not get here.\n");
    return 1;
}

This program prints:

x = 666

And then exits. The flow is:

On the first call to setjmp, it returns 0 and saves its caller's context in jb.
Then we set x to 666 and longjmp to jb with the value 1.
setjmp returns 1, so we printf and exit.

Note that we have to mark x as volatile for the program to reliably print 666. Otherwise, the compiler is allowed to optimize the access to x so that the read in the printf yields 42 instead. This might happen in the following ways:

The compiler could constant fold x to 42. This will happen in the example if we remove volatile and use any optimization level above -O0. Then x = 42 gets printed.
Say that constant folding doesn't happen, maybe because we insert an asm("" : "+r"(x)) right after the definition of x. In that case, the compiler could register-allocate x in a callee-save register, in which case the register ends up saved by setjmp. This also leads to x = 42 being printed.
Say that we experience register pressure for some reason, and x doesn't make it into a callee-save register, but instead gets spilled. At any optimization level above -O0, the compiler will split x into two variables: one for x = 42 and one for x = 666, and the printf will reference the first one (since x = 42 dominates the printf). Those two variables will almost always get separate spill slots. Hence, when we come out of the setjmp the second time, reading x will still give 42.

Three things to reflect upon:

To get the property that x's value is observed to be 666 in the printf, we need to make sure that the compiler treats x as a stack allocation rather than a variable. Using volatile achieves this. Also, passing a pointer to x to anywhere is likely to accomplish this.
Spill slots are not the same as stack allocations. If a variable is stack-allocated, then it will get one stack allocation. If a variable is spilled, it may get multiple spills (often, a separate spill per assignment).
The compiler is allowed to analyze the lifetime of spill slots and stack allocations. It's allowed to reuse spill slots. How does the compiler know that the x = 42 spill slot should stay alive until the longjmp happens? How come it won't get reused, resulting in x having either 666 or any random garbage when we fall out of the setjmp a second time?

Here's a more diabolical version of the example that triggers spilling of x to two different spill slots (one for 42 and one for 666) in gcc, clang, and filcc.

#include <setjmp.h>
#include <stdio.h>

int main(int argc, char** argv)
{
    int x = 42;
    asm volatile("" : "+r"(x));
    jmp_buf jb;
    int a = 1, b = 2, c = 3, d = 4, e = 5, f = 6, g = 7, h = 9, i = 10;
    /* Force some spilling */
    asm volatile("" : "+r"(a), "+r"(b), "+r"(d), "+r"(e), "+r"(f), "+r"(g), "+r"(h), "+r"(i));
    if (setjmp(jb)) {
        asm volatile("" : "+r"(a), "+r"(b), "+r"(d), "+r"(e), "+r"(f), "+r"(g), "+r"(h), "+r"(i));
        printf("x = %d\n", x);
        return 0;
    }
    x = 666;
    void (*jump)(jmp_buf, int) = longjmp;
    asm volatile("" : "+r"(x));
    asm volatile("" : "+r"(jump), "+r"(a), "+r"(b), "+r"(d), "+r"(e), "+r"(f), "+r"(g), "+r"(h), "+r"(i));
    jump(jb, 1);
    asm volatile("" : "+r"(jump), "+r"(a), "+r"(b), "+r"(d), "+r"(e), "+r"(f), "+r"(g), "+r"(h), "+r"(i));
    asm volatile("" : "+r"(x));
    printf("Should not get here.\n");
    return 1;
}

This program will print x = 42 even though x is not constant folded or register-allocated.

Note that all of the examples so far work in Fil-C. Even the inline assembly that we're using to obfuscate variable values works in Fil-C, and has the desired effect.
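
The volatile rule discussed above can be exercised with a small retry loop. This is a minimal sketch in standard C, not something taken from the Fil-C sources: the names retry_point, fail, and attempts_until_success are made up for illustration. The counter is modified between the setjmp and the longjmp and then read after the second return, so it has to be volatile to have a determinate value.

```c
#include <setjmp.h>

/* Hypothetical names, for illustration only. */
static jmp_buf retry_point;

static void fail(void)
{
    longjmp(retry_point, 1);   /* jump back to the setjmp in our caller */
}

int attempts_until_success(int failures)
{
    /* Modified after setjmp and read after longjmp, so it must be
       volatile; otherwise its value would be indeterminate. */
    volatile int attempts = 0;

    setjmp(retry_point);       /* returns again after every longjmp */
    attempts = attempts + 1;
    if (attempts <= failures)
        fail();
    return attempts;
}
```

Calling attempts_until_success(3) fails three times and succeeds on the fourth attempt. Without volatile, the C standard leaves the value of attempts indeterminate after the longjmp.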

What Is Even Happening

Let's take a look at how simple setjmp is by looking at the musl implementation on x86_64:

__setjmp:
_setjmp:
setjmp:
    mov %rbx,(%rdi)         /* rdi is jmp_buf, move registers onto it */
    mov %rbp,8(%rdi)
    mov %r12,16(%rdi)
    mov %r13,24(%rdi)
    mov %r14,32(%rdi)
    mov %r15,40(%rdi)
    lea 8(%rsp),%rdx        /* this is our rsp WITHOUT current ret addr */
    mov %rdx,48(%rdi)
    mov (%rsp),%rdx         /* save return addr ptr for new rip */
    mov %rdx,56(%rdi)
    xor %eax,%eax           /* always return 0 */
    ret

This is only saving the callee-save registers, plus the stack pointer and instruction pointer as they were at the callsite. It's not saving the stack itself.

Later, when longjmp is called, the register state is restored with only one difference: %eax (the return value register) will get the argument passed to longjmp.
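
One detail worth knowing about that return value: the C standard says that if longjmp is passed 0, setjmp returns 1 instead, so the second return is always distinguishable from the first. Here is a small sketch of that rule (the function name is invented for this example):

```c
#include <setjmp.h>

/* Returns 1 once setjmp has returned for the second time.
   Hypothetical helper, for illustration only. */
int second_return_is_nonzero(int requested)
{
    jmp_buf jb;
    if (setjmp(jb) != 0)
        return 1;              /* reached only by way of longjmp */
    longjmp(jb, requested);    /* never returns to this point */
}
```

If longjmp(jb, 0) really made setjmp return 0, this function would loop forever when called with 0. It terminates because the 0 is replaced with 1.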

Hence, the most basic safety issue with setjmp is that if we call it and then return from the function that had called it, the context saved by setjmp is not valid to longjmp to. Jumping to such a context will result in a torn machine state:

The callee-save registers, stack pointer, and instruction pointer will be exactly as they had been at the time that setjmp had been called.
The stack contents - in particular, the frame that the stack pointer is pointing at - will be whatever they were at the time that longjmp had been called.

longjmp is only safe if it's called at a time when the stack frame used by setjmp could not have possibly been overwritten, since that is the only way to guarantee that the register state restored by longjmp matches the stack frame that the stack pointer points to. The easiest way to guarantee this is to ensure that longjmp is only called from within the function that called setjmp, or from some function called by the function that called setjmp (transitively).
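
As a sketch of that safe pattern, here is error handling where the longjmp always comes from a function called by the function that called setjmp. The names parse_error, digit_value, and parse_digits are invented for this example.

```c
#include <setjmp.h>

/* Hypothetical names, for illustration only. */
static jmp_buf parse_error;

static int digit_value(char c)
{
    if (c < '0' || c > '9')
        longjmp(parse_error, 1);   /* unwind to parse_digits */
    return c - '0';
}

int parse_digits(const char *text, int fallback)
{
    int value = 0;

    /* The frame that calls setjmp stays live for every longjmp below,
       because digit_value only runs while parse_digits is on the stack. */
    if (setjmp(parse_error))
        return fallback;           /* value is not read on this path */

    for (const char *p = text; *p; p++)
        value = value * 10 + digit_value(*p);
    return value;
}
```

Here parse_digits("123", -1) returns 123 and parse_digits("12x", -1) returns -1. Note that value does not need to be volatile only because it is never read after the longjmp.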

But that's not all!

The compiler has to know that setjmp returns twice to ensure that spill slots are not reused unsoundly. In fact, compilers detect calls to setjmp and treat the functions that call it specially by disabling any optimization that would lead to a reuse of spill slots. This is surfaced a bit to compiler users with the returns_twice attribute.

Let's consider our diabolical example, but with the setjmp call obfuscated:

#include <setjmp.h>
#include <stdio.h>

int main(int argc, char** argv)
{
    int x = 42;
    asm volatile("" : "+r"(x));
    jmp_buf jb;
    int a = 1, b = 2, c = 3, d = 4, e = 5, f = 6, g = 7, h = 9, i = 10;
    asm volatile("" : "+r"(a), "+r"(b), "+r"(d), "+r"(e), "+r"(f), "+r"(g), "+r"(h), "+r"(i));
    int (*setjump)(jmp_buf) = setjmp;
    asm volatile("" : "+r"(setjump));
    if (setjump(jb)) {
        asm volatile("" : "+r"(a), "+r"(b), "+r"(d), "+r"(e), "+r"(f), "+r"(g), "+r"(h), "+r"(i));
        printf("x = %d\n", x);
        return 0;
    }
    x = 666;
    void (*jump)(jmp_buf, int) = longjmp;
    asm volatile("" : "+r"(x));
    asm volatile("" : "+r"(jump), "+r"(a), "+r"(b), "+r"(d), "+r"(e), "+r"(f), "+r"(g), "+r"(h), "+r"(i));
    jump(jb, 1);
    asm volatile("" : "+r"(jump), "+r"(a), "+r"(b), "+r"(d), "+r"(e), "+r"(f), "+r"(g), "+r"(h), "+r"(i));
    asm volatile("" : "+r"(x));
    printf("Should not get here.\n");
    return 1;
}

Now, the results I see are:

With gcc version 11.4.0, the program prints x = 666.
With clang version 14.0.0, the program prints garbage like x = -291233296.
filcc refuses to compile the program. In fact, it ICEs. (Fil-C has a longstanding bug that it emits compile-time diagnostics with internal compiler errors instead of printing something useful.)

The unsafe thing that is happening (and that Fil-C prevents by refusing to compile this program) is that if the compiler compiles a call to setjmp without knowing that it's calling setjmp then the spill slot used by x = 42 might get reused by some other variable in the code after the if (setjmp) { ... }.

Putting It All Together

Fil-C makes longjmp/setjmp memory safe by ensuring that:

The jmp_buf just contains a pointer to an opaque zjmp_buf object. The contents of this object cannot be accessed from Fil-C. Only the Fil-C runtime can manipulate it. This works because most code never inspects the innards of jmp_buf (and code that does will not work in Fil-C). Note that if you do overwrite jmp_buf, then you'll most likely cause a Fil-C panic when you try to longjmp, because the internal implementation of longjmp will check that it can load a zjmp_buf from the jmp_buf. There is no way to spoof a zjmp_buf from Fil-C.
It's only possible to mention the setjmp symbol by calling it directly; anything else will ICE the compiler. (And in the future, it might even cause the compiler to emit a proper diagnostic.) This ensures that the compiler's machinery for recognizing setjmp (and inhibiting spill slot reuse) always works.
The setjmp call is compiled to allocate a new zjmp_buf opaque object and to register the zjmp_buf with the stack frame. Each stack frame can tell you the weak set of zjmp_bufs that are valid jump targets for that frame. (It's weak because if the zjmp_buf's are otherwise unreachable, they are removed from that set.)
longjmp panics unless it is called from the stack frame that considers the zjmp_buf to be valid, or from a descendant of that frame. This is accomplished by walking the stack and asking each stack frame: do you have my zjmp_buf in your set? Note that repeated setjmps on the same jmp_buf create new zjmp_bufs, and the zjmp_buf is immutable. Hence, membership in the weak set really means that the frame that called setjmp is still live and is either the caller of longjmp or one of its ancestors.
The Fil-C implementations of longjmp and setjmp save and restore a lot of internal runtime state, including the state needed to track GC roots. In particular, zjmp_buf holds a copy of the GC roots of the frame at the time that setjmp was called. So long as the zjmp_buf is live, we'll continue to mark those roots.

The sketchiest part of this is that the Fil-C runtime strongly assumes that if a pointer variable was materialized as an SSA value in LLVM IR at the time that the FilPizlonator runs, then longjmping restores that value to the state it had at the time of the setjmp, so long as the setjmp is flagged as returning twice. I have so far confirmed that this is the case, but it's extremely confusing - if there was a bug in my longjmp/setjmp, this is where it would be, and it would manifest as follows: after the longjmp, the GC's view of the stack frame's roots is as if all of the local pointers were restored to their values before the setjmp, but some pointer's value was not restored and has a new value from after the setjmp call but before the longjmp call. Note that you cannot trigger the bug with something like making a pointer volatile, since that causes the pointer to be a stack allocation, not an SSA value - and in that case, my transformation does the right thing (the "pointer" really ends up being an object in the heap, and the SSA value is a pointer to that pointer box).

Assuming my analysis of this hideous abomination is right, these rules are sufficient to allow almost all safe uses of longjmp/setjmp while prohibiting any possible use that corrupts the stack or causes any possible violation of the Fil-C capability model.
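
To make the stack-walk check concrete, here is a toy model of it in plain C. This is not the Fil-C runtime's code: model_frame, model_register_target, and model_longjmp_allowed are hypothetical names, a fixed-size array stands in for the weak set, and an arbitrary address stands in for a zjmp_buf. The only thing it is meant to show is that a jump is allowed exactly when some frame on the current call stack holds the target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TARGETS 4

/* Toy stand-in for a stack frame (hypothetical, not Fil-C's layout). */
typedef struct model_frame {
    struct model_frame *caller;         /* next frame up the stack */
    const void *targets[MAX_TARGETS];   /* stand-in for the weak set */
    size_t num_targets;
} model_frame;

/* Models what setjmp does: register a fresh target with the frame. */
void model_register_target(model_frame *frame, const void *target)
{
    assert(frame->num_targets < MAX_TARGETS);
    frame->targets[frame->num_targets++] = target;
}

/* Models what longjmp checks: walk from the calling frame upward and
   ask each frame whether it holds the target. */
bool model_longjmp_allowed(const model_frame *from, const void *target)
{
    for (const model_frame *f = from; f != NULL; f = f->caller)
        for (size_t i = 0; i < f->num_targets; i++)
            if (f->targets[i] == target)
                return true;
    return false;
}
```

In this model, a frame that has returned is simply no longer reachable through the caller links, so any target it registered can never be found again. That is the property that rules out jumping into a dead frame.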

Making `ucontext` Memory Safe

It's almost possible to use setjmp/longjmp to implement fibers in Yolo-C. But two problems arise if we try to do this:

Fibers need a context switch that simultaneously restores some state (the longjmp) while saving the current state (the setjmp). It's extremely confusing to write this in terms of setjmp/longjmp.
It's not obvious how to bootstrap when we start a new fiber. We want to allocate a stack and produce a jmp_buf that we can longjmp to so that we start running the main function of the newly created fiber.

It turns out you can do this with the sigaltstack hack, but as brilliant as this hack is, folks usually prefer to use the much nicer ucontext APIs:

getcontext snapshots the current state into a context. This is like setjmp, though it's rarely used that way; it's mostly used for prepopulating a ucontext_t before calling makecontext.

setcontext is a one-way context switch to a context (it does not save the state before switching). This is mostly just used for exiting a fiber.

makecontext creates a new context that is bootstrapped to call some main function. In a bizarre twist of history, this function's contract requires a prior call to getcontext even though it mostly overwrites all of the state snapshotted by getcontext. Most modern uses of getcontext are just due to this twist.

swapcontext is a context switch that simultaneously saves the current context to one ucontext_t and switches to another ucontext_t.

Here's an example of how to use this API from the Linux man pages (I made some small changes to reduce its size):

#include <ucontext.h>
#include <stdio.h>
#include <stdlib.h>

static ucontext_t uctx_main, uctx_func1, uctx_func2;

static void func1(void)
{
    printf("func1: swapcontext(&uctx_func1, &uctx_func2)\n");
    swapcontext(&uctx_func1, &uctx_func2);
    printf("func1: returning\n");
}

static void func2(void)
{
    printf("func2: swapcontext(&uctx_func2, &uctx_func1)\n");
    swapcontext(&uctx_func2, &uctx_func1);
    printf("func2: returning\n");
}

int main()
{
    char func1_stack[16384];
    char func2_stack[16384];

    getcontext(&uctx_func1);
    uctx_func1.uc_stack.ss_sp = func1_stack;
    uctx_func1.uc_stack.ss_size = sizeof(func1_stack);
    uctx_func1.uc_link = &uctx_main;
    makecontext(&uctx_func1, func1, 0);

    getcontext(&uctx_func2);
    uctx_func2.uc_stack.ss_sp = func2_stack;
    uctx_func2.uc_stack.ss_size = sizeof(func2_stack);
    uctx_func2.uc_link = &uctx_func1;
    makecontext(&uctx_func2, func2, 0);

    printf("main: swapcontext(&uctx_main, &uctx_func2)\n");
    swapcontext(&uctx_main, &uctx_func2);

    printf("main: exiting\n");
    return 0;
}

This program prints:

main: swapcontext(&uctx_main, &uctx_func2)
func2: swapcontext(&uctx_func2, &uctx_func1)
func1: swapcontext(&uctx_func1, &uctx_func2)
func2: returning
func1: returning
main: exiting

Some notes:

We aren't using setcontext. Lots of users of this API never use setcontext. setcontext is only useful if you're using getcontext/setcontext as replacements for longjmp/setjmp (which I've never seen any code do in the wild), or if you're doing a final context switch away from a context, and so you don't need to save your context.
getcontext only serves one purpose: to initialize those parts of the ucontext_t that makecontext doesn't initialize. The contract here is bizarre. If we don't call getcontext, then we would have to somehow initialize all of the parts of ucontext_t that we're not supposed to know about (fields that are defined in the system header, but that aren't part of the published API). Also, we'd have to remember to initialize uc_sigmask (getcontext initializes it to the current sigmask). On Linux/X86_64, the other fields that getcontext initializes are mostly to do with FPU exception state. The fact that this contract is so opaque is going to help us make this API memory-safe.
Once we start using func1_stack and func2_stack in the contexts, we aren't supposed to rely on their contents anymore. We could read them, but then we aren't guaranteed anything about what is in there. And if we write to them, then all bets are off. In this example, we aren't giving these any guard pages, so these stacks are strictly less secure than the normal kind of stack that a thread gets.
It's purely convention that the first argument of swapcontext is the context that we were running at the time we called it. We could have passed any context that is safe to overwrite. We can see this in the example with the initial swapcontext(&uctx_main, &uctx_func2) call. Here, uctx_main has not been initialized prior to that call, and we're picking it as the context to hold the main thread's context. This is fine.
Note the use of uc_link - this is the context to setcontext to when a context's main function returns.
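
The same idiom can be packaged as a generator-style fiber. The following is a sketch using only the standard ucontext calls as provided by glibc; I have not run it under Fil-C, and the names caller_ctx, fiber_ctx, counter_fiber, and sum_yielded_values are invented for this example. It uses swapcontext to yield, and relies on uc_link to get back to the caller when the fiber's main function returns.

```c
#include <stdlib.h>
#include <ucontext.h>

/* Hypothetical names, for illustration only. */
static ucontext_t caller_ctx, fiber_ctx;
static int yielded;   /* value handed from the fiber to the caller */
static int limit;     /* how many values the fiber should yield */

static void counter_fiber(void)
{
    for (int i = 1; i <= limit; i++) {
        yielded = i;
        swapcontext(&fiber_ctx, &caller_ctx);   /* yield to the caller */
    }
    /* Returning here switches to uc_link, which is caller_ctx. */
}

int sum_yielded_values(int n)
{
    size_t stack_size = 64 * 1024;
    void *stack = malloc(stack_size);
    int sum = 0;

    if (!stack || getcontext(&fiber_ctx) != 0) {
        free(stack);
        return -1;
    }
    limit = n;
    fiber_ctx.uc_stack.ss_sp = stack;
    fiber_ctx.uc_stack.ss_size = stack_size;
    fiber_ctx.uc_link = &caller_ctx;
    makecontext(&fiber_ctx, counter_fiber, 0);

    for (int i = 0; i < n; i++) {
        swapcontext(&caller_ctx, &fiber_ctx);   /* resume the fiber */
        sum += yielded;
    }
    /* One last switch lets the fiber fall out of its loop and return,
       which brings us back here through uc_link. */
    swapcontext(&caller_ctx, &fiber_ctx);

    free(stack);
    return sum;
}
```

Note that caller_ctx is never passed to getcontext. It is only ever used as the first argument of swapcontext, which is enough to initialize it.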

For making this API memory-safe, we'll focus on the idiom above where getcontext is only for initializing ucontext before a call to makecontext.

Laws For Safe `ucontext`

Let's enumerate the laws we will enforce for ucontext. Note that these laws are more restrictive than what is strictly necessary to make ucontext memory-safe, but I wanted to start with the most conservative possible implementation that is useful to real users of the API.

Opaque state. We'll repeat the trick we used to make jmp_buf safe: inside the ucontext_t, we'll have a pointer to an opaque zfiber_context object. Fil-C code cannot access zfiber_context except by calling its API in pizlonated_runtime.h.

ss_sp doesn't matter. The implementation completely ignores the stack you provided in the ss_sp field. Internally, zfiber_context will allocate a stack that you cannot see. It will use your ss_size as the size of that stack (but it will add some padding that's necessary for Fil-C's stack overflow handling to work). The stack is allocated when you call makecontext.

zfiber_context has a restricted state machine. The states are:

uninitialized - this is the initial state after a zfiber_context is allocated. It's only legal to call zfiber_context_getcontext or use the context as the from argument (the first argument) to zfiber_context_swapcontext.
after_getcontext - this is the state after getcontext returns. In this state, it's only legal to call zfiber_context_makecontext or use the context as the from argument to zfiber_context_swapcontext.
runnable - this state is the result of calling makecontext or of passing the context as the from argument to swapcontext. When in this state, it's only legal to call setcontext or use the context as the to argument to swapcontext.
running - this state happens after you start running the context using setcontext or swapcontext. It's only legal to pass the context as the from argument to swapcontext provided that this is the currently running context on the calling thread (each thread tracks the currently running context). Note that some context could be running but we switch away from it either using setcontext or swapcontext with the from argument being some other uninitialized context; in that case the context will remain in the running state forever, since the only thing you can do with a running context is swap from it and that only works if the running context is the currently running one according to the calling thread.

This state machine forbids using ucontext for longjmp/setjmp because you cannot switch to an after_getcontext context. You can only switch to a runnable context, and the only way to get one is to either makecontext a new one or to save the current context using swapcontext.
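
The state machine can be written down as a small executable model. This is a sketch of the rules as stated in this document, not the runtime's actual code: every name here (fiber_state, model_context, model_fiber_thread, and the model_ functions) is hypothetical, and a false return stands in for a Fil-C panic.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types, for illustration only. */
typedef enum { UNINITIALIZED, AFTER_GETCONTEXT, RUNNABLE, RUNNING } fiber_state;
typedef struct { fiber_state state; } model_context;
typedef struct { model_context *current; } model_fiber_thread;

bool model_getcontext(model_context *c)
{
    if (c->state != UNINITIALIZED)
        return false;
    c->state = AFTER_GETCONTEXT;
    return true;
}

bool model_makecontext(model_context *c)
{
    if (c->state != AFTER_GETCONTEXT)
        return false;
    c->state = RUNNABLE;
    return true;
}

bool model_setcontext(model_fiber_thread *t, model_context *to)
{
    if (to->state != RUNNABLE)
        return false;
    to->state = RUNNING;
    t->current = to;      /* the old current context stays RUNNING */
    return true;
}

/* A context may be the from argument if it holds no saved state yet, or
   if it is the context this thread is currently running. */
static bool can_swap_from(const model_fiber_thread *t, const model_context *from)
{
    switch (from->state) {
    case UNINITIALIZED:
    case AFTER_GETCONTEXT:
        return true;
    case RUNNING:
        return t->current == from;
    default:
        return false;
    }
}

bool model_swapcontext(model_fiber_thread *t, model_context *from, model_context *to)
{
    if (!can_swap_from(t, from) || to->state != RUNNABLE)
        return false;
    from->state = RUNNABLE;
    to->state = RUNNING;
    t->current = to;
    return true;
}
```

In this model, swapping to the currently running context is rejected without any special case: the to argument must be runnable, and the current context is running.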

Thread affinity. The Fil-C ABI threads the filc_thread* through every function call, and the compiler is allowed to expect that no function call can ever change the filc_thread* that we're running on. This means that we cannot allow ucontext to cause a stack that had run on one thread to run on any other thread. Hence, zfiber_context tracks which zthread it was created on and disallows any calls into any zfiber_context API from other threads.

GC Integration

When the GC asks the zfiber_context object to mark its outgoing pointers during the mark phase and the context is runnable, the zfiber_context has to do the equivalent of what threads do when a stack scan is requested during a soft handshake.

But what if the following happens during a single GC mark phase:

The GC marks a runnable zfiber_context and puts it on the mark stack.
The GC pops the zfiber_context from the mark stack and marks its outgoing pointers. Let's say that the context is still runnable. So, we scan its stack.
Mutator switches to that zfiber_context using either setcontext or swapcontext. Now, the zfiber_context is running, so its stack is not visible to the GC. This is fine, since the stack is owned by a thread, and the GC uses grey stacks; i.e. it will always rescan the stacks before declaring termination.
Mutator switches away from that zfiber_context using swapcontext, making the zfiber_context runnable again.

Now we have a problem! The GC was expecting that whatever was on the stack doesn't need to be actively tracked by any barriers because we'll just rescan the grey stacks before termination. But now, that stack is no longer owned by any thread; instead it's owned by a runnable zfiber_context. Worse, that zfiber_context is black: we not only set its mark bit but we already popped it off the mark stack and marked its outgoing pointers - so the GC will not visit it again!

The way we solve this is by tracking grey zfiber_contexts. When we swapcontext from a context during marking, if the context is not already grey, then we add it to the current thread's grey_fibers list and set its grey bit. Whenever a thread is asked to rescan its stack, it reruns the stack walk of every grey fiber in its list, clears the grey bits of those fibers, and clears the list.
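
Here is a toy model of that bookkeeping. It is a sketch of the scheme as described here, not the runtime's implementation: grey_model_fiber, grey_model_thread, and both functions are hypothetical names, the list is a fixed-size array, and a counter stands in for the actual stack walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_GREY 16

/* Hypothetical model types, for illustration only. */
typedef struct {
    bool grey;    /* already on some thread's grey list? */
    int scans;    /* how many times the model "walked" this stack */
} grey_model_fiber;

typedef struct {
    grey_model_fiber *grey_fibers[MAX_GREY];
    size_t num_grey;
} grey_model_thread;

/* Called when the thread swaps away from a fiber. */
void model_note_swap_from(grey_model_thread *t, grey_model_fiber *from,
                          bool gc_is_marking)
{
    if (!gc_is_marking || from->grey)
        return;                      /* nothing to do, or already listed */
    assert(t->num_grey < MAX_GREY);
    from->grey = true;
    t->grey_fibers[t->num_grey++] = from;
}

/* Called when the thread is asked to rescan its stack. Returns how many
   grey fibers were rescanned. */
size_t model_rescan(grey_model_thread *t)
{
    size_t rescanned = t->num_grey;
    for (size_t i = 0; i < t->num_grey; i++) {
        t->grey_fibers[i]->scans++;        /* stand-in for the stack walk */
        t->grey_fibers[i]->grey = false;
    }
    t->num_grey = 0;
    return rescanned;
}
```

The grey bit is what keeps a fiber from being listed twice when the mutator swaps away from it repeatedly between two rescans.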

There's a fun almost-race at termination that may happen due to the use of soft handshakes. In an on-the-fly GC, we may have the following sequence of events:

The GC runs out of work, so it triggers a soft handshake to scan all stacks.
Thread 1 performs a stack scan, including walking and resetting its grey fibers, and finds no new objects.
Thread 1 swapcontexts to a different context, causing its grey fibers to be nonempty.
Thread 2 performs a stack scan and finds no new objects.
There are no other threads, and since none of the threads found any new objects, the GC declares termination even though thread 1 has a grey fiber.

Currently, we just reset the grey fiber lists after termination. Reason: if thread 1 found no new objects to mark in step 2, thread 2 also found no new live objects, and the GC was out of work in step 1, then there's no way that any other unmarked objects could have been introduced into the context before we swapped from it. This is true because:

Newly allocated objects are marked.
Objects stored into the heap are marked by the barrier at time of store.
The GC having run out of work means all objects that had been marked had all of their outgoing pointers marked.

There's simply no way that thread 1 could have loaded an unmarked pointer from the heap in this scenario!

That being said, if there was a bug in my ucontext implementation, this is where it would be.

Putting It All Together

We only support ucontext APIs in the glibc build of Fil-C. Hence, you get it in /opt/fil and Pizlix, but not in the pizfix. It's implemented as follows:

getcontext allocates a new zfiber_context with zfiber_context_new and calls zfiber_context_bind_sigset (to cause zfiber_context to replicate its internal sigmask with the user-visible uc_sigmask) and then zfiber_context_getcontext.

setcontext calls zfiber_context_setcontext.

makecontext creates its own trampoline that manages passing arguments to the user-passed main function. It also manages handling fiber exit (switching to uc_link or calling exit). Other than that, it just calls zfiber_context_makecontext.

swapcontext mostly just does a zfiber_context_swapcontext. If the from context (aka the oucp) was not initialized, then it allocates a zfiber_context for it using zfiber_context_new and calls zfiber_context_bind_sigset.
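
From the caller's side, the argument passing and fiber exit that makecontext's trampoline has to handle look like this. This is a sketch of the standard POSIX idiom as provided by glibc, which I have not run under Fil-C; add_entry, add_in_fiber, and the two context variables are names invented for this example. The function pointer cast is the usual way to pass an entry point that takes int arguments.

```c
#include <stdlib.h>
#include <ucontext.h>

/* Hypothetical names, for illustration only. */
static ucontext_t add_caller_ctx, add_fiber_ctx;
static int add_result;

static void add_entry(int a, int b)
{
    add_result = a + b;
    /* Returning exits the fiber and switches to uc_link. */
}

int add_in_fiber(int a, int b)
{
    size_t stack_size = 64 * 1024;
    void *stack = malloc(stack_size);

    if (!stack || getcontext(&add_fiber_ctx) != 0) {
        free(stack);
        return 0;
    }
    add_fiber_ctx.uc_stack.ss_sp = stack;
    add_fiber_ctx.uc_stack.ss_size = stack_size;
    add_fiber_ctx.uc_link = &add_caller_ctx;

    /* The 2 is the number of int arguments that follow. */
    makecontext(&add_fiber_ctx, (void (*)(void))add_entry, 2, a, b);

    /* Runs add_entry(a, b) on the fiber, then comes back via uc_link. */
    swapcontext(&add_caller_ctx, &add_fiber_ctx);

    free(stack);
    return add_result;
}
```

If uc_link had been left NULL, the thread would exit when add_entry returned, so setting it is what makes this behave like a function call.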

Note that Fil-C does not allow using longjmp/setjmp as an alternate context switch path for ucontext. Some software mixes ucontext with longjmp/setjmp, though all cases of this that I've found (OpenSSL, I'm looking at you) have flags to disable the mixing, because Fil-C isn't the only security technology that breaks if you do that.

Conclusion

Fil-C supports memory-safe context switches using either the longjmp/setjmp style or the ucontext style. The ucontext style was added after version 0.680, so you'll need to build from source to play with it (and it's not yet thoroughly tested). The longjmp/setjmp implementation is older, and probably quite rugged by now.

As this document shows, it's possible to have memory-safe C while still supporting even the most depraved features!
