When functions dissolve
In this post we are going to explore how the notion of a function loses its significance as an abstract module of program logic when we compile a higher-level imperative language to assembly code.
In imperative programming, functions isolate pieces of program logic. For example, in a C-like language, the following function increases its argument by one and returns the result.
int f( int x ) { return x + 1; }
Ideally, each function should have a single, well-defined goal and contain only the code that is both necessary and sufficient to accomplish that goal.
Modularity
Imagine we have several programs running in parallel on the same computer. Each program is a component of this computer system.
If these programs are threads of the same process, they share the same address space. Therefore, one program can affect the functioning of another by corrupting its memory structures, either deliberately or by mistake. On the other hand, if each program runs in a separate process, it has its own virtual address space. Then an error in one process will not influence the memory of another process, provided they are not engaged in any other interactions. There are, of course, subtleties, such as what happens when a process crashes while another one is waiting for its response, but such situations can be dealt with in a reasonable way, e.g. through timeouts.
There is a concept of modularity in system design. It means that to construct a system we start with smaller building blocks, which are built in isolation. Then we assemble the bigger system from these blocks, setting up connections between them.
In the example, we see two degrees of modularity. When programs share their address space, the modularity is softer: it does not prevent unintended interactions from happening. An error in one program (module) can then affect other programs, disrupt their functioning and lead to more errors. This error propagation is out of our control and leads to unpredictable consequences.
On the other hand, programs running in different processes are less fragile; there the modularity is stronger.
To sum it up:
- Weak (soft) modularity: errors inside one module can propagate to other modules, bypassing interfaces.
- Strong modularity: modules are more isolated, and error propagation is limited to the interfaces.
These two notions are pure extremes; real systems often fall somewhere in between.
Functions as modules
We are used to thinking about functions as the building blocks of program code. Functions abstract away complexity: a caller provides them with arguments, and the function yields the computation result plus side effects, like file output.
In a high-level programming language, functions give us some reasonable modularity (as long as we do not use global variables or shared resources). Any function can communicate failure to its caller by returning an error value, such as NULL or an instance of an optional type, or by throwing an exception.
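For instance, here is a minimal C sketch of reporting failure through the return value (the duplicate function is made up for illustration); the caller never needs to look inside it:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a freshly allocated copy of s, or NULL on failure. */
char *duplicate(const char *s) {
    char *copy = malloc(strlen(s) + 1);
    if (copy == NULL)
        return NULL;              /* report the error through the interface */
    strcpy(copy, s);
    return copy;
}

int main(void) {
    char *d = duplicate("hello");
    if (d == NULL) {              /* the caller decides how to react */
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    puts(d);
    free(d);
    return 0;
}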
Higher-level functions and lower-level subroutines
The source code is eventually compiled into machine code. Without too much oversimplification, functions in higher-level languages are translated into functions in machine code. To distinguish between the two, we will call the lower-level assembly functions subroutines.
Subroutines should support two kinds of operations:
- we should be able to call subroutines from anywhere, and
- subroutines should be able to return to the place from which they were called.
On common architectures, there are instructions or instruction patterns that are used to call subroutines and return from them, like call / ret on Intel 64, or bl / pop pc on ARM.
Subroutines are different from functions in one important way: subroutines operate only in a common memory space and do not have their own isolated piece of memory. Functions have local variables and arguments; subroutines can allocate space on the stack just for themselves, but they are forced to use CPU registers, which are global and shared among all of them.
To make an analogy with a higher-level language, imagine that (a rough C sketch follows the list):
- all your functions accept zero arguments;
- you are not allowed to create local variables;
- you have a limited amount of global variables;
- you have to repurpose the same global variables in different functions.
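Here is a rough C sketch of that style (r0, r1, add_one and sum_up_to are invented names): every routine takes no arguments and communicates only through a couple of shared globals that play the role of registers.

int r0, r1;                  /* our only "registers", shared by everybody */

/* Input and output both live in r0; there are no parameters or locals. */
void add_one(void) {
    r0 = r0 + 1;
}

/* Reuses the same globals for a different purpose:
   r0 is the input n, r1 accumulates the sum 1 + 2 + ... + n. */
void sum_up_to(void) {
    r1 = 0;
    while (r0 > 0) {
        r1 = r1 + r0;
        r0 = r0 - 1;
    }
}

A caller has to load r0 before the call and read the result from r0 or r1 afterwards, and nothing stops two such routines from trampling each other's values.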
Because of this, the modularity supported by subroutines is even "softer" than function modularity. I wonder if, in this case, we can talk about any modularity at all (except that each subroutine is written in one place in the source code, which makes it a structural module). Maybe a better way to think about subroutines is to accept that they do not exist. Once we get there, we may better understand how assembly programs are written, how they function and what possibilities this offers us.
The rest of this post uses assembly code as a device to explain two interesting concepts: tail-call elimination and coroutines. We are going to optimize the subroutine print_newline in several stages, up to the point where it loses its shape as an isolated subroutine and gets reduced to a single instruction.
; Gets symbol code in rdi and writes it in stdout
print_char:
    ...             ; the code of print_char is not important to us
    ...
    ret

; Prints newline character
print_newline:
    mov rdi, 0xA    ; code
    call print_char
    ret
The subroutine print_newline is just an adapter for print_char:
void print_newline() { print_char(0xA); }
Tail call optimization
A tail call happens when a subroutine ends with a call to another subroutine. In a higher-level language this happens in cases such as:
void g();

void f() {
    g();
    return;
}
This is also a tail call, because we just return what another function returns:
int g(int x);

int f() {
    return g(42);
}
This is not a tail call: after calling the function we have to multiply its result by another number. We wait until fact(n-1) completes its execution, and then use its result in further computations. Note that in this example the function calls itself rather than some other function, but this is unimportant.
int fact( int n ) {
    if (n < 2) return 1;
    return n * fact(n-1);
}
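For contrast, here is one common way to rewrite the same computation so that the recursive call is in tail position: thread the partial product through an extra accumulator argument (fact_acc is just an illustrative name):

int fact_acc(int n, int acc) {
    if (n < 2) return acc;
    return fact_acc(n - 1, n * acc);   /* a tail call: nothing is left to do after it */
}

int fact(int n) { return fact_acc(n, 1); }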
On the assembly level, a tail call corresponds to the following pattern of instructions:
f:
    ...
    call other
    ret
This pair of instructions is located in a function which we are going to call f. Control reaches this function from some other function, say, f_caller. Let us follow the state of the stack through the execution of f_caller, f and other. For that, we expand this code to include the f_caller and other functions.
f_caller:
    ...
    call f
    <next instruction>  ; mark 4
    ...

f:                      ; mark 1
    ...
    call other
    ret                 ; mark 3
    ...

other:                  ; mark 2
    ...
    ret
- When we start executing f, the stack holds the return address inside f_caller (we reach mark 1).
- When we call other, the stack holds the return addresses for f_caller, then f on top (we reach mark 2).
- The subroutine other returns; at this moment only the return address for f_caller is left on top of the stack (we reach mark 3).
- The subroutine f returns, popping the return address of f_caller from the stack (we reach mark 4).
The last two steps essentially popped two return addresses from the stack, one right after the other. This suggests that the first one (the return address into f) is useless and we do not need to store it at all. Indeed, we do not.
The call other instruction is equivalent to the pair of pseudoinstructions:
push rip    ; rip is the program counter register
jmp other
If we do not need to push the return address into f, then we can replace the call with a plain jmp and get rid of one ret:
f_caller:
    ...
    call f
    <next instruction>
    ...

f:
    ...
    jmp other
    ...

other:
    ...
    ret     ; will return directly to f_caller
The subroutine other becomes essentially the continuation of the subroutine f; we pass from f to other via a simple branching instruction.
Coming back to our example, we can rewrite it like this:
print_char:
    ...
    ret

print_newline:
    mov rdi, 0xA    ; code
    jmp print_char
Tail recursion
What if other and f are the same function? Then the jmp is performed to the start of f, making the whole thing into a loop:
f_caller:
    ...
    call f
    <next instruction>
    ...

f:
    ...
    jmp f
This is what we mean when we say that the compiler optimizes tail recursion into a loop.
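In C terms, the effect is roughly as if the fact_acc sketch above had been rewritten by hand into a loop (an illustration of the idea, not literal compiler output):

int fact_acc(int n, int acc) {
    for (;;) {
        if (n < 2) return acc;
        acc = n * acc;    /* the "new arguments" overwrite the old ones... */
        n = n - 1;
    }                     /* ...and control jumps back to the start, like jmp f */
}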
Coroutines
Our current example is the implementation of print_newline in two instructions:

print_char:
    ...
    ret

print_newline:
    mov rdi, 0xA    ; code
    jmp print_char
When the example is written like this, it is easy to guess the next step. If the execution is sequential, why bother with a jump here? Control routinely falls through from one instruction to the next anyway, so it is enough to place print_char right after print_newline:
print_newline:
    mov rdi, 0xA    ; code
print_char:
    ...
    ret
The subroutines have merged here, and it does not look like there are two separate functions anymore. We get a subroutine with two entry points, labeled print_newline and print_char. We call such a thing a coroutine.
Coroutines are a generalization of subroutines: they may have state and multiple entry points. You may often encounter them or their variations in modern languages, e.g. generators in Python or lazy IEnumerables in C#. Our coroutine does not have any state, however. It is funny to note that coroutines first appeared as an assembly programming pattern and are not a fancy new thing like cubical type theory provers.
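As a rough illustration, even plain C can fake the combination of retained state and a resumable entry point using a static variable and a switch; next_number below is just a sketch of the idea:

/* Each call resumes just after the point where the previous call returned,
   so the function behaves like a tiny generator yielding 0, 1, 2, ... */
int next_number(void) {
    static int state = 0;   /* remembers where to resume */
    static int i;
    switch (state) {
    case 0:                 /* the first call enters here */
        for (i = 0; ; i++) {
            state = 1;
            return i;       /* "yield" i to the caller */
    case 1:;                /* subsequent calls resume here, inside the loop */
        }
    }
    return -1;              /* never reached */
}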
If your language supports continuations, you can implement coroutines easily.
Conclusion
I do like to draw connections between seemingly distant notions and areas. I also enjoy learning new points of view on things I already know. My students inspired me to write this post because, for a lot of them, coming from higher-level languages, functions were entities with well-shaped bodies, having a visible start and end. Well, in assembly they do not have to be.
It also supports a way of thinking about compiled code as a "soup" of sorts, where everything is blended together, reordered, precomputed and optimized until only the actual actions and their order suggest what the original program looked like. I think this is especially useful when reasoning about parallel programs and lock-free algorithms.