When functions dissolve

In this post we are going to explore how the notion of a function loses its significance as an abstracted module of program logic when we compile a higher-level imperative language to assembly code.

In imperative programming, functions isolate pieces of program logic. For example, in a C-like language, the following function increments its argument by one and returns the result.

int f( int x ) {
    return x + 1;
}

Ideally, each function should have a single, well-defined goal and contain only the code both necessary and sufficient to accomplish that goal.

Modularity

Imagine we have several programs running in parallel on the same computer. Each program is a component of this computer system.

If these programs are threads of the same process, they share the same address space. Therefore, one program can affect the functioning of another by corrupting its memory structures, either deliberately or by mistake. On the other hand, if each program runs in a separate process, it has its own virtual address space. Then an error in one process will not influence the memory of another process, provided they are not engaged in any other interactions. There are of course details to that, like what happens when a process crashes while another one is waiting for its response, but such situations can be dealt with in a reasonable way, e.g. through timeouts.

There is a concept of modularity in system design. It means that to construct a system we start with smaller building blocks, which are built in isolation. Then we assemble the bigger system from these blocks, setting up connections between them.

In the example, we see two degrees of modularity. When programs share their address space, the modularity is softer: it does not prevent unintended interactions from happening. An error in one program (module) can then affect other programs, disrupt their functioning and lead to more errors. This error propagation is out of our control and leads to unpredictable consequences.

On the other hand, programs in different processes are less fragile; there the modularity is stronger.

To sum it up:

  • weak (soft) modularity: errors inside one module can propagate to other modules, bypassing interfaces.
  • strong modularity: modules are more isolated and error propagation is limited to the interfaces.

These two notions are pure extremes; real systems often fall somewhere in between.

Functions as modules

We are used to thinking about functions as the building blocks of program code. Functions abstract away complexity: a caller provides them with arguments, and the function yields the computation result plus side effects, like file output.

In a high-level programming language, functions give us some reasonable modularity (if we do not use global variables or shared resources). Any function can communicate failure to its caller by returning an error value, such as NULL or an instance of an optional type, or by throwing an exception.
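As a minimal illustration in C (the helper name parse_number is invented for this sketch), failure travels back to the caller through the return value:

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parses an integer from a string,
   returning NULL on failure instead of a valid pointer. */
int *parse_number(const char *s) {
    char *end;
    long value = strtol(s, &end, 10);
    if (end == s || *end != '\0') return NULL;   /* signal failure */
    int *result = malloc(sizeof *result);
    if (result == NULL) return NULL;             /* propagate failure */
    *result = (int)value;
    return result;
}

int main(void) {
    int *n = parse_number("42");
    if (n == NULL) {
        fprintf(stderr, "parse failed\n");
        return 1;
    }
    printf("%d\n", *n);
    free(n);
    return 0;
}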

Higher level functions and lower level subroutines

The source code is eventually compiled into machine code. Without too much oversimplification, functions in higher-level languages are translated into functions in machine code. To make a distinction between the two, we will call these lower-level assembly functions subroutines.

Subroutines should support two kinds of operations:

  • we should be able to call subroutines from anywhere, and
  • subroutines should be able to return to the place where they have been called.

On common architectures, there are instructions or instruction patterns used to call subroutines and return from them, like call / ret on Intel 64, or bl / pop {pc} on ARM.

Subroutines are different from functions in one important way: subroutines operate in a common memory space and do not have their own isolated piece of memory. Functions have local variables and arguments; subroutines can allocate space on the stack just for themselves, but are forced to use CPU registers, which are global and shared among them.

To make an analogy with a higher-level language, imagine that:

  • all your functions accept zero arguments;
  • you are not allowed to create local variables;
  • you have a limited amount of global variables;
  • you have to repurpose the same global variables in different functions.
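Here is a rough sketch of these constraints in C, with a couple of globals standing in for CPU registers (the names r0 and r1 are invented for the illustration):

int r0, r1;               /* globals standing in for CPU registers */

void add_one(void) {      /* no parameters: the "argument" is in r0 */
    r0 = r0 + 1;          /* no locals: the result stays in r0 */
}

void caller(void) {
    r0 = 41;              /* pass the argument through a global */
    add_one();
    r1 = r0;              /* save the result: r0 will be repurposed */
}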

Because of this, the modularity supported by subroutines is even "softer" than the modularity of functions. I wonder if, in this case, we can talk about any modularity at all (except that each subroutine is written in one place in the source code, which makes it a structural module). Maybe the better way to think about subroutines is to accept that they do not exist. Once we get there, we may better understand how assembly programs are written, how they function and what possibilities this offers us.

The rest of this post uses assembly code as a device to illustrate two interesting concepts: tail-call elimination and coroutines. We are going to optimize the subroutine print_newline in several stages, up to the point where it loses its shape as an isolated subroutine and gets reduced to one instruction.

; Gets a character code in rdi and writes it to stdout
print_char:
    ...
    ; the code of print_char is not important to us
    ...
    ret

; Prints the newline character
print_newline:
    mov rdi, 0xA          ; newline character code
    call print_char
    ret

The subroutine print_newline is just an adapter for print_char:

void print_newline() {
    print_char(0xA);
}

Tail call optimization

A tail call happens when a subroutine ends with a call to another subroutine. In a higher-level language this happens in cases such as:

void g();

void f() {
    g();
    return;
}

This is also a tail call, because we just return what another function returns:

int g(int x);

int f() {
    return g(42);
}

This is not a tail call: after calling the function we have to multiply its result by another number. We wait until fact(n-1) completes its execution, and then use its result in further computations. Note that in this example the function calls itself rather than some other function, but this is unimportant.

int fact( int n ) {
    if (n < 2) return 1;
    return n * fact(n-1);
}
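For contrast, the same computation can be rewritten so that the recursive call is a tail call, by threading the intermediate result through an extra accumulator parameter (fact_acc is an invented name for this sketch):

/* Tail-recursive variant: the recursive call is the last action,
   so nothing is left to do in this function after it returns. */
int fact_acc( int n, int acc ) {
    if (n < 2) return acc;
    return fact_acc(n - 1, n * acc);   /* a proper tail call */
}

int fact( int n ) {
    return fact_acc(n, 1);
}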

On the assembly level, a tail call corresponds to the following pattern of instructions:

f:
    ...
    call other
    ret

This pair of instructions is located in a function which we are going to call f. The control reaches this function from some other function, say, f_caller.

Let us follow the state of the stack through the execution of f_caller, f and other. For that we expand this code to include the subroutines f_caller and other.

f_caller:
    ...
    call f
    <next instruction>    ; mark 4
    ...

f:             ; mark 1
    ...
    call other
    ret        ; mark 3
...
other:         ; mark 2
    ...
    ret

  • When we start executing f, the stack holds the return address into f_caller (we reach mark 1).
  • When we call other, the stack holds the return address into f_caller and, on top of it, the return address into f (we reach mark 2).
  • The subroutine other returns, popping the return address into f; at this moment only the return address into f_caller is left on top of the stack (we reach mark 3).
  • The subroutine f returns, popping the return address into f_caller from the stack (we reach mark 4).

The last two steps essentially popped two return addresses from the stack consecutively. This suggests that the first one (the return address into f) is useless and we do not need to store it. Indeed, we do not.

The call other instruction is equivalent to the pair of pseudoinstructions:

push rip    ; rip is the program counter register
jmp other

If we do not need to push the return address into f, then we can replace the call with a plain jmp and get rid of one ret:

f_caller:
    ...
    call f
    <next instruction>
    ...

f:
    ...
    jmp other
...
other:
    ...
    ret     ; will return directly to f_caller

The subroutine other becomes essentially the continuation of the subroutine f; we pass from f to other via a simple branching instruction.

Coming back to our example, we can rewrite it like this:

print_char:
    ...
    ret

print_newline:
    mov rdi, 0xA          ; newline character code
    jmp print_char

Tail recursion

What if other and f are the same subroutine? Then the jmp goes to the start of f, turning the whole thing into a loop:

f_caller:
    ...
    call f
    <next instruction>
    ...

f:
    ...
    jmp f

This is what we mean when we say that the compiler optimizes tail recursion into a loop.
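Rendered back into C, the optimizer effectively turns the tail-recursive fact_acc sketch from above into an ordinary loop (again, an illustrative sketch, not literal compiler output):

/* What tail-recursion elimination effectively produces:
   the tail call becomes a jump back to the top of the body. */
int fact_loop( int n ) {
    int acc = 1;
    while (n >= 2) {
        acc = n * acc;
        n = n - 1;
    }
    return acc;
}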

Coroutines

Our current example is the implementation of print_newline in two instructions:

print_char:
    ...
    ret

print_newline:
    mov rdi, 0xA          ; newline character code
    jmp print_char

When the example is written like this, it is easy to guess the next step. If the execution is sequential, why bother with a jump here? Control routinely falls through from one instruction to the next anyway.

print_newline:
    mov rdi, 0xA          ; newline character code
print_char:
    ...
    ret

The subroutines have merged here, and it no longer looks like there are two separate functions. We get a subroutine with two entry points, labeled print_newline and print_char. We call such a thing a coroutine.
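To emulate this shape in C, we can approximate the two entry points with a fall-through switch (a sketch; the function and parameter names are invented):

#include <stdio.h>

/* One body, two entry points, selected by `entry`:
   0 behaves like print_newline, 1 like print_char. */
void print(int entry, int c) {
    switch (entry) {
    case 0:               /* entry point: print_newline */
        c = 0xA;
        /* fall through into print_char's body */
    case 1:               /* entry point: print_char */
        putchar(c);
    }
}

int main(void) {
    print(1, 'x');        /* enter at print_char */
    print(0, 0);          /* enter at print_newline */
    return 0;
}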

Coroutines are a generalization of subroutines: they may have state and multiple entry points. You may often encounter them or their variations in modern languages, e.g. generators in Python or lazy IEnumerables in C#. Our coroutine does not have a state, however. It is funny to note that coroutines first appeared as an assembly programming pattern and are not a fancy new thing like cubical type theory provers.

If your language supports continuations, you can implement coroutines easily.
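C does not have first-class continuations, but POSIX execution contexts (ucontext.h, obsolescent yet still available on e.g. Linux) give a related primitive that is enough for a toy coroutine; a minimal sketch:

#include <stdio.h>
#include <ucontext.h>

/* Two contexts: the main flow and the coroutine. */
static ucontext_t main_ctx, co_ctx;
static char co_stack[64 * 1024];     /* stack for the coroutine */

static void co_body(void) {
    puts("coroutine: first entry");
    swapcontext(&co_ctx, &main_ctx); /* yield back to main */
    puts("coroutine: second entry");
}                                    /* returning goes to uc_link */

int main(void) {
    getcontext(&co_ctx);
    co_ctx.uc_stack.ss_sp = co_stack;
    co_ctx.uc_stack.ss_size = sizeof co_stack;
    co_ctx.uc_link = &main_ctx;      /* where to go when co_body ends */
    makecontext(&co_ctx, co_body, 0);

    swapcontext(&main_ctx, &co_ctx); /* run until the first yield */
    puts("main: coroutine yielded");
    swapcontext(&main_ctx, &co_ctx); /* resume after the yield */
    puts("main: coroutine finished");
    return 0;
}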

Conclusion

I do like to draw connections between seemingly distant notions and areas. I also enjoy learning new points of view on things I already know. My students inspired me to write this post, because for a lot of them, coming from higher-level languages, functions were entities with well-shaped bodies, having a visible start and end. Well, in assembly they do not have to be.

It also supports a way of thinking about compiled code as a "soup" of sorts, where everything is blended together, reordered, precomputed and optimized, until only the actual actions and their order suggest how the original program looked. I think this is especially useful when reasoning about parallel programs and lock-free algorithms.