rubber duck typing

Argument from recollection in Phaedo is self-illustratory

Thu, 18 Dec 2025 00:00:00 +0100

Plato impresses me not just with his ideas, but also with his craftsmanship as a writer. In line with his definition of philosophy as a practice, a process, both the content and the form of his writing contribute to its message. What's said is important, but how it's said adds another layer of meaning, sometimes illustrative, sometimes adding a twist of depth. In this post I'd like to share some of these findings in the Phaedo.

Phaedo is a dialogue about the immortality of the soul, capturing the final hours of Socrates. Within the dramatic time of the dialogue, he is convincing his friends Simmias and Cebes that philosophy is a "practice for dying", preparing us to leave our bodies and be released from their inconveniences, enabling us to finally see things clearly, for what they are. So, the soul is immortal, and death is not to be feared, but welcomed when it comes (not sooner, though: later Socrates adds that the gods should decide when our time is over).

Socrates gives three arguments for the immortality of the soul. Here I'd like to focus on the argument from recollection, as it has the fascinating quality of illustrating itself on three levels. To reveal how exactly, let's start by recollecting the argument itself.

Suppose you are looking at two stones, and they seem so similar to you that you are inclined to judge them as two equal stones. However, in another context, from another angle, you might judge them unequal. Maybe they have equal weight, but differ in size, or shape, or color. In the sensible world, comparisons are inherently imperfect: they are unstable, context-dependent, approximate.

But to say that equality in the sensible world is imperfect is to think about a standard – some perfect equality, compared to which the "real world equality" falls short. But what is this standard, the perfect equality, and where did we get the idea of it?

Socrates argues that the idea of perfect equality cannot be based on experience, because perfect equality is an abstraction. It does not exist in sensible reality, and it belongs to an entirely different ontological category. He calls such objects Forms.

From this, Socrates concludes that we must have known equality before our birth, and then apparently forgotten it. Your soul knows Equality, and your soul existed before your mortal body. As you live and gather sensory experience, this experience reminds you of the forgotten knowledge through similarity, just as the lackluster sensory equality reminds you of perfect equality. This enables the recollection.

Equality is one of the Forms – eternal, unchangeable objects belonging to the ideal realm of Forms. Each Form is, in layman's terms, like the essence of a kind of thing on Earth. In Plato's writings, Socrates repeatedly mentions that our souls possess all knowledge of the Forms but forgot it when they became embodied on Earth. So, we need to be reminded of it.

Socrates notes how asking Simmias or Cebes a well-formulated question about the nature of the Form of Equality produces an immediate answer. He engages them in Socratic questioning: asking and guiding them to the knowledge of universal truths they have already had in their souls. When Simmias and Cebes perform the recollection themselves, under his guidance, the dialogue illustrates the point of recollection on the second level. Show, and tell.

Funnily enough, before the argument starts, in 73b, Simmias claims that he knows the argument but asks Socrates directly to remind him of it – very tongue in cheek from Plato. Socrates indeed reminds Simmias of the argument, but in two different senses: directly, by stating the argument that Simmias heard but "forgot"; and by teaching it to Simmias, which is equivalent to reminding Simmias of the truths his soul knew all along. Unfortunately, as I do not read ancient Greek, I can't be sure to what extent the wordplay was intentional.

This second possible meaning of "remind" raises an interesting question: if Simmias is recollecting the lost knowledge, it must have to do with one of the Forms, but which one? At first I was inclined to think that the knowledge is about the world of Forms, therefore assuming that the world of Forms is a Form itself. This, however, seems to be a debatable point among Plato scholars, and for a good reason. Plato did not leave us a clear account of it, so we would have to deal with nuances such as: does the Form of Forms contain itself (think Russell's paradox)? Do we need a Form to explain what's in common between the Form of the World of Forms and individual Forms, and thus start an infinite regress?

A better answer might be to say that Simmias learns something about the world of Forms, which is related to all Forms, therefore gets into the domain of knowledge accessible by recollection. Or, perhaps, it's about the Form of Man or Form of Human Being, specifically, which was mentioned in another dialogue, Parmenides. Throughout the entirety of the dialogue, Simmias and Cebes learn about what it is to be a man, how we are related to our bodies and souls, what is it to live and to die. The argument from recollection is a part of the picture, elucidating the mechanism by which we recollect the forgotten knowledge of Forms, accessible to our soul. Simmias and Cebes learn about it, but also experience it firsthand.

Through the dialogue form, the readers witness the process, but are also guided through the same recollection process as Simmias and Cebes. You too did, in your soul, know the argument all along; you just needed a reminder. This is the third level, and here the text breaks the fourth wall.

Admittedly, we are unable to engage with the text interactively, as we could with a teacher. Socrates critiques the stale quality of texts in a different dialogue, Phaedrus, which is, ironically, a written text itself. In Phaedo, however, we see clearly that even a written text can actively engage the reader in a process that leads to interesting outcomes for them, very much like a teacher does.

The recollection argument has a unique property: the very act of understanding this argument is itself an act of recollection. To grasp why sensible equals are deficient compared to the Form of Equality, the reader must already possess and use the very concept of perfect Equality that the argument claims we've recollected from prenatal existence. Within this framework, the understanding is not about recollection – it is recollection.

Unlike pedagogical texts that invite optional engagement, or rhetorical texts that persuade through technique, the Phaedo makes its central claim self-exemplifying: understanding requires exercising the very capacity for abstract thought that the argument attributes to recollection. The form doesn't just illustrate the content – it is the content first revealed, and then made an embodied experience. Perhaps, the highest expression of the art of writing.

Compiler correctness

Thu, 19 Sep 2024 00:00:00 +0200

This post explores the notions of bug identity and bug identification as applied to compilers. I will try to highlight why compilers are so difficult to check for correctness.

Context: Verification and Validation of Software

Model of a program

Most programs can be modeled as a black box that reacts to events. In simple cases, the only event is feeding the program input data, and the only reaction is producing output.

The choice of events and reactions for the model is contextual. We do not need to include every aspect of the program's execution in the model.

If only computation results matter, we may limit the model to describing output files. This is prevalent in non-interactive programs; think scientific computing.
If execution speed or reaction time matters, we may include events reflecting how the results are obtained. This way, we can track memory consumption or response delays. Think high-frequency trading or robotics.

What makes a program correct?

We usually think of program correctness in a vague, informal way. A correct program does what we expect it to do, but what exactly do we expect?

To make the notion of correctness clear for a given program, we should describe our understanding of its behavior. In other words, we need to make explicit the expected reactions to the input events. This description is called a specification and can be written in plain English, in some formal language, or a mix of the two.

There are usually many ways to write code that satisfies a proposed specification. First, there exist infinitely many programs with exactly the same observable behavior. Second, the specification can be quite permissive, which means many different behaviors are considered correct.

To illustrate the permissiveness point, let's take the specification:

The program outputs all odd numbers in range from 0 to 100.

This specification does not impose any order on these numbers, nor does it prohibit duplicates. Therefore, a program that produces one of the following outputs will fit this specification:

Outputs [1,3,5,...,99].
Outputs [1,5,3,...,99].
Outputs [99,97, ..., 1, 3], or any other permutation of these elements.
Outputs the list [1,3,5,...,99] twice, and so on.
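
To make the permissiveness concrete, here is a minimal sketch of a checker for this specification in Python. The name `satisfies_spec` is made up for the illustration, and the sketch assumes that numbers other than the odd ones are not allowed in the output, which the one-sentence specification does not actually state.

```python
def satisfies_spec(output):
    """Check an output against "all odd numbers in range from 0 to 100".

    Order and duplicates are ignored on purpose: the specification says
    nothing about them. Assumption (not in the spec): extra numbers that
    are not odd numbers from the range make the output incorrect.
    """
    expected = set(range(1, 100, 2))  # {1, 3, ..., 99}
    return set(output) == expected


ascending = list(range(1, 100, 2))
descending = ascending[::-1]
doubled = ascending + ascending
missing_one = ascending[:-1]  # 99 is absent

for candidate in (ascending, descending, doubled, missing_one):
    print(satisfies_spec(candidate))
```

The first three candidates are accepted and the last one is rejected, even though the first three are different lists.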

Verification is making an argument that the program behavior matches the specification. This argument should be backed by reasoning, tests, and formal methods.

Another step that justifies our belief in the software correctness is called validation. While verification checks if our understanding of what the program should do matches the actual program behavior, validation checks if we have the right idea about what it should do, i.e. if the specification makes sense. Validation is performed for the real use cases.

Calling a program correct usually implies that it has been both verified and validated. Bugs are evidence against the system's correctness.

How are compilers different from other software?

In the previous section, we provided an example of a program specification: “output all odd numbers in the range from 0 to 100”. If a program outputs a list of numbers without any special properties, it is easy to verify its output – just make sure it contains all the expected numbers. All outputs that contain the required numbers are equivalent when we think about the program correctness. This equivalence is easy to define, and computers can verify it.

Compilers are way more complicated. To decide if a compiler is correct for a certain input program, we need to show that the observable behavior of an input program written in X is equivalent to the observable behavior of the output program, written in Y.
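
As a toy illustration of that statement, here is a sketch in Python. Everything in it is hypothetical: language X is a list of arithmetic expressions that get printed, language Y is code for a small stack machine, and the observable behavior is simply the list of printed values. It checks one input program only, so agreement is evidence, not a proof of correctness.

```python
# X: expressions are nested tuples ("num", n), ("add", a, b), ("mul", a, b).
# Y: stack-machine instructions ("push", n), ("add",), ("mul",), ("print",).

def run_x(program):
    """Reference semantics of X: the trace of printed values."""
    def evaluate(expr):
        tag = expr[0]
        if tag == "num":
            return expr[1]
        left, right = evaluate(expr[1]), evaluate(expr[2])
        if tag == "add":
            return left + right
        if tag == "mul":
            return left * right
        raise ValueError(f"unknown expression: {tag}")
    return [evaluate(expr) for expr in program]


def compile_x_to_y(program):
    """The compiler under test: translate X into stack-machine code."""
    def emit(expr, code):
        if expr[0] == "num":
            code.append(("push", expr[1]))
        else:
            emit(expr[1], code)
            emit(expr[2], code)
            code.append((expr[0],))
    code = []
    for expr in program:
        emit(expr, code)
        code.append(("print",))
    return code


def run_y(code):
    """Semantics of Y: execute the instructions, return the printed trace."""
    stack, trace = [], []
    for instr in code:
        op = instr[0]
        if op == "push":
            stack.append(instr[1])
        elif op == "print":
            trace.append(stack.pop())
        elif op in ("add", "mul"):
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == "add" else left * right)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return trace


program = [("add", ("num", 2), ("mul", ("num", 3), ("num", 4))), ("num", 7)]
print(run_x(program) == run_y(compile_x_to_y(program)))
```

Note that the check needs three ingredients besides the compiler: a semantics for X, a semantics for Y, and a decision about what counts as observable.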

Unlike comparing two lists of numbers, checking the equivalence of programs is not just harder – it is undecidable in the general case. Imagine needing to put programs in every imaginable context and see how they behave – there is no algorithm that can do that for all programs in finite time. This follows from Rice's theorem.

How we define an observable behavior is situational. For example, if we are interested in how exactly the program is executed, not just its results, we may include the memory allocation events. Otherwise we can completely ignore all operational aspects of a program, reducing it to its output.
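
A small Python sketch of this choice, under made-up assumptions: a trace is a list of `(kind, payload)` events, and the names `observable` and `same_behavior` are invented for the example. The same two traces are equivalent or not depending on which kinds of events we decide to observe.

```python
# Hypothetical traces of a source program and its compiled version.
trace_source = [("alloc", 16), ("output", "hello"), ("free", 16), ("output", "bye")]
trace_compiled = [("output", "hello"), ("alloc", 32), ("output", "bye"), ("free", 32)]


def observable(trace, visible_kinds):
    """Keep only the events whose kind we have chosen to observe."""
    return [event for event in trace if event[0] in visible_kinds]


def same_behavior(first, second, visible_kinds):
    """Two traces behave the same if their observable parts coincide."""
    return observable(first, visible_kinds) == observable(second, visible_kinds)


# Only the output matters: the traces are indistinguishable.
print(same_behavior(trace_source, trace_compiled, {"output"}))
# Memory events matter too: the traces differ.
print(same_behavior(trace_source, trace_compiled, {"output", "alloc", "free"}))
```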

CompCert, the first verified C compiler, defines an event in a C program as a system call or read/write to volatile memory; then an observable behavior is either a finite trace of events, or an infinite trace (repeating the same sequence of events or diverging), or an error. This approach can be traced to systems theory:

As we have seen, systems of equations […] may have three different kinds of solution. The system in question may asymptotically attain a stable stationary state with increasing time; it may never attain such state; or there may be periodic oscillations.

– "General System Theory", Ludwig von Bertalanffy

Identifying bugs in compilers

Let us simplify the task: instead of verifying the correctness of the compiler for all meaningful programs in X, let's focus on identifying specific bugs in the compiler. While testing it on different programs in X, we may encounter a program that behaves differently after compilation. This could seem like strong evidence for a compiler bug. Surprisingly, this is still not sufficient to identify a bug in the compiler. But why?

To compare the behaviors of two programs, we need to answer several complicated questions.

What is a program’s behavior? This, in turn, requires defining:
- The language semantics for X;
- The language semantics for Y;
- Observable behavior i.e. which events are visible.
Can a program in X exhibit multiple behaviors? What about compiled programs in Y?
- If yes, is it acceptable to lose some of X’s behaviors in the compiled programs?
Can a program in X exhibit undefined behavior? Additionally:
- Can undefined behavior be reliably identified?
- Does the compiler guarantee anything about the behavior of the compiled program in such cases?

These issues hint at the first major problem: bug identification/definition. Given a compiled program that behaves unexpectedly, what is the underlying cause?

No formal language specification

Many popular languages lack a formal, unambiguous specification. In its absence, the compiler effectively becomes the language's de facto specification and a part of the ultimate source of truth for determining program behavior.

It gets worse: compilers themselves are often written in a language that lacks a formal specification. For example, suppose we have a compiler written in C++. The compiler's behavior may then be compromised by:

The quirks of C++ semantics, which make it harder to interpret the compiler's code.
C++'s compile-time non-determinism: after the compiler of X is rebuilt, the new executable may compile programs differently.
C++'s many undefined behaviors may carry over into programs in X, even if X is not supposed to have any.
Bugs in the C++ compiler itself may affect the compiler of X.
Implementation bugs of the C++ compiler may introduce bugs in the specification of language X and creep into the programs written in X.

Thus, different behaviors in a program in X and its compiled version in Y do not necessarily indicate a bug in the compiler.

Same bug or different bugs?

If we are unsure whether we have found a bug or if the unexpected behavior stems from an incomplete language specification, distinguishing one bug from another becomes even more difficult. How can we tell whether two miscompilations result from the same compiler bug or from different ones? This is the bug identity problem.

Why does bug identity matter? Can't we just fix bugs one by one? Sometimes, fixing one bug will inadvertently fix another, suggesting that they have a common root cause.

First, if you have 100 issues in your bug tracker, it helps to know whether any of them are related. This aids in planning, as it helps estimate how many patches we really need to resolve these 100 issues.

Second, there are multiple ways to incentivize the community to report bugs for some sort of compensation. When two people report the same bug, how do we ensure fair compensation? It is hard to say whether they have reported the same bug at all! All reward strategies seem flawed:

A "first-come, first-rewarded" approach does not work because new reports may be related to already reported but unresolved bugs.
Rewarding both reporters equally is problematic because one can spam the system by generating numerous programs triggering the same bug, receiving disproportionately large rewards or reducing the reward for the original reporter if the total reward pool is fixed.
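
The arithmetic behind the second strategy can be sketched in a few lines of Python. The function names and the numbers are invented for the illustration; they do not describe any real bounty program.

```python
from fractions import Fraction


def original_reporter_share(pool, duplicate_reports):
    """Fixed pool, every report of the same bug rewarded equally:
    the original reporter's share shrinks with each duplicate."""
    total_reports = 1 + duplicate_reports
    return Fraction(pool, total_reports)


def spammer_payout(reward_per_report, spam_reports):
    """Fixed reward per report: the payout grows with each generated
    program that triggers the same bug."""
    return reward_per_report * spam_reports


print(original_reporter_share(1000, 0))   # no duplicates
print(original_reporter_share(1000, 99))  # 99 generated duplicates
print(spammer_payout(50, 200))
```

With a pool of 1000 and 99 generated duplicates, the original reporter keeps 10; with a fixed reward of 50 per report, 200 generated reports cost 10000.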

Furthermore, compiler users are often application developers, not experts in programming language semantics or programming language theory. They may not understand why, when they report something that is clearly a bug to them, the compiler experts scratch their heads and come up with an excuse – obviously because they don't want to pay or are ignorant.

Languages like C and C++ are rife with "bugs" of this kind. Many rather arcane behaviors are only recognized by a select few, well versed in the language standards. For instance, using unions for type punning, where one union field is written and another is read, is undefined behavior according to the C++ standard (C11 allows it, but the bytes you read back then depend on the platform's object representation).

#include <stdint.h>

union array {
        char as_chars[8];
        uint64_t as_uint64;
};

union array my_union;
my_union.as_uint64 = 42;

// reading the byte at index 3 of the variable holding 42
char n = my_union.as_chars[3];

In this case, the C++ standard deems the read undefined behavior, meaning the developer should not expect consistent behavior. Therefore, if a developer finds that the variable n equals 999 in their code, regardless of the value written to my_union.as_uint64, the joke is on them – the language standard is on the compiler's side.
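
Separately from the question of undefined behavior, the byte found at a given index of an integer depends on the platform's byte order. The Python sketch below only models the two common memory layouts with `int.to_bytes`; it says nothing about what a C or C++ compiler is allowed to do with the union.

```python
value = 0x0102030405060708

# The same 64-bit value laid out in memory in two byte orders.
little = value.to_bytes(8, byteorder="little")  # 08 07 06 05 04 03 02 01
big = value.to_bytes(8, byteorder="big")        # 01 02 03 04 05 06 07 08

print(hex(little[3]))  # 0x5
print(hex(big[3]))     # 0x4

# For 42, the byte at index 3 happens to be zero in both layouts.
forty_two_little = (42).to_bytes(8, byteorder="little")
forty_two_big = (42).to_bytes(8, byteorder="big")
print(forty_two_little[3], forty_two_big[3])  # 0 0
```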

One potential way to define bug identity is to say that two bugs are the same if one cannot reasonably be fixed without also fixing the other. This definition is imperfect, as the term "reasonably" is vague. In theory, we could patch specific cases without addressing the root cause, but this is impractical for compilers, as similar bugs will continue to arise.

In the end, bug identity is hard to formalize. Calling in an expert may be the best option, relying on open criteria to judge whether two inputs trigger the same bug or different ones. Some cases are clearly not bugs, while most fall into a gray area.

Bug severity

After distinguishing between two bugs, the next step is to prioritize which one to fix first. This is the bug severity issue.

Compilers are usually tested using a code base containing both synthetic tests and real projects. It makes sense to use the real projects affected by the bug as a part of work prioritization.

However, the identity problem complicates this process: decision-making requires comparing bugs across different input programs. How do we identify duplicates? How do we tell if the same bug is affecting two programs or if two different bugs are at play?

As mentioned earlier, automating bug identification and comparison is not feasible, meaning expert judgment is necessary. Malicious users who understand compiler bugs may submit thousands of inputs that trigger the bug, flooding developers with AI-generated bug reports. This can amount to a real-world denial-of-service attack on the development team. Fairly rewarding bug reports under these circumstances becomes nearly impossible.

Conclusion

Most programming languages lack formal, unambiguous specifications, making it difficult to define correct program behavior and identify bugs in compilers.

Comparing bugs is challenging because they stem from program behavior. Programs may have multiple possible behaviors and the compiler will select one of them. Since program equivalence is not decidable for Turing-complete languages, we can't reliably detect when the compiler's output changes but the behavior remains the same.

In the next post, I will explore the challenges faced by compilers for platforms where computations are paid for with gas, particularly when the compiler has multiple backends with different gas models.

My note-taking process

Thu, 19 Jan 2023 00:00:00 +0100

I love being efficient, and there is nothing like the feeling of wholesomeness at the end of the day when you realize that you have not wasted it. I am not incredibly productive, but I am more productive than some people, including myself for practically my whole previous life.

I attribute my recent increase in productivity to my knowledge acquisition system. It is by no means a perfect system, and it is always a work in progress, but it has stuck with me for months now, which is a good sign. Additionally, it has brought order to my life, eliminated much friction and helped systematize all my intellectual activities. Maybe this post will inspire you to create a similar system, or improve yours.

Overview

The main components of my system are:

Zettelkasten, stored in a git repository and organized in Emacs with org-mode and org-roam.
Book storage: cloud storage with digital articles and books, and a bookshelf with physical books.
An Android tablet with a stylus to keep a working set of documents and annotate them.
An e-ink reader, which mirrors a part of my collection of books.

The main stages of my workflow are:

Reading/watching and making notes as I go.
Revising, copying notes to my long-term note storage.

Now, to the details.

Data flow

Most of the time, I study by reading books, articles, and blog posts. Watching videos is less convenient, but sometimes I have no choice. I will illustrate the workflow on an example of a physical book.

Introduction to the source

First, I need to get a general idea of what is happening in the book. Linear reading is inconvenient for the task, so we need to approach the text structurally.

To begin, I look at the table of contents, skim through the pages, and read the section headers and sometimes the section introductions. It is enjoyable doing this with a physical book!

Often, the book is either not entirely new to me, or only some chapters are useful, so it is necessary to identify them before reading. This is most important if the book is not an easy read: sometimes, getting through every single page may require extensive research and exhausting thinking.

My goals at this stage are:

Familiarize myself with the book structure.
Ask Google/ChatGPT about the meaning of unknown technical terms.
Identify my goals in reading this specific book.
Think for a moment about the questions that appear in my head after skimming through the book.

Some books go the extra mile to guide the reader. For example, "The Network State" starts with the sections "The book in one sentence", "The book in one image", "The book in one thousand words", and "The book in one essay". This is a didactic approach that I also use with my students: guide them to ideas through a series of increasingly detailed approximations.

Annotating the source

Once I have a general idea of what to expect from the book, I start reading and annotating it. As a kid, I read fast but memorized little, so now, if I am serious about working through a text, I annotate it. I use a set of four pencils and four highlighters in four colors: violet, blue, red, and green. When something piques my interest, I highlight a fragment of text with one of these colors and write a comment in the margin in the same color. This way, a commentary is matched to a fragment.

There is a logic to my color selection:

Violet is for summarizing. It is well visible on paper and also distinguishable from black, so it does not fuse with the text.

It is not useful to highlight without commenting. Writing a summary is mandatory, highlighting is optional ¹.
Blue is for connections. It reminds me of hyperlinks on Web pages. With blue I mark reasonable connections to other topics, writers, or books. Connections greatly enhance retention.
Red is for questions and critique. It attracts the most attention and is associated with danger. I use it if I do not agree with the author, or if I do not understand a fragment.
Green is for everything else. It is a safe, somewhat detached color. If my curiosity suggests a possible connection to my previous readings, or if I get an idea on how to solve some problem, an analogy, or a metaphor, then it is worth noting down.

I never wait to finish one book before starting another, so my working set of sources is usually large. Reading many books on the same topic helps establish more connections. Sometimes, it also enhances understanding, since I am exposed to several explanations of the same concepts, often from different points of view.

Every now and then I revise the notes to add information to my permanent note storage.

Storing notes

The core of my system is Zettelkasten. It is a personal Wiki with small subjective articles and annotated subjective connections between them. Each "wiki article" is a note on something: a concept, an idea, a book, an article etc. After some reading, it is time to revise my notes, skim through the book again, and put the bits of notes into the appropriate notes in Zettelkasten.

I like taking notes in two stages:

It forces me to come back to notes in a systematic way. I re-read, re-discover things that I missed, and memorize information better.
It combines the advantages of hand-writing and digital note-taking.
- Handwriting is good for retention; sheets are more "personal", it stimulates thinking.
- Digital note-taking is fast, notes are easier to systematize, much easier to edit and illustrate, they can benefit from the version control systems etc.

I use emacs with org-roam to keep my Zettelkasten organized, search through it easily, and explore it with org-roam-ui.

Here is an example of a note:

… and its source code:

:PROPERTIES:
:ID:       22AF419C-C79F-42A6-B7E9-CF7A27DAD5E0
:END:
#+title: System
#+filetags: #system-engineering

* Definition
A system is something that consists of parts, functions and has emergent properties.

*  Other definitions

#+begin_quote
\dots any organized *assembly* of resources and procedures united and regulated by
*interaction or interdependence* to accomplish a set of specific functions
-- DoDAF (Department of Defense Architecture Framework)
#+end_quote


#+begin_quote
\dots a construct or collection of different entities that together produce results
not obtainable by the entities alone.
-- INCOSE
#+end_quote



Human-crafted systems have goals.

* Related

- In order to create systems it is necessary to engage in
  "[[id:4C0E5272-CC97-40B5-BA56-FD15F83B23B8][Systems thinking]]"

#+setupfile: org-header.org
#+created: [2022-11-18 Fri]
#+last_modified: [2022-11-18 Fri 12:16]

The note contains:

An identifier for org-roam (generated automatically)

:PROPERTIES:
:ID:       22AF419C-C79F-42A6-B7E9-CF7A27DAD5E0
:END:

Title
Tags (we can then search by them)
Sections with content. Text supports markup: underline/bold/italic, monospace, strike-through and the like; it may contain pictures (there is a convenient way of including a screenshot of a part of the screen through org-download-screenshot).
A special section for connections to other notes. Personally, I am also putting links to other notes directly into the body of the note. The special section is valuable for providing subjective annotations for the links: why do I think there is a connection between this note and another one.

I used to have two sections: for links to the sources, and for related notes, but now I tend to just merge both into "Related".

I keep most metadata out of the way by putting it in the end of the note.
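
Because a note is plain text, its parts are easy to pull out with other tools as well. The following Python sketch is only an illustration of the note layout described above: it is not how org-roam itself reads notes, and the function name `parse_note` and the regular expressions are assumptions made for this example.

```python
import re

# A shortened copy of the note shown above.
NOTE = """\
:PROPERTIES:
:ID:       22AF419C-C79F-42A6-B7E9-CF7A27DAD5E0
:END:
#+title: System
#+filetags: #system-engineering

* Related

- In order to create systems it is necessary to engage in
  "[[id:4C0E5272-CC97-40B5-BA56-FD15F83B23B8][Systems thinking]]"
"""


def parse_note(text):
    """Extract the identifier, title, tags and id-links from a note."""
    note_id = re.search(r"^:ID:\s+(\S+)", text, re.MULTILINE)
    title = re.search(r"^#\+title:\s*(.+)$", text, re.MULTILINE)
    tags = re.search(r"^#\+filetags:\s*(.+)$", text, re.MULTILINE)
    # Each link looks like [[id:TARGET][description]].
    links = re.findall(r"\[\[id:([^\]]+)\]\[([^\]]*)\]\]", text)
    return {
        "id": note_id.group(1) if note_id else None,
        "title": title.group(1).strip() if title else None,
        "tags": tags.group(1).split() if tags else [],
        "links": links,
    }


parsed = parse_note(NOTE)
print(parsed["title"], parsed["tags"], parsed["links"])
```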

Additionally, emacs lets us explore the connected graph of notes in 3D, filter them by tags, preview them nicely etc. thanks to org-roam-ui. It looks like this:

What about digital sources?

Most of my books and articles are in digital format, so I read a lot on my tablet as well. For that, I bought a Samsung Galaxy Tab S7 Fan Edition this summer, and it works really well for me:

The screen is big (and I never felt like I had enough screen space for comfortable handwriting before).
Samsung S-Pen feels good to me, better than Apple Pencil. The feel is a bit closer to a fountain pen, which is my writing tool of preference on paper.
The performance is adequate for everything I need.

As for software, I found that Flexcil works best for me. It does not choke when I open 40 tabs with documents, stays responsive and handles even huge books like Differential geometry reconstructed, which has around 2500 pages as of today.

The editing capabilities are not great, e.g. you cannot rotate your handwritten text, and the Android version lacks a good synchronization mechanism – you basically have to do backups and restore them. The backup process is fast though, so I just back up my notes to the cloud every day. Once these features get implemented, Flexcil will become an ideal tool for me.

I also read for pleasure, without annotating. For that I still prefer using my old e-ink reader, PocketBook 902. E-ink displays are easier on my eyes. It has now turned 12 years old, and still works like a charm.

At some point I was considering buying an e-ink tablet for annotating, but I find that having color on annotations is too valuable. The current e-ink tablets were not convincing to me.

Pitfalls

Do not try to create a hierarchy (taxonomy) for your notes. It is not scalable and only works for a narrow domain. Prefer tags.
I did not perceive this system as too complicated until I wrote this post. Start small, build a habit, grow as you see fit.
Without a habit, your Zettelkasten will stay empty. You may need to force yourself to stop reading at some point, revise your notes and put them to the permanent database. Do not delay this too much, or you will not be recalling the annotated texts, but rather reading them anew.
Redundancy is good. You will naturally put the same information in many notes, and it is all right.
Drawing is great. Emacs and org notes support .svg files, but you can also include photos or screenshots of your hand-drawn diagrams, why not.
Think carefully about which software you will be using, if any. It is great to have a second brain for life. My first digital notes were in some note-taking app for Symbian OS, then in Evernote. These programs are proprietary and do not always have an easy way for you to migrate your notes somewhere else. Emacs (and text files) is clearly not everyone's darling, but it is free, open-source, and has been around for longer than I have been alive. It means that it is flexible enough to adapt to a new technological context: we are not using the same computers with the same operating systems as in the 80s anymore, and emacs runs fine on basically everything. I am quite certain that:
- either I will be able to keep my notes there for life,
- or I will be able to migrate them somewhere, should such need arise.
Figure 1: My favorite pen since my teen years is Sheaffer Fashion.

Footnotes:

¹ Most texts are not too dense and include a fair percentage of redundancy, so you can summarize them in a useful way. There are, of course, exceptions, for example, Tractatus Logico-Philosophicus by Ludwig Wittgenstein, where the book is a structured tree of key points of his world view from that time. In this case I had to keep the annotations separate from the book.

Programming languages and computer systems (1)

Tue, 26 Jul 2022 00:00:00 +0200

A couple of years ago I started writing my second book, whose main aim is to teach students the elements of computer system design, broadly speaking. Besides discussing the notion of systems, complexity, and approaches to managing systems' complexity in a meaningful way, I want to give special attention to programming languages. Source code transformed into a running program is an important part of a system; language design and software design affect the resulting system in many ways.

The book remains largely unfinished, but as a means to push myself to complete it and to share the first versions of the chapters, I will post them here.

The first chapters introduce the notions of systems and computer systems. Software and hardware engineers can greatly benefit from studying systems: system design techniques can be applied to designing all kinds of software, hardware, hybrid systems, distributed systems, and even applied to developer teams, which are systems themselves. However, we are not focusing on the systems theory itself, and many ideas can be expanded or made more precise.

Systems around us

In our lives we encounter many collections of objects that we call systems. A system is a collection of components interacting with each other and with the outside world, giving the system additional properties that individual components do not possess. Such properties are called systemic or emergent.

This drawing shows a system of three components, denoted A, B, and C. Component B interacts with A and C, while A and C do not interact directly with each other. A big circle is separating the system from the outside world.

Figure 1: A system of three components: A, B, and C.

The world around us is filled with systems. Here are some examples:

Cars can transport passengers and objects. This ability is a systemic property of cars. No part of a car can transport objects like a whole functioning car. Moreover, when the car is assembled but not running, it stays still. Only the interaction between the parts of a car allows it to serve as transport.
Living humans are systems. Unlike in a corpse, our cells are interacting. Our ability to walk, think, breathe, and be alive and conscious is our systemic property.
Computers will not perform any work unless powered up and turned on. The ability to compute is their systemic property. It is born from interactions of software and CPU, memory, storage, buses, and other hardware components.
A team of software or hardware developers is unable to produce a complex piece of software unless all participants interact with each other, making sure that they are on the same page. The ability to produce an output together through solo work and interactions is a systemic property.

In this part we will continuously revisit these four examples: cars, humans, computers, and teams.

There is a saying: "a system is more than a sum of its parts." Indeed, a system is not only a sum of its parts, but also of interactions between them. Disregarding the interactions takes the emergent properties out of the picture.

Systems emerge in many areas of our lives, e.g., sociology, biology, physics, economics, computer science, and engineering. Systems theory has become an interdisciplinary field studying systems models at a high level of abstraction, disregarding specific domains.

Environment

Systems do not exist in a void; they interact with the world.

Cars interact with the driver and the road, the trees around them, other cars.
Humans interact with other humans, sounds, visuals, the air we breathe, and the food we consume.
Computers interact with other devices, with humans through screens, keyboards, and other interfaces.

The environment is a collection of objects outside the system with which the system engages in meaningful interactions.

Figure 2: In the same system, B is an interface, and E, F, G are parts of the system environment.

The context of our analysis dictates which objects should be part of the environment. The car might, in some cases, interact with blue whales, sound waves, bacteria, or asteroids, but most analyses can safely ignore them. We include an object from the outside world in the environment only if the system interacts with it and this interaction is essential for our analysis.

The environment usually includes an organized part and a chaotic part. Sometimes it may be viewed as a system on its own, consisting of multiple systems and interactions between them. If we design our own system, we often want to protect it from interacting with the chaotic part and set up the correct interactions with the organized part.

Usually only a handful of the system's components are interacting with the environment directly, and we call them the interface of the system. Here are some examples:

The interface of a car includes its controls and wheels.
The interface of a human includes the eyes, ears, and skin.
The interface of a computer includes external devices, e.g. keyboards, screens, mice, network adapters.

In other words, the interface consists of components that can be viewed either as parts of the system or as parts of its environment.

The interface can vary depending on the system's function: if we consider a verbal conversation between two humans, their interface is the spoken word; however, when these humans dance together, the interfacing happens through visual contact and touch.

An interface is like a border between the system and its environment. The border consists of the parts of the system that can be attributed to either the system itself or its environment. It is not always easy to draw this border precisely; it may be blurred.

Computer systems

In computer systems, we peek through the interface of a computer system and observe the processes inside. Then we interpret our observations, and this interpretation is what constitutes computations.

Examples of computer systems in nature

To demonstrate the importance of our interpretation, let us study some systems where computations happen in unorthodox ways.

Computers are deterministic: unless they incorporate a dedicated hardware component to generate randomness, they cannot achieve true non-determinism. Computers can imitate randomness through clever mathematical trickery, but such numbers, although they seem random, are predictable. Truly random numbers, however, have numerous uses, especially in cryptography.
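The predictability of pseudo-random numbers is easy to observe. Below is a minimal sketch using Python's standard random module; the helper name pseudo_random_sequence and the seed value are my own choices for illustration. Two generators started from the same seed produce exactly the same "random" sequence.

```python
import random


def pseudo_random_sequence(seed, count=5):
    """Return `count` pseudo-random integers produced from the given seed."""
    generator = random.Random(seed)  # a private generator, independent of the global one
    return [generator.randint(0, 99) for _ in range(count)]


first = pseudo_random_sequence(seed=2022)
second = pseudo_random_sequence(seed=2022)

# Whoever knows the seed and the algorithm knows every number in advance.
print(first == second)  # True
```

This is why a pseudo-random generator is only as unpredictable as its seed.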

One good way of generating truly random numbers is an online service located at www.random.org. It uses electromagnetic noise from the atmosphere of Earth as the source of randomness for its generator. Such numbers are much better suited to cryptography than pseudo-random numbers generated by a traditional computer.

We can say that the part of random.org that performs computations is the atmosphere of Earth itself. It may seem strange because computation results come from observing nature, not through computations on a human-made computer. However, observing atmospheric noise and interpreting our observations, then plugging it into a more conventional computer system, provides us with numbers with very particular properties. What is it, if not a computing process?
Another example is pulsars. Pulsars are stars at the late stages of their life, emitting bursts of light with high and very stable frequency. Therefore, they can be used as a clock: we count the bursts and know how many milliseconds have passed. Counting time is also computation.
The third example is ant colonies. Ants are searching for food in a process that can be described as a random walk. As they move, they mark their steps with a scent of pheromones.

When an ant worker reaches food or another resource, it goes back to the ant colony, tracing back its steps. Thanks to the smelly footprints, it can go back following precisely the same route from the ant colony to food.

However, its initial path can be far from optimal. For example, the ant may walk a full circle, so its path will contain loops. Alternatively, it can go uphill instead of going around the hill.

Nature seeks efficiency, and ants use their capacities to recognize stronger and weaker smells to optimize their paths. The smell gradually evaporates, so the path to the ant colony has a weaker smell. Similarly, the path to the food was taken more recently and thus has a stronger smell. Turns out, this information is sufficient to shorten paths substantially.

So, if we put a piece of food in the proximity of an ant colony, we will find an optimal path from this colony to food. All of it — by observing ants, without any circuitry. This is a kind of computation that does not look different from something computed on your laptop. Actually, ant colonies have inspired some path search algorithms.
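To make the idea of shortening a path a bit more tangible, here is a small sketch of one piece of it: removing loops from a recorded walk. It is neither a model of what ants actually do nor an ant-colony algorithm; the function erase_loops and the grid coordinates are assumptions made for this illustration. It only shows that a walk with loops already contains enough information to derive a shorter route.

```python
def erase_loops(path):
    """Remove loops from a walk: when a position repeats, drop the detour in between."""
    shortened = []
    last_seen = {}  # position -> its index in `shortened`
    for position in path:
        if position in last_seen:
            # We have been here before: forget everything walked since then.
            cut = last_seen[position] + 1
            for dropped in shortened[cut:]:
                del last_seen[dropped]
            del shortened[cut:]
        else:
            last_seen[position] = len(shortened)
            shortened.append(position)
    return shortened


# The walk leaves the colony at (0, 0), makes a loop through (2, 0), (2, 1), (1, 1)
# and comes back to (1, 0) before heading to the food at (2, -1).
walk = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (1, 0), (1, -1), (2, -1)]
print(erase_loops(walk))  # [(0, 0), (1, 0), (1, -1), (2, -1)]
```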

To sum it up, computations can be performed in a lot of ways, and interpretation plays a huge role in it. Sometimes nature performs computations without computers, and we interpret natural processes as computations.

Programs as systems

Writing programs is of particular interest to us. What is the relationship between programming and computer systems?

When we program, the outcome of our work is the source code of a program. It may be executed through interpretation, it may be compiled and then executed in a different form.

Systems are alive, their parts are interacting, and new properties emerge from these interactions.

The source code is dead. It is akin to a manual on how to assemble a piece of furniture, and someone or something else has to do the actual work.

However, a running program becomes a part of a hybrid hardware-software computer system. It is impossible to understand its behaviour without thinking about hardware, software, and their interaction. By reducing the system to just software or just hardware we lose sight of the other part, and we also lose their interactions.

The source code relates to the running program as a furniture assembly manual relates to the system of assembler, manual, and furniture parts. This system does the work of assembling the furniture.

The blueprint of an airplane is also like the program source code. If we load the program in memory and prepare for its execution, it will be an actual plane standing on the ground. A running program will be a flying airplane.

Source code	Program
airplane blueprint	flying airplane
furniture assembly manual	working furniture assembler

Functional and structural decomposition

The system of interest may be very complex. It may be difficult to decompose, i.e., identify its parts. There are several distinct ways of doing it.

Structural parts

The first way is to decompose the system structurally. It is useful for the systems assembled from isolated parts, like a car or a house.

Wheels, seats, gearbox are all parts of a car.
Bones, muscles, internal organs are parts of a human.
Processor, motherboard, memory chips are parts of a computer.

The second, more useful way is to decompose the system functionally. Each part is identified by its role: what is its function in the system? This is not the same as decomposing the system structurally, because we only care about the functionality.

Functional components

One structural part may participate in multiple interactions and play different roles.

The steering wheel, knobs, car computer, gearbox, power steering and other parts of a car assist the driver in controlling the car; that is their function. Some of these parts also have other functions, like the car computer, which may regulate the fuel consumption. So, the car can be functionally decomposed into multiple parts, two of which are:
- A part to assist the driver.
- A part to regulate the fuel consumption.
It is impossible to isolate these two parts one from another; therefore it is not a structural decomposition.
Bones are structural parts of humans, but their roles are many: they support us, they protect bone marrow, they allow muscles and tendons to function effectively.
A hard disk drive may be a structural part of a computer. It has many roles, e.g., it is a data storage hosting partitions with filesystems; it also participates in the functioning of virtual memory, as we will see in Chapter 6.
Scissors are another good example of a system whose functional and structural decompositions differ. They cut paper better than a simple knife because of their systemic properties. Structurally, scissors are composed of two pieces of metal and a screw that connects them. Functionally, the same scissors consist of two parts: one is used to hold them, whereas the other cuts paper. Not only do scissors have more structural parts than functional ones, we also cannot map structural parts to functional parts one-to-one.
Consider a team, a system of people working on the same project. Structurally, every person is a part of such a system. Functionally, the parts of this system are roles: programmers, designers, managers, testers etc. One of the differences between the two decompositions is that one person can play several roles (e.g. be a programmer and project manager, or programmer and graphics designer).

Functional view of systems

The functional way of viewing systems is considered more effective for engineering computer systems, including programs. It emphasizes the important aspect of systems: their functionality. This functionality may then be implemented in software, in hardware, or in a hybrid form. Later we will study the example of virtual memory; it is a mechanism implemented by an operating system that leverages certain hardware features. Virtual memory is not exclusively hardware or software.

Imagine we are building a system that requires intensive computations, like ray tracing in computer games. Its evolution may look like this:

The starting point is a simple single-threaded algorithm to perform ray tracing on a single CPU. To speed it up, we provide the fastest CPU available.
The algorithm is optimized on multiple levels to work as fast as possible.
The algorithm is parallelized so that ray tracing is performed in multiple threads.
The computations are moved to a dedicated graphics card. This requires redesigning the algorithm. The graphics card is often a good computation unit to run highly parallelized algorithms.
If a graphics card is not efficient enough, or if it draws too much power, we may design our own chip. The whole algorithm is then encoded as a circuit on an FPGA (field-programmable gate array), which allows us to maximize its speed and minimize the energy consumption.

At all these stages of its life, the functional decomposition of the system stayed the same, but its structural decomposition changed considerably. It is therefore more effective to think about the functionality and roles of system parts than to limit yourself to a fixed implementation straight away. The implementation should follow the functionality and various requirements, not the other way around.
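The same point can be shown in miniature. In the sketch below the function shade is a made-up stand-in for tracing one ray, and the two render functions are hypothetical names for the first and the third stage of the evolution above. The role ("turn pixels into colours") stays the same while the structure underneath changes, and the caller cannot tell the difference.

```python
from concurrent.futures import ThreadPoolExecutor


def shade(pixel):
    """Stand-in for tracing one ray: any pure function of the pixel coordinates."""
    x, y = pixel
    return (x * x + y * y) % 256


def render_sequential(pixels):
    """Single-threaded stage: one thread walks over all pixels."""
    return [shade(pixel) for pixel in pixels]


def render_threaded(pixels, workers=4):
    """Parallelized stage: the same function, spread over several threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(shade, pixels))  # map preserves the input order


pixels = [(x, y) for y in range(8) for x in range(8)]
image_a = render_sequential(pixels)
image_b = render_threaded(pixels)
print(image_a == image_b)  # True
```

Moving the computation to a graphics card or an FPGA would replace the body of the render function once more, still without changing what it is for.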

Examples from computer systems

To get accustomed to structural and functional decompositions we will study examples from computer systems and programming.

The first example is a cursor; we all see it constantly on the computer screen as we point at things, drag them or otherwise interact with them. The cursor is a functional component, not a structural one. It is not equivalent to a mouse, a graphics tablet or any other device, because a program can take control over the cursor and move it around.
A lot of programs are being executed simultaneously on your computer. They coexist and are isolated from one another through virtual memory. The function of virtual memory is to store data. The virtual memory space itself consists of areas of forbidden addresses, files mapped from the hard drive, anonymous pages and so on. The virtual memory is backed by the physical memory, storage (in case of memory swapping or mapping files) and the operating system, which ensures transparent operation with it.

It does not seem possible to divide a computer system into structural parts in a way that isolates one virtual memory space from all others, so it is not a structural component. Another example is a process; we have all seen a list of processes in a task manager. A similar argument applies to it: we cannot divide the computer into structural parts so as to isolate one process. A process contains a virtual address space too, parts of which are shared among processes. So a process is also a functional component.
The final example is storage: its name alone suggests its function and hence that it is a functional component. It can be implemented as a piece of hardware, but the data can also be stored in a distributed filesystem, in a distributed database, or in the cloud. In cloud storage it is not clear where a specific piece of data is stored, because factors like replication blur its location.

Don't think twice about sending me a message if something caught your eye :)

Job offers [RnD, compilers, PLT, systems] are welcome

Tue, 22 Mar 2022 00:00:00 +0100

The time has come for me to work in Europe again.

For the last couple of years I have been residing in Poland and France, but my main occupation was reforming a curriculum for system architects at ITMO University in Saint Petersburg, Russia. Our idea was to create a novel curriculum in Computer Science and Engineering tying together the science of complex systems, programming, computer system architecture and strong mathematical foundations. Quite a task if you ask me! The works connecting complex systems and software/system engineering are quite scarce at the moment.

We were lucky to start assembling a great team of young and more seasoned professionals and passionate, smart students. The level of education went up considerably; we have provided a vision and clearer goals towards a unique curriculum. Sadly, due to the war I have to limit my engagement in this project, and hence I am looking for work to hone my hard skills.

I am looking for a job in Warsaw, France (remotely) or anywhere in the EU (also remotely, or mostly). My main areas of interest include compilers, programming language theory, formal semantics, programming language design, and system design. I like problems related to organizing code and systems; I know and understand the low level reasonably well (and have even written a book about it). I would love to do some R&D requiring some interesting mathematics and writing code. I do love maths and studying; a big part of my life's problems were consequences of poor work/life balance, but thankfully this part is behind me.

I studied at ITMO University and the Academic University of Saint Petersburg. Then I did research at IMT Atlantique in Nantes, where I formally proved the correctness of some refactorings of the C language, using the formal semantics of the CompCert verified C compiler.

[my CV]

Write me an email if you have a project where I could be of use.

When functions dissolve

Sat, 12 Dec 2020 00:00:00 +0100

In this post we are going to explore how the notion of function loses its significance as an abstracted module of program logic when we compile a higher level imperative language to the assembly code.

In imperative programming, functions isolate pieces of program logic. For example, in a C-like language, the following function returns its argument increased by one.

  int f( int x ) {
     return x + 1;
  }

Ideally, each function should have a single, well-defined goal and only contain the code both necessary and sufficient to accomplish the said goal.

Modularity

Imagine we have several programs running in parallel on the same computer. Each program is a component of this computer system.

If these programs are threads of the same process, they are sharing the same address space. Therefore, one program can affect the functioning of another by corrupting its memory structures, either deliberately or by mistake. On the other hand, if each program is running in a separate process, it has its own virtual address space. Then an error in one process will not influence the memory of another process, provided they are not engaged in any other interactions. There are of course details to that, like what happens when a process crashes while another one is waiting for its response, but such situations can be dealt with in a reasonable way, e.g. through timeouts.

There is a concept of modularity in system design. It means that to construct a system we start with smaller building blocks, which are built in isolation. Then we assemble the bigger system from these blocks, setting up connections between them.

In the example, we see two degrees of modularity. When programs share their address space, the modularity is softer, it does not prevent unintended interactions from happening. An error in one program (module) can then affect other programs, disrupt their functioning and lead to more errors. This error propagation is out of control and leads to unpredictable consequences.

On the other hand, programs in different processes are less fragile. Then the modularity is stronger.

To sum it up:

weak, soft modularity: errors inside one module can propagate to other modules bypassing interfaces.
strong modularity: modules are more isolated and the error propagation is limited to the interfaces.

These two notions are pure extremes; real systems often fall somewhere in between.
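The difference between the two degrees can be imitated in a few lines. This is only a sketch: the dictionary plays the role of memory, buggy_worker is a made-up faulty module, and a deep copy stands in for the separate address space a real process would get.

```python
import copy
import threading


def buggy_worker(memory):
    """A faulty module: it overwrites data that belongs to someone else."""
    memory["accounting"] = "garbage"


# Soft modularity: both "programs" are threads working on the very same memory.
shared = {"accounting": [100, 200], "buggy": []}
thread = threading.Thread(target=buggy_worker, args=(shared,))
thread.start()
thread.join()
print(shared["accounting"])  # garbage

# Stronger modularity, imitated by handing the faulty module its own copy,
# roughly as a separate process gets its own address space.
isolated = {"accounting": [100, 200], "buggy": []}
private_copy = copy.deepcopy(isolated)
thread = threading.Thread(target=buggy_worker, args=(private_copy,))
thread.start()
thread.join()
print(isolated["accounting"])  # [100, 200]
```

In the first case the error crossed the module boundary without going through any interface; in the second the damage stayed inside the copy.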

Functions as modules

We are used to thinking about functions as the building blocks of program code. Functions abstract away the complexity: a caller provides them with arguments, and the function yields the computation result plus side-effects, like file output.

In a high level programming language, functions give us some reasonable modularity (if we do not use global variables or resources). Any function can communicate failure to its caller through returning errors: NULL, instances of optional types, or through exceptions.
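As a small illustration of failure reported through the interface, here are two variants of the same hypothetical function (the names and the port-number task are invented for the example). One signals failure with an optional result, the other with an exception; in both cases the caller learns about the problem only through the function's own interface.

```python
from typing import Optional


def parse_port_optional(text: str) -> Optional[int]:
    """Report failure through the return value: None means "no result"."""
    if text.isdecimal() and 0 < int(text) < 65536:
        return int(text)
    return None


def parse_port_raising(text: str) -> int:
    """Report failure through an exception."""
    port = parse_port_optional(text)
    if port is None:
        raise ValueError(f"not a valid port: {text!r}")
    return port


print(parse_port_optional("8080"))  # 8080
print(parse_port_optional("http"))  # None
```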

Higher level functions and lower level subroutines

The source code is eventually compiled into machine code. Without too much oversimplification, functions in higher level languages are translated into functions in machine code. To make a distinction between them we will call these lower level assembly functions subroutines.

Subroutines should support two kinds of operations:

we should be able to call subroutines from anywhere, and
subroutines should be able to return to the place where they have been called.

On common architectures, there are instructions or instruction patterns that are used to call subroutines and return from them, like call / ret on Intel 64, or bl / pop pc on ARM.

Subroutines are different from functions in one important way: subroutines only operate in a common memory space and do not have their own isolated piece of memory. Functions have local variables and arguments; subroutines can allocate space on stack just for themselves, but are forced to use CPU registers, which are global and shared among them.

To make an analogy with higher-level language, imagine that:

all your functions accept zero arguments;
you are not allowed to create local variables;
you have a limited amount of global variables;
you have to repurpose the same global variables in different functions.
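Here is how this analogy might look in practice. The sketch below is Python pretending to be assembly: the dictionary registers and the list stack are stand-ins I made up, and the functions take no arguments and keep no local variables. The first caller loses its constant because the callee reuses the same "register"; the second one saves and restores it around the call.

```python
# Hypothetical "registers": the only storage our pseudo-subroutines may use.
registers = {"r1": 0, "r2": 0}
stack = []


def double_r1():
    """Doubles r1, using r2 as scratch space."""
    registers["r2"] = registers["r1"]
    registers["r1"] += registers["r2"]


def double_and_add_ten():
    """Meant to compute 2 * r1 + 10, keeping the constant 10 in r2."""
    registers["r2"] = 10
    double_r1()  # silently overwrites r2
    registers["r1"] += registers["r2"]


def double_and_add_ten_saving():
    """The same computation, saving r2 on the stack around the call."""
    registers["r2"] = 10
    stack.append(registers["r2"])
    double_r1()
    registers["r2"] = stack.pop()
    registers["r1"] += registers["r2"]


registers["r1"] = 3
double_and_add_ten()
broken = registers["r1"]
print(broken)  # 9, not the intended 16

registers["r1"] = 3
double_and_add_ten_saving()
fixed = registers["r1"]
print(fixed)  # 16
```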

Because of this, modularity supported by subroutines is even "softer" than the function modularity. I wonder if, in this case, we can talk about any modularity at all (except that each subroutine is written in one place in the source code which makes a subroutine a structural module). Maybe the better way to think about subroutines is to accept that they do not exist. Once we get there, we may better understand how assembly programs are written, how they function and what possibilities they offer us.

The rest of this post uses the assembly code as a device to tell about two interesting concepts: tail-call elimination and coroutines. We are going to optimize the subroutine print_newline in several stages up to the point that it is going to lose its shape as an isolated subroutine and get reduced to one instruction.

; Gets symbol code in rdi and writes it in stdout
print_char:
   ...
   ; the code of print_char is not important to us
   ...
   ret

; Prints newline character
print_newline:
    mov rdi, 0xA          ; code
    call print_char
    ret

The subroutine print_newline is just an adapter for print_char:

  void print_newline() {
    print_char(0xA);
  }

Tail call optimization

A tail call happens when a subroutine ends with a call to another subroutine. In a higher level language this happens in cases such as:

void g();

void f() {
    g();
    return;
}

This is also a tail call, because we just return what another function returns:

int g(int x);

int f() {
    return g(42);
}

This is not a tail call: after calling the function we have to multiply its result by another number. We wait until fact(n-1) completes its execution, and then use its result in other computations. Note that in this example the function calls itself rather than some other function, but this is unimportant.

int fact( int n ) {
  if (n < 2) return 1;
  return n * fact(n-1);
}

On the assembly level, a tail call corresponds to the following pattern of instructions:

f:
  ...
  call other
  ret

This pair of instructions is located in a function which we are going to call f. The control reaches this function from some other function, say, f_caller.

Let us follow the state of stack through the execution of f_caller, f and other. For that we expand this code to include f_caller and other functions.

f_caller:

...
   call f
    <next instruction> ; mark 4
...

f:            ; mark 1
  ...
  call other
  ret         ; mark 3
...
other:        ; mark 2
   ...
   ret

When we start executing f the stack holds the return address inside f_caller (we reach mark 1).
When we call other the stack holds the return addresses for f_caller, then f on top (we reach mark 2).
The subroutine other returns; at this moment only the return address for f_caller remains on top of the stack (we reach mark 3).
The subroutine f returns, popping return address of f_caller from the stack (we reach mark 4).

The last two steps essentially pop two return addresses from the stack consecutively. This suggests that the first one (the return address into f) is useless and we do not need to store it. This is indeed the case.

The call other instruction is equivalent to the pair of pseudoinstructions:

push rip    ; rip is the program counter register
jmp other

If we do not need to push the return address into f, then we can just replace this instruction with jmp and get rid of one ret:

f_caller:

...
   call f
    <next instruction>
...

f:
  ...
  jmp other
...
other:
   ...
   ret     ; will return directly to f_caller

The subroutine other becomes essentially the continuation of the subroutine f; we pass from f to other via a simple branching instruction.
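To check this reasoning, we can replay both versions on a toy machine. The interpreter below is a sketch written for this post, not a model of any real CPU: it knows only call, jmp, ret, halt and a placeholder work instruction that stands for the "..." in the listings, and its stack holds nothing but return addresses. The label after_call plays the role of mark 4.

```python
def run(program, entry):
    """Execute a toy program; return the labels visited and the deepest stack seen."""
    labels = {label: i for i, (label, _, _) in enumerate(program) if label}
    stack, max_depth, visited = [], 0, []
    pc = labels[entry]
    while True:
        label, op, arg = program[pc]
        if label:
            visited.append(label)
        if op == "call":
            stack.append(pc + 1)  # push the return address
            max_depth = max(max_depth, len(stack))
            pc = labels[arg]
        elif op == "jmp":
            pc = labels[arg]
        elif op == "ret":
            pc = stack.pop()  # continue at the saved address
        elif op == "halt":
            return visited, max_depth
        else:  # "work": a placeholder for the elided instructions
            pc += 1


with_call = [
    ("f_caller", "call", "f"),
    ("after_call", "halt", None),
    ("f", "work", None),
    (None, "call", "other"),
    (None, "ret", None),
    ("other", "work", None),
    (None, "ret", None),
]

with_jmp = [
    ("f_caller", "call", "f"),
    ("after_call", "halt", None),
    ("f", "work", None),
    (None, "jmp", "other"),
    ("other", "work", None),
    (None, "ret", None),
]

print(run(with_call, "f_caller"))  # (['f_caller', 'f', 'other', 'after_call'], 2)
print(run(with_jmp, "f_caller"))   # (['f_caller', 'f', 'other', 'after_call'], 1)
```

Both programs visit the same labels in the same order, but the second one never holds more than one return address.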

Coming back to our example, we can rewrite it like this:

print_char:
   ...
   ret

print_newline:
    mov rdi, 0xA          ; code
    jmp print_char

Tail recursion

What if other and f are the same function? Then the jmp is performed to the start of f making the whole thing into a loop:

f_caller:

...
   call f
    <next instruction>
...

f:
  ...
  jmp f

This is what we mean when we say that the compiler optimizes tail recursion into a loop.
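The same transformation can be written out by hand in a higher level language. The sketch below uses Python, whose reference implementation does not eliminate tail calls, so fact_loop shows manually what an optimizing compiler would do with fact_tail. The accumulator parameter acc is my addition; the original fact has no tail call to optimize.

```python
def fact_recursive(n):
    """Not a tail call: the multiplication happens after the recursive call returns."""
    if n < 2:
        return 1
    return n * fact_recursive(n - 1)


def fact_tail(n, acc=1):
    """Tail-recursive variant: the recursive call is the last action."""
    if n < 2:
        return acc
    return fact_tail(n - 1, acc * n)


def fact_loop(n):
    """fact_tail with the tail call replaced by a jump back to the start."""
    acc = 1
    while n >= 2:
        n, acc = n - 1, acc * n  # rebind the "arguments" and go around again
    return acc


print(fact_recursive(5), fact_tail(5), fact_loop(5))  # 120 120 120
```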

Coroutines

Our current example is the implementation of print_newline in two instructions:

print_char:
   ...
   ret

print_newline:
    mov rdi, 0xA          ; code
    jmp print_char

When the example is written like this it is easy to guess the next step. If the execution is sequential why bother with a jump here? The control routinely falls from one instruction to the next one anyway.

print_newline:
    mov rdi, 0xA          ; code
print_char:
   ...
   ret

The subroutines have merged here, and it does not look like there are two separate functions anymore. We got a subroutine with two entry points labeled print_newline and print_char. We call such a thing a coroutine.

Coroutines are a generalization of subroutines; they may have a state and multiple entry points. You may often encounter them or their variations in modern languages, e.g. generators in Python or lazy IEnumerables in C#. Our coroutine does not have a state, however. It is funny to note that coroutines first appeared as an assembly programming pattern and are not a fancy new thing like cubical type theory provers.
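For comparison, here is what the higher level cousin looks like. This is a minimal Python generator written for illustration (countdown is not from any library): unlike our assembly coroutine it does keep state, and every yield is a point where it can be left and entered again.

```python
def countdown(start):
    """A generator: each `yield` is a point where it suspends and can be re-entered."""
    current = start  # state that survives between entries
    while current > 0:
        yield current
        current -= 1
    yield "liftoff"


steps = countdown(3)
first = next(steps)
second = next(steps)
rest = list(steps)
print(first, second, rest)  # 3 2 [1, 'liftoff']
```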

If your language supports continuations, you can implement coroutines easily.

Conclusion

I do like to draw connections between seemingly distant notions and areas. I also enjoy learning new points of view on things I already know. My students inspired me to write this post because for a lot of them, coming from higher level languages, the functions were entities with well shaped bodies, having a visible start and end. Well, in assembly they do not have to.

It also supports a way of thinking about compiled code as a "soup" of sorts, where everything is blended together, reordered, precomputed, optimized until only the actual actions and their order suggest what the original program looked like. I think this is especially useful when reasoning about parallel programs and lock-free algorithms.

Writing essays

Sun, 16 Sep 2018 00:00:00 +0200

I am not a naturally good writer. It took me writing a 450-page book and countless pages of notes, reports, essays and posts to only start building a sane mechanism to identify the exact flaws in my writing (and in the text creation process itself). I found an introductory document on writing essays by J. B. Peterson to be of great use. I think it is valuable for someone like me who has to write a lot of text for his PhD, even though my domain is computer science. These are my notes on it.

General concerns

Eating well is important. Proteins and fats are good for breakfast.
You have a daily limit of several hours of productive work. This is why working every day is so important.
Try to concentrate for 15 minutes straight; if you succeed, the concentration will persist.
If you have writer's block, read. It gives food for thought.
Separate writing (create) from editing (reduce, arrange).
There are rules for writing, which work most of the time. They are dictated by empirical evidence on providing the best reading experience and by the experience of other writers.
The first draft should be longer than the final version (probably by 25%).
Better grab the reader's attention immediately, so do not forward reference in the beginning.
Write often to structure your thoughts.

Levels of perceptions

There are many choices we make when we write. These choices affect the text semantics. We can categorize them based on the "level of resolution":

Selection of words.
Sentence structure.
Order of sentences in a paragraph.
Order of paragraphs in an essay.
The essay as a whole.
Essay as seen by a reader (prism of his mind and personal experience).
Essay in a context of culture the reader is in.

Method

Choose a topic
Make a reading list and read; take notes (probably 2-3 times the amount of final text you need).
Write an outline (10-15 sentences). Stock introductions/conclusions are ok, but should be thrown away after.
Write a paragraph per outline heading (10-15 sentences). Do not edit too much. A paragraph should present a single idea.
For a single paragraph: place each sentence on its own line. Rewrite each sentence to make it smoother and better. You can read it aloud and listen to yourself. Try to cut length by 15-25%. Do for all paragraphs.
Repeat by looking at each paragraph as a whole. Sentences that are no longer necessary are to be eliminated here.
Read your essay. Try to make an outline without looking at the text. It will help throwing some sentences away, rearrange them etc.
Wait a few days and re-read everything. When you are rewriting something but you are not sure it becomes better anymore, it is time to stop.

Variance in programming languages

Tue, 01 May 2018 00:00:00 +0200

Covariance, invariance, and contravariance are concepts many students have difficulty grasping. However, the idea behind them is pretty simple. This post will attempt to illustrate that in the shortest and simplest way possible.

Take an imaginary object-oriented language with generic types. Java or C# will do. We are going to draw diagrams, where rectangles represent types and arrows represent a relation "is parent of".

A \(\rightarrow\) B means "A is a parent of B", or, in other words, "A is a supertype of B".

Let's create two hierarchies:

one consists of three classes, each with a type parameter: A<T>, B<T> and C<T>;
the other contains no parameterized types: x, y, z.

Now we will draw a 3x3 matrix with all possible substitutions of x, y and z as type parameters of A, B and C, like this:

You already see the arrows like A<x> \(\rightarrow\) B<x>; these are not going anywhere, they are valid for any value of the type argument.

What is of interest to us is: for a parameterized type such as A<T>, how will A<x> and A<y> be related?

There can be three cases:

Invariance: whatever relation exists between x and y, A<x> and A<y> have no relation whatsoever.
Covariance: if x \(\rightarrow\) y, then A<x> \(\rightarrow\) A<y>.
Contravariance: if x \(\leftarrow\) y, then A<x> \(\rightarrow\) A<y>.

It's all on the picture: for invariance, A<x> and A<y> are not connected for x \(\neq\) y; for covariance, we draw more arrows according to the hierarchy of type arguments themselves; for contravariance, we invert these arrows.

Language prerequisites

In order for variance to even exist we need the language to have the following features:

It has to be typed
It should sometimes allow an entity of type A to be implicitly interpreted as an entity of type B.

The most common cases are:
- Subtyping in class hierarchies.
  
  Basically, everything that has an "Object Oriented Programming" label on it: C++, C#, Java etc.
- Implicit type conversions (coercions).
  
  For example, in C/C++/Java, there are implicit conversions from numeric types into their wider versions, like short to long, or float to double.
We also need parameterized types.

This does not require generics, though: functions alone are enough to introduce covariance and contravariance.

In other words, we need to be able to represent types as a directed graph.

Common facts

Here I want to mention a couple of facts it is useful to be aware of.

Function variance

Functions are contravariant on arguments and covariant on return type. Why so?

A function that returns a Cat can be used to fill a variable of type Animal. Hence the "natural" direction: Animal is more generic than Cat, so a function returning Animal is more generic than a function returning Cat.
If you need a function that can work on Cat (its argument), it is safe to use a function that can work on Animal instead, or any supertype of Cat. Hence, the function with a more generic argument type is of a more specific type itself.
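
The two rules can be sketched in a few lines of Python. This is only an illustration: the Animal and Cat classes and all function names are hypothetical, and Python does not enforce the annotations at runtime, so the sketch merely shows that both substitutions are safe.

```python
from typing import Callable


class Animal:
    def name(self) -> str:
        return "animal"


class Cat(Animal):
    def name(self) -> str:
        return "cat"


def make_cat() -> Cat:
    return Cat()


def describe_animal(a: Animal) -> str:
    # Uses only what every Animal provides, so any Cat is acceptable too.
    return "this is a " + a.name()


def use_producer(produce: Callable[[], Animal]) -> str:
    # The caller expects an Animal; a function returning a Cat is fine
    # (covariance on the return type).
    return produce().name()


def use_consumer(consume: Callable[[Cat], str]) -> str:
    # The caller will pass a Cat; a function accepting any Animal is fine
    # (contravariance on the argument type).
    return consume(Cat())


producer_result = use_producer(make_cat)
consumer_result = use_consumer(describe_animal)
```

Going the other way would not be safe: a consumer that relies on Cat-specific behavior cannot be handed an arbitrary Animal.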

Variance and mutability

As a rule of thumb, immutable collections can be covariant, mutable collections should be invariant.

Imagine that a mutable List<T> is covariant. Then take a List<String>. You can reinterpret it as a List<Object>, because Object \(\rightarrow\) String. That list can store anything, so it is a valid operation to add an integer to it. From the type perspective, its method set(int idx, T value) will become set(int idx, Object value), so it is valid to give it an integer as a value. If we do it, we are screwed: we have just added an integer, an object of an incompatible type, to a list that assumes it is holding strings, effectively hacking the type system.

The next time we try to use some function like printString on all elements of the said list, we are in for some surprises, ranging (depending on the language semantics) from runtime errors to undefined behavior.
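
Here is a minimal Python sketch of that failure. Python annotations are only hints, so nothing stops us from pretending that the list is covariant; the variable and function names are hypothetical. A static checker that treats lists as invariant would be expected to flag the aliasing line.

```python
from typing import List

strings: List[str] = ["a", "b"]

# Pretend List is covariant: view the same list as a list of arbitrary objects.
objects: List[object] = strings  # same underlying list, no copy is made

objects[1] = 42  # perfectly valid for a list of objects


def shout_all(items: List[str]) -> List[str]:
    # Assumes every element is a string.
    return [s.upper() for s in items]


try:
    shout_all(strings)
    outcome = "ok"
except AttributeError:
    # An int has no upper() method: the "list of strings" assumption is broken.
    outcome = "runtime error"
```

In this sketch the surprise is a runtime error; in a language without runtime type checks the same mistake could silently corrupt data.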

In Scala, if you try to define a covariant List, the compiler will complain about the definition of its set method, which accepts an argument of type T (corresponding to the list contents). As an argument of set, T should be contravariant (as in any function), but as a type parameter of List, it is marked as covariant (because we made it so in the List definition). Hence the complaint that a covariant type occurs in a contravariant position.

Immutable collections do not have this problem. If we try to replicate that example, we will just get a new List of objects, which can perfectly well store anything you might want it to.

Advice for programming students

Sat, 17 Mar 2018 00:00:00 +0100

There are many things I wish I had known when I started my journey as a programming student. Almost 10 years have passed since, and, sadly, I cannot share my experience or insights with my past self, only with my younger colleagues. This post collects some of the most useful bits of advice I wish I had heard when I was 18.

Decide who you are

You certainly do not need to be familiar with formal logic or categories if you want to just know one practical thing (say, frontend) and only do it. There are two main paths, which differ in effort, duration and outcome.

You can become proficient in one domain relatively fast — say one, two years. You will not be useless, you will do things and make a living. There are enough job opportunities (at least, for now) which do not demand much versatility.
You can become a well-rounded specialist who has invested a lot of time and effort in foundations. Then you will be able to adapt, and switching career paths becomes relatively easy. You can do machine learning, then formal verification, then some low-level programming for trading, or switch to game dev. That demands time and dedication: I’d estimate a minimum of 6–8 years.

I strongly advocate the second path because it’s more versatile, interesting, and brings more in the long run. IT is ever-changing, so you want to pick up new technologies fast. You also have more choice. Should you choose the hard way, the rest of this post should be of use to you.

Learn math because math is useful

I can not stress that enough. When you start, you might think that you don’t need linear algebra, because you are unaware of applications. However, for any non-trivial machine learning, you need it. You need statistics and probability. You need logic, combinatorics, set theory, all sorts of discrete mathematics, graph theory, computability, formal grammars, lambda calculus, formal semantics, topology, type theories, a bit of number theory, groups, rings, fields, categories.

New technologies are constantly emerging. Many of them are based on the existing mathematical models. If you know the underlying mathematics well, you get very nice perks:

Picking up new trendy things is orders of magnitude simpler.
You understand where you can apply new methods and where you should not.
You usually understand why the solutions are the way they are. Then you can tweak them to better suit the context.

For example, I have the impression that few people understand that least squares is not always the right way to evaluate how well your linear regression fits the data. It is only adequate when the errors are normally distributed with zero mean. If that is not the case, you will blindly apply an inadequate solution without even thinking that a part of the model needs tweaking.
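
A short derivation sketch shows where the normality assumption enters. The notation is introduced here only for illustration: a model \(y_i = f(x_i; \theta) + \varepsilon_i\) with independent errors \(\varepsilon_i \sim \mathcal{N}(0, \sigma^2)\). The log-likelihood of the data is then

\[\log L(\theta) = \sum_{i=1}^{n} \log\left( \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y_i - f(x_i; \theta))^2}{2\sigma^2} \right) \right) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - f(x_i; \theta))^2\]

so maximizing the likelihood over \(\theta\) is the same as minimizing the sum of squared errors. With a different error distribution the same reasoning leads to a different loss; for Laplace-distributed errors, for instance, it yields the sum of absolute deviations.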

Learn math to learn mathematical thinking

Writing proofs makes you rigorous. You want to always think about all possible paths of execution your program can take in order to not introduce bugs and security issues. The clarity of thinking gained from constructing proofs is precious. It also helps you in writing short, concise code.

Pick your first language carefully

It should be well designed, which means:

Consistency.
Small core.
No unnecessary complexity (it often comes from inconsistency: there are things you should just remember or constantly be aware of, that bring nothing useful to the table).
Makes it harder to shoot yourself in the foot.
This should also be a high level language, because programming is problem solving, not a mastery of a specific language. Knowing all little particularities of your favorite language is not a mastery of programming in itself.

I advise one of these languages:

Scheme (there is a good classical introductory course “Structure and Interpretation of Computer Programs”)
Smalltalk
Eiffel
ML

Don’t be fooled by their seeming unpopularity: in the programming world, popularity does not mean quality. Do not start with Python, pretty please! It is badly designed, inconsistent, and does not teach you rigorous thinking. No need to get used to the "well, it seems to usually work" mentality. Python has its uses, but not as a first language.

Expose yourself to greatness

If you get used to crap languages, crap tools, crap software, and crap solutions, you will inevitably replicate them in your own work. Be critical, question everything, criticize everything, search for inconsistencies and flaws.

For example, imagine you are learning a new language, Go. Google "Go language sucks" and read why people criticize it. Some of the critiques will be pathetic, but some will actually have a point. You will likely gain new knowledge from reading critical remarks and evaluating whether they have a point or are just there to whine.

Think on your own

I have been teaching programming (C and assembly) since 2009 to students at ITMO University in St. Petersburg, Russia. A lot of people have trouble programming and never actually succeed in learning it because they are unable to create code. When they get an assignment, they try to imitate an existing solution, maybe take some snippets from Stack Overflow and tune them to their liking. OK fine, you got your solution, what else do you want?

You should learn to write code from scratch. The types of skills needed for that are so different from meddling with existing code!

Programming is about making conscious choices. You are in state A (you have access to a number of language features/libraries and you know how to combine them); you want to get to state B (the language constructions are combined in a way that expresses a solution). How do you build a route from A to B? Now, that is the real programming, the problem solving.

When you start writing programs from scratch, it will be hard, but it is absolutely necessary to learn to build things from zero. To improve your problem-solving skills, it is crucial that you learn algorithms and data structures. Pick up a good book and solve contest problems online. I recommend “Algorithms” by Dasgupta for a start, then the classic work by Cormen. This will open a whole new world for you, I promise.

The complementary part of the software creation process is designing the software architecture; it is impossible to learn to structure your programs well without building them from 0 to 100.

Broaden your horizons

Program every day, do side projects. There is a very easy (and mostly accurate) way for me as a teacher to tell whether a student is likely to succeed as a programmer. One question: What are you programming in your free time?

There is just not enough time for your teachers to tell you about everything. After all, after you are out of the university, you have to continue to learn on your own, until you retire. If you are passionate about what you are doing, you will explore different types of software just for fun, and that will give you much more experience and skills, than your less motivated peers will have.

Ideally, you should touch everything: write your own compiler, maybe a toy OS, http server, database engine, games, ray casting, build some neural networks, fiddle with proof assistants and dependent types, write a simple mobile app, write for embedded … you go on. Place all your projects on GitHub and take pride in them: your future employer might have a look at it. Use this portfolio to your advantage.

It is common knowledge that recruiting a good programmer is extremely hard. Many programmers applying for jobs have trouble writing trivial things like FizzBuzz. If you have existing projects hosted on GitHub, the employer will be more assured that you are the real deal.

Expose yourself to different tools and languages

If someone tells you all languages are alike, this is either an oversimplification or a lack of experience. Let me explain that a bit.

A model of computation is a set of basic operations and ways of gluing them together in order to build complex algorithms. Some languages have very similar models of computation, but some are very different.

Programming is so much bigger than your commonly known C/Python/Java/C++/C#/Go/Javascript, which are all built on the same principles: imperative, structural, with occasional bits of OOP and syntactic sugar to mimic other programming styles. The world of programming languages is huge, here is a little taste of it:

Industrial functional programming languages with complex and well thought out type systems (Haskell, OCaml)
Functional languages with dependent types, which allow you not only to program, but also to write proofs of correctness (Coq, Agda, Lean)
Stack-based, concatenative languages (Forth)
Logic programming (Prolog, Refal)
Finite state machines (regular expressions, Promela)
Heavily extensible languages that allow implementing virtually any syntactic construct, such as Lisp, Forth, Camlp4/5 or Rebol
Domain-Specific Language workbenches such as JetBrains MPS or XText

Every new model of computation is hard to learn, because it is a new way of thinking for you. But the investment is worth your time, because once you get familiar with it:

Every language based on it is easy.
Every language whose model of computation has similarities is easier.
As every model of computation fits a specific type of problem very well, you now have a new powerful tool, whose usage in specific contexts is orders of magnitude more productive.

Be social

I have been very fortunate to know some amazing people. My mates helped me to perfect my skills, to learn something new, to see the world from a different point of view. Isolating yourself will bring you no good in the long run: you need other people to discuss things with, to see what they are up to and what they think. If your mate has read an interesting article and told you about it, you have just saved a lot of your own time, because he spoon-fed you processed, crystallized knowledge.

Stick with passionate, smart people, and try learning from them

You will be surprised how much you can learn over lunch with your mates, who are eager to share the details of their work or research. This kind of idea cross-pollination is one of the main reasons corporations like Google give you free quality food.

Ask people who are better at coding for code reviews and read their code

Looking at someone’s work, provided he is better than you, can teach you a lot, in ways you do not expect. Code reviews are even better, because the guy will tell you how he would have written the same code. This is probably one of the most effective ways to become a better coder very, very fast.

Write tests

This is so important that it has a section of its own. Tests are an integral part of creating software, and even guys like me who are working on formally verified software (which means it should be mathematically proven correct) are writing tests, although one might think that the guarantees given by proofs are strictly stronger.

I hope that this might actually help someone to get a bigger picture, learn faster and become a better programmer; should you have any questions, I will be glad to help. Good luck!

Why every programming student should learn Coq

Sun, 11 Mar 2018 00:00:00 +0100

My personal experience with the Coq proof assistant over the last years made me think that such a tool, as exotic and niche as it might seem, is invaluable in a programmer's education. Maybe we should include it in common Software Engineering and Computer Science master's programs, so that students prove theorems using it.

When we are learning to program, we are trying to make a piece of code "just work". It means that in a certain context it should demonstrate the expected behavior. Kind of like:

We are currently in a context \(A\)
We want to get to another context \(B\)
How do we get from \(A\) to \(B\)?

Then we are building a system of language constructions in order to be able to finally combine them into a more or less straight road from \(A\) to \(B\).

And here is the source of a vast majority of programming errors: the definition of a working program. A program "works" when it does not demonstrate an unexpected behavior in any possible context! So the roads we have built should never allow us to get somewhere we do not want to go. Most programmers are usually OK with a fairly superficial analysis of possible values and program behaviors. The plural in "behaviors" is not a mistake here, because many languages allow for a non-deterministic model of computation. It means, for example, that in C we do not know whether f or g will be called first in this piece of code:

int x = f(4) + g(2);

Now extend this to a total of 10 functions and make sure all of them have side effects (like logging or networking), so that the compiler is free to choose one of \(10!\) possible behaviors:

int x = 
  f1(42) +
  f2(42) +
  f3(42) +
  f4(42) +
  f5(42) +
  f6(42) +
  f7(42) +
  f8(42) +
  f9(42) +
  f0(42);

Most languages used in either industry or for education are giving too much liberty to a coder, the right to which is usually exercised in the least convenient places. For example, the switch construction which does zero reasoning about whether the cases are exhaustive or not.

The first step towards more correct programs is learning functional languages with well thought out type systems, like OCaml and Haskell. Their syntax and semantics are designed in a way that makes you think about all possible branches of execution in a more concise way. I mean such features as pattern matching, expressions and statements not being separated into two different syntactic categories, and the else branch being mandatory. This makes you adopt a better reflex of thinking about all possible behaviors in a given context and cutting unwanted roads right away. This reflex stays with you no matter what language you are programming in, and will make you a better Java or C programmer.

Now, Coq is doing that at an extreme. You have to write proofs. Writing a non-trivial Coq program and proving it correct is hard and verbose; this is an excellent exercise of thought discipline. Every inference should be explicitly stated.

Surely, you might ask, why can't I just do more mathematics? It is indeed true that building mathematical proofs, as we do in e.g. calculus, is aimed at developing the same skills. However, using an automated proof assistant makes this exercise much more efficient because of the feedback loop. Your proof is being constantly checked, and the prover does not allow you to complete it if you forget a corner case here and there. Such proofs are also much more verbose, because they are highly formal.

Memory in CompCert: overview

Wed, 17 Jan 2018 00:00:00 +0100

CompCert is a certified C compiler written in Coq. I work with it to create a verified refactorer that is proven correct w.r.t. the operational semantics of C. During my journey, I am tackling various parts of CompCert, including its memory model. It is interesting to see how its authors dealt with raw memory and handled things like encoding values.

There were several versions of memory specification. This note is describing the second version, as of CompCert 2.6. Some minor details are omitted for brevity (like alignment checks); I will probably make more posts about memory and connected subjects. But first we have to take a look at Radix trees, a data structure pervasive in CompCert.

Radix (Patricia) trees in CompCert

A Radix tree is a data structure that implements a partial mapping from integers. It is extensively used to implement all sorts of maps, such as:

Memory: maps block IDs into block contents
Memory block: maps offsets into block elements.
Symbol table: maps global symbol IDs into memory blocks.
Environment: maps local symbol IDs into pairs of block ID and symbol type.

The type is defined in CompCert in lib.Maps.PTree:

Inductive tree (A : Type) : Type :=
   | Leaf : tree A 
   | Node : tree A -> option A -> tree A -> tree A

A node with no children (a leaf in terms of trees) is encoded as Node Leaf val Leaf. Nodes can store values or be empty. The latter is useful because in Radix trees the paths themselves matter.

Positive integers in Coq

Positive integers are represented through their binary encoding:

Inductive positive : Set :=
      | xI : positive -> positive 
      | xO : positive -> positive  
      | xH : positive

Since positive integers always start with a leading one, we use xH to encode it. Applying xO appends zero, applying xI appends one. The result looks like the binary representation of a number written from right to left.

For example, an integer number 11 is represented as 1011 in binary form. The corresponding term in Coq will be:

xI (xI (xO xH))
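
The encoding can be reproduced with a small Python sketch. The function name is hypothetical, and the output is just a string that mimics the Coq term; it is meant to show how the lowest bit of the number selects the outermost constructor.

```python
def to_positive_term(n: int) -> str:
    """Render a positive integer as a Coq-style term built from xH, xO and xI."""
    if n < 1:
        raise ValueError("positive integers start at 1")
    if n == 1:
        return "xH"  # the leading one
    # The lowest bit picks the outermost constructor: 0 -> xO, 1 -> xI.
    ctor = "xI" if n % 2 == 1 else "xO"
    inner = to_positive_term(n // 2)
    if inner == "xH":
        return ctor + " xH"
    return ctor + " (" + inner + ")"


term_for_11 = to_positive_term(11)  # 11 is 1011 in binary
```

Reading the constructors of the result from the outside in gives the bits 1, 1, 0 and finally the leading one, which is 1011 read from right to left.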

Paths are positive integers

There is an isomorphism between positive integers and paths in PTrees.

All paths in PTree start at root, just like positive integers start as xH. Then they either go to the left or to the right, which corresponds to the choice between applying xO or xI.

Encoding maps using Radix trees

Radix trees are used to encode partial maps from positive integers into some domain. The domain values are stored into nodes; the path to the node corresponds to its index.

Let us encode a map:

\[\begin{cases} 1 \mapsto \text{"one"}\\ 3 \mapsto \text{"three"}\\ 4 \mapsto \text{"four"} \end{cases}\]

As a tree, it will look like this:

As we see, the elements are enumerated according to the breadth-first search order.
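
The same map can be built with a minimal Python sketch of such a tree. This is not CompCert's code: the names Node, tree_set and tree_get are hypothetical, None stands in both for Leaf and for an empty node, and the sketch assumes that a path is read starting from the outermost constructor of the index (its lowest bit), going left for xO and right for xI.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    left: Optional["Node"] = None   # None plays the role of Leaf
    value: Optional[str] = None     # None plays the role of an empty node
    right: Optional["Node"] = None  # None plays the role of Leaf


def tree_set(tree: Optional[Node], index: int, value: str) -> Node:
    """Return a new tree in which the positive integer `index` maps to `value`."""
    node = tree if tree is not None else Node()
    if index == 1:  # xH: we have arrived
        return Node(node.left, value, node.right)
    if index % 2 == 0:  # xO: continue in the left subtree
        return Node(tree_set(node.left, index // 2, value), node.value, node.right)
    # xI: continue in the right subtree
    return Node(node.left, node.value, tree_set(node.right, index // 2, value))


def tree_get(tree: Optional[Node], index: int) -> Optional[str]:
    """Look up `index`; None means that no element has this index."""
    if tree is None:
        return None
    if index == 1:
        return tree.value
    if index % 2 == 0:
        return tree_get(tree.left, index // 2)
    return tree_get(tree.right, index // 2)


m: Optional[Node] = None
for key, text in [(1, "one"), (3, "three"), (4, "four")]:
    m = tree_set(m, key, text)
```

Index 2 has no value here, yet its node must exist because the path to index 4 passes through it. That is the situation where an empty node is needed.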

Other tree-related types

PTree is a type of a Radix/Patricia tree; PTree.get returns None if no element has a given index.
PMap is a pair of a PTree and a default value, which PMap.get returns instead of None.
ZMap is PMap where any integer can be used instead of only positive ones. It is done through a bijection between all integers and positive integers.
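
One possible bijection of this kind is sketched below in Python. Whether CompCert uses exactly this encoding is an assumption on my part, and the function names are hypothetical: zero goes to 1, positive numbers go to even values, negative numbers go to odd values greater than 1.

```python
def z_to_positive(z: int) -> int:
    """Map any integer to a positive integer (assumed encoding, for illustration)."""
    if z == 0:
        return 1          # xH
    if z > 0:
        return 2 * z      # xO p
    return 2 * (-z) + 1   # xI p


def positive_to_z(p: int) -> int:
    """Inverse of z_to_positive."""
    if p < 1:
        raise ValueError("expected a positive integer")
    if p == 1:
        return 0
    if p % 2 == 0:
        return p // 2
    return -(p // 2)


round_trip_ok = all(positive_to_z(z_to_positive(z)) == z for z in range(-50, 51))
```

With such a mapping, a map indexed by arbitrary integers can reuse the tree for positive indices unchanged.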

Memory: basic notions

Memory is a mapping from block IDs to block contents, coupled with a permissions map and some constraints.

Block contents map offsets (integers) into memory values.

All pointers are pairs of block index and an offset inside a block, so it is impossible to jump from one block into another by changing the offset value. In other words, blocks can not overlap by design.

Value

Values are encoded as follows:

Inductive val : Type :=
    Vundef  : val
  | Vint    : Int.int        -> val
  | Vlong   : Int64.int      -> val
  | Vfloat  : Floats.float   -> val
  | Vsingle : Floats.float32 -> val
  | Vptr    : block          -> Int.int -> val

Memory value

Memory value is defined as follows:

Inductive memval : Type :=
    Undef    : memval
  | Byte     : int   -> memval
  | Fragment : val   -> quantity -> nat -> memval

Such memory values are assigned to addresses, which means that a block of \(n\) bytes holds \(n\) such values.

Undef is used to mark the cell as uninitialized. All reads involving such cells result in a Vundef value being returned.

Byte is a concrete 8-bit integer. It is also a type of raw memory. Such raw values can be taken in a pack and decoded in an architecture-dependent way using decode_val.

Fragment is the usual case of storing data. It contains an opaque value, a quantity and an index showing where exactly we are in this value.

Additional arguments include:

A quantity is either Q32 or Q64. 8-byte values have quantity set to Q64, the other ones are Q32.
An offset in ranges 0..3 or 0..7. As each element of a block represents a single byte, we are storing as many consecutive Fragments as there are bytes in a value. Each Fragment stores its index w.r.t. the value's beginning address.

Note The source [1] states that only pointers are stored inside fragments, while other values are split into bytes according to the architecture specification. The lemmas, however, never impose such a restriction, making cases like Fragment (Vint 4%Z) _ _ possible by construction (and they do appear in proofs).

For example, executing this assignment:

int32_t x;
int32_t* px;
... 

px = &x;

results in the following values being written into memory (assuming x lives in block #4 and a pointer is 32 bits wide):

Fragment (Vptr 4 0) Q32 0 ::
Fragment (Vptr 4 0) Q32 1 ::
Fragment (Vptr 4 0) Q32 2 ::
Fragment (Vptr 4 0) Q32 3 :: nil

Permissions

Every address has two associated permissions (access rights). They put constraints on which operations are allowed on it.

Permissions are shown in the table below:

Permission	Read	Write	Free
Freeable	+	+	+
Writable	+	+
Readable	+
Nonempty

Non-allocated and freed cells can have None as permissions.

The first permission value associated with a memory byte is its maximal permission: it is set on allocation and can be lowered during execution using the drop_perm operation. The second one, the current permission, varies between Nonempty and the maximal permission.
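
The table can be read as a linear order on permissions, which the following Python sketch encodes. It only mirrors the table: the Permission enum and the helper names are hypothetical and are not part of CompCert.

```python
from enum import IntEnum
from typing import Optional


class Permission(IntEnum):
    # Ordered from weakest to strongest, following the table.
    NONEMPTY = 0
    READABLE = 1
    WRITABLE = 2
    FREEABLE = 3


def allows(perm: Optional[Permission], needed: Permission) -> bool:
    """A cell with no permission at all (None) allows nothing."""
    return perm is not None and perm >= needed


def can_read(perm: Optional[Permission]) -> bool:
    return allows(perm, Permission.READABLE)


def can_write(perm: Optional[Permission]) -> bool:
    return allows(perm, Permission.WRITABLE)


def can_free(perm: Optional[Permission]) -> bool:
    return allows(perm, Permission.FREEABLE)


# One row of the table: Writable allows reading and writing, but not freeing.
writable_row = (
    can_read(Permission.WRITABLE),
    can_write(Permission.WRITABLE),
    can_free(Permission.WRITABLE),
)
```

Because each permission implies all the weaker ones, lowering a permission can only take operations away.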

Operations on memory

Memory itself is not opaque, and we can easily access the inner tree structure of its contents. However, all memory-related lemmas defined in CompCert rely on specifically crafted memory operations. These are opaque and can not be unfolded into their exact definitions. For us it means that we can only prove our own theorems based on those properties of these transformations that are already proven in CompCert.

alloc       : mem -> Z -> Z -> mem * block
free        : mem -> block -> Z -> Z -> option mem

load        : memory_chunk -> mem -> block -> Z -> option val
store       : memory_chunk -> mem -> block -> Z -> val -> option mem 

loadbytes   : mem -> block -> Z -> Z -> option (list memval)
storebytes  : mem -> block -> Z -> list memval -> option mem

drop_perm   : mem -> block -> Z -> Z -> permission -> option mem

Note For now, we are only going to study load, store, loadbytes and storebytes operations.

load accepts a chunk type (Mint32, Mint8signed, etc.), source memory, block ID and offset. It returns a decoded value or None.
store accepts a chunk type (Mint32, Mint8signed, etc.), source memory, block ID and offset, and a value. It returns an instance of memory with overwritten cells or None.
loadbytes accepts source memory, block ID and offset, and the number of bytes to load. It returns a list of memory values or None.
storebytes accepts source memory, block ID and offset, and a list of memory values. It returns an instance of memory with overwritten cells or None.

There are lemmas that allow reasoning about load results involving:

Previous store result
loadbytes
extends and injection
unchanged_on
decode_val with a direct memory access.

The following lemma allows us to get raw contents from memory and decode them using decode_val:

load_result:
  forall (chunk : memory_chunk) (m : mem) (b : block) (ofs : Z) (v : val),
  load chunk m b ofs = Some v ->
  v = decode_val chunk
    (getN (size_chunk_nat chunk) ofs (PMap.get b (mem_contents m)))

Memory: implementation

Memory is represented as a record:

Record mem' : Type := mkmem
  { mem_contents : PMap.t (ZMap.t Memdata.memval);
    mem_access   : PMap.t (Z -> perm_kind -> option permission);
    nextblock    : Values.block;
    access_max   : forall (b : positive) (ofs : Z),
                   perm_order'' (PMap.get b mem_access ofs Max)
                   (PMap.get b mem_access ofs Memtype.Cur);
    nextblock_noaccess : forall (b : positive) (ofs : Z) (k : perm_kind),
                         ~ Plt b nextblock -> PMap.get b mem_access ofs k = None;
    contents_default : forall b : positive,
                       fst (PMap.get b mem_contents) = Undef }

All mappings are implemented as Patricia Trees.

mem_contents maps block IDs (positive integers) into block contents. It is a map implemented on top of a Radix tree. Block contents are maps from offsets (integers, possibly negative) into block elements.
mem_access maps block IDs into functions, which accept an offset, a permission type (Max or Cur) and return the actual permissions for this block.
nextblock is the next block ID to be allocated. All blocks with IDs in the range from 1 inclusive to nextblock exclusive should exist.
access_max encodes the following property: for all blocks, their maximal permissions are at least as high as their current permissions.
nextblock_noaccess encodes the following property: no block has an ID greater than or equal to nextblock.
contents_default encodes the following property: for all blocks the default memory cell value is Undef.

Useful sources

"Program Logic for Certified Compilers". Chapter 32 "The CompCert memory model"

A beautiful intuition on associativity

Sat, 13 Jan 2018 00:00:00 +0100

I found a beautiful explanation of what essential property associativity captures: You can think of each element of a monoid as having two sides. The idea is that the left side and right side are independent things that don't interfere with each other.

For example, adding an element at the beginning of a list is independent from adding something at the end of a list. These actions do not affect each other, and it doesn't matter which you do first. That's the idea that associativity captures.

source

On teaching programmers and mathematicians

Fri, 27 Oct 2017 00:00:00 +0200

I have been lucky to be exposed to some very good teachers with different approaches, as well as to collect my own experience: I have been teaching C/Assembly since around 2009. I've also taught things like lambda calculus and functional programming, mathematics, and playing the piano. This post is intended as a summary of how I see an ideal education in virtually any domain. I will speak about teaching mathematics and/or programming; the principles, however, are sufficiently abstract to be applied anywhere.

Based on how our memory and perception work, the following points seem to be generally true:

We remember better when the brain connects information to an emotional event.
We need to connect new knowledge to the knowledge we already have.
We need to repeat new things several times over a short period of time in order to forget them at a much slower pace. See Forgetting curve (Wikipedia).
Different people rely on different types of perception: auditory, visual, tactile. That's what they remember most easily. It does not mean, however, that information which is not representable in an audible format should still be presented that way to those who lean towards auditory perception.
Some people like the top-down approach (deductive thinking), while others prefer to generalize from examples (inductive thinking).

Based on them, I think that the ideal way to teach should incorporate the following points:

We need charismatic teachers with great personalities who can inspire emotions. An average teacher in person is worse than a world class teacher on tape. By no means do I mean sweet talking “popular science” guys, they are often superficial and empty.
Every class should draw examples from the previous ones and force students to decide how to apply the knowledge from the previous courses in the course they are studying (very important!).
A lot of practice, and all exercises should not only be written on paper, but explained by students to the teacher or his assistant to enforce better understanding, enable other types of memorisation and expose the flaws in reasoning.
Use a lot of illustrative material. Slides are mostly useless, but infographics and videos are VERY useful when done correctly. Check, for example, this YouTube channel:

An animation or, even better, an interactive playground is perfect for providing an intuition about a mathematical notion. This is because we need to connect mathematics to sensory experiences.

Very few people can memorize a lot of abstract stuff and then deduce everything from it when asked to apply it in practice. Hence we need to provide examples. In my opinion, some nice explanation patterns are:
- Deductive and inductive: start with definitions, show examples, explain how exactly each example follows from definition
- Inductive and deductive: start informal, give intuition, provide examples, then give precise formal definition, show how exactly the given examples are generalized to these definitions. Then give more examples, but derive them from formal definitions.

Your students should know which topics will be covered during the next session. Ideally, they should start studying them on their own. Then, when they come, they will be more prepared to listen to you and their minds will produce more useful questions.

Impressions of René Magritte

Tue, 18 Jul 2017 00:00:00 +0200

Recently I was lucky to spend a couple of days in Belgium. I came to Bruxelles, rushed through Brugge and Gent, and ended my journey by visiting the René Magritte museum. I was quite impressed, partly because I had not seen much of Magritte before. This post is intended as a quick review of my impressions and thoughts on Magritte's style and his philosophy.

First impressions

Magritte's style differs from that of his fellow surrealists such as Dali or Ernst. I often got the impression that his approach is quite blunt, a bit as if he threw objects into my face:

The objects in his pictures are big and take up a lot of space on the canvas. Moreover, the background is empty and not detailed.
The palette has usually few colors. Speaking of modern times, it is the same kind of restriction that exists in pixel art, and produces a similar effect on me, that I rather like.
The colors are rarely bright, which, in conjunction with the previous point, makes paintings look surreal, mystic and alien.
The shadows and shapes are nicely detailed, which makes for a high-contrast image (although you won't see many sharp edges).

The Riddler

Magritte's approach is also quite intellectual because he wants to mess with the viewers and make them think (almost a quote of his). Each pair of a picture and its name is a puzzle to solve. The reward is a better comprehension of the picture's message, which in turn can bring you some new understanding of the real world. The name and the picture can be connected in a number of ways, for example:

The name can be a hint to solve the riddle. The picture named "Explanation" depicts two bottles, one of which looks like half-bottle, half-carrot. As we see the image and its title, we come to an understanding of what an explanation is, its idea: the explanation is a process of changing our perception and/or understanding of concepts. This change is the transformation of an image of a bottle into an image of a carrot.

The name can be just something vaguely connected to what's in the painting. For example, the name "Imp of the Perverse" should make you think that something wrong is going on and make you uncomfortable, which, in turn, strengthens the feeling of wrongness coming from the piece of art itself. In a sense, Magritte wants you to think critically, not to make you try to connect distantly related terms. We can find a context to associate any pair of words if we want to, though not all contexts are of importance to us.
The name might hold an allusion to other works of art Magritte associated his painting with. For example, "The Man from the Sea" refers to the film of Marcel L'Herbier, which tells a story of a lone sailor. More importantly, it resembles the final scene of "Juve against Fantômas" where this villain in black prevails by setting off an explosion with a pull of a lever.

Objects, their images and words

What's especially interesting about Magritte's philosophy is his reflections on images, words and real-world objects, as well as on the relations between these three worlds. He published his famous text "Words and images" in 1929 in the journal La révolution surréaliste as a result of his extensive experimentation in the late 1920s.

This text studies words (e.g. poetry) and paintings as means of expressing ideas. Here is the text in English with a little commentary (it corresponds to the images from top to bottom, from left to right):

An object and its name are not inseparable. We can find a better name.
Some objects have no name.

Words are parts of languages, and there is no language that includes all the notions of every other one. Take Dostoevsky's "nadryv" or the dozens of words describing different shades of snow in the local languages of northern tribes. We create names based on our everyday needs and what seems important to us.
A word can be used to refer you to itself.

This thought is illustrated by a word "sky" inside a closed curve, depicting the said sky. As the label "sky" is placed directly on the image of sky, it refers to this exact instance of "sky", which makes for a funny wordplay.
Sometimes a name of an object meets the object depiction.

This is illustrated with a labeled image of a forest.
Sometimes, a name is used in place of an image.

The picture shows a stub silhouette of an object labeled with its name.
In reality, a word can be used instead of an object.
A word can be substituted by an image.

A pictogram of a sun is used to substitute the word "soleil" (fr. sun).
An object can suggest that there are other objects behind.
Everything suggests that there is little common between an object and its representation.
The words used to represent two objects do not suggest what is different between them.
On a painting, words are like images.
The images and words are seen in an unusual way when on a canvas.

The illustration shows how a written word "montagne" (fr. mountain) blends in and becomes a part of the object texture.
An arbitrary figure can replace an image of an object.

I am not sure what this means, because the words (labels) are not mentioned. However, the illustration has all these random objects labeled as the sun. When an object is labeled (possibly with an unexpected word), that's one thing; but when we deduce that the object is replacing the sun just because it appears in the context (position, effects etc.) where we are used to seeing the sun, that's an entirely different thing.
The object is never the same thing as its image and/or its name.

I can not stress this enough, as this is quite a useful piece of the puzzle. I will dedicate the next section to this.

Visible contours of the object form a mosaic in reality.

This is a beautiful thing to notice. I can deduce, that every point of the space can be named based on it being a part of some part of this mosaic.
The vague shapes have the same significance as the well defined ones.
Sometimes the labels define precise things, and the images do not.
Sometimes, it is vice versa.

Images, words and the real world

In this section I want to explore the following thought of Magritte: The object is never the same thing as its image and/or its name.

The real

We are not reasoning about the real world, but about its projection in our head, a kind of logic system. This is widely used in philosophy, for example, to construct theories of causation.

We are fundamentally limited like that. It is evident that our image is only a part of the big picture. So when we reason about real life, we rely on information that could have been warped only once: by our perception. I think this is also why methods such as palpation still hold such an important place in medicine: there is as little between the doctor and the patient as possible.

The image

If we use an image of reality, someone has to create it first – we will call him Creator, and we are his Observers. Where can we get disrupted?

Creator's perception of reality is not perfect: for example, he can have a mild color blindness; he is also unable to perceive all the infinite details that do exist.
Creator's way of drawing may be flawed.
Observer's perception of an image can be flawed.
Observer's interpretation is flawed.

The word

Words are purer than images in a sense that they represent crystallized ideas. However, the exact meaning of a word can slightly vary because of the differences in cultural background of different people. The existence of dictionaries, however, helps finding a certain common denominator between them, which would be accepted as a norm by most people. We will call the agents Speaker and Listener.

Speaker can mistake an object for another.
Speaker can make a bad choice of word.

The word may seem more precise, but it can be ambiguous (multiple meanings in dictionary). Whether it will hold more or less information than an image is debatable and varies from case to case.

It is important to note, that we are only speaking about singular words. They can be ambiguous, but interpreting a word differs from interpreting sentences, which differs from interpretation of texts, which differs from interpretation of texts in specific contexts.

In fact, Magritte's study is about semantics and human interpretation, concentrating especially on the human perception of images and of the words on them. One might say that Magritte was one of the pioneers of semiotics in art.

The virtues of using goto

Wed, 26 Apr 2017 00:00:00 +0200

IT folks are prone to prejudices, as we all are. Once a beginner programmer starts exploring the world of coding, he quickly learns catchy memes from the more experienced part of the community. One can then easily live by them without putting much thought in their meaning.

In general, this is also how human society works: we adopt elements of our parents' lifestyle and their world view, we make them parts of ourselves, usually not questioning the reasons for this specific behavior. However, there is one fundamental empirical rule: there are no silver bullets – rules, that are always applicable.

Thus, each programming meme requires a certain context to be fully understood. My opinion is that we should always try to reach these deep foundations in order to understand when the rule can be thrown away for good. This post is about the absurd meme "goto is bad". We want to explore what exactly prevents goto from being used well in some contexts and when its use is quite reasonable.

Why goto is stigmatized in the structured programming

Structured programming is a programming style that (like any style) implies a set of rules and constraints to live by. It implies extensive use of subroutines, loops and code blocks (statement sequences between braces). It usually goes hand in hand with imperativeness, where statements are executed sequentially, and with mutable addressable memory, roughly following the von Neumann model of computation. It is safe to say that one of its most devoted supporters was Edsger W. Dijkstra. Object-oriented programming can often be viewed as a slightly tweaked version of it. A large slice of modern programming is structured imperative programming or its variations; moreover, it is what kids are taught in school. Because of that flaw in the educational system, most of us are deeply infected with imperative thinking and often emulate other paradigms on top of it in our heads.

Dijkstra considered goto harmful because it makes it harder to follow the code logic. This is usually true, but needs a clarification. What makes goto harmful is that it is usually paired with assignments.

Assignments change the state of the abstract machine. When reasoning about a typical program, we usually trace its execution and note how the values of the variables change. Throwing gotos everywhere makes it notably harder to follow the program state, because you can jump anywhere, from anywhere. That makes the trace much harder to untangle.

However, throw away the state changes and you will have no problems using multiple gotos in an isolated piece of code (e.g. inside a particular function), because the interesting state will be determined solely by your current position in the code!

Finite State Machines

If we want to implement a Finite State Machine (FSM), then gotos are the way to go! Such an abstract machine consists of:

a set of states (C labels). One state is marked as the initial state.
input (a sequence of global events, e.g. character input, received network packets, any user actions)
output (a sequence of global actions, responses of the system: sending packets or control signals to the connected hardware, output etc.)
for each state, a set of rules to jump to other states based on the current input.

We start in the initial state and perform jumps between states based on the current input value. As you see, this machine has no memory. If we implement such a machine in a language such as C, its state will be characterized solely by the position in the code we are currently at.

Crafting an FSM is equivalent to crafting an algorithm to solve a problem. FSMs are not expressive enough to solve all the problems Turing machines can digest. Nevertheless, they are not only potent but also very convenient for some tasks, such as pattern matching in strings, implementing network protocols and controlling robots.

Here is a toy example, taken from [my book](http://www.apress.com/us/book/9781484224021). This FSM checks whether the input string containing only characters 0 and 1 contains an even number of ones. It is common to draw cool looking diagrams for FSM, showing states as circles and transitions as arrows between them.

Let us take a look at its implementation in C. I have omitted error checks for brevity.

  #include <stddef.h>
  #include <stdio.h>

  /* str should only contain 0 and 1 */
  int even_ones( char const* str ) {
    size_t i = 0;
    char input;
   _even:
    input = str[i++];
    if (input == '1') goto _odd;
    if (input == '0') goto _even;
    /* end of string -- null terminator */
    return 1;
   _odd:
    input = str[i++];
    if (input == '1') goto _even;
    if (input == '0') goto _odd;
    /* end of string -- null terminator */
    return 0;
  }

  void test( const char* str ) {
    printf("%s\n", even_ones( str ) ? "yes" : "no" );
  }

  int main(void) {
    test("0101011");
    test("");
    test("010");
    test("110");

    return 0;
  }
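
For comparison, here is a sketch of the same machine written with an explicit state variable and a switch instead of labels. The names even_ones_switch, ST_EVEN and ST_ODD are mine and are not taken from the book. The point is only that the state now lives in a variable that has to be assigned on every transition, whereas in the goto version it is encoded by the position in the code. Unlike the goto version, this one simply ignores characters other than 0 and 1 instead of stopping at them.

```c
#include <stddef.h>

/* Hypothetical names, chosen for this sketch only. */
enum state { ST_EVEN, ST_ODD };

/* str should only contain '0' and '1'.
   Returns 1 if the number of ones is even, 0 otherwise. */
int even_ones_switch( char const* str ) {
  enum state st = ST_EVEN;              /* initial state */
  for ( size_t i = 0; str[i] != '\0'; i++ ) {
    switch ( st ) {
    case ST_EVEN:
      if ( str[i] == '1' ) st = ST_ODD; /* a one flips the parity */
      break;
    case ST_ODD:
      if ( str[i] == '1' ) st = ST_EVEN;
      break;
    }
  }
  return st == ST_EVEN;
}
```

On the four test strings used above it gives the same answers as even_ones.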

Model checking

An additional benefit is that there exists a quite powerful verification technique called model checking, which allows you to reason about program properties if the program is encoded as a finite state machine. You can reason about them using temporal logic, checking properties such as "If I got into the state A, I will never reach the state B from there". Model checkers can often generate a C program automatically from an FSM description.

For examples of what model checking tools are capable of, I recommend this exercise page for the SPIN model checker.
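
To give a flavour of the kind of question a model checker answers, here is a toy sketch that exhaustively explores a small transition table and checks a property of the form "state B is never reachable from state A". This is plain graph search, not temporal logic, and it says nothing about how SPIN works internally. The table fsm_next, the number of states and the function name reachable are all made up for the illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define N_STATES 4
#define N_INPUTS 2

/* Hypothetical machine: fsm_next[s][c] is the state reached from s on input c.
   States 0 and 1 form an even/odd pair; states 2 and 3 form a separate
   component, and state 3 can never be left. */
static const int fsm_next[N_STATES][N_INPUTS] = {
  { 0, 1 },
  { 1, 0 },
  { 2, 3 },
  { 3, 3 },
};

/* Returns true if `to` can be reached from `from` in zero or more steps. */
bool reachable( int from, int to ) {
  bool   seen[N_STATES] = { false };
  int    stack[N_STATES];   /* each state is pushed at most once */
  size_t top = 0;

  stack[top++] = from;
  seen[from] = true;

  while ( top > 0 ) {
    int s = stack[--top];
    if ( s == to ) return true;
    for ( int c = 0; c < N_INPUTS; c++ ) {
      int t = fsm_next[s][c];
      if ( !seen[t] ) { seen[t] = true; stack[top++] = t; }
    }
  }
  return false;
}
```

For this table reachable(0, 2) is false, the toy analogue of "once in state 0, I will never reach state 2", while reachable(2, 3) is true.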

Deinitializing resources

C++ has a nice feature C lacks: it can automatically call an object's destructor whenever the object's lifetime is over. For example, in the following code the destructor for the object myC will be automatically called after we reach the closing bracket. But it gets better: every time you write a return statement, everything that exists in the current stack frame gets automatically destroyed in the correct order (reverse initialization order). Consider this function, which returns an error code and uses three objects: myA, myB and myC. The respective classes should define destructors which free all associated resources.

  #include <iostream>

  int f() {
    A myA;
    B myB;
    C myC;

    // All three objects are already constructed at this point, so every
    // return below runs the destructors of myC, myB and myA, in that order.
    if (! myA.init() ) return 0;
    if (! myB.init() ) return 0;
    if (! myC.init() ) return 0;

    //...

    // Destructors for myA, myB, myC will be called here anyway
    return 1;
  }

In C we often want to do the same thing, but we do not have the luxury of anything being called automatically. It is, however, very important to do, because some structures have dynamically allocated fields or are associated with other resources such as file descriptors; otherwise it is easy to leak resources. So, to do things right, we have to produce quite a mess:

  int f() {
    struct sa a;
    struct sb b;
    struct sc c;

    if ( ! sa_init( &a ) ) return 0;
    if ( ! sb_init( &b ) ) { sa_deinit( &a ); return 0; }
    if ( ! sc_init( &c ) ) { sb_deinit( &b ); sa_deinit( &a ); return 0; }


    return 1;
  }

Imagine you had 5 structures to work with: this straightforward approach would turn your code into a nightmare! However, with the help of gotos we can exploit a nice little trick. It relies on the fact that all such branches can be ordered by inclusion: each branch looks exactly like some other branch preceded by an additional deinit:

  return 0;
  sa_deinit( &a ); return 0;
  sb_deinit( &b ); sa_deinit( &a ); return 0;

If we throw labels in between we could jump to any statement in this sequence. Then all following statements will be executed as well. This way we are going to refactor the example above to look like this:

  int f() {
    struct sa a;
    struct sb b;
    struct sc c;

    if ( ! sa_init( &a ) ) goto fail;
    if ( ! sb_init( &b ) ) goto fail_b;
    if ( ! sc_init( &c ) ) goto fail_c;

    return 1;

   fail_c:
    sb_deinit( &b );
   fail_b:
    sa_deinit( &a );
   fail:
    return 0;
  }

Isn't it way nicer than what we have seen before? Additionally, no assignments are performed, hence no fuss about goto evilness at all.
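
Here is a self-contained variant of the same pattern that can actually be compiled and checked for leaks. It is only a sketch: struct buf, buf_init, buf_deinit, the counter live_buffers and the function init_three are all hypothetical stand-ins for sa, sb, sc and their init/deinit functions. It also shows one common extension: the success path falls through the same labels, so the release code is written exactly once. The price is a single flag assignment.

```c
#include <stdlib.h>

/* Hypothetical resource for this sketch: a heap buffer. */
struct buf { char* data; };

int live_buffers = 0;   /* bookkeeping, so that leaks become observable */

/* Returns 1 on success, 0 on failure. A size of 0 is treated as a
   failure so that the error paths are easy to trigger. */
static int buf_init( struct buf* b, size_t size ) {
  if ( size == 0 ) return 0;
  b->data = malloc( size );
  if ( b->data == NULL ) return 0;
  live_buffers++;
  return 1;
}

static void buf_deinit( struct buf* b ) {
  free( b->data );
  live_buffers--;
}

/* Initializes three buffers, works with them and releases whatever was
   acquired, on success and on every failure path alike. */
int init_three( size_t na, size_t nb, size_t nc ) {
  struct buf a, b, c;
  int ok = 0;

  if ( ! buf_init( &a, na ) ) goto done;
  if ( ! buf_init( &b, nb ) ) goto undo_a;
  if ( ! buf_init( &c, nc ) ) goto undo_b;

  /* ... work with a, b and c ... */
  ok = 1;

  buf_deinit( &c );     /* success falls through the labels below */
 undo_b:
  buf_deinit( &b );
 undo_a:
  buf_deinit( &a );
 done:
  return ok;
}
```

Whichever of the three initializations fails, live_buffers is back to 0 once init_three returns.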

Computed goto

Computed goto is a non-standard feature supported by many popular C and C++ compilers. Basically, it allows you to store a label in a variable and perform jumps to it. It differs from calling a function through a pointer because no return is ever performed. The simplest case is shown below:

  #include <stdio.h>

  int main() {
    void* jumpto = &&label;

    goto *jumpto;
    puts("Did not jump");
    return 0;

   label:
    puts("Did jump");
    return 1;
  }

We take the raw label address using the unusual double ampersand syntax and then perform a goto. Notice the additional asterisk before the goto operand. When launched, this program will output Did jump.

Where can we use such a feature? Expressivity-wise, it is not very interesting. However, sometimes we can get a speedup. A case that comes to my mind is a bytecode interpreter (but I have written a hell of a ton of them, so I am probably quite biased towards them). Instruction fetching typically takes no less than 30% of the execution time, and computed goto allows one to speed it up.

Without computed goto:

  #include <stdio.h>
  #include <inttypes.h>

  enum bc { BC_PUSH, BC_PRINT, BC_ADD, BC_HALT };

  uint8_t program[] = { BC_PUSH, 1, BC_PUSH, 41, BC_ADD, BC_PRINT, BC_HALT };

  void interpreter( uint8_t* instr ) {
    uint8_t stack[256];
    uint8_t* sp = stack;

    for (;;)
      switch ( *instr )  {
      case BC_PUSH:
        instr++;
        sp++;
        *sp = *instr;
        instr++;
        break;

      case BC_PRINT:
        printf( "%" PRIu8 "\n", sp[0] );  /* the stack holds unsigned bytes */
        instr++;
        break;

      case BC_ADD:
        sp--;
        sp[0] += sp[1];
        instr++;
        break;

      case BC_HALT:
        return;
      }
  }

  int main() {
    interpreter( program );
    return 0;
  }

With computed gotos:

#include <stdio.h>
#include <inttypes.h>

enum bc { BC_PUSH, BC_PRINT, BC_ADD, BC_HALT };

uint8_t program[] = { BC_PUSH, 1, BC_PUSH, 41, BC_ADD, BC_PRINT, BC_HALT };

void interpreter( uint8_t* instr ) {
  uint8_t stack[256];
  uint8_t* sp = stack;

  void* labels[] = { &&label_PUSH, &&label_PRINT, &&label_ADD, &&label_HALT };

  goto *labels[*instr];

  label_PUSH:
  instr++;
  sp++;
  *sp = *instr;
  instr++;
  goto *labels[*instr];

  label_PRINT:
  printf( "%" PRIu8 "\n", sp[0] );  /* the stack holds unsigned bytes */
  instr++;
  goto *labels[*instr];

  label_ADD:
  sp--;
  sp[0] += sp[1];
  instr++;
  goto *labels[*instr];

  label_HALT:
  return;
}

int main() {
  interpreter( program );
  return 0;
}

We replaced switch with an array storing the addresses of the instruction handlers. Each time we fetch an instruction, we use its bytecode as an index into this array. After taking an address from there, we jump to it. The larger the instruction set and the switch get, the more noticeable the difference becomes in a real-world program.

Using switch slows us down for two reasons:

It is forced to perform a bounds check according to the C standard. If no case exists for a switch expression, and the default case is missing as well, no part of the switch body is executed¹. Computed goto will just result in undefined behavior in case of an invalid opcode.
When using switch, there is a single point where the decision about where we are going is made. In case of computed goto, such decisions are made at the end of each instruction handler. It makes the CPU hold separate prediction histories for each decision-making point, which is good for dynamic branch-predicting algorithms.
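
The undefined behavior on an invalid opcode does not have to be paid for with a check in the hot loop: one option is to validate the bytecode once, before running it. The sketch below is not part of the interpreter above. The function name program_is_valid is hypothetical, and it assumes straight-line programs in the four-opcode format used here (no jump instructions). It does not check the stack depth.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum bc { BC_PUSH, BC_PRINT, BC_ADD, BC_HALT };

/* Returns true if every opcode in prog[0..len) is known, every BC_PUSH
   has its operand byte, and a BC_HALT is reached before the buffer ends. */
bool program_is_valid( const uint8_t* prog, size_t len ) {
  size_t i = 0;
  while ( i < len ) {
    switch ( prog[i] ) {
    case BC_PUSH:
      if ( i + 1 >= len ) return false;  /* operand byte is missing */
      i += 2;                            /* skip opcode and operand */
      break;
    case BC_PRINT:
    case BC_ADD:
      i += 1;
      break;
    case BC_HALT:
      return true;
    default:
      return false;                      /* unknown opcode */
    }
  }
  return false;                          /* ran off the end without BC_HALT */
}
```

The switch here runs once per instruction before execution starts, so the dispatch loop itself can stay unchecked.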

Starting with the Haswell architecture, the branch prediction algorithms were tuned so that switch is predicted better, so the performance gain from using computed goto is no longer that substantial.

P.S. If you really want to make a faster interpreter, consider implementing indirect threaded code, direct threaded code or write a JIT compiler. Computed goto is not a magical thing to make your interpreter as fast as possible.

Footnotes:

¹ C99 standard, Section 6.8.4.2.

Proving dependent equalities in Coq with SSReflect

Fri, 29 Jul 2016 00:00:00 +0200

Proving dependent equalities in Coq is boring, but has to be done quite frequently. I got so annoyed with it that I wrote a little tactic to automate it a bit.

It does use ssreflect routines but it should not be hard to adapt it to vanilla Coq.

Ltac depcomp H := apply EqdepFacts.eq_sigT_eq_dep in H; 
apply Coq.Logic.Eqdep.EqdepTheory.eq_dep_eq in H.
Ltac eq_comp c x y := 
  move: (c x y); 
  case; 
  last try do [by [right; case]| 
  right; case; let H' := fresh "H" in move=>H'; by depcomp H'];
  first try let H' := fresh "H" in move=> H'; subst.


Definition eq_dec T := forall x y: T, {x = y} + {~ x = y}.

(* Example *)
Theorem pair_eq_dec T U: eq_dec T -> eq_dec U -> eq_dec (prod T U).
move=> HT HU [x y] [a b].
eq_comp HT x a.
eq_comp HU y b.
by left.
Qed.

Causation in modern philosophy of science

Sat, 16 Jul 2016 00:00:00 +0200

Our minds use the notions of cause and consequence all the time. It is, without a doubt, one of the fundamentals of human reasoning. However, when observed closely and formally, it becomes apparent that our concept is based rather on intuition and lacks strictness and a connection to the real world. This post is intended as a quick introduction to causation from the philosophical point of view.

General thoughts

Where to establish the causal relation? We are going to use propositions that represent quantitative (that is, measurable) properties of various systems at a given moment of time. Events represent the changes in these properties. These events will be selected as causes and consequences. This way we are building an expressive language to construct statements of scientific value. In contrast, people often think about events that can either happen or not. In our system such events are easily modeled as Boolean properties (taking a value of either 0 or 1).
There exist two seemingly equally justified points of view on the place of causal relations in the world.
- There are common causation laws that rule the world; then they are instantiated in different situations with different events.
- Causes and consequences are an indispensable part of the world, all generalizations are secondary.

Hume's arguments

The arguments we are going to study can partly be traced back to the 18th century! Meet David Hume, a philosopher.

Hume’s account of causes and consequences is highly empirical. What an observer (even an ideal one) can observe about two events \(A\) and \(B\) is:

\(A\) occurred before \(B\).
\(A\) and \(B\) are close in space and time. Or they are connected by a chain of events, where each link is a relation between two events which satisfies these properties.
When we observe something resembling \(A\) again, something resembling \(B\) appears as well.

It is suspicious that these three observations sum up everything an observer can notice, even an ideal one. There is no way this information could serve as a foundation for a strict causal relation as we usually imagine it.

Adding causality changes absolutely nothing here, because it does not give us anything observable.

In fact the pattern above can describe many events not necessarily connected in any way. Hume’s opinion is a great starting point, but now let us also address some problems about it.

Three properties of \((A,B)\) pinpointed above occur not only in cases where we want to establish the relation, but also:

When \(A\) and \(B\) have a common cause
Just by coincidence
By preemption. It means that there is an event \(C\) that occurred before \(A\) and would have caused \(B\) anyway.

These three major problems are addressed differently depending on how one sees causation in general, which features of the cause-consequence pair are really key. Let’s now speak about those ways.

Sufficiency

To this day people tend to think about causation as a deterministic beast. Our knowledge about the micro world, however, contradicts this (as many things about the micro world contradict common sense). Contrary to some traditional views on causation, based on the necessity of the cause to bring about its effect, their modern counterparts mostly fit into two categories:

Those based on regular occurrences of \(A\) followed by \(B\).
Those that reconstruct the events of the real world in another, purely logical world.

Let’s take a closer look at the second point. In order to construct our logical structures we can use certain sets of rules (they can represent e.g. the laws of nature) of the form:

\(\forall x, F \ x \Rightarrow G \ x\)

Here for an event \(a\) the expression \(F a\) will be substituted by an event instance; \(G a\) is the consequence. As for the arrow, we can substitute it for either a material conditional (think simple implication) or something stronger like a subjunctive conditional ("If Oswald hadn’t killed Kennedy, someone else would have").

Necessity

You know, sufficiency is not sufficient for causation. Even if we stick to determinism, overdetermination (multiple causes) alone is a valid reason to question it. Let us say \(A\) and \(B\) can both equally cause \(C\), and they occurred simultaneously. What caused \(C\)?

Since each of them is sufficient on its own, we can deduce:

If \(A\) were the cause, then \(C\) would not have occurred without \(A\). But \(B\) alone would have brought \(C\) about, so \(A\) is not the cause.
If \(B\) were the cause, then \(C\) would not have occurred without \(B\). But \(A\) alone would have brought \(C\) about, so \(B\) is not the cause.

Some people tend to think that it’s rather the cause’s necessity for the consequence that forms a causal relation between them. This way \(A\) caused \(B\) if and only if:

\(A\) and \(B\) occurred.
We can assert: "If \(A\) had not occurred, then \(B\) would not have occurred" (we will refer to it as sine qua non, because, well, it’s what it is).

I guess it should look like: \(A \land B \land ( \neg A \rightarrow \neg B)\).
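
One caveat becomes visible with a quick truth table: if the arrow here is read as a material conditional, \(\neg A \rightarrow \neg B\) is vacuously true whenever \(A\) holds, so the whole formula says no more than \(A \land B\). The sketch below checks this over all four rows; the function names are mine. It suggests that the counterfactual has to be read as something stronger, such as the subjunctive conditional mentioned in the previous section.

```c
#include <stdbool.h>

/* Material conditional: p -> q is false only when p is true and q is false. */
static bool implies( bool p, bool q ) { return !p || q; }

/* The formula from the text, with the arrow read as a material conditional. */
bool sine_qua_non_material( bool a, bool b ) {
  return a && b && implies( !a, !b );
}

/* Returns true if the formula agrees with plain (a && b) on all four rows. */
bool collapses_to_conjunction( void ) {
  for ( int a = 0; a <= 1; a++ )
    for ( int b = 0; b <= 1; b++ )
      if ( sine_qua_non_material( a, b ) != ( a && b ) ) return false;
  return true;
}
```

Here collapses_to_conjunction() returns true.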

It is but a foundation of a longer talk I intend to give in the next post based on arguments of Lewis and maybe Mackie (if I do have time for his book).

Probabilistic approach

Reasoning about sine qua non is not easy once you abandon determinism. Non-determinism, however, goes hand in hand with probabilities. The general idea of this approach is that causes increase the probabilities of their consequences in a large variety of contexts:

\(P ( B / AZ ) > P (B / \neg AZ )\)

Here \(Z\) should take into account:

Common causes of \(A\) and \(B\).
Preemptive causes of \(B\).

This way the preemption problem and the common cause problem are addressed, and \(A\) and \(B\) become probabilistically independent.

The hard thing is to choose \(Z\) and a good definition of probability.

For example, take relative frequencies for probability. This way we should exclude from \(Z\) all causal consequences of \(B\) for which \(B\) is necessary. If we do not do it, we just get a wrong inequality \(1 > 1\) (check it!). However, look at it again: to calculate \(P\) we need exactly causal data for which we are building a theory! We have faced a vicious circle that is not easy to break.
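
To make the relative-frequency reading concrete, here is a small sketch that estimates both sides of the inequality from a list of observations. Everything in it is an assumption made for the illustration: the struct obs, the function names, and in particular the simplification that the context \(Z\) is a single yes/no property. It does nothing to resolve the circularity described above, since deciding what goes into \(Z\) is still left to the caller.

```c
#include <stddef.h>

/* One observed situation: did A occur, did B occur, and was the
   background context Z present? All three fields are 0 or 1. */
struct obs { int a, b, z; };

/* Relative frequency of B among the observations with A == a and Z == z.
   Returns -1.0 when no observation matches the condition. */
double freq_b_given( const struct obs* data, size_t n, int a, int z ) {
  size_t matching = 0, with_b = 0;
  for ( size_t i = 0; i < n; i++ ) {
    if ( data[i].a == a && data[i].z == z ) {
      matching++;
      if ( data[i].b ) with_b++;
    }
  }
  return matching == 0 ? -1.0 : (double) with_b / (double) matching;
}

/* Returns 1 if, in context z, B is more frequent with A than without A,
   and 0 otherwise (including the case where there is not enough data). */
int raises_probability( const struct obs* data, size_t n, int z ) {
  double with_a    = freq_b_given( data, n, 1, z );
  double without_a = freq_b_given( data, n, 0, z );
  if ( with_a < 0.0 || without_a < 0.0 ) return 0;
  return with_a > without_a;
}
```

For example, with eight observations in context \(Z\), of which B follows A in three cases out of four and occurs without A in one case out of four, the two frequencies are 0.75 and 0.25 and raises_probability reports 1.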

Conclusion

As we see, there are loads of interesting subtleties when it comes to a closer study of causation. Numerous attempts were made to exile this relation completely from scientific thinking, but its roots are so strong it proved almost impossible to do. Next time I want to talk about Lewis’s very influential theory on causation, dated 1973, and then about his new causation theory of early 2000’s.

Proving type inequalities in Coq

Sun, 29 May 2016 00:00:00 +0200

Sometimes you just want to prove that nasty ~ T = U for some types T and U. Well, while in general it is not decidable (nor provable), sometimes there is a relatively easy way to do it, when T and U are not both infinite. In other words, either T or U, or both of them should be finite.

Simple example: bool is not unit

The idea is simple: if types are equal, there exists a bijection between their elements. Let us prove a simple lemma exists_bijection:

\[\forall A \ B: Set, A = B \rightarrow \exists f \ \exists g: \forall x:A \ \forall y : B, (f \ x = y \rightarrow g \ y = x )\]

Definition id {T} := fun x : T => x.
 
Lemma exists_bijection (A B: Set):
A = B -> exists f, exists g, forall (x:A) (y:B),
 f x = y -> g y = x.
Proof.
 intro H; subst. 
 exists id. exists id.
 intros. subst. 
 reflexivity.
Qed.

The proof was a piece of cake. Now to battle!

You remember, that unit is a type inhabited by only one element tt?

Inductive unit : Set := tt : unit.

Suppose bool = unit. By exists_bijection we then get two functions, f from bool to unit and g from unit to bool, such that f x = y implies g y = x. Both f true and f false evaluate to tt; there is not much choice here. However, we know that \(\forall x, (g \circ f) \ x = x\). By instantiating x with true and false we get \(g (f \ true) = true\) and \(g (f \ false) = false\). Since \(f \ true\) and \(f\ false\) are equal, we deduce \(g\ tt =true\) and \(g \ tt = false\), a contradiction.

Now the code:

Lemma unit_neq_bool: bool = unit -> False.
Proof.
intro Heq.
destruct (exists_bijection _ _ Heq) as [f [g Hfg]].
destruct (f true) eqn: Hft.
destruct (f false) eqn: Hff.
pose proof (Hfg _ _ Hft) as Hgtt.
pose proof ( Hfg _ _ Hff) as Hgtf.
 
rewrite Hgtf in Hgtt.
inversion Hgtt.
Qed.

So, the key idea is to enumerate possible bijections. As at least one of the sets is finite, you will eventually enumerate all the candidates, and when the sets are of different sizes, you will get contradictions because you will run out of distinct functions trying to enumerate all bijections.

Example: nat is not bool

Let us apply the same principle to prove ~ nat = bool.

We are going to use a few semicolons because otherwise the proof would become very repetitive. Basically, where you see an exclamation mark we get 8 goals whose contexts include:

g x = 0
g y = 1
g z = 2

where x, y, z all range over \(\{ true, false \}\). At least two of the three boolean values will always be equal to each other (by the Dirichlet, or pigeonhole, principle), which gives us a nice contradiction to exploit with a rewrite and an inversion.

Lemma nat_neq_bool: nat = bool -> False.
intro Heq.
destruct (exists_bijection _ _ Heq) as [f [g Hfg]].
 
destruct (f 0) eqn: Hf0;
destruct (f 1) eqn: Hf1;
destruct (f 2) eqn: Hf2;
pose proof (Hfg _ _ Hf0) as Hg0;
pose proof (Hfg _ _ Hf1) as Hg1;
pose proof (Hfg _ _ Hf2) as Hg2.  (* ! *)
rewrite Hg2 in Hg1; inversion Hg1.
rewrite Hg1 in Hg0; inversion Hg0.
rewrite Hg0 in Hg2; inversion Hg2.
rewrite Hg1 in Hg2; inversion Hg2.
rewrite Hg1 in Hg2; inversion Hg2.
rewrite Hg2 in Hg0; inversion Hg0.
rewrite Hg0 in Hg1; inversion Hg1.
rewrite Hg1 in Hg0; inversion Hg0.
Qed.
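
The counting argument behind the eight goals can also be checked by brute force outside Coq. The sketch below is not a proof and has nothing to do with Coq's machinery; it merely enumerates all eight ways of assigning booleans to f 0, f 1 and f 2 and counts those in which at least two values coincide. The function name is mine.

```c
#include <stdbool.h>

/* Enumerates all 2^3 = 8 assignments of booleans to f 0, f 1 and f 2
   (bit k of `mask` is the value of f k) and returns how many of them
   contain at least two equal values. */
int colliding_assignments( void ) {
  int count = 0;
  for ( unsigned mask = 0; mask < 8; mask++ ) {
    bool f0 = ( mask >> 0 ) & 1u;
    bool f1 = ( mask >> 1 ) & 1u;
    bool f2 = ( mask >> 2 ) & 1u;
    if ( f0 == f1 || f1 == f2 || f0 == f2 ) count++;
  }
  return count;
}
```

The result is 8, that is, every case yields a pair of equal values, which is what each of the eight rewrite-and-inversion lines in the proof relies on.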

An interesting question (the answer to which I do not know) is whether we could automate it inside Coq somehow without writing plugins for it.