Coroutines In C

It is virtually a rite of passage for C programmers to realize that they can write their own cooperative multitasking system. C is low-level enough, and there are several ways to approach the problem, so, like Jedi light sabers, each one is a little bit different. [Christoph Wolcher] took his turn, and not only is his system an elegant hack, if that’s not an oxymoron, it is also extremely well documented.

Before you dig in, be warned. [Christoph] fully admits that you should use an RTOS. Or Rust. Besides, after he finished, he discovered the protothreads library, which does a similar task in a different way that is both more cool and more terrible all at the same time.

Once you dig in, though, you’ll see the system relies on state machines. Just to prove the point, he writes a basic implementation, which is fine, but hard to parse and modify. Then he shows a simple implementation using FreeRTOS, which is fine except for, you know, needing FreeRTOS.

Using a simple set of macros, it is possible to get something very similar to the RTOS version that runs independently, like the original version. Most of the long code snippets show you what code the macros generate. The real code is short and to the point.

Multiprocessing is a big topic. You can have processes, threads, fibers, and coroutines. Each has its pros and cons, and each has its place in your toolbox.

20 thoughts on “Coroutines In C”

  1. You’re not kidding about it being a rite of passage. This brought back memories.

    Back in grad school I ran into the problem for the first time because of the limitations of Arduino. On bare metal microcontrollers I was already in the habit of breaking the application into timer ISRs but I didn’t have a clean way of doing that on Arduino. What I ended up doing was restructuring all the logic to finish fast and return, and then in the main Loop I had some logic to control how often the different functions would be called to iterate. Dirty, hacky, inefficient, but I had a working robot in time for an open house event the next day.

  2. First, I don’t agree he should have used an RTOS. An RTOS can be used for simply making code more readable, but on a small uC, some simple loops or state machines, combined with some ISRs for the time-critical stuff, is usually the best method. Also note that an RTOS does a lot of ISR enabling / disabling during task switching, and this makes the other ISRs less predictable. In general, if you are using an 8-bitter with a few kB of memory, then an RTOS is probably overkill. If you’re using a 32-bitter (ARM-Cortex XMega’s and such) which has several kB of RAM in addition to >= 32kB of Flash, then you’re starting to get into the area where an RTOS can be beneficial.

    In his comments he also writes:
    “This whole setup is an unholy alliance of C macros, state machines, and sheer willpower. It’s clever, it’s educational”

    I have a big dislike for macros and for “clever tricks”, but I do like C++. For his example of a blinking LED, I would write a little class that has:
    1). Function pointer.
    2). Two functions: led_on() and led_off().

    And then put a simple while() loop in main(), that repeatedly calls the function to which the function pointer points. You do have to think a bit of how you structure the functions of your state machine, but that is a good thing, because it forces you to think about program structure. State machines based on a function pointer also have very little overhead, and it’s easy to see the overall structure of the program in the while() loop in main(). It’s also completely free of macro trickery.
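
The function-pointer state machine described above can be sketched roughly as follows. This is a plain C approximation of the idea, using a struct instead of a C++ class; the names `struct blinker`, `blinker_step()`, and the `led_is_on` flag are assumptions made for illustration, and on real hardware the flag would be a GPIO write.

```c
#include <stdbool.h>

struct blinker;

/* A state is a short function that does one step and picks the next state. */
typedef void (*state_fn)(struct blinker *self);

struct blinker {
    state_fn state;   /* the function pointer the main loop calls */
    bool led_is_on;   /* stand-in for the real LED pin (assumption) */
};

void led_on(struct blinker *self);
void led_off(struct blinker *self);

void led_on(struct blinker *self)
{
    self->led_is_on = true;
    self->state = led_off;   /* next call turns the LED off */
}

void led_off(struct blinker *self)
{
    self->led_is_on = false;
    self->state = led_on;    /* next call turns the LED on */
}

/* One pass of the main loop: call whatever the pointer currently points to. */
void blinker_step(struct blinker *self)
{
    self->state(self);
}
```

In `main()`, this would amount to initializing `struct blinker b = { led_on, false };` and calling `blinker_step(&b)` inside the `while()` loop, with whatever pacing the application needs.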

    1. C++ on uC’s with virtual function indirection and vf tables can be a huge waste of memory and CPU cycles for some applications. Instead it’s possible to implement a simple C, static compile time OO framework rather than a dynamic runtime OO framework like C++ and remove the indirection and virtual function tables. This can be done with templates, and I’ve been on several projects that use this approach, including Intel SSDs.

      1. You are partly right. Yes it increases code size. But as long as you are not on a tight CPU cycle/RAM/ROM budget it usually doesn’t matter.

        Maintenance and (unit) testability score higher on the “important” list than code size and code speed.

        Our CPU doesn’t do anything for 50% of the time. Better to waste some cycles to increase testability instead.

      1. The next step is using setjmp/longjmp to implement your own green threads. Wrapping all the C library blocking routines to yield. No assembler.
        Then later you will want a malloc arena allocator for your threads….

    1. As a bare metal embedded developer in C, I came to regard the main process loop (an infinite for loop) as being analogous to an operating system of sorts. This only contains calls to the various process handler functions that the system needs to carry out all of its tasks. There is no preemption (apart from a few ISRs). The trick is, to write all of the process handlers, as well as their helper functions, to only do work on what’s immediately available, and return as soon as there’s nothing to do. They should never wait for anything.

      Writing code this way, is dependent on writing efficient state machines, never using wait functions, but instead relying on timers that can be started, and then testing for completion elsewhere within the state machine typically several process loop iterations later. However, coding this way can be quite challenging for some programmers…

      The thing is, this complexity has to exist somewhere within the software stack. It’s either within the operating system itself eg through use of a preemptive task scheduler, use of a high-level eg interpreted language with features making the programmers life easier, OR by writing efficient process handling (as above). The main benefit of the latter, is that by not depending on so much generalised code, the system is more efficient. Useful in an embedded solution, but less so for a general purpose operating system.
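
A minimal sketch of the "never wait" style described above, assuming a hypothetical millisecond counter `tick_ms` that a timer ISR would increment on real hardware. The handler starts a software timer, returns immediately, and checks for expiry on later passes through the process loop. All names here are illustrative and not taken from any particular codebase.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tick counter; a timer ISR would bump this on real hardware. */
volatile uint32_t tick_ms = 0;

struct soft_timer {
    uint32_t start;
    uint32_t duration;
};

void timer_start(struct soft_timer *t, uint32_t duration_ms)
{
    t->start = tick_ms;
    t->duration = duration_ms;
}

bool timer_expired(const struct soft_timer *t)
{
    /* Unsigned subtraction keeps this correct across counter wraparound. */
    return (uint32_t)(tick_ms - t->start) >= t->duration;
}

enum blink_state { BLINK_START, BLINK_WAIT };

struct blink_task {
    enum blink_state state;
    struct soft_timer timer;
    bool led_is_on;          /* stand-in for a GPIO pin (assumption) */
};

/* Process handler: does what is possible right now, then returns. */
void blink_handler(struct blink_task *task)
{
    switch (task->state) {
    case BLINK_START:
        timer_start(&task->timer, 500);
        task->state = BLINK_WAIT;
        break;
    case BLINK_WAIT:
        if (!timer_expired(&task->timer))
            return;                      /* nothing to do yet, never block */
        task->led_is_on = !task->led_is_on;
        task->state = BLINK_START;
        break;
    }
}
```

The process loop would then simply call `blink_handler()` alongside the other handlers on every iteration; no handler ever holds up the others.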

  3. Something I miss about the 80’s is that you would come up with an idea like this and implement it in a couple projects and discuss it with your friends, and you didn’t find out until three months or three years later that someone else did it (and possibly/probably better). I think spending your time in that sort of low-level playground is essential for building good engineers, in a way that wiring together libraries doesn’t accomplish.

    1. Some might say the same about fussing with crystal point contacts over wiring together IC’s for building good electrical engineers…

      on a serious note, it would be great if beginner IDE’s like Arduino put on display the inner workings of libraries in some way, so that learners could drill down into what makes the microprocessor tick, while still getting difficult things done quickly.

      maybe even down to assembly… (https://godbolt.org/)

  4. “It is virtually a rite of passage for C programmers”. Must be… I remember writing one as well. But I seem to recall at the core was a small assembly code module that did the actual switch to the next task to run. This was back in the x86 days. Never did need it for 68xxx, as we used Ready Systems VRTX as the preemptive multitasking core.

  5. When I hear “coroutines in C” it reminds me of “coroutine.h”, which I wrote ages ago (based on something I found on whatever passed for the Internet at the time). It wraps the function body in a switch statement and defines a “yield” operation which IIRC does something along the lines of “state = __LINE__; return _VAL; case __LINE__:”. You can embed this INSIDE of a block and the (hidden) outer switch will take you back inside of it. It’s kinda weird, and surprisingly it is actually part of the conformance tests, so compilers are required to support it.
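
The switch-based trick described in this comment can be sketched like this. The macro names `CO_BEGIN`, `CO_YIELD`, and `CO_END` are made up for illustration and are not the commenter's actual `coroutine.h`; the sketch only assumes the standard `__LINE__` macro and the fact that C allows `case` labels inside nested blocks of a `switch`.

```c
/* Hypothetical macro names; a sketch of the technique, not a real library. */
#define CO_BEGIN(state)  switch (state) { case 0:
#define CO_YIELD(state, value)                                   \
    do {                                                         \
        (state) = __LINE__;   /* remember where to resume */     \
        return (value);                                          \
        case __LINE__:;       /* the next call lands here */     \
    } while (0)
#define CO_END(state)    } (state) = 0

/* Yields 1, 2, 3 on successive calls, then -1, then starts over. */
int count_to_three(void)
{
    static int state = 0;  /* resume point, survives across calls */
    static int i;          /* must be static: plain locals are lost on return */

    CO_BEGIN(state);
    for (i = 1; i <= 3; i++) {
        CO_YIELD(state, i);   /* resumes inside the loop body */
    }
    CO_END(state);
    return -1;
}
```

Because the resume label is derived from `__LINE__`, each `CO_YIELD` needs to sit on its own source line, and any variable that has to survive a yield must be `static` or live outside the function.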
