Data-driven programming in C++14

Data-driven programming is a paradigm for quickly processing large amounts of data stored in large contiguous memory allocations. This article gives a brief overview of the fundamental techniques that I often use in C++14.

Global functions

To allow reuse of functional code, data-driven programming often keeps functions outside of the structure definitions. Unlike object orientation, this allows using a type without including every single API that uses the type. You do not have to clutter the global namespace to use free functions in C++; you can put them into the library's own namespace.
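
As a minimal sketch (the type and function names are mine, not a fixed API), the functions live next to the type inside the library's namespace:

    #include <cstdint>

    namespace mylib {
        struct Color {
            uint8_t red, green, blue, alpha;
        };
        // A free function instead of a method, so the type can be
        // used without pulling in every API that operates on it.
        inline Color mix(Color x, Color y) {
            return Color{uint8_t((x.red + y.red) / 2),
                         uint8_t((x.green + y.green) / 2),
                         uint8_t((x.blue + y.blue) / 2),
                         uint8_t((x.alpha + y.alpha) / 2)};
        }
    }

Callers then write mylib::mix(a, b) without anything leaking into the global namespace.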

Basic formats

Because the raw data is not encapsulated, the data structure is like a railroad that has to be built right from the start.

Packed

The simplest form of data driven programming is to allocate an array of structs, which is called a packed format. Packing is used to minimize waste in cache reads when you want to know everything about something, such as all color channels of a pixel.
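
A minimal sketch of a packed pixel format (the names are hypothetical):

    #include <cstdint>
    #include <vector>

    // Packed format: everything about one pixel sits together,
    // so reading a whole pixel touches one small contiguous region.
    struct Pixel {
        uint8_t red, green, blue, alpha;
    };

    std::vector<Pixel> createImage(int width, int height) {
        return std::vector<Pixel>(size_t(width) * size_t(height));
    }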

Planar

When some of the data is not used together but neighbors are often accessed at the same time, you might want to allocate that as a separate array, which is called a planar format. Planar formats are convenient for writing reusable code that only takes what it needs as input.
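
A sketch of the same image stored as planes, with a reusable function that only takes the plane it needs:

    #include <cstdint>
    #include <vector>

    // Planar format: each channel has its own array, so code that
    // only needs one channel does not drag the others into the cache.
    struct PlanarImage {
        std::vector<uint8_t> red, green, blue, alpha;
    };

    // Takes only what it needs as input, independent of channel.
    uint64_t sumPlane(const std::vector<uint8_t>& plane) {
        uint64_t sum = 0;
        for (uint8_t value : plane) {
            sum += value;
        }
        return sum;
    }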

Semi-planar

Mixing packed and planar data in the same structure is called semi-planar, which gets the benefits of both. One array can pack together the things that are often used at the same time, while everything else is stored separately in its own arrays.
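
A sketch of a semi-planar particle system, where the hot data stays packed and the cold data is stored on its own:

    #include <cstdint>
    #include <vector>

    // Positions are always used together, so they stay packed,
    // while colors are accessed separately in their own plane.
    struct Position {
        float x, y, z;
    };
    struct Particles {
        std::vector<Position> positions; // packed, hot data
        std::vector<uint32_t> colors;    // planar, cold data
    };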

Memory alignment

When a computer reads unaligned memory, it may have to fetch multiple memory blocks and reorder them in a hardware pipeline, which adds latency to your memory reads. On some systems, unaligned memory access is not even allowed. For a pointer to be aligned to N bytes, the address must be evenly divisible by N. When doing SIMD operations, always align the memory to the SIMD vector's length.
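
A minimal sketch assuming an x86 target with SSE, where alignas guarantees 16-byte alignment so the aligned load is safe:

    #include <xmmintrin.h> // SSE intrinsics, assuming an x86 target

    void alignedExample() {
        // The addresses of a and b are evenly divisible by 16,
        // matching the 16-byte length of an SSE register.
        alignas(16) float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
        alignas(16) float b[4] = {5.0f, 6.0f, 7.0f, 8.0f};
        // _mm_load_ps and _mm_store_ps require 16-byte alignment.
        _mm_store_ps(a, _mm_add_ps(_mm_load_ps(a), _mm_load_ps(b)));
        // a now holds {6, 8, 10, 12}.
    }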

Memory access

Predictable access patterns

One of the most important things that make data-driven code fast is how memory is accessed. The more linearly and predictably you read and write data, the fewer cache misses the CPU will stumble on. The more pointers you use, the slower memory reads become, because the CPU must read each pointer from memory before it knows where to go next, only to stall with a cache miss when it lands somewhere random in the address space. Prefer values read in sequential order over pointers to random places.
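
As a sketch of the difference, compare summing a flat array with chasing nodes through pointers:

    #include <cstddef>
    #include <vector>

    // Sequential access: the next address is known in advance,
    // so the hardware prefetcher can hide the memory latency.
    float sumLinear(const std::vector<float>& values) {
        float sum = 0.0f;
        for (size_t i = 0; i < values.size(); i++) {
            sum += values[i];
        }
        return sum;
    }

    // Pointer chasing: the next address is unknown until the current
    // node has been read, so every jump may stall on a cache miss.
    struct Node {
        float value;
        Node* next;
    };
    float sumChained(const Node* node) {
        float sum = 0.0f;
        for (; node != nullptr; node = node->next) {
            sum += node->value;
        }
        return sum;
    }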

Cache lines

Cache lines are the blocks of data that are loaded from memory at the same time. Different threads should not write to the same cache line at the same time, to prevent a performance problem called false sharing, where the cores repeatedly invalidate each other's copy of the line and force it to be fetched again. The more irrelevant data you pull in when fetching a cache line, the more you pollute and thrash the data cache, leaving too little space for the data actually used.
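
A minimal sketch, assuming 64-byte cache lines, where padding keeps each thread's counter on its own line:

    // alignas(64) rounds the struct size and address up to one full
    // cache line, so two threads never write to the same line.
    struct alignas(64) PerThreadCounter {
        long count = 0;
    };

    PerThreadCounter counters[4]; // one counter per worker thread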

Inlining

If a function is declared in the header but defined in another *.cpp file, the compilation phase of building cannot inline calls to the function, because it does not know the content of the called function, which may prevent many other optimizations. If a small function is defined in the header, it can be inlined by the compiler. Using the inline keyword before the function definition can hint to the compiler that you want it to be inlined.
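
A sketch of a small function defined directly in the header:

    // Defined in the header: every caller sees the function body,
    // so the compiler can inline the call during compilation.
    inline float squared(float x) {
        return x * x;
    }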

Templates

A powerful way to generate fast code in C++ is to use template functions and constrain the possible types using traits. Unlike a virtual method that has to be looked up at runtime, a template can call overloaded inline functions doing the same thing for different types, with no runtime overhead for the calls.
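
A minimal sketch using a standard type trait to constrain the template to arithmetic types:

    #include <type_traits>

    // Resolved entirely at compile time: each instantiation becomes
    // a direct call, with no runtime lookup like a virtual call has.
    template <typename T, typename = std::enable_if_t<std::is_arithmetic<T>::value>>
    T clampValue(T value, T low, T high) {
        return value < low ? low : (value > high ? high : value);
    }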

Overloading operators

The main benefit of using C++ instead of C is the ability to overload math operators for your own types. For SIMD abstraction layers, this allows using an infix plus to add two vectors and have it compile into different intrinsic functions for different target systems and vector lengths.
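
A sketch of such an abstraction for one target (SSE); another target would wrap its own intrinsic behind the same operator:

    #include <xmmintrin.h> // SSE intrinsics, assuming an x86 target

    // A thin wrapper so that infix plus compiles into the intrinsic
    // chosen for the current target system and vector length.
    struct F32x4 {
        __m128 v;
    };
    inline F32x4 operator+(F32x4 a, F32x4 b) {
        return F32x4{_mm_add_ps(a.v, b.v)};
    }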

SIMD vectorization

SIMD intrinsics can be used to process multiple elements at the same time. This can be combined with multi-threading for even more data parallelism. Planar formats are the easiest to work with for SIMD operations, because then each SIMD vector is filled with elements of the same type.
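
A sketch that adds two planar float arrays four elements at a time, assuming SSE, 16-byte aligned pointers, and a size that is a multiple of four:

    #include <cstddef>
    #include <xmmintrin.h> // SSE intrinsics, assuming an x86 target

    // Each SIMD register is filled with four floats of the same
    // type, which is what makes planar data easy to vectorize.
    void addPlanes(float* target, const float* source, size_t size) {
        for (size_t i = 0; i < size; i += 4) {
            __m128 a = _mm_load_ps(target + i);
            __m128 b = _mm_load_ps(source + i);
            _mm_store_ps(target + i, _mm_add_ps(a, b));
        }
    }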

Multi-threading

Data-driven programming makes it very easy to perform tasks using multiple threads without getting corrupted memory. Simply segment the target region along cache lines and let each thread write to its own segment, which can be done with reusable template functions.
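
A minimal sketch, assuming the data pointer is 64-byte aligned and at least one thread is requested; each segment starts on a multiple of 16 floats (64 bytes) so no two threads share a cache line:

    #include <cstddef>
    #include <thread>
    #include <vector>

    void fillParallel(float* data, size_t size, float value, size_t threadCount) {
        // Round the segment length up to 16 floats (one cache line).
        size_t perThread = ((size / threadCount + 15) / 16) * 16;
        std::vector<std::thread> workers;
        for (size_t t = 0; t < threadCount; t++) {
            size_t begin = t * perThread;
            if (begin >= size) break;
            size_t end = begin + perThread < size ? begin + perThread : size;
            // Each thread writes only to its own segment.
            workers.emplace_back([=]() {
                for (size_t i = begin; i < end; i++) {
                    data[i] = value;
                }
            });
        }
        for (std::thread& worker : workers) {
            worker.join();
        }
    }

Real code would reuse a thread pool instead of spawning new threads for every call, but the segmentation stays the same.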

Safety abstractions

You do not have to sacrifice safety to get good performance, because you can always disable the safety checks in release mode once billions of random test cases confirm that the end user is not going to see any memory corruption.

Bound checked pointers

If you want to iterate over elements using a pointer, you can create a template type for pointers that only carries the additional information needed for tight bound checks in debug mode.
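
A minimal sketch, where the size is only stored and checked in debug builds; with NDEBUG defined, both the member and the assert compile away:

    #include <cassert>
    #include <cstddef>

    template <typename T>
    struct SafePointer {
        T* data;
    #ifndef NDEBUG
        size_t size; // element count, only stored in debug builds
    #endif
        SafePointer(T* data, size_t size)
        : data(data)
    #ifndef NDEBUG
        , size(size)
    #endif
        {}
        T& operator[](size_t index) const {
            assert(index < size); // removed entirely in release mode
            return data[index];
        }
    };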

Buffers

If you want a generic memory allocation, you can create a buffer type holding both a pointer and the size. Then you can ask it for a bound checked pointer that knows the buffer's size in debug mode.
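
A sketch building on the SafePointer above:

    #include <cstddef>
    #include <cstdint>
    #include <memory>

    // A generic allocation that knows its own size and can hand out
    // bound checked pointers (SafePointer from the previous sketch).
    struct Buffer {
        std::unique_ptr<uint8_t[]> data;
        size_t size;
        explicit Buffer(size_t size)
        : data(std::make_unique<uint8_t[]>(size)), size(size) {}
        SafePointer<uint8_t> getSafePointer() const {
            return SafePointer<uint8_t>(data.get(), size);
        }
    };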

Separating file access from encoding

Let each file format define how to parse content from a memory buffer and serialize content into a buffer. Then use generic functions for saving and loading buffers, independent of format. This can make saving to a file much faster, because the file system is told in advance how much space is needed and receives one big chunk of bytes. It also makes it possible to embed all file formats into archives, and removes duplicated file access code scattered everywhere. Code interfacing directly with the operating system should be kept to a minimum to make porting to new systems easier.
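
A sketch of the generic half, with hypothetical names and no error handling; each format then only converts between its content and a buffer:

    #include <cstddef>
    #include <cstdint>
    #include <fstream>
    #include <string>
    #include <vector>

    // Load a whole file into one big chunk of bytes.
    std::vector<uint8_t> loadBuffer(const std::string& path) {
        std::ifstream file(path, std::ios::binary | std::ios::ate);
        std::vector<uint8_t> buffer(static_cast<size_t>(file.tellg()));
        file.seekg(0);
        file.read(reinterpret_cast<char*>(buffer.data()), buffer.size());
        return buffer;
    }

    // Save a whole buffer, so the size is known before writing begins.
    void saveBuffer(const std::string& path, const std::vector<uint8_t>& buffer) {
        std::ofstream file(path, std::ios::binary);
        file.write(reinterpret_cast<const char*>(buffer.data()), buffer.size());
    }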

Common optimization myths

My code is fast enough, I don't need to optimize.

When code is optimized to become 1000 times as fast, you can do things that were previously not possible. Optimization is a feature enabler, because it lets you take on bigger problems.

Optimization will make my code buggy and hard to read

Only if you are a beginner at optimization who has not yet mastered zero-overhead abstractions. Once experienced, optimizing and cleaning up become the same thing. Most performance bottlenecks are also things that cause crashes, such as complex pointer structures, branching into special cases, and dividing numbers.

Algorithm people don't need to know optimization

90% of optimization is in the choice of algorithm, based on available memory bandwidth and hardware instructions. Optimization must be kept in mind at all times, to avoid having to throw everything away when someone else has to optimize your algorithm. The best algorithms are written when algorithm and optimization skills are present at the same time during early brainstorming.

Optimizing for speed will sacrifice precision in my algorithm

Lossy optimization is about maximizing quality per unit of execution time and spreading the quality evenly among noticeable artifacts. Optimization usually improves quality while shortening execution time, because it affords more math operations where they matter the most, with higher resolutions and more iterations.
