Linux Kernel's List Head Structure: Zero Allocation Growth and Decoupled Logic


In the Linux Kernel, we don't put data inside lists. We put lists inside data.

Most CS101 textbooks teach linked lists as boxes that contain a payload. The Linux Kernel flips this concept inside out with the list_head structure. The struct itself is declared in include/linux/types.h, and the operations on it live in include/linux/list.h. This "intrusive" design means we embed the list pointers directly inside our custom data structures. It’s an elegant solution that allows a single object to live on multiple lists simultaneously (one embedded list_head per list) without extra memory allocations.

Why it's the "Senior" choice for architecture:

✅ Zero Allocation Growth: Adding a node to a list is O(1) and needs no separate kmalloc for the node, because the links already live inside the object.
✅ Decoupled Logic: One list library manages everything from processes to files.
✅ Seamless Retrieval: Remember yesterday's container_of()? The kernel’s list_entry() macro is just a wrapper around it!

Quick Quiz for the Kernel Experts: When traversing a list and deleting elements, why MUST you use list_for_each_entry_safe() instead of the standard version?

Let’s hear your take on the mechanics of pointer corruption in the comments! 👇

#LinuxKernel #SystemsEngineering #EmbeddedSystems #CProgramming #DataStructures #TechInsights #StaffEngineer


More Relevant Posts

Ivan Gaydardzhiev
2w Edited
Most of us have called printf a thousand times without asking what allows it to exist. Not how it works internally. What has to be true before it can work at all.

There is a layer beneath the C standard library that almost no one looks at deliberately. The runtime. The entry point. The BSS clear. The heap base. The negotiation with the kernel that happens before main is ever called. It isn't hidden, exactly. It just isn't where attention tends to go. I went there anyway.

The Precondition is the book I wrote for the moment when the abstraction stops being enough. It rebuilds a C runtime library for ARMv8-L 32-bit Linux from scratch, in assembly and C, exposing the exact conditions that allow a program to come into existence at all.

It isn't a reference. It's closer to an argument: that the foundations of a C program are not mechanical background noise, but something worth thinking about directly. That understanding what the processor obeys, what the kernel provides, and what abstractions quietly conceal is not a niche interest but a different way of seeing.

https://lnkd.in/eD5cePy6

The Precondition: How a C Program Comes to Exist on ARMv8-L 32-Bit Linux amazon.com
Ashutosh Pandey
6d
I am in the process of learning the FreeSWITCH memory model. It is fascinating to see that it internally uses the Apache Portable Runtime (APR) for the following:

- Memory pools: hierarchical memory allocation with automatic cleanup
- Threading primitives: threads, mutexes, condition variables, read-write locks, all cross-platform
- Data structures: hash tables, queues, arrays, linked lists
- Filesystem and network I/O: portable file handling, socket I/O
- Process management: fork/exec abstractions, pipe handling
- Time and timers: portable time representation, sleep/timeout
- Dynamic loading: dlopen-equivalent that works on Windows and Unix
- String utilities: apr_pstrdup, apr_pstrcat, etc., that interact with pools

Source code (the APR pool implementation is seriously good stuff to look over): https://lnkd.in/gWN4RQNc
Bhanu Teja
6d
$ ls /proc/$(pidof app)/fd | wc -l
17428

File descriptor leak. Found in 3 seconds, no APM needed.

Day 6 / 7. Series A: Linux Process & Memory.

The Problem: /proc is the kernel telling you the truth about a running process. No instrumentation, no restart, no agent. Just files that read live state. Most teams reach for dashboards first. The dashboard is downstream. /proc is the source.

What each path answers:

$ cat /proc/PID/status   # threads, RSS, voluntary ctx switches
$ cat /proc/PID/limits   # actual rlimits inside the container
$ cat /proc/PID/maps     # mmap regions, shared libs, JIT pages
$ ls /proc/PID/fd        # every open socket, file, pipe
$ cat /proc/PID/stack    # kernel stack: where the thread is blocked
$ cat /proc/PID/cgroup   # exact cgroup path, which limits apply

The fd one solves leaks fastest. Symlinks point at the actual target:

$ ls -l /proc/PID/fd | awk '{print $NF}' | sort | uniq -c | sort -rn

10k entries to one log file = your logger forgot to close.
8k sockets to one IP = connection pool misconfigured (sockets show up here as socket:[inode], so map them to peers with a tool such as ss).

The Fix: Pin /proc reading into your runbook before the dashboard query. For OOM, read /proc/PID/status RSS plus /proc/PID/smaps_rollup Pss. For hangs, read /proc/PID/stack. For descriptor leaks, count fd entries.

Takeaway: /proc is free, always on, accurate. Most prod debugging needs nothing else.

Just joined? No problem. Every day of the series, past and upcoming, lives in one repo: https://lnkd.in/g9ZeRA-f

800+ engineers read DecodeOps for production deep dives. Free K8s ebook (200+ pages, 30 scenarios): decodeops.substack.com

Best /proc trick in your toolkit? Drop it below.

#Linux #DevOps #Kubernetes #SRE
Anushka Badhe
2w
Before writing a single line of my own allocator, I wanted to actually watch malloc work. (ﾉ´ヮ´)ﾉ*:･ﾟ✧

So I spent time observing glibc's malloc, using strace, sbrk(), and /proc maps before designing anything myself.

A few things that genuinely surprised me (⊙_⊙):

☆ Asking for 1000 bytes gave me 132KB. glibc buys memory wholesale and keeps it internally so it doesn't have to make expensive syscalls every time.
☆ Three consecutive malloc() calls didn't trigger a single extra brk(): all three fit inside the arena that was already set up. ( •̀ ω •́ )✧
☆ glibc runs code before main() even starts. The kernel is unaware of any of the internal structures glibc imposes on the heap: chunks, bins, top chunk. To the kernel, it's just bytes. (˶ᵔ ᵕ ᵔ˶)

All of this is Phase 0 of my mini-malloc series, where I'm building a memory allocator from scratch, referencing the official glibc source throughout. (˶ˆᗜˆ˵)

Link in the comments! 🔗

#SystemsProgramming #C #Linux #MemoryManagement #LowLevel #DevJournal
48 Comments
Wilson Bilkovich
2w Edited
Note the ease with which the "computering machine" will do hundreds of times more work than you asked for. You always learn something from real measurements.
WotAI

29 followers
1mo
Three details from Claude Code 2.1.117–2.1.121 that most coverage will gloss over:

1. The Opus 4.7 context bug wasn't just cosmetic. Auto-compact fires when /context percentage hits a threshold. Computing percentage against 200K instead of 1M means Claude was summarizing your session 5x too aggressively, losing detail you didn't need to lose. If you've felt sessions "forget" things faster than they used to, this is why.

2. Native builds on macOS and Linux replaced the Glob and Grep tools with embedded bfs and ugrep, available through the Bash tool. The win is the eliminated tool round-trip: Claude doesn't context-switch to a separate search tool for what's effectively a `find` or `grep`.

3. The find file-descriptor crash on large directory trees was a host-wide stability bug, not a Claude Code-specific bug. macOS and Linux native builds were exhausting the user's open-file limit during recursive search, taking down other processes. 2.1.121 reduced peak FD usage during find significantly.

Full breakdown:

Claude Code 2.1.121: Opus 4.7 finally uses its full 1M window wotai.co
David B. Hon
3w
> All scripts pass syntax checking. Here's a summary of everything that was built:

gemOS 1.0.0-rt is now fully scaffolded. Here's what was created or rewritten:

New scripts:
- scripts/rt-tune.sh: --apply sets CPU governor to performance, removes RT bandwidth throttling (sched_rt_runtime_us=-1), migrates IRQs off isolcpus, disables THP and NMI watchdog, writes PAM limits. --revert and --status modes included.
- scripts/build-rt-kernel.sh: auto-queries kernel.org for the latest stable version, downloads the matching PREEMPT_RT patch, applies it, configures with PREEMPT_RT=y + NO_HZ_FULL + RCU_NOCB_CPU, builds and installs. Supports localmodconfig for fast builds and a CUSTOM_CONFIG override.

New configs:
- config/rtkit.conf: grants RT scheduling (up to priority 99) to audio and daq groups via rtkit-daemon.
- config/limits-rt.conf: PAM limits for the daq group: unlimited memlock, rtprio=99, nice=-20.

Updated manifests:
- core-noble.txt: adds RT kernel packages (linux-image-lowlatency, rtkit, rt-tests, tuna, schedtool), full COMEDI/libcomedi stack, PyVISA, sigrok/PulseView, serial/I2C/CAN bus tools, Octave signal/control/statistics, PETSc/SLEPc.
- extended-noble.txt: adds LinuxCNC, EPICS, GNU Radio + SDR (rtl-sdr, HackRF), SageMath, Julia, ParaView, PyVista, InfluxDB client, R signal/time-series packages.
- python-requirements.txt: full DAQ/instrument stack: PyVISA, pyserial, pyusb, nidaqmx, PyDAQmx, zhinst, QCoDeS, pyepics, caproto, obspy, MNE, nptdms, pyqtgraph.
- r-packages.R: adds signal, pracma, zoo, xts, forecast, tseries, changepoint, spectral, psd.

Updated installer (install-gemos-stack.sh):
- New --rt-kernel flag installs the RT kernel, deploys rtkit/limits configs, and creates the daq group.
- --all now means: RT kernel + core + extended.
- --pip and --r flags for Python/R post-install.

▸ Credits: 2.47 • Time: 7m 33s

The Linux Kernel Archives kernel.org
martinuke0
2w
Hi! Building Userland Mutexes with the Futex System Call

> TL;DR: Futexes let user‑space code coordinate threads with minimal kernel involvement; by using `FUTEX_WAIT` and `FUTEX_WAKE` you can build a fast, fair mutex that outperforms pthread mutexes in many scenarios.

In modern multi‑core applications the cost of a lock is often the difference between scaling smoothly and stalling at the first contention point. While the POSIX `pthread_mutex_t` abstraction is convenient, it hides a costly kernel round‑trip whenever a thread must block. Linux’s fast userspace mutex (futex) primitive was introduced precisely to keep the common uncontended case entirely in user space and only fall back to the kernel when absolutely necessary.

This article walks through the reasoning behind futexes, explains the low‑level API, and shows how to assemble a production‑grade userland mutex from scratch. The code is written in C, but the concepts translate directly to Rust, Go, or any language that can invoke the `futex` syscall.

2. Slow path – if the lock is already held, invoke a kernel primitive (e.g., `pthread_mutex_lock`) that puts the thread to sleep.

Read the full guide: https://lnkd.in/dGp6ywRK

#systemsprogramming #concurrency #linux #futex #mutex

Building Userland Mutexes with the Futex System Call martinuke0.github.io
KOTESWARA RAO BUGGALA
3w
💀 “Process created = memory allocated”

Yeah… and I’m the CEO of a kernel. Let’s fix that myth 👇

You think the OS just hands over a big chunk of RAM and says “Go ahead, enjoy”? Nope. It’s way smarter (and a little savage).

---

🔹 First move: Copy-on-Write
OS be like: “Why copy memory? Share it… until someone touches it.” Lazy? No. Efficient.

---

🔹 Then comes: Virtual Memory
Every process thinks: “I own all this memory.” Reality: You’re living in an illusion carefully managed by the OS.

---

🔹 Memory layout isn’t random either:
• Text → your code
• Data/BSS → globals
• Heap → your "malloc()" experiments
• Stack → where bugs love to crash the party

---

🔹 Final twist: Demand Paging
OS: “I’ll give you memory… when you actually use it.” Until then? It’s just a promise.

---

🔥 Reality check: A process doesn’t get memory. It gets a well-controlled illusion of memory. And the OS is the one pulling all the strings.

---

#Linux #OperatingSystems #EmbeddedLinux #CProgramming #TechTruths #Kernel
Abhishek Raj
2w
Recently, I spent some time building a complete mini HPC cluster from scratch using:

• Rocky Linux
• Slurm
• OpenMPI
• LDAP
• NFS
• Spack

Working in HPC and parallel programming, especially around optimization and scientific workloads on the PARAM series of supercomputers, I’ve always worked *with* cluster environments. But this time I wanted to understand how everything actually comes together behind the scenes.

As programmers, we spend a lot of time optimizing applications, tuning MPI workloads, and improving scalability, but understanding how the infrastructure itself works gives a completely different perspective on performance and system behavior. So I decided to build the whole environment myself and document the entire process along the way.

This project helped me understand a lot of things much more deeply:

• scheduler behavior
• distributed execution
• shared storage
• centralized authentication
• MPI runtime environments
• software stack management
• cluster networking
• real troubleshooting and debugging

I also documented everything as a practical hands-on guide for anyone interested in learning HPC infrastructure, Linux clusters, and parallel computing from the ground up.

Future updates will include:

• advanced Slurm configurations
• monitoring & observability
• profiling & optimization
• parallel programming sections
• GPU integration
• more production-style HPC concepts

Project Repository: https://lnkd.in/gbWXUyw7
Read Documentation Online: https://lnkd.in/g6RAVumc

If you find any issue in the documentation or setup process, feel free to open an issue on GitHub.

#HPC #Linux #ParallelProgramming #MPI #Slurm #OpenLDAP #Spack #DistributedSystems #HighPerformanceComputing

5 Comments
