🧠 How to Debug a Linux Device Driver (Step-by-Step): The Kernel-Truth Playbook
(When it fails, don’t “guess.” Walk the chain: DT → bind → probe → resources → interfaces → IO → runtime.)
Most driver bugs are not “logic issues.” They’re state mismatches across kernel subsystems (bus, DT, IRQ, MMIO, udev, power, clock, userspace).
Here’s the systematic debugging workflow I use for platform / I2C / SPI / USB / char drivers.
0️⃣ Start with the Non-Negotiables (ABI + binary sanity)
If you skip this, you’ll waste hours.
✅ Kernel version:
uname -a
✅ Module vermagic:
modinfo mydrv.ko | grep vermagic
✅ What the kernel says when you insert:
sudo insmod mydrv.ko
dmesg -T | tail -200
✅ If symbols fail:
dmesg -T | grep -Ei "Unknown symbol|disagrees about version"
Kernel truth: if vermagic, ABI, or symbol CRCs mismatch, that isn’t a debugging problem; it’s a wrong build. Rebuild against the running kernel’s headers and config.
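A minimal sketch of automating the vermagic check. The helper name vermagic_release_matches is hypothetical, and it only compares the first token of the vermagic string (the kernel release); the kernel also compares the remaining flags (SMP, preempt, modversions and so on), so a pass here is necessary but not sufficient.

```shell
# vermagic_release_matches VERMAGIC RELEASE
# Succeeds when the first space-separated token of VERMAGIC equals RELEASE.
# (Hypothetical helper; it does not check the flags after the release.)
vermagic_release_matches() {
    vm_release=${1%% *}     # keep only the text before the first space
    [ "$vm_release" = "$2" ]
}

# Usage on the target (mydrv.ko as in the commands above):
#   if vermagic_release_matches "$(modinfo -F vermagic mydrv.ko)" "$(uname -r)"; then
#       echo "release matches"
#   else
#       echo "rebuild against the running kernel"
#   fi
```

If this fails, stop there and fix the build before looking at anything else in the chain.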
1️⃣ Confirm the Kernel Sees Your Device (Bus truth, not assumptions)
✅ Platform (Device Tree)
- Does the node exist?
ls /proc/device-tree/
cat /proc/device-tree/<path>/compatible
- Does the kernel create the device?
ls /sys/bus/platform/devices/
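The compatible property is a list of NUL-terminated strings, so plain cat runs the entries together. A small sketch that checks for one exact entry; dt_has_compatible is a hypothetical helper and "acme,mydrv" is a made-up compatible string.

```shell
# dt_has_compatible NODE_DIR STRING
# Succeeds when NODE_DIR/compatible contains STRING as one whole entry.
dt_has_compatible() {
    # Turn the NUL separators into newlines, then match a full line literally.
    tr '\0' '\n' < "$1/compatible" | grep -Fxq -- "$2"
}

# Usage (the node path stays whatever your board uses):
#   dt_has_compatible /proc/device-tree/<path> "acme,mydrv" && echo "node matches"
```

The exact-match flags (-F -x) matter: a driver's of_match_table entry must equal one full compatible entry, not a substring of it.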
✅ I2C
- Is the bus present?
ls /dev/i2c-*
i2cdetect -y 1   # replace 1 with your bus number; i2cdetect -l lists the buses
ls /sys/bus/i2c/devices/
✅ SPI
ls /dev/spidev*
ls /sys/bus/spi/devices/
✅ USB
lsusb
ls /sys/bus/usb/devices/
dmesg -T | tail -200
Checkpoint: If the device isn’t instantiated in /sys/bus/.../devices, your driver may never get a chance.
2️⃣ Confirm Binding: Device ↔ Driver Match (Why probe() didn’t run)
The binding chain must be true:
DT compatible / ID table → bus match → driver binds → probe()
✅ Identify what driver is bound:
readlink /sys/bus/platform/devices/<dev>/driver
# or i2c/spi/usb bus paths accordingly
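The readlink check can be wrapped so it gives a clear answer for unbound devices too. This is a sketch; bound_driver is a hypothetical helper name, and it works on any bus's device directory.

```shell
# bound_driver DEVICE_DIR
# Prints the name of the driver bound to the device, or "unbound".
bound_driver() {
    if [ -L "$1/driver" ]; then
        # The symlink target ends in the driver's name.
        basename "$(readlink "$1/driver")"
    else
        echo "unbound"
    fi
}

# Usage:
#   bound_driver /sys/bus/platform/devices/<dev>
```

"unbound" with the driver present under /sys/bus/<bus>/drivers/ usually means the match failed or probe() returned an error.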
✅ Check driver is registered:
ls /sys/bus/platform/drivers/
ls /sys/bus/i2c/drivers/
ls /sys/bus/spi/drivers/
✅ Force a bind/unbind (gold for debugging):
# unbind
echo <dev> | sudo tee /sys/bus/platform/drivers/<driver>/unbind
# bind
echo <dev> | sudo tee /sys/bus/platform/drivers/<driver>/bind
🔥 Deferred probe trap (very common on SoCs)
cat /sys/kernel/debug/devices_deferred 2>/dev/null
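If you see your device here → probe() returned -EPROBE_DEFER because a dependency (clock, regulator, pinctrl) isn’t ready yet. That’s not failure; it’s dependency ordering. If the device never leaves this list, the provider driver is missing or disabled.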
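A sketch for pulling one device's entry out of the deferred list. It assumes each line is the device name, optionally followed by whitespace and a reason string (recent kernels print one; older ones may not). deferred_reason is a hypothetical helper.

```shell
# deferred_reason DEFERRED_FILE DEVICE
# Prints the reason text for DEVICE (may be empty) and succeeds if DEVICE
# is listed; fails if it is not in the deferred list.
deferred_reason() {
    awk -v dev="$2" '
        $1 == dev {
            found = 1
            $1 = ""                 # drop the device name
            sub(/^[ \t]+/, "")      # trim the separator left behind
            print
        }
        END { exit !found }
    ' "$1"
}

# Usage (needs root and debugfs mounted):
#   deferred_reason /sys/kernel/debug/devices_deferred <dev>
```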
3️⃣ Turn dmesg into a Timeline (not a dump)
✅ Filter only your driver:
dmesg -T | grep -i mydrv
✅ Track init steps + failures:
dmesg -T | grep -Ei "mydrv|probe|remove|defer|error|fail"
Pro move: Put checkpoint logs at decisions, not everywhere:
- DT parsed
- MMIO mapped
- IRQ requested
- HW init done
- interfaces registered
4️⃣ Resource Debugging: The “Big Four” That Break Probes
If probe() runs but fails, most of the time it’s here:
(A) MMIO mapping / reg address wrong
✅ Validate mapping:
cat /proc/iomem | grep -i <device-or-base>
If devm_ioremap_resource() fails → DT reg is wrong or region busy.
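When grep on a name finds nothing, it helps to ask the reverse question: who owns this physical address? A sketch, assuming the usual "start-end : name" layout of /proc/iomem with hex addresses. iomem_regions_at is a hypothetical helper; run it as root, because unprivileged reads show zeroed addresses.

```shell
# iomem_regions_at IOMEM_FILE ADDRESS
# Prints the name of every region (parents and children) containing ADDRESS.
iomem_regions_at() {
    target=$(( $2 ))                      # accepts 0x-prefixed hex
    while read -r range sep name; do
        case $range in
            *-*) ;;                       # looks like start-end
            *) continue ;;                # skip anything else
        esac
        start=$(( 0x${range%-*} ))
        end=$(( 0x${range#*-} ))
        if [ "$target" -ge "$start" ] && [ "$target" -le "$end" ]; then
            printf '%s\n' "$name"
        fi
    done < "$1"
}

# Usage:
#   sudo sh -c '. ./helpers.sh; iomem_regions_at /proc/iomem 0x<base-from-DT-reg>'
```

If another driver's name shows up for your base address, the region is busy; if nothing shows up after your probe, the mapping never happened. (helpers.sh is just a placeholder for wherever you keep the function.)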
(B) IRQ not firing / wrong trigger
✅ Check IRQ increments:
cat /proc/interrupts | grep -i <device-or-driver>
If stuck at 0:
- wrong IRQ number in DT
- wrong edge/level
- controller mismatch
- handler not registered / ack wrong
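"Stuck at 0" is easier to judge as a before/after delta than by eyeballing per-CPU columns. A sketch that sums the per-CPU counts for one IRQ line; irq_total is a hypothetical helper, and it assumes your driver's name is the last field of its /proc/interrupts line (so it will miss lines where several shared handlers are listed).

```shell
# irq_total INTERRUPTS_FILE NAME
# Prints the sum of the per-CPU counts on lines whose last field is NAME.
irq_total() {
    awk -v name="$2" '
        $NF == name {
            # Field 1 is "NN:"; the counts follow until the first
            # non-numeric field (the interrupt controller name).
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
        }
        END { print total + 0 }
    ' "$1"
}

# Usage: sample, poke the hardware, sample again.
#   before=$(irq_total /proc/interrupts mydrv)
#   # ... trigger the device ...
#   after=$(irq_total /proc/interrupts mydrv)
#   echo "delta: $((after - before))"
```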
(C) GPIO ownership conflicts
cat /sys/kernel/debug/gpio
If your GPIO is already claimed → your request fails with -EBUSY and you’ll never drive the pin.
(D) Clocks / resets / regulators not enabled
grep -i <clock-name> /sys/kernel/debug/clk/clk_summary 2>/dev/null
(If your SoC depends on these, missing enables often equals “mystery failure”.)
5️⃣ “Loaded but /dev missing” = You’re mixing kernel truth with userspace
Remember this chain:
probe() → sysfs exists → uevent → udev → /dev node
So debug it in that order.
✅ Kernel-side check (sysfs truth):
ls -l /sys/class/<my_class>/
ls -l /sys/dev/char | grep <major>
✅ Userspace udev events:
udevadm monitor --kernel --udev
✅ Minimal rootfs reality: If there’s no udev or mdev and devtmpfs isn’t mounted on /dev (common in busybox/initramfs setups), /dev won’t auto-populate. Then your device can exist in sysfs but not appear in /dev.
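On a rootfs like that, the major:minor pair in sysfs is enough to create the node by hand. A sketch that only prints the command instead of running it; mknod_command is a hypothetical helper, /dev/mydrv0 is a made-up node name, and it assumes a character device.

```shell
# mknod_command SYSFS_DEVICE_DIR NODE_PATH
# Reads "major:minor" from SYSFS_DEVICE_DIR/dev and prints the matching
# mknod command for a character device. Fails if the attribute is missing.
mknod_command() {
    [ -r "$1/dev" ] || return 1
    IFS=: read -r major minor < "$1/dev" || return 1
    printf 'mknod %s c %s %s\n' "$2" "$major" "$minor"
}

# Usage:
#   mknod_command /sys/class/<my_class>/<device> /dev/mydrv0
#   # review the printed command, then run it as root
```

If the dev attribute itself is missing, the problem is on the kernel side (no device_create() or cdev registration), not in userspace.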
6️⃣ Stop printk spam: Use Dynamic Debug (runtime switch)
✅ Enable pr_debug()/dev_dbg() prints without recompiling (needs CONFIG_DYNAMIC_DEBUG and debugfs mounted):
echo 'file mydrv.c +p' | sudo tee /sys/kernel/debug/dynamic_debug/control
Disable:
echo 'file mydrv.c -p' | sudo tee /sys/kernel/debug/dynamic_debug/control
This is the cleanest way to debug probe sequencing and error paths.
7️⃣ Trace execution path with ftrace (kernel oscilloscope)
When you suspect “this function never runs” or “order is wrong”.
echo function | sudo tee /sys/kernel/debug/tracing/current_tracer
echo 'mydrv_*' | sudo tee /sys/kernel/debug/tracing/set_ftrace_filter
sudo cat /sys/kernel/debug/tracing/trace_pipe
Or with trace-cmd:
sudo trace-cmd record -p function -l 'mydrv_*'
sudo trace-cmd report
8️⃣ If it crashes: Convert “panic” into evidence (KASAN / LOCKDEP / stacks)
If you have control over kernel config:
- KASAN → use-after-free/out-of-bounds
- LOCKDEP → deadlocks/lock inversion
- KMEMLEAK → leaks
If it hangs:
Dump blocked tasks and stacks:
echo w | sudo tee /proc/sysrq-trigger
echo t | sudo tee /proc/sysrq-trigger
The stacks land in the kernel log (read them with dmesg), which turns a “freeze” into a call trace + culprit.
9️⃣ Runtime IO Debugging: Verify the actual data path
Once the device exists, verify the IO contract:
✅ Char device: what do open/read/write/ioctl return?
- Validate return codes and errno
- Confirm blocking vs non-blocking behavior
- Confirm IRQ-driven wakeups
✅ Bus-level transactions:
- I2C: check for NACKs / wrong address
- SPI: mode/CS/clock polarity issues
- UART: baud/clock mismatch, flow control
Kernel truth: “driver loaded” does not mean “transactions are correct.”
🔟 The Senior Debug Mindset
Beginners: “Why isn’t it working?” Seniors: “Which checkpoint in the chain broke?”
✅ Device created? ✅ Driver bound? ✅ probe() ran? ✅ Resources valid? ✅ Interfaces registered? ✅ Events emitted? ✅ Userspace node created? ✅ IO path correct under load?
Debugging becomes fast when you treat it like chain-of-custody.