A Kernel Panic is the nuclear option. But what happens when the kernel hits a severe bug that it can tell is isolated to a specific thread or a non-critical driver?
In this case, it triggers a Kernel Oops.
Oops vs Panic
An "Oops" is what the kernel reports when it hits a severe but potentially survivable bug (such as dereferencing a NULL pointer inside an audio driver).
When an Oops occurs:
- The kernel prints a massive crash log to the console (the "Oops message").
- The kernel forcefully kills the specific thread that caused the error using do_exit().
- The system continues running.
However, continuing to run after an Oops is dangerous. The kernel's internal state may now be inconsistent, because the killed thread never got a chance to clean up after itself. If the audio driver was holding a critical system lock when it was killed, that lock is never released, and any thread that later tries to grab it will block forever, which can hang the entire system.
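The stuck-lock hazard can be illustrated with a small user-space analogy. This is only a sketch in Python, not kernel code: `try_lock_after_oops` and `buggy_driver_thread` are hypothetical names, and a `threading.Lock` stands in for a kernel lock.

```python
import threading


def try_lock_after_oops(timeout_s=0.2):
    """Return True if a second caller can take the lock after its holder 'died'."""
    # Stand-in for a lock shared between drivers (hypothetical).
    shared_lock = threading.Lock()

    def buggy_driver_thread():
        shared_lock.acquire()
        # The "Oops" happens here: the thread ends without ever calling release().

    victim = threading.Thread(target=buggy_driver_thread)
    victim.start()
    victim.join()  # the thread is gone, but the lock it took is still held

    # A second caller now waits for a lock that nobody will ever release.
    return shared_lock.acquire(timeout=timeout_s)


if __name__ == "__main__":
    print(try_lock_after_oops())
```

The script prints `False`: the second caller times out. Kernel code that waits without a timeout would simply block forever, which is the hang described above.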
Because of this danger, Android engineers usually configure production devices with the panic_on_oops kernel setting. This forces the kernel to immediately escalate every Oops into a full Kernel Panic, trading unpredictable instability for a reboot into a known state.
You can check whether a device is set to panic on an Oops:
adb shell cat /proc/sys/kernel/panic_on_oops
(A value of 1 means every Oops becomes a panic; a value of 0 means the kernel tries to keep running.)
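For checking many devices from a host script, the same read can be wrapped in a few lines of Python. This is a minimal sketch that assumes `adb` is on the PATH and a single device is connected; the function names are hypothetical.

```python
import subprocess

PANIC_ON_OOPS_PATH = "/proc/sys/kernel/panic_on_oops"


def describe_panic_on_oops(raw_value):
    """Map the raw file content to 'panic' or 'continue'."""
    value = raw_value.strip()  # adb output may end in "\n" or "\r\n"
    if value == "1":
        return "panic"
    if value == "0":
        return "continue"
    raise ValueError(f"unexpected panic_on_oops value: {value!r}")


def read_panic_on_oops():
    """Read the setting from the connected device (assumes adb is installed)."""
    result = subprocess.run(
        ["adb", "shell", "cat", PANIC_ON_OOPS_PATH],
        capture_output=True, text=True, check=True,
    )
    return describe_panic_on_oops(result.stdout)
```

Keeping the parsing separate from the `adb` call makes the interpretation step usable on saved output as well as on a live device.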
Reading Oops Output
When an Oops (or Panic) occurs, the kernel dumps a text block in a well-defined format. Being able to read this block is a core skill for a platform engineer.
The output contains:
- The Bug Type: E.g., Unable to handle kernel NULL pointer dereference.
- The Offending Code: The exact name of the function that crashed.
- CPU Registers: The hexadecimal values held in the CPU registers (such as R1, R2, LR, and PC on 32-bit ARM) at the moment of the crash.
- The Call Stack: A trace showing exactly which functions called which functions to get to the crash site.
// Example snippet of a Kernel Oops log
Internal error: Oops: 5 [#1] PREEMPT SMP ARM
PC is at my_buggy_driver_write+0x14/0x30 [buggy_module]
LR is at vfs_write+0xbc/0x184
...
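In the snippet above, a location such as `my_buggy_driver_write+0x14/0x30` reads as "0x14 bytes into a function that is 0x30 bytes long", and the bracketed name is the module the function belongs to. As a rough sketch of how a triage script might pull these fields out, the following Python parses the `PC is at` and `LR is at` lines; the function name and the returned dictionary layout are assumptions made for illustration.

```python
import re

# Matches e.g. "PC is at my_buggy_driver_write+0x14/0x30 [buggy_module]".
# The trailing "[module]" part is optional (absent for built-in kernel code).
_LOCATION = re.compile(
    r"\b(?P<reg>PC|LR) is at "
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)


def parse_oops_locations(log_text):
    """Return {'PC': {...}, 'LR': {...}} for the location lines found in log_text."""
    found = {}
    for line in log_text.splitlines():
        match = _LOCATION.search(line)
        if match:
            found[match.group("reg")] = {
                "symbol": match.group("symbol"),
                "offset": int(match.group("offset"), 16),  # bytes into the function
                "size": int(match.group("size"), 16),      # total function size
                "module": match.group("module"),           # None if built in
            }
    return found


SAMPLE = """Internal error: Oops: 5 [#1] PREEMPT SMP ARM
PC is at my_buggy_driver_write+0x14/0x30 [buggy_module]
LR is at vfs_write+0xbc/0x184
"""

if __name__ == "__main__":
    for reg, info in parse_oops_locations(SAMPLE).items():
        print(reg, info)
```

For the sample log, the PC entry resolves to offset 20 of 48 bytes in `my_buggy_driver_write` inside `buggy_module`, and the LR entry has no module because `vfs_write` is part of the core kernel.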
Decoding with addr2line
The Call Stack in an Oops message is often just a list of raw hexadecimal memory addresses (like [<c01053a4>]).
To figure out what line of C code caused the crash, engineers use a tool called addr2line (included in the Android NDK).
Given the uncompressed kernel binary with debug symbols (vmlinux) and a hexadecimal address, addr2line looks the address up in the debug information and prints the source filename and line number, which points you directly at the code that crashed.
# How to decode a memory address into a line of C code
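# (32-bit ARM toolchain prefix, to match the 32-bit address; newer NDKs ship llvm-addr2line instead)
arm-linux-androideabi-addr2line -e vmlinux c01053a4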
# Output: /drivers/audio/buggy_module.c:452
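A call stack usually contains many addresses, and addr2line accepts several at once. The sketch below collects every bracketed address from an Oops text and builds a single command. The function names are hypothetical, the default `tool` name is a placeholder for whichever addr2line binary your toolchain provides, and `-f` asks addr2line to print function names as well.

```python
import re
import subprocess

# Matches bracketed addresses such as "[<c01053a4>]" (32-bit or 64-bit).
_BRACKETED_ADDR = re.compile(r"\[<([0-9a-fA-F]{8,16})>\]")


def extract_addresses(oops_text):
    """Return every bracketed hex address, in the order it appears."""
    return _BRACKETED_ADDR.findall(oops_text)


def build_addr2line_command(addresses, vmlinux="vmlinux", tool="addr2line"):
    """Build the argument list; 'tool' is whichever addr2line binary you use."""
    return [tool, "-f", "-e", vmlinux, *addresses]


def decode_call_stack(oops_text, vmlinux="vmlinux", tool="addr2line"):
    """Run addr2line on all addresses (requires the tool and a vmlinux file)."""
    command = build_addr2line_command(extract_addresses(oops_text), vmlinux, tool)
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

Only `decode_call_stack` needs the toolchain and the vmlinux file; the other two helpers can be tried on any saved log.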