The Four Conditions of Deadlock
A deadlock is a system state where two or more threads are permanently blocked, each waiting for a resource held by the other. For a deadlock to occur, four distinct conditions must be met simultaneously, known as the Coffman conditions:
- Mutual Exclusion: At least one resource must be held in a non-sharable mode. Only one thread can use the resource at a time (e.g., a standard Mutex).
- Hold and Wait: A thread is currently holding at least one resource and is requesting additional resources that are currently held by other threads.
- No Preemption: Resources cannot be forcibly taken away from a thread; a resource can only be released voluntarily by the thread holding it.
- Circular Wait: There must exist a circular chain of two or more threads, where each thread is waiting for a resource held by the next member in the chain (e.g., Thread A waits for Lock 2 held by Thread B, while Thread B waits for Lock 1 held by Thread A).
Deadlock Prevention Strategies
To prevent deadlocks entirely, you must architect your system to guarantee that at least one of the four Coffman conditions cannot occur.
- Breaking Mutual Exclusion: Often impossible for hardware or specific data structures, but using atomic variables or lock-free data structures avoids mutexes entirely.
- Breaking Hold and Wait: Require a thread to request all required resources at once before execution. If it cannot acquire all of them, it drops the ones it holds and tries again. (Highly inefficient).
- Breaking No Preemption: If a thread holding a lock requests another lock and fails, it must release its current lock. (Often implemented using Timed Mutexes and retry loops).
The Most Practical Strategy: Lock Ordering
The most effective and common strategy in AOSP to prevent the Circular Wait condition is strict Lock Ordering.
If your system requires acquiring multiple locks, you define a global, strictly enforced hierarchy (order) for acquiring them.
Rule: If a thread needs Lock A and Lock B, it must always acquire Lock A before Lock B.
If every thread in the system adheres to this rule, a circular wait is mathematically impossible. The challenge is enforcing this rule in massive, highly-coupled codebases like system_server.
Detecting Deadlocks via Thread Dumps
When an Android device freezes, it is often due to a deadlock in a critical system service (like ActivityManagerService or WindowManagerService). The watchdog process will eventually detect this unresponsiveness and trigger an ANR (Application Not Responding) or a system crash, generating a traces.txt file.
Analyzing this thread dump is the primary way to diagnose deadlocks in the wild.
// Snippet from a hypothetical traces.txt
"ActivityManager" prio=5 tid=12 Blocked
- waiting to lock <0x0a1b2c3d> (a com.android.server.wm.WindowManagerGlobalLock) held by thread 24
at com.android.server.am.ActivityManagerService.updateOomAdjLocked(ActivityManagerService.java:18500)
- locked <0x09876543> (a com.android.server.am.ActivityManagerService)
"WindowManager" prio=5 tid=24 Blocked
- waiting to lock <0x09876543> (a com.android.server.am.ActivityManagerService) held by thread 12
at com.android.server.wm.ActivityRecord.setAppVisibility(ActivityRecord.java:5400)
- locked <0x0a1b2c3d> (a com.android.server.wm.WindowManagerGlobalLock)
In this trace, the circular dependency is explicitly visible. The ActivityManager thread holds the AMS lock and is waiting for the WMS lock. Simultaneously, the WindowManager thread holds the WMS lock and is waiting for the AMS lock. Identifying this exact trace allows engineers to refactor the specific methods involved to respect the established lock hierarchy.