Mark-Sweep GC - Advanced AOSP Subsystems

Overview

This lesson explores the traditional Mark-Sweep Garbage Collection (GC) algorithm. While modern Android relies heavily on Concurrent Copying (CC) GC, understanding Mark-Sweep is fundamental because it was the primary GC in Dalvik and early ART, and its principles still apply to certain memory spaces today.

The Mark-Sweep Algorithm

Mark-Sweep is a tracing garbage collector. It determines which objects are "dead" (unreachable) by tracing all "live" (reachable) objects and then reclaiming the memory occupied by everything else.

The process operates in two distinct phases: Mark and Sweep.

1. The Mark Phase

The goal of the Mark phase is to identify all objects currently in use by the application.

GC Roots: The process begins at well-known, always-live locations called GC Roots. These include:
- Local variables on thread stacks
- Active Java threads
- Static variables in loaded classes
- JNI global references
Tracing: Starting from the GC Roots, the collector traverses the object graph. It follows every reference from one object to another.
Marking: As the collector visits an object, it sets a "mark bit" (often kept in a separate bitmap to avoid modifying the object itself) to indicate the object is live.

2. The Sweep Phase

Once the Mark phase completes, the heap contains a mix of marked (live) and unmarked (dead) objects.

During the Sweep phase, the collector linearly scans the entire memory space. If it finds an object without a mark bit, it considers the object dead and reclaims the memory, making it available for future allocations. Finally, it clears the mark bits of the live objects in preparation for the next GC cycle.

GC Pause Sources (Stop-The-World)

The biggest drawback of a naive Mark-Sweep algorithm is that it must pause application execution. If the application continues to run and modifies object references while the GC is tracing, the GC might miss a live object (causing a crash) or mark a dead object as live (causing a memory leak).

This pause is known as a Stop-The-World (STW) pause. In traditional Mark-Sweep, STW pauses occur because:

Root Scanning: Finding all the roots requires suspending all threads.
Concurrent Modifications: If the mark phase is done concurrently, a final STW pause is required to re-mark objects that were modified during the concurrent phase.

Long STW pauses lead to dropped frames, audio stutter, and a generally poor user experience ("jank").

Mark-Sweep in ART

In the Dalvik era, Mark-Sweep was heavily intrusive. In early ART versions, the implementation was vastly improved into a Concurrent Mark-Sweep (CMS) collector.

ART's CMS minimized STW pauses by performing most of the marking concurrently while the application ran. It achieved this using a write barrier (specifically, a card table). Whenever the application modified an object reference, the card table flagged the memory region as "dirty."

During the final STW pause, the GC only needed to re-scan the dirty cards rather than the entire heap, significantly reducing the pause time.

// AOSP reference: art/runtime/gc/collector/mark_sweep.cc
// Simplified marking loop
void MarkSweep::MarkReachableObjects() {
  // 1. Mark roots
  MarkRoots();
  // 2. Process the mark stack (tracing)
  ProcessMarkStack();
}

While CMS was an improvement, Mark-Sweep inherently suffers from heap fragmentation. Because it reclaims memory in place without moving objects, the free space becomes scattered into small blocks. If an application tries to allocate a large array, the allocation might fail even if the total free memory is sufficient, triggering an expensive compacting GC. This limitation paved the way for the Concurrent Copying GC.