Advanced AOSP Subsystems

DEX Format

Overview

This lesson takes a deep dive into the DEX (Dalvik Executable) format, the bytecode format used by the Android Runtime (ART) and its predecessor, Dalvik. Understanding the DEX format is essential for reverse engineering, performance optimization, and in-depth analysis of Android subsystems.

DEX File Structure

A .dex file is a highly optimized executable format designed specifically for the memory and storage constraints of mobile devices. Unlike JVM .class files, which store a single class per file, a .dex file aggregates multiple classes into a single file, eliminating redundancy by sharing constants, strings, and type signatures across all classes.

The Header

Every DEX file starts with a header_item, which holds metadata about the file: the magic number and format version, an Adler-32 checksum, a SHA-1 signature, and the sizes and offsets of the other data sections.

// Simplified representation of the DEX header in C++ (modeled on DexFile::Header in AOSP: art/libdexfile/dex/dex_file.h)
#include <cstdint>

struct Header {
  uint8_t magic_[8];           // "dex\n" plus a 3-digit version and NUL, e.g., "dex\n039\0"
  uint32_t checksum_;          // Adler-32 checksum of everything after magic_ and checksum_
  uint8_t signature_[20];      // SHA-1 hash of everything after signature_
  uint32_t file_size_;         // File size in bytes
  uint32_t header_size_;       // Header size in bytes (0x70)
  uint32_t endian_tag_;        // Endianness indicator (0x12345678)
  uint32_t link_size_;         // Size of link section
  uint32_t link_off_;          // Offset of link section
  uint32_t map_off_;           // Offset to the map_list (in the data section)
  uint32_t string_ids_size_;   // Number of strings in the string ID list
  uint32_t string_ids_off_;    // Offset to the string ID list
  // ... (offsets and sizes for type_ids, proto_ids, field_ids, method_ids, class_defs, data)
};
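To make the layout concrete, here is a minimal C++ sketch (not ART's code) that validates the "dex\n" magic, reads a few fields from their fixed header offsets (for example 0x20 for file_size_ and 0x38 for string_ids_size_), and recomputes the Adler-32 checksum over everything from offset 12 onward. ReadU32, Adler32, HeaderInfo, and ParseHeader are hypothetical names, and a production parser performs many more validity checks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>
#include <vector>

// Reads a little-endian uint32 at `off` (DEX files are little-endian,
// signalled by endian_tag_ == 0x12345678). at() throws if out of range.
uint32_t ReadU32(const std::vector<uint8_t>& buf, size_t off) {
  return static_cast<uint32_t>(buf.at(off)) |
         (static_cast<uint32_t>(buf.at(off + 1)) << 8) |
         (static_cast<uint32_t>(buf.at(off + 2)) << 16) |
         (static_cast<uint32_t>(buf.at(off + 3)) << 24);
}

// Standard Adler-32 over buf[begin, end).
uint32_t Adler32(const std::vector<uint8_t>& buf, size_t begin, size_t end) {
  const uint32_t kMod = 65521;
  uint32_t a = 1, b = 0;
  for (size_t i = begin; i < end; ++i) {
    a = (a + buf[i]) % kMod;
    b = (b + a) % kMod;
  }
  return (b << 16) | a;
}

// A few header fields, read from their fixed offsets in header_item.
struct HeaderInfo {
  std::string version;  // three ASCII digits, e.g. "039"
  uint32_t checksum = 0, file_size = 0, header_size = 0;
  uint32_t string_ids_size = 0, string_ids_off = 0;
  bool checksum_ok = false;
};

// Returns nullopt if the buffer is shorter than the 0x70-byte header
// or does not start with "dex\n".
std::optional<HeaderInfo> ParseHeader(const std::vector<uint8_t>& buf) {
  if (buf.size() < 0x70 || std::memcmp(buf.data(), "dex\n", 4) != 0) {
    return std::nullopt;
  }
  HeaderInfo h;
  h.version.assign(reinterpret_cast<const char*>(buf.data()) + 4, 3);
  h.checksum = ReadU32(buf, 0x08);
  h.file_size = ReadU32(buf, 0x20);
  h.header_size = ReadU32(buf, 0x24);
  h.string_ids_size = ReadU32(buf, 0x38);
  h.string_ids_off = ReadU32(buf, 0x3C);
  // The checksum covers everything after magic_ (8 bytes) and checksum_ (4 bytes).
  h.checksum_ok = Adler32(buf, 12, buf.size()) == h.checksum;
  return h;
}
```

Reading fields by offset rather than casting the buffer to a struct avoids alignment and endianness assumptions about the host.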

String Pool and Shared Data

To conserve space, DEX files use a string pool: each unique string is stored once. The string_ids section is a table of offsets, each pointing to a string_data_item (a ULEB128 length followed by MUTF-8 bytes) in the data section. Instructions that need a string, such as const-string, simply reference its index in this table.
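The indirection can be sketched in C++ as follows, assuming the whole file is already in memory: each string_ids entry is a 4-byte file offset to a string_data_item, which begins with a ULEB128 length (counted in UTF-16 code units, not bytes) followed by MUTF-8 bytes and a terminating zero byte. The helper names are hypothetical, and the sketch returns the raw MUTF-8 bytes without converting them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Little-endian uint32 read with bounds checking.
uint32_t ReadU32(const std::vector<uint8_t>& buf, size_t off) {
  uint32_t v = 0;
  for (int i = 3; i >= 0; --i) v = (v << 8) | buf.at(off + i);
  return v;
}

// Decodes an unsigned LEB128 value at `pos` and advances `pos` past it.
// A 32-bit value takes at most 5 bytes.
uint32_t ReadUleb128(const std::vector<uint8_t>& buf, size_t& pos) {
  uint32_t result = 0;
  for (int shift = 0; shift < 35; shift += 7) {
    uint8_t byte = buf.at(pos++);
    result |= static_cast<uint32_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return result;
  }
  throw std::runtime_error("ULEB128 value longer than 5 bytes");
}

// Resolves string index `idx`: string_ids[idx] is a uint32 string_data_off
// pointing at a string_data_item (ULEB128 length + MUTF-8 bytes + '\0').
std::string GetString(const std::vector<uint8_t>& buf, uint32_t string_ids_off, uint32_t idx) {
  size_t pos = ReadU32(buf, string_ids_off + size_t{4} * idx);
  ReadUleb128(buf, pos);  // utf16_size: length in UTF-16 code units, skipped here
  std::string out;
  while (buf.at(pos) != 0) out.push_back(static_cast<char>(buf[pos++]));
  return out;  // raw MUTF-8; same bytes as UTF-8 for ASCII text other than U+0000
}
```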

Similar structures exist for:

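  • type_ids: Every type the file refers to (classes, arrays, and primitives), each stored as an index into string_ids for its descriptor (e.g., Ljava/lang/String;).
  • proto_ids: Method prototypes (return type and parameter types).
  • field_ids: References to fields, pointing to the defining class, the field type, and the name.
  • method_ids: References to methods, pointing to the class, prototype, and name.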
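A hedged C++ sketch of these tables: the fixed-size records below follow the field layouts in the DEX format specification, while DexTables is a hypothetical in-memory form that assumes the string pool and each prototype's parameter type_list have already been decoded. PrettyMethod (also hypothetical) shows how one method_id resolves through type_ids, proto_ids, and string_ids into the Lclass;.name:(params)ret notation that dexdump prints.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// On-disk records from the DEX spec (little-endian; indices point into other tables).
struct TypeId   { uint32_t descriptor_idx; };                          // -> string_ids
struct ProtoId  { uint32_t shorty_idx, return_type_idx, parameters_off; };
struct FieldId  { uint16_t class_idx, type_idx; uint32_t name_idx; };  // -> type_ids, type_ids, string_ids
struct MethodId { uint16_t class_idx, proto_idx; uint32_t name_idx; }; // -> type_ids, proto_ids, string_ids

static_assert(sizeof(TypeId) == 4 && sizeof(ProtoId) == 12, "spec sizes");
static_assert(sizeof(FieldId) == 8 && sizeof(MethodId) == 8, "spec sizes");

// Hypothetical decoded view of a DEX file's ID tables.
struct DexTables {
  std::vector<std::string> strings;                 // string pool, by string_ids index
  std::vector<TypeId> types;
  std::vector<ProtoId> protos;
  std::vector<std::vector<uint16_t>> proto_params;  // decoded type_list per proto (assumption)
  std::vector<MethodId> methods;
};

std::string Descriptor(const DexTables& d, uint32_t type_idx) {
  return d.strings.at(d.types.at(type_idx).descriptor_idx);
}

// Renders method_ids[idx] as "Lclass;.name:(params)ret".
std::string PrettyMethod(const DexTables& d, uint32_t idx) {
  const MethodId& m = d.methods.at(idx);
  std::string sig = "(";
  for (uint16_t t : d.proto_params.at(m.proto_idx)) sig += Descriptor(d, t);
  sig += ")" + Descriptor(d, d.protos.at(m.proto_idx).return_type_idx);
  return Descriptor(d, m.class_idx) + "." + d.strings.at(m.name_idx) + ":" + sig;
}
```

Because every record stores indices rather than names, a descriptor such as Ljava/lang/String; is stored once no matter how many fields, prototypes, and methods mention it.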

Class Definitions and Method Code

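The class_defs section ties these IDs together. Each class_def_item names a class and records its access flags, superclass, implemented interfaces, and source file, plus a class_data_off pointing to a class_data_item that lists the class's fields and methods; each method entry carries the offset of the code_item that holds its bytecode.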
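The path from a class definition to its bytecode therefore runs class_def_item, then class_data_item, then code_item. As a sketch under the spec's layouts (helper names hypothetical), the code below decodes the method list of a class_data_item, where every value is ULEB128-encoded and method indices are stored as differences from the previous entry in the same list; each resulting code_off is the file offset of a code_item whose header is also shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// class_def_item: eight uint32 fields (32 bytes on disk).
struct ClassDef {
  uint32_t class_idx, access_flags, superclass_idx, interfaces_off;
  uint32_t source_file_idx, annotations_off, class_data_off, static_values_off;
};
static_assert(sizeof(ClassDef) == 32, "class_def_item size");

// code_item header (16 bytes); insns_size 16-bit code units of bytecode follow.
struct CodeItemHeader {
  uint16_t registers_size, ins_size, outs_size, tries_size;
  uint32_t debug_info_off, insns_size;
};
static_assert(sizeof(CodeItemHeader) == 16, "code_item header size");

uint32_t ReadUleb128(const std::vector<uint8_t>& buf, size_t& pos) {
  uint32_t result = 0;
  for (int shift = 0; shift < 35; shift += 7) {
    uint8_t byte = buf.at(pos++);
    result |= static_cast<uint32_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return result;
  }
  throw std::runtime_error("ULEB128 value longer than 5 bytes");
}

struct EncodedMethod { uint32_t method_idx, access_flags, code_off; };  // code_off 0: abstract/native

// Decodes the direct and virtual methods of the class_data_item at `pos`
// (the offset stored in ClassDef::class_data_off).
std::vector<EncodedMethod> ReadMethods(const std::vector<uint8_t>& buf, size_t pos) {
  uint32_t static_fields = ReadUleb128(buf, pos);
  uint32_t instance_fields = ReadUleb128(buf, pos);
  uint32_t direct_methods = ReadUleb128(buf, pos);
  uint32_t virtual_methods = ReadUleb128(buf, pos);
  // Skip each encoded_field: field_idx_diff and access_flags.
  for (uint32_t i = 0; i < 2 * (static_fields + instance_fields); ++i) ReadUleb128(buf, pos);
  std::vector<EncodedMethod> out;
  for (uint32_t count : {direct_methods, virtual_methods}) {
    uint32_t idx = 0;  // method_idx_diff is delta-encoded, restarting in each list
    for (uint32_t i = 0; i < count; ++i) {
      idx += ReadUleb128(buf, pos);
      uint32_t flags = ReadUleb128(buf, pos);
      uint32_t code_off = ReadUleb128(buf, pos);
      out.push_back({idx, flags, code_off});
    }
  }
  return out;
}
```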

DEX Bytecode Overview

DEX bytecode is register-based, in stark contrast to the stack-based JVM bytecode. A register-based instruction set typically needs fewer instructions for a given task, because each instruction names its operands directly instead of pushing and popping them, though individual instructions tend to be larger.

Consider a simple addition, a = b + c, with b, c, and a held in JVM locals 1, 2, and 3, or in DEX registers v1, v2, and v0:

  • JVM (Stack-based): iload_1, iload_2, iadd, istore_3 (4 instructions)
  • DEX (Register-based): add-int v0, v1, v2 (1 instruction)
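To make the comparison concrete, the C++ sketch below encodes add-int v0, v1, v2 in DEX instruction format 23x (opcode 0x90, laid out as AA|op then CC|BB) and runs it against a register file, then runs the equivalent four JVM opcodes (iload_1 = 0x1b, iload_2 = 0x1c, iadd = 0x60, istore_3 = 0x3e) on a tiny operand stack. Both toy interpreters are illustrative only and handle just these opcodes.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// DEX: add-int vAA, vBB, vCC (opcode 0x90, format 23x) is two 16-bit code
// units, "AA|op" then "CC|BB" (high byte | low byte).
constexpr uint8_t kAddInt = 0x90;

std::array<uint16_t, 2> EncodeAddInt(uint8_t a, uint8_t b, uint8_t c) {
  return {static_cast<uint16_t>((a << 8) | kAddInt), static_cast<uint16_t>((c << 8) | b)};
}

// Register machine: one instruction names all three operands directly.
void ExecAddInt(const std::array<uint16_t, 2>& insn, int32_t* regs) {
  const int a = insn[0] >> 8, b = insn[1] & 0xFF, c = insn[1] >> 8;
  // 32-bit wraparound on overflow, computed in unsigned arithmetic to avoid UB.
  regs[a] = static_cast<int32_t>(static_cast<uint32_t>(regs[b]) + static_cast<uint32_t>(regs[c]));
}

// JVM: iload_1, iload_2, iadd, istore_3 are four one-byte opcodes.
constexpr std::array<uint8_t, 4> kJvmAdd = {0x1b, 0x1c, 0x60, 0x3e};

// Stack machine handling only the four opcodes above.
void ExecJvm(const std::array<uint8_t, 4>& code, int32_t* locals) {
  int32_t stack[2] = {};
  int sp = 0;
  for (uint8_t op : code) {
    switch (op) {
      case 0x1b: stack[sp++] = locals[1]; break;  // iload_1: push local 1
      case 0x1c: stack[sp++] = locals[2]; break;  // iload_2: push local 2
      case 0x60: {                                // iadd: pop two, push sum
        uint32_t sum = static_cast<uint32_t>(stack[sp - 2]) + static_cast<uint32_t>(stack[sp - 1]);
        sp -= 2;
        stack[sp++] = static_cast<int32_t>(sum);
        break;
      }
      case 0x3e: locals[3] = stack[--sp]; break;  // istore_3: pop into local 3
    }
  }
}
```

In this particular case both encodings occupy four bytes, so the register form wins on the number of instructions to fetch and dispatch (one versus four), not on size.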

Examining DEX Bytecode

You can inspect the DEX bytecode of an APK installed on your device with dexdump (shipped in the Android SDK build-tools). For the compiled OAT output that ART generates from DEX, use oatdump instead.

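# Find the APK's path on the device (prints package:<path>)
adb shell pm path com.android.settings
# Pull it, using the path printed above (it can differ between devices and releases)
adb pull /system_ext/priv-app/Settings/Settings.apk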

# Extract the dex file and disassemble it
unzip Settings.apk classes.dex
dexdump -d classes.dex > decoded_dex.txt

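A typical method disassembles to something like the following (simplified). The numbers on the left are offsets into the method's instruction array, measured in 16-bit code units, so the gaps between them reveal each instruction's size: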

0000: sget-object v0, Ljava/lang/System;.out:Ljava/io/PrintStream;
0002: const-string v1, "Hello, DEX!"
0004: invoke-virtual {v0, v1}, Ljava/io/PrintStream;.println:(Ljava/lang/String;)V
0007: return-void
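Those offsets can be reproduced by walking the instruction array with a width table. The C++ sketch below encodes the listing with placeholder pool indices of 0 (the real field, string, and method indices are not shown in the listing) and uses a partial, hypothetical opcode table covering only these four instructions: sget-object and const-string use format 21c (2 units), invoke-virtual uses 35c (3 units), and return-void uses 10x (1 unit).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

struct OpInfo { const char* name; size_t units; };  // units: width in 16-bit code units

// Partial opcode table: only the four opcodes from the listing.
const std::map<uint8_t, OpInfo> kOps = {
    {0x62, {"sget-object", 2}},     // format 21c: AA|op BBBB
    {0x1a, {"const-string", 2}},    // format 21c
    {0x6e, {"invoke-virtual", 3}},  // format 35c: A|G|op BBBB F|E|D|C
    {0x0e, {"return-void", 1}},     // format 10x: 00|op
};

// Returns "offset: mnemonic" lines, with offsets counted in code units.
std::vector<std::string> Walk(const std::vector<uint16_t>& insns) {
  std::vector<std::string> out;
  size_t pc = 0;
  while (pc < insns.size()) {
    auto it = kOps.find(static_cast<uint8_t>(insns[pc] & 0xFF));  // opcode is the low byte
    if (it == kOps.end()) throw std::runtime_error("opcode not in this sketch's table");
    char addr[16];
    std::snprintf(addr, sizeof addr, "%04zx", pc);
    out.push_back(std::string(addr) + ": " + it->second.name);
    pc += it->second.units;
  }
  return out;
}
```

For example, the stream {0x0062, 0x0000, 0x011a, 0x0000, 0x206e, 0x0000, 0x0010, 0x000e} walks to offsets 0000, 0002, 0004, and 0007, matching the listing; in 0x206e the high nibble 2 is the argument count, and 0x0010 packs the argument registers v0 and v1.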

Multi-DEX (Multidex Splitting)

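The DEX format limits each .dex file to 65,536 (2^16) method references. The limit stems from the invoke-kind instructions, which use a 16-bit index into the method_ids table, and it counts every method the file references, including framework and library methods, not only those it defines. Field references are limited in the same way.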

When an application exceeds this limit, the build system must split the code into multiple .dex files (e.g., classes.dex, classes2.dex, classes3.dex).
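As an illustration of the constraint (not of how the actual build tools choose the split), the C++ sketch below packs classes greedily, in order, into dex files so that the set of unique method references in each file stays within a limit. Because references are pooled per file, a method referenced by many classes in the same dex counts once. ClassRefs, PackDex, and DexFileName are hypothetical names, and real tools weigh other constraints, such as which classes must land in the primary classes.dex.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical input: a class and the method references its code needs
// (its own methods plus every method it invokes).
struct ClassRefs {
  std::string name;
  std::set<std::string> method_refs;
};

// Greedily packs classes, in order, so each dex file's unique method
// references stay within `limit` (65,536 in the real format). A class whose
// own references exceed the limit is still placed alone; a real tool would fail.
std::vector<std::vector<std::string>> PackDex(const std::vector<ClassRefs>& classes,
                                              std::size_t limit = 65536) {
  std::vector<std::vector<std::string>> dexes(1);
  std::set<std::string> current;  // method refs already pooled in the current dex
  for (const ClassRefs& c : classes) {
    std::set<std::string> merged = current;
    merged.insert(c.method_refs.begin(), c.method_refs.end());
    if (merged.size() > limit && !dexes.back().empty()) {
      dexes.emplace_back();  // start the next classesN.dex
      merged = c.method_refs;
    }
    current = std::move(merged);
    dexes.back().push_back(c.name);
  }
  return dexes;
}

// Index 0 -> "classes.dex", 1 -> "classes2.dex", 2 -> "classes3.dex", ...
std::string DexFileName(std::size_t i) {
  return i == 0 ? "classes.dex" : "classes" + std::to_string(i + 1) + ".dex";
}
```

Copying the set for every class keeps the sketch simple but is quadratic in the worst case; it is meant to show the counting rule, not to scale.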

How ART Handles Multi-DEX

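In Android versions before 5.0, Dalvik natively loaded only the primary classes.dex. Apps with multiple DEX files had to use the multidex support library (for example, by extending MultiDexApplication or calling MultiDex.install()) to extract and load the secondary DEX files at runtime, which could noticeably slow application startup.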

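In ART (Android 5.0 and above), the runtime natively supports loading multiple .dex files from an APK. At install time, the dex2oat utility processes all of the APK's .dex files together into a single OAT output, eliminating the runtime overhead of extracting and loading secondary DEX files. On Android 5.0 and 6.0 this output is largely compiled ahead of time; starting with Android 7.0, compilation is profile-guided, so the install-time output may contain little or no compiled code.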
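The runtime finds the extra files by name. The C++ sketch below models that naming convention (classes.dex, then classes2.dex, classes3.dex, and so on, opened in order). The stop-at-the-first-gap behavior is an assumption about ART's loader rather than a documented guarantee, and MultiDexName and DexLoadOrder are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Name of the i-th dex entry: "classes.dex", then "classes2.dex", "classes3.dex", ...
std::string MultiDexName(std::size_t i) {
  return i == 0 ? "classes.dex" : "classes" + std::to_string(i + 1) + ".dex";
}

// Given the entry names in an APK (a zip), returns the dex files a loader
// following this convention would open, in order. Assumption: loading stops
// at the first missing index, so a gap (e.g., no classes3.dex) hides later files.
std::vector<std::string> DexLoadOrder(const std::set<std::string>& zip_entries) {
  std::vector<std::string> order;
  for (std::size_t i = 0; zip_entries.count(MultiDexName(i)) != 0; ++i) {
    order.push_back(MultiDexName(i));
  }
  return order;
}
```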

DEX vs. JVM Bytecode

The design of the DEX format is fundamentally different from JVM bytecode to accommodate mobile environments.

  • Architecture: DEX is register-based; JVM is stack-based.
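  • Storage Efficiency: DEX merges all classes into one file with shared pools, so strings and constants are stored once. Each JVM .class file carries its own constant pool, so common strings such as type descriptors are repeated in every class that uses them.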
  • Execution Environment: DEX is executed by ART or Dalvik; JVM bytecode is executed by a standard Java Virtual Machine.
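A rough way to see the storage difference: the C++ sketch below counts the string bytes needed when each class keeps its own pool (one .class file per class) versus one deduplicated pool shared by all classes (a single .dex). It is a toy model with hypothetical names that ignores per-entry overhead such as constant-pool tags, length prefixes, and MUTF-8 details.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using ClassStrings = std::vector<std::string>;  // strings one class needs (hypothetical input)

// JVM-style: every .class file has its own (internally deduplicated) constant pool.
std::size_t PerClassPoolBytes(const std::vector<ClassStrings>& classes) {
  std::size_t total = 0;
  for (const ClassStrings& c : classes) {
    for (const std::string& s : std::set<std::string>(c.begin(), c.end())) total += s.size();
  }
  return total;
}

// DEX-style: one deduplicated pool shared by every class in the file.
std::size_t SharedPoolBytes(const std::vector<ClassStrings>& classes) {
  std::set<std::string> pool;
  for (const ClassStrings& c : classes) pool.insert(c.begin(), c.end());
  std::size_t total = 0;
  for (const std::string& s : pool) total += s.size();
  return total;
}
```

With two classes LA; and LB; that both use Ljava/lang/Object; and Ljava/lang/String;, the per-class model stores 78 bytes of strings and the shared pool 42; the gap grows with every additional class that reuses the same descriptors.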

Understanding these differences is key when diagnosing performance bottlenecks or studying Android internals.