Advanced AOSP Subsystems

MediaCodec

Overview

MediaCodec is Android's primary API for accessing low-level media encoders and decoders. It is available to Java and Kotlin apps through android.media.MediaCodec and to native code through the NDK's C API (AMediaCodec). It sits on top of the native Stagefright media engine and the Codec2 framework, and lets an application feed compressed data in and get uncompressed data out (or vice versa) with tight control over the buffer flow.

The MediaCodec API: Encode and Decode

MediaCodec operates on a buffer-in, buffer-out model. The application acts as a client that continuously interacts with the codec's input and output queues.

  1. Request an empty input buffer from the codec.
  2. Fill the buffer with data (e.g., a compressed H.264 NAL unit or raw PCM audio).
  3. Queue the input buffer back to the codec for processing.
  4. Dequeue a filled output buffer from the codec (e.g., the decoded YUV frame or compressed AAC data).
  5. Consume the data and release the output buffer back to the codec.
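The five steps above can be sketched with a toy model in plain Java. ToyCodec and BufferLoopDemo are hypothetical names invented for this illustration; they are not the Android MediaCodec class, and the "decoding" is just upper-casing ASCII bytes. What the sketch tries to show is the ownership hand-off: the client only ever holds buffer indices, and each buffer goes back to the codec once the client is done with it.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a codec; this is NOT android.media.MediaCodec.
// "Decoding" here just upper-cases ASCII bytes so the data flow is visible.
class ToyCodec {
    private final byte[][] in;       // codec-owned input buffers
    private final byte[][] out;      // codec-owned output buffers
    private final int[] outSize;     // valid byte count per output buffer
    private final ArrayDeque<Integer> freeIn = new ArrayDeque<>();
    private final ArrayDeque<Integer> freeOut = new ArrayDeque<>();
    private final ArrayDeque<Integer> readyOut = new ArrayDeque<>();

    ToyCodec(int slots, int capacity) {
        in = new byte[slots][capacity];
        out = new byte[slots][capacity];
        outSize = new int[slots];
        for (int i = 0; i < slots; i++) {
            freeIn.add(i);
            freeOut.add(i);
        }
    }

    // Step 1: index of an empty input buffer, or -1 if none is free.
    int dequeueInputBuffer() {
        Integer i = freeIn.poll();
        return i == null ? -1 : i;
    }

    byte[] getInputBuffer(int index) { return in[index]; }

    // Step 3: hand the filled buffer back; the toy codec processes it at once.
    void queueInputBuffer(int index, int size) {
        Integer o = freeOut.poll();
        if (o == null) throw new IllegalStateException("release an output buffer first");
        for (int k = 0; k < size; k++) {
            out[o][k] = (byte) Character.toUpperCase((char) in[index][k]);
        }
        outSize[o] = size;
        readyOut.add(o);
        freeIn.add(index); // the input buffer is empty again
    }

    // Step 4: index of a filled output buffer, or -1 if nothing is ready.
    int dequeueOutputBuffer() {
        Integer o = readyOut.poll();
        return o == null ? -1 : o;
    }

    byte[] getOutputBuffer(int index) { return out[index]; }

    int getOutputSize(int index) { return outSize[index]; }

    // Step 5: give the output buffer back so the codec can reuse it.
    void releaseOutputBuffer(int index) { freeOut.add(index); }
}

class BufferLoopDemo {
    // Each chunk must fit in the 16-byte toy buffers.
    static List<String> run(List<String> chunks) {
        ToyCodec codec = new ToyCodec(2, 16);
        List<String> results = new ArrayList<>();
        for (String chunk : chunks) {
            int inIdx = codec.dequeueInputBuffer();                        // step 1
            byte[] data = chunk.getBytes(StandardCharsets.US_ASCII);
            System.arraycopy(data, 0, codec.getInputBuffer(inIdx), 0, data.length); // step 2
            codec.queueInputBuffer(inIdx, data.length);                    // step 3
            int outIdx;
            while ((outIdx = codec.dequeueOutputBuffer()) >= 0) {          // step 4
                results.add(new String(codec.getOutputBuffer(outIdx), 0,
                        codec.getOutputSize(outIdx), StandardCharsets.US_ASCII));
                codec.releaseOutputBuffer(outIdx);                         // step 5
            }
        }
        return results;
    }
}
```

In this toy every input is processed immediately, so an input index is always free. A real codec is asynchronous internally, so a client has to handle the case where no input or output buffer is available yet.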

MediaCodec State Machine

To use MediaCodec correctly, developers must understand its strict internal state machine: calling a method in a state that does not allow it throws IllegalStateException.

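  • Uninitialized: The codec has just been created, e.g. via MediaCodec.createDecoderByType(), createEncoderByType(), or createByCodecName().
  • Configured: The app calls configure(), supplying the MediaFormat (a set of key/value pairs such as MIME type, resolution, and bitrate) and a target Surface (if applicable).
  • Flushed: The app calls start(). The codec holds all of its buffers and is ready to receive data. In asynchronous mode the codec does not stay in this sub-state and moves directly to Running.
  • Running: Entered as soon as the first input buffer is dequeued. Buffers are actively being processed.
  • End-of-Stream (EOS): The app flags an input buffer with BUFFER_FLAG_END_OF_STREAM. The codec accepts no further input, processes the remaining buffers, and flags the final output buffer with EOS.
  • Released: The app calls release(). The codec is destroyed, and hardware resources are freed.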
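As a rough illustration of how strict these transitions are, the lifecycle can be modeled as a small state machine in plain Java. CodecLifecycle and its method names are hypothetical and exist only for this sketch; the flush() and stop() transitions (back to Flushed and back to Uninitialized, respectively) are not listed above and are taken from the Android MediaCodec documentation.

```java
import java.util.EnumSet;

// Hypothetical model of the lifecycle; this is NOT android.media.MediaCodec.
class CodecLifecycle {
    enum State { UNINITIALIZED, CONFIGURED, FLUSHED, RUNNING, END_OF_STREAM, RELEASED }

    private static final EnumSet<State> EXECUTING =
            EnumSet.of(State.FLUSHED, State.RUNNING, State.END_OF_STREAM);

    private State state = State.UNINITIALIZED;

    State state() { return state; }

    void configure() { move("configure", EnumSet.of(State.UNINITIALIZED), State.CONFIGURED); }

    void start() { move("start", EnumSet.of(State.CONFIGURED), State.FLUSHED); }

    // Models the first dequeueInputBuffer() after start() in synchronous mode.
    void firstInputDequeued() { move("firstInputDequeued", EnumSet.of(State.FLUSHED), State.RUNNING); }

    // Models queueing an input buffer flagged with BUFFER_FLAG_END_OF_STREAM.
    void queueEndOfStream() { move("queueEndOfStream", EnumSet.of(State.RUNNING), State.END_OF_STREAM); }

    // flush() is valid in any executing sub-state and returns to FLUSHED.
    void flush() { move("flush", EXECUTING, State.FLUSHED); }

    // stop() leaves the executing states; the codec must be configured again.
    void stop() { move("stop", EXECUTING, State.UNINITIALIZED); }

    // release() is modeled as valid from every state.
    void release() { move("release", EnumSet.allOf(State.class), State.RELEASED); }

    private void move(String call, EnumSet<State> allowedFrom, State to) {
        if (!allowedFrom.contains(state)) {
            throw new IllegalStateException(call + "() is not valid in state " + state);
        }
        state = to;
    }
}
```

For example, calling start() on a freshly created or stopped instance of this model throws IllegalStateException until configure() has been called first.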

MediaCodec Surface Mode for Video

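While MediaCodec can output raw video frames (YUV) into ByteBuffer objects for the CPU to process, this forces every decoded frame to be copied into CPU-accessible memory, which is costly in memory bandwidth and power at video resolutions. For video playback, MediaCodec is almost always used in Surface Mode.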

Zero-Copy Rendering

During the configure() step, the app provides a Surface (e.g., from a SurfaceView or TextureView).

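// createDecoderByType() throws IOException if the codec cannot be created.
MediaCodec codec = MediaCodec.createDecoderByType("video/avc");
// Arguments: format, output Surface, MediaCrypto (null = no decryption), flags (0 = decoder).
codec.configure(format, surface, null, 0);
codec.start();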

When using a Surface:

  1. The output buffer indices returned by dequeueOutputBuffer() do not give access to pixel data (getOutputBuffer() returns null). They act as tokens.
  2. The hardware decoder writes the decoded frame directly into a GraphicBuffer owned by the Surface.
  3. When the app calls releaseOutputBuffer(index, true /* render */), it tells MediaCodec to queue that GraphicBuffer to the Surface's consumer (SurfaceFlinger in the case of a SurfaceView) for display. Passing false returns the buffer to the codec without displaying it.

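This makes the decoded-frame path zero-copy: Network -> MediaExtractor -> MediaCodec (Hardware VPU) -> SurfaceFlinger (Hardware Display Controller). Only the much smaller compressed samples are copied by the CPU on their way into the codec's input buffers.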
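The token idea can be illustrated with a small plain-Java model. ToySurfaceDecoder and all of its methods are hypothetical and only mimic the behavior described above; int arrays stand in for GraphicBuffers and a simple queue stands in for the path to the compositor. The point of the sketch is that the array the "decoder" wrote into is the very same object the "display" receives, so no pixel data is copied along the way.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

// Hypothetical model of Surface mode; this is NOT an Android API.
class ToySurfaceDecoder {
    private final int[][] pool;                                   // stands in for GraphicBuffers
    private final ArrayDeque<Integer> free = new ArrayDeque<>();
    private final ArrayDeque<Integer> decoded = new ArrayDeque<>();
    private final ArrayDeque<Integer> toDisplay = new ArrayDeque<>();

    ToySurfaceDecoder(int slots, int pixelsPerFrame) {
        pool = new int[slots][pixelsPerFrame];
        for (int i = 0; i < slots; i++) free.add(i);
    }

    // The "hardware" writes a frame straight into a pool slot.
    // Returns false if every slot is still in use.
    boolean decode(int pixelValue) {
        Integer slot = free.poll();
        if (slot == null) return false;
        Arrays.fill(pool[slot], pixelValue);
        decoded.add(slot);
        return true;
    }

    // The client only receives an integer token, never the pixels; -1 if none is ready.
    int dequeueOutputBuffer() {
        Integer slot = decoded.poll();
        return slot == null ? -1 : slot;
    }

    // render == true forwards the slot to the display; false just recycles it.
    void releaseOutputBuffer(int token, boolean render) {
        if (render) toDisplay.add(token); else free.add(token);
    }

    // Called by the "compositor". Returns the frame storage itself, or null if
    // nothing was queued. For brevity the slot is recycled immediately; a real
    // display pipeline would return it only after it stops using the buffer.
    int[] acquireForDisplay() {
        Integer slot = toDisplay.poll();
        if (slot == null) return null;
        free.add(slot);
        return pool[slot];
    }

    // Exposed only so the example can check object identity.
    int[] backingStore(int token) { return pool[token]; }
}
```

A short usage sequence would be decode(7), then dequeueOutputBuffer() to obtain a token, then releaseOutputBuffer(token, true); the array later returned by acquireForDisplay() is the same object as backingStore(token).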

Asynchronous Mode vs Synchronous Mode

Historically, apps used a while loop to poll dequeueInputBuffer() and dequeueOutputBuffer() (synchronous mode). Starting in Android 5.0 (API level 21), asynchronous mode via callbacks became the preferred approach.

Asynchronous Implementation

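// setCallback() must be called before configure(). `extractor` is assumed to be
// a MediaExtractor with the video track already selected.
codec.setCallback(new MediaCodec.Callback() {
    @Override
    public void onInputBufferAvailable(MediaCodec mc, int inputBufferId) {
        // 1. Get the empty buffer
        ByteBuffer inputBuffer = mc.getInputBuffer(inputBufferId);
        // 2. Read one compressed sample into it (returns -1 when no samples remain)
        int size = extractor.readSampleData(inputBuffer, 0);
        if (size < 0) {
            // Signal end of stream with an empty buffer
            mc.queueInputBuffer(inputBufferId, 0, 0, 0, MediaCodec.BUFFER_FLAG_END_OF_STREAM);
            return;
        }
        // 3. Queue the buffer with its presentation timestamp (microseconds)
        mc.queueInputBuffer(inputBufferId, 0, size, extractor.getSampleTime(), 0);
        extractor.advance();
    }

    @Override
    public void onOutputBufferAvailable(MediaCodec mc, int outputBufferId, MediaCodec.BufferInfo info) {
        // Frame is ready. Render it to the Surface (an empty EOS buffer is just returned).
        mc.releaseOutputBuffer(outputBufferId, info.size > 0);
    }

    @Override
    public void onOutputFormatChanged(MediaCodec mc, MediaFormat format) { /* e.g. new resolution */ }

    @Override
    public void onError(MediaCodec mc, MediaCodec.CodecException e) { /* log, then release the codec */ }
});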

Asynchronous mode eliminates polling and reduces CPU wakeups: the framework notifies the application when a buffer becomes available, which generally leads to smoother playback and better power efficiency.