The Game Boy APU

The Game Boy’s APU (audio processing unit) is responsible for generating the chiptunes that you (probably) know and love. Unlike with the PPU and the CPU, it’s possible to write a Game Boy emulator that runs most things pretty well without emulating the APU at all, though you’ll obviously have no audio. But what’s the fun in that?

My APU implementation didn’t change much from the first go-around, so this will be a more general discussion of the APU rather than a rundown of what I did differently.

Digital Audio

Before discussing anything related to audio emulation, it is necessary to discuss how digital audio works.

In the real world, our ears pick up vibrations in the air and our brains interpret those vibrations as sound. These vibrations are often called audio waves. The wave oscillation frequency determines the pitch that we hear (higher frequency -> higher pitch), and the wave magnitude determines how loud it sounds.

In the digital world, it’s impossible to 100% accurately represent an audio wave because that would require an infinite amount of space. Instead, audio waves are represented by sampling the waveform at fixed time intervals and storing those samples. Human ears cannot hear audio waves with frequencies higher than about 20 KHz, so if the samples are taken at a high enough rate, the sampled waveform is imperceptibly different from the real thing (as long as an appropriate filter is applied to remove potential aliasing). 44.1 KHz and 48 KHz are common sample rates in consumer audio, which is a more concise way of saying 44100 or 48000 samples per second.
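To make sampling concrete, here is a minimal sketch that samples a sine wave at fixed time intervals. The function name `sample_sine` and its parameters are just for illustration, not part of any audio API.

```rust
use std::f64::consts::TAU;

/// Samples a sine wave of `tone_hz` at a fixed rate of `sample_rate_hz`,
/// returning `count` samples in the range -1.0 to +1.0.
fn sample_sine(tone_hz: f64, sample_rate_hz: f64, count: usize) -> Vec<f64> {
    (0..count)
        .map(|n| {
            // Time of the n-th sample, in seconds
            let t = n as f64 / sample_rate_hz;
            (TAU * tone_hz * t).sin()
        })
        .collect()
}
```

Sampling a 12 KHz tone at 48 KHz gives only 4 samples per wave cycle (roughly 0, +1, 0, -1), which shows how coarse the sampled waveform gets as the tone approaches half the sample rate.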

In modern audio, sample values are typically represented using either signed 16-bit integers (ranging from -32768 to +32767) or as floating-point values (ranging from -1.0 to +1.0). Audio hardware is able to take a stream of samples and generate audio waves in the real world.
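Converting between the two representations is mostly a matter of scaling. The sketch below uses one common convention (scale by 32767 on the way to integers, divide by 32768 on the way back); other conventions exist, and the function names are hypothetical.

```rust
/// Converts a floating-point sample in -1.0..=1.0 to a signed 16-bit sample.
fn f32_to_i16(sample: f32) -> i16 {
    // Clamp first so out-of-range input saturates instead of wrapping
    (sample.clamp(-1.0, 1.0) * f32::from(i16::MAX)).round() as i16
}

/// Converts a signed 16-bit sample to a floating-point sample in -1.0..1.0.
fn i16_to_f32(sample: i16) -> f32 {
    f32::from(sample) / 32768.0
}
```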

One thing that helped me when I was trying to understand all this initially was writing a very short program that plays a tone by generating raw sample values and playing them through a low-level audio API (e.g. SDL2). It’s pretty easy to generate samples representing a square wave, for example - you can generate 100 samples with a value of -0.5, then 100 samples with a value of +0.5, then 100 more samples with a value of -0.5, and repeat for however long you want the tone to play:

// 2 seconds worth of square wave samples at 48 KHz
let samples: Vec<f32> = (0..96000)
    .map(|i| {
        if i % 200 < 100 {
            -0.5
        } else {
            0.5
        }
    })
    .collect();

Many audio APIs are callback-based, but SDL2’s AudioQueue API, for example, lets you push samples into a queue whenever they’re ready:

use sdl2::audio::AudioSpecDesired;
use std::error::Error;
use std::thread;
use std::time::Duration;

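fn generate_samples() -> Vec<f32> {
    // 2 seconds of square wave samples at 48 KHz (200 samples per wave cycle)
    (0..96000)
        .map(|i| if i % 200 < 100 { -0.5 } else { 0.5 })
        .collect()
}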

fn main() -> Result<(), Box<dyn Error>> {
    let audio = sdl2::init()?.audio()?;

    let audio_queue = audio.open_queue(
        None,
        &AudioSpecDesired {
            freq: Some(48000),
            channels: Some(1),
            samples: None,
        },
    )?;
    audio_queue.resume();

    let samples = generate_samples();
    audio_queue.queue_audio(&samples)?;

    // Sleep while the sound plays via SDL2's audio thread
    thread::sleep(Duration::from_secs(2));

    Ok(())
}

Back to Regularly Scheduled Programming

Now, back to the Game Boy.

The Game Boy APU contains 4 audio channels:

Two pulse wave generators, one of which supports sweep (a unit that automatically adjusts the wave frequency over time)
A noise generator that generates pseudo-random noise using a linear feedback shift register
A custom wave channel that can play back raw 4-bit sample values stored in 16 bytes of dedicated waveform RAM

Each channel generates a stream of 4-bit sample values ranging from 0 to 15. Each channel also has its own digital-to-analog converter, or DAC, which converts these streams of digital sample values into analog audio signals.
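A rough sketch of what one of these DACs does, assuming a simple linear mapping from the 0-15 digital range onto -1.0 to +1.0. The function name `dac_output` is made up, the polarity (0 maps to -1.0) is an assumption that has no audible effect, and treating a disabled DAC as plain silence is a simplification.

```rust
/// Converts a 4-bit digital sample (0..=15) into an analog level in -1.0..=1.0.
/// A disabled DAC is modeled here as outputting silence (0.0).
fn dac_output(digital_sample: u8, dac_enabled: bool) -> f32 {
    if !dac_enabled {
        return 0.0;
    }
    // Linear map: 0 -> -1.0, 15 -> +1.0 (polarity is an assumption)
    f32::from(digital_sample & 0x0F) / 7.5 - 1.0
}
```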

The 4 audio signals feed into a mixer that mixes the 4 generated audio waves into 2 output waves: a left channel and a right channel for stereo sound. The Game Boy speaker only supports monaural sound (in which case the left and right channels just get mixed together), but the Game Boy does support stereo sound when using headphones.

The mixer applies a configurable volume multiplier to each of the two output channels, which then pass through an amplifier and on to the output device (either the built-in speaker or headphones).
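Here is a minimal sketch of that mixing step. It assumes the register layout as I understand it from Pan Docs: NR51 has one enable bit per channel per side, and NR50 holds a 3-bit volume per side where a value of v scales the signal by (v + 1) / 8. The function name `mix` and the final divide-by-4 normalization are my own choices, not something the hardware documentation prescribes.

```rust
/// Mixes four analog channel signals into a (left, right) stereo pair.
/// `nr51` selects which channels reach each side; `nr50` holds both master volumes.
fn mix(channels: [f32; 4], nr50: u8, nr51: u8) -> (f32, f32) {
    let mut left = 0.0_f32;
    let mut right = 0.0_f32;
    for (i, &signal) in channels.iter().enumerate() {
        // NR51 bits 4-7 route channels 1-4 to the left side, bits 0-3 to the right
        if nr51 & (0x10_u8 << i) != 0 {
            left += signal;
        }
        if nr51 & (0x01_u8 << i) != 0 {
            right += signal;
        }
    }
    // NR50 bits 4-6 = left volume, bits 0-2 = right volume
    let left_volume = f32::from(((nr50 >> 4) & 0x07) + 1) / 8.0;
    let right_volume = f32::from((nr50 & 0x07) + 1) / 8.0;
    // Divide by 4 so four full-scale channels still fit in -1.0..=1.0
    (left * left_volume / 4.0, right * right_volume / 4.0)
}
```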

The primary sources of APU documentation available are Pan Docs and the GBDev wiki. You really need to reference both of them to create an accurate APU implementation because they both have details that the other is missing. Pan Docs also has weird explanations for some things, possibly as a side effect of being written primarily for homebrew developers rather than emulator developers.

Pulse Waves

You could try to emulate the APU by looking at how each channel generates its waveform and writing code that emulates that. Take the pulse channels, for example.

Each pulse channel has the following settings:

Frequency
Duty cycle
Length counter (if enabled, automatically mute the channel after a set time)
Envelope (handles volume)

One of the two pulse channels also supports sweep, which I’m going to ignore for the moment.

A pulse wave is a more generalized version of a square wave, repeatedly oscillating between two discrete values. The main difference is that a pulse wave is not necessarily split between 50% time at one value and 50% time at the other value. The Game Boy’s pulse waves oscillate between 0 and 1.

The pulse wave’s duty cycle defines how much time the waveform spends at each of the two values before oscillating to the other. The Game Boy pulse wave generators support 4 different duty cycles, each describing an 8-step waveform:

12.5% / 87.5%
25% / 75%
50% / 50% (square wave)
75% / 25% (simply an inversion of 25% / 75%)

A pulse channel’s frequency can be configured as an 11-bit number ranging from 0 to 2047. The overall wave frequency is equal to:

          4194304 Hz
   -------------------------
    32 * (2048 - frequency)

The phase step rate (how often the generator takes a step in the duty cycle waveform) is:

          4194304 Hz
   -------------------------
     4 * (2048 - frequency)

…since there are 8 steps in the waveform. The 4194304 Hz number is the Game Boy’s CPU clock speed, which is a very big hint on how to time these updates. The waveform step should advance once every 4 * (2048 - frequency) CPU clock cycles.
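The two formulas translate directly into code. This is just a sketch with hypothetical helper names, useful for sanity-checking frequency values:

```rust
const CPU_CLOCK_HZ: u32 = 4_194_304;

/// Number of CPU clock cycles between waveform steps for an 11-bit frequency value.
fn step_period_cycles(frequency: u16) -> u32 {
    4 * (2048 - u32::from(frequency & 0x07FF))
}

/// Overall tone frequency in Hz: one full 8-step waveform per wave cycle.
fn tone_frequency_hz(frequency: u16) -> f64 {
    f64::from(CPU_CLOCK_HZ) / f64::from(8 * step_period_cycles(frequency))
}
```

For example, a frequency value of 1750 comes out to roughly 439.84 Hz, and the full range spans 64 Hz (frequency 0) to 131072 Hz (frequency 2047).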

Each pulse channel has its own length counter and envelope, both clocked by a 512 Hz frame sequencer timer in the APU. The frame sequencer clocks the length counters 256 times per second and the envelopes 64 times per second.
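A sketch of how those rates fall out of a single 512 Hz clock, using the 8-step pattern described on the GBDev wiki (length on every even step, envelope on step 7, sweep on steps 2 and 6). The type names are hypothetical, and this version assumes it is simply called every 8192 CPU clock cycles rather than being driven by the system timer.

```rust
/// Which units the frame sequencer clocks on a given step.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
struct FrameSequencerEvents {
    length: bool,
    envelope: bool,
    sweep: bool,
}

#[derive(Debug, Clone, Copy, Default)]
struct FrameSequencer {
    step: u8,
}

impl FrameSequencer {
    /// Call at 512 Hz (assumed here: once every 8192 CPU clock cycles).
    fn clock(&mut self) -> FrameSequencerEvents {
        let events = FrameSequencerEvents {
            // Steps 0, 2, 4, 6 -> 256 Hz
            length: self.step % 2 == 0,
            // Step 7 only -> 64 Hz
            envelope: self.step == 7,
            // Steps 2 and 6 -> 128 Hz
            sweep: self.step == 2 || self.step == 6,
        };
        self.step = (self.step + 1) % 8;
        events
    }
}
```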

Based on the Pan Docs description of the length counter, on each clock, it increments by 1 if it’s enabled. When it reaches its max value of 64 it automatically disables the channel. Disabling a channel causes it to constantly output samples of 0 until the channel is re-enabled.

The envelope is used to control volume. Each envelope has a starting volume, a period, and a direction. For a non-zero period N, the envelope automatically increments or decrements the channel’s volume after every N envelope clocks. A period of 0 disables auto-increment/decrement and causes the channel to output at a constant volume.

The pulse channel’s output is equal to the waveform generator’s output (0 or 1) multiplied by the current envelope volume (0 to 15). This is the 4-bit sample value that gets passed through the channel’s DAC.

Now, you could write code that emulates the channel by generating waves as described here based on the current channel configuration - that’s what I did in my first pass at this in my first emulator. It will sort of work! You’ll get audio that sounds kind of right some of the time, but very wrong at other times. Your audio will likely have lots of pops and cracks that aren’t there on actual hardware (though GB audio does tend to have lots of pops regardless).

If you want to do better than that, you need to emulate at a lower level. Surprisingly, this ends up being simpler to get right!

A Bunch of Counters

Reading over the Pan Docs pages is really helpful for understanding what the different channels do and how they can be configured, but the GBDev wiki page is much more helpful for understanding how the APU works internally. Let’s discuss how the pulse channels actually operate.

Starting with the waveform generator, GBDev describes each duty cycle in terms of 8 concrete steps:

Duty   Waveform    Ratio
-------------------------
0      00000001    12.5%
1      10000001    25%
2      10000111    50%
3      01111110    75%

The generator starts at step 0 when the APU is powered on. Each time it’s clocked it advances to the next step, wrapping around from the last step to the first step.

The waveform generator is clocked by a timer with a period of 4 * (2048 - frequency) CPU clock cycles, where frequency is the configured 11-bit frequency value. That works out to a clock rate of 4194304 Hz / (4 * (2048 - frequency)).

The APU implements its frequency timers as down-counters that are reloaded upon decrementing to 0. On each 4194304 Hz clock, the counter is decremented by 1. When the counter decrements to 0, the timer generates an output clock and reloads its counter with the current period value. Emulating it this way nicely handles the quirk that frequency changes don’t take effect immediately - the frequency is only read when reloading the counter, which only happens when the counter decrements to 0 (or when the channel is triggered).

You could implement such a down-counter something like this:

#[derive(Debug, Clone, Copy)]
struct PulsePhaseTimer {
    phase: u8,
    counter: u16,
    frequency: u16,
}

impl PulsePhaseTimer {
    fn new() -> Self {
        Self { phase: 0, counter: 4 * 2048, frequency: 0 }
    }

    fn tick(&mut self) {
        self.counter -= 1;
        if self.counter == 0 {
            // Reload counter and move one step in the waveform
            self.counter = 4 * (2048 - self.frequency);
            self.phase = (self.phase + 1) % 8;
        }
    }

    fn trigger(&mut self) {
        self.counter = 4 * (2048 - self.frequency);

        // Triggering does not reset phase!
    }
}

The tick() method should be called once per CPU clock cycle for correct timing.

You could implement the duty cycle something like this:

#[derive(Debug, Clone, Copy, Default)]
enum DutyCycle {
    #[default]
    Eighth,
    Fourth,
    Half,
    ThreeFourths,
}

impl DutyCycle {
    fn waveform_step(self, phase: u8) -> u8 {
        let waveform = match self {
            Self::Eighth => [0, 0, 0, 0, 0, 0, 0, 1],
            Self::Fourth => [1, 0, 0, 0, 0, 0, 0, 1],
            Self::Half => [1, 0, 0, 0, 0, 1, 1, 1],
            Self::ThreeFourths => [0, 1, 1, 1, 1, 1, 1, 0],
        };
        waveform[phase as usize]
    }
}

Combine these two by using the timer’s current phase to look up the current duty cycle step, and we have a functional waveform generator!

The length counter can also be implemented as a down-counter (and it’s easier to do that way), despite Pan Docs describing it as an up-counter. The current length counter value is not exposed to Game Boy software so it’s really an emulation implementation detail.

For a down-counter implementation, whenever a new length counter value is written, simply set the counter to 64 - length rather than setting it to length. Decrement the counter on every clock, and when it decrements to 0, disable the channel:

#[derive(Debug, Clone, Copy)]
struct LengthCounter {
    enabled: bool,
    counter: u8,
}

impl LengthCounter {
    fn new() -> Self {
        Self { enabled: false, counter: 64 }
    }

    fn load(&mut self, length: u8) {
        self.counter = 64 - length;
    }

    fn clock(&mut self, channel_enabled: &mut bool) {
        if !self.enabled || self.counter == 0 {
            return;
        }

        self.counter -= 1;
        if self.counter == 0 {
            *channel_enabled = false;
        }
    }

    fn trigger(&mut self) {
        // Triggering resets the counter to max value if it has expired
        if self.counter == 0 {
            self.counter = 64;
        }
    }
}

Finally, the envelope, which can automatically increase or decrease the channel volume every time it’s clocked by the frame sequencer. This one is a bit more complex, mainly because of the detail that envelope configuration changes typically don’t take effect until the channel is triggered:

#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum Direction {
    #[default]
    Decreasing,
    Increasing,
}

#[derive(Debug, Clone, Copy, Default)]
struct Envelope {
    // Active counter values
    volume: u8,
    counter: u8,
    period: u8,
    direction: Direction,
    // Configured values via registers
    starting_volume: u8,
    configured_period: u8,
    configured_direction: Direction,
}

impl Envelope {
    fn clock(&mut self) {
        // Period of 0 disables volume auto-increment/decrement
        if self.period == 0 {
            return;
        }

        self.counter -= 1;
        if self.counter == 0 {
            self.counter = self.period;

            match (self.direction, self.volume) {
                (Direction::Decreasing, 0) | (Direction::Increasing, 15) => {
                    // Volume cannot decrease past 0 or increase past 15; do nothing
                }
                (Direction::Decreasing, _) => {
                    self.volume -= 1;
                }
                (Direction::Increasing, _) => {
                    self.volume += 1;
                }
            }
        }
    }

    fn trigger(&mut self) {
        // Copy configured values and reload counter
        self.volume = self.starting_volume;
        self.direction = self.configured_direction;
        self.period = self.configured_period;

        self.counter = self.period;
    }
}

This example implementation is not accurate in terms of obscure behaviors because I think emulating those behaviors makes the code a lot harder to follow.

With the envelope in place, we can use the current volume value as the multiplier for the wave generator output, and we can start generating samples! I’ve omitted some details, like:

Everything related to channel 1’s sweep unit
How to handle the DAC being enabled/disabled
How to generate the frame sequencer clocks for the length counter and envelope (see both references for an overview, but it’s basically just another counter except it gets clocked by the system timer)

These aren’t too difficult to handle other than the sweep unit, which definitely has some quirks.

Ignoring those details, we could put it together like so:

#[derive(Debug, Clone)]
struct PulseChannel {
    channel_enabled: bool,
    timer: PulsePhaseTimer,
    duty_cycle: DutyCycle,
    length_counter: LengthCounter,
    envelope: Envelope,
}

impl PulseChannel {
    fn tick(&mut self) {
        self.timer.tick();
    }

    fn clock_length_counter(&mut self) {
        self.length_counter.clock(&mut self.channel_enabled);
    }

    fn clock_envelope(&mut self) {
        self.envelope.clock();
    }

    fn trigger(&mut self) {
        // Note: If the DAC is disabled, triggering should not re-enable the channel
        self.channel_enabled = true;

        self.timer.trigger();
        self.length_counter.trigger();
        self.envelope.trigger();
    }

    fn sample(&self) -> u8 {
        if !self.channel_enabled {
            return 0;
        }

        let waveform_step = self.duty_cycle.waveform_step(self.timer.phase);
        let volume = self.envelope.volume;
        
        waveform_step * volume
    }
}

There’s one big problem though…

Resampling

Say we call the pulse channel’s tick() method once per CPU T-cycle (or equivalently 4 times per CPU M-cycle), we call the clock_length_counter() and clock_envelope() methods with appropriate timing based on the frame sequencer timer, and we call trigger() any time the NR14/NR24 register gets written with bit 7 set. We can then easily get the current sample value at any time by calling sample().

…But at which times do we generate samples?

We ultimately want to generate an audio signal with a sample rate that consumer audio hardware can deal with, like say 48 KHz. With a pulse channel set to maximum frequency, the phase timer steps at a rate of 1.048576 MHz! And the custom wave channel can step at twice that rate! Outputting a sample every time the value could change will definitely not work.

This gets into the topic of digital signal resampling, or sample-rate conversion, which is the process of converting a digital signal with one sample rate to a digital signal with a different sample rate. Doing this well turns out to be extremely complicated!

The simplest approach is to apply a nearest-neighbor filter. With an input sample rate of 1.048576 MHz and an output sample rate of 48 KHz, the output signal should contain exactly 1 sample per 21.845333~ input samples. A nearest-neighbor filter simply outputs the nearest input sample after each 21.845333~ sample step (“nearest” being necessary due to the fractional step).
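A minimal sketch of nearest-neighbor downsampling over a whole buffer. A real emulator would more likely do this incrementally as samples are produced; the batch form and the name `downsample_nearest` are just for illustration, and both rates are assumed to be positive.

```rust
/// Downsamples `input` from `input_rate_hz` to `output_rate_hz` by picking
/// the input sample nearest to each output sample's position in time.
fn downsample_nearest(input: &[f32], input_rate_hz: f64, output_rate_hz: f64) -> Vec<f32> {
    assert!(input_rate_hz > 0.0 && output_rate_hz > 0.0);
    // Input samples per output sample, e.g. 1048576 / 48000 = 21.845333...
    let step = input_rate_hz / output_rate_hz;
    let mut output = Vec::new();
    let mut n = 0_u64;
    loop {
        // Round the fractional position to the nearest input index
        let index = (n as f64 * step).round() as usize;
        match input.get(index) {
            Some(&sample) => output.push(sample),
            None => break, // ran off the end of the input
        }
        n += 1;
    }
    output
}
```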

This isn’t ideal, but it works! The sound might be a bit harsh, but audio will sound mostly correct in most games. Here’s King Dedede’s theme from Kirby’s Dream Land, though with only the 2 pulse channels emulated:

The biggest problem is that you’ll get audio aliasing in any games that set channels to extremely high frequencies, such as Final Fantasy Adventure. These ultrasonic frequencies are not audible to the human ear, but naive downsampling causes these extremely high frequency waves to create audible aliasing in the downsampled signal. This often manifests as a really obnoxious buzzing or ringing noise.

Here’s the music from Final Fantasy Adventure’s opening fight with a basic nearest-neighbor filter:

There are a few possible solutions to this. The simplest is to have channels output constant 0s when they’re configured to extremely high frequencies, say greater than 20 KHz. This isn’t accurate to the hardware but it mostly solves the aliasing problem.
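The muting approach could look something like this for a pulse channel. Both function names are hypothetical, and the 20 KHz cutoff is just the example threshold from above:

```rust
/// Returns true if a pulse channel's 11-bit frequency value produces a tone
/// above `limit_hz`.
fn pulse_is_ultrasonic(frequency: u16, limit_hz: f64) -> bool {
    let tone_hz = 4_194_304.0 / (32.0 * f64::from(2048 - (frequency & 0x07FF)));
    tone_hz > limit_hz
}

/// Replaces the channel's sample with 0 when the tone is above 20 KHz.
/// This is a hack to avoid aliasing, not something real hardware does.
fn mute_if_ultrasonic(raw_sample: u8, frequency: u16) -> u8 {
    if pulse_is_ultrasonic(frequency, 20_000.0) {
        0
    } else {
        raw_sample
    }
}
```

With this threshold, only frequency values of 2042 and above get muted.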

A more sophisticated solution is to apply a low-pass filter before downsampling. This will remove waves that are above a certain frequency so that they won’t cause aliasing in the downsampled signal. The most precise way to do this involves using the Fast Fourier Transform and its inverse in order to apply the filter in what’s called the frequency domain, but this can be computationally expensive to do live at runtime.
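A cheaper alternative that works in the time domain is a FIR filter. One textbook way to design one is a windowed sinc, sketched below with a Hamming window. This is a generic illustration under my own assumptions (function names, window choice, tap count), not the filter from any particular emulator.

```rust
use std::f64::consts::PI;

/// Designs a low-pass FIR filter using a Hamming-windowed sinc.
/// `num_taps` must be odd so the filter has a single center tap.
fn windowed_sinc_lowpass(num_taps: usize, cutoff_hz: f64, sample_rate_hz: f64) -> Vec<f64> {
    assert!(num_taps >= 3 && num_taps % 2 == 1);
    // Normalized cutoff in cycles per sample
    let fc = cutoff_hz / sample_rate_hz;
    let last = (num_taps - 1) as f64;
    let center = last / 2.0;
    let mut taps: Vec<f64> = (0..num_taps)
        .map(|n| {
            let x = n as f64 - center;
            // Ideal low-pass impulse response, with the x = 0 limit handled explicitly
            let sinc = if x == 0.0 {
                2.0 * fc
            } else {
                (2.0 * PI * fc * x).sin() / (PI * x)
            };
            // The Hamming window tapers the truncated sinc to reduce ripple
            let window = 0.54 - 0.46 * (2.0 * PI * (n as f64) / last).cos();
            sinc * window
        })
        .collect();
    // Normalize so that a constant (DC) input passes through unchanged
    let sum: f64 = taps.iter().sum();
    for tap in &mut taps {
        *tap /= sum;
    }
    taps
}

/// Computes one output sample as the dot product of the taps and the most recent inputs.
fn fir_output(taps: &[f64], recent_inputs: &[f64]) -> f64 {
    taps.iter().zip(recent_inputs).map(|(&a, &b)| a * b).sum::<f64>()
}
```

To downsample, you would compute `fir_output` only at the input positions where an output sample is due, rather than for every input sample.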

Blargg’s Blip Buffer library is one way to apply low-pass filtering that is used by a number of well-known emulators. I came up with my own low-pass filter implementation that uses a FIR filter generated using Octave, which is a fancy way to say that I have an array of coefficients that I apply to my input sample buffer every time I need to generate an output sample:

use std::collections::VecDeque;

const LPF_COEFFICIENT_0: f64 = ...;

const LPF_COEFFICIENTS: &[f64; 45] = &[...];

fn generate_output_sample(input_buffer: &VecDeque<f64>) -> f64 {
    LPF_COEFFICIENT_0
        + LPF_COEFFICIENTS
            .iter()
            .zip(input_buffer)
            .map(|(&a, &b)| a * b)
            .sum::<f64>()
}

I’m not going to claim that this is better than other solutions (it’s almost certainly much less efficient), but it seems to work decently for this particular aliasing issue. Here’s the same Final Fantasy Adventure audio with my low-pass filter applied (no more ringing!):

The Remaining Features

Here’s a quick overview of what I didn’t cover:

Some finer details regarding “triggering”, which is how Game Boy software restarts the audio channels after making configuration changes
The channel DACs, which convert the 0-15 digital sample values to analog audio signals (which you could represent using floating-point numbers between -1 and +1)
The frame sequencer, a 512 Hz clock driven by the system timer that is used to clock the length counters, the envelopes, and the channel 1 sweep unit
Speaking of which, the channel 1 sweep unit, which can automatically adjust the pulse wave frequency over time and automatically disable the channel when frequency overflows
The custom wave channel, which cycles through raw sample values from waveform RAM at a configurable frequency
The noise channel, which generates pseudo-random noise using a linear feedback shift register (LFSR) that gets clocked at a configurable frequency
The mixer, which mixes the input signals together (by just adding them) and applies L/R volume multipliers
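To give a flavor of the sweep unit, here is a sketch of just its core frequency calculation as I understand it from the GBDev wiki: the new frequency is the current (shadow) frequency plus or minus itself shifted right by the configured amount, and overflowing past 2047 disables the channel. The function name is hypothetical, and this leaves out the sweep timer and all of the edge cases.

```rust
/// Computes the next frequency for the channel 1 sweep unit.
/// Returns `None` on overflow past 2047, which should disable the channel.
fn sweep_next_frequency(shadow_frequency: u16, shift: u8, decreasing: bool) -> Option<u16> {
    // `shift` is a 3-bit value; `shadow_frequency` is an 11-bit value
    let delta = shadow_frequency >> (shift & 0x07);
    let next = if decreasing {
        // delta <= shadow_frequency, so this cannot underflow
        shadow_frequency - delta
    } else {
        shadow_frequency + delta
    };
    (next <= 2047).then_some(next)
}
```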

The trickiest of these to get right is probably the sweep, which has some quirky edge case behaviors related to when exactly it disables the channel on frequency overflow. The custom wave and noise channels are pretty straightforward to implement once you have a working pulse channel implementation.
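As an example of how little code the noise channel’s core needs, here is a sketch of the LFSR following the GBDev wiki’s description: XOR the low two bits, shift right by one, and put the result into bit 14 (and also bit 6 in the short 7-bit mode). The output is bit 0 inverted. The struct and field names are made up, and the all-ones starting value is what that description says happens on trigger.

```rust
#[derive(Debug, Clone, Copy)]
struct NoiseLfsr {
    bits: u16,
    short_mode: bool,
}

impl NoiseLfsr {
    fn new(short_mode: bool) -> Self {
        // All 15 bits set, matching the state after a trigger in this model
        Self { bits: 0x7FFF, short_mode }
    }

    /// Advances the register by one step; call this on each frequency timer clock.
    fn clock(&mut self) {
        let feedback = (self.bits ^ (self.bits >> 1)) & 1;
        self.bits >>= 1;
        // Bit 14 is empty after the shift, so OR-ing the feedback in is enough
        self.bits |= feedback << 14;
        if self.short_mode {
            // 7-bit mode: the feedback also replaces bit 6
            self.bits = (self.bits & !(1 << 6)) | (feedback << 6);
        }
    }

    /// Current waveform output (0 or 1): bit 0 of the register, inverted.
    fn output(&self) -> u8 {
        u8::from((self.bits & 1) == 0)
    }
}
```

As with the pulse channels, the output gets multiplied by the envelope volume to produce the 4-bit sample value.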

Bringing It All Together

Here’s the Final Fantasy Adventure audio once more, this time with a complete APU implementation:

Just for fun, here’s another tune from Bionic Commando GB (with some sound effects in the background):

And finally, here’s a snippet from Prehistorik Man, which does some sort of ridiculous things with the hardware:

I will probably not do a follow-up covering the remaining APU details because honestly, once you have the pulse channels and general audio output working, the rest of the APU isn’t that hard to get right (barring some of the obscure behaviors and sweep edge cases). Just keep in mind that it’s a bunch of timers and counters!