Game Boy Advance Audio

This is a moderately low-level overview of the Game Boy Advance audio hardware, written from the perspective of someone who recently emulated it.

GBA audio has a reputation for being very low-quality. I personally think that reputation is deserved, but the audio hardware is not entirely to blame, at least not directly.

Audio Channels

The Game Boy Advance APU has 6 audio generation channels: 4 PSG channels (programmable sound generation) and 2 PCM channels (raw sample playback).

The 4 PSG channels are slightly modified versions of the Game Boy Color’s 4 audio channels: 2 pulse wave generators, a pseudorandom noise generator, and a wavetable channel (plays custom 4-bit samples in a loop). These were possibly included only for backwards compatibility with Game Boy [Color] games, but native GBA software can use them as well.

The pulse and noise channels are largely unchanged from GBC, with only minor differences. Like on GBC, only 1 of the 2 pulse channels supports frequency sweep, where the channel can automatically increase or decrease its frequency over time.

The wavetable channel changed more from its GBC incarnation. The GBC version only supports 32-sample loops, and software can’t modify wavetable RAM while the channel is playing. The GBA version supports both 32-sample and 64-sample loops, and wavetable RAM is split into two 32-sample banks so that software can modify one bank while the channel is playing from the other. GBA also adds a new 75% volume option ((3 * sample) >> 2), whereas GBC only supports 100%, 50%, 25%, and 0% volume.
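As a rough illustration of the five wavetable volume settings, here is a hypothetical helper (`scale_wave_sample` is an illustrative name, not a hardware or library API). It assumes 100%, 50% and 25% are right shifts by 0, 1 and 2 as on GBC, plus the GBA-only `(3 * sample) >> 2` for 75%.

```python
def scale_wave_sample(sample: int, volume_percent: int) -> int:
    """Hypothetical helper: scale an unsigned 4-bit wavetable sample (0-15)."""
    if not 0 <= sample <= 15:
        raise ValueError("wavetable samples are unsigned 4-bit")
    if volume_percent == 100:
        return sample
    if volume_percent == 75:   # GBA-only option
        return (3 * sample) >> 2
    if volume_percent == 50:   # assumed: shift right by 1, as on GBC
        return sample >> 1
    if volume_percent == 25:   # assumed: shift right by 2, as on GBC
        return sample >> 2
    if volume_percent == 0:
        return 0
    raise ValueError("volume must be 0, 25, 50, 75 or 100")


for vol in (100, 75, 50, 25, 0):
    print(f"{vol:3}%: {scale_wave_sample(15, vol)}")
```

For a full-scale sample of 15, this gives 15, 11, 7, 3 and 0 for the five settings.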

In practice, few if any games use the additional GBA-exclusive wavetable features. Most games use the wavetable channel as if it worked exactly like the GBC’s version, apart from some differences in how writes to wavetable RAM behave.

The other big change from GBC to GBA is that the GBA PSG channels do not have per-channel DACs (digital to analog converters) like the GBC channels do. Each channel outputs digital sample values rather than an analog audio signal, and the GBA digitally emulates the GBC’s analog mixing hardware. (More details on this in the mixing section.)

PCM (Direct Sound)

The PCM channels, called Direct Sound channels, are new to GBA. There are two of them, A and B, with equivalent functionality. (The PSG channels are numbered 1-4.)

Games are meant to use one channel for L audio output and the other for R output, though the hardware also supports mono audio output using only one channel. Most games with stereo audio seem to use channel A for R output and channel B for L output.

These channels are really simple: all they do is output signed 8-bit PCM samples supplied by software at a configurable sample rate, with an optional 50% volume reduction. That’s it. There is no sound generation and there are no post-processing effects.

Each channel has a (logically) 32-sample FIFO queue for input so software doesn’t need to write samples at exactly the right sample rate, unlike PCM playback on some other consoles (namely Sega Genesis).

Games control each channel’s sample rate using one of the GBA’s hardware timers. Each PCM channel can be configured to track either timer 0 or timer 1. Whenever the selected timer overflows from 0xFFFF to 0x0000, the channel will pop a sample off its FIFO queue and output that sample until the next FIFO pop. In practice, games generally configure both PCM channels to track timer 0.

Given a timer’s counter reload value and CPU clock divider (normally 1), you can compute the effective sample rate like so:

gba_clock = 1 << 24  # Roughly 16.77 MHz
sample_rate = gba_clock / ((0x10000 - timer.reload) * timer.clock_divider)

For example, Fire Emblem configures its audio timer with a reload value of 0xFB1A and a divider of 1, which comes out to a sample rate of about 13379 Hz. That is very low in absolute terms, but by GBA standards it’s not abnormally low - sample rates in the 10000-14000 Hz range are very common.
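Here is the same formula as a small runnable sketch, plus a hypothetical inverse helper (`reload_for_rate`, not something the GBA provides) that picks the reload value whose overflow rate is closest to a target rate. The timer prescaler value is passed in as `clock_divider`.

```python
GBA_CLOCK = 1 << 24  # Roughly 16.77 MHz


def pcm_sample_rate(reload: int, clock_divider: int = 1) -> float:
    """Samples per second when a PCM channel pops its FIFO on every timer overflow."""
    period = (0x10000 - reload) * clock_divider  # CPU cycles between overflows
    return GBA_CLOCK / period


def reload_for_rate(target_hz: float, clock_divider: int = 1) -> int:
    """Hypothetical helper: reload value giving the rate closest to target_hz."""
    period = round(GBA_CLOCK / (target_hz * clock_divider))
    if not 1 <= period <= 0x10000:
        raise ValueError("rate not reachable with this divider")
    return 0x10000 - period


print(round(pcm_sample_rate(0xFB1A)))  # 13379 (Fire Emblem's timer setting)
print(hex(reload_for_rate(13379)))     # 0xfb1a
```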

Software can use two of the GBA’s DMA transfer channels in a special audio DMA mode where they’ll automatically copy samples from RAM buffers to the FIFO queues at exactly the correct rate. When DMA channel 1 or 2 is in audio DMA mode (“special” start timing), it will automatically perform a 4-word / 16-sample transfer any time the PCM channel’s FIFO queue is at least half-empty. For the purposes of checking when to trigger audio DMA, DMA1 is hardwired to channel A and DMA2 is hardwired to channel B.

Note that the FIFOs are actually 7-word queues plus some sort of 4-sample shift buffer, not true 32-sample queues. This difference is not significant in most cases, but it does affect audio DMA timing in ways that a handful of games depend on. Audio DMA triggers when the PCM channel’s audio timer overflows while it has at least 4 empty slots in its 7-word queue, checked before popping a sample from the FIFO (which might pop a word from the 7-word queue into the 4-sample shift buffer).
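To make the FIFO description more concrete, here is a rough model of one Direct Sound FIFO. It is a sketch under several assumptions that the description above does not confirm: samples within a 32-bit word play lowest byte first, the shift buffer only reloads from the queue once it has run dry, writes to a full queue are ignored, and the channel keeps outputting its last sample when everything is empty. The class and method names are hypothetical, and the DMA transfer itself is not modeled, only the request check.

```python
from collections import deque

QUEUE_WORDS = 7    # 32-bit words the queue can hold
SHIFT_SAMPLES = 4  # signed 8-bit samples per word / shift buffer size


class PcmFifo:
    """Rough, hypothetical model of one Direct Sound FIFO.

    Assumptions (not confirmed hardware behavior): samples in a word play
    lowest byte first, the shift buffer reloads only once it is empty,
    writes to a full queue are ignored, and the last sample keeps playing
    when both the queue and the shift buffer are empty.
    """

    def __init__(self) -> None:
        self.queue = deque()  # pending 32-bit words
        self.shift = deque()  # signed 8-bit samples from the current word
        self.current = 0      # sample currently being output

    def write_word(self, word: int) -> None:
        if len(self.queue) < QUEUE_WORDS:
            self.queue.append(word & 0xFFFFFFFF)

    def on_timer_overflow(self) -> bool:
        """Advance one sample. Returns True if audio DMA should be requested."""
        # The DMA check happens before popping, as described above.
        dma_request = QUEUE_WORDS - len(self.queue) >= 4
        if not self.shift and self.queue:
            word = self.queue.popleft()
            for i in range(SHIFT_SAMPLES):
                byte = (word >> (8 * i)) & 0xFF
                self.shift.append(byte - 0x100 if byte & 0x80 else byte)
        if self.shift:
            self.current = self.shift.popleft()
        return dma_request


fifo = PcmFifo()
fifo.write_word(0x04030201)
fifo.write_word(0x08070605)
played = []
for _ in range(8):
    fifo.on_timer_overflow()
    played.append(fifo.current)
print(played)  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Starting from a full 7-word queue, the first DMA request in this model fires on the 14th overflow, when 3 queued words plus 3 shifted samples (15 samples) remain, roughly matching the "at least half-empty" rule of thumb.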

Pretty much every game uses the PCM channels, while games vary in how they use the PSG channels. A few games use the PSG channels heavily (e.g. the Mega Man Battle Network series), some use them only for sound effects or occasional harmonic notes, and some don’t use them at all.

Using Mega Man Battle Network 2’s title screen theme as an example, here’s which parts of it come from the PCM channels vs. the PSG channels:

Mega Man Battle Network 2 - Title Screen

Mega Man Battle Network 2 - Title Screen (PCM channels)

Mega Man Battle Network 2 - Title Screen (PSG channels)

Some games use the PSG a little more subtly, like Castlevania: Aria of Sorrow in a few songs:

Castlevania: Aria of Sorrow - Castle Corridor

Castlevania: Aria of Sorrow - Castle Corridor (PCM channels)

Castlevania: Aria of Sorrow - Castle Corridor (PSG channels)

And some games use them even more subtly, like Metroid Fusion here (first PSG note at 0:19, next at 0:32):

Metroid Fusion - Sector 1 SRX

Metroid Fusion - Sector 1 SRX (PCM channels)

Metroid Fusion - Sector 1 SRX (PSG channels)

Mixing

Like the original Game Boy and Game Boy Color, GBA only has one speaker, but it supports stereo audio output through its headphone jack. Each audio channel can be individually panned to L output, R output, or both using bits in the SOUNDCNT_L and SOUNDCNT_H registers.

The audio mixing is a bit particular. I couldn’t find a fully detailed explanation of how it works anywhere, so here’s my own attempt. This is partly based on existing resources and partly based on running some of my own tests on actual hardware. I won’t claim this is 100% accurate (it’s almost certainly not), but it matches the hardware’s behavior in some admittedly contrived edge cases.

The hardware ultimately produces an unsigned 10-bit sample (0 to 0x3FF / 1023) for final output. All mixing is performed separately for L output and R output. If an audio channel is not enabled for a given output channel, the mixing behaves as if that channel’s output is a constant 0, skipping the channel output calculations below.

First, the hardware separately mixes the PCM and PSG channels.

The PCM mixing is simple: it left shifts each sample by 2 (from signed 8-bit to signed 10-bit), or by only 1 if the 50% volume reduction is enabled, adds the samples together, then clamps the sum to signed 10-bit. It’s unusual to have both PCM channels outputting to the same output channel, but the hardware does support it; the sum is simply clamped to the range of a single full-volume channel.

# PCM samples are signed 8-bit (-128 to +127)
# Volumes are 0 (50%) or 1 (100%)
pcm_out = (pcm_a << (1 + pcm_a_volume)) + (pcm_b << (1 + pcm_b_volume))
pcm_out = pcm_out.clamp(-0x200, 0x1FF)

Examples of games that set both PCM channels to play to the same output channels are Metroid Fusion and Metroid: Zero Mission, which both do this when the sound option is set to “GBA speaker” rather than “headphones”. The clamping causes pretty bad audio distortion in GBA speaker mode.

Metroid: Zero Mission - Brinstar (In-game speaker option)

Metroid: Zero Mission - Brinstar (In-game headphones option)
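A runnable version of the PCM mixing step for one output side, with per-channel enable flags, might look like this. `mix_pcm` and its parameters are hypothetical names, and volume is passed as a boolean (`True` = 100%) rather than the register bit. The example values are made up, not captured from the Metroid games, but they show how routing two full-volume channels to one side can push the sum past the clamp.

```python
def clamp(value: int, lo: int, hi: int) -> int:
    """Clamp value to the inclusive range [lo, hi]."""
    return max(lo, min(hi, value))


def mix_pcm(pcm_a: int, pcm_b: int, a_full: bool, b_full: bool,
            a_on: bool = True, b_on: bool = True) -> int:
    """Mix both Direct Sound channels for one output side into signed 10-bit."""
    a = (pcm_a << (2 if a_full else 1)) if a_on else 0  # 100%: <<2, 50%: <<1
    b = (pcm_b << (2 if b_full else 1)) if b_on else 0
    return clamp(a + b, -0x200, 0x1FF)


# Headphone-style routing: only channel B reaches this side.
print(mix_pcm(100, 100, True, True, a_on=False))  # 400
# Speaker-style routing: both channels reach this side at 100%.
print(mix_pcm(100, 100, True, True))              # 511 (800 clamped)
```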

The PSG mixing is slightly more involved, mainly because of how the GBA emulates the GBC’s mixing hardware.

Each PSG channel outputs an unsigned 4-bit sample. On GBC, these get converted to analog -1 to +1 in arbitrary voltage units, and channels with disabled DACs output analog 0. GBA tries to emulate this digitally by multiplying each sample by 2 and using the current channel volume as a bias:

# channel_volume is unsigned 4-bit (0 to 15)
# current_sample is either 0 or channel_volume
channel_out = 2 * current_sample - channel_volume

This produces signed 5-bit sample values ranging from -15 to +15, with disabled channels outputting 0 due to volume being 0. This doesn’t produce the exact same results as the GBC’s per-channel DACs, but it plays more nicely with the GBA’s digital mixing.

…Well, that’s how it works for the pulse and noise channels. The wavetable channel doesn’t have a 4-bit volume the same way the pulse and noise channels do, so the GBA always biases the wavetable channel’s output by -15, regardless of what the wavetable volume is set to. This produces an audible pop whenever you mute or unmute the wavetable channel in the PSG stereo control register SOUNDCNT_L (GBA’s version of GBC’s NR50 and NR51 registers).

# current_sample is unsigned 4-bit (0 to 15)
wavetable_out = 2 * current_sample - 15
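
The per-channel PSG outputs can be sketched as two small functions, with a `False` enable flag standing in for a channel muted in SOUNDCNT_L. The function names are hypothetical, and the wavetable `sample` is assumed to be the 4-bit value after wavetable volume scaling. The example shows where the mute/unmute pop comes from.

```python
def pulse_or_noise_out(sample_high: bool, volume: int, enabled: bool = True) -> int:
    """Signed output of a pulse/noise channel; its sample is 0 or the channel volume."""
    if not enabled:  # muted for this side in SOUNDCNT_L
        return 0
    current_sample = volume if sample_high else 0
    return 2 * current_sample - volume


def wavetable_out(sample: int, enabled: bool = True) -> int:
    """Signed output of the wavetable channel: always biased by -15."""
    if not enabled:
        return 0
    return 2 * sample - 15


# A silent wavetable (sample 0) still outputs -15 while unmuted, so toggling
# its SOUNDCNT_L bit steps the output between 0 and -15: the audible pop.
print([wavetable_out(0, enabled=e) for e in (False, True, False)])  # [0, -15, 0]
# A pulse/noise channel at volume 0 outputs 0 whether muted or not.
print(pulse_or_noise_out(False, 0), pulse_or_noise_out(True, 0))    # 0 0
```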

The 4 channel outputs are mixed by simply summing them and then applying the PSG L/R master volume multiplier, which just like on GBC ranges from 1 to 8. This produces a signed 10-bit sample, same scale as the mixed PCM sample.

# Each channel's output is signed 5-bit (-15 to +15)
# Master volume is 1 to 8
psg_out = (pulse1_out + pulse2_out + wavetable_out + noise_out) * psg_master_volume

The last step before final mixing is a GBA-specific optional PSG volume reduction, configured using the lowest 2 bits in SOUNDCNT_H. From my testing, the “prohibited” volume 3 behaves the same as volume 0, i.e. 25% volume. I don’t believe anything ever uses volume 3 though.

# gba_psg_volume is 0/3 (25%), 1 (50%), or 2 (100%)
psg_out >>= 2 - (gba_psg_volume % 3)
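
Putting the PSG mixing and the GBA volume reduction together in a hypothetical `mix_psg` function shows the worst-case range: four channels at ±15 times master volume 8 gives ±480, which already fits in signed 10-bit without clamping. Note that the Python `>>` on negative numbers rounds toward negative infinity; this sketch assumes the hardware shifts the same way.

```python
def mix_psg(channel_outs, master_volume: int, gba_psg_volume: int) -> int:
    """Hypothetical helper: sum four signed PSG outputs, apply both volume stages."""
    if len(channel_outs) != 4 or any(not -15 <= c <= 15 for c in channel_outs):
        raise ValueError("expected four channel outputs in -15..+15")
    if not 1 <= master_volume <= 8:
        raise ValueError("PSG master volume ranges from 1 to 8")
    psg_out = sum(channel_outs) * master_volume
    psg_out >>= 2 - (gba_psg_volume % 3)  # 0/3 -> 25%, 1 -> 50%, 2 -> 100%
    return psg_out


print(mix_psg([15, 15, 15, 15], 8, 2))      # 480
print(mix_psg([-15, -15, -15, -15], 8, 2))  # -480
print(mix_psg([15, 15, 15, 15], 8, 3))      # 120 ("prohibited" 3 acts like 25%)
```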

Final mixing first sums the mixed PCM and PSG samples (both signed 10-bit), clamps to signed 10-bit, then adds a software-configurable sound bias value (unsigned 10-bit). The sound bias is practically always set to 0x200 / 512, the center point of unsigned 10-bit.

# pcm_out and psg_out are signed 10-bit
# sound_bias is unsigned 10-bit (almost always 0x200 in practice)
signed_out = (pcm_out + psg_out).clamp(-0x200, 0x1FF)
final_out = (signed_out + sound_bias).clamp(0, 0x3FF)

The only reason to set sound bias to anything other than 0x200 is that setting it to 0 slightly reduces power usage if software isn’t playing any audio at all. Instantly changing the sound bias between 0x200 and 0 produces an audible pop, so the GBA BIOS provides a system call that gradually adjusts the bias from 0x200 to 0 or vice versa.
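The final mixing step is easy to experiment with in isolation. This sketch (`final_mix` is a hypothetical name) shows why 0x200 is the normal bias: with it, a full-scale signed sample maps onto the full unsigned 10-bit range, while with a bias of 0 every negative value clamps to 0 and the lower half of the waveform is lost. It also suggests why an abrupt bias change pops: the output’s resting level jumps by 512.

```python
def clamp(value: int, lo: int, hi: int) -> int:
    """Clamp value to the inclusive range [lo, hi]."""
    return max(lo, min(hi, value))


def final_mix(pcm_out: int, psg_out: int, sound_bias: int = 0x200) -> int:
    """Combine signed 10-bit PCM and PSG mixes into an unsigned 10-bit sample."""
    signed_out = clamp(pcm_out + psg_out, -0x200, 0x1FF)
    return clamp(signed_out + sound_bias, 0, 0x3FF)


# Usual bias: a full-scale signed wave spans the whole unsigned range.
print([final_mix(s, 0) for s in (-512, 0, 511)])                      # [0, 512, 1023]
# Bias 0: everything below zero clamps to 0.
print([final_mix(s, 0, sound_bias=0) for s in (-512, -100, 0, 511)])  # [0, 0, 0, 511]
# Both clamps at work: 400 + 300 = 700 clamps to 511 before the bias.
print(final_mix(400, 300))                                            # 1023
```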

All together (assuming every single channel is unmuted in SOUNDCNT_L and SOUNDCNT_H):

# PCM mixing
pcm_out = (pcm_a << (1 + pcm_a_volume)) + (pcm_b << (1 + pcm_b_volume))
pcm_out = pcm_out.clamp(-0x200, 0x1FF)

# PSG mixing
# For any channels muted in SOUNDCNT_L, output 0 instead of using the below output calculations
pulse1_out = 2 * pulse1_sample - pulse1_volume
pulse2_out = 2 * pulse2_sample - pulse2_volume
wavetable_out = 2 * wavetable_sample - 15
noise_out = 2 * noise_sample - noise_volume
psg_out = (pulse1_out + pulse2_out + wavetable_out + noise_out) * psg_master_volume
psg_out >>= 2 - (gba_psg_volume % 3)

# Final mixing
signed_out = (pcm_out + psg_out).clamp(-0x200, 0x1FF)
final_out = (signed_out + sound_bias).clamp(0, 0x3FF)

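A corollary of the mixing is that a single PSG channel at max volume (channel volume 15, PSG master volume 8, GBA PSG volume 100%) outputs -120 to +120, the same output level range as a 100%-volume PCM channel with samples ranging from -30 to +30 (since PCM samples are shifted left by 2). That is about one quarter of the full PCM sample range.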

The clamping to signed 10-bit can also introduce audio distortion when using both the PCM and PSG channels, if software doesn’t take care to avoid this (e.g. by reducing channel volumes and/or by not using the full PCM sample range).
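Here is the whole pipeline as a self-contained sketch for one output side (L or R). `mix_one_side` and its tuple-based parameters are hypothetical; a disabled channel contributes 0, as described above. The checks at the end reproduce the corollary: a single pulse channel at max volume lands on exactly the same output level as a 100%-volume PCM channel playing ±30.

```python
def clamp(value: int, lo: int, hi: int) -> int:
    """Clamp value to the inclusive range [lo, hi]."""
    return max(lo, min(hi, value))


def biased(sample: int, volume: int, enabled: bool) -> int:
    """PSG channel output: 2 * sample - volume, or 0 if muted for this side."""
    return 2 * sample - volume if enabled else 0


def mix_one_side(
    pcm_a=0, pcm_b=0,                # signed 8-bit samples
    pcm_a_volume=1, pcm_b_volume=1,  # 0 = 50%, 1 = 100%
    pcm_a_on=False, pcm_b_on=False,  # routed to this side in SOUNDCNT_H
    pulse1=(0, 0, False),            # (sample, volume, enabled); sample is 0 or volume
    pulse2=(0, 0, False),
    wave=(0, False),                 # (sample 0-15 after volume scaling, enabled)
    noise=(0, 0, False),
    psg_master_volume=8,             # 1-8
    gba_psg_volume=2,                # 0/3 = 25%, 1 = 50%, 2 = 100%
    sound_bias=0x200,
):
    """Hypothetical one-side mixer following the pipeline above."""
    # PCM mixing
    pcm_out = 0
    if pcm_a_on:
        pcm_out += pcm_a << (1 + pcm_a_volume)
    if pcm_b_on:
        pcm_out += pcm_b << (1 + pcm_b_volume)
    pcm_out = clamp(pcm_out, -0x200, 0x1FF)

    # PSG mixing (the wavetable channel is always biased by -15)
    psg_sum = (biased(*pulse1) + biased(*pulse2)
               + biased(wave[0], 15, wave[1]) + biased(*noise))
    psg_out = psg_sum * psg_master_volume
    psg_out >>= 2 - (gba_psg_volume % 3)

    # Final mixing
    signed_out = clamp(pcm_out + psg_out, -0x200, 0x1FF)
    return clamp(signed_out + sound_bias, 0, 0x3FF)


# Corollary check: max-volume pulse channel vs. 100% PCM channel at +/-30
print(mix_one_side(pulse1=(15, 15, True)), mix_one_side(pcm_a=30, pcm_a_on=True))  # 632 632
print(mix_one_side(pulse1=(0, 15, True)), mix_one_side(pcm_a=-30, pcm_a_on=True))  # 392 392
```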

Output

The mixed unsigned 10-bit samples are output using 1-bit PWM (pulse width modulation) at the GBA clock rate of ~16.77 MHz. Software can select 1 of 4 PWM sampling frequencies:

32768 Hz, 9-bit sample resolution
65536 Hz, 8-bit sample resolution (most commonly used, typically the best audio quality)
131072 Hz, 7-bit sample resolution
262144 Hz, 6-bit sample resolution

The chip converts the 10-bit samples to lower bit depth by simply truncating the lowest bits. This causes extremely significant quantization noise (sounds like hissing or static) when using the PCM channels with the 131072 Hz or 262144 Hz sampling frequencies. 65536 Hz isn’t nearly as bad because the PCM samples are 8-bit to begin with (as long as the 50% volume reduction is not enabled).
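The resolutions in the list follow from the clock: each PWM period lasts 2^24 / f CPU cycles, and the resolution is log2 of that cycle count. This sketch (hypothetical helper names) derives them and applies the truncation, assuming a single PCM channel with no PSG and the usual 0x200 bias. A PCM step of 1 survives at 65536 Hz with 100% volume, but disappears with the 50% reduction or at 131072 Hz.

```python
GBA_CLOCK = 1 << 24  # Roughly 16.77 MHz


def pwm_bits(sampling_hz: int) -> int:
    """Resolution = log2(CPU cycles per PWM period); periods are powers of 2."""
    cycles = GBA_CLOCK // sampling_hz
    return cycles.bit_length() - 1


def truncate_to_pwm(sample10: int, sampling_hz: int) -> int:
    """Drop the low bits of an unsigned 10-bit sample (0-0x3FF)."""
    return sample10 >> (10 - pwm_bits(sampling_hz))


for hz in (32768, 65536, 131072, 262144):
    print(hz, GBA_CLOCK // hz, pwm_bits(hz))  # frequency, cycles per period, bits

# One PCM step (sample value 1) on top of the 0x200 bias:
print(truncate_to_pwm(0x200 + 4, 65536), truncate_to_pwm(0x200, 65536))    # 129 128
print(truncate_to_pwm(0x200 + 2, 65536), truncate_to_pwm(0x200, 65536))    # 128 128
print(truncate_to_pwm(0x200 + 4, 131072), truncate_to_pwm(0x200, 131072))  # 64 64
```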

PWM audio output works by alternating between 0% voltage and 100% voltage extremely rapidly, at a frequency above the audible frequency range. Within each sample period, the sample value determines the pulse duty cycle, i.e. the fraction of the period where it outputs 100% voltage vs. 0%. The human ear will perceive this as if the voltage is being held constant at a specific level.

For example, with the 65536 Hz sampling frequency, the GBA outputs an 8-bit sample once every 256 CPU clock cycles. With a sample value of e.g. 70, the PWM hardware will output 100% voltage for 70 cycles and then 0% voltage for the remaining 186 cycles (though possibly not in that order). The human ear will perceive this the same as a constant ~27% voltage since 65536 Hz is well above the audible frequency range.

GBA PWM Sample value 70 at 65536 Hz sampling frequency / 8-bit sample resolution
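The duty-cycle arithmetic can be checked with a tiny model of one PWM period as a list of per-cycle output levels. It is purely illustrative (`pwm_waveform` is a hypothetical name), and it puts the high cycles first, which as noted above may not match the hardware’s actual ordering.

```python
def pwm_waveform(sample: int, bits: int) -> list:
    """One PWM period: `sample` cycles high (1), then the rest low (0).

    High-then-low ordering is an assumption for illustration only.
    """
    period = 1 << bits  # CPU cycles per PWM period
    if not 0 <= sample < period:
        raise ValueError("sample out of range for this resolution")
    return [1] * sample + [0] * (period - sample)


wave = pwm_waveform(70, 8)
print(len(wave), sum(wave), sum(wave) / len(wave))  # 256 70 0.2734375
```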

The GBA must internally resample all audio output to the selected PWM sampling frequency. It performs this resampling using nearest neighbor interpolation, i.e. no interpolation. When it needs a sample for PWM output, it simply mixes the current output of each channel. This causes significant audio aliasing, especially when games use low sample rates with the PCM channels.

Emulators generally don’t emulate the 1-bit PWM output directly, but nearest neighbor resampling to the PWM sampling frequency is necessary if you want to accurately emulate the aliasing that’s present on actual hardware. Of course, you might not want to emulate that part accurately, but that’s outside the scope of this post.
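Nearest neighbor resampling here amounts to a zero-order hold: at each PWM tick, the mixer reads whatever sample each channel is currently holding. A minimal sketch, working in CPU cycles to avoid floating-point rounding, with a hypothetical `hold_resample` function and the assumption that both clocks start in phase at cycle 0. The example uses a 1596-cycle source period (about 10512 Hz) against the 256-cycle PWM period at 65536 Hz.

```python
def hold_resample(samples, source_period, pwm_period, n_out):
    """Zero-order hold resampling, with both periods given in CPU cycles."""
    out = []
    for i in range(n_out):
        cycle = i * pwm_period        # CPU cycle of this PWM sample
        idx = cycle // source_period  # source sample still being held
        out.append(samples[min(idx, len(samples) - 1)])
    return out


src = [0, 100, -100, 50]
print(hold_resample(src, 1596, 256, 20))
# [0, 0, 0, 0, 0, 0, 0, 100, 100, 100, 100, 100, 100,
#  -100, -100, -100, -100, -100, -100, 50]
```

Each source sample is simply repeated 6 or 7 times (1596 / 256 ≈ 6.23), and that unfiltered staircase is what gets played back.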

Here are two examples playing at 10512 Hz with a PWM sampling frequency of 65536 Hz:

Castlevania: Aria of Sorrow - The Black Sun

Castlevania: Aria of Sorrow - Confrontation

Actual hardware might have a low-pass filter or something to reduce the aliasing, but it really doesn’t sound much different from those recordings.

For an example on the other extreme, here’s Golden Sun: The Lost Age using a PCM sample rate of 63072 Hz, a shockingly high sample rate by GBA standards:

Golden Sun: The Lost Age - Venus Lighthouse

Golden Sun: The Lost Age - Battle Theme (Jenna)

Golden Sun: The Lost Age - Battle Theme (Felix)

This isn’t a totally fair comparison because The Lost Age has higher-quality audio mixing code in addition to using a much higher sample rate, but still! That’s a massive difference in audio quality!

Normally there are sharply diminishing returns to using audio sample rates higher than 40000ish Hz, but on GBA, using an extremely high sample rate like 63072 Hz produces noticeably less aliased output than using, say, 44100 Hz (CD-quality). This is because there’s a much smaller difference between the audio sample rate and the PWM sampling frequency.

For a fairer comparison to The Lost Age, here’s an example from the first Golden Sun, which has similarly high-quality audio mixing code but is using a sample rate of only 21024 Hz:

Golden Sun - Venus Lighthouse

Golden Sun - Battle Theme

Still very good by GBA standards, but not quite up to the same quality as its sequel.

The resampling-induced aliasing is not exclusive to the PCM channels! It can also alter the sound of the PSG channels. The Mega Man Battle Network 2 song I posted above actually relies on audio aliasing to make the PSG noise channel sound much louder. With enhanced higher-quality resampling in an emulator, it sounds way quieter:

Mega Man Battle Network 2 - Title Screen (Accurate audio interpolation)

Mega Man Battle Network 2 - Title Screen (Enhanced audio interpolation)

This happens because it plays the noise channel at the maximum possible frequency (524288 Hz LFSR clock rate), which is higher than the PWM sampling frequency (65536 Hz). In actual hardware, this causes significant audio aliasing. With higher-quality interpolation however, a lot of the channel’s output gets filtered out as high-frequency noise because…well, because that’s literally what it is!
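The noise-channel effect can be approximated numerically. This sketch uses the Python `random` module as a stand-in for the noise channel’s LFSR and a simple 8-sample box average as a crude stand-in for higher-quality resampling; neither is what the hardware or any particular emulator actually does. Since 524288 / 65536 = 8, nearest neighbor resampling keeps one of every 8 noise values at full amplitude, while averaging removes most of the energy above the new Nyquist limit.

```python
import math
import random


def rms(values):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))


RATIO = 524288 // 65536  # noise clock / PWM sampling frequency = 8

rng = random.Random(1)  # stand-in for the LFSR, NOT the real noise algorithm
noise = [rng.choice((-1, 1)) for _ in range(RATIO * 4096)]  # full-scale noise

# Nearest neighbor: each PWM tick just grabs the current noise value.
point_sampled = noise[::RATIO]
# Crude low-pass stand-in: average the values within each PWM period.
averaged = [sum(noise[i:i + RATIO]) / RATIO for i in range(0, len(noise), RATIO)]

print(rms(point_sampled))  # 1.0
print(rms(averaged))       # well below 1.0
```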

Anyway, in an emulator, you can convert the unsigned N-bit PWM samples to floating point (-1 to +1) for audio output by doing this:

# pwm_sample is unsigned N-bit
divisor = 1 << (N - 1)
output_sample = (pwm_sample - divisor) / divisor

You could alternatively just do pwm_sample / ((1 << N) - 1) and optionally use a high-pass filter to center the waveform at 0, but that will produce very quiet audio since it’s only using half of the full sample range.
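Both conversions as runnable functions (hypothetical names), for comparison. With 8-bit PWM samples and the usual 0x200 bias, the resting level is 0x200 >> 2 = 128, which the first mapping turns into 0.0.

```python
def pwm_to_float(pwm_sample: int, n_bits: int) -> float:
    """Map an unsigned N-bit PWM sample to -1.0 .. just under +1.0, centered at 0."""
    divisor = 1 << (n_bits - 1)
    return (pwm_sample - divisor) / divisor


def pwm_to_unit(pwm_sample: int, n_bits: int) -> float:
    """Alternative mapping to 0.0 .. 1.0 (needs a high-pass filter to recenter)."""
    return pwm_sample / ((1 << n_bits) - 1)


print(pwm_to_float(0, 8), pwm_to_float(128, 8), pwm_to_float(255, 8))  # -1.0 0.0 0.9921875
print(pwm_to_unit(0, 8), pwm_to_unit(255, 8))                          # 0.0 1.0
```

A full-scale swing covers a span of about 2.0 with the first mapping but only 1.0 with the second, which is why the second sounds half as loud once recentered.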

Software Audio Generation

Golden Sun: The Lost Age demonstrates that the GBA audio hardware is capable of decent-sounding audio, although some amount of audio noise and aliasing are unavoidable due to the PCM channels only accepting 8-bit samples as input and then nearest neighbor resampling to the PWM sampling frequency (usually 65536 Hz).

The bigger contributors to poor audio quality are on the software side. Since the hardware only supports playing 1 stereo sample or 2 mono samples simultaneously, games must perform audio generation and mixing in software, and this includes any effects like volume or reverb - the hardware supports no effects beyond a blunt 50% volume reduction.

The GBA doesn’t have a secondary CPU dedicated to audio processing like some earlier gaming consoles do (e.g. SNES / Genesis), so any CPU time spent on audio processing is CPU time that’s not being spent on game logic, graphics processing, etc. GBA does have its own copy of the Game Boy Color CPU, but GBA software can’t touch it - it’s only accessible in GBC backwards compatibility mode.

After taking cycles-per-instruction into account, the GBA’s 16.77 MHz ARM7TDMI CPU is significantly more capable than the main CPUs of the SNES (3.58 MHz 65816) and Genesis (7.67 MHz 68000), though not quite as much as it might seem on paper - the ARM7TDMI doesn’t have a CPU cache, and in practice memory latency eats up a lot of clock cycles.

GBA does have 32 KB of fast RAM called IWRAM (internal working RAM) that the CPU can access with zero latency, but there’s only so much a game can fit in 32 KB of RAM. GBA also has a hardware cartridge ROM prefetcher that attempts to speed up code in cartridge ROM, but how well this works depends very heavily on what exactly the game code is doing; it’s definitely no replacement for a proper instruction cache.

Generating and mixing audio at a low sample rate is a very easy way to reduce the amount of CPU time spent on audio processing, hence why many games use sample rates below 14000 Hz.
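A rough way to see the CPU cost of the sample rate is to count how many output samples a software mixer must produce per video frame, since mixing work grows roughly linearly with that count (times the number of software channels). This sketch assumes 280896 CPU cycles per frame (228 scanlines of 1232 cycles, about 59.73 Hz) and uses timer periods that produce the sample rates mentioned in this post; it is an arithmetic illustration, not a measurement of any game.

```python
GBA_CLOCK = 1 << 24
CYCLES_PER_FRAME = 280896  # assumed: 228 scanlines x 1232 cycles (~59.73 Hz)


def samples_per_frame(timer_period_cycles: int) -> float:
    """Output samples a software mixer must produce per video frame."""
    return CYCLES_PER_FRAME / timer_period_cycles


# Timer periods (in CPU cycles) giving roughly 10512, 13379, 21024 and 63072 Hz
for period in (1596, 1254, 798, 266):
    rate = GBA_CLOCK / period
    print(f"{rate:6.0f} Hz: {samples_per_frame(period):5.0f} samples per frame")
```

This gives 176, 224, 352 and 1056 samples per frame respectively, so mixing at ~63072 Hz means about 4.7 times as many samples per frame as ~13379 Hz.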

Games also often mix audio using 8-bit sample buffers rather than 16-bit, both to save CPU time and to make the audio buffers consume less RAM - the latter particularly important because audio mixing buffers are usually stored in the fast-but-small IWRAM. Mixing with 8-bit precision causes significant audio noise due to quantization error, especially when a game is playing a medium/large number of software audio channels simultaneously. Golden Sun and The Lost Age are two of the few games that mix using 16-bit buffers, which is the main reason why they sound so much less noisy than most other GBA games.
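The cost of 8-bit mixing buffers can be illustrated with a hypothetical software mixer where each channel’s contribution is a signed 8-bit sample times an 8-bit volume, divided by 256. Accumulating at 8-bit precision means truncating every contribution before summing, whereas a 16-bit buffer keeps full precision and truncates once at the end. This is a simplified model (no headroom scaling or clipping), not how any specific game’s mixer works.

```python
import random


def mix_errors(num_channels: int, num_samples: int = 2000, seed: int = 0):
    """Mean absolute error of 8-bit vs. 16-bit accumulation against an exact mix.

    Hypothetical mixer: contribution = sample (-128..127) * volume (0..255) / 256.
    """
    rng = random.Random(seed)
    err8 = err16 = 0.0
    for _ in range(num_samples):
        chans = [(rng.randint(-128, 127), rng.randint(0, 255)) for _ in range(num_channels)]
        exact = sum(s * v / 256 for s, v in chans)
        acc8 = sum((s * v) >> 8 for s, v in chans)  # each contribution truncated
        acc16 = sum(s * v for s, v in chans) >> 8   # full precision, truncated once
        err8 += abs(exact - acc8)
        err16 += abs(exact - acc16)
    return err8 / num_samples, err16 / num_samples


for n in (1, 4, 8):
    e8, e16 = mix_errors(n)
    print(f"{n} channels: 8-bit error {e8:.2f}, 16-bit error {e16:.2f}")
```

With one channel the two are identical, but the 8-bit error grows with the channel count while the 16-bit error always stays below one step.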

It’s hard to say that low sample rates and 8-bit sample buffers were the wrong choices given that the GBA’s built-in speaker is quite poor, and the GBA SP models don’t have a headphone jack unless you buy a peripheral that plugs into the expansion port. Why sacrifice CPU time on producing high-quality audio when most people are probably going to hear it mangled by the GBA speaker?

Camelot was clearly willing to make that sacrifice when developing the Golden Sun games, but those are turn-based RPGs which are less CPU-demanding to begin with.

There are ways to improve GBA games’ audio quality, some of which even work on actual hardware via game hacks! But that is a topic for a possible future post.

Useful References

GBATEK: GBA/NDS Technical Info - https://www.problemkaputt.de/gbatek.htm
Pan Docs (Audio) - https://gbdev.io/pandocs/Audio.html
Why was the GBA sound so poor? - https://forums.nesdev.org/viewtopic.php?t=19688