Game Boy Advance Audio Interpolation

This post describes, at a fairly high level, an audio enhancement that a Game Boy Advance emulator can implement to reduce audio aliasing and noise.

To start with, here’s a comparison from Metroid: Zero Mission as an example of what this can do:

Metroid: Zero Mission - Brinstar (Accurate interpolation)

Metroid: Zero Mission - Brinstar (Enhanced interpolation)

Much cleaner! The second recording does sound a little more muffled, but I’ll take that over the horrible audio aliasing in the first recording.

There are some alternative approaches to improving GBA audio that produce higher-quality results, such as NanoBoyAdvance’s excellent MP2K HQ feature, but the interpolation approach is notable in that it works with any GBA game (though quality can vary by game). MP2K HQ, for example, only works with games that use the MP2K audio driver (aka M4A, aka Sappy), which covers many games but not all of them.

This approach is not particularly novel - VBA-M has supported enhanced audio interpolation for a very long time. I believe the implementation details are a bit different though.

Background

The previous post goes into more detail on how the GBA audio hardware works, but to summarize the part that’s most relevant for improving audio interpolation:

The GBA audio hardware outputs the final mixed audio samples using PWM at 1 of 4 possible sampling frequencies ranging from 32768 Hz to 262144 Hz. The vast majority of GBA games use 65536 Hz, though occasionally you’ll see 32768 Hz (e.g. Castlevania: Circle of the Moon).

The GBA resamples from each audio channel’s frequency to the PWM sampling frequency by applying nearest neighbor interpolation, i.e. just outputting the channel’s current sample. This causes extremely noticeable audio aliasing in the final audio output, particularly when games use very low sample rates with the 2 PCM channels, which many games unfortunately do - sample rates in the 10000-14000 Hz range are very common (e.g. the Metroid example above is 13379 Hz).

The core idea behind enhancing interpolation is fairly simple: what if, instead of accurately emulating how the GBA PWM hardware works, the emulator uses its own interpolation algorithm to resample from audio channels’ sample rates directly to the emulator’s audio output sample rate?

Source Sample Rate

The first step is figuring out the source sample rate to resample from.

As mentioned in the previous post, you can compute a PCM channel’s sample rate using the clock divider and counter reload value of the GBA timer that it’s tracking:

gba_clock = 1 << 24  # 16777216 Hz
sample_rate = gba_clock / ((0x10000 - timer.reload) * timer.clock_divider)

The sample rate can be a fractional number, though the final clock divider value ((0x10000 - reload) * divider) is always a positive integer.

Games generally play most of their PCM audio at the same sample rate, but you will occasionally see games use different sample rates for different songs (e.g. Castlevania: Circle of the Moon again, which uses a much higher sample rate for its main menu music).

You need to recalculate the channels’ sample rates (or at least check if they’ve changed) whenever any of the following occurs:

The game changes which timer a channel is tracking
Timer 0 or 1 is enabled or disabled
Timer 0 or 1’s reload value or clock divider changes
Timer 1’s cascading mode is enabled or disabled

When a PCM channel is tracking a disabled timer, it constantly outputs the last sample that it popped from its FIFO. While not completely accurate, you could emulate this as the channel being muted. This will cause problems if a game frequently disables and re-enables its audio timer, but that is extremely uncommon. (I wish I could say I’ve never seen it, but…Driver 2 Advance.)

I’ve never seen a game use timer 1 in cascading mode as an audio timer, but the hardware theoretically supports it. You could calculate the effective sample rate like so:

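gba_clock = 1 << 24  # 16777216 Hz
# In cascading mode timer 1 ticks once per timer 0 overflow, so its own clock divider is not used
gba_clock_divider = (0x10000 - timers[0].reload) * timers[0].clock_divider * (0x10000 - timers[1].reload)
sample_rate = gba_clock / gba_clock_divider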

However, in my own implementation I just fall back to nearest neighbor resampling when a channel is tracking timer 1 in cascading mode (for that channel only, not for everything), since I can’t think of a reason why a game would want to do this.
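Putting the pieces of this section together, a minimal sketch of the sample rate calculation might look like the following. The `Timer` dataclass and the `pcm_sample_rate` function are hypothetical names used only for illustration. Returning `None` is this sketch's way of saying "no usable rate", which the caller could treat as a muted channel. For timer 1 in cascading mode it applies the formula above, though falling back to nearest neighbor in that case would be just as reasonable.

```python
from dataclasses import dataclass
from typing import List, Optional

GBA_CLOCK = 1 << 24  # 16777216 Hz


@dataclass
class Timer:
    """Hypothetical view of the timer state that matters for audio."""
    enabled: bool = False
    reload: int = 0         # 16-bit counter reload value
    clock_divider: int = 1  # 1, 64, 256, or 1024
    cascade: bool = False   # only checked for timer 1 in this sketch


def pcm_sample_rate(timers: List[Timer], timer_index: int) -> Optional[float]:
    """Return the sample rate of a PCM channel tracking timers[timer_index].

    Returns None when the timer (or, in cascading mode, timer 0) is disabled.
    """
    timer = timers[timer_index]
    if not timer.enabled:
        return None

    if timer_index == 1 and timer.cascade:
        if not timers[0].enabled:
            return None
        # Timer 1 counts timer 0 overflows, so its own clock divider is ignored
        divider = ((0x10000 - timers[0].reload) * timers[0].clock_divider
                   * (0x10000 - timer.reload))
    else:
        divider = (0x10000 - timer.reload) * timer.clock_divider

    return GBA_CLOCK / divider


# Illustrative reload value chosen to give roughly 13379 Hz
timers = [Timer(enabled=True, reload=0x10000 - 1254, clock_divider=1), Timer()]
rate = pcm_sample_rate(timers, 0)
```

An emulator could call something like this from each of the events listed earlier (timer select change, timer enable/disable, reload or divider write, cascade toggle) and reconfigure the channel's resampler only when the returned value actually changes.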

Resampling & Mixing

Once you know each PCM channel’s source sample rate, you then need to resample from that to your output sample rate (e.g. 48000 Hz). You can ignore the PWM sampling frequency; if you’re not going for accurate emulation, there’s no reason to resample to the PWM sampling frequency as an intermediate step when you can resample directly to the output sample rate.

At this point this is no longer a GBA-specific problem, so you can plug in whatever resampling algorithm you want (or use a library), and then send it an input sample each time the PCM channel pops from its sample FIFO. I use my own resampling implementation so that I can control exactly how interpolation is performed. (Maybe a topic for a future post…)

You’ll need to separately resample the PSG channels down to your output sample rate. You could do nearest neighbor, which will sound somewhat similar to actual hardware, though not exactly the same since actual hardware uses nearest neighbor to resample to the PWM sampling frequency (usually 65536 Hz) rather than to your output sample rate.

For higher quality PSG resampling you could do the same thing you can do in a GB/GBC emulator: sample all 4 PSG channels at 2097152 Hz (every 8 GBA CPU cycles), mix them at 2097152 Hz, then interpolate down from 2097152 Hz to your output sample rate. This works because 2097152 Hz is an integer multiple of every possible PSG channel frequency, so sampling at that rate never misses a change in a channel’s output, and due to how PSG audio generation works you can safely assume that samples are infinitely repeated in between sample changes. (That’s not really true in physical reality, but it’s true if you’re sampling at 2097152 Hz.) Whatever interpolation you perform will need to involve a low-pass filter to avoid audio aliasing.
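As a rough illustration of downsampling a sample-and-hold signal like this, here is a sketch that averages the mixed PSG signal over each output sample period (a box filter). This is a swapped-in technique chosen for brevity, not necessarily what any real emulator uses: a box filter is only a weak low-pass filter, so a proper FIR or windowed sinc filter would remove more aliasing. `BoxDownsampler` and `push` are hypothetical names.

```python
class BoxDownsampler:
    """Average a held (zero-order-hold) input signal over each output period."""

    def __init__(self, source_rate: float, output_rate: float):
        # Number of source samples spanned by one output sample (e.g. ~43.69)
        self.ratio = source_rate / output_rate
        self.acc = 0.0     # weighted sum of input within the current output period
        self.filled = 0.0  # how much of the current output period is covered so far

    def push(self, sample: float) -> list:
        """Feed one source sample; return the output samples it completes."""
        outputs = []
        remaining = 1.0  # portion of this source sample not yet consumed
        while self.filled + remaining >= self.ratio:
            # Use just enough of this sample to finish the current output period
            take = self.ratio - self.filled
            self.acc += sample * take
            outputs.append(self.acc / self.ratio)
            self.acc = 0.0
            self.filled = 0.0
            remaining -= take
        # Whatever is left over starts filling the next output period
        self.acc += sample * remaining
        self.filled += remaining
        return outputs


# Downsample a short constant signal from 2097152 Hz to 48000 Hz
psg = BoxDownsampler(2097152, 48000)
downsampled = []
for _ in range(1000):
    downsampled.extend(psg.push(0.25))
```

Note that pushing over two million samples per emulated second through pure Python like this would be far too slow for real-time use; the sketch only shows the bookkeeping.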

For final mixing, you probably want to leave everything as signed samples and completely ignore the GBA sound bias functionality. There’s also no reason to truncate the lowest bits as you’d do with accurate emulation, or even to round to the nearest integer; the resampling process will give you floating-point output samples, so you can just leave them that way and do floating-point mixing. If you want you can even scale the samples down to make clipping impossible (unlike on actual hardware), though that will make the audio output very quiet most of the time.
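A minimal sketch of what floating-point mixing could look like, under the assumption (mine, not the hardware's) that each resampled input has already been normalized to the range [-1.0, 1.0] with the game's volume settings applied. `mix_sample` and its `scale` parameter are hypothetical.

```python
def mix_sample(pcm_a: float, pcm_b: float, psg: float, scale: float = 1.0) -> float:
    """Mix signed floating-point samples with no bias and no bit truncation.

    With three inputs each in [-1.0, 1.0], scale = 1/3 makes clipping
    impossible at the cost of quieter output; otherwise the result is clamped.
    """
    mixed = (pcm_a + pcm_b + psg) * scale
    return max(-1.0, min(1.0, mixed))


quiet_but_safe = mix_sample(1.0, 1.0, 1.0, scale=1.0 / 3.0)
clamped = mix_sample(1.0, 1.0, 0.0)
```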

In my enhanced resampling implementation I support two different interpolation algorithms: 6-point cubic Hermite interpolation and windowed sinc interpolation.

Cubic Hermite is implemented exactly as described in a previous post on Sega CD, originally from here:

def interpolate_cubic_hermite_6p(samples, n, x):
    # Interpolates between samples[n] and samples[n+1] at fractional position x in [0, 1).
    # Requires n >= 2 and n + 3 < len(samples).
    [ym2, ym1, y0, y1, y2, y3] = samples[n-2:n+4]

    c0 = y0
    c1 = 1/12 * (ym2 - y2) + 2/3 * (y1 - ym1)
    c2 = 5/4 * ym1 - 7/3 * y0 + 5/3 * y1 - 1/2 * y2 + 1/12 * y3 - 1/6 * ym2
    c3 = 1/12 * (ym2 - y3) + 7/12 * (y2 - ym1) + 4/3 * (y0 - y1)

    return ((c3 * x + c2) * x + c1) * x + c0

This algorithm works pretty well for upsampling, where your output sample rate is higher than the source sample rate. This is almost always the case for the GBA PCM channels, with the notable exception of Golden Sun: The Lost Age with its exceptionally high 63072 Hz sample rate. It does not work as well for downsampling unless you apply a low-pass filter to the source signal before interpolating; otherwise you’ll introduce some audio aliasing.
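To show how the function above could be driven one FIFO pop at a time, here is a sketch of a streaming wrapper around it. `HermiteResampler` and `push` are hypothetical names, and pre-filling the history with silence is an assumption of this sketch. It keeps the 6 most recent source samples and a fractional phase, and emits output samples until the phase crosses into the next source sample.

```python
from collections import deque


def interpolate_cubic_hermite_6p(samples, n, x):
    [ym2, ym1, y0, y1, y2, y3] = samples[n-2:n+4]
    c0 = y0
    c1 = 1/12 * (ym2 - y2) + 2/3 * (y1 - ym1)
    c2 = 5/4 * ym1 - 7/3 * y0 + 5/3 * y1 - 1/2 * y2 + 1/12 * y3 - 1/6 * ym2
    c3 = 1/12 * (ym2 - y3) + 7/12 * (y2 - ym1) + 4/3 * (y0 - y1)
    return ((c3 * x + c2) * x + c1) * x + c0


class HermiteResampler:
    def __init__(self, source_rate: float, output_rate: float):
        # How far to advance through the source signal per output sample
        self.step = source_rate / output_rate
        # Six most recent source samples, starting from silence
        self.history = deque([0.0] * 6, maxlen=6)
        # Fractional position between history[2] and history[3]
        self.phase = 0.0

    def push(self, sample: float) -> list:
        """Feed one source sample; return the output samples now available."""
        self.history.append(float(sample))
        window = list(self.history)
        outputs = []
        while self.phase < 1.0:
            outputs.append(interpolate_cubic_hermite_6p(window, 2, self.phase))
            self.phase += self.step
        # The next push shifts the window forward by one source sample
        self.phase -= 1.0
        return outputs


resampler = HermiteResampler(12000, 48000)
resampled = []
for _ in range(10):
    resampled.extend(resampler.push(1.0))
```

Because the interpolation point sits in the middle of the window, the output lags the input by 3 source samples. When the source sample rate changes, only `step` needs to be updated. The phase is accumulated in floating point here, which is fine for a sketch but will drift slightly over long runs.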

Sinc interpolation, or band-limited interpolation, is extremely high-quality but much more complex. Two approaches to implementing that:

Digital Audio Resampling Home Page (what I based my implementation on)
Band-Limited Sound Synthesis
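Neither of those methods is reproduced here, but as a rough illustration of the underlying idea, this is a naive Hann-windowed sinc interpolator. It is a sketch under several assumptions: the function name, the Hann window, and the `half_width` of 8 are my own choices, it recomputes the kernel for every output sample instead of using precomputed tables, and it is only suitable for upsampling (downsampling would require lowering the filter cutoff).

```python
import math


def sinc(t: float) -> float:
    """Normalized sinc function: sin(pi * t) / (pi * t)."""
    if t == 0.0:
        return 1.0
    return math.sin(math.pi * t) / (math.pi * t)


def interpolate_windowed_sinc(samples, n, x, half_width=8):
    """Interpolate between samples[n] and samples[n+1] at position x in [0, 1).

    Requires n - half_width + 1 >= 0 and n + half_width < len(samples).
    """
    total = 0.0
    for k in range(-half_width + 1, half_width + 1):
        # Signed distance from the interpolation point to samples[n + k]
        t = k - x
        # Hann window that tapers the sinc to zero at +/- half_width
        window = 0.5 * (1.0 + math.cos(math.pi * t / half_width))
        total += samples[n + k] * sinc(t) * window
    return total


impulse = [0.0] * 20
impulse[10] = 1.0
at_sample = interpolate_windowed_sinc(impulse, 10, 0.0)
```

At `x = 0` the kernel is 1 at the current sample and effectively 0 at every other sample, so the interpolator passes the original samples through unchanged.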

Note that with the low sample rates common on GBA, band-limited interpolation doesn’t necessarily produce a more pleasant sound than simpler interpolation algorithms! Yes, it’s time for some comparisons.

Examples

First, Brinstar again, but this time also with a windowed sinc version - the example at the top used cubic Hermite. For reference, this game uses a sample rate of ~13379 Hz:

Metroid: Zero Mission - Brinstar (Accurate resampling)

Metroid: Zero Mission - Brinstar (Cubic Hermite interpolation)

Metroid: Zero Mission - Brinstar (Windowed sinc interpolation)

The windowed sinc version sounds less aliased but really muffled. I personally prefer the sound of the cubic version.

In general, sinc interpolation does a much better job at eliminating audio aliasing and noise than cubic interpolation does, but the complete removal of wave frequencies above the source signal’s Nyquist frequency (~6689 Hz here) muffles the sound. This is technically more accurate as far as resampling the original 13379 Hz signal, but in this case at least I don’t think it sounds better.

Here’s another 13379 Hz example, this one from Fire Emblem: The Blazing Blade:

Fire Emblem - Strike (Accurate resampling)

Fire Emblem - Strike (Cubic Hermite interpolation)

Fire Emblem - Strike (Windowed sinc interpolation)

I also prefer the cubic version for this one, again because the sinc version sounds very muffled.

Windowed sinc doesn’t always sound like that though! Here’s a song from Mega Man Zero 4, sample rate ~21024 Hz:

Mega Man Zero 4 - Esperanto (Accurate resampling)

Mega Man Zero 4 - Esperanto (Cubic Hermite interpolation)

Mega Man Zero 4 - Esperanto (Windowed sinc interpolation)

All three versions of this have noticeable static/hissing (thank you 8-bit sample quantization), but the windowed sinc version definitely has the least, and unlike the Metroid and Fire Emblem examples it doesn’t sound significantly more muffled than the cubic version.

Here’s an interesting example from Castlevania: Circle of the Moon, the aforementioned main menu music. It uses a PCM sample rate of 42048 Hz here but with the PWM sampling frequency set to 32768 Hz, which sounds fairly awful with accurate emulation:

Castlevania: Circle of the Moon - Main Menu (Accurate resampling)

Castlevania: Circle of the Moon - Main Menu (Cubic Hermite interpolation)

Castlevania: Circle of the Moon - Main Menu (Windowed sinc interpolation)

(If you’ve played Rondo of Blood, yes, it’s the same song.)

Finally, an example from Golden Sun: The Lost Age with its crazy high 63072 Hz sample rate. This song has really noticeable audio noise despite the high sample rate and the game’s high-quality audio mixing code:

Golden Sun: The Lost Age - Gloomy Caves (Accurate resampling)

Golden Sun: The Lost Age - Gloomy Caves (Cubic Hermite interpolation)

Golden Sun: The Lost Age - Gloomy Caves (Windowed sinc interpolation)

It’s not as noticeable as in the other examples, but sinc interpolation does remove some of the audio noise! Cubic interpolation doesn’t fare quite as well here, though the difference isn’t massive.

The PSG Problem

Here’s a song from Castlevania: Aria of Sorrow that uses both the PCM channels and the PSG channels:

Castlevania: Aria of Sorrow - Castle Corridor (Accurate resampling)

Castlevania: Aria of Sorrow - Castle Corridor (Cubic Hermite interpolation)

Castlevania: Aria of Sorrow - Castle Corridor (Windowed sinc interpolation)

I picked out Aria of Sorrow here because it uses a PCM sample rate of ~10512 Hz, very low.

Higher-quality interpolation removes most of the high-frequency aliasing from the PCM channel output, but the PSG output still accurately contains high-frequency wave components. This makes the PSG channels stand out much more than they’re supposed to in music that uses both the PCM and PSG channels, particularly at low PCM sample rates.

One possible solution to this is to try removing the high-frequency waves from the PSG output, i.e. a low-pass filter.

Here’s what happens after applying a low-pass filter to all PSG output with a cutoff frequency of 5256 Hz, right at the Nyquist frequency of sample rate 10512 Hz:

Castlevania: Aria of Sorrow - Castle Corridor (Cubic Hermite interpolation + PSG LPF)

Castlevania: Aria of Sorrow - Castle Corridor (Windowed sinc interpolation + PSG LPF)

I think that sounds a lot better than not low-pass filtering the PSG, and better than bluntly reducing PSG volume relative to PCM (I tried that, and it didn’t work well). The windowed sinc version could maybe benefit from a low-pass filter with either a lower cutoff frequency or a steeper post-cutoff attenuation slope, but it still sounds a lot better than the version with unfiltered PSG.

This doesn’t work quite as well when games do the reverse, with the PSG leading the music while the PCM channels play secondary instruments (this one is also 10512 Hz):

Mega Man Battle Network 2 - Title Screen (Accurate resampling)

Mega Man Battle Network 2 - Title Screen (Cubic Hermite interpolation)

Mega Man Battle Network 2 - Title Screen (Cubic Hermite interpolation + PSG LPF)

This particular song has an additional issue where higher-quality PSG resampling makes the noise channel sound much quieter than it does on actual hardware (because it’s playing at an ultrasonic frequency), but even aside from that, I don’t think the interpolated version sounds better here.

In my implementation I dynamically set the low-pass cutoff frequency to (0.5 * pcm_sample_rate), designing a new 2nd-order Butterworth filter each time the sample rate changes. This doesn’t do a perfect job, but it works well enough for the task of moderately attenuating high-frequency PSG output.

I apply the filter after resampling from 2097152 Hz to 48000 Hz, both because it attenuates much more sharply (a mere 2nd-order Butterworth filter does not work well at very high source sample rates) and because it’s significantly less computationally expensive.
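For reference, a 2nd-order Butterworth low-pass filter can be designed as a biquad using the widely used bilinear-transform formulas with Q = 1/sqrt(2). The sketch below is one way to do that, not necessarily how my implementation is structured. `ButterworthLowPass` and `design_psg_filter` are hypothetical names, and bypassing the filter when the cutoff would reach the output Nyquist frequency is an assumption of this sketch.

```python
import cmath
import math


class ButterworthLowPass:
    """2nd-order Butterworth low-pass filter as a direct form I biquad."""

    def __init__(self, cutoff_hz: float, sample_rate_hz: float):
        w0 = 2.0 * math.pi * cutoff_hz / sample_rate_hz
        cos_w0 = math.cos(w0)
        alpha = math.sin(w0) / math.sqrt(2.0)  # sin(w0) / (2 * Q), Q = 1/sqrt(2)
        a0 = 1.0 + alpha
        # Coefficients normalized so the leading denominator term is 1
        self.b0 = (1.0 - cos_w0) / 2.0 / a0
        self.b1 = (1.0 - cos_w0) / a0
        self.b2 = self.b0
        self.a1 = -2.0 * cos_w0 / a0
        self.a2 = (1.0 - alpha) / a0
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0
        self.sample_rate_hz = sample_rate_hz

    def process(self, x: float) -> float:
        """Filter one sample."""
        y = (self.b0 * x + self.b1 * self.x1 + self.b2 * self.x2
             - self.a1 * self.y1 - self.a2 * self.y2)
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        return y

    def magnitude_at(self, freq_hz: float) -> float:
        """Gain of the filter at the given frequency."""
        z = cmath.exp(-2j * math.pi * freq_hz / self.sample_rate_hz)
        numerator = self.b0 + self.b1 * z + self.b2 * z * z
        denominator = 1.0 + self.a1 * z + self.a2 * z * z
        return abs(numerator / denominator)


def design_psg_filter(pcm_sample_rate: float, output_rate: float = 48000.0):
    """Return a filter with cutoff 0.5 * pcm_sample_rate, or None to bypass."""
    cutoff = 0.5 * pcm_sample_rate
    if cutoff >= 0.5 * output_rate:
        return None  # cutoff at or above output Nyquist: nothing to filter
    return ButterworthLowPass(cutoff, output_rate)


psg_filter = design_psg_filter(10512.0)
gain_at_cutoff = psg_filter.magnitude_at(5256.0)
```

With this design the gain at the cutoff frequency is 1/sqrt(2), i.e. about -3 dB, and the gain at DC is 1. Calling `design_psg_filter` again whenever the PCM sample rate changes mirrors the "design a new filter each time" approach described above.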

End

I think this approach works pretty well for cleaning up GBA audio in real time, though the performance impact can be pretty significant (mainly because of the PSG downsampling). It can’t completely eliminate audio noise (static/hissing sound) due to still receiving noisy 8-bit samples as input, but it significantly reduces audio aliasing and it does still remove some noise.