Sega CD PCM Chip Interpolation

This is the second of two followups to my post on the Sega CD PCM chip. The last post described a way to improve audio quality by applying an audio filter to the final mixed PCM chip output. This post describes an audio enhancement that improves quality by changing how the emulated chip itself generates samples. This is very similar to what I described in an earlier post on the SNES and PS1 sound chips. The details differ a bit, though, because actual Sega CD hardware appears to perform no interpolation at all rather than overly soft interpolation.

Better Ways to Upsample Audio

Let’s start with the graphs from the previous post that demonstrate the problem with duplicating samples (nearest-neighbor interpolation). As far as I’m aware, this is how the Sega CD’s PCM chip converts each channel’s sample rate to its 32552 Hz output rate.

Here are the original 11 samples:

[Figure: Samples of a sine wave]

Here’s the nearest-neighbor upsampled version:

[Figure: 2x upsampling via nearest-neighbor interpolation]

The last post stopped there and shifted to the topic of digital signal filtering. Let’s go back and continue this thread: what are some better ways to upsample an audio signal?

After nearest-neighbor interpolation, the next-simplest upsampling method is 2-point linear interpolation. For 2x upsampling, this means averaging each pair of adjacent samples:

[Figure: 2x upsampling via 2-point linear interpolation]

This works significantly better than nearest-neighbor, but there are still some noticeable errors compared to the original sine waveform.
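Both of the simple methods so far fit in a few lines. The sketch below is in Rust, because that’s what the emulator code later in this post uses, and it operates on plain `f64` samples. The function names are illustrative. For the linear version, repeating the final sample to stand in for its missing right neighbor is just one arbitrary choice.

```rust
/// 2x upsampling by duplicating each sample (nearest-neighbor interpolation).
fn upsample_2x_nearest(samples: &[f64]) -> Vec<f64> {
    samples.iter().flat_map(|&s| [s, s]).collect()
}

/// 2x upsampling by inserting the average of each adjacent pair (2-point linear
/// interpolation). The last sample has no right neighbor, so it is repeated.
fn upsample_2x_linear(samples: &[f64]) -> Vec<f64> {
    let mut out = Vec::with_capacity(samples.len() * 2);
    for (i, &s) in samples.iter().enumerate() {
        let next = samples.get(i + 1).copied().unwrap_or(s);
        out.push(s);
        out.push((s + next) / 2.0);
    }
    out
}
```

For example, `[0.0, 1.0, 0.0, -1.0]` becomes `[0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -1.0]` with linear interpolation.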

Next, I’m going to use the Hermite interpolator implementations from this paper: https://yehar.com/blog/wp-content/uploads/2009/08/deip.pdf

Let’s try doing 4-point cubic Hermite interpolation, applying this formula with x=0.5 (halfway between samples):

```python
def interpolate_cubic_hermite_4p(samples, n, x):
    [ym1, y0, y1, y2] = samples[n-1:n+3]

    c0 = y0
    c1 = 1/2 * (y1 - ym1)
    c2 = ym1 - 5/2 * y0 + 2 * y1 - 1/2 * y2
    c3 = 1/2 * (y2 - ym1) + 3/2 * (y0 - y1)

    return ((c3 * x + c2) * x + c1) * x + c0
```

That produces this:

[Figure: 2x upsampling via 4-point cubic Hermite interpolation]

That’s not bad! Not perfect, but a lot better than linear interpolation.
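Here’s the same 4-point formula ported to Rust, plus a hypothetical `upsample_2x_hermite_4p` driver that does what the graph shows: it keeps each original sample and inserts the x=0.5 value after it. The Python version leaves edge handling to the caller. This sketch assumes that out-of-range neighbors are replaced by the nearest edge sample, which is only one of several reasonable choices.

```rust
/// 4-point, 3rd-order (cubic) Hermite interpolation between y0 and y1,
/// ported from the Python version above. `x` is in [0, 1].
fn interpolate_cubic_hermite_4p([ym1, y0, y1, y2]: [f64; 4], x: f64) -> f64 {
    let c0 = y0;
    let c1 = 0.5 * (y1 - ym1);
    let c2 = ym1 - 2.5 * y0 + 2.0 * y1 - 0.5 * y2;
    let c3 = 0.5 * (y2 - ym1) + 1.5 * (y0 - y1);
    ((c3 * x + c2) * x + c1) * x + c0
}

/// 2x upsampling: keep every original sample and insert the x = 0.5 point after it.
/// Assumption: neighbors past either edge are replaced by the nearest edge sample.
fn upsample_2x_hermite_4p(samples: &[f64]) -> Vec<f64> {
    let last = samples.len() as isize - 1;
    let at = |i: isize| samples[i.clamp(0, last) as usize];
    let mut out = Vec::with_capacity(samples.len() * 2);
    for n in 0..samples.len() as isize {
        out.push(samples[n as usize]);
        out.push(interpolate_cubic_hermite_4p([at(n - 1), at(n), at(n + 1), at(n + 2)], 0.5));
    }
    out
}
```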

There’s also a 6-point version of cubic Hermite interpolation, which is this formula:

```python
def interpolate_cubic_hermite_6p(samples, n, x):
    [ym2, ym1, y0, y1, y2, y3] = samples[n-2:n+4]

    c0 = y0
    c1 = 1/12 * (ym2 - y2) + 2/3 * (y1 - ym1)
    c2 = 5/4 * ym1 - 7/3 * y0 + 5/3 * y1 - 1/2 * y2 + 1/12 * y3 - 1/6 * ym2
    c3 = 1/12 * (ym2 - y3) + 7/12 * (y2 - ym1) + 4/3 * (y0 - y1)

    return ((c3 * x + c2) * x + c1) * x + c0
```

The result:

[Figure: 2x upsampling via 6-point cubic Hermite interpolation]

That’s not a huge difference compared to 4-point cubic Hermite, but it is better!

For one more variant on Hermite interpolation, there’s a formula for 6-point 5th-order Hermite interpolation, i.e. 6-point quintic Hermite interpolation. This usually works a little better than 6-point cubic, though I’m not going to demonstrate it here because the difference is barely noticeable in this toy example of 2x upsampling samples of a sine wave. Even in real examples I often find it hard to notice an audible difference between 6-point cubic Hermite and 6-point quintic Hermite, though there’s definitely an audible difference between 4-point cubic and 6-point cubic.
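For reference, the 6-point quintic coefficients are only a little longer than the cubic ones. Below is a Rust transcription of the 6-point, 5th-order Hermite (x-form) entry from the linked paper. Treat it as a sketch. It has been checked to pass through y0 at x=0 and y1 at x=1 and to reproduce straight lines exactly, but it’s worth double-checking the coefficients against the paper before relying on it.

```rust
/// 6-point, 5th-order (quintic) Hermite interpolation between y0 and y1.
/// `x` is in [0, 1]. Coefficients transcribed from the deip.pdf paper linked above.
fn interpolate_quintic_hermite_6p([ym2, ym1, y0, y1, y2, y3]: [f64; 6], x: f64) -> f64 {
    let c0 = y0;
    let c1 = 1.0 / 12.0 * (ym2 - y2) + 2.0 / 3.0 * (y1 - ym1);
    let c2 = 13.0 / 12.0 * ym1 - 25.0 / 12.0 * y0 + 1.5 * y1 - 11.0 / 24.0 * y2
        + 1.0 / 12.0 * y3
        - 1.0 / 8.0 * ym2;
    let c3 = 5.0 / 12.0 * y0 - 7.0 / 12.0 * y1 + 7.0 / 24.0 * y2
        - 1.0 / 24.0 * (ym2 + ym1 + y3);
    let c4 = 1.0 / 8.0 * ym2 - 7.0 / 12.0 * ym1 + 13.0 / 12.0 * y0 - y1
        + 11.0 / 24.0 * y2
        - 1.0 / 12.0 * y3;
    let c5 = 1.0 / 24.0 * (y3 - ym2) + 5.0 / 24.0 * (ym1 - y2) + 5.0 / 12.0 * (y1 - y0);
    ((((c5 * x + c4) * x + c3) * x + c2) * x + c1) * x + c0
}
```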

Now, these Hermite interpolators are not perfect. A perfect resampler (or very close to perfect) will upsample by an integer factor N via zero stuffing, apply a low-pass filter, and then downsample by an integer factor M. However, this can be very computationally expensive, especially if N and M are very large and a steep low-pass transition is desired.
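Here’s a deliberately naive Rust sketch of the "upsample by N, low-pass, downsample by M" approach. The Hann-windowed sinc kernel, the tap count, and the function names are assumptions for illustration. The sketch also multiplies every stuffed zero, which a real polyphase implementation would skip, so it shows the idea rather than an efficient resampler.

```rust
use std::f64::consts::PI;

/// Hann-windowed sinc low-pass FIR kernel. `cutoff` is in cycles per sample of
/// the signal being filtered (0.5 = Nyquist); `taps` should be odd and >= 3.
fn lowpass_kernel(cutoff: f64, taps: usize) -> Vec<f64> {
    let mid = (taps / 2) as f64;
    (0..taps)
        .map(|i| {
            let t = i as f64 - mid;
            let sinc = if t == 0.0 { 2.0 * cutoff } else { (2.0 * PI * cutoff * t).sin() / (PI * t) };
            let hann = 0.5 - 0.5 * (2.0 * PI * i as f64 / (taps - 1) as f64).cos();
            sinc * hann
        })
        .collect()
}

/// Resample by the rational factor n/m: zero-stuff by n, low-pass, keep every m-th sample.
fn resample_rational(input: &[f64], n: usize, m: usize, taps: usize) -> Vec<f64> {
    // Zero stuffing: n - 1 zeros after every input sample
    let mut stuffed = vec![0.0; input.len() * n];
    for (i, &s) in input.iter().enumerate() {
        stuffed[i * n] = s;
    }
    // Cut off at the lower of the input and output Nyquist frequencies,
    // expressed relative to the upsampled rate
    let cutoff = 0.5 / n.max(m) as f64;
    // Gain of n compensates for the amplitude lost to zero stuffing
    let kernel: Vec<f64> = lowpass_kernel(cutoff, taps).iter().map(|h| h * n as f64).collect();
    let half = taps / 2;
    // Direct convolution, evaluated only at the output positions that are kept
    (0..stuffed.len())
        .step_by(m)
        .map(|center| {
            kernel
                .iter()
                .enumerate()
                .filter_map(|(k, &h)| {
                    let idx = (center + half).checked_sub(k)?;
                    stuffed.get(idx).map(|&x| x * h)
                })
                .sum::<f64>()
        })
        .collect()
}
```

With N=3 and M=2, for example, 64 input samples become 96 output samples. Away from the edges, a constant input stays very close to constant.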

A less perfect but much more efficient solution is to allow the downsampling factor M to be a non-integer and to apply some sort of interpolation from the upsampled/oversampled signal to the target frequency. This is the main topic of the paper I linked just above. It’s also essentially the core idea behind windowed sinc interpolation: oversample the source signal by a huge factor (e.g. 512x), apply an extremely high-order FIR low-pass filter, and then linearly interpolate between the low-pass filtered samples. Practical windowed sinc implementations are more complex than that. They avoid performing any intermediate calculations that are guaranteed not to affect the final result, and they lower the low-pass filter’s cutoff frequency when downsampling. Still, the core idea is the same: oversample, low-pass filter, and interpolate.
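The oversample/filter/interpolate idea can be sketched as a precomputed table holding one side of a windowed sinc kernel. The table is sampled at `oversample` points per input sample, and lookups interpolate linearly between table entries. This is a simplified illustration, not any particular library’s algorithm. It uses a Hann window (an assumption), handles only upsampling because it never lowers the cutoff, and skips all of the optimizations mentioned above.

```rust
use std::f64::consts::PI;

/// One side of a Hann-windowed sinc kernel, tabulated at `oversample` entries
/// per input sample and spanning `zero_crossings` input samples.
struct SincTable {
    table: Vec<f64>,
    oversample: usize,
    zero_crossings: usize,
}

impl SincTable {
    fn new(oversample: usize, zero_crossings: usize) -> Self {
        let len = oversample * zero_crossings + 1;
        let table = (0..len)
            .map(|i| {
                let t = i as f64 / oversample as f64; // distance in input samples
                let sinc = if i == 0 { 1.0 } else { (PI * t).sin() / (PI * t) };
                // Hann window: 1 at t = 0, falling to 0 at t = zero_crossings
                let w = 0.5 + 0.5 * (PI * t / zero_crossings as f64).cos();
                sinc * w
            })
            .collect();
        Self { table, oversample, zero_crossings }
    }

    /// Kernel value at distance `d` (in input samples), linearly interpolated
    /// between the two nearest table entries; zero outside the window.
    fn kernel(&self, d: f64) -> f64 {
        let pos = d.abs() * self.oversample as f64;
        let i = pos.floor() as usize;
        if i + 1 >= self.table.len() {
            return 0.0;
        }
        let frac = pos - i as f64;
        self.table[i] + frac * (self.table[i + 1] - self.table[i])
    }

    /// Band-limited value of `samples` at fractional position `t` (upsampling only).
    fn interpolate(&self, samples: &[f64], t: f64) -> f64 {
        let base = t.floor() as isize;
        let reach = self.zero_crossings as isize;
        (base - reach + 1..=base + reach)
            .filter(|&k| k >= 0 && (k as usize) < samples.len())
            .map(|k| samples[k as usize] * self.kernel(t - k as f64))
            .sum()
    }
}
```

At integer positions, the kernel is 1 at distance 0 and (almost exactly) 0 at every other integer distance, so the original samples pass through unchanged.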

However, even this type of resampling can still be somewhat demanding in terms of performance, especially since we’re talking about a sound chip with 8 channels that can all play simultaneously at different sample rates. Never mind something like the PS1 SPU with its 24 channels, or the PS2 SPU with a whopping 48 channels.

Polynomial Hermite interpolation is significantly less compute-intensive, even 6-point quintic interpolation. In practice it works very well for upsampling without introducing much audio aliasing. It doesn’t work quite as well for downsampling unless you apply a low-pass filter before interpolating. For the concrete use case of Sega CD PCM, though, this isn’t a big deal: games very rarely use sample rates higher than the chip’s output sample rate of 32552 Hz. The chip only has 64 KB of PCM RAM, so high sample rates use up sample memory quickly. Also, playing at sample rates higher than 32552 Hz produces noticeable audio aliasing even on actual hardware, because the chip effectively skips samples.

Supporting Interpolation

In order to apply any form of interpolation to the emulated RF5C164, we’ll need to keep per-channel buffers of recent samples read from PCM RAM instead of only storing the current sample. For 6-point interpolation we’ll need the 6 most recent samples.

We originally had this:

```rust
struct PcmChannel {
    current_sample: u8,
    ...
}
```

Let’s replace it with this:

```rust
struct PcmChannel {
    sample_buffer: [u8; 6],
    ...
}
```

Here’s the original code for clocking a channel, copied verbatim from the previous post:

```rust
// Current address is 16.11 fixed-point decimal
const INT_BITS: u32 = 16;
const FRACT_BITS: u32 = 11;
const TOTAL_BITS: u32 = INT_BITS + FRACT_BITS;

// 0xFF is a special sample value that means "loop end"
const LOOP_END: u8 = 0xFF;

impl PcmChannel {
    // 12.5 MHz / 384 clock
    fn clock(&mut self, pcm_ram: &[u8]) {
        // Check whether the channel is active
        if !self.enabled {
            // Current address is constantly reset while channels are disabled
            self.current_address = self.start_address << FRACT_BITS;
            self.current_sample = 0;
            return;
        }

        // Increment address
        self.current_address = (self.current_address + self.sample_rate) & ((1 << TOTAL_BITS) - 1);

        // Update current sample
        self.current_sample = pcm_ram[(self.current_address >> FRACT_BITS) as usize];
        if self.current_sample == LOOP_END {
            // Loop marker encountered; loop and immediately read the next sample
            self.current_address = self.loop_address << FRACT_BITS;
            self.current_sample = pcm_ram[self.loop_address as usize];
            if self.current_sample == LOOP_END {
                // Infinite loop; not sure what actual hardware does here, but let's play silence
                self.current_sample = 0;
            }
        }
    }
}
```

The big change needed is that instead of updating the current sample on every clock, the channel should push a new sample into the buffer every time it advances to a new PCM RAM address. If clocking the channel doesn’t cause it to reach a new PCM RAM address, it should do nothing except update the current address register.

Here’s an attempt at that:

```rust
impl PcmChannel {
    fn clock(&mut self, pcm_ram: &[u8]) {
        if !self.enabled {
            self.current_address = self.start_address << FRACT_BITS;
            self.sample_buffer.fill(0);
            return;
        }

        // Increment address, but store in a new variable instead of updating in-place
        let mut new_address = (self.current_address + self.sample_rate) & ((1 << TOTAL_BITS) - 1);

        // Check if channel advanced to a new PCM RAM address
        if (new_address >> FRACT_BITS) != (self.current_address >> FRACT_BITS) {
            let sample = pcm_ram[(new_address >> FRACT_BITS) as usize];
            if sample == LOOP_END {
                // Loop
                new_address = self.loop_address << FRACT_BITS;
                let loop_sample = pcm_ram[self.loop_address as usize];
                self.push_sample(if loop_sample != LOOP_END { loop_sample } else { 0 });
            } else {
                self.push_sample(sample);
            }
        }

        self.current_address = new_address;
    }

    fn push_sample(&mut self, sample: u8) {
        for i in 0..5 {
            self.sample_buffer[i] = self.sample_buffer[i + 1];
        }
        self.sample_buffer[5] = sample;
    }
}
```

Now, this isn’t quite a complete implementation. It won’t do the right thing for sample rates higher than 0x0800 / 32552 Hz because it will skip over samples instead of buffering the samples that actual hardware would skip. It’s also skipping the first sample after a channel changes from disabled to enabled. How to handle these cases is left as an exercise for the reader.
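One possible (and unverified against hardware) way to handle both cases is sketched below. It pushes every PCM RAM address crossed during a clock instead of only the last one, and it buffers the start address’s sample on the first clock after the channel is enabled. The struct is a hypothetical cut-down version with only the fields this sketch needs, using `u32` addresses. The `was_enabled` field is an addition that isn’t in the code above. The sketch keeps the earlier code’s behavior of dropping the rest of the increment when it hits a loop marker, and it treats a loop marker at the start address as silence for simplicity. It only feeds the interpolator the samples it would otherwise miss and does nothing about downsampling aliasing.

```rust
// Current address is 16.11 fixed-point, as in the code above
const FRACT_BITS: u32 = 11;
const TOTAL_BITS: u32 = 16 + FRACT_BITS;
const LOOP_END: u8 = 0xFF;

// Hypothetical cut-down channel with only the fields this sketch needs
struct PcmChannel {
    enabled: bool,
    was_enabled: bool, // added for this sketch
    start_address: u32,
    loop_address: u32,
    current_address: u32,
    sample_rate: u32,
    sample_buffer: [u8; 6],
}

impl PcmChannel {
    fn clock(&mut self, pcm_ram: &[u8]) {
        if !self.enabled {
            self.was_enabled = false;
            self.current_address = self.start_address << FRACT_BITS;
            self.sample_buffer.fill(0);
            return;
        }

        if !self.was_enabled {
            // First clock after enabling: buffer the start address's sample, which the
            // address-crossing logic below never reads (a loop marker here becomes silence)
            self.was_enabled = true;
            let first = pcm_ram[self.start_address as usize];
            self.push_sample(if first != LOOP_END { first } else { 0 });
        }

        let old_int = self.current_address >> FRACT_BITS;
        let mut new_address = (self.current_address + self.sample_rate) & ((1 << TOTAL_BITS) - 1);
        // Number of whole addresses crossed this clock; the 16-bit integer part can wrap
        let steps = (new_address >> FRACT_BITS).wrapping_sub(old_int) & 0xFFFF;

        // Buffer every crossed address instead of only the last one
        for i in 1..=steps {
            let sample = pcm_ram[((old_int + i) & 0xFFFF) as usize];
            if sample == LOOP_END {
                // Same loop handling as before; the rest of this clock's increment is dropped
                new_address = self.loop_address << FRACT_BITS;
                let loop_sample = pcm_ram[self.loop_address as usize];
                self.push_sample(if loop_sample != LOOP_END { loop_sample } else { 0 });
                break;
            }
            self.push_sample(sample);
        }

        self.current_address = new_address;
    }

    fn push_sample(&mut self, sample: u8) {
        // Shift out the oldest sample and append the newest at the end
        self.sample_buffer.copy_within(1.., 0);
        self.sample_buffer[5] = sample;
    }
}
```

With a sample rate of 0x1000 (2x), the first clock after enabling buffers the start sample plus the two addresses crossed. A loop marker hit partway through a clock jumps to the loop address as before.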

The actual interpolation can be applied here, or while mixing the channels. Either way, let’s stick this behind a new sample() method, and let’s make it return f64 instead of u8 so that the interpolation function won’t need to round to the nearest 8-bit integer:

```rust
// Convert a sign+magnitude RF5C164 sample to its actual PCM sample value in floating-point
fn sm_to_pcm(sm_sample: u8) -> f64 {
    let magnitude: f64 = (sm_sample & 0x7F).into();
    // Sign bit: 1=positive, 0=negative
    if sm_sample & 0x80 != 0 { magnitude } else { -magnitude }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PcmInterpolation {
    None,
    CubicHermite6Point,
}

impl PcmChannel {
    fn sample(&self, interpolation: PcmInterpolation) -> f64 {
        match interpolation {
            // For no interpolation, just return the most recent sample
            PcmInterpolation::None => sm_to_pcm(self.sample_buffer[5]),
            PcmInterpolation::CubicHermite6Point => self.sample_cubic_hermite_6p(),
        }
    }

    fn sample_cubic_hermite_6p(&self) -> f64 {
        todo!("cubic Hermite interpolation")
    }
}
```

The actual cubic Hermite interpolation is simply applying the formula up above after converting the 6 most recent samples to f64 PCM and determining what x value to use as the interpolation index.

Fortunately, we already have something we can use for x: the 11 fractional bits of the current address. This fraction represents how far the channel currently is between two consecutive PCM RAM addresses, which makes it perfect to use as the interpolation index x:

```rust
// Actual hardware always produces samples between -127 and +126
// (0xFF would be +127, but it is reserved as the loop marker)
// Clamp final result to this range
const MIN_SAMPLE: f64 = -127.0;
const MAX_SAMPLE: f64 = 126.0;

impl PcmChannel {
    fn sample_cubic_hermite_6p(&self) -> f64 {
        // Use current address fractional bits as the interpolation index
        let current_addr_fract_bits = self.current_address & ((1 << FRACT_BITS) - 1);
        let x = f64::from(current_addr_fract_bits) / f64::from(1 << FRACT_BITS);

        let last_6_samples = self.sample_buffer.map(sm_to_pcm);
        interpolate_cubic_hermite_6p(last_6_samples, x).clamp(MIN_SAMPLE, MAX_SAMPLE)
    }
}

fn interpolate_cubic_hermite_6p([ym2, ym1, y0, y1, y2, y3]: [f64; 6], x: f64) -> f64 {
    // Definition omitted: this is the 6-point cubic Hermite formula from above
    todo!()
}
```

interpolate_cubic_hermite_6p() is just an implementation of the formula from up above.
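For completeness, here is a direct Rust port of the 6-point cubic Hermite formula using the signature above. It’s a transcription of the Python version, not necessarily identical to the emulator’s own code.

```rust
/// 6-point, 3rd-order (cubic) Hermite interpolation between y0 and y1,
/// ported from the Python `interpolate_cubic_hermite_6p` above. `x` is in [0, 1).
fn interpolate_cubic_hermite_6p([ym2, ym1, y0, y1, y2, y3]: [f64; 6], x: f64) -> f64 {
    let c0 = y0;
    let c1 = 1.0 / 12.0 * (ym2 - y2) + 2.0 / 3.0 * (y1 - ym1);
    let c2 = 5.0 / 4.0 * ym1 - 7.0 / 3.0 * y0 + 5.0 / 3.0 * y1 - 0.5 * y2 + 1.0 / 12.0 * y3
        - 1.0 / 6.0 * ym2;
    let c3 = 1.0 / 12.0 * (ym2 - y3) + 7.0 / 12.0 * (y2 - ym1) + 4.0 / 3.0 * (y0 - y1);
    ((c3 * x + c2) * x + c1) * x + c0
}
```

Because the buffer holds the 6 most recent samples oldest-first, `y0` and `y1` are `sample_buffer[2]` and `sample_buffer[3]`. This is where the 3-sample delay mentioned below comes from.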

The mixing/volume code will need to be updated to handle these floating-point samples. I apply the volume multipliers using floating-point math, round to the nearest integer, and then do the remaining mixing calculations using integer math. The exact details of how to do this are also left as an exercise for the reader.
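As a rough sketch of that split, assume each channel’s volume settings have already been converted into plain left/right floating-point multipliers. The `ChannelOutput` struct and its gain fields are hypothetical, and the chip’s actual envelope/pan math is not shown. The multiply happens in floating point, each product is rounded, and the integer sum is saturated to signed 16-bit.

```rust
/// Hypothetical per-channel output: an interpolated sample plus left/right
/// volume multipliers derived from the channel's volume registers.
struct ChannelOutput {
    sample: f64,
    left_gain: f64,
    right_gain: f64,
}

/// Apply volume in floating point, round each product to an integer,
/// then mix with integer math and saturate to signed 16-bit.
fn mix(channels: &[ChannelOutput]) -> (i16, i16) {
    let (mut left, mut right) = (0i32, 0i32);
    for ch in channels {
        left += (ch.sample * ch.left_gain).round() as i32;
        right += (ch.sample * ch.right_gain).round() as i32;
    }
    let clamp16 = |s: i32| s.clamp(i16::MIN.into(), i16::MAX.into()) as i16;
    (clamp16(left), clamp16(right))
}
```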

That’s pretty much it, other than supporting other types of interpolation if desired. With this structure it only requires adding a new variant to the PcmInterpolation enum and then adding another match arm to the sample() method.

Note that 6-point interpolation introduces an audio delay of 3 samples at the source sample rate. However, this is well under a millisecond of latency even at a very low sample rate of 10000 Hz (3 / 10000 Hz = 0.3 ms), so it’s not noticeable in practice.

Time for a comparison!

Trying It Out

Yes, it’s the Lunar 2 boss theme again.

[Figure: Lunar 2’s first boss fight]

Here are four recordings: the original horribly aliased version, the 7973 Hz low-pass filtered version from the last post, an unfiltered version using 6-point cubic Hermite interpolation, and a version that combines 6-point cubic Hermite interpolation with the low-pass filter.

No interpolation, unfiltered
No interpolation, second-order 7973 Hz LPF
6-point cubic Hermite interpolation, unfiltered
6-point cubic Hermite interpolation, second-order 7973 Hz LPF

Hmm…the unfiltered cubic-interpolated version does sound a little clearer than the filtered non-interpolated version, maybe? It’s honestly hard to tell a difference. Was this all a waste of time?

High Sample Rate Voice Acting

Let’s try another game. Here’s Popful Mail.

[Figure: Popful Mail. Why yes, this is a Working Designs localization]

During this scene, Popful Mail plays its voice acting through the PCM chip with a sample rate of 0x0700. In 5.11 fixed-point, 0x0700 represents 7/8. With a 32552 Hz base rate, that’s an effective sample rate of about 28483 Hz, which is pretty high by Sega CD standards!
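The register-to-Hz conversion is just the 5.11 fixed-point step multiplied by the base rate. A small Rust helper reproduces the rates quoted for the games in this post. The helper’s name and the rounded 32552 Hz constant are assumptions for illustration.

```rust
/// Native output rate used throughout this post (12.5 MHz / 384, rounded).
const BASE_RATE_HZ: f64 = 32552.0;

/// Convert a channel's 16-bit sample rate register (a 5.11 fixed-point address
/// step per output sample) to an effective sample rate in Hz.
fn effective_sample_rate(reg: u16) -> f64 {
    BASE_RATE_HZ * f64::from(reg) / f64::from(1u16 << 11)
}
```

0x0800 is exactly 1.0 (32552 Hz), 0x0700 gives 28483 Hz, 0x04A5 gives about 18899 Hz, and 0x0310 gives about 12461 Hz.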

It’s also playing music using the YM2612 and other PCM chip channels during this scene. I’m going to silence those in order to isolate the voice acting.

Here’s the raw audio with no interpolation or filtering:

No interpolation, unfiltered

That does not sound good, but that was expected.

Here’s with the low-pass filter:

No interpolation, second-order 7973 Hz LPF

That sounds…a little better, but still very noisy and aliased. It’s definitely not as radical an improvement as it was for Lunar 2’s music.

Alright, let’s try cubic interpolation:

6-point cubic Hermite interpolation, unfiltered

That sounds significantly better to me! There’s still noticeable audio noise, but well, 8-bit sample quantization will do that.

For completeness, here’s cubic interpolation combined with the low-pass filter:

6-point cubic Hermite interpolation, second-order 7973 Hz LPF

Ehh…it cleans up some of the audio noise, but that comes at the cost of making the voice acting less clear. I think I prefer the unfiltered + cubic-interpolated version.

Anyway, as this example shows, there are definitely cases where interpolation produces much better results than simply applying a low-pass filter! Lunar 2 happens to be a particularly good case for the low-pass filter because it usually plays at an exact integer divisor of the PCM chip’s native sample rate.

Not-As-High Sample Rate Voice Acting

I want to revisit one of the comparisons from the previous post: voice acting from Night Trap.

[Figure: Night Trap]

I didn’t mention this in the last post, but Night Trap plays its cutscene audio with a sample rate of 0x04A5, which represents 1189/2048 (effective sample rate ~18899 Hz). This is definitely not an integer divisor of the PCM chip’s native sample rate.

Here are the two recordings from the last post again:

No interpolation, unfiltered
No interpolation, second-order 7973 Hz LPF

Now here’s a cubic-interpolated version:

6-point cubic Hermite interpolation, unfiltered

This isn’t as drastic a difference as the Popful Mail example, but I do think the cubic-interpolated version sounds noticeably less aliased than the low-pass filtered version.

Here’s the interpolated + filtered version:

6-point cubic Hermite interpolation, second-order 7973 Hz LPF

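I think this one actually sounds the best here. It cleans up more noise than interpolation alone, and unlike the Popful Mail example it doesn’t make the voice acting much less clear. This is more or less expected given the lower source sample rate. At ~18899 Hz, the source can only contain frequencies up to about 9450 Hz, so a 7973 Hz low-pass filter removes much less of the actual voice than it does from Popful Mail’s ~28483 Hz audio.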

Low Sample Rate Music

For one last example, here’s the intro music from Time Gal, another FMV game.

[Figure: Time Gal intro]

That looks bad in a screenshot, and it looks just as bad in motion where it’s animating at an average of about 6 FPS.

Anyway. This game does some weird things with the PCM chip, but the sample rate here is about 0x0310, which represents 49/128. This is an effective sample rate of ~12461 Hz, quite low.

I’m applying some negative gain in all four of these recordings because this game’s audio is very loud.

No interpolation, unfiltered
No interpolation, second-order 7973 Hz LPF
6-point cubic Hermite interpolation, unfiltered
6-point cubic Hermite interpolation, second-order 7973 Hz LPF

The unfiltered + non-interpolated version sounds bad, unsurprisingly, but the other three are interesting to compare. Cubic interpolation removes most of the resampling-induced audio aliasing, and that makes the music sound pretty different.

The unfiltered + interpolated version has some really noticeable audio noise that seems to be caused by clipping - the final mixed PCM chip sample often exceeds the range of signed 16-bit. This happens even without interpolating, but it’s not really noticeable in the non-interpolated version because it blends in with all the other noise and aliasing. The low-pass filter does a decent job at removing most of this noise.

End

In the context of Sega CD PCM, cubic or quintic Hermite interpolation doesn’t always produce a better result than a 7.97 kHz low-pass filter, but when it is better, it can be significantly better.

Like with SNES and PS1, it can change the sound enough in some cases that it’s probably not a reasonable default, but I think it’s a nice optional audio enhancement to support.


Updated 2025-02-14