PlayStation: The SPU, Part 4 - Everything Else

This is the fourth and final post in a series on the PlayStation SPU which will cover the remaining major features that were not covered in previous posts. This includes the noise generator (pseudorandom white noise), pitch modulation (dynamic pitch adjustment using another voice’s output), SPU IRQs (trigger IRQ when a specific sound RAM address is accessed), and the capture buffers (record recent samples from CD audio and two specific voices).

Noise Generation

The SPU has a built-in noise generator that can generate pseudorandom white noise. The noise generator cannot directly mix into the audio output - instead, software can configure any of the 24 voices to play the current noise generator output instead of playing decoded ADPCM samples. The voice’s ADSR envelope volume and voice L/R volumes are applied to the noise generator sample just as they would be applied to an ADPCM sample.

Here’s an example of a sound effect that is broken if you don’t emulate the noise generator:

This is what that’s supposed to sound like:

The following two registers are used to configure which voices play noise generator output instead of ADPCM samples:

  • $1F801D94: Noise output enabled for voices 0-15
  • $1F801D96: Noise output enabled for voices 16-23
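A minimal sketch of how the two register values could be merged into one 24-bit mask. It assumes that bit N of the first register maps to voice N and bit N of the second register maps to voice 16+N; the `VoiceMask` type and its method names are hypothetical and only for illustration.

```rust
/// Hypothetical helper: one bit per voice, assembled from two 16-bit registers.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
struct VoiceMask(u32);

impl VoiceMask {
    const VOICES: u32 = 24;

    /// Write to the register covering voices 0-15 (e.g. $1F801D94)
    fn write_low(&mut self, value: u16) {
        self.0 = (self.0 & 0xFFFF_0000) | u32::from(value);
    }

    /// Write to the register covering voices 16-23 (e.g. $1F801D96).
    /// Only the lowest 8 bits correspond to voices, so the rest are dropped here.
    fn write_high(&mut self, value: u16) {
        self.0 = (self.0 & 0x0000_FFFF) | (u32::from(value & 0x00FF) << 16);
    }

    /// Whether the bit for the given voice (0-23) is set
    fn is_set(self, voice: u32) -> bool {
        voice < Self::VOICES && (self.0 >> voice) & 1 != 0
    }
}
```

With a mask like this, the per-voice check reduces to `mask.is_set(voice_index)` when deciding whether to output the LFSR value or a decoded ADPCM sample.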

The noise generator itself is configured using bits in the SPUCNT register. Bits 8-9 configure the step (fine frequency) and bits 10-13 configure the shift (coarse frequency).
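As a small sketch, the two fields can be pulled out of a SPUCNT write with shifts and masks. The function name is made up for this example; the bit positions are the ones given above.

```rust
/// Extracts (step, shift) from a 16-bit SPUCNT value.
/// `noise_frequency_bits` is a hypothetical name used only for this sketch.
fn noise_frequency_bits(spucnt: u16) -> (u8, u8) {
    let step = ((spucnt >> 8) & 0x3) as u8; // bits 8-9: fine frequency
    let shift = ((spucnt >> 10) & 0xF) as u8; // bits 10-13: coarse frequency
    (step, shift)
}
```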

Internally the noise generator has a 16-bit linear-feedback shift register, or LFSR, which is a pretty typical way to implement pseudorandom noise without using a hardware random number generator (which the PS1 does not have). Each time the noise generator clocks, the LFSR is shifted left by 1 and a parity bit is shifted into the lowest bit. For voices that have noise output enabled, the current 16-bit LFSR value is reinterpreted as a signed 16-bit audio sample and used as the voice output.

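The PS1 SPU’s LFSR computes the parity bit by XORing multiple specific bits of the current LFSR value (strictly speaking this is the Fibonacci style of LFSR rather than the Galois style, which instead XORs the output bit into several positions). This implementation XORs bits 10, 11, 12, and 15 and then inverts the result.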

Implementing this in code is very straightforward:

struct NoiseGenerator {
    lfsr: u16,
}

impl NoiseGenerator {
    fn clock_lfsr(&mut self) {
        // XOR bits 10, 11, 12, and 15, and then XOR with 1 to invert the result
        let parity = ((self.lfsr >> 15) & 1)
            ^ ((self.lfsr >> 12) & 1)
            ^ ((self.lfsr >> 11) & 1)
            ^ ((self.lfsr >> 10) & 1)
            ^ 1;

        self.lfsr = (self.lfsr << 1) | parity;
    }
}

Noise frequency can be emulated using a downcounter timer. The timer is initialized to 0x20000 >> noise_shift, and on each SPU clock the timer is decremented by the current noise step value (between 4 and 7). Whenever the timer drops below 0, the LFSR clocks and the timer is reloaded by adding 0x20000 >> noise_shift back to it.
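To get a feel for what the step and shift values mean, here is a rough sketch that estimates the steady-state LFSR clock rate for a given configuration. It assumes the timer behaves exactly as described above and that the LFSR clocks at most once per 44100 Hz SPU clock; `approx_lfsr_rate_hz` is a hypothetical helper, not something an emulator needs.

```rust
const SPU_CLOCK_HZ: f64 = 44_100.0;

/// Approximate number of LFSR clocks per second for raw 2-bit step and 4-bit shift values.
fn approx_lfsr_rate_hz(step: u8, shift: u8) -> f64 {
    // Raw step N means the timer is decremented by N + 4 on every SPU clock
    let decrement = f64::from((step & 0x3) + 4);
    // Amount added back to the timer each time the LFSR clocks
    let reload = f64::from(0x20000_u32 >> (shift & 0xF));
    // On average the timer underflows once every reload / decrement SPU clocks,
    // but never more than once per SPU clock
    (SPU_CLOCK_HZ * decrement / reload).min(SPU_CLOCK_HZ)
}
```

Under these assumptions the lowest setting (step 0, shift 0) clocks the LFSR a little more than once per second, while the highest shift values clock it on every SPU clock.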

As with the ADSR envelope cycle counter, applying the psx-spx pseudocode as-is will work for some things but not everything. In particular, some games seem to expect changes to noise shift to take effect roughly immediately. Final Fantasy VII has some broken sound effects if you don’t do this.

I’m not sure this is correct, but you can fix this by simply recomputing the current timer value whenever the noise frequency is changed:

struct NoiseGenerator {
    lfsr: u16,
    step: u8,
    shift: u8,
    timer: i32,
}

impl NoiseGenerator {
    // 44100 Hz SPU clock
    fn clock(&mut self) {
        // Configured 2-bit step value of N represents a decrement of (N + 4)
        self.timer -= i32::from(self.step + 4);
        if self.timer >= 0 {
            // LFSR does not clock this cycle
            return;
        }

        self.clock_lfsr();

        // Reset timer
        // Might require 2 increments if step and shift are both the maximum possible values
        while self.timer < 0 {
            self.timer += 0x20000 >> self.shift;
        }
    }

    // Called on SPUCNT writes
    fn update_frequency(&mut self, step: u8, shift: u8) {
        if shift != self.shift {
            // Coarse frequency changed; reset timer
            self.timer = 0x20000 >> shift;
        }

        self.step = step;
        self.shift = shift;
    }
}

Now, just implementing noise generation by itself isn’t actually enough to fix the FF7 sound effect above. That’s because FF7 is also using pitch modulation here.

Pitch Modulation

The SPU’s pitch modulation feature allows voices to dynamically adjust their pitches based on the previous voice’s output. It’s similar to the frequency modulation that you see in FM sound chips like the YM2612 in the Sega Genesis, although much more limited (and also much much simpler to emulate).

Final Fantasy VII combines noise generation with pitch modulation for Cloud’s sword sound effect. It enables pitch modulation on voice 21 while enabling noise output on voice 20, which causes voice 21 to pitch modulate using the noise generator output. Interestingly, it doesn’t mute voice 20 while doing this, so the sound effect is the result of mixing the noise generator output with ADPCM playback that is modulated by the same noise output.

If pitch modulation is enabled for a voice, it always uses the previous voice’s output for pitch modulation (e.g. voice 21 always pitch modulates using voice 20’s output). Pitch modulation cannot be enabled for voice 0 because it has no previous voice.

Pitch modulation is configured using two registers:

  • $1F801D90: Pitch modulation enabled for voices 1-15 (not voice 0)
  • $1F801D92: Pitch modulation enabled for voices 16-23

Pitch modulation works by simply adjusting the pitch counter increment using the previous voice’s sample from after ADSR envelope volume is applied but before voice L/R volume is applied. This fact is useful when a voice’s output should be used only for pitch modulation - software can set the voice L/R volumes to 0, which will mute the voice output but still allow it to be used for pitch modulation.
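The ordering matters, so here is a simplified sketch of where the modulation source is tapped. `SimpleVoice`, `VoiceOutput`, and `apply_volume` are hypothetical names, and the fixed-point volume math is an assumption made for illustration rather than a description of the exact hardware behavior.

```rust
/// Hypothetical, heavily simplified voice state.
struct SimpleVoice {
    raw_sample: i16,     // decoded ADPCM sample or noise generator output
    envelope_level: i16, // current ADSR level, 0..=0x7FFF
    volume_l: i16,
    volume_r: i16,
}

struct VoiceOutput {
    modulator: i16, // post-envelope, pre-L/R: what the next voice modulates with
    left: i16,
    right: i16,
}

/// Assumed fixed-point multiply where 0x8000 would represent 1.0
fn apply_volume(sample: i16, volume: i16) -> i16 {
    let scaled = (i32::from(sample) * i32::from(volume)) >> 15;
    scaled.clamp(-0x8000, 0x7FFF) as i16
}

fn voice_output(voice: &SimpleVoice) -> VoiceOutput {
    // Envelope first; this value is saved for the next voice's pitch modulation
    let modulator = apply_volume(voice.raw_sample, voice.envelope_level);
    VoiceOutput {
        modulator,
        // L/R volumes only affect what gets mixed into the audio output
        left: apply_volume(modulator, voice.volume_l),
        right: apply_volume(modulator, voice.volume_r),
    }
}
```

With both L/R volumes set to 0, `left` and `right` are silent while `modulator` still carries the voice’s signal.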

Concretely, the previous voice output is converted from the i16 range (-0x8000 to 0x7FFF) to the u16 range (0x0000 to 0xFFFF) and then used as a multiplier where the value N represents N/0x8000. This makes the range of effective multipliers 0 to ~1.99997 (basically 2).
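As a quick worked example of that mapping, the sketch below converts a previous-voice sample into the effective multiplier as a floating-point number. This is for intuition only (`modulation_factor` is a hypothetical helper); the actual computation is done in integer math.

```rust
/// Effective pitch multiplier for a given previous-voice output sample.
fn modulation_factor(previous_voice_output: i16) -> f64 {
    // Shift the i16 range (-0x8000..=0x7FFF) up to the u16 range (0x0000..=0xFFFF)
    let n = i32::from(previous_voice_output) + 0x8000;
    // N represents N/0x8000
    f64::from(n) / 32768.0
}
```

A sample of -0x8000 maps to a multiplier of 0, a sample of 0 maps to exactly 1 (no pitch change), and a sample of 0x7FFF maps to 0xFFFF/0x8000, just under 2.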

Ignoring an edge case that triggers when the voice’s sample rate is higher than 0x7FFF, pitch modulation is very simple to emulate. Referencing the psx-spx pseudocode and using the Voice struct from the previous post:

use std::cmp;

struct Voice {
    sample_rate: u16,
    pitch_modulation_enabled: bool,
    ...
}

impl Voice {
    fn clock(&mut self, sound_ram: &[u8], previous_voice_output: i16) {
        let mut pitch_counter_step = self.sample_rate;
        if self.pitch_modulation_enabled {
            // Convert previous voice output from i16 range to u16 range
            let multiplier = i32::from(previous_voice_output) + 0x8000;

            // Apply previous voice output as a multiplier to step, N/0x8000
            let adjusted_step = (i32::from(pitch_counter_step) * multiplier) >> 15;
            pitch_counter_step = adjusted_step as u16;
        }

        // Even with pitch modulation, increment cannot exceed 0x4000
        pitch_counter_step = cmp::min(0x4000, pitch_counter_step);

        ...
    }
}

The aforementioned edge case is that the voice’s sample rate is erroneously sign extended from 16 bits to 32 bits when performing the multiplication, and then the value is truncated to the lowest 16 bits after the multiplication. Games shouldn’t trigger this behavior because there’s no reason to set the sample rate higher than 0x4000, but it’s trivial to handle by casting the sample rate to i16 before converting to i32.
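A sketch of the step calculation with that edge case handled, written as a free function so it can be tried in isolation. `modulated_step` is a hypothetical name; the logic follows the description above (sign extend, multiply, arithmetic shift right by 15, truncate to 16 bits, clamp).

```rust
/// Pitch counter step for a voice with pitch modulation enabled.
fn modulated_step(sample_rate: u16, previous_voice_output: i16) -> u16 {
    // Convert previous voice output from i16 range to u16 range
    let factor = i32::from(previous_voice_output) + 0x8000;

    // Reinterpret as i16 first so that the conversion to i32 sign extends
    let rate = i32::from(sample_rate as i16);

    // Arithmetic shift, then keep only the lowest 16 bits
    let step = ((rate * factor) >> 15) as u16;

    // Increment still cannot exceed 0x4000
    step.min(0x4000)
}
```

For sample rates of 0x7FFF and below this matches the simpler version. For example, a sample rate of 0x8000 with a previous voice output of 0x7FFF produces a step of 1 here, whereas the version without sign extension would produce 0x4000.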

Emulating both noise generation and pitch modulation fixes the Final Fantasy VII sound effect.

SPU IRQs

The SPU has an interrupt feature where it can trigger an interrupt whenever a specific sound RAM address is accessed. SPU IRQs can be triggered by ADPCM decoding, reverb processing, capture buffer writes (next section), or even SPU data port accesses.

Many games use SPU IRQs to sync animations to voice acting, usually by putting the IRQ address in the capture buffer range ($00000-$00FFF). Some games use SPU IRQs for audio timing.

Since SPU IRQs are always triggered by sound RAM memory accesses, I implemented SPU IRQs by (ab)using operator overloading to make the IRQ trigger if SPU IRQs are enabled and any part of the SPU accesses the configured IRQ address, roughly like this:

use std::cell::Cell;
use std::ops::{Index, IndexMut};

struct SoundRam {
    ram: Box<[u8]>,
    irq_enabled: bool,
    irq_address: usize,
    // Must use Cell so that the value can be mutated through a shared reference
    irq: Cell<bool>,
}

impl Index<usize> for SoundRam {
    type Output = u8;

    fn index(&self, index: usize) -> &Self::Output {
        if self.irq_enabled && index == self.irq_address {
            self.irq.set(true);
        }

        &self.ram[index]
    }
}

impl IndexMut<usize> for SoundRam {
    fn index_mut(&mut self, index: usize) -> &mut Self::Output {
        if self.irq_enabled && index == self.irq_address {
            self.irq.set(true);
        }

        &mut self.ram[index]
    }
}

This is a bit of a hack, but it seems to work pretty well! All of the SPU code can access sound RAM as if it’s a regular boxed slice, and the IRQ flag will automatically get set if anything accesses the configured IRQ address (and IRQs are enabled).

The IRQ9 bit in I_STAT is edge-triggered, so on each SPU clock I do basically the following:

struct Spu {
    last_irq_line: bool,
    sound_ram: SoundRam,
    ...
}

impl Spu {
    fn clock(&mut self) {
        ...

        let irq_line = self.sound_ram.irq.get();
        if !self.last_irq_line && irq_line {
            // Set IRQ9 bit in I_STAT here
        }
        self.last_irq_line = irq_line;
    }
}

Games acknowledge SPU IRQs by disabling them in the SPUCNT register, which should immediately clear the internal SPU IRQ flag (but not IRQ9, that needs to be acknowledged in I_STAT as with any other interrupt).
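A minimal sketch of that acknowledge path, assuming the IRQ enable flag is bit 6 of SPUCNT (as documented in psx-spx). `IrqState` is a hypothetical, cut-down stand-in for the relevant fields of the sound RAM wrapper.

```rust
use std::cell::Cell;

/// Hypothetical subset of the SPU IRQ state.
struct IrqState {
    irq_enabled: bool,
    irq: Cell<bool>,
}

impl IrqState {
    // Assumed position of the IRQ enable bit in SPUCNT
    const SPUCNT_IRQ_ENABLE: u16 = 1 << 6;

    // Called on SPUCNT writes
    fn write_spucnt(&mut self, value: u16) {
        self.irq_enabled = value & Self::SPUCNT_IRQ_ENABLE != 0;
        if !self.irq_enabled {
            // Disabling SPU IRQs acknowledges the internal flag immediately.
            // The IRQ9 bit in I_STAT is untouched; that is acknowledged separately.
            self.irq.set(false);
        }
    }
}
```

Because the flag drops back to false here, the edge detection in the SPU clock function can fire again the next time the IRQ address is accessed after the game re-enables SPU IRQs.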

One thing to note is that keyed off voices can still trigger SPU IRQs by decoding ADPCM samples at the IRQ address even though the voice is muted. Per DuckStation’s difficult-to-emulate wiki page, there is at least one game that depends on this, which unfortunately means that keyed off voices with volume=0 still need to be clocked even though their audio output will always be 0. (Unfortunate because of the performance implications.)

Capture Buffers

The SPU has four 1KB capture buffers that are stored in the first 4KB of sound RAM. Two of them are for CD audio samples (L and R) and the other two are for voices 1 and 3:

  • $00000-$003FF: CD L capture buffer
  • $00400-$007FF: CD R capture buffer
  • $00800-$00BFF: Voice 1 capture buffer
  • $00C00-$00FFF: Voice 3 capture buffer

The captured CD audio samples are from before CD volume is applied, and the captured voice samples are from after ADSR envelope volume is applied but before voice L/R volumes are applied.

Each capture buffer is circular - after the SPU reaches the end, it wraps back around to the start and begins overwriting previously captured samples. Captured samples are signed 16-bit so each capture buffer holds 512 samples.

One sample is written to each of the four capture buffers on every 44100 Hz SPU clock. If no CD audio is playing then zeroes are written to the CD capture buffers.

This is quite simple to emulate using a 10-bit capture buffer pointer that increments by 2 after writing to the four buffers, wrapping around to $000 after $3FE. There is a bit in SPUSTAT that indicates whether the SPU is currently writing to the first halves of the capture buffers or the second halves; this is easy to emulate by simply checking whether or not the current capture buffer address is less than $200.
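Roughly, that could look like the sketch below. It writes directly into a plain byte slice and assumes samples are stored little-endian; `CaptureBuffers` and its method names are hypothetical.

```rust
/// Hypothetical capture buffer state: a single shared offset into all four buffers.
struct CaptureBuffers {
    // Byte offset within each 1KB buffer; always even, 0x000..=0x3FE
    offset: usize,
}

impl CaptureBuffers {
    // Called once per 44100 Hz SPU clock
    fn write_samples(&mut self, sound_ram: &mut [u8], cd_l: i16, cd_r: i16, voice1: i16, voice3: i16) {
        // Base addresses of the CD L, CD R, voice 1, and voice 3 buffers
        for (base, sample) in [(0x000, cd_l), (0x400, cd_r), (0x800, voice1), (0xC00, voice3)] {
            let addr = base + self.offset;
            // Assumption: 16-bit samples are stored little-endian
            sound_ram[addr..addr + 2].copy_from_slice(&sample.to_le_bytes());
        }

        // Advance by one 16-bit sample, wrapping within the 10-bit range
        self.offset = (self.offset + 2) & 0x3FF;
    }

    // Value for the SPUSTAT first half / second half bit
    fn writing_second_half(&self) -> bool {
        self.offset >= 0x200
    }
}
```

In a real implementation these writes would go through whatever sound RAM wrapper handles IRQ checks, since capture buffer writes can trigger SPU IRQs.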

And, well, that’s pretty much all there is to it! Games typically use the capture buffers in combination with SPU IRQs for lip sync and other animations that need to be synchronized to audio playback. Crash Team Racing is one example of a game that does this.

Finally Over

And that’s it for this tour of the PlayStation SPU. There are some quirks that I didn’t cover (mainly because I discovered them only recently), such as a repeat/loop address quirk that Valkyrie Profile depends on: if a game ever writes to a voice’s repeat address register, that completely disables ADPCM loop start flags from having any effect until the voice is keyed on again. The Misadventures of Tron Bonne also depends on this quirk or it will freeze during cutscenes.
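For what it’s worth, here is one way the repeat address quirk described above could be modeled. This is only a sketch based on that description, and `LoopState` and its methods are hypothetical names.

```rust
/// Hypothetical per-voice loop state.
struct LoopState {
    repeat_address: u32,
    // Set when software writes the repeat address register; cleared on key on
    repeat_address_written: bool,
}

impl LoopState {
    fn key_on(&mut self) {
        // Keying the voice on lets loop start flags take effect again
        self.repeat_address_written = false;
    }

    // Called when software writes to the voice's repeat address register
    fn write_repeat_address(&mut self, address: u32) {
        self.repeat_address = address;
        self.repeat_address_written = true;
    }

    // Called when the voice decodes an ADPCM block with the loop start flag set
    fn on_loop_start_flag(&mut self, block_address: u32) {
        if !self.repeat_address_written {
            self.repeat_address = block_address;
        }
    }
}
```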

Not sure what I’ll cover next! For the last few weeks I’ve been working on a hardware rasterizer which is finally starting to come together, although it doesn’t yet support all of the enhancements that people expect from modern PS1 emulators (e.g. sub-pixel vertex precision to eliminate model wobble in most games). I don’t think I’m ready to write about that yet though.

Updated 2024-05-28