This is the third part in a series of posts on the PlayStation SPU (Sound Processing Unit). This post will focus on the SPU’s reverb feature, which can simulate echoes or reverberations. In short, it’s a much more advanced version of the SNES APU’s echo filter.
Reverb
In audio, reverb refers to how sound waves reverberate and reflect after the initial sound. An echo is one specific type of reverb.
In real life, the physical environment determines how sound waves reverberate. For example, inside a house, carpet tends to absorb sounds while wood floors tend to cause a slight echo as sound waves bounce off the floors. (Though most homes have furniture, plants, etc. to help prevent sounds from bouncing all over the place.)
In digital audio, post-processing can apply reverb effects that simulate different physical environments: being in an open field, being in a large building where sounds bounce off the walls, being underwater, etc. The basic idea is to create these different atmospheres by applying different filters to the original audio and then playing the filtered audio after a short delay.
At a very high level, the PS1 SPU’s reverb feature works by recording audio samples into fixed-size FIFO buffers, applying a bunch of audio filters, and then playing the filtered samples as they’re popped out of the buffers. The buffer sizes determine the delay amount. As a sort of implementation detail, the individual buffers are all stored in a single circular buffer called the reverb buffer, or “reverb work area” in some documentation.
How does the SPU fit all of these FIFO buffers into a single circular buffer? The answer is actually pretty simple - all FIFO buffer addresses are relative to the current circular buffer head rather than being absolute addresses, and all FIFO buffers wrap within the circular buffer. Advancing the circular buffer head by 1 sample automatically moves every FIFO buffer forward by 1 sample, effectively popping off the last sample in each FIFO buffer.
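As a minimal sketch of that relative addressing, assuming byte addresses and a 512 KiB sound RAM (the function and parameter names here are made up for illustration, not taken from any documentation):

```rust
/// Total sound RAM size in bytes (512 KiB), so the last byte is at $7FFFF.
const SOUND_RAM_LEN: u32 = 0x80000;

/// Turn an address that is relative to the current circular buffer head
/// into an absolute sound RAM address, wrapping inside the reverb buffer.
///
/// Assumes `buffer_start <= current < SOUND_RAM_LEN`.
fn resolve_reverb_addr(buffer_start: u32, current: u32, relative: u32) -> u32 {
    // The reverb buffer always runs from its start address to the end of sound RAM
    let buffer_len = SOUND_RAM_LEN - buffer_start;
    // Work with offsets from the buffer start so the wrap is a plain modulo
    let offset = (current - buffer_start + relative) % buffer_len;
    buffer_start + offset
}
```

Because every FIFO address goes through the same calculation, moving `current` forward by one sample shifts every FIFO buffer at once.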
The circular reverb buffer must always be located at the end of sound RAM. Software can configure the reverb buffer start address, but the buffer always extends from the configured start address to the end of sound RAM. This was possibly to make things easier for developers in comparison to the SNES APU’s echo filter, where it was all too easy to accidentally configure the filter such that it would start overwriting ADPCM samples - or even the audio driver!
There is a bit in the SPUCNT register ($1F801DAA) that controls whether reverb writes are enabled, but note that clearing this bit does not completely disable the reverb unit - it only disables the writes to sound RAM that would normally occur as part of reverb processing. Software must explicitly set reverb output volume to 0 in order to mute reverb output.
Input
There are four registers that control reverb input:
- $1F801D98: Reverb enabled for voices 0-15
- $1F801D9A: Reverb enabled for voices 16-23
- $1F801DFC: Reverb input volume left
- $1F801DFE: Reverb input volume right
There is also a bit in the SPUCNT register that controls whether reverb is enabled for CD audio.
Software can control which of the 24 voices are used as input to the reverb unit, which mixes its own input samples by summing the L/R samples from every voice that is enabled in $1F801D98 or $1F801D9A. Note that the reverb unit uses the voice samples from after the voice L/R volumes are applied but before the main L/R volumes are applied.
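A rough sketch of that input mixing, assuming the two enable registers have already been combined into one 24-bit mask and that the sum is clamped to signed 16-bit (the clamp is my assumption, and the names are illustrative):

```rust
/// Sum the (left, right) samples of every voice that has reverb enabled.
///
/// `voices` holds each voice's output after its own L/R volume was applied.
/// Bit N of `reverb_enable` corresponds to voice N: bits 0-15 come from
/// $1F801D98 and bits 16-23 come from the low 8 bits of $1F801D9A.
fn mix_reverb_input(voices: &[(i16, i16); 24], reverb_enable: u32) -> (i32, i32) {
    let mut left = 0;
    let mut right = 0;
    for (i, &(l, r)) in voices.iter().enumerate() {
        if reverb_enable & (1 << i) != 0 {
            left += i32::from(l);
            right += i32::from(r);
        }
    }
    // Assumption: saturate the mixed input to the signed 16-bit range
    (left.clamp(-0x8000, 0x7FFF), right.clamp(-0x8000, 0x7FFF))
}
```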
Input volume can be used to decrease the volume of samples as they enter the reverb unit. All reverb volumes are signed 16-bit integers that represent N/0x8000, same as the other volume multipliers in the SPU. However, input volume is probably not applied until after the reverb input is resampled from 44100 Hz to 22050 Hz.
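The N/0x8000 multiplier can be sketched as a multiply followed by an arithmetic right shift by 15. The helper name is hypothetical, and rounding toward negative infinity is simply what the shift does here, not something confirmed about the hardware:

```rust
/// Multiply a sample by a signed 16-bit volume that represents volume/0x8000.
fn apply_volume(sample: i32, volume: i16) -> i32 {
    // `>>` on a signed integer is an arithmetic shift in Rust
    (sample * i32::from(volume)) >> 15
}
```

With this representation, 0x4000 halves a sample, 0x7FFF is just under full volume, and negative volumes invert the phase.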
Timing and Resampling
Where the rest of the SPU clocks at 44100 Hz, the reverb unit clocks at exactly half that rate: 22050 Hz. Or rather, it clocks at 44100 Hz, but each clock alternates between processing the left audio channel and processing the right audio channel. This is presumably because the reverb unit does too much work per sample to be able to process both the L and R channels at 44100 Hz.
The reverb unit needs to do something to resample the input signals from 44100 Hz to 22050 Hz or else it will effectively miss every other input sample. Thankfully, this has been reverse engineered, and we know that it performs this resampling using a 39-tap FIR filter.
FIR stands for finite impulse response, as opposed to an IIR filter (infinite impulse response). An N-tap FIR filter is an array of N coefficients that is applied to the most recent N samples in the input signal.
This is pretty easy to implement using a double-ended queue, which most languages provide in their standard library (e.g. `VecDeque` in Rust or `ArrayDeque` in Java):
There should be separate deques for the L and R input signals, and the reverb unit should push into both of them on each 44100 Hz clock. It should apply the FIR filter when it needs an input sample for reverb processing, which alternates between L processing and R processing on each 44100 Hz clock (or you could do both L and R together on every 22050 Hz clock - the difference is not significant).
Reverb input volume is presumably applied after applying the FIR filter so that changes to input volume will take effect roughly immediately instead of needing to wait for 39 new samples to flush out the deque.
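Here is a minimal sketch of a deque-based FIR filter along those lines. The real 39 coefficients are listed in psx-spx and are not reproduced here, so the filter takes its coefficients as a parameter. The type name, the single shift after summing, and the final saturation are assumptions of this sketch:

```rust
use std::collections::VecDeque;

/// N-tap FIR filter over the most recent N input samples.
/// Coefficients are fixed-point values representing coefficient/0x8000.
pub struct FirFilter {
    coefficients: Vec<i32>,
    history: VecDeque<i32>,
}

impl FirFilter {
    pub fn new(coefficients: Vec<i32>) -> Self {
        // Start with silence so the deque always holds exactly N samples
        let history = VecDeque::from(vec![0; coefficients.len()]);
        Self { coefficients, history }
    }

    /// Record a new input sample (called on every 44100 Hz clock).
    pub fn push(&mut self, sample: i16) {
        self.history.pop_front();
        self.history.push_back(i32::from(sample));
    }

    /// Apply the filter to the most recent N samples.
    pub fn output(&self) -> i16 {
        let sum: i64 = self
            .coefficients
            .iter()
            .zip(&self.history)
            .map(|(&c, &s)| i64::from(c) * i64::from(s))
            .sum();
        // Assumption: shift once after summing, then saturate to 16-bit
        (sum >> 15).clamp(i64::from(i16::MIN), i64::from(i16::MAX)) as i16
    }
}
```

An emulator would keep one `FirFilter` per channel, call `push` on both at 44100 Hz, and call `output` only when the corresponding channel is being processed.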
So Many Filters
Now to get into the actual reverb processing.
The reverb unit has 3 types of filters:
- Reflection filters: Simulate the effect of sound waves repeatedly reflecting off surfaces
- Comb filters: Simulate the same sound wave arriving at different times with a very short delay (e.g. as if it reflected off a nearby hard surface)
- All-pass filters: Apply a frequency-dependent delay
All together, there are 10 filters:
- Same-side reflection left
- Same-side reflection right
- Different-side reflection left-to-right
- Different-side reflection right-to-left
- Comb filter left
- Comb filter right
- All-pass filter 1 left
- All-pass filter 1 right
- All-pass filter 2 left
- All-pass filter 2 right
The comb filter specifically is highly configurable in terms of what it takes as input, but reverb processing typically goes like this:
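- The resampled input sample goes into the same-side and different-side reflection filters, which write into the reflection buffers
- The comb filter reads its input samples from the reflection buffers
- The comb filter output goes into all-pass filter 1, and that output goes into all-pass filter 2
- The all-pass filter 2 output is the reverb output sample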
The filters are configured using a ton of registers in the $1F801DC0-$1F801DFF range. I will link to psx-spx rather than list them all here, and I’ll use the psx-spx names for the various registers going forward.
Note that all reverb samples in sound RAM are signed 16-bit integers, and any occurrence of `ram[addr]` in these examples represents a 16-bit memory access. Additionally, all values are saturated to signed 16-bit before being written to RAM (and possibly also during some of the intermediate calculations - unclear to me).
The current reverb buffer address is automatically incremented by 2 (one 16-bit sample) after processing both L and R samples. It wraps back around to the start of the reverb buffer after reaching the end of sound RAM.
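The address increment can be sketched like this, assuming byte addresses and a 512 KiB sound RAM (the names are illustrative only):

```rust
/// Total sound RAM size in bytes (512 KiB).
const SOUND_RAM_LEN: u32 = 0x80000;

/// Advance the current reverb address by one 16-bit sample, wrapping back
/// to the start of the reverb buffer after passing the end of sound RAM.
fn advance_reverb_addr(buffer_start: u32, current: u32) -> u32 {
    let next = current + 2;
    if next >= SOUND_RAM_LEN {
        buffer_start
    } else {
        next
    }
}
```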
Example: The BIOS
Let’s look at how the BIOS configures the reverb buffer.
It sets the reverb buffer start address to $70940, so the reverb buffer is located at $70940-$7FFFF in sound RAM.
It then sets the following addresses for each individual buffer:
Buffer | Relative Address | Size |
---|---|---|
Same-side reflection L | $0D190-$0F6B0 | 4756 samples |
Same-side reflection R | $0AF78-$0D188 | 4364 samples |
Different-side reflection L | $082B0-$0AF70 | 5732 samples |
Different-side reflection R | $05708-$082A8 | 5588 samples |
All-pass 1 L | $03D18-$05700 | 3320 samples |
All-pass 1 R | $02328-$03D10 | 3320 samples |
All-pass 2 L | $01198-$02320 | 2248 samples |
All-pass 2 R | $00008-$01190 | 2248 samples |
Note that none of the buffers overlap. The reverb buffer’s total size is $0F6C0 bytes ($80000 - $70940), so there is enough room to fit all of the buffers without accidental overlap from wrapping. The L reflection buffers are slightly larger than the R reflection buffers (the all-pass buffers are the same size on both sides), which means that the left audio channel will have a slightly longer echo delay than the right audio channel.
It sets the following addresses for the comb filter inputs:
Input | Relative Address | Note |
---|---|---|
Comb L sample 1 | $0E8A0 | Same-side reflection L buffer |
Comb L sample 2 | $0DE10 | Same-side reflection L buffer |
Comb L sample 3 | $099A0 | Different-side reflection L buffer |
Comb L sample 4 | $08FB0 | Different-side reflection L buffer |
Comb R sample 1 | $0C1D8 | Same-side reflection R buffer |
Comb R sample 2 | $0B590 | Same-side reflection R buffer |
Comb R sample 3 | $07968 | Different-side reflection R buffer |
Comb R sample 4 | $062E8 | Different-side reflection R buffer |
As we can see, both the L and R comb filters are taking two of their inputs from the same-side reflection buffers and the other two from the different-side reflection buffers. The two addresses within each buffer are different to create the comb filter effect, where each sample is played twice with a very short delay in between (and possibly with different volume multipliers).
Reflection
The reflection filters take the reverb input sample directly, immediately after applying the resampling FIR filter and reverb input volume.
All of the reflection filters use the same formula but with different address parameters:
`m_addr` here is the head of the corresponding FIFO buffer and `d_addr` is the tail. These are the registers for each filter:
Filter | m_addr | d_addr |
---|---|---|
Same-side L | mLSAME | dLSAME |
Same-side R | mRSAME | dRSAME |
Different-side L-to-R | mRDIFF | dLDIFF |
Different-side R-to-L | mLDIFF | dRDIFF |
No, those are not typos in the different-side rows. That’s how different-side reflection works - samples bounce back and forth between the two different-side reflection buffers.
There are effectively three parameters that go into this formula:
- (m_addr - d_addr): The buffer size
- vWALL: Reflection feedback volume (for feeding back samples as they’re popped out of the FIFO buffer)
- vIIR: IIR coefficient for blending the new sample with the previous sample
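Putting the three parameters together, one reflection step might look like the sketch below. It follows the formula as I understand it from psx-spx, `ram[m_addr] = (input + ram[d_addr] * vWALL - ram[m_addr - 2]) * vIIR + ram[m_addr - 2]`, and it assumes RAM is modeled as a slice of 16-bit samples with all indices already resolved and wrapped. The helper names are hypothetical:

```rust
/// Multiply by a signed 16-bit volume representing volume/0x8000.
fn apply_volume(sample: i32, volume: i16) -> i32 {
    (sample * i32::from(volume)) >> 15
}

/// Saturate to the signed 16-bit range before writing to RAM.
fn saturate(value: i32) -> i16 {
    value.clamp(-0x8000, 0x7FFF) as i16
}

/// One reflection filter step.
///
/// `m` is the sample index for m_addr, `m_prev` is the index of the
/// previously written sample (m_addr - 2 in bytes), and `d` is the index
/// for d_addr.
fn reflect(
    ram: &mut [i16],
    m: usize,
    m_prev: usize,
    d: usize,
    input: i32,
    v_wall: i16,
    v_iir: i16,
) -> i16 {
    // Feed back the sample that is being popped out of the FIFO buffer
    let fed_back = apply_volume(i32::from(ram[d]), v_wall);
    let prev = i32::from(ram[m_prev]);
    // Blend the new sample with the previous one using the IIR coefficient
    let out = saturate(apply_volume(input + fed_back - prev, v_iir) + prev);
    ram[m] = out;
    out
}
```

For the different-side filters, the caller would pass indices from opposite channels (for example mRDIFF with dLDIFF), matching the table above.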
It’s pretty obvious what vIIR is doing if you rewrite the formula like this:
Comb
The comb filters simulate hearing an audio wave twice with a very short delay in between. Each comb filter has four input samples, usually two from the corresponding same-side reflection buffer and two from the corresponding different-side reflection buffer. The two samples from each buffer are usually taken from slightly different positions for the comb effect.
The comb filter has the simplest formula:
The `m_comb` addresses are mLCOMB1-4 for the L filter and mRCOMB1-4 for the R filter. Typically 1-2 are inside the same-side reflection buffer and 3-4 are inside the different-side reflection buffer.
The vCOMB1-4 values give each comb input sample its own volume multiplier, and the volume-adjusted input samples are simply added together.
The comb filter output is not written to sound RAM; instead, it’s used as input to the first all-pass filter.
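A sketch of the comb sum, assuming the four input samples have already been read from sound RAM at the four comb addresses (the function name and array layout are illustrative):

```rust
/// Multiply by a signed 16-bit volume representing volume/0x8000.
fn apply_volume(sample: i32, volume: i16) -> i32 {
    (sample * i32::from(volume)) >> 15
}

/// Comb filter: scale each of the four input samples by its own
/// vCOMB volume and add the results together.
fn comb(samples: [i16; 4], v_comb: [i16; 4]) -> i32 {
    samples
        .iter()
        .zip(v_comb.iter())
        .map(|(&s, &v)| apply_volume(i32::from(s), v))
        .sum()
}
```

The result is kept as a wider integer here because it is passed straight to the first all-pass filter rather than being written to RAM.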
All-Pass
The reverb unit has two all-pass filters that are applied in sequence. The first APF uses the comb output as its input sample, and the second APF uses the first APF output as its input sample.
Both filters use the same formula with different parameters:
This is a pretty standard implementation of a first-order IIR all-pass filter. I will link to another page that describes what this formula does much better than I can:
The one-sentence version is that an all-pass filter applies a fixed volume multiplier while introducing a frequency-dependent delay.
`m_apf` is the corresponding buffer head address, `d_apf` is the buffer size (for some reason the all-pass filters set a size instead of setting a tail address), and `vAPF` is the per-filter volume coefficient.
The second all-pass filter output is the (nearly) final output sample of reverb processing.
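One all-pass step could be sketched as follows, based on the formula as I understand it from psx-spx: subtract the delayed sample scaled by vAPF from the input, write that to the buffer head, then scale the written value by vAPF and add the delayed sample back. RAM is modeled as a slice of 16-bit samples with indices already resolved, and the names are hypothetical:

```rust
/// Multiply by a signed 16-bit volume representing volume/0x8000.
fn apply_volume(sample: i32, volume: i16) -> i32 {
    (sample * i32::from(volume)) >> 15
}

/// Saturate to the signed 16-bit range before writing to RAM.
fn saturate(value: i32) -> i16 {
    value.clamp(-0x8000, 0x7FFF) as i16
}

/// One all-pass filter step.
///
/// `m` is the sample index for m_apf and `delayed_idx` is the index for
/// (m_apf - d_apf), i.e. the sample written one buffer length ago.
fn all_pass(ram: &mut [i16], m: usize, delayed_idx: usize, input: i32, v_apf: i16) -> i32 {
    let delayed = i32::from(ram[delayed_idx]);
    // Feedback path: the value stored at the buffer head
    let written = saturate(input - apply_volume(delayed, v_apf));
    ram[m] = written;
    // Feedforward path: the filter's output sample
    apply_volume(i32::from(written), v_apf) + delayed
}
```

Chaining the two filters means calling this once with the APF1 parameters and the comb output, then again with the APF2 parameters and the first call's result.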
Output
Before the SPU can play the reverb unit’s output, it needs to be resampled from 22050 Hz back to 44100 Hz.
The reverb unit apparently does this using the same 39-tap FIR filter that it uses to resample the input from 44100 Hz to 22050 Hz, which is rather strange. Normally you would use different filters for downsampling and upsampling. At any rate, it certainly makes no sense to apply a filter designed for 44100 Hz input to a 22050 Hz input, so the SPU must be upsampling from 22050 Hz to 44100 Hz before it applies the filter.
44100 Hz is exactly two times 22050 Hz, and the standard way to upsample by an integer factor is to stuff the input signal with zeroes. Insert a zero sample after each 22050 Hz input sample, apply the filter to the last N samples in the zero-stuffed signal, and then multiply by 2 at the end to account for half of the samples being synthesized zeroes. (Duplicating each input sample and not multiplying by 2 at the end can also work, but may introduce audio artifacts.)
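The zero-stuffing approach can be sketched like this. As with the input side, the coefficients are a parameter rather than the real 39-tap table, and the type name and the order of the shift, doubling, and saturation are assumptions:

```rust
use std::collections::VecDeque;

/// Upsamples by a factor of 2 using zero stuffing followed by an FIR filter.
pub struct Upsampler {
    coefficients: Vec<i32>,
    history: VecDeque<i32>,
}

impl Upsampler {
    pub fn new(coefficients: Vec<i32>) -> Self {
        let history = VecDeque::from(vec![0; coefficients.len()]);
        Self { coefficients, history }
    }

    /// Feed one 22050 Hz sample and get back two 44100 Hz samples.
    pub fn push(&mut self, sample: i16) -> [i16; 2] {
        let first = self.step(i32::from(sample));
        // Zero stuffing: insert a zero after each real input sample
        let second = self.step(0);
        [first, second]
    }

    fn step(&mut self, value: i32) -> i16 {
        self.history.pop_front();
        self.history.push_back(value);
        let sum: i64 = self
            .coefficients
            .iter()
            .zip(&self.history)
            .map(|(&c, &s)| i64::from(c) * i64::from(s))
            .sum();
        // Multiply by 2 because half of the filtered samples are zeroes
        let scaled = (sum >> 15) * 2;
        scaled.clamp(i64::from(i16::MIN), i64::from(i16::MAX)) as i16
    }
}
```

With a small symmetric test filter of 0.25, 0.5, 0.25, a constant input of 1000 settles to a constant output of 1000, which is the behavior the doubling is meant to preserve.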
Once the 22050 Hz output signal is upsampled to 44100 Hz, the FIR filter can be applied in exactly the same way that it was applied to input. Similar to how input volume is probably applied after the FIR filter, reverb output volume is probably also applied after the FIR filter so that volume changes will take effect immediately.
After the FIR filter and reverb output volume are applied, the reverb L and R samples are mixed in with the L and R samples from summing the 24 voice outputs. The mixer just adds them together.
One thing that’s not clear to me is whether main volume applies to reverb output. In the SNES APU, main volume does not apply to echo filter output - only to the voice outputs. PS1 SPU documentation doesn’t specify when exactly main volume is applied. In practice it doesn’t really matter because games generally set main volume to the maximum possible value and leave it there.
Finally, Results
For comparison, the BIOS startup sound without reverb (from the last post):
Now, with all reverb functionality implemented:
Much better!
For another example, here’s Valkyrie Profile’s normal battle theme without reverb, which sounds just a bit wrong:
With reverb, much better:
One More
I said this was going to be the last SPU post, but I think it makes sense to split the remaining SPU features into a separate post that will cover noise generation, pitch modulation, SPU IRQs, and the capture buffers.