Sega 32X

Sega released two hardware add-ons for the Genesis to expand its capabilities and enable more advanced games. The first of these was the Sega CD, obviously a response to the PC Engine CD (which was fairly successful in Japan despite very poor sales in the rest of the world). The second was the 32X, a misguided attempt to bring the Genesis closer to Sega’s upcoming Saturn console. The 32X was a colossal commercial failure and Sega discontinued it after about a year, but it’s a pretty interesting piece of hardware both technically and historically.

A Commercial Failure

The Sega CD is well known to have underperformed commercially, but it wasn’t a colossal failure. It ultimately sold more than 2 million units and got just over 200 game releases during its ~4 year lifespan. It was only discontinued after developers moved on to the Saturn/PS1/N64 generation of consoles.

The 32X fared much worse. It sold only around 600,000 units, got only 40 game releases (6 of which are technically Sega CD games that use the 32X hardware), and was discontinued just over a year after launch. All of this despite costing about half as much as the Sega CD: $160 vs. $300.

Probably the biggest challenge for the 32X was that it released in 1994, when everyone knew that Sega’s next console was coming soon. Why would consumers or developers bother with another Genesis add-on when the Genesis as a whole would probably be obsolete within a year? Sega tried to entice developers by claiming that they could share code between 32X and Saturn versions of their games because the consoles would have the same CPUs, but that didn’t hold up in practice. They do have the same CPUs, but pretty much everything else is different between 32X and Saturn. In particular, the Saturn has hardware-accelerated graphics rendering while the 32X requires games to perform all rendering in software running on the 32X CPUs.

The add-on’s fate was pretty much sealed after its poor launch sales, and Sega quickly abandoned it to focus on the Saturn.

Overview

So, what hardware is in the 32X?

At a very high level, the major components are:

  • A pair of Hitachi SH-2 CPUs clocked at ~23.01 MHz, each with its own 4KB cache
  • A new framebuffer-based VDP that displays pixels from a frame buffer populated by the CPUs
  • A 2-channel PWM sound chip that can play high-quality audio samples (with a CPU cost relative to quality)
  • 256KB of SDRAM shared between the two new CPUs

The 32X uses cartridges with ROM chips, just like the standalone Genesis. The 32X unit itself plugs into the Genesis cartridge slot and has its own cartridge slot on top. You can actually play regular Genesis games plugged into the 32X cartridge slot, although they won’t be enhanced in any way - it’s just a convenience thing so people could leave the 32X plugged in if they wanted to.

Double the CPUs, Not Double the Power

The add-on’s name comes from its pair of 32-bit CPUs, two SH-2 units from Hitachi’s then-new SuperH line. These are the exact same CPUs that were later used as the Saturn’s main CPUs, although the 32X clocks them about 20% slower than the Saturn does (~23.01 MHz vs. ~28.63 MHz).

SH-2 is a RISC CPU, much like the MIPS CPUs that Sega’s competition used in the PlayStation (R3000 / MIPS I) and the Nintendo 64 (R4300i / MIPS III). Like MIPS, SuperH has a load-store architecture, general-purpose registers, and a 5-stage instruction pipeline that enables most instructions to execute in a single cycle from software’s perspective (not including delays from pipeline stalls, e.g. memory accesses).

In fact, it’s possible for the SH-2 to execute more than one instruction per cycle! SH-2 has a 32-bit data bus, but SH-2 opcodes are only 16-bit, so it can fetch two instructions in a single cycle and in some cases execute them simultaneously. Memory reads and writes are the main instructions that prevent simultaneous execution.

SH-2 is not as “pure” of a RISC CPU as MIPS. The load and store instructions support a variety of different addressing modes, where MIPS load/store instructions only support indirect register with displacement addressing. SH-2 also has a handful of logical instructions that can operate directly on values in memory, notably the bitwise AND/OR/XOR/TST instructions.

The 32X and Saturn specifically use the SH7604 package which includes the SH-2 CPU along with a number of other hardware units: a 2-channel DMA controller, a division unit (signed 64-bit ÷ 32-bit division), two hardware timers, a serial communication interface, and 4KB of very fast internal RAM. Each CPU has its own set of these.

The 4KB internal RAM can be used in one of two modes:

  • 4KB CPU cache
  • 2KB CPU cache + 2KB fast RAM

The part of internal RAM used as CPU cache is a set-associative cache that holds both instructions and data, using a pseudo-LRU algorithm for cache eviction. As CPU clock speeds got into the double-digit MHz, RAM speeds weren’t able to keep up (at least not at larger RAM sizes), so a CPU cache is essential for good performance in hot loops. The SH-2 would easily run 5-10x slower if it had to read directly from RAM for every opcode fetch and load instruction.

The two SH-2s share the bus in a master-slave configuration supported by hardware. They normally operate entirely independently, but if they ever access the bus simultaneously then the master SH-2’s access gets priority and the slave SH-2 stalls until the master’s access completes. Because of this, having two CPUs does not lead to 2x the performance of a single CPU; SDRAM in particular tends to be a bottleneck when trying to do significant work on both CPUs simultaneously.

The CPUs can avoid accessing the bus if they’re working with memory entirely within their respective 4KB caches, but this also presents a programming challenge. Since each CPU has its own cache, developers needed to take care to avoid data coherency issues between the two CPUs. …Or they could sidestep the problem and only use the slave SH-2 as a dedicated audio processor, as most 32X games did.

The two CPUs can communicate with each other using the shared SDRAM, the SH7604 serial interface (a full duplex 8-bit interface between the two CPUs), or a set of communication ports built into the 32X (which the Genesis 68000 also has access to). It’s preferred to use the communication ports - the serial interface is incredibly slow in comparison, and using SDRAM to communicate will slow down the slave CPU due to bus contention.

Pure Software Rendering

The 32X VDP is very simple: it displays pixels from a frame buffer, with support for a few different pixel formats. All of the actual graphics rendering occurs in software running on the SH-2 CPUs - the VDP has no specialized hardware for drawing or rendering graphics.

This right here is the main reason that it was not at all practical for developers to share code between 32X and Saturn. The Saturn has specialized hardware for rendering both 2D and 3D graphics, while on 32X developers were required to implement all forms of rendering in software. All 3D graphics that you see in 32X games were drawn manually by the CPUs into the VDP’s frame buffer.

To be clear, the SH-2s are fast enough to do real-time 3D software rendering! The graphics are primitive even compared to Saturn/PS1, and 60 FPS is not realistic, but the SH-2s can render primitive 3D graphics in software at 20-30 FPS.

Virtua Fighter

Virtua Fighter does look a bit better in motion, but…yeah. To be fair, it almost certainly looked better on contemporary TVs which often blurred the video signal a bit.

The VDP has 256KB of frame buffer RAM, which is split into two separate 128KB frame buffers in order to enable a technique called double buffering. Games can prepare the next frame in one frame buffer (the “back buffer”) while the VDP is actively displaying pixels from the other frame buffer (the “front buffer”). There is no way for software to directly access the front buffer, but games can swap the two frame buffers during vertical blanking or while the 32X VDP is off.
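To make the double buffering flow concrete, here is a minimal Python sketch of the idea. The class and method names are made up for illustration, and the model is simplified on purpose: it just refuses a swap outside of vertical blanking, which may not match how real hardware treats a swap request made during active display.

```python
class DoubleBufferedVdp:
    """Toy model of two 128KB frame buffers (hypothetical names, not real registers)."""

    FRAME_BUFFER_BYTES = 128 * 1024

    def __init__(self):
        self.buffers = [bytearray(self.FRAME_BUFFER_BYTES) for _ in range(2)]
        self.front = 0  # index of the buffer currently being displayed

    @property
    def back_buffer(self):
        # Software only ever gets the buffer that is NOT being displayed.
        return self.buffers[self.front ^ 1]

    def swap(self, in_vblank, display_enabled=True):
        # Simplification: the swap only succeeds during vertical blanking
        # or while the display is turned off.
        if in_vblank or not display_enabled:
            self.front ^= 1
            return True
        return False


vdp = DoubleBufferedVdp()
vdp.back_buffer[0] = 0x12          # draw the next frame into the back buffer
swapped_mid_frame = vdp.swap(in_vblank=False)
swapped_in_vblank = vdp.swap(in_vblank=True)
```

After the successful swap, the buffer that was just drawn into becomes the front buffer, and the old front buffer is free to be overwritten with the following frame.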

One of the selling points of the 32X was that it supported 15-bit RGB color (32,768 colors), a huge improvement over the Genesis VDP’s 9-bit RGB color (512 colors not including shadowing/highlighting). In the most commonly used modes, the 32X can display up to 256 colors onscreen simultaneously (same as SNES), and it has 512 bytes of dedicated palette RAM to store these color values.

The VDP supports three different frame buffer pixel formats, which are:

  • Packed pixel mode (256-color): 8 bits per pixel
    • Pixel values are indices into palette RAM
    • This is by far the most commonly used mode
  • Direct color mode (32768-color): 16 bits per pixel
    • Pixel values are raw color values
    • The VDP’s 128KB frame buffers aren’t large enough to fit an entire frame in this mode, which would require 140KB; some lines must be duplicated
  • Run length mode (256-color + compressed): 16 bits per run
    • “Pixel” values contain an 8-bit index into palette RAM and an 8-bit run length (1-256); the VDP will display that many pixels of a single color before reading the next run from the frame buffer
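As a rough illustration of run length mode, this Python sketch expands one line of runs into palette indices. It assumes the run length sits in the high byte (stored as length minus one, which is how an 8-bit field can cover 1-256) and the palette index in the low byte, and it assumes a 320-pixel line; neither detail is spelled out above, so treat both as assumptions.

```python
def decode_run_length_line(words, width=320):
    """Expand 16-bit run words into a list of palette indices for one line.

    Assumed layout: high byte = run length - 1, low byte = palette index.
    """
    pixels = []
    for word in words:
        run_length = (word >> 8) + 1      # 0..255 stored, meaning 1..256 pixels
        palette_index = word & 0xFF
        pixels.extend([palette_index] * run_length)
        if len(pixels) >= width:
            break                         # line is full; ignore any further runs
    return pixels[:width]


# 256 pixels of color 5 followed by 64 pixels of color 7 fills a 320-pixel line
line = decode_run_length_line([0xFF05, 0x3F07])
```

Two 16-bit words describe the whole line here, versus 320 bytes in packed pixel mode, which is why this format works well for flat-shaded polygons.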

One thing to note with the frame buffers is that lines do not necessarily have to be laid out in the order that they’re displayed. The first 512 bytes of the frame buffer contain something called a line table that the VDP uses to determine where each line’s data is located. The first two bytes contain the address of the first line’s pixels, the next two bytes contain the address of the second line’s pixels, and so on. This gives games some freedom over frame buffer layout, and this also makes it possible to duplicate lines without storing the pixel data twice (essential in direct color mode).
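The size problem in direct color mode and the line table workaround can be sketched in a few lines of Python. The 320x224 resolution is an assumption (it is the size that produces the 140KB figure), and so is the idea that line table entries are offsets counted in 16-bit words; the function names are invented for this example.

```python
WIDTH, HEIGHT = 320, 224          # assumed display size
FRAME_BUFFER_BYTES = 128 * 1024
LINE_TABLE_BYTES = 512


def direct_color_frame_bytes(width=WIDTH, height=HEIGHT):
    # 16 bits (2 bytes) per pixel
    return width * height * 2


def build_doubled_line_table(width=WIDTH, height=HEIGHT):
    """Point each pair of display lines at the same stored line.

    Entries are assumed to be word offsets into the frame buffer, with
    pixel data starting right after the 512-byte line table.
    """
    first_line_word = LINE_TABLE_BYTES // 2
    return [first_line_word + (line // 2) * width for line in range(height)]


full_frame = direct_color_frame_bytes()                  # 143,360 bytes = 140KB
table = build_doubled_line_table()
stored_bytes = LINE_TABLE_BYTES + (HEIGHT // 2) * WIDTH * 2
```

With every line shown twice, only 112 lines of pixel data need to be stored, which fits comfortably in one 128KB frame buffer. Real games could choose other layouts; this is just one way to use the table.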

The VDP does have the ability to composite pixels between 32X frames and Genesis VDP output. Frame buffer pixel colors are 16-bit values, not 15-bit, and the VDP uses the 16th bit as an alpha bit that determines priority between the 32X VDP pixel and the Genesis VDP pixel.

ABGR
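Going by the ABGR label, a 16-bit pixel could be unpacked roughly as in the Python sketch below. The exact bit positions (priority flag in bit 15, then 5 bits each of blue, green, and red down to bit 0) are an assumption based on that label, and the 5-bit to 8-bit expansion is just one common way to convert for display on a modern screen.

```python
def decode_direct_color(word):
    """Split a 16-bit pixel into (priority_flag, (r8, g8, b8)).

    Assumed layout: bit 15 = priority flag, bits 14-10 = blue,
    bits 9-5 = green, bits 4-0 = red.
    """
    priority = (word >> 15) & 1
    blue5 = (word >> 10) & 0x1F
    green5 = (word >> 5) & 0x1F
    red5 = word & 0x1F

    def expand(c5):
        # Replicate the top bits so that 0x1F maps to 255 and 0 maps to 0
        return (c5 << 3) | (c5 >> 2)

    return priority, (expand(red5), expand(green5), expand(blue5))


white_with_flag = decode_direct_color(0xFFFF)
pure_red = decode_direct_color(0x001F)
```

How the flag translates into "32X pixel in front" or "Genesis pixel in front" is left out here; the sketch only extracts the bit.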

Many games make use of this to split rendering work between the Genesis hardware and the 32X hardware. For example, 2D tile-based graphics are significantly more efficient to render using the Genesis VDP, especially when you consider effects like scrolling and positioning - these are very cheap on the Genesis VDP but very expensive when rendering bitmap graphics in software.

On the other hand, the very fast SH-2 CPUs with their multiplication and division units are capable of drawing (primitive) 3D graphics in real time, plus they can perform graphical transformations on 2D graphics that the Genesis VDP can’t natively perform (e.g. scaling and rotation).

As an example, here’s a screenshot from Knuckles’ Chaotix:

Chaotix Combined

This is what the 32X VDP is rendering there, only the player sprites and the HUD:

Chaotix 32X

You can see that the Knuckles player sprite is enlarged, an effect that would be very difficult to do efficiently with only the Genesis hardware (at least without storing a copy of the sprite tile data for each different size).

This is what the Genesis VDP is rendering there, the entire background (including the spinning rings):

Chaotix Genesis

This takes advantage of the fact that the Genesis VDP can render fullscreen tile-based graphics much more efficiently than a software renderer can.

Sample-Based Audio

The 32X includes a stereo PWM (pulse width modulation) sound chip which is capable of playing high-quality audio samples. In contrast to PCM (pulse code modulation) sound chips which output a wave with amplitude equal to the PCM sample value, a PWM chip synthesizes audio waves by rapidly toggling between maximum amplitude and minimum amplitude.

During each sample period, the chip uses the sample value to determine how long it should stay at maximum amplitude (the pulse width) before dropping back down to minimum amplitude. At a high enough sample rate, the rapid toggling does not produce any audible noise; it sounds like a wave with a constant amplitude that is determined by the pulse width.
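A tiny Python model can show why the pulse width acts like an amplitude. It assumes a 12-bit resolution (4096 steps per sample period) and treats the output as a list of 1s and 0s; the function names are hypothetical.

```python
def pwm_period(sample, resolution_bits=12):
    """Return one sample period as a list of 1-bit output levels.

    The output stays high for `sample` steps, then low for the rest.
    """
    steps = 1 << resolution_bits
    if not 0 <= sample <= steps:
        raise ValueError("sample out of range")
    return [1] * sample + [0] * (steps - sample)


def average_level(bits):
    # A low-pass filter (or the listener's ear) effectively averages the pulses
    return sum(bits) / len(bits)


quarter = average_level(pwm_period(1024))   # high for 1/4 of the period
half = average_level(pwm_period(2048))      # high for 1/2 of the period
```

Averaged over the period, a pulse width of 1024 out of 4096 behaves like an output level of 0.25, and doubling the width doubles the level.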

The chip has a configurable sample rate, though most games only use 22 kHz. The pulse width samples have 12-bit resolution, a nice improvement over the 8-bit PCM samples that the Genesis YM2612’s DAC channel can play. (The DAC channel technically supports 9-bit resolution, but the 9th bit can only be updated using an undocumented test register.) Using the PWM chip for sample playback also frees up the 6th YM2612 channel so that it can be used as an additional FM synthesis channel.

The major downside to this chip is that it has no internal RAM, so one of the CPUs needs to continuously feed it new samples at the configured sample rate. Most games use the slave SH-2 as a dedicated audio processor for this purpose.

The PWM chip does have a 3-sample FIFO for each channel so that the CPU doesn’t need to exactly match the audio sample rate (unlike with the YM2612’s DAC channel), but software still needs to ensure that the FIFO is never empty which puts a lot of load on one of the CPUs. That said, there is a PWM interrupt available to the SH-2s so that games don’t have to continuously poll the PWM FIFOs, and it’s also possible to configure one of the SH-2 DMA channels so that it will automatically send words to the PWM chip when the PWM interrupt would trigger.
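The FIFO behavior can be modeled with a short Python sketch. The 3-sample depth comes from the description above, but what happens on a write to a full FIFO or a read from an empty one is an assumption here: this model rejects the write and repeats the last sample, which may differ from real hardware.

```python
from collections import deque


class PwmFifo:
    """Toy 3-sample FIFO for one PWM channel (hypothetical model)."""

    DEPTH = 3

    def __init__(self):
        self.samples = deque()
        self.last_output = 0

    def is_full(self):
        return len(self.samples) == self.DEPTH

    def is_empty(self):
        return not self.samples

    def push(self, sample):
        # Assumption: a write to a full FIFO is simply dropped
        if self.is_full():
            return False
        self.samples.append(sample)
        return True

    def tick(self):
        # Called once per sample period by the "chip"
        if self.samples:
            self.last_output = self.samples.popleft()
        # Assumption: an empty FIFO keeps outputting the previous sample
        return self.last_output


fifo = PwmFifo()
accepted = [fifo.push(s) for s in (10, 20, 30, 40)]   # 4th write finds it full
played = [fifo.tick() for _ in range(4)]              # 4th tick finds it empty
```

The CPU can get up to three samples ahead of playback, but once the FIFO runs dry the output stops changing, which is audible as a glitch.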

Another limiting factor for 32X sample quality is storage. 32X games came on cartridges with no more than 4 MB of ROM, and sample-based audio can take a lot of space. PWM samples can be compressed pretty efficiently, but even so, games could only fit so much. Some games perform software mixing on the slave SH-2 to reduce the number of unique sounds that need to be stored.

Games typically used the PWM chip to play sounds that are difficult for FM synthesis sound chips, such as percussion and voice samples. As an example, here’s the intro stage music from Knuckles’ Chaotix (which on an unrelated note is an excellent use of the PSG chip):

Here’s just the part that comes from the 32X PWM chip:

Emulators (including mine) commonly emulate the PWM sound chip as if it’s a PCM sound chip because accurate PWM emulation is very computationally expensive. You would first need to generate a raw 1-bit audio signal at a frequency high enough to represent the 12-bit pulse width resolution at the desired sample rate, which for 22 kHz would be at least 90 MHz (22 kHz * 4096). Then you would need to resample that 90 MHz audio signal to a more reasonable sample rate (e.g. 48 kHz) using a low-pass filter, and for 48 kHz that low-pass filter will require at least 3,755 taps (2 * 90 MHz / 48 kHz) to correctly filter the 90 MHz signal down to 48 kHz without any audio aliasing. That would be an extraordinarily expensive filter calculation to do 48000 times per second, even if you shove it off to a separate thread.
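The numbers in that estimate can be reproduced with a few lines of Python. This only redoes the arithmetic, using exactly 22,000 Hz as a stand-in for "22 kHz"; the multiply-accumulate count assumes a straightforward FIR filter evaluated once per output sample.

```python
import math


def pwm_emulation_cost(sample_rate_hz=22_000, resolution_bits=12,
                       output_rate_hz=48_000):
    """Return (raw 1-bit signal rate, filter taps, multiply-accumulates per second)."""
    raw_rate_hz = sample_rate_hz * (1 << resolution_bits)   # 22 kHz * 4096
    taps = math.ceil(2 * raw_rate_hz / output_rate_hz)      # 2 * raw / output
    macs_per_second = taps * output_rate_hz                 # one full filter pass per output sample
    return raw_rate_hz, taps, macs_per_second


raw_rate_hz, taps, macs_per_second = pwm_emulation_cost()
```

That works out to a raw rate of 90,112,000 Hz, 3,755 taps, and about 180 million multiply-accumulates per second per channel, before even counting the cost of generating the 90 MHz bit stream in the first place.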

The Genesis Is Still There Too

As with the Sega CD, all of the Genesis hardware continues to run in parallel with the 32X: the 68000 main CPU, the Z80 audio CPU, the Genesis VDP, the YM2612 FM sound chip, and the SN76489 PSG sound chip. That’s a lot of processors to keep track of, particularly given that the 32X adds two new CPUs, and this was probably one reason that some developers weren’t keen on the 32X (or the Sega CD for that matter).

At boot, all of the 32X hardware is disabled. Every 32X cartridge contains a 68000 boot ROM that will initialize the 32X, after which the SH-2s begin to execute their own boot ROMs that are located in the 32X hardware. Once all of the CPUs have finished their initialization process, they handshake with each other and then hand over control to the game code.

The 68000 and the Z80 can access 32X cartridge ROM directly, although it’s mapped differently than it is for standalone Genesis games. For reference, the Genesis cartridge memory map is just this:

  • $000000-$3FFFFF: Cartridge ROM (up to 4MB)

While the 32X is on, it changes to this:

  • $880000-$8FFFFF: First 512KB of cartridge ROM
  • $900000-$9FFFFF: Mappable 1MB ROM bank
  • $A15104: ROM bank register
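As a sketch of how this mapping could be resolved, the Python function below translates a 68000 address into an offset within cartridge ROM. It assumes the bank register selects one of four 1MB banks using its low 2 bits, which lines up with the 4MB maximum ROM size but is not stated above; the function name is made up.

```python
def rom_offset_32x(address, bank_register):
    """Map a 68000 address to a cartridge ROM offset while the 32X is on.

    Returns None for addresses outside the two ROM windows.
    Assumption: the low 2 bits of the bank register pick a 1MB bank.
    """
    if 0x880000 <= address <= 0x8FFFFF:
        # Fixed window: always the first 512KB of ROM
        return address - 0x880000
    if 0x900000 <= address <= 0x9FFFFF:
        # Banked window: 1MB of ROM selected by the bank register
        bank = bank_register & 0x3
        return bank * 0x100000 + (address - 0x900000)
    return None


fixed = rom_offset_32x(0x880100, bank_register=3)    # bank register is ignored here
banked = rom_offset_32x(0x900010, bank_register=2)   # third 1MB bank
```

Note that the fixed window ignores the bank register entirely, so code and data in the first 512KB are always reachable no matter which bank is selected.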

Cartridge ROM can be mapped back to $000000-$3FFFFF using a 32X register, I believe because the Genesis VDP isn’t capable of doing VDP DMAs from $880000-$9FFFFF. However, the SH-2s can’t access ROM while it’s remapped back to the start of the 68000 address space.

While the 32X is on, the first 256 bytes of the 68000 address space ($000000-$0000FF) are mapped to a built-in ROM containing fixed 68000 exception vectors separate from the vectors in the first 256 bytes of cartridge ROM. The only exception is the VDP horizontal interrupt vector (HINT / level 4) which can be changed by writing to $000070-$000073, somewhat similar to Sega CD (though SCD supported this using a register rather than directly writing to the vector).

The 68000 can’t access 32X SDRAM but it can access most of the rest of the new hardware, including the VDP frame buffers and the PWM chip. However, 32X VDP access is mutually exclusive between the Genesis side and the 32X side, so games generally don’t do much with the frame buffers from the 68000.

The 68000 communicates with the two SH-2s primarily through a set of 8 read/write word-size communication ports in the 32X, which the 68000 and SH-2s both have access to. These are mapped to $A15120-$A1512F on the Genesis side and $00004020-$0000402F on the 32X side. The two SH-2s can also use these ports to communicate with each other in order to avoid bus contention from simultaneous SDRAM access.
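Since the same 8 ports appear at different addresses on each side, a small helper makes the correspondence explicit. This Python sketch only does the address arithmetic from the ranges given above; the function and constant names are invented for illustration.

```python
GENESIS_COMM_BASE = 0xA15120   # 68000 side
SH2_COMM_BASE = 0x00004020     # SH-2 side
COMM_PORT_COUNT = 8
COMM_PORT_BYTES = 2            # each port is one 16-bit word


def comm_port_addresses(port):
    """Return (68000 address, SH-2 address) for communication port 0-7."""
    if not 0 <= port < COMM_PORT_COUNT:
        raise ValueError("communication port must be 0-7")
    offset = port * COMM_PORT_BYTES
    return GENESIS_COMM_BASE + offset, SH2_COMM_BASE + offset


first_port = comm_port_addresses(0)
last_port = comm_port_addresses(7)
```

Eight 2-byte ports cover exactly the 16 bytes in each range, so a word written by the 68000 to $A15120 is the same word an SH-2 reads at $00004020.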

There is also a process for 68000-to-32X DMA transfer where the 68000 can write data words into a DMA FIFO and then one of the SH-2s can transfer those words into 32X SDRAM using its built-in DMA controller. This DMA dance is necessary because the 68000 can’t directly access 32X SDRAM.

Finally, there’s a 32X register that enables the 68000 to trigger a command interrupt for one or both of the SH-2s. For example, it could interrupt one of the SH-2s when it’s ready to send data via SH-2 DMA so that the SH-2 can set up the DMA on its side.

End

All in all, if the 32X was meant to bridge the gap between the Genesis and the Saturn, it’s hard to say that it was successful. 32X takes a similar software rendering approach to coprocessors like the SNES Super FX and the Genesis SVP rather than working towards the hardware-accelerated 3D graphics that the next generation of consoles supported. Video games were rapidly moving towards 3D graphics, and it was quickly becoming apparent that software-rendered 3D was a technological dead-end compared to GPUs with hardware 3D capabilities.

It’s pretty neat what games were able to do with the hardware, especially given that almost all of the 32X games were reportedly developed on a very short timeline, but I personally think it was a misguided release on Sega’s part.

Updated 2024-06-20