Emulating the Titan Overdrive Demo

The Titan Overdrive and Overdrive 2 Mega Drive demos are well known for pushing the Mega Drive / Genesis hardware to its limits. I decided to try the first one in my own Genesis emulator and see what it would take to get it working! Unlike Overdrive 2, the first Overdrive doesn’t rely on totally undocumented hardware features that nothing else uses.

Not Off to a Good Start

After booting, the demo ran way too fast and then completely crashed. This is about as far as it got:

Titan Overdrive Boot

Digging into trace logs made it clear that the demo was using the VBlank flag in the VDP status register to determine when a frame was complete. This flag is supposed to get set when the VDP finishes the final scanline of the frame, and it’s supposed to get cleared when the VDP begins processing the scanline immediately preceding the first visible scanline. The exact timing of when the flag gets cleared differs between NTSC consoles (US/Japan) and PAL consoles (Europe) because the vertical blanking period is much longer on PAL so that the console outputs 50 frames per second instead of 60.

Well, after looking at my VBlank flag code, I pretty quickly found the bug. This is the PAL logic before the fix:

// This OR is necessary because the PAL V counter briefly wraps around to $00-$0A
// during VBlank.
// >300 comparison is because the V counter hits 0xFF twice, once at scanline 255
// and again at scanline 312.
(v_counter >= active_scanlines || self.state.scanline > active_scanlines)
    && !(v_counter == 0xFF && self.state.scanline > 300)

The V counter wraps from 0xFF to 0x00 shortly before the end of scanline 312, the final line before active display, and this caused the VBlank flag to briefly be set again when it should have stayed clear. This caused the demo to run roughly twice as fast and eventually crash.

Fixed logic:

(v_counter >= active_scanlines || self.state.scanline > active_scanlines)
    && !((v_counter == 0x00 || v_counter == 0xFF) && self.state.scanline > 300)
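
To make the difference easier to test in isolation, here is a minimal sketch of both checks as free functions. The function names and the plain `u16` parameters are assumptions for illustration; the real code reads these values from the emulator's VDP state.

```rust
/// PAL VBlank flag check before the fix: only the second 0xFF
/// (at scanline 312) is excluded.
fn pal_vblank_flag_old(v_counter: u16, scanline: u16, active_scanlines: u16) -> bool {
    (v_counter >= active_scanlines || scanline > active_scanlines)
        && !(v_counter == 0xFF && scanline > 300)
}

/// PAL VBlank flag check after the fix: the 0x00 that the V counter
/// wraps to near the end of scanline 312 is excluded as well.
fn pal_vblank_flag_fixed(v_counter: u16, scanline: u16, active_scanlines: u16) -> bool {
    (v_counter >= active_scanlines || scanline > active_scanlines)
        && !((v_counter == 0x00 || v_counter == 0xFF) && scanline > 300)
}
```

With 224 active scanlines, the old check reports the flag as set for `v_counter == 0x00` on scanline 312, while the fixed check keeps it clear.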

Blast Processing?

After the PAL VBlank flag fix, the demo ran to completion but with many broken effects. Here’s the first, the spinning “no nintendo inside” / “blast processing” board animation:

No Nintendo Inside

This effect isn’t doing anything particularly wild from a hardware perspective, but it requires pretty accurate timing to emulate correctly. For the effect, it’s simply using the VDP’s per-two-cell vertical scrolling mode and rewriting all of the vertical scroll values after each scanline to create the illusion of a 3D object.

The tricky part is how it times these rewrites: It appears to abuse the VDP’s VRAM access FIFO queue in order to start the rewrite at the earliest possible moment after the current scanline has finished rendering. The VDP does support horizontal line interrupts, but the interrupt doesn’t fire until about 10-16 pixels after the scanline is complete, and then the 68000 CPU takes about 44 CPU cycles (roughly 38 VDP pixels) to handle the interrupt before it starts executing any code from the HINT interrupt handler.
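
The cycles-to-pixels conversion above follows from the clock dividers. As a rough sketch, assuming H40 mode (one pixel per 8 master clock cycles, ignoring the slower pixels around HSYNC) and a 68000 running at the master clock divided by 7; the constant and function names are made up for this example:

```rust
/// Master clock cycles per 68000 CPU cycle.
const MCLK_PER_CPU_CYCLE: u32 = 7;
/// Master clock cycles per VDP pixel in H40 mode (approximation that
/// ignores the slower pixel clock around HSYNC).
const MCLK_PER_H40_PIXEL: u32 = 8;

/// Convert a 68000 cycle count into whole H40 pixels, rounding down.
fn cpu_cycles_to_h40_pixels(cpu_cycles: u32) -> u32 {
    cpu_cycles * MCLK_PER_CPU_CYCLE / MCLK_PER_H40_PIXEL
}
```

For the 44-cycle interrupt latency this gives 44 * 7 / 8 = 38.5, or 38 whole pixels.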

Background: During VBlank and when the display is disabled, the VDP can process VRAM writes faster than the 68000 can write to it. This is not the case during active display (including HBlank) - the VDP is actively fetching data that it needs for rendering, so the 68000 can only write a very limited amount of data per line in between VDP fetches. The VDP manages these writes by queueing into a 4-word FIFO queue and popping off the queue during specific “external access cycles” during which the VDP does not fetch any data for itself. If the 68000 writes to VRAM (or CRAM/VSRAM) while the FIFO queue is full, it stalls until the VDP is able to accept its write into the queue.

The timing trick used is that the code polls the VDP’s H counter to wait until the VDP is about halfway through the line, then it immediately writes 6 words to VRAM as fast as possible. The 68000 won’t be able to continue execution past the 6th write until 2 full words have been popped out of the VDP FIFO queue, at which point the code knows roughly how close to the end of the line it is (it does a few RAM copies before it actually disables display and starts the DMA to VSRAM).

As for the bug in my code, I had implemented some code to simulate delay when the 68000 writes to the VDP too quickly but the implementation was pretty wildly incorrect in a few ways, mainly in how I was handling the differences between VRAM writes (which require 2 external access slots to fully pop off the queue) and CRAM/VSRAM writes (which only require 1 access slot to pop but still take a full word in the FIFO queue). I fixed a few issues there and that got the effect working:

Blast Processing
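
The FIFO behavior described above can be sketched roughly like this. It is a simplified model and not my emulator's actual code: the type and method names are invented for the example, and it ignores read prefetch, DMA, and exactly where in the line the external access slots fall.

```rust
use std::collections::VecDeque;

/// Where a queued write is headed.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum WriteTarget {
    Vram,
    Cram,
    Vsram,
}

impl WriteTarget {
    /// External access slots needed to fully pop one word of this kind.
    fn slots_to_pop(self) -> u8 {
        match self {
            WriteTarget::Vram => 2,
            WriteTarget::Cram | WriteTarget::Vsram => 1,
        }
    }
}

/// The FIFO holds at most 4 words regardless of target.
const FIFO_CAPACITY: usize = 4;

#[derive(Default)]
struct WriteFifo {
    /// Remaining access slots for each queued word; front is the oldest.
    entries: VecDeque<u8>,
}

impl WriteFifo {
    /// Try to queue a word. Returns false if the FIFO is full, in which
    /// case the 68000 has to stall and retry later.
    fn try_push(&mut self, target: WriteTarget) -> bool {
        if self.entries.len() == FIFO_CAPACITY {
            return false;
        }
        self.entries.push_back(target.slots_to_pop());
        true
    }

    /// Called once per external access slot. The oldest word consumes
    /// the slot and leaves the queue once it needs no more slots.
    fn on_external_access_slot(&mut self) {
        if let Some(front) = self.entries.front_mut() {
            *front -= 1;
            if *front == 0 {
                self.entries.pop_front();
            }
        }
    }
}
```

In this model, a 6th consecutive VRAM write can only be accepted after 4 external access slots have passed since the FIFO filled up, which is the stall that the demo uses as a timing reference.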

Interrupt Timings

Next up is this screen, where there is supposed to be a sort of flame-like effect in the background. My emulator showed a bit of the effect at the start and then just this:

We Heart Evoke Broken

This one required digging through my 68000 disassembly logs to figure out. Without going into too much gory detail, the main routine polls the VBlank flag in the VDP status register to see when the frame is done, and each time the flag flips from 0 to 1 it increments a counter so that the code knows how long the effect has been running for. This screen also has the vertical interrupt enabled, and its VINT handler does a bunch of RAM-to-VRAM DMAs. One of those rewrites the color palettes to handle a fade-in at the start of the effect, which it does based on the counter that the main routine increments every frame.
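
The demo's polling loop is 68000 code, but the counting idea can be sketched in Rust. The names here are hypothetical and this is only an illustration of the rising-edge logic, not a translation of the demo's routine:

```rust
/// Counts frames by watching for 0 -> 1 transitions of the VBlank flag.
#[derive(Default)]
struct FrameCounter {
    prev_vblank: bool,
    frames: u32,
}

impl FrameCounter {
    /// Feed in the current value of the VBlank flag from the VDP status
    /// register. The counter only advances on a rising edge.
    fn poll(&mut self, vblank: bool) {
        if vblank && !self.prev_vblank {
            self.frames += 1;
        }
        self.prev_vblank = vblank;
    }
}
```

Because only rising edges count, an emulator that sets the flag a second time within one frame makes the counter advance twice per frame.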

The problem? My VDP implementation was generating the vertical interrupt too late, which caused the main routine to be partway through its counter increment when it jumped to the VINT routine, which in turn caused the VINT routine to get very confused about how long the effect had been running. Instead of writing the correct color palettes to CRAM it just wrote solid colors, as if the effect had just started.

I was actually already aware that my VDP generated the vertical interrupt late - I intentionally delayed it by 10-13 pixels compared to actual hardware because for some reason that fixed some pretty egregious music tempo bugs in Earthworm Jim. Well, I tested Earthworm Jim again after removing the VINT delay, and it worked perfectly fine! I guess the music bugs were caused by something else that I fixed in the meantime. Removing the VINT delay fixed the Overdrive effect:

We Heart Evoke

Double Helix

My emulator didn’t have any trouble with this effect, but I wanted to post a screenshot because it’s one of the cooler effects in the demo:

Double Helix

Mode 7? What’s That?

This effect shows a scrolling background with perspective, kind of similar to what you might see over on the SNES with Mode 7 + HDMA except implemented in software. My emulator rendered the effect itself fine but there was some garbage sitting in the middle of the screen the entire time:

Garbage Sprites

These were obviously sprites, but why were they just sitting there?

Again digging through logs, I noticed that the code set the sprite attribute table address to $B600 in VRAM at the start of this effect, but based on the contents of VRAM it seemed like it expected the table to be located at $B400. I went back to documentation and found a bit I missed the first time around:

Sprite Attribute Table Register

Apparently “should be 0” means “the hardware will ignore this bit”! This actually does make sense, since the sprite attribute table is 512 bytes in H32/256px mode but up to 640 bytes in H40/320px mode. I changed my code to ignore the lowest bit of this register while in H40 mode and that fixed this effect:

Perspective Background
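
A minimal sketch of the address calculation with the fix applied, assuming the standard 64KB VRAM layout where bits 6-0 of register #5 hold address bits 15-9. The function name and signature are made up for this example:

```rust
/// Compute the sprite attribute table base address in VRAM from the
/// value written to VDP register #5.
fn sprite_table_address(register_5: u8, h40_mode: bool) -> u16 {
    // In H40 mode the hardware ignores the lowest bit, so the table is
    // always aligned to 1024 bytes instead of 512.
    let mask: u8 = if h40_mode { 0x7E } else { 0x7F };
    u16::from(register_5 & mask) << 9
}
```

A register value of `0x5B` maps to $B600 in H32 mode but to $B400 in H40 mode, which matches where the demo actually put its table.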

Three Dimensions

It’s hard to see why this effect is broken in a screenshot but it was extremely noticeable in motion:

Jittery Lines

Every time the cube moved far enough up or down that it intersected the lines crossing the screen, the part of the line to the right of the cube would shift down 1 pixel, and then when the cube moved back the line would shift back into place.

Looking into how this is rendered, the cube and everything to the right of it are rendered using the 2 background layers, and the rest of the screen is rendered using sprites (yes, the part of the screen that looks like the background is rendered using sprites).

Not sure where else to look, I started messing with vertical scroll values and render timings to see if I could find anything that might fix it. Here’s something I tested that definitely did not fix it, subtracting 1 from the background V scroll values:

Bad Cube

I eventually found something that worked after looking at what writes it was performing in between scanlines. During HBlank after each scanline, it changes the background A nametable address (basically where the tile map is stored in VRAM) and then changes the vertical scroll values for both backgrounds. I tried rendering each line using the nametable address written during HBlank but not the updated V scroll values and that fixed it.

After some searching I stumbled upon this forum thread which describes the proper solution here: writes to nametable address registers should take effect roughly immediately, as they don’t appear to be latched like the rest of the VDP registers. (Most registers as well as the full-screen V scroll values are latched for the next scanline around when HINT is generated; this causes writes from HINT handlers to take effect after the next rendered scanline, not immediately.) Not emulating this correctly apparently also causes graphical glitches during a boss fight in The Adventures of Batman & Robin, per the linked thread.
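
One way to model this is to keep a live copy and a latched copy of the relevant values. The sketch below is an assumption about how this could be structured, with invented names, and it only covers the three values discussed here:

```rust
/// The values relevant to this effect.
#[derive(Clone, Copy, Default)]
struct LineRegisters {
    scroll_a_nametable_addr: u16,
    v_scroll_a: u16,
    v_scroll_b: u16,
}

#[derive(Default)]
struct VdpRegisters {
    /// Updated immediately on every CPU write.
    live: LineRegisters,
    /// Snapshot of `live` taken once per line, around HINT time.
    latched: LineRegisters,
}

impl VdpRegisters {
    /// Called once per scanline at the latch point.
    fn latch_for_next_line(&mut self) {
        self.latched = self.live;
    }

    /// Nametable address is not latched: rendering sees the live value.
    fn render_nametable_addr(&self) -> u16 {
        self.live.scroll_a_nametable_addr
    }

    /// Full-screen V scroll is latched: rendering sees the snapshot.
    fn render_v_scroll(&self) -> (u16, u16) {
        (self.latched.v_scroll_a, self.latched.v_scroll_b)
    }
}
```

With this split, a write made after the latch point changes the nametable address used for the very next rendered line, while the new V scroll values only show up one line later.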

Oh No. My Emulator Suxx

The 512-color screen is probably the most infamous screen in the demo because of its emulator detection, and naturally my emulator failed it:

Your Emulator Suxx

Thankfully one of the devs was kind enough to describe exactly what they did to show the “your emulator suxx” text. Rather than explicitly detecting emulators and conditionally showing the text, they take advantage of the fact that actual hardware reduces the number of sprites scanned when display is disabled during part of HBlank, and “your emulator suxx” is spelled out using sprites that actual hardware just doesn’t scan because they’re too far into the sprite attribute table.

I’m not confident that my implementation is the most accurate, but I did implement tracking of how many pixels the display is disabled for during HBlank, and I use that to reduce both the number of sprites scanned for line N+2 and the sprite pixel limit for line N+1 (both of these are reduced when display is disabled during HBlank on line N). Now my emulator doesn’t suxx anymore, at least according to this demo:

512 Colors

As an aside, Mickey Mania does something very similar to this during its 3D stages, and it depends on this hardware quirk to hide some sprites such as the moose’s shadow in the moose chase stage. I previously had a hack to handle Mickey Mania specifically that I was able to remove after implementing this fix.

The End

And that’s it! The demo has some other effects but none of them gave me any issues, or they were fixed alongside one of the bugs I described here. Here’s a link to the current WASM build of my emulator which includes all of these fixes:

https://jsgroth.dev/jgenesis/

Updated 2024-01-20