Emulator Bugs: Metal Max 2 Kai

This is a post on a bug that broke Metal Max 2 Kai for the Game Boy Advance, a port of the Super Famicom game Metal Max 2.

[Figure: Metal Max 2 Kai text box flickering (slowed to 1/6 speed)]

The text box is definitely not supposed to do that!

This was caused by slightly inaccurate timing. This is one of those games that tends to work fine in an emulator with very inaccurate timing but breaks if you have only semi-accurate timing, much like Battletoads on NES and Pinball Deluxe on Game Boy. At least in this case it’s just graphical glitches rather than the game freezing…

The core of the problem is this: the game’s horizontal blanking (HBlank) interrupt handler reads the video processor’s VCOUNT register (current line counter), and it must not perform that read before the counter increments to the next line. If the handler ever reads VCOUNT before it increments, it may render the text box at the wrong vertical position for the rest of the frame. Since the graphics are only wrong for a single frame, and this won’t happen on every frame, this creates a flickering effect.

The GBA’s PPU (picture processing unit) takes 1232 cycles to fully process a line: roughly 47 cycles of preparing to render, 960 cycles to draw pixels to the screen (4 cycles per pixel), and then 225 cycles of horizontal blanking where the PPU is…not actually idle, because it can still perform sprite processing during HBlank. But the PPU does not do anything related to background rendering during HBlank, so games can freely modify registers related to background rendering or final image composition (e.g. BG scroll values or layer blending parameters).

[Figure: GBA PPU line. Per-line PPU time during active display]

I’m not sure that these lengths are exactly correct, but it is at least true that the HBlank bit in the PPU’s DISPSTAT register (display status) reads 1 for exactly 225 cycles per line.

If you define cycle 0 as being the first cycle where reading VCOUNT returns the new line value (and also the same cycle where the DISPSTAT HBlank bit begins to read 0), then the PPU triggers HBlank IRQ on cycle 1008, 1 cycle after HBlank begins. Thus there are 224 cycles between the PPU triggering HBlank IRQ and VCOUNT incrementing to the next line.
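The per-line numbers above can be laid out as a small Python sketch. The constant and function names here are made up for illustration, and the 47-cycle pre-render length is only approximate, as noted above.

```python
# Per-line PPU timing as described in this post.
# Cycle 0 is the first cycle where VCOUNT reads the new line value.
CYCLES_PER_LINE = 1232
PRE_RENDER_CYCLES = 47     # approximate
DRAW_CYCLES = 240 * 4      # 240 visible pixels, 4 cycles per pixel
HBLANK_CYCLES = 225

HBLANK_START = PRE_RENDER_CYCLES + DRAW_CYCLES  # cycle 1007
HBLANK_IRQ_CYCLE = HBLANK_START + 1             # cycle 1008


def hblank_flag(cycle):
    """DISPSTAT HBlank bit for a cycle offset within the line."""
    return cycle % CYCLES_PER_LINE >= HBLANK_START


def cycles_from_irq_to_vcount_increment():
    """Cycles between the PPU raising HBlank IRQ and VCOUNT incrementing."""
    return CYCLES_PER_LINE - HBLANK_IRQ_CYCLE
```

The three phases add up to the full 1232-cycle line, the flag is set for exactly 225 of those cycles, and the IRQ-to-increment gap comes out to 224.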

As far as I could find, this is not very well documented, but the GBA’s ARM7TDMI CPU does not see changes to its IRQ line immediately - there’s a 3-cycle delay before it sees any changes to IRQ. This is partially caused by a synchronizer that synchronizes the IRQ line to the CPU clock, and I think partially caused by some latency in the GBA interrupt hardware? The exact reasons for this delay are not completely clear to me. Regardless, the CPU won’t begin to handle an IRQ until at least 3 cycles after it triggers, and the delay will be longer if the CPU is mid-instruction when it sees the IRQ line change.

The actual IRQ handling process then takes 3 more cycles: 1 cycle where the CPU performs an opcode fetch at its current address (which gets thrown away), then 2 cycles to refill the instruction pipeline starting at the IRQ vector. On GBA, the discarded opcode fetch could take longer than 1 clock cycle due to memory latency, so the hardware IRQ handling could take more than 3 clock cycles in total.

This means that there will be at least 6 cycles between the PPU asserting HBlank IRQ and the CPU beginning to execute at the IRQ vector. On GBA this will often be longer than 6 cycles due to memory latency, but 6 cycles is the absolute minimum possible delay to executing the first instruction at the IRQ vector.

For HBlank IRQ, this leaves at most 218 cycles before VCOUNT increments to the next line after the CPU begins to execute the IRQ handler.
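As a rough sketch of that arithmetic in Python: the two function parameters are hypothetical knobs for the "could take longer" cases mentioned above, and simply adding them on is a simplification, not a model of the real bus or pipeline.

```python
IRQ_SYNC_DELAY = 3          # cycles before the CPU sees the IRQ line change
DISCARDED_FETCH = 1         # minimum; memory wait states can stretch this
PIPELINE_REFILL = 2         # refill starting at the IRQ vector
CYCLES_IRQ_TO_VCOUNT = 224  # HBlank IRQ (cycle 1008) to VCOUNT increment


def irq_entry_latency(extra_fetch_cycles=0, instruction_cycles_remaining=0):
    """Cycles from the PPU asserting the IRQ to the first instruction at the vector.

    Both arguments are illustrative: extra wait states on the discarded
    fetch, and cycles left in whatever instruction the CPU was executing.
    """
    return (IRQ_SYNC_DELAY
            + instruction_cycles_remaining
            + DISCARDED_FETCH
            + extra_fetch_cycles
            + PIPELINE_REFILL)


def cycles_left_before_vcount_increment(latency):
    """Cycles the handler can run before VCOUNT increments."""
    return CYCLES_IRQ_TO_VCOUNT - latency
```

With the defaults this gives the 6-cycle minimum and the 218-cycle window; any extra latency only shrinks the window.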

As it so happens, when this game handles HBlank IRQ, it takes exactly 218 cycles before it performs the VCOUNT read. If the CPU hits the minimum possible 6-cycle latency between HBlank IRQ and beginning to execute at the IRQ vector, it will read VCOUNT on the earliest possible cycle where it can see the incremented value. Awful.

I wish I could say that the game has some clever cycle-counted assembly code here, but I’m pretty confident that is not the case. The game’s interrupt handler has many instances of Thumb assembly code like this:

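  ; R7 = 0x03007E1C (IWRAM)
  str r1, [r7, #12]   ; store r1 to IWRAM at R7+12
  ldr r0, [r7, #12]   ; immediately load the same word back into r0
  ldrb r1, [r0]       ; load a byte from the address now in r0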

It writes a value to IWRAM and then immediately reads that value back into a different register.

Someone handwriting assembly would never do that RAM read-back because it’s 2 cycles slower than a register-to-register move, and that’s assuming it even needs to copy the value into a different register as opposed to just doing ldrb r1, [r1]. However, C compilers seem to often generate code like this, at least the C compilers that were used back in the early/mid 2000s. If this code was originally written in C, I think it’s probably safe to assume that it was not carefully cycle counted.

To be clear, I’m not judging the developers for (probably) using C! Writing ARM/Thumb assembly by hand is rather tedious, at least in large quantities. The use of C is just a very strong signal that the developers did not intentionally make this game extremely dependent on precise GBA hardware timings.

The actual logic that breaks in the game’s IRQ handler is too convoluted for me to want to cover it in detail here, but at a high level: it has some logic that is supposed to run every 12 lines along with some other logic that’s supposed to run every 4 lines, and it determines when to run these based on reading VCOUNT rather than counting how many times the HBlank IRQ handler executes. If the game ever reads VCOUNT pre-increment during an HBlank where it’s supposed to execute either the every-12-line logic or the every-4-line logic, multiple assumptions in the code are broken, and these broken assumptions cause it to write incorrect BG0 vertical scroll values for the rest of the frame.
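To illustrate why keying periodic work on VCOUNT is fragile, here is a deliberately simplified Python sketch. This is not the game's actual logic: the `value % period == 0` check and all of the function names are assumptions made purely for illustration.

```python
def vcount_reads(num_lines, stale_lines=frozenset()):
    """Value each HBlank handler reads: line + 1 normally,
    or the old line number if the read lands pre-increment."""
    return [line if line in stale_lines else line + 1
            for line in range(num_lines)]


def fires_by_vcount(reads, period):
    """Handler invocations where a (hypothetical) VCOUNT-keyed check fires."""
    return [i for i, value in enumerate(reads) if value % period == 0]


def fires_by_count(num_lines, period):
    """The same schedule keyed on an invocation counter instead of VCOUNT."""
    return [i for i in range(num_lines) if (i + 1) % period == 0]
```

In this toy model, a single stale read during line 11's handler means the value 12 is never observed, so both the every-4 and every-12 checks silently skip a step. A stale read one line later makes the every-12 check fire twice in a row instead. A counter-based schedule has neither problem, since it never looks at VCOUNT.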

Here the game is using BG0 exclusively for the text box while using BG1 and BG2 for the rest of the background graphics, so the incorrect V scroll values only affect the text box.

If the game’s HBlank IRQ handler always reads VCOUNT pre-increment, that fixes the text box flicker, but it causes other graphical glitches like this one (top line is incorrect):

[Figure: Line 0 glitch]

In my emulator, most of the timing was accurate enough for this game to work properly, but I was triggering HBlank IRQ one cycle too early relative to the VCOUNT increment. That one cycle difference was enough to break this game.
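Here is a minimal Python model of where the handler's VCOUNT read lands, to show how much one cycle matters. The names are hypothetical, it only encodes the arithmetic from this post, and it treats the handler's 218 cycles as fixed. The 228-line frame length (160 visible lines plus 68 lines of vertical blanking) is not mentioned above but is standard for the GBA.

```python
CYCLES_PER_LINE = 1232
TOTAL_LINES = 228                 # 160 visible + 68 VBlank
HBLANK_IRQ_CYCLE = 1008           # correct trigger point within the line
EARLY_IRQ_CYCLE = 1007            # one cycle too early
HANDLER_CYCLES_BEFORE_READ = 218  # this game's handler, per the text above
MIN_IRQ_ENTRY_LATENCY = 6


def vcount_seen(line, irq_cycle, entry_latency=MIN_IRQ_ENTRY_LATENCY):
    """VCOUNT value the handler reads when the HBlank IRQ for `line`
    is raised at cycle `irq_cycle` within that line."""
    read_cycle = irq_cycle + entry_latency + HANDLER_CYCLES_BEFORE_READ
    return (line + read_cycle // CYCLES_PER_LINE) % TOTAL_LINES
```

With the correct trigger point, `vcount_seen(10, HBLANK_IRQ_CYCLE)` lands exactly on cycle 1232 and sees line 11. With the IRQ raised one cycle early, the read lands on cycle 1231 and still sees line 10, but only when the CPU hits the minimum 6-cycle latency. Any extra cycle of latency hides the bug again, which is consistent with the glitch showing up on some frames and not others.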

The author of NanoBoyAdvance wrote a test ROM that checks the timings of when various PPU events trigger relative to each other, which was extremely helpful in fixing this properly (along with some other PPU timing bugs).

This one actually surprised me because the majority of the GBA library seems to be not very timing-sensitive at all, and then here’s a game that requires some aspects of timing to be basically perfect (or just inaccurate enough, either works). Not even because it’s doing some clever timing tricks, either - it’s this timing-sensitive seemingly by accident.

Mario & Luigi: Superstar Saga is another game that’s known to be timing-sensitive, but from what I’ve seen it’s not as sensitive as Metal Max 2 Kai - it just has sprite flickering issues if timing is too far off. It actually has some sprite flickering even on actual hardware (it’s very noticeable in the intro sequence), but very inaccurate timing causes it to have much more flickering than it’s supposed to.

Updated 2025-10-05