Donkey Kong Country 2 has a pretty well-known bug in the old SNES emulator ZSNES where some stages have spinning barrels that don’t work properly. One of the earliest is pictured here, in the first stage of Krem Quay (third world):
Barrel Bayou
After you jump into the barrel, you’re supposed to be able to completely control its rotation by pressing left and right on the d-pad, with the barrel only rotating while you’re holding left or right. In ZSNES, this is horribly bugged. Tapping left or right makes the barrel spin forever in that direction, until you press the opposite direction…which simply makes it spin forever in the opposite direction.
This is more than just annoying - it makes these stages significantly more difficult than the developers intended, since later on the spinning barrels show up over spikes and other hazards:
Klobber Karnage
This used to be somewhat documented in threads on the ZSNES forums, but those unfortunately seem to have gone offline since last I looked at them, and I can’t find the relevant threads indexed in the Wayback Machine.
This bug is caused by ZSNES not emulating open bus behavior. I believe Anomie originally discovered this roughly two decades ago and subsequently fixed the same bug in Snes9x. That original fix hardcoded the specific addresses to return the values that the game depends on rather than properly emulating open bus, but it fixed DKC2 and probably didn’t break anything else. The bug was never fixed in ZSNES, which is now a long-abandoned project (last release in 2007).
Purely out of curiosity, I wanted to dig into this a little more to figure out what exactly in the game code causes these barrels to spin forever in an emulator that doesn’t emulate open bus behavior.
Open Bus Behavior and 65816 Memory Addressing
On older platforms like the SNES, reading from an invalid memory address usually does not crash the program. On some platforms, accessing specific invalid addresses can cause the hardware to lock up, but I don’t believe this can happen on SNES.
Instead, reading from an invalid address usually triggers open bus behavior, where the CPU re-reads the last value that was put on the data bus. SNES specifically has several different internal buses that can retain different open bus values, but this doesn’t affect DKC2.
The main SNES CPU is a 65C816 (aka 65816). There’s some other hardware around it as part of the Ricoh 5A22 S-CPU package, such as a multiplication/division unit and a DMA unit, but the core CPU is a 65816.
65816 is a 16-bit extension of the 6502, a very popular 8-bit CPU used in many systems including the NES (with slight modifications). The 65816 is mostly backwards compatible with 6502 software, which was not important for the SNES (which has no NES backwards compatibility) but was very important for the Apple IIGS that this CPU was originally designed for.
I personally think the 65816 ISA is pretty awkward. 8-bit vs. 16-bit operation is based on new processor status flags M (accumulator / memory size) and X (index register size) rather than being encoded into opcode bits, so software needs to frequently execute the new instructions SEP (set processor flags) and REP (reset processor flags) to manually adjust register and memory access sizes as needed. This also makes 65816 disassembly extraordinarily painful without tracing execution in an emulator, since some instructions vary in length depending on the current processor flags - e.g. an immediate operand can be either 1 byte (8-bit) or 2 bytes (16-bit).
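To make the disassembly problem concrete, here is a minimal Python sketch of how the encoded length of an instruction can depend on the M flag. It covers only the two AND encodings that come up later in this post (0x29 immediate and 0x2D absolute); the function name and the boolean flag parameter are invented for illustration.

```python
AND_IMMEDIATE = 0x29  # and #$nn or and #$nnnn, depending on the M flag
AND_ABSOLUTE = 0x2D   # and $nnnn, always a 16-bit address operand


def instruction_length(opcode: int, m_flag: bool) -> int:
    """Return the encoded length in bytes of the two AND forms above.

    m_flag=True means 8-bit accumulator/memory, False means 16-bit.
    """
    if opcode == AND_IMMEDIATE:
        # Opcode byte plus a 1-byte or 2-byte immediate operand.
        return 2 if m_flag else 3
    if opcode == AND_ABSOLUTE:
        # Opcode byte plus a 2-byte address, regardless of the M flag.
        return 3
    raise ValueError(f"opcode {opcode:#04x} is not covered by this sketch")
```

A disassembler that guesses the M flag wrong for 0x29 ends up one byte out of step with the real instruction stream, which is why tracing execution helps so much.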
Beyond that (and slightly more relevant to this post), addressing more than 64 KB of memory requires dealing with memory banking which is not fun. The 65816 has a 24-bit address bus, but most addresses are created by combining an 8-bit bank with a 16-bit offset. This is sort of similar to how the earliest x86 CPUs segment memory into 64 KB segments, except 65816 has no address overlap between different 64 KB memory banks.
24-bit 65816 address
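As a small illustration of the bank/offset split, the following Python sketch breaks a 24-bit address into its two parts and contrasts it with the 8086-style segment calculation mentioned above. The helper names are hypothetical.

```python
def split_address(addr24: int) -> tuple[int, int]:
    """Split a 24-bit 65816 address into (bank, offset)."""
    return (addr24 >> 16) & 0xFF, addr24 & 0xFFFF


def join_address(bank: int, offset: int) -> int:
    """Combine an 8-bit bank and a 16-bit offset into a 24-bit address."""
    return ((bank & 0xFF) << 16) | (offset & 0xFFFF)


def x86_real_mode_address(segment: int, offset: int) -> int:
    """8086 real-mode address: segments overlap every 16 bytes."""
    return ((segment << 4) + offset) & 0xFFFFF
```

Every 65816 (bank, offset) pair names a distinct address, while on the 8086 many segment:offset pairs alias the same byte.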
Many instructions still operate using 16-bit addresses internally, like on 6502, plus the program counter is still 16-bit. There’s a new 8-bit program bank register (PBR / K) used for instruction fetches, and a new 8-bit data bank register (DBR / B) used for instructions and addressing modes that produce a 16-bit memory address rather than 24-bit. Software needs to manually track and update these bank registers as needed. There are long jump instructions that simultaneously update PBR and PC, but regular jump instructions and conditional branch instructions cannot jump between different program banks.
The hardware stack and the direct page (65816’s replacement for the zero page) are not banked - they are always located within memory bank $00.
The SNES memory map is very much designed around the 65816’s memory banking. It’s much more useful to think of SNES memory addresses as a separate 8-bit bank and 16-bit offset rather than a single 24-bit address.
When you’re inside one of these spinning barrels, Donkey Kong Country 2 reads from addresses $2000 and $2001 in bank $B3. In some other banks these addresses would map to either the cartridge or RAM, but in bank $B3 they are not mapped to anything, so reading from them is open bus behavior. Why does the game do this?
Disassemble!
Here’s a disassembly of the part of the game code that performs the open bus read, generated from an execution trace and then edited a bit for clarity (e.g. replacing the relative branch offset with a label). This is part of a routine that’s executed once per frame, beginning when you release left/right on the gamepad while you’re in a spinning barrel:
|
|
This routine accesses a few memory addresses in the $0000-$2001 range. Some of them are through absolute addressing modes that use the current data bank of $B3, while others use direct page addressing modes that always access bank $00. The direct page itself is located at $0000 here, same as the 6502 zero page.
Banks $B3 and $00 happen to have the same memory map for this $0000-$2001 address range. In banks $00-$3F and $80-$BF, $0000-$1FFF always maps to the first 8 KB of the console’s 128 KB of working RAM (WRAM). $2000-$20FF is entirely unmapped, so the and $2000 instruction is an open bus read.
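The part of the memory map described here can be written down as a tiny classifier. This is only a sketch of the two ranges mentioned in the text; everything else is deliberately lumped into "other", and the function name and return strings are made up.

```python
def classify_low_region(bank: int, offset: int) -> str:
    """Classify an address using only the ranges discussed in the text."""
    in_system_banks = 0x00 <= bank <= 0x3F or 0x80 <= bank <= 0xBF
    if not in_system_banks:
        return "other"  # different mapping, not covered by this sketch
    if 0x0000 <= offset <= 0x1FFF:
        return "wram"  # first 8 KB of working RAM
    if 0x2000 <= offset <= 0x20FF:
        return "unmapped"  # reads here are open bus
    return "other"  # registers, cartridge, etc.
```

With this, both bytes of the 16-bit read at $B3:$2000 and $B3:$2001 land in the unmapped range, while the orientation and rotation variables land in WRAM.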
It’s using a few WRAM addresses here that seem to contain the following, based on what values the game writes to them and what it uses them for:
- $0EE6 ($48 + X=$0E9E): The current barrel orientation
- $0E0A ($0028 + Y=$0DE2): Per-frame rotation amount, as a change to barrel orientation
- $0032: Seems to be used as just a temporary variable
I imagine the exact orientation/rotation locations in WRAM are different for different barrels.
The barrel orientation appears to be on a scale where 0x0000 is pointing straight down, 0x4000 is pointing straight left, etc.
Barrel orientation values
The rotation amount determines the barrel rotation speed. For the barrel I looked at, the game sets the rotation amount to 0x0300 when rotating clockwise and 0xFD00 (-0x0300) when rotating counterclockwise. This makes a full 360 degree rotation take just over 85 frames, a little less than 1.5 seconds at 60 frames per second.
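The timing figure can be checked with a few lines of Python. This sketch assumes a full turn is 0x10000 orientation units (the full 16-bit range) and treats 0xFD00 as the two's complement encoding of -0x0300; the function name is invented.

```python
FULL_TURN = 0x10000  # one full rotation in orientation units


def frames_per_rotation(rotation_amount: int) -> float:
    """Frames needed for a full turn at the given per-frame rotation amount."""
    # Interpret values >= 0x8000 as negative 16-bit numbers and take the magnitude.
    if rotation_amount >= 0x8000:
        step = 0x10000 - rotation_amount
    else:
        step = rotation_amount
    return FULL_TURN / step


frames = frames_per_rotation(0x0300)
seconds = frames / 60
print(f"{frames:.2f} frames, {seconds:.2f} seconds")
```

Both 0x0300 and 0xFD00 give the same magnitude, so clockwise and counterclockwise rotation take the same number of frames.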
Okay, starting to step through this routine:
|
|
This part is straightforward: It loads the current orientation, adds the rotation amount, and stores the result in a temporary variable. It executes CLC (clear carry flag) before ADC (add with carry) because the 65816, like the 6502, does not have an add without carry instruction.
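The reason for the CLC is easier to see with a model of what a 16-bit ADC computes. The following is a simplified sketch that ignores decimal mode and the overflow flag; the function name is hypothetical.

```python
def adc16(accumulator: int, operand: int, carry_in: int) -> tuple[int, int]:
    """Model a 16-bit binary ADC: returns (result, carry_out)."""
    total = (accumulator & 0xFFFF) + (operand & 0xFFFF) + (carry_in & 1)
    carry_out = 1 if total > 0xFFFF else 0
    return total & 0xFFFF, carry_out


# With carry cleared (as after CLC), this is a plain 16-bit add.
print(adc16(0x3F00, 0x0300, 0))  # clockwise step
# If a stale carry were left set, the result would be off by one.
print(adc16(0x3F00, 0x0300, 1))
```

Adding 0xFD00 wraps around the 16-bit range, which is how the same ADC handles counterclockwise rotation.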
Next is the interesting part:
|
|
It XORs the updated orientation with the previous orientation, bitwise ANDs the result with an open bus read from $2000, and then branches based on whether the bitwise AND produced zero. The spinning forever bug triggers when the branch is always taken because the bitwise AND result is always zero.
On actual hardware, the 16-bit open bus read from $2000 always returns 0x2020. This is because the last byte read from the bus is always the high byte of the $2000 absolute address encoded in the instruction bytes, little-endian:
2D 00 20
Machine code for and $2000 (3 bytes)
Since the 65816 only has an 8-bit data bus, it implements 16-bit reads by performing two consecutive 8-bit reads, which in this case will both return 0x20. Hence the 16-bit value 0x2020.
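A very simplified model of this behavior, assuming a single bus latch that remembers the last byte transferred (real hardware and accurate emulators are more involved), might look like the sketch below. The class and function names are made up for illustration.

```python
class OpenBus:
    """Toy data bus that remembers the last byte placed on it."""

    def __init__(self) -> None:
        self.last_value = 0

    def fetch(self, value: int) -> int:
        """A normal read from mapped memory: the value appears on the bus."""
        self.last_value = value & 0xFF
        return self.last_value

    def read_unmapped(self) -> int:
        """A read from an unmapped address: nothing drives the bus."""
        return self.last_value


def open_bus_operand(instruction_bytes: list[int]) -> int:
    """Fetch the instruction bytes, then do a 16-bit read from open bus."""
    bus = OpenBus()
    for byte in instruction_bytes:
        bus.fetch(byte)
    low = bus.read_unmapped()   # read at $2000
    high = bus.read_unmapped()  # read at $2001
    return (high << 8) | low


print(hex(open_bus_operand([0x2D, 0x00, 0x20])))
```

An emulator that returns 0 for unmapped reads would produce 0x0000 here instead, which is what makes the later bitwise AND always come out zero.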
So, in practice, that part of the routine behaves equivalently to this:
|
|
Moving on, when the AND result is zero and the beq continue branch is taken, it does this:
|
|
It loads the pre-XOR rotated orientation from the temporary variable, writes it to the permanent orientation location in WRAM, then returns. The rotation will continue next frame.
When the AND is non-zero and the branch is not taken, it does this:
|
|
First, it zeroes out the rotation amount. 65816 has a dedicated instruction STZ (store zero) for zeroing out a memory location, but STZ doesn’t support any Y-indexed addressing modes like what the game uses here (absolute indexed Y).
Next, it loads the pre-XOR rotated orientation, adds 0x1000, and masks out all but the highest 3 bits. This is a crude but fast way of approximately rounding to the nearest multiple of 0x2000.
Finally, it writes the rounded orientation to the permanent location in WRAM and then returns.
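The rounding step on its own can be sketched in Python as follows, assuming 16-bit wraparound on the addition. The function name is invented.

```python
def round_to_eighth_turn(orientation: int) -> int:
    """Round a 16-bit orientation to the nearest multiple of 0x2000."""
    # Adding half a step (0x1000) before truncating turns "round down"
    # into "round to nearest"; 0xE000 keeps only the highest 3 bits.
    return ((orientation + 0x1000) & 0xFFFF) & 0xE000


for value in (0x3F00, 0x4100, 0x5000, 0xF000):
    print(hex(value), "->", hex(round_to_eighth_turn(value)))
```

Values just below the top of the range wrap around to 0x0000, which is the same direction as a full turn.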
All together, in a higher level language, this routine is doing this:
|
|
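As a rough Python sketch of the logic described step by step above, with the open bus value passed in as a parameter so that different emulator behaviors can be compared. All function and parameter names here are made up for illustration, and the starting orientation in the example is arbitrary.

```python
def step_barrel(orientation: int, rotation: int, bus_value: int) -> tuple[int, int]:
    """Run one frame of the routine; returns (new_orientation, new_rotation)."""
    rotated = (orientation + rotation) & 0xFFFF
    if (rotated ^ orientation) & bus_value == 0:
        # Branch taken: keep rotating next frame.
        return rotated, rotation
    # Branch not taken: stop and snap to the nearest multiple of 0x2000.
    snapped = ((rotated + 0x1000) & 0xFFFF) & 0xE000
    return snapped, 0


def frames_until_stop(orientation: int, rotation: int, bus_value: int, limit: int = 1000):
    """Return (frame_count, final_orientation), or None if it never stops."""
    for frame in range(1, limit + 1):
        orientation, rotation = step_barrel(orientation, rotation, bus_value)
        if rotation == 0:
            return frame, orientation
    return None


print(frames_until_stop(0x0000, 0x0300, 0x2020))  # hardware-like open bus value
print(frames_until_stop(0x0000, 0x0300, 0x0000))  # open bus reads returning 0
```

With a bus value of 0 the stop condition can never be met, which matches the barrel spinning forever.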
With the open bus read returning 0x2020, the XOR-then-AND result will be non-zero when adding the rotation amount to the orientation changes either bit 5 (0x0020) or bit 13 (0x2000). Given a rotation amount of 0x0300 or 0xFD00, the low 8 bits of the rotation amount are all 0, so the addition never touches bit 5 and only bit 13 can ever change.
For convenience, here’s the orientation values diagram again:
Looking at this, an orientation change of 0x2000 corresponds to a single-step change in cardinal or ordinal direction. This means that bit 13 will change when the barrel either reaches or passes over one of these 8 directions. Whether it changes upon reaching or upon passing over depends on the rotation direction, but it’s not really significant from a player perspective since it’s only a 1-frame difference and only in specific cases.
Rounding to the nearest multiple of 0x2000 ensures that the stopped barrel points exactly in a cardinal or ordinal direction, since it may have passed over the direction on the final rotation frame.
So, if you replace the open bus read with a constant 0x2000, I think this logic makes sense! When you release the d-pad, the barrel continues to rotate in the same direction until it reaches the next cardinal or ordinal direction, and then the rotation stops with the barrel pointing exactly in that direction.
Conclusion
At this point I am pretty sure the open bus read was simply caused by a typo.
I think that and $2000 instruction (absolute addressing) was supposed to be and #$2000 (immediate addressing). and $2000 just happens to work because the 16-bit open bus read from $2000 returns a value that is functionally equivalent to 0x2000 in this logic as long as the per-frame rotation amount always has its lowest 6 bits set to 0.
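The equivalence claim can be checked by brute force over every 16-bit orientation. This is a sketch with invented function names; it only compares the stop condition under the two masks.

```python
def stops_this_frame(orientation: int, rotation: int, mask: int) -> bool:
    """True if the XOR-then-AND test is non-zero for this frame."""
    rotated = (orientation + rotation) & 0xFFFF
    return (rotated ^ orientation) & mask != 0


def masks_agree(rotation: int) -> bool:
    """True if masks 0x2020 and 0x2000 behave identically for every orientation."""
    return all(
        stops_this_frame(o, rotation, 0x2020) == stops_this_frame(o, rotation, 0x2000)
        for o in range(0x10000)
    )


print(masks_agree(0x0300), masks_agree(0xFD00))  # rotation amounts seen in the game
print(masks_agree(0x0020))  # a hypothetical rotation amount with bit 5 set
```

A rotation amount with any of its lowest 6 bits set can flip bit 5 of the orientation, and that is where the two masks would start to disagree.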
The incorrect opcode is executed at bank $B3 offset $EDAC, which maps to $33EDAC in the game’s 4 MB of ROM. Changing this byte from 0x2D (AND with absolute addressing) to 0x29 (AND with immediate addressing) makes the spinning barrels work correctly even if open bus reads always return 0. The exact location in ROM probably varies between different revisions of the game; I only looked at one revision.
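For illustration, here is a hedged Python sketch of the address translation and the one-byte patch. It assumes a HiROM-style layout in which banks $80-$BF mirror ROM at (bank & 0x3F) * 0x10000 + offset, and a ROM image with no copier header; both assumptions are consistent with the offset quoted above but were not checked against any other revision. The function names are hypothetical.

```python
AND_ABSOLUTE = 0x2D
AND_IMMEDIATE = 0x29


def hirom_file_offset(bank: int, offset: int) -> int:
    """Translate a CPU bank/offset to a ROM file offset (assumed HiROM, no header)."""
    return ((bank & 0x3F) << 16) | (offset & 0xFFFF)


def patch_and_opcode(rom: bytearray, file_offset: int) -> None:
    """Replace the absolute-addressing AND opcode with the immediate form."""
    if rom[file_offset] != AND_ABSOLUTE:
        # Refuse to patch if the byte is not what we expect (e.g. other revision).
        raise ValueError(f"unexpected byte {rom[file_offset]:#04x} at {file_offset:#x}")
    rom[file_offset] = AND_IMMEDIATE


print(hex(hirom_file_offset(0xB3, 0xEDAC)))
```

The operand bytes 00 20 stay as they are, since the same two bytes encode the immediate value 0x2000 once the opcode is changed.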
This was purely an academic exercise since the game works perfectly fine in just about every SNES emulator other than the long-obsolete ZSNES, but my curiosity is satisfied.