This is the first post in a series on emulating the main Sega Genesis sound chip, the Yamaha YM2612 FM synthesis chip, also known as the OPN2.
To date, the YM2612 is pretty easily the most difficult-to-emulate sound chip that I have worked on. It’s not extremely complex in concept, but it has an incredible amount of specific details and quirks in how exactly it works, and many of them need to be emulated exactly correctly for game audio to sound correct. Debugging mistakes is also very, very difficult due to all of the modulation and feedback, where for example a minor mistake in envelope emulation can manifest as some instruments sounding completely wrong.
These posts are not going to describe how to implement a cycle-accurate YM2612 emulator (mine is not), but I will do my best to describe how the chip works at a low level, from the perspective of someone who emulated it for the first time with modern documentation and resources available.
I found a number of minor bugs and oversights in my own implementation while writing these posts, so if nothing else, writing them was useful for that!
This first post will mostly cover how the YM2612 is integrated into the Genesis and how the CPUs interface with it.
Primary Source
First, I should state that nearly all of my information on the YM2612 comes from this very long thread plus resources linked from it: https://gendev.spritesmind.net/forum/viewtopic.php?t=386
In particular, there’s a lot of great information posted by Nemesis (Exodus author) on how exactly this chip works. I do not believe I would have been able to throw together my own YM2612 implementation without the information in this thread.
Do be wary that many earlier posts in this thread contain inaccurate information that is corrected in later posts. As one of the more obvious examples, the first post on how the ADSR envelope generators work has some major errors that are corrected many pages later.
Mask of Destiny (BlastEm author) made a post that links to some of the most useful pages in that thread, as well as lots of other generally useful resources for anyone writing a Genesis emulator: https://gendev.spritesmind.net/forum/viewtopic.php?f=2&t=2227
One linked source is a translated manual for the YM2608 chip, which is very closely related to the YM2612. This is a particularly useful source, though note that the YM2608 is not exactly the same as the YM2612. I’ll try to note some of the biggest differences where they’re relevant.
Sega’s official Genesis documentation on the YM2612 is almost useless. It contains lots of major inaccuracies and omits a lot of important information.
Four-Operator FM Synthesis
I attempted to give a very high-level overview of how Yamaha FM synthesis works in a section of my previous post on Konami’s VRC7 mapper for the Famicom. The YM2612 is similar in concept, though the details are very different from VRC7 and other OPL chips.
While these chips are called “FM” (frequency modulation), the hardware implementation is really phase modulation: some sine wave generators are used to dynamically adjust the phase of other sine wave generators. This is true for both VRC7 and YM2612.
The YM2612 has 6 audio channels, each with 4 sine wave generators called operators. Having 4 operators is a very significant change from the 2-operator FM synthesis of OPL2 and OPLL/VRC7, and it enables the chip to produce a much larger variety of sounds.
A channel’s 4 operators can be arranged in 1 of 8 different configurations, called “algorithms”. The algorithm determines whether any particular operator is a carrier (contributes directly to channel output) or a modulator (phase modulates another operator). Modulators may phase modulate multiple other operators depending on the algorithm, but there is no algorithm that makes any operator both a modulator and a carrier simultaneously.
Each channel’s operator 1 (counting from 1) is unique in that it supports self-feedback, like the OPLL/VRC7’s modulator. This means that it can optionally phase modulate itself using the sum of its last two operator outputs.
The first 4 algorithm options use the first 3 operators as modulators and the 4th operator as a carrier. The second 4 algorithm options each have multiple carriers, with the last algorithm making all 4 operators carriers. For algorithms with multiple carriers, the channel output is the sum of all carrier outputs.
YM2612/YM2608 algorithms, from the YM2608 manual
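To make the carrier/modulator split concrete, here is a minimal Rust sketch of which operators contribute to channel output under each algorithm. The function names are hypothetical, the carrier sets for algorithms 4-6 follow the standard OPN algorithm diagram rather than anything spelled out in the text above, and the sketch ignores any clamping the hardware applies to the summed output.

```rust
/// Returns which of a channel's 4 operators are carriers for the given
/// algorithm (0-7). Index 0 is operator 1, index 3 is operator 4.
fn carrier_mask(algorithm: u8) -> [bool; 4] {
    match algorithm & 7 {
        // Algorithms 0-3: operators 1-3 are modulators, operator 4 is the only carrier
        0..=3 => [false, false, false, true],
        // Algorithm 4: two 2-operator stacks, so operators 2 and 4 are carriers
        4 => [false, true, false, true],
        // Algorithms 5 and 6: operators 2, 3, and 4 are carriers
        5 | 6 => [false, true, true, true],
        // Algorithm 7: all four operators are carriers
        _ => [true, true, true, true],
    }
}

/// Channel output is the sum of all carrier outputs (no clamping in this sketch).
fn channel_output(algorithm: u8, operator_outputs: [i32; 4]) -> i32 {
    carrier_mask(algorithm)
        .iter()
        .zip(operator_outputs.iter())
        .filter(|(is_carrier, _)| **is_carrier)
        .map(|(_, out)| *out)
        .sum()
}
```

A table like this only covers the final summing step. How modulators feed into other operators is a separate per-algorithm routing problem.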
The chip also includes a number of features beyond the FM synthesis operators themselves: a low frequency oscillator (LFO) used to power vibrato and tremolo effects, two hardware timers for use by software, and a DAC channel that outputs raw 8-bit PCM samples if enabled.
The YM2608 manual mentions support for ADPCM sample playback and some “rhythm” functionality, but neither of these features is present in the YM2612.
To start, let’s cover how the YM2612 is integrated into the Genesis.
Clock
The Genesis drives the YM2612 using the exact same clock signal that it uses to drive the 68000 CPU, a roughly 7.67 MHz clock (NTSC). The exact frequency is 53693175 Hz / 7 for NTSC consoles and 53203424 Hz / 7 for PAL consoles, those 8-digit numbers being the respective Genesis master clock frequencies.
The YM2612 internally divides its master clock by 6, leading to an effective clock rate of ~7.67 MHz / 6 = ~1.28 MHz in the NTSC Genesis.
In an emulator, the important thing is that the YM2612’s master clock is the 68000 CPU clock, so it should get ticked 1 internal YM2612 cycle per 6 68000 CPU clock cycles. If you’re tracking timings in Genesis master clock cycles, the YM2612 should get 1 internal cycle per 42 mclk cycles.
The YM2612 generates a full output sample once every 24 internal clock cycles…kind of. In actual hardware it repeatedly cycles through its 6 channels at a rate of 1 channel per 4 cycles and it multiplexes the channel outputs through its DAC, similar to the Famicom Namco 163 expansion audio chip (though without the audio aliasing caused by that chip’s low cycling rate). Emulators don’t usually emulate this multiplexing - they typically mix the channels instead of multiplexing them, and they generate a mixed sample once every 24 clock cycles.
This leads to an effective sample rate of about 53267 Hz for NTSC (53693175 Hz / 7 / 6 / 24) and slightly lower for PAL.
Almost everything inside the YM2612 updates at the sample clock rate or at some divider of the sample clock rate, though different components update at different times throughout the 24-cycle sampling period. Emulators do not usually emulate this very precise timing - they usually update all components at once every 24 internal cycles.
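The divider chain can be sketched in Rust as follows. This is a minimal illustration that tracks time in master clock cycles and reports how many output samples are due. `YmClock` and the constant names are hypothetical, and it updates everything once per 24 internal cycles rather than modelling the per-channel multiplexing.

```rust
const NTSC_MCLK_HZ: f64 = 53_693_175.0;
const PAL_MCLK_HZ: f64 = 53_203_424.0;

// 68000 clock = mclk / 7, YM2612 internal clock = 68000 clock / 6
const MCLK_PER_YM_CYCLE: u64 = 7 * 6;
// One mixed output sample per 24 internal YM2612 cycles
const YM_CYCLES_PER_SAMPLE: u64 = 24;

fn sample_rate_hz(mclk_hz: f64) -> f64 {
    mclk_hz / (MCLK_PER_YM_CYCLE * YM_CYCLES_PER_SAMPLE) as f64
}

struct YmClock {
    mclk_remainder: u64,
    cycles_until_sample: u64,
}

impl YmClock {
    fn new() -> Self {
        Self { mclk_remainder: 0, cycles_until_sample: YM_CYCLES_PER_SAMPLE }
    }

    /// Advances by `mclk_cycles` master clock cycles and returns the number
    /// of output samples that should be generated as a result.
    fn tick(&mut self, mclk_cycles: u64) -> u32 {
        self.mclk_remainder += mclk_cycles;
        let mut samples = 0;
        while self.mclk_remainder >= MCLK_PER_YM_CYCLE {
            self.mclk_remainder -= MCLK_PER_YM_CYCLE;
            self.cycles_until_sample -= 1;
            if self.cycles_until_sample == 0 {
                self.cycles_until_sample = YM_CYCLES_PER_SAMPLE;
                samples += 1;
            }
        }
        samples
    }
}
```

With these constants, `sample_rate_hz(NTSC_MCLK_HZ)` comes out to roughly 53267 Hz, matching the figure above.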
Interface
The Genesis has a split bus, with the 68000 CPU on one side and the Z80 CPU on the other. It has a bus arbiter for managing access from one side to the other, as well as for allowing the 68000 to control the Z80 by setting its BUSREQ and RESET lines. BUSREQ removes the Z80 from the bus so that the 68000 can freely access hardware on the Z80 side of the bus.
The Z80 can access the 68000 side of the bus at any time, but every access introduces a variable delay for both CPUs as the bus arbiter inserts wait states to avoid a bus conflict.
The Z80 is meant to be a dedicated audio processor. To support this, the YM2612 is on the Z80 side of the bus so that the Z80 can access it without needing to cross over to the 68000 side and incur difficult-to-predict delays from the bus arbiter.
(Interestingly, the SN76489 PSG chip is on the 68000 side of the bus, but SN76489 interactions are generally significantly less timing-sensitive compared to driving the YM2612’s DAC channel.)
The 68000 can access the YM2612 through its Z80 memory map window at $A00000-$A0FFFF, as long as it first removes the Z80 from the bus by setting its BUSREQ line using the bus arbiter. Some games primarily drive audio using the 68000, such as Sonic 1 - it only really uses the Z80 for playing samples through the YM2612’s DAC channel. Sonic 1 controls the YM2612 FM synthesis channels and the PSG using the 68000.
In fact, Sonic 1 is pretty much fully playable without emulating the Z80 at all! You need to emulate the bus arbiter registers at $A11100 and $A11200, and you won’t get any of the audio samples that it plays through the YM2612’s DAC channel, but you’ll get audio output from the FM synthesis channels and the PSG.
A number of Genesis games are playable without emulating the Z80 (this is not the SNES with its insanely limited 65816/SPC700 communication interface), but it’s very common for audio to be completely missing with no Z80 emulation. Sonic 1 is a bit of an anomaly in how much it drives audio from the 68000.
Write Ports
The YM2612 has four 8-bit write ports mapped to $4000-$4003 in the Z80 memory map. These ports are mirrored repeatedly throughout the address range $4000-$5FFF.
According to official documentation, these are:
- $4000: Address port, group 1 (channels 1-3 + global registers)
- $4001: Data port, group 1
- $4002: Address port, group 2 (channels 4-6)
- $4003: Data port, group 2
When software wants to write to a channel 1-3 register or a global register, it first writes the 8-bit register address to $4000, then it writes the 8-bit register value to $4001. For channel 4-6 registers, it writes the register address to $4002 and then the register value to $4003.
…That’s not actually how it works, though. In reality the chip only has one data port that’s mapped to both $4001 and $4003:
- $4000: Address port + Set group 1 flag
- $4002: Address port + Set group 2 flag
- $4001 / $4003: Data port
The chip remembers whether $4000 or $4002 was last written to, and data port writes will go to either group 1 or group 2 based on which address port was last written. Titan’s Overdrive 2 demo depends on this because it performs all of its data port writes through $4001, though I’m not sure if any games depend on this behavior.
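A minimal Rust sketch of this "one data port plus a group flag" behavior might look like the following. The type and method names are hypothetical, and the power-on state (register 0, group 1) is an assumption made only so the struct has an initial value.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Group {
    One, // channels 1-3 + global registers
    Two, // channels 4-6
}

struct WritePorts {
    selected_register: u8,
    selected_group: Group,
}

impl WritePorts {
    fn new() -> Self {
        // Assumed initial state; not taken from hardware documentation
        Self { selected_register: 0, selected_group: Group::One }
    }

    /// Handles a Z80 write anywhere in $4000-$5FFF. Returns
    /// Some((group, register, value)) when a data port write occurs.
    fn write(&mut self, address: u16, value: u8) -> Option<(Group, u8, u8)> {
        // Only the low 2 address bits matter because the ports are mirrored
        match address & 3 {
            0 => {
                self.selected_register = value;
                self.selected_group = Group::One;
                None
            }
            2 => {
                self.selected_register = value;
                self.selected_group = Group::Two;
                None
            }
            // $4001 and $4003 are the same data port; the group comes from
            // whichever address port was written last
            _ => Some((self.selected_group, self.selected_register, value)),
        }
    }
}
```

Writing an address to $4002 and then data to $4001 yields a group 2 register write here, which is the pattern the Overdrive 2 demo relies on.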
Looking at the 8-bit YM2612 register addresses, they all fall into three address ranges:
- $20-$2F: Global registers
- $30-$9F: Operator control registers
- $A0-$BF: Channel control registers
For both the $30-$9F registers and the $A0-$BF registers, the lowest 2 bits of the register address are used as the channel index. For example:
| Example Register Addresses | Group 1 Channel | Group 2 Channel | 
|---|---|---|
| $30, $34, $A0, $A4 | 1 | 4 | 
| $31, $35, $A1, $A5 | 2 | 5 | 
| $32, $36, $A2, $A6 | 3 | 6 | 
| $33, $37, $A3, $A7 | None | None | 
For the $30-$9F registers, bits 2 and 3 are used as the operator index, but bitswapped! Concretely:
- 00 = Operator 1
- 01 = Operator 3
- 10 = Operator 2
- 11 = Operator 4
The YM2608 manual correctly documents this bitswapping. Sega’s official Genesis documentation on the YM2612 is wrong in this regard.
Examples:
| Example Register Addresses | Operator | 
|---|---|
| $30, $31, $32, $40, $41, $42 | 1 | 
| $34, $35, $36, $44, $45, $46 | 3 | 
| $38, $39, $3A, $48, $49, $4A | 2 | 
| $3C, $3D, $3E, $4C, $4D, $4E | 4 | 
You could parse these out of the address like so:
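A minimal Rust sketch of that parsing, using hypothetical function names and 0-based indices (so channel 1 is index 0 and operator 1 is index 0):

```rust
/// Channel index (0-5) for a $30-$9F or $A0-$BF register, or None when the
/// low 2 bits are 3, which does not map to any channel.
fn channel_index(register: u8, group2: bool) -> Option<usize> {
    let low_bits = (register & 3) as usize;
    if low_bits == 3 {
        return None;
    }
    // Group 2 addresses channels 4-6
    Some(low_bits + if group2 { 3 } else { 0 })
}

/// Operator index (0-3) for a $30-$9F register. Bits 2 and 3 hold the
/// operator number with its two bits swapped.
fn operator_index(register: u8) -> usize {
    let bits = (register >> 2) & 3;
    // Swap the two bits: 00 -> 0, 01 -> 2, 10 -> 1, 11 -> 3
    (((bits & 1) << 1) | (bits >> 1)) as usize
}
```

For example, `operator_index(0x34)` gives 2 (operator 3) and `channel_index(0xA6, true)` gives `Some(5)` (channel 6).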
Read Port
The YM2612 has a single read port which is mapped to $4000 in the Z80 memory map. It contains 3 meaningful bits:
- Bit 7: Busy flag
- Bit 1: Timer B overflow flag
- Bit 0: Timer A overflow flag
That’s it. The other 5 bits are undefined, and the YM2612 does not expose any other information through reads.
The busy flag is most commonly used. It indicates whether the YM2612 is currently processing a register write, in which case software should wait to perform any other YM2612 register writes. When games need to write to multiple registers they’ll typically have a loop where they write to the first register, poll the busy flag until it’s 0, write to the next register, poll the busy flag until it’s 0, and repeat until all registers have been written.
The two timer overflow bits are the only way for software to get feedback from the two YM2612 timers because they’re not connected to any of the Z80’s interrupt lines. Software that uses the timers needs to poll these bits to know when a timer interval has passed.
Busy flag behavior doesn’t seem to be completely consistent between different model consoles, and it probably also depends on when exactly the register write occurs within the chip’s internal cycling between channels and operators. I leave the busy flag 1 for 32 internal YM2612 cycles after a data port write and that seems to work ok with everything I’ve tested. (Source for that number)
At least in earlier consoles, the busy flag reads as 1 for much longer than it takes the YM2612 to process the write, which it will always do within 24 YM2612 cycles and sometimes do within many fewer cycles - it depends on which register was written to. This makes the busy flag less useful than simply counting Z80 cycles in between writes, though this was likely not known to most developers back in the 80s and 90s.
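As a rough Rust sketch of the read port, the status byte can be assembled from a busy countdown and the two timer flags. `Status` and its methods are hypothetical names, the 32-cycle busy duration is simply the value mentioned above, and returning 0 for the 5 undefined bits is an assumption.

```rust
/// Internal YM2612 cycles the busy flag stays set after a data port write
const BUSY_CYCLES: u32 = 32;

struct Status {
    busy_cycles_remaining: u32,
    timer_a_overflow: bool,
    timer_b_overflow: bool,
}

impl Status {
    fn new() -> Self {
        Self { busy_cycles_remaining: 0, timer_a_overflow: false, timer_b_overflow: false }
    }

    /// Call on every data port write
    fn on_data_port_write(&mut self) {
        self.busy_cycles_remaining = BUSY_CYCLES;
    }

    /// Call as internal YM2612 cycles elapse
    fn tick(&mut self, ym_cycles: u32) {
        self.busy_cycles_remaining = self.busy_cycles_remaining.saturating_sub(ym_cycles);
    }

    /// Value returned by a read of $4000; undefined bits read as 0 here
    fn read(&self) -> u8 {
        (u8::from(self.busy_cycles_remaining != 0) << 7)
            | (u8::from(self.timer_b_overflow) << 1)
            | u8::from(self.timer_a_overflow)
    }
}
```

The timer flags would be set by the timer emulation, which is outside the scope of this sketch.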
Read Port Mirroring: Discrete YM2612 vs. YM3438
The read port is mirrored at $4001-$4003…on some consoles. There are at least two games that are highly sensitive to read port mirroring behavior: Earthworm Jim and Hellfire.
This behavior difference is based on whether the console contains a discrete YM2612 chip or a YM3438, a slightly modified CMOS version of the YM2612.
With the YM3438, $4001-$4003 mirror $4000. Reading from any of these addresses returns the busy flag and timer overflow bits. This chip was used in the Model 1 VA7, Model 2 VA0-VA1, Model 2 VA3-VA4, and Model 3 consoles.
With the discrete YM2612, reading from $4001-$4003 is officially undefined. It seems like what happens on actual hardware is that reading from $4001-$4003 returns the last value that was read from $4000, but it decays to 0 after a certain amount of time has passed since the last $4000 read. This chip was used in the Model 1 VA0-VA6 and Model 2 VA2 consoles.
Earthworm Jim occasionally reads the busy flag from $4002 instead of $4000, and on models with the discrete YM2612, this causes extremely noticeable audio stuttering. This happens because the previous $4000 read was sometimes only to poll for one of the timer overflow bits, and if the busy flag was set during that read, the game’s audio driver will spinloop polling $4002 until the status value decays to 0.
Emulating the discrete YM2612 with a decay period of around a quarter-second’s worth of cycles produces results similar to hardware recordings of this game on consoles with a discrete YM2612.
Hellfire has the opposite problem: It frequently reads the busy flag from $4001 and $4003 instead of $4000, and if it can actually read the busy flag from these addresses, the music will play much slower than it’s supposed to. This means that the music only plays correctly on consoles with a discrete YM2612.
The game just so happens to write to the YM2612 registers in such a way that even on actual hardware, none of the writes are dropped despite it not correctly reading the busy flag.
How to handle this in an emulator is an implementation decision. Making $4001-$4003 reads always return 0 would make both Earthworm Jim and Hellfire sound correct, but it’s not accurate to any actual hardware, and it could break other games. Always using discrete YM2612 behavior breaks Earthworm Jim, and always using YM3438 behavior breaks Hellfire.
The most reasonable solution is probably to offer an option of what behavior to emulate, maybe with something like an auto-detect option that automatically uses the ideal behavior for these two games. Other games generally only try to access the read port at $4000.
Both of these behaviors are accurate to actual hardware - they’re just accurate to different versions of the hardware.
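One way to make the behavior selectable is sketched below in Rust. `ChipModel` and `ReadPort` are hypothetical names, the decay period is left as a caller-supplied cycle count, and the decay is modelled as an abrupt drop to 0, which is a simplification of whatever the hardware actually does.

```rust
#[derive(Clone, Copy)]
enum ChipModel {
    DiscreteYm2612,
    Ym3438,
}

struct ReadPort {
    model: ChipModel,
    last_status_read: u8,
    cycles_since_status_read: u64,
    decay_cycles: u64,
}

impl ReadPort {
    fn new(model: ChipModel, decay_cycles: u64) -> Self {
        Self { model, last_status_read: 0, cycles_since_status_read: 0, decay_cycles }
    }

    fn tick(&mut self, cycles: u64) {
        self.cycles_since_status_read = self.cycles_since_status_read.saturating_add(cycles);
    }

    /// `current_status` is the live busy/timer status byte.
    fn read(&mut self, address: u16, current_status: u8) -> u8 {
        match (address & 3, self.model) {
            // $4000 always returns the live status; remember it for the
            // discrete chip's stale reads
            (0, _) => {
                self.last_status_read = current_status;
                self.cycles_since_status_read = 0;
                current_status
            }
            // YM3438: $4001-$4003 mirror $4000
            (_, ChipModel::Ym3438) => current_status,
            // Discrete YM2612: last $4000 read value until it decays to 0
            (_, ChipModel::DiscreteYm2612) => {
                if self.cycles_since_status_read >= self.decay_cycles {
                    0
                } else {
                    self.last_status_read
                }
            }
        }
    }
}
```

An auto-detect option could then just pick the `ChipModel` per game, with a user-facing override.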
First Audio Output: DAC Channel
The DAC channel is by far the easiest thing to emulate inside the YM2612, so let’s start with that.
The DAC channel is controlled entirely by 3 registers:
- $2A: DAC channel PCM sample (unsigned 8-bit)
- $2B: DAC channel enabled (Bit 7)
- $B6: L/R panning flags and LFO sensitivity for channels 3 (group 1) and 6 (group 2)
When enabled via register $2B, the DAC channel replaces channel 6’s output with the 8-bit PCM sample value that was last written to register $2A. It respects the channel 6 L/R panning flags in register $B6 (group 2), but otherwise none of the channel 6 configuration affects the DAC channel.
PCM samples are interpreted as unsigned 8-bit values (0-255), but for output they’re converted to signed 8-bit by applying a bias of -128. This signed 8-bit sample is then bit shifted to match the scale of FM synthesis channel outputs.
This isn’t really relevant until later, but FM channel outputs are signed 14-bit, so the DAC channel implementation is as simple as:
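A minimal Rust sketch of that conversion, with hypothetical function names. The shift amount of 6 is inferred from going from a signed 8-bit value to a signed 14-bit scale.

```rust
/// Converts the unsigned 8-bit value in register $2A to a signed 14-bit
/// sample, stored in an i16.
fn dac_sample(register_2a: u8) -> i16 {
    // Unsigned 8-bit -> signed 8-bit by applying a bias of -128
    let signed = i16::from(register_2a) - 128;
    // Signed 8-bit -> signed 14-bit scale
    signed << 6
}

/// Channel 6 outputs the DAC sample when the DAC is enabled via register
/// $2B bit 7, and its normal FM output otherwise.
fn channel6_output(dac_enabled: bool, register_2a: u8, fm_output: i16) -> i16 {
    if dac_enabled {
        dac_sample(register_2a)
    } else {
        fm_output
    }
}
```

The resulting range is -8192 to 8128, which fits in signed 14 bits. Panning from register $B6 would be applied afterwards, the same as for an FM channel.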
The YM2612’s DAC only has a 9-bit digital input, but it’s not possible to set the truncated lower 5 bits using the DAC channel, so that’s not important for now.
For generating an output sample, you can pretend that you’re mixing 6 channels whose outputs are each on an i14 scale, except right now only 1 channel is emulated:
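A minimal Rust sketch of such a mixer, assuming mono output (panning is ignored) and a hypothetical `mix` function that normalizes to the range [-1.0, 1.0):

```rust
/// Mixes 6 channel outputs, each on a signed 14-bit scale (-8192..=8191),
/// into a single floating-point sample.
fn mix(channel_outputs: [i16; 6]) -> f64 {
    let sum: f64 = channel_outputs
        .iter()
        // Scale each channel from signed 14-bit to roughly [-1.0, 1.0)
        .map(|&sample| f64::from(sample) / 8192.0)
        .sum();
    // Divide by the channel count so the mixed sample cannot clip
    sum / 6.0
}
```

With only the DAC channel emulated, the other 5 entries would simply be 0.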
You could also accumulate the channel outputs into an i32 instead of converting to floating-point, though you’ll likely want to convert to floating-point eventually in order to ensure that volume is scaled correctly.
Wire this up to an audio output, and as long as your emulated Z80 timing is fairly accurate, this is enough to (technically) get some YM2612 audio in games, including the iconic “SEGA” intro sound in the Sonic games.
It’s a start!
DAC channel output tends to sound very crunchy and noisy. This is partly because there’s no FIFO or anything to buffer incoming samples - games must constantly send samples to the YM2612 at the desired sample rate using very carefully timed code.
Playing at high sample rates doesn’t leave the Z80 with much time to do anything else, plus the timing gets thrown off by bus arbiter delays if the Z80 ever needs to access the 68000 side of the bus (e.g. to read from cartridge ROM). Also, the 68000 needs to remove the Z80 from the bus to safely read from the controller ports, which pretty much every game does at least once per frame.
Audio quality is further degraded by the chip effectively nearest-neighbor resampling up to 53267 Hz. This introduces lots of additional audio aliasing and noise, particularly if the source data is at a very low sample rate.
To Be Continued
The next post will cover a probably more interesting topic: exactly how the phase generators in the FM synthesis channels work.