Sega Megadrive – 5: Fonts and Text

The Hello World example was pretty simplistic, with only the necessary font glyphs created and all of the tile IDs hard coded to write the phrase. It can be taken a few steps further without too much work, allowing us to write arbitrary strings at any tile coordinates, in a variety of colours.

First, we need a complete font. I’ll abandon my embarrassing programmer art and instead convert a nice, tidy, open-source font to the pattern format, and keep it in a separate file to be loaded and dumped at any time using some Load/Unload routines – like how I’d expect to work with other art assets in the future. This also means I’ll need to deal with organising some locations in VDP memory to store arbitrary pieces of art, since up until now I’ve been uploading patterns to VRAM address 0x000, and this won’t work when dealing with more than one asset.

Secondly, I’ll write a text display subroutine, which accepts the font, string, X and Y coordinates and colour palette as parameters, which will be used to build the tile descriptor words before sending them to the VDP.

The font

I’ll need a nice font. For now I’ll be converting the font into a bitmap format and using a tool to convert each glyph into an assembly snippet, but if I need to do this sort of thing often I might put my C++ skills into gear and write a tool to do it automatically. I like tools. I won’t use every known character – after all, we’ve only got 64 KB of graphics memory to play with, and it’s unlikely I’ll be making use of characters other than alphanumerics, full stop, comma and a few others. If I happen to need any more, I can always go back and add them at a later date.

The font needs to be perfectly legible at a size of 7×7 (8×8 tiles, but leaving a one-line space). It wasn’t easy to find one that matches the specification that’s also free to use, but lo and behold I found an absolute beauty – a 7×7 pixel font under the Creative Commons license (link to the font and license in references section):

It needs a bit of tidying up first – I won’t make use of the smaller alpha characters, nor will I need all of those special characters, and a few of them don’t look like they’re 7×7 pixels in size either, but it’s a great start. Here’s my trimmed and corrected version – I’ve removed unnecessary characters, resized the brackets, created a new forward-slash from scratch, and aligned each character to an 8×8 grid (taking care that the bottom and right line of each cell remained blank, except for the comma):

Next, it needs converting to pattern data. For this, I used a tool called BMP2Tile, which dumps out tile data in assembly. To use this, I exported the font as a BMP file, opened it up in BMP2Tile, pressed the * key to select the entire image, then File -> Save Tiles -> In ASM. It dumps out a file containing each tile in ASM format, but it needed a few corrections. I removed the size metadata (I’ll be writing my own) and replaced all 0’s with 1’s, and all F’s with 0’s, so that the background is transparent and the text uses colour 1. I could also go one step further and fill in the font face with a different colour. I might backtrack and do this at a later date, but for now I don’t want to waste any more palette entries on just a font.
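The clean-up step is mechanical, so it could be scripted. Here is a minimal sketch in Python, assuming each exported row arrives as an 8-digit hex string where 0 is ink and F is background, as described above. The function names and the sample rows are made up for illustration and are not part of BMP2Tile.

```python
def convert_rows(rows):
    """Swap the exported colours: ink '0' -> colour 1, background 'F' -> colour 0."""
    swap = {"0": "1", "F": "0"}
    out = []
    for row in rows:
        if len(row) != 8:
            raise ValueError("a pattern row is 8 pixels (8 hex digits) wide")
        out.append("".join(swap[ch] for ch in row.upper()))
    return out

def to_asm(rows):
    """Format converted rows as dc.l lines, one longword per pixel row."""
    return ["dc.l    $" + row for row in convert_rows(rows)]

# Hypothetical exported rows for the top of a glyph (not real tool output)
example = ["F00000FF", "00FFF00F"]
for line in to_asm(example):
    print(line)
```

Doing both substitutions in a single pass through a lookup table avoids any dependence on the order of the two find-and-replace steps.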

Font attributes

As mentioned earlier, I’ll need to solve the problem of fitting more than one asset into the VDP at a time – I can’t just write artwork to VRAM address 0x000, there needs to be some organisation of what will fit where. To do this, and be able to refer to the correct tile IDs when setting up plane tiles, we need to know the address of the font, the size of the font in tiles, and the index of the first tile. Instead of sitting there counting it all, we can make use of the assembler’s preprocessor:

PixelFont: ; Font start address

dc.l    $01111100
dc.l    $11000110
dc.l    $10111010
dc.l    $10000010
dc.l    $10111010
dc.l    $10101010
dc.l    $11101110
dc.l    $00000000

; Rest of font data...

PixelFontEnd                                 ; Font end address
PixelFontSizeB: equ (PixelFontEnd-PixelFont) ; Font size in bytes
PixelFontSizeW: equ (PixelFontSizeB/2)       ; Font size in words
PixelFontSizeL: equ (PixelFontSizeB/4)       ; Font size in longs
PixelFontSizeT: equ (PixelFontSizeB/32)      ; Font size in tiles
PixelFontVRAM:  equ 0x0100                   ; Dest address in VRAM
PixelFontTileID: equ (PixelFontVRAM/32)      ; ID of first tile

Now we have some defines for all of the font’s sizes and addresses in various units, and they’ll be correct wherever we include the font file in code. I’ve chosen the arbitrary VDP address 0x0100 to upload the font to, simply as a demonstration (and to make sure the addressing works correctly when I implement the code), but I’m sure when I start making use of more artwork I’ll need to sit and plan the VDP’s memory layout properly.
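To double-check the arithmetic behind those equ lines, here is a small Python sketch that mirrors them. The 50-tile font size is an assumed figure for illustration only; the real value comes from PixelFontEnd-PixelFont at assembly time.

```python
TILE_BYTES = 32  # 8 rows x 4 bytes per row (one nybble per pixel)

def font_layout(size_bytes, vram_addr):
    """Mirror the equ lines: sizes in each unit plus the ID of the first tile."""
    if size_bytes % TILE_BYTES or vram_addr % TILE_BYTES:
        raise ValueError("size and VRAM address must be multiples of 32 bytes")
    return {
        "size_w": size_bytes // 2,           # words
        "size_l": size_bytes // 4,           # longwords
        "size_t": size_bytes // TILE_BYTES,  # tiles
        "tile_id": vram_addr // TILE_BYTES,  # ID of first tile
    }

# Assumed example: a 50-tile font uploaded to VRAM address 0x0100
layout = font_layout(50 * TILE_BYTES, 0x0100)
print(layout)
```

With the font at 0x0100 the first tile ID comes out as 8, which is the value every tile ID from the font has to be offset by when drawing.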

LoadFont subroutine

This shouldn’t be too difficult, since I did it in the last article, but this time we need to specify arbitrary fonts of any size, from any location, to any destination. This means we need to pass some parameters to a subroutine. There are a few ways of achieving this – move the parameters to registers, or push data to the stack. Moving params to registers is the quickest method (in terms of clock cycles), but we only have a limited number of registers, and when the game code starts to get complex it would be difficult to juggle all of the registers around. The latter method allows us to specify a large number of parameters, but since the subroutine would still need to make use of some registers internally we’d need some way of backing up and restoring them when entering and exiting the subroutine. For simplicity’s sake, I’ll go with the former method – moving parameters to registers – and if it starts to cause issues at a later date I’ll backtrack and change it.

Here’s what I came up with:

LoadFont:
   ; a0 - Font address (l)
   ; d0 - VRAM address (w)
   ; d1 - Num chars (w)

   swap     d0                   ; Shift VRAM addr to upper word (works for addresses below 0x4000)
   add.l    #vdp_write_tiles, d0 ; VRAM write cmd + VRAM destination address
   move.l   d0, vdp_control      ; Send address to VDP cmd port

   subq.w   #0x1, d1             ; Num chars - 1 (dbra counts a word down to -1)
@CharCopy:
   move.w   #0x07, d2            ; 8 longwords in tile, minus 1 for dbra
@LongCopy:
   move.l   (a0)+, vdp_data      ; Copy one line of tile to VDP data port
   dbra     d2, @LongCopy        ; Next line of this tile
   dbra     d1, @CharCopy        ; Next tile

   rts


I’ve also defined the VDP control and data ports, as well as the VDP tile write command + address, since they’re likely to be used often. Using the subroutine should be pretty simple:

; Load font
lea        PixelFont, a0       ; Move font address to a0
move.l    #PixelFontVRAM, d0   ; Move VRAM dest address to d0
move.l    #PixelFontSizeT, d1  ; Move number of characters (font size in tiles) to d1
jsr        LoadFont            ; Jump to subroutine

As long as a palette has been uploaded too, we can use the Regen debugger to view the contents of VRAM and confirm that everything is in its right place:

Mapping ASCII characters

My aim is to be able to write arbitrary strings, defined in the ROM somewhere. The assembler encodes text characters as ASCII, which means I’ll need some method of converting each ASCII character to the font’s tile IDs. In my first attempt at this, I was only using alpha characters, and since character A in ASCII is 65 I could get away with just subtracting 65 from each byte in the string. Now that I’ve introduced numerical and special characters, I’ll need to come up with something else. I intend to ensure that every font I make sticks to the same characters and layout, so the simplest method would be to create a table which maps ASCII codes to tile IDs of the font. It certainly won’t be the fastest method, since it means using a lookup table when drawing every character, but it’ll do for now. Perhaps a better method would be to encode the string itself to match the font tile IDs, but that would complicate development. If I need to do some optimisation, I’ll look into it.

ASCIIStart: equ 0x20 ; First ASCII code in table

ASCIIMap:
dc.b 0x00   ; SPACE (ASCII code 0x20)
dc.b 0x28   ; ! Exclamation mark
dc.b 0x2B   ; " Double quotes
dc.b 0x2E   ; # Hash
dc.b 0x00   ; UNUSED
dc.b 0x00   ; UNUSED
dc.b 0x00   ; UNUSED
dc.b 0x2C   ; ' Single quote
dc.b 0x29   ; ( Open parenthesis
dc.b 0x2A   ; ) Close parenthesis
dc.b 0x00   ; UNUSED
dc.b 0x2F   ; + Plus
dc.b 0x26   ; , Comma
dc.b 0x30   ; - Minus
dc.b 0x25   ; . Full stop
dc.b 0x31   ; / Slash or divide
dc.b 0x1B   ; 0 Zero
dc.b 0x1C   ; 1 One
dc.b 0x1D   ; 2 Two
dc.b 0x1E   ; 3 Three
dc.b 0x1F   ; 4 Four
dc.b 0x20   ; 5 Five
dc.b 0x21   ; 6 Six
dc.b 0x22   ; 7 Seven
dc.b 0x23   ; 8 Eight
dc.b 0x24   ; 9 Nine
dc.b 0x2D   ; : Colon
dc.b 0x00   ; UNUSED
dc.b 0x00   ; UNUSED
dc.b 0x00   ; UNUSED
dc.b 0x00   ; UNUSED
dc.b 0x27   ; ? Question mark
dc.b 0x00   ; UNUSED
dc.b 0x01   ; A
dc.b 0x02   ; B
dc.b 0x03   ; C
dc.b 0x04   ; D
dc.b 0x05   ; E
dc.b 0x06   ; F
dc.b 0x07   ; G
dc.b 0x08   ; H
dc.b 0x09   ; I
dc.b 0x0A   ; J
dc.b 0x0B   ; K
dc.b 0x0C   ; L
dc.b 0x0D   ; M
dc.b 0x0E   ; N
dc.b 0x0F   ; O
dc.b 0x10   ; P
dc.b 0x11   ; Q
dc.b 0x12   ; R
dc.b 0x13   ; S
dc.b 0x14   ; T
dc.b 0x15   ; U
dc.b 0x16   ; V
dc.b 0x17   ; W
dc.b 0x18   ; X
dc.b 0x19   ; Y
dc.b 0x1A   ; Z (ASCII code 0x5A)

There we go, ASCII characters from 0x20 to 0x5A, mapped to font tile IDs. When looking them up, I’ll need to subtract 0x20 from the ASCII code to get the table index, so I’ve also defined this for readability.
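The table can be cross-checked with a short Python sketch. It rebuilds the same mapping from the glyph order in the font (read off the table above, tile 0 being the blank) and then performs the lookup the way the drawing code will: subtract the first ASCII code, then index the table. The names here are my own and exist only for this example.

```python
ASCII_START = 0x20
ASCII_END = 0x5A
# Glyph order in the font, tile 0 first (taken from the table above)
FONT_ORDER = " ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.,?!()\"':#+-/"

def build_ascii_map():
    """One entry per ASCII code from 0x20 to 0x5A; unused codes map to tile 0."""
    table = []
    for code in range(ASCII_START, ASCII_END + 1):
        index = FONT_ORDER.find(chr(code))
        table.append(index if index >= 0 else 0)
    return table

def tile_for(char, table):
    """Subtract the first ASCII code to get the table index, then look it up."""
    return table[ord(char) - ASCII_START]

ascii_map = build_ascii_map()
print([hex(tile_for(c, ascii_map)) for c in "A9/"])
```

Generating the table from a single glyph-order string would also make it harder for the map and the font to drift apart if glyphs are added later.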

Drawing text

The methods used to get the text on screen should be very similar to the previous article – set up the Plane A tile IDs. We already have the ID of the first tile in VRAM (PixelFontTileID), so we just need to offset that by the tiles in the ASCII map. For the time being, I’ll be looking up the table whilst it is still in ROM, but I have doubts about the speed of reading data from cartridge so in future I may move the table into a location in main RAM to make the lookups faster (unless, of course, I discover that there’s no major difference). The same may go for the string data itself.

The first step is to calculate the destination address in VRAM. Since I plan to support specifying the X and Y coordinates in tiles, the address needs to be offset by 64 for each row of tiles down (the Y coordinate, with the plane being 64 tiles wide here), plus 1 for each tile across (the X coordinate):

DrawTextPlaneA:
   ; a0 (l) - String address
   ; d0 (w) - First tile ID of font
   ; d1 (bb)- XY coord (in tiles)
   ; d2 (b) - Palette

   clr.l    d3                     ; Clear d3 ready to work with
   move.b   d1, d3                 ; Move Y coord (lower byte of d1) to d3
   mulu.w   #0x0040, d3            ; Multiply Y by plane width (64 tiles per row) to get Y offset
   ror.l    #0x8, d1               ; Rotate X coord from upper to lower byte of d1
   add.b    d1, d3                 ; Add X coord to offset
   mulu.w   #0x2, d3               ; Convert to bytes (each tile descriptor is one word)
   swap     d3                     ; Shift address offset to upper word
   add.l    #vdp_write_plane_a, d3 ; Add PlaneA write cmd + address
   move.l   d3, vdp_control        ; Send to VDP control port

It’s the most complex thing I’ve written yet, but hopefully the comments explain it well enough. There’s a new opcode here – ror (rotate right) – which rotates bits to the right by a specified count (1 to 8 when given as an immediate value). Here, ror.l #0x08, d1 is used to move the X coord from the upper to the lower byte of the word in d1, since the swap opcode can only operate on a longword, swapping its two words around. Bits that fall off the least significant end are brought back round to the most significant end, whose position is determined by the operation size (so a byte-sized ror with a count of 1 on 0000 0001 would give us 1000 0000). There’s also a corresponding rol (rotate left) opcode, which comes in handy below. The offset is doubled to turn it into a byte offset (since the tile descriptors are 1 word in size) and added to the ‘write to plane A’ VDP command + address, which I’ve defined for ease of use.
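The same sequence can be modelled in Python to check the numbers by hand. This is only a sketch: VDP_WRITE_PLANE_A is assumed to be 0x40000003 (the VRAM write command for plane A at 0xC000 from the previous article), and the function names are invented for the example.

```python
PLANE_WIDTH_TILES = 64          # tiles per row, as used by the mulu above
VDP_WRITE_PLANE_A = 0x40000003  # assumed: VRAM write command for address 0xC000

def ror32(value, count):
    """Rotate a 32-bit value right; bits leaving bit 0 re-enter at bit 31."""
    count %= 32
    value &= 0xFFFFFFFF
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF

def plane_a_command(xy):
    """xy holds X in bits 8-15 and Y in bits 0-7, like d1 does."""
    y = xy & 0xFF
    x = ror32(xy, 8) & 0xFF                   # X ends up in the low byte
    offset = (y * PLANE_WIDTH_TILES + x) * 2  # two bytes per tile descriptor
    return (VDP_WRITE_PLANE_A + (offset << 16)) & 0xFFFFFFFF

print(hex(ror32(0x0501, 8)))
print(hex(plane_a_command(0x0501)))
```

For tile (5, 1) the byte offset is (1 × 64 + 5) × 2 = 0x8A, so the command targets VRAM address 0xC08A.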

Next, we need to set up the word-sized tile descriptor, which contains the palette ID, the pattern ID, and flip bits (not used here). The palette ID fits into two bits, and belongs in bits 13 and 14 of the tile descriptor word, so we’ll start with that. I can use the rol opcode for this, but since an immediate rotate can only move bits a maximum of 8 places at a time, it’ll need doing twice in order to shift the ID up 13 bits:

   clr.l    d3                     ; Clear d3 ready to work with again
   move.b   d2, d3                 ; Move palette ID (lower byte of d2) to d3
   rol.l    #0x8, d3               ; Rotate palette ID left by 8 places...
   rol.l    #0x5, d3               ; ...and 5 more, into bits 13 and 14 (max 8 places per instruction)
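As a quick check of where the bits land, here is the same packing in Python. It is a sketch under the assumption that the priority and flip bits stay at zero; a left shift of 13 has the same effect as the two rol instructions because the upper bits of d3 were cleared first.

```python
def tile_descriptor(palette, tile_id):
    """Pack a palette ID (0-3) and a pattern ID (0-2047) into one word."""
    if not 0 <= palette <= 3:
        raise ValueError("palette ID must fit in two bits")
    if not 0 <= tile_id <= 0x7FF:
        raise ValueError("pattern ID must fit in eleven bits")
    return (palette << 13) | tile_id

print(hex(tile_descriptor(3, 0x09)))
```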

Now we need to loop round each byte in the string, adding the pattern ID of the text glyph to d3, before sending the complete tile descriptor word to the VDP. Our exit case for the loop will be a string terminator 0x0 (so we also need to make sure our strings actually end in 0x0), and along the way we need to convert the ASCII byte to a pattern ID using the ASCII table:

   lea      ASCIIMap, a1           ; Load address of ASCII map into a1
   clr.w    d2                     ; Clear d2 so it is safe to use as a word-sized index

@CharCopy:
   move.b   (a0)+, d2              ; Move ASCII byte to lower byte of d2
   cmp.b    #0x0, d2               ; Test if byte is zero (string terminator)
   beq.b    @End                   ; If byte was zero, branch to end

   sub.b    #ASCIIStart, d2        ; Subtract first ASCII code to get table entry index
   and.w    #0x6000, d3            ; Keep only the palette bits, dropping the previous tile ID
   move.b   (a1,d2.w), d3          ; Move tile ID from table (index in lower word of d2) to lower byte of d3
   add.w    d0, d3                 ; Offset tile ID by first tile ID in font
   move.w   d3, vdp_data           ; Move palette and pattern IDs to VDP data port
   jmp      @CharCopy              ; Next character

@End:
   rts

Hopefully it should be self-explanatory, with the exception of that move.b (a1,d2.w), d3 line. The parentheses mean that the source address of the move is offset – so we’re moving the byte at address a1 + d2 to d3. This is how array access is done in 68k assembler. I haven’t yet tested it, but I’m assuming the same can be done for destination addresses, so offsets into the array can be written to as well.

The subroutine relies on the string being zero-terminated, otherwise it will keep looping until it finds a zero byte, displaying garbage along the way. For each string, I’ll need to remember to append the zero manually, unlike in languages like C where strings inside double-quotes are automatically one byte longer than the string as defined, to hold the terminator.


Since the font includes the " character, if we want to use it in a string constant we need the equivalent of an ‘escape character’ in C, which is to prefix the " with another ". This seems to be unique to the ASM68K assembler; other assemblers use the C escape characters.

Here’s the finished result, showing off a few different strings, colour palettes and X/Y coordinates:

; Load font
lea       PixelFont, a0        ; Move font address to a0
move.l    #PixelFontVRAM, d0   ; Move VRAM dest address to d0
move.l    #PixelFontSizeT, d1  ; Move number of characters (font size in tiles) to d1
jsr       LoadFont             ; Jump to subroutine

; Draw text
lea       String1, a0          ; String address
move.l    #PixelFontTileID, d0 ; First tile id
move.w    #0x0501, d1          ; XY (5, 1)
move.l    #0x0, d2             ; Palette 0
jsr       DrawTextPlaneA       ; Call draw text subroutine

lea       String2, a0          ; String address
move.l    #PixelFontTileID, d0 ; First tile id
move.w    #0x0502, d1          ; XY (5, 2)
move.l    #0x1, d2             ; Palette 1
jsr       DrawTextPlaneA       ; Call draw text subroutine

lea       String3, a0          ; String address
move.l    #PixelFontTileID, d0 ; First tile id
move.w    #0x0503, d1          ; XY (5, 3)
move.l    #0x2, d2             ; Palette 2
jsr       DrawTextPlaneA       ; Call draw text subroutine

lea       String4, a0          ; String address
move.l    #PixelFontTileID, d0 ; First tile id
move.w    #0x0504, d1          ; XY (5, 4)
move.l    #0x3, d2             ; Palette 3
jsr       DrawTextPlaneA       ; Call draw text subroutine

lea       String5, a0          ; String address
move.l    #PixelFontTileID, d0 ; First tile id
move.w    #0x0106, d1          ; XY (1, 6)
move.l    #0x3, d2             ; Palette 3
jsr       DrawTextPlaneA       ; Call draw text subroutine

lea       String6, a0          ; String address
move.l    #PixelFontTileID, d0 ; First tile id
move.w    #0x0107, d1          ; XY (1, 7)
move.l    #0x3, d2             ; Palette 3
jsr       DrawTextPlaneA       ; Call draw text subroutine

  ; Text strings (zero terminated)
  dc.b "0123456789",0
  dc.b ",.?!()""':#+-/",0
  dc.b "OVER THE LAZY DOG",0

  ; Include art assets
  include 'fonts\pixelfont.asm'

There are plenty of improvements which can be made in future – there’s only support for uppercase letters (although the ASCII table could map any lowercase characters to the uppercase pattern IDs just for completeness), and there’s no text wrapping at the end of a line (although perhaps some higher-level UI code could handle that). It would also be quite easy to specify a font’s colour in the LoadFont subroutine, which would just replace any 1’s in the patterns as it copies.

It’s also unlikely that the code will be the fastest and most optimal method to do this sort of thing, but I’m still learning.



This source contains some corrections and improvements to init.asm posted previously:


Sega Megadrive – 4: Hello, world!

Time to get serious. I’ve got as far as getting my assembler, emulator and debugger working, I’ve learned some basics of 68000 assembly language, and the Megadrive is now initialised and ready to do something. Unfortunately this step wasn’t any easier, the VDP is a complicated beast to get going and has many quirks. Anyway, the aim of this article – however long – is to explain how I got “HELLO WORLD” on screen.

It’s not as simple as printf("Hello, world!"). The machine has no standard I/O library, no debug text system, and no concept of a font whatsoever. Tiles representing text glyphs need to be created from scratch and moved to the correct positions on the VDP, as do the colour palettes used to paint them. I’ll make a start with all of the theory that I’ve learned on palettes, patterns and planes first.


The Megadrive’s VDP represents a colour in 9 bits, using 3 bits each for the red, green and blue components. With 3 bits, each component has 8 possible values, therefore the VDP is capable of displaying 512 colours. Colours must be predefined, and are stored in a section of VDP memory in tables of 16 colours – called palettes. This section of memory is called CRAM (colour RAM), and there’s space for 64 colour entries, therefore the VDP can store 4 palettes of 16 colours at any one time. The palettes can be swapped in and out from main RAM at any time, so this isn’t a global restriction throughout the life of the program. A typical palette is defined something like this:

Palette:
   dc.w 0x0000 ; Colour 0 - Transparent
   dc.w 0x000E ; Colour 1 - Red
   dc.w 0x00E0 ; Colour 2 - Green
   dc.w 0x0E00 ; Colour 3 - Blue
   dc.w 0x0000 ; Colour 4 - Black
   dc.w 0x0EEE ; Colour 5 - White
   dc.w 0x00EE ; Colour 6 - Yellow
   dc.w 0x008E ; Colour 7 - Orange
   dc.w 0x0E0E ; Colour 8 - Pink
   dc.w 0x0808 ; Colour 9 - Purple
   dc.w 0x0444 ; Colour A - Dark grey
   dc.w 0x0888 ; Colour B - Light grey
   dc.w 0x0EE0 ; Colour C - Turquoise
   dc.w 0x000A ; Colour D - Maroon
   dc.w 0x0600 ; Colour E - Navy blue
   dc.w 0x0060 ; Colour F - Dark green

…and that looks like this:

The colour names were guesses, I’m no artiste. Entry 0 of any palette is used to determine a transparent pixel, and is used as the background colour by default.
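The palette words above follow a 0x0BGR layout, with each 3-bit component sitting in the top three bits of its nybble. A minimal Python sketch of that packing, with a function name of my own choosing:

```python
def cram_colour(red, green, blue):
    """Pack 3-bit components (0-7) into a CRAM word laid out as 0000BBB0GGG0RRR0."""
    for component in (red, green, blue):
        if not 0 <= component <= 7:
            raise ValueError("each component is 3 bits (0-7)")
    return (blue << 9) | (green << 5) | (red << 1)

print(hex(cram_colour(7, 0, 0)))  # full red
print(hex(cram_colour(7, 4, 0)))  # the orange entry from the palette above
```

This is why every hex digit in the palette is even: the lowest bit of each nybble is unused.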


Patterns are blocks of image data 8 x 8 pixels in size. Each pixel colour is represented using one nybble – the ID of the colour inside a palette – so a pattern can be represented in 8 longwords of data. Here’s an example, the letter H:

   dc.l 0x11000110
   dc.l 0x11000110
   dc.l 0x11000110
   dc.l 0x11111110
   dc.l 0x11000110
   dc.l 0x11000110
   dc.l 0x11000110
   dc.l 0x00000000

Assuming this pattern uses the palette given in the example above (so colour 1 represents red) and the background colour was white (colour 0 represents transparency, regardless of the value in the palette), we’d expect it to look like this if laid out on a grid:

If the 1’s were replaced with 2, it would be a green H, and if the 0’s were replaced with D it would sit on a maroon background. I haven’t utilised all of the space – there’s one line blank to the right and bottom of the glyph – so that when font patterns sit adjacent to each other there’s a very small gap between them, keeping them legible.
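To see how the eight longwords map to pixels, here is a small Python sketch that splits each longword into its eight nybbles and prints the glyph as text. The helper names are invented for this example.

```python
H_PATTERN = [0x11000110] * 3 + [0x11111110] + [0x11000110] * 3 + [0x00000000]

def pattern_rows(longwords):
    """Split each longword into 8 colour indices, leftmost pixel first."""
    rows = []
    for value in longwords:
        rows.append([(value >> shift) & 0xF for shift in range(28, -4, -4)])
    return rows

def render(longwords):
    """Draw colour 0 as '.' and any other colour as its hex digit."""
    return ["".join("." if p == 0 else format(p, "X") for p in row)
            for row in pattern_rows(longwords)]

for line in render(H_PATTERN):
    print(line)
```

The blank right-hand column and bottom row show up as dots, which is the one-pixel gap between adjacent glyphs.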


A plane is a kind of canvas, and the Megadrive’s VDP has 4 of them – two scrolling planes (plane A and plane B), a window plane, and a sprite plane. The scrolling and window planes can display grids made up of tiles of image patterns, positioned at predetermined cells depending on the VDP’s display mode (32×28 or 40×28 cells). The two scrolling planes can scroll lines of pixels (or groups of lines), or the entire contents left or right. The window plane is still a mystery to me, it can be moved around using the X and Y position in VDP registers 17 and 18, but it cannot overlap plane A. I don’t quite understand how it CAN’T overlap plane A, since the A and B planes can only scroll and not move around in their entirety. I’ll revisit this later.

The sprite plane can display patterns at arbitrary X and Y coordinates, and flip them vertically or horizontally. It also features priorities for each sprite, so their draw order can be defined. I’ll write up more about sprites in a further article, there’s quite a lot to them and since I’ll be doing the text display on plane A they’re beyond the scope of this post.

Preparing the VDP for writing data

This bit hurt my brain. In its basic form, moving palette and pattern data to the VDP comprises two operations: set the operation type and destination address through the control port, then move the data through the data port. Sounds simple, but the operation type and address need to be amalgamated into one longword, with a rather obscure bit structure. I’ll try to explain as best as I understand it myself. Here’s the operation/address longword split up into bits and nybbles:

BBAA AAAA AAAA AAAA 0000 0000 BBBB 00AA

The A’s hold the destination address, the B’s hold the operation type, and the 0’s are always 0. Let’s start with the address. The bits for the destination address need to be laid out in this pattern:

--DC BA98 7654 3210 ---- ---- ---- --FE

where 0 is the least significant bit, F is the most significant. For example, if we wanted to write to the VDP’s memory at address 0xC000 (which is the address of Plane A’s tile information, set via register 2 in our initialisation code), we’d first convert the address to a binary word:

1100 0000 0000 0000

and then rearrange it according to the bit template above:

0000 0000 0000 0000 0000 0000 0000 0011

Next, we need to set up the other bits to describe the type of operation we’re performing. Using six bits (written here with bit 0 on the left and bit 5 on the right), we can describe the following operations:

  • 000000 – VRAM Read
  • 100000 – VRAM Write
  • 000100 – CRAM Read
  • 110000 – CRAM Write
  • 001000 – VSRAM Read
  • 101000 – VSRAM Write

These also need to be laid out into a specific order:

10-- ---- ---- ---- ---- ---- 5432 ----

So if we need to write to a VRAM address, we get:

01-- ---- ---- ---- ---- ---- 0000 ----

Put the address and the operation type together, and we get:

0100 0000 0000 0000 0000 0000 0000 0011

which in HEX is 0x40000003. Now we can move it to the VDP’s control port (I/O address 0x00C00004) to tell it we’re about to write data to VRAM address 0xC000:

move.l #0x40000003, 0x00C00004

I’m really not sure why this has to be so complex. Perhaps the bits are laid out in order of importance, so that they can be immediately acted on before the rest of the data is received. Perhaps we’re able to write a single word or byte to describe certain operations plus a small amount of data, so the bit layout needs to support this. For example, you only need to write a word of data to tell the VDP to change a register value. In any case, working this out is a bit of a pain when working regularly with the VDP, so I managed to find a JavaScript tool to calculate the longword for me. You’ll find it in the references section below.
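The calculation itself is short enough to sketch in Python, following the two bit templates above. Note that the operation codes are written here as ordinary binary numbers with bit 0 on the right, so VRAM write is 0b000001. The function and constant names are my own, not from any existing tool.

```python
# Operation codes as plain integers (bit 0 is the least significant bit)
VRAM_READ, VRAM_WRITE = 0b000000, 0b000001
CRAM_READ, CRAM_WRITE = 0b001000, 0b000011
VSRAM_READ, VSRAM_WRITE = 0b000100, 0b000101

def vdp_command(op, address):
    """Build the control-port longword for an operation and a 16-bit address."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    return (((op & 0x03) << 30)            # op bits 1-0 -> bits 31-30
            | ((address & 0x3FFF) << 16)   # address bits 13-0 -> bits 29-16
            | ((op & 0x3C) << 2)           # op bits 5-2 -> bits 7-4
            | (address >> 14))             # address bits 15-14 -> bits 1-0

print(hex(vdp_command(VRAM_WRITE, 0xC000)))
```

Running it for a VRAM write to 0xC000 reproduces the 0x40000003 worked out by hand above.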

Once the operation type and destination address have been written to the control port, the data itself can be written to the data port. The VDP data port accepts data in bytes or words only, so if we need to write more data than that (which in 99% of cases, we will) then we could either increment the address manually and write it to the control port again, or make use of a feature called autoincrement. Autoincrement will – as the name suggests – automatically increment the destination address after each write to the port. Not only does this mean we can feed the data port a whole stream of information in one go, but it also means we can perform a longword write to the port, and it will be treated as two separate word writes. To enable autoincrement, we set the autoincrement register (VDP register 15) to the number of bytes we’d like it to increment by, which I’ll set to 2 and leave alone:

   move.w #0x8F02, 0x00C00004   ; Set autoincrement to 2 bytes
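The word 0x8F02 packs a register number and a value together: the top bits mark it as a register write, the next five bits hold the register number, and the low byte holds the value. A Python sketch of that packing, with a helper name made up for this example:

```python
def vdp_set_register(register, value):
    """Build the control-port word that sets one VDP register."""
    if not 0 <= register <= 0x1F:
        raise ValueError("register number must fit in five bits")
    if not 0 <= value <= 0xFF:
        raise ValueError("register value is one byte")
    return 0x8000 | (register << 8) | value

print(hex(vdp_set_register(15, 2)))  # autoincrement of 2 bytes
```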

Writing the data

Writing a palette

Let’s start with writing the palette. Palette 0 belongs in address 0x0000 of CRAM, so first we need to set up the VDP to write to CRAM (operation type 110000). Using the bit template above, a write operation to CRAM address 0x0000 gives us 0xC0000000:

   move.l #0xC0000000, 0x00C00004 ; Set up VDP to write to CRAM address 0x0000

Next, assuming that autoincrement is still set to 2 bytes, we can move the palette data to the VDP’s data port at 0x00C00000 in one big loop:

   lea Palette, a0          ; Load address of Palette into a0
   move.l #0x07, d0         ; 32 bytes of data (8 longwords, minus 1 for counter) in palette

   @Loop:
   move.l (a0)+, 0x00C00000 ; Move data to VDP data port, and increment source address
   dbra d0, @Loop           ; Decrement d0 and loop until it reaches -1

A new opcode here – LEA (load effective address) – which is a quicker way (both in typing and CPU cycles) of loading the address of a label into an address register, versus using move.l.

We now have the opportunity to get our very first thing on screen, and confirm that everything blindly coded so far (the header, the initialisation code, the palette upload) is correct – we can use VDP register 7 to set the background colour to one of the colours in this palette. Bits 0-3 (first nybble) of register 7 represent the colour ID, and bits 4-5 represent the palette ID. So, using the example palette data above, we can set the background colour to pink (colour 8) using:

   move.w #0x8708, 0x00C00004  ; Set background colour to palette 0, colour 8

Build and run the ROM, and here we go:

Finally, 267 lines of code later, and we have something on screen! Fortunately, getting something a little more interesting than a big coloured window didn’t involve much work from this point.

Writing the patterns

The next step in the Hello World adventure is to design – and move to the VDP – some patterns representing all of the letters required to write the phrase. We’ll need H, E, L, O, W, R, and D.

Characters:
   dc.l 0x11000110 ; Character 0 - H
   dc.l 0x11000110
   dc.l 0x11000110
   dc.l 0x11111110
   dc.l 0x11000110
   dc.l 0x11000110
   dc.l 0x11000110
   dc.l 0x00000000

   dc.l 0x11111110 ; Character 1 - E
   dc.l 0x11000000
   dc.l 0x11000000
   dc.l 0x11111110
   dc.l 0x11000000
   dc.l 0x11000000
   dc.l 0x11111110
   dc.l 0x00000000

   dc.l 0x11000000 ; Character 2 - L
   dc.l 0x11000000
   dc.l 0x11000000
   dc.l 0x11000000
   dc.l 0x11000000
   dc.l 0x11111110
   dc.l 0x11111110
   dc.l 0x00000000

   dc.l 0x01111100 ; Character 3 - O
   dc.l 0x11101110
   dc.l 0x11000110
   dc.l 0x11000110
   dc.l 0x11000110
   dc.l 0x11101110
   dc.l 0x01111100
   dc.l 0x00000000

   dc.l 0x11000110 ; Character 4 - W
   dc.l 0x11000110
   dc.l 0x11000110
   dc.l 0x11000110
   dc.l 0x11010110
   dc.l 0x11101110
   dc.l 0x11000110
   dc.l 0x00000000

   dc.l 0x11111100 ; Character 5 - R
   dc.l 0x11000110
   dc.l 0x11001100
   dc.l 0x11111100
   dc.l 0x11001110
   dc.l 0x11000110
   dc.l 0x11000110
   dc.l 0x00000000

   dc.l 0x11111000 ; Character 6 - D
   dc.l 0x11001110
   dc.l 0x11000110
   dc.l 0x11000110
   dc.l 0x11000110
   dc.l 0x11001110
   dc.l 0x11111000
   dc.l 0x00000000

Don’t stare at it too long, it makes funny patterns on your brain. There’s one character missing – the SPACE in between the words. That’s sort of already been implemented for us – a whole pattern of 0’s (transparency) will do the job, the VDP’s VRAM is already full of zeroes, and every tile ID on planes A, B and W is already set to zero, so an entire screen of blank patterns is already being displayed. If we skip the first pattern (32 bytes) when we write the font to the VDP, then pattern ID 0 will be a blank space.

So, we need to write this data to VRAM (that’s operation type 100000) at an offset of 0x20 (skipping the first pattern). Using the bit template in the last section, that should give us the VDP command 0x40200000. 7 characters, 32 bytes each, that’s 56 longwords – let’s go:

   move.l #0x40200000, 0x00C00004 ; Set up VDP to write to VRAM address 0x0020
   lea Characters, a0             ; Load address of Characters into a0
   move.l #0x37, d0               ; 32*7 bytes of data (56 longwords, minus 1 for counter) in the font

   @Loop:
   move.l (a0)+, 0x00C00000       ; Move data to VDP data port, and increment source address
   dbra d0, @Loop                 ; Decrement d0 and loop until it reaches -1

Again, this assumes we haven’t touched the autoincrement register, and it’s still set to 2 bytes. Now the font data is in the VDP’s memory, sitting dormant until we set one of the planes up to paint them.

Matching Patterns to Tiles

As mentioned before, pattern #0 is already being drawn to every tile of planes A, B and W. To get some of these characters on screen, we need to change those tiles’ pattern IDs to those of the patterns we’d like to draw. The data that describes how each tile is drawn lives in VRAM, and there’s a block of data for each plane – the addresses for these are set up in VDP registers 2, 3, and 4, for planes A, W and B respectively. For this article, I’ll be drawing the text to plane A, which has been set to address 0xC000 in VDP register 2. All information needed to describe the tile fits into one word, and again we need to shuffle some bits around to match a template:



ABBC DEEE EEEE EEEE

  • Bit A – Low or high plane (I don’t quite understand this yet)
  • Bits B – Colour palette ID (0, 1, 2 or 3)
  • Bit C – Vertical flip (0 = drawn as-is, 1 = flip the tile vertically)
  • Bit D – Horizontal flip (0 = drawn as-is, 1 = flip the tile horizontally)
  • Bits E – the ID of the pattern to be drawn

So, if we’d like to draw a pattern using colour palette 0, with no flipping, then it’s as easy as writing the pattern ID to the tile’s address. Let’s test it by setting plane A tile 0 to pattern ID 1, which should be the letter H. First, we need to put together the VDP command to write to VRAM (operation type 100000) at address 0xC000 using the bit template – this should give us 0x40000003.

   move.l #0x40000003, 0x00C00004 ; Set up VDP to write to VRAM address 0xC000 (Plane A)
   move.w #0x0001, 0x00C00000     ; Low plane, palette 0, no flipping, tile ID 1

Assemble and run, and we should see the Letter H in the top-left hand corner of the screen:

To keep this article simple, I won’t delve into changing the palette or applying flipping, there’s no need yet.

Now it should be simple to display the rest of the characters; assuming autoincrement is still set to 2 we can write to consecutive tiles one by one:

   move.l #0x40000003, 0x00C00004 ; Set up VDP to write to VRAM address 0xC000 (Plane A)

   ; Low plane, palette 0, no flipping, plus tile ID...
   move.w #0x0001, 0x00C00000     ; Pattern ID 1 - H
   move.w #0x0002, 0x00C00000     ; Pattern ID 2 - E
   move.w #0x0003, 0x00C00000     ; Pattern ID 3 - L
   move.w #0x0003, 0x00C00000     ; Pattern ID 3 - L
   move.w #0x0004, 0x00C00000     ; Pattern ID 4 - O
   move.w #0x0000, 0x00C00000     ; Pattern ID 0 - Blank space
   move.w #0x0005, 0x00C00000     ; Pattern ID 5 - W
   move.w #0x0004, 0x00C00000     ; Pattern ID 4 - O
   move.w #0x0006, 0x00C00000     ; Pattern ID 6 - R
   move.w #0x0003, 0x00C00000     ; Pattern ID 3 - L
   move.w #0x0007, 0x00C00000     ; Pattern ID 7 - D

A tidier way would be to have a table of the pattern IDs and use a loop to write the data, but since the next article will be about writing a proper text display routine there’s no real need to complicate this supposedly “simple” example any further.
Here’s the finished result:
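
For the curious, the table-and-loop version might look something like the sketch below. The label names HelloWorldTiles and @WriteTile are made up for this example, and it assumes autoincrement is still set to 2:

   move.l #0x40000003, 0x00C00004 ; Set up VDP to write to VRAM address 0xC000 (Plane A)
   move.l #HelloWorldTiles, a0    ; Load address of the tile table into a0
   move.w #0x0A, d0               ; 11 tiles to write (minus 1 for counter)
@WriteTile:
   move.w (a0)+, 0x00C00000       ; Write one tile word, and increment the source address
   dbra d0, @WriteTile            ; Decrement d0, loop until every tile is written

HelloWorldTiles:                  ; Hypothetical table, one pattern ID per tile
   dc.w 0x0001, 0x0002, 0x0003, 0x0003, 0x0004, 0x0000 ; H, E, L, L, O, space
   dc.w 0x0005, 0x0004, 0x0006, 0x0003, 0x0007         ; W, O, R, L, D

The table needs to live somewhere the CPU won’t run into it, such as after the end of the main loop, otherwise the 68000 will try to execute the pattern IDs as instructions.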



Sega Megadrive – 3: Awaking the Beast

This bit was difficult. When the Megadrive is turned on, you get a blank slate. Nothing is initialised for you – the RAM is full of garbage, the controller ports are dead, and the VDP is cold, alone and scared – you have to restore some sanity and set each piece up one by one. What makes it even more difficult is that you get no visual feedback that it’s been done correctly until you’ve set up enough things to start displaying something on screen – and that takes a LOT of code.

I’ve found various tutorials and code samples showing how to initialise the Megadrive, to the point where we can begin doing some VDP work and get a few pixels showing. Unfortunately they were a little complex for me, I lost some hair trying to get it to work with my chosen assembler, a lot of things were left unexplained, and I’ve had to do some research to fill in the gaps. Now that I know how each step works I’ve since rewritten the code, breaking things down into smaller steps and commenting every line. Here’s each step explained:

1. Checking the Reset Button

The first thing to figure out is if we need to do anything at all. If the player pressed the reset button, then everything will already have been set up and we can just jump straight to the action again. From all the sample code I’ve seen, two separate reset indicators are checked – one is the physical button on the console, but I can’t find any information about the other one. Perhaps it has something to do with the expansion port, so that future addon hardware (the MegaCD or the 32X) can trigger a software reset. Anyway, here’s how to check:

EntryPoint:          ; Entry point address set in ROM header
   tst.w 0x00A10008  ; Test mystery reset (expansion port reset?)
   bne Main          ; Branch if Not Equal (to zero) - to Main
   tst.w 0x00A1000C  ; Test reset button
   bne Main          ; Branch if Not Equal (to zero) - to Main

If the results of the test are non-zero, then a soft reset has occurred and we can branch straight to Main, skipping all of this initialisation.

We test two addresses – they’re not addresses in main memory, but mapped to some specific hardware ports. Addresses starting from 0x00A00000 are not those of main RAM, but are the system I/O areas, which point to various ports or the memory of other coprocessors within the Megadrive. Most of the system I/O addresses are listed in a technical manual straight from Sega themselves, linked in the references at the bottom of this post.

2. Clearing the RAM

When the system is powered up, the RAM could be in any old state. Most good emulators clear it when loading a ROM, but this isn’t going to be of much help when I finally get hold of some development hardware and start scratching my head at the garbled mess on screen. We know the Megadrive’s RAM is 64kb in size, and technically we know where its address mappings begin and end since we’ve defined that in the ROM header, but it seems to be common practice to rely on the machine’s ability to wrap around the end of the physical addresses back to the beginning, and clear it from 0x00000000 backwards.

If we put 0x00000000 into an address register, and then use pre-decrement when writing a zero to that address, we’ll wrap around to the end of memory and clear the last longword:

move.l #0x00000000, d0     ; Place a 0 into d0, ready to copy to each longword of RAM
move.l #0x00000000, a0     ; Starting from address 0x0, clearing backwards
move.l #0x00003FFF, d1     ; Clearing 64k's worth of longwords (minus 1, for the loop to be correct)
@Clear:
move.l d0, -(a0)           ; Decrement the address by 1 longword, before moving the zero from d0 to it
dbra d1, @Clear            ; Decrement d1, repeat until depleted

I’ve purposely written a whole longword to the d1 register, where just a word-sized MOVE would suffice for the loop count 0x3FFF (DBRA only counts with the bottom word of the register). This is because I have no idea if the registers will have been cleared or not when the system was powered on. Better safe than crashy.

3. Writing the TMSS

The TradeMark Security System – or TMSS – was a feature put in by Sega to stop unlicensed developers from releasing games for their system, acting as a kind of killswitch for the VDP. It’s the pinnacle of security systems, a very sophisticated encryption key which is almost uncrackable. You write the string “SEGA” to 0x00A14000.

This was only implemented in the second hardware version of the Megadrive, so we need to test the system’s version number at mapped I/O address 0x00A10001 before proceeding. This points to a byte of read-only memory, possibly on another chip, which stores the version ID (bits 0-3), CPU clock/region (bit 6 on = 7.60MHz PAL, off = 7.67MHz NTSC), and domestic/overseas model (bit 7). We only need to test the bottom four bits (one nybble):

   move.b 0x00A10001, d0      ; Move Megadrive hardware version to d0
   andi.b #0x0F, d0           ; The version is stored in the bottom four bits, so mask it with 0F
   beq @Skip                  ; If version is equal to 0, skip TMSS signature
   move.l #'SEGA', 0x00A14000 ; Move the string "SEGA" to 0xA14000
@Skip:

I’m unsure at what point the signature is checked and the VDP killswitch activated, whether it’s after a certain time or when the first VDP command is sent. Either way, the VDP is now safe. There’s also a new opcode there – ANDI (immediate logic AND), which ANDs an immediate value with the destination, storing the result in the destination (d0 here).
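
The same read-and-mask approach should work for the other fields in that byte. As a sketch, going by the bit layout described above (and with @IsNTSC being a made-up label), testing bit 6 would tell a PAL machine from an NTSC one:

   move.b 0x00A10001, d0   ; Move Megadrive hardware version byte to d0
   andi.b #0x40, d0        ; Mask off everything except bit 6 (CPU clock/region)
   beq @IsNTSC             ; Bit 6 is off, so this is an NTSC machine
   ; Any PAL-only setup would go here
@IsNTSC: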

4. Initialising the Z80

Next, we can begin initialising each of the Megadrive’s coprocessors, starting with the Zilog Z80. The Z80 is the same 8-bit chip used in the Sega Master System, and in the Megadrive it acts as both a controller for the PSG and FM sound chips, and a backwards compatibility processor for playing Master System games (with an appropriate adapter for the cartridge). The Z80 has its own set of registers, and various command and data ports for sending it instructions and information, as do the other coprocessors. It also has 8kb of RAM to itself. To send it commands, or some data, we can simply MOVE values to mapped I/O addresses.

The Z80 needs a few things doing – first, we need to request access to its bus, which stops it running so that it can listen to us. We request – or release – control of the bus by writing 0x0100 or 0 to its BUSREQ port, and then wait in a loop until we have control, by reading this same port. We also need to make sure it isn’t being held in a reset state, which is controlled by its RESET port: writing 0 holds the chip in reset, and writing 0x0100 lets go of it. With the bus requested and the reset released, we can freely write a program to its RAM. Finally, we reset the chip so that it starts from the top of the new program, release control of the bus, let go of the reset, and it can then be left alone to act on the data.

   move.w #0x0100, 0x00A11100 ; Request access to the Z80 bus, by writing 0x0100 into the BUSREQ port
   move.w #0x0100, 0x00A11200 ; Make sure the Z80 is not held in reset, by writing 0x0100 into the RESET port

@Wait:
   btst #0x0, 0x00A11100   ; Test bit 0 of A11100 to see if the 68k has access to the Z80 bus yet
   bne @Wait               ; If we don't yet have control, branch back up to Wait

Here’s a new opcode, BTST (bit test). Where TST compares a whole value against zero, BTST tests a single bit, chosen by the first parameter, and leaves the result in the condition codes for the branch to act on.

Now the 68000 has access to the Z80’s bus, and the chip is stopped, so we can write the program data to its memory. This is mapped from 0x00A00000.

   move.l #Z80Data, a0      ; Load address of data into a0
   move.l #0x00A00000, a1   ; Copy Z80 RAM address to a1
   move.l #0x29, d0         ; 42 bytes of init data (minus 1 for counter)
@Copy:
   move.b (a0)+, (a1)+      ; Copy data, and increment the source/dest addresses
   dbra d0, @Copy

   move.w #0x0000, 0x00A11200 ; Hold the Z80 in a reset state
   move.w #0x0000, 0x00A11100 ; Release control of bus
   move.w #0x0100, 0x00A11200 ; Release reset state, so the Z80 starts running

Now the chip starts running again, and begins executing the program written to its memory. I keep glossing over this ‘program’ since I don’t yet have any clue as to what it does! I’ll get some documentation and dissect it bit by bit once I start doing some audio work.

Z80Data:
   dc.w 0xaf01, 0xd91f
   dc.w 0x1127, 0x0021
   dc.w 0x2600, 0xf977
   dc.w 0xedb0, 0xdde1
   dc.w 0xfde1, 0xed47
   dc.w 0xed4f, 0xd1e1
   dc.w 0xf108, 0xd9c1
   dc.w 0xd1e1, 0xf1f9
   dc.w 0xf3ed, 0x5636
   dc.w 0xe9e9, 0x8104
   dc.w 0x8f01

5. Initialising the PSG

This one is the Programmable Sound Generator. It can generate square waves and white noise for procedurally creating sounds. As with the Z80 program, I have no idea what the sample data does yet, I’ll look into it at a later date. Copying data to the PSG is a lot simpler than the Z80, since we can just write the data straight to its port through an I/O address without requesting bus access:

   move.l #PSGData, a0      ; Load address of PSG data into a0
   move.l #0x03, d0         ; 4 bytes of data (minus 1 for counter)
@Copy:
   move.b (a0)+, 0x00C00011 ; Copy data to PSG port
   dbra d0, @Copy

PSGData:
   dc.w 0x9fbf, 0xdfff

6. Initialising the VDP

The VDP – or Video Display Processor – is the most complex of the coprocessors. It’s a dedicated graphics chip for displaying sprites and patterns, and warrants its own chapter, which I’ll write up in the next post – getting something on screen.

The VDP has its own set of registers (24 of them), as well as 64kb of dedicated RAM. Communication with the VDP is via two ports – the control port and the data port, which are I/O addresses mapped to 0x00C00004 and 0x00C00000 respectively. The control port is used for setting registers, and supplying a VDP RAM address ready to send data through the data port. The VDP can only send and receive data in bytes or words, but we can make use of a feature which automatically increments the destination address for us, and it will treat a longword write as two separate word writes. More about this feature in the next post.

The VDP’s registers are used to set its various graphics modes, plane addresses and scrolling settings, amongst other things. We initialise the VDP by setting all of these registers, using a word-size command sent to the control port:

  • The top three bits are the command – binary 100 (so 0x8XXX or 0x9XXX) means set register value
  • The next five bits are the register number – so 0x80XX = set register 0, 0x81XX = set register 1, and so on up to 0x97XX = set register 23
  • The bottom byte is the data – so 0x82FF writes FF into register 2
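
As a quick illustration of that format, here’s how a single register could be set by hand. Register 15 holds the autoincrement value (see the table further down), so this sketch sets it to 2:

   move.w #0x8F02, 0x00C00004 ; 0x8F = set register 15 (autoincrement), 0x02 = the value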

To make things easier, we just keep one big table of all of the VDP’s register values, and copy the whole lot in one go:

 move.l #VDPRegisters, a0 ; Load address of register table into a0
 move.l #0x17, d0         ; 24 registers to write (minus 1 for counter)
 move.l #0x00008000, d1   ; 'Set register 0' command (and clear the rest of d1 ready)

@Copy:
 move.b (a0)+, d1         ; Move register value to lower byte of d1
 move.w d1, 0x00C00004    ; Write command and value to VDP control port
 add.w #0x0100, d1        ; Increment register #
 dbra d0, @Copy

Explanations (albeit short explanations) of the VDP registers can be found in chapter 4 of the SEGA2 doc (I’ve added a link to an HTML version in the references). Below is the minimum of things enabled to get started, but these registers will be revisited quite often as I work with more graphics features.

VDPRegisters:
   dc.b 0x20 ; 0: Horiz. interrupt on, plus bit 2 (unknown, but docs say it needs to be on)
   dc.b 0x74 ; 1: Vert. interrupt on, display on, DMA on, V28 mode (28 cells vertically), + bit 2
   dc.b 0x30 ; 2: Pattern table for Scroll Plane A at 0xC000 (bits 3-5)
   dc.b 0x40 ; 3: Pattern table for Window Plane at 0x10000 (bits 1-5)
   dc.b 0x05 ; 4: Pattern table for Scroll Plane B at 0xA000 (bits 0-2)
   dc.b 0x70 ; 5: Sprite table at 0xE000 (bits 0-6)
   dc.b 0x00 ; 6: Unused
   dc.b 0x00 ; 7: Background colour - bits 0-3 = colour, bits 4-5 = palette
   dc.b 0x00 ; 8: Unused
   dc.b 0x00 ; 9: Unused
   dc.b 0x00 ; 10: Frequency of Horiz. interrupt in Rasters (number of lines travelled by the beam)
   dc.b 0x08 ; 11: External interrupts on, V/H scrolling on
   dc.b 0x81 ; 12: Shadows and highlights off, interlace off, H40 mode (40 cells horizontally)
   dc.b 0x34 ; 13: Horiz. scroll table at 0xD000 (bits 0-5)
   dc.b 0x00 ; 14: Unused
   dc.b 0x00 ; 15: Autoincrement off
   dc.b 0x01 ; 16: Vert. scroll 32, Horiz. scroll 64
   dc.b 0x00 ; 17: Window Plane X pos 0 left (pos in bits 0-4, left/right in bit 7)
   dc.b 0x00 ; 18: Window Plane Y pos 0 up (pos in bits 0-4, up/down in bit 7)
   dc.b 0x00 ; 19: DMA length lo byte
   dc.b 0x00 ; 20: DMA length hi byte
   dc.b 0x00 ; 21: DMA source address lo byte
   dc.b 0x00 ; 22: DMA source address mid byte
   dc.b 0x00 ; 23: DMA source address hi byte, memory-to-VRAM mode (bits 6-7)

7. Initialising the Controller Ports

The controller ports are generic 9-pin I/O ports, and are not particularly tailored to any device. They have five mapped I/O addresses each – CTRL, DATA, TX, RX and S-CTRL:

  • CTRL controls the I/O direction and enables/disables interrupts generated by the port
  • DATA is used to send/receive data to or from the port (in bytes or words) when the port is in parallel mode
  • TX and RX are used to send/receive data in serial mode
  • S-CTRL is used to get/set the port’s current status, baud rate and serial/parallel mode.

The SEGA2 doc mentions three controller ports – Controller 1, Controller 2, and EXP. I’m guessing EXP is the 9-pin expansion port on the back of the version 1 Genesis, perhaps intended for basic non-joypad peripherals that didn’t require the full expansion port on the bottom of the unit.

   ; Set IN I/O direction, interrupts off, on all ports
   move.b #0x00, 0x00A10009 ; Controller port 1 CTRL
   move.b #0x00, 0x00A1000B ; Controller port 2 CTRL
   move.b #0x00, 0x00A1000D ; EXP port CTRL

8. Clearing the Registers and Tidying Up

Now everything should be initialised ready for some real work, but it would be best if the actual game code could start with a clean slate. Some rubbish is still in the registers, so let’s clear it:

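   move.l #0x00FF0000, a0    ; Move address of the start of RAM (cleared to zero in step 2) to a0
   movem.l (a0), d0-d7/a1-a7 ; Multiple move 0 to all registers
   move.l #0x00000000, a0    ; Finally, clear a0 itself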

Here’s a very useful opcode – MOVEM (move multiple). It can move data to/from a list of registers or register ranges, for example d0,d3,d5 or a3-a5. A common use for it would be to backup/restore all of the registers to/from the stack, in a single instruction.
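
As a sketch of that backup/restore pattern (MySubroutine is just a placeholder name), the registers are pushed with pre-decrement on the way in and popped with post-increment on the way out:

MySubroutine:
   movem.l d0-d7/a0-a6, -(sp) ; Back up d0-d7 and a0-a6 on the stack
   ; ... work that is free to overwrite any of those registers ...
   movem.l (sp)+, d0-d7/a0-a6 ; Restore them all again
   rts                        ; Return to the caller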

Next, the status register. The only thing I currently understand about the status register is that certain opcodes can leave the results of an operation in it, like a return value in C/C++. After some reading, it turns out that it can also select the stack pointer used for interrupts (so that the JMP to an interrupt routine doesn’t trample over the real stack), enable or disable interrupts, and enable or disable tracing (calls a routine after every opcode, useful for storing callstacks for an exception handler).

   ; Init status register (no trace, A7 is Interrupt Stack Pointer, no interrupts, clear condition code bits)
   move #0x2700, sr
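
To break that value down, going by my reading of the 68000 status register layout: bit 15 is the trace flag, bit 13 selects supervisor mode, bits 8-10 hold the interrupt mask level, and the bottom byte holds the condition codes. So 0x2700 is supervisor mode with the mask at 7, which blocks all maskable interrupts. A sketch of how interrupts might be let through later, once their routines are ready:

   move #0x2700, sr ; Supervisor mode, interrupt mask 7 (maskable interrupts blocked)
   ; ... set up anything the interrupt routines depend on ...
   move #0x2000, sr ; Supervisor mode, interrupt mask 0 (interrupts allowed through)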

And that’s it! The system is initialised, albeit in a very minimal state, ready to do some work. I’ll come back and amend the init code later if I need more functionality out of the machine. Now to jump to the main game code, which I’ve labelled as __main in a separate ASM file. I’ve also labelled the JMP itself as Main, so that we branch here if the reset button has been pressed and the initialisation is skipped:

Main:
   jmp __main ; Jump to the game code!




Sega Megadrive – 2: So, assembly language, then…

I’ve been toying with the idea of learning an assembly language for some considerable time. I tried – and failed – to get to grips with 68k ASM on the Atari STe, but that was mostly not being able to figure out how to get the DevPac IDE to stop crashing. Perhaps I had a bad disk, or not enough RAM in my STe (I think it was the measly 512k model). I’ve since given 68k a second shot, on the Megadrive, and this stuff is finally beginning to sink in. This post shows the things I’ve learned so far, some of the troubles I ran into, and some of the things I still find confusing.

I’ve already got 10 or so years (three of those professionally) of C and C++ programming under my belt, so I’ve had a good head start, and I’m hopeful that this won’t be too tricky to learn. I’m already familiar with some of the more advanced concepts of programming, such as working with raw bytes, bitwise operations, address alignment, and the best types of coffee to buy to make coding sessions more productive. So, here goes…

68k Assembly – The Basics

One line of 68k assembly code equals one CPU instruction (called an opcode) plus its parameters, so it’s an almost bare-metal experience working directly with the hardware. It’s one step up from working with machine code directly. Therefore, the programs used to create binaries only assemble the code into CPU instructions, there’s no real compiling involved. Fortunately, that means assembling is really fast, and you know exactly what you’re getting. Unfortunately, that means you have to do all of the hard work yourself, there’s limited language ‘features’ to help out – functions, enums, classes and structs, templates – just forget about them.

The purpose of most opcodes is to perform an operation on one or more bytes of data. This could be to move bytes from one location to another, or perform some arithmetic on them. The CPU is incapable of performing most tasks on the data whilst it is in main RAM, instead it has its own localised storage spaces (physically on the chip) where data is temporarily stored so it can be manipulated. These spaces are called registers, and the 68000 has 16 of them. 8 of them are general purpose registers – this is where the majority of arithmetic work will be done. Each general purpose register is 32 bits in size. The other 8 are address registers, and are only used for storing addresses of main memory for fetching or returning data from it, so they’re basically pointers that are attached to the CPU.

The general purpose registers have names d0 – d7, and the address registers a0 – a7. So, the fourth general purpose register is called d3, and the second address register is called a1. Some registers have aliases for ease of use. For example, a7 is commonly used as the stack pointer, and can also be referred to in code as ‘sp’.

Opcodes can perform operations using data from varying sources – main memory, one or more registers, or an immediate value (an integer, hex value, or binary value). Here’s a few examples of the MOVE opcode, it takes the first parameter, and moves it to the register or address in the second parameter:

 move.l #$10, d0   ; Moves the hex value 0x10 (decimal 16) to register d0
 move.l #%0101, d0 ; Moves the binary value 0101 (decimal 5) to register d0
 move.l #12, d0    ; Moves the decimal value 12 to register d0
 move.l d1, d0     ; Moves the value stored in register d1 to register d0
 move.l 0x8000, d0 ; Moves the value stored at address 0x8000 to register d0
 move.l d0, 0x8000 ; Moves the value stored in register d0 to address 0x8000
 move.l (a0), d0   ; Moves the value stored at the address in a0 to register d0
 move.l d0, (a0)   ; Moves the value stored in register d0 to the address stored in register a0

The first three examples show how to move immediate values to a register, signified by the # symbol before the value. An immediate value can be a hex value (prefixed with either $ or 0x), a binary value (prefixed with %), or a decimal value (no prefix). So to specify the immediate hex value 12, use #$12 or #0x12, to specify the binary value 0011 use #%0011, or for the decimal value 128 use #128. Example 4 shows how to move the contents of a register to another, and examples 5 and 6 show how to move the contents stored at an address in main memory to a register, and vice versa. Examples 7 and 8 show the same thing, but that main memory address is stored in the register a0. The brackets around register (a0) specify that the value at the address stored in a0 is to be moved, not the address itself, similar to the dereference operator in C/C++. Omitting the brackets would just move the address.

Not all opcodes can deal with data from all sources. Some can only operate on data in registers, some may or may not be able to use immediate values, and only a select few opcodes can deal with data straight from main memory. A list of all of the 68k’s opcodes, including details of their usage and which source/destination values are permitted, is in the 68k Instruction Set PDF in the references section below.

The .l after the opcode is the size of the operation, in this case moving a longword of data (4 bytes). Opcodes can operate on bytes (.b, 8 bits), words (.w, 2 bytes) or longwords (.l, 4 bytes). Not all opcodes can operate on all data sizes, I’ve been checking the Instruction Set for which sizes are supported.

A few opcodes

I’ve been doing this for three months, and so far I’ve only used about 10 opcodes. It’s impressive how simple low-level computing like this can be, and even more impressive looking at some of the amazing games created with so few building blocks. Here’s a small guide to some of the opcodes I’ve found to be most useful:


The four basic arithmetic opcodes – add, subtract, multiply and divide. Add does exactly what it says on the tin. It adds the value in the first parameter (immediate value or register contents) to the register in the second parameter, and stores the result in that register. There’s a couple of variants of it – ADDI means add immediate, which only adds an immediate value to the contents of a register, ADDA adds a value to an address (NOT the value stored at the address, just the address itself), ADDQ which can very quickly add small immediate values (1 – 8), and ADDX which I’ve yet to figure out. There’s several variants because some are more expensive than others. I haven’t yet done any real optimisation to my code, but I guess paying attention to these small differences in opcode variants would be a good start when I get round to it. If I only needed to add 4 to a value, ADDQ would be faster than ADDI, for example.

add.l #0x10, d1  ; Adds the value 0x10 to register d1, and stores the result in d1
                 ; - longword size operation, so it uses all of the data in
                 ; the register

add.w d1, d2     ; Adds the contents of d1 to the contents of d2, and stores the
                 ; result in d2 - word size operation, so the top two bytes of both
                 ; registers are not referenced, and remain intact

addq.b #0x5, d3  ; Quickly adds 5 to the value in d3, storing the result in d3
                 ;  - byte size operation, so the upper three bytes are not
                 ; referenced, and stay intact

The last example is of byte size, so if d3 contained 0x000000FF the result would become 0x00000004, and would NOT roll to 0x00000100. It would need to be a word or longword size operation to do that.

MUL, SUB and DIV are used pretty much the same as ADD, and also have several variants. The Instruction Set doc shows each of their nuances and acceptable operation sizes.
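
Here’s a small sketch of the other three in action. Multiply and divide come in unsigned and signed flavours (MULU/MULS and DIVU/DIVS), and I’ve used the unsigned ones here; the values are arbitrary:

   move.l #100, d0  ; d0 = 100
   sub.l #10, d0    ; Subtracts 10 from d0, so d0 = 90
   mulu.w #3, d0    ; Multiplies the bottom word of d0 by 3, and stores the
                    ; longword result in d0 (270)
   divu.w #4, d0    ; Divides d0 by 4 - the quotient (67) goes in the bottom word
                    ; and the remainder (2) in the top word, so d0 = 0x00020043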


CLR stands for clear. It sets a register (or data at an address), or part of a register depending on the operation size, to zero. It only takes one parameter, and that’s the register or address:

clr.l d0     ; Clears the whole of d0
clr.w (a0)   ; Clears the word (2 bytes) stored at the address in a0
clr.b d0     ; Clears the bottom byte of d0, leaving the rest intact


JMP means jump. It moves the program counter (the pointer to the current instruction) to another location, and continues executing. The address can be specified in hex, or more conveniently, using a label:

SomeLabel:
   jmp SomeLabel   ; An infinite loop!


JSR means jump to subroutine. It does the same as JMP, but stores the original address of the program counter (by pushing it to the stack) before jumping, so that it can return later. RTS, meaning return from subroutine, pops the original address from the stack and does the jump back:

   move.l #0x8, d0  ; Do something useful
   jsr Label        ; Jump to Label
   move.l #0x12, d0 ; Will return here when RTS is called

Label:
   move.l #0x04, d0 ; Do something else
   rts              ; Return back


DBRA means decrement and branch. It does the same as a jump, but tests to see if a register is zero first. If that register is non-zero, it decrements that register by 1, and then branches. If the register is zero, it doesn’t branch, and just continues to the next line. It’s a common tool used for implementing loops:

   move.w #0x6, d0 ; Looping round 7 iterations (includes the 0th iteration)
                   ; - word size, since DBRA counts with the bottom word of d0

Label:
   add.l #0x1, d1  ; Add 1 to register d1
   dbra d0, Label  ; Test to see if d0 is zero yet, and if not decrement it and
                   ; jump back up to Label
   clr.l d1        ; Loop has finished, clear d1

CMP and Bcc

Bcc, meaning branch on condition, is a collection of various branch opcodes which only branch if the condition code of the status register adheres to some condition. The status register seems to be the state of the CPU after an operation, and each opcode leaves its condition code in a different state after execution, as a sort of return value. For example, the CMP opcode (meaning compare) will store the result of subtracting two values into the status register’s condition code. After that, the Bcc variant BEQ (branch if equal to zero) can test the result of that comparison, and branch or not based on it. It’s a common way to implement an IF statement.
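
BEQ is only one of the conditions available. As a sketch using another one, BGE (branch if greater than or equal, for signed values), this leaves the larger of d0 and d1 in d0. @Done is a made-up label:

   cmp.l d1, d0     ; Compare d0 against d1 (works out d0 - d1, but only keeps the condition codes)
   bge @Done        ; If d0 is greater than or equal to d1, it's already the larger value
   move.l d1, d0    ; Otherwise d1 is larger, so copy it to d0
@Done: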

Here’s a demonstration of most of the above opcodes, including a CMP and BEQ. It’s a subroutine which counts the number of characters in a null-terminated string, by iterating through each byte and checking if it is 0, whilst keeping count of each iteration:

GetStringLength:
   clr.l d0          ; Clear d0, ready to begin counting

@FindTerm:
   move.b (a0)+, d1  ; Move byte from address in a0 to d1, and then increment the address by 1 byte
   cmp.b #0x0, d1    ; Test if byte is zero
   beq.b @End        ; If byte was zero, branch to end
   addq.l #0x1, d0   ; Increment counter
   jmp @FindTerm     ; Jump back to FindTerm to loop round again

@End:
   rts               ; End of search, return back. Result is in d0

Example usage:

   move.l #StringAddr, a0  ; Move address of string to a0
   jsr GetStringLength     ; Jump to the GetStringLength subroutine
                           ; Length of string will now be stored in d0

StringAddr:
   dc.b "HELLO WORLD", 0   ; A zero-terminated string

In the example, I’ve also introduced two new concepts. One is the + symbol after moving a value from (a0). This means post-increment; the address in a0 will be incremented by 1 byte after it has been read, similar to int a = b++ in the C++ language. The second concept is the @ symbol before the label FindTerm. This means the label is local – when referencing the label @FindTerm, it uses the address of the most recently defined @FindTerm label. This means you can have duplicate label names (loop could be a common name, perhaps) without any ambiguity.

That’s it for now. It doesn’t look like much, but I’ve managed to get as far as drawing text and sprites with no other opcodes than the ones listed, so they’re pretty powerful. There’s a few others I’ve touched briefly, like ROL and ROR, which rotate bits left or right, but they don’t become useful until dealing with VDP addresses.
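
As a small sketch of what the rotates do (byte size here to keep the numbers readable), bits that fall off one end wrap round to the other:

   move.b #%10000001, d0 ; Bottom byte of d0 = %10000001
   rol.b #1, d0          ; Rotate left by 1 - the top bit wraps round to the bottom, giving %00000011
   ror.b #1, d0          ; Rotate right by 1 - back to %10000001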


Sega Megadrive – 1: Getting Started

My favourite videogames console of all time – the Sega Megadrive. I’ve been pretty excited about getting started on this machine for many years now, and it has been the catalyst which finally kicked me into learning some assembly language.

Now, I’ve jumped the gun a bit, since I was originally planning to work on these platforms in chronological order, which means the Nintendo Entertainment System is going to wait in the queue for a while (don’t be fooled, the Sega Master System II was released AFTER the Megadrive, because Sega are nuts like that). I also don’t yet have any development hardware (I’m currently in negotiations with some sellers, though), so I’ll be starting out with a PC emulator with debugging features. The Sonic disassembly packages over at Sonic Retro contain a modified (fixed for Windows 7) version of SN Systems’ 68000 assembler, which was a low cost alternative to Sega’s tools at the time, and used by many Megadrive developers.

The point is, I’m just too excited to leave this console alone, and if anything will kickstart my motivation for this project with a flying leap it will be the Megadrive.

The Sega Megadrive technical specifications

A quick and naive list of the console’s basics, but it’s all I need to know to get started:

  • CPU: Motorola 68000 at 7.61 MHz
  • Slave CPU: Zilog Z80
  • Main memory: 64kb
  • Video: Yamaha YM7101 VDP (Video Display Processor)
  • Video memory: 64kb
  • Audio: Yamaha YM2612 FM chip, Texas Instruments SN76489 chip
  • Game media: Cartridge
  • Programming language: 68k assembly language
  • Known development hardware: Official Sega Genesis dev unit, Cross Products MegaCD unit

The Tools

As mentioned, I’ve managed to get hold of the SN Systems ASM68K assembler, a command line tool for MS-DOS. Since there was no official IDE or text editor included, nor can I find any clues as to what was commonly used at the time, I’ll be using Microsoft Visual Studio, simply because I’m familiar with its keyboard shortcuts.

Until I can get hold of some hardware, I’ll be making use of a PC emulator which has some debugging features. After some searching around, it seems the MAME emulator MESS does a good job, Gens with the KMod plugin is capable of debugging, and I’ve also had Regen recommended to me on the forums. I’m inclined to start with MESS since it uses the same debugging shortcut keys as Visual Studio.

Testing the Assembler

Since documentation seems scarce, I’ve used the Sonic the Hedgehog disassemblies from Sonic Retro as a guide. The package contains a batch file used to build the Sonic source, and the assembler bit simply boils down to:

ASM68K.EXE source.asm,destination.bin

Let’s test something out:

Loop:
   move.l #0xF, d0 ; Move 15 into register d0
   move.l d0, d1   ; Move contents of register d0 into d1
   jmp Loop        ; Jump back up to 'Loop'

…and that assembles just fine:

SN 68k version 2.53
Assembly completed.
0 error(s) from 4 lines in 0.1 seconds

I won’t pretend that I just came up with that assembly snippet like it was natural. It’s been a while since I last touched some 68k assembly (on the Atari STe), and it was the result of an hour or so of trawling through documentation and example code to refresh my memory!

The Megadrive ROM header

Unfortunately, it’s not as simple as loading up the generated ROM into an emulator and hitting Debug. A Megadrive ROM needs a header, which contains some meta info about the ROM, and a block of CPU vectors used to initialise the 68000 before the code gets executed. The header takes up 512 bytes at the very top of the ROM, and looks a little something like this:

	; ******************************************************************
	; Sega Megadrive ROM header
	; ******************************************************************
	dc.l   0x00FFE000      ; Initial stack pointer value
	dc.l   EntryPoint      ; Start of program
	dc.l   Exception       ; Bus error
	dc.l   Exception       ; Address error
	dc.l   Exception       ; Illegal instruction
	dc.l   Exception       ; Division by zero
	dc.l   Exception       ; CHK exception
	dc.l   Exception       ; TRAPV exception
	dc.l   Exception       ; Privilege violation
	dc.l   Exception       ; TRACE exception
	dc.l   Exception       ; Line-A emulator
	dc.l   Exception       ; Line-F emulator
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Spurious exception
	dc.l   Exception       ; IRQ level 1
	dc.l   Exception       ; IRQ level 2
	dc.l   Exception       ; IRQ level 3
	dc.l   HBlankInterrupt ; IRQ level 4 (horizontal retrace interrupt)
	dc.l   Exception       ; IRQ level 5
	dc.l   VBlankInterrupt ; IRQ level 6 (vertical retrace interrupt)
	dc.l   Exception       ; IRQ level 7
	dc.l   Exception       ; TRAP #00 exception
	dc.l   Exception       ; TRAP #01 exception
	dc.l   Exception       ; TRAP #02 exception
	dc.l   Exception       ; TRAP #03 exception
	dc.l   Exception       ; TRAP #04 exception
	dc.l   Exception       ; TRAP #05 exception
	dc.l   Exception       ; TRAP #06 exception
	dc.l   Exception       ; TRAP #07 exception
	dc.l   Exception       ; TRAP #08 exception
	dc.l   Exception       ; TRAP #09 exception
	dc.l   Exception       ; TRAP #10 exception
	dc.l   Exception       ; TRAP #11 exception
	dc.l   Exception       ; TRAP #12 exception
	dc.l   Exception       ; TRAP #13 exception
	dc.l   Exception       ; TRAP #14 exception
	dc.l   Exception       ; TRAP #15 exception
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)

	dc.b "SEGA GENESIS    "									; Console name
	dc.b "(C)SEGA 1992.SEP"									; Copyright holder and release date
	dc.b "YOUR GAME HERE                                  "	; Domestic name
	dc.b "YOUR GAME HERE                                  "	; International name
	dc.b "GM XXXXXXXX-XX"									; Version number
	dc.w 0x0000												; Checksum
	dc.b "J               "									; I/O support
	dc.l 0x00000000											; Start address of ROM
	dc.l __end												; End address of ROM
	dc.l 0x00FF0000											; Start address of RAM
	dc.l 0x00FFFFFF											; End address of RAM
	dc.l 0x00000000											; SRAM enabled
	dc.l 0x00000000											; Unused
	dc.l 0x00000000											; Start address of SRAM
	dc.l 0x00000000											; End address of SRAM
	dc.l 0x00000000											; Unused
	dc.l 0x00000000											; Unused
	dc.b "                                        "			; Notes (unused)
	dc.b "JUE             "									; Country codes

Note that the assembler requires code and data to be tabbed one to the right; I’ll look into why this is necessary at a later date. Labels, however, seem happy with no tabs.
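To illustrate the layout, here is a minimal sketch. My working assumption (not yet checked against any documentation) is that the assembler reads anything starting in the first column as a label, which would explain why instructions need the indent. The label name Start is just made up for the example:

```asm
; Column 0: read as a label definition
Start:
	moveq  #0, d0   ; Indented: read as an instruction
	bra.s  Start    ; Indented: read as an instruction
```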

The top section is a block of CPU vectors, which are read in when the system boots and used to initialise various registers and interrupt addresses. The first longword is the value of the stack pointer register when the system boots; the rest of the registers must be initialised manually, so I’m confused as to why this one must be explicitly set. The EntryPoint is the address of the first line of code that gets run, and the majority of the rest point to an exception routine to catch errors. Eventually I plan to write a proper exception handler for each type of problem, and print some information to the screen which would help me diagnose the issue.

HBlankInterrupt and VBlankInterrupt are routines that get called, respectively, when the electron beam in the TV reaches the right hand side of the screen, and when it reaches the bottom right, just before switching off and moving back to the top left. I guess modern LCD and plasma TVs don’t have this concept, but from the examples I’ve seen the timing of these interrupts is clock-accurate, so they’re perfect for implementing timers.
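As a rough idea of how a timer might hang off the vertical interrupt, here is a minimal sketch of a frame counter. The name vblank_counter and its address (the first longword of RAM, per the header) are my own assumptions. The interrupt will also only fire once the VDP has been set up to generate it and the 68000’s interrupt mask allows level 6, neither of which is covered yet:

```asm
vblank_counter equ 0x00FF0000   ; Assumed location: first longword of RAM

VBlankInterrupt:
	addq.l #1, vblank_counter   ; Count one more frame
	rte                         ; Return from Exception
```

Any code that wants to wait a number of frames could then read vblank_counter and compare it against a target value.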

The second block is some information about the cartridge; hopefully the comments are self-explanatory. The ROM/SRAM start and end addresses make sense to me since a cartridge and its savegame space (if any) can be of variable size, but I’ve yet to discover why the RAM start and end addresses need explicitly defining. The checksum is not read by the boot code itself and nothing is done with it; it’s only there for the programmer to implement a check if they wish.
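If I did want to implement that check, a rough sketch might look like the following. As far as I can tell, the usual convention is that the checksum is the 16-bit sum of every word from the end of the header (0x200) to the end of the ROM. The routine name VerifyChecksum is made up, the offset 0x18E is simply where the checksum word lands in the layout above, and none of this has been run on hardware:

```asm
; Hypothetical helper: returns with the Z flag set if the checksum matches
VerifyChecksum:
	lea    0x00000200, a0    ; First word after the 512-byte header
	move.l #__end, d1        ; Address marking the end of the ROM
	moveq  #0, d0            ; Running 16-bit total
@NextWord:
	add.w  (a0)+, d0         ; Add the next word; overflow simply wraps
	cmp.l  a0, d1            ; Compare end address with current position
	bhi.s  @NextWord         ; Keep going while a0 is still below __end
	cmp.w  0x0000018E, d0    ; Compare total with the checksum in the header
	rts                      ; Z flag set if they match
```

Since the header above stores 0x0000 as the checksum, this would report a mismatch until the real value is patched into the ROM after assembling.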

All of the addresses can just be specified in hex, but the assembler allows for labels which makes things a great deal easier. EntryPoint, Exception, __end, HBlankInterrupt and VBlankInterrupt will need defining:

EntryPoint:
Loop:
   move.l #0xF, d0 ; Move 15 into register d0
   move.l d0, d1   ; Move contents of register d0 into d1
   jmp Loop        ; Jump back up to 'Loop'

HBlankInterrupt:
VBlankInterrupt:
   rte   ; Return from Exception

Exception:
   rte   ; Return from Exception

__end    ; Very last line, end of ROM address

EntryPoint just loops around the little snippet I used to test out the assembler above. Both H/VBlankInterrupts and the Exception handler do nothing and return for now; I’ll experiment with those later. __end contains no code; it’s just a marker for the address of the last byte of the ROM. I’ve prefixed the label with underscores, simply to indicate that it’s not a subroutine and shouldn’t be called explicitly.
Ok, it should be ready to build and run!

Debugging the ROM

The ROM assembles with the ASM68K.EXE line demonstrated earlier. My chosen emulator, MESS, needs to be configured to enable the debugging features. After running MESS once, a mess.ini file is generated alongside the .exe; it contains a debug flag, which can be set to 1. Now the ROM can be run using:

mess64.exe genesis -cart test.bin

MESS fires up, loads the ROM, and displays a debugging window. Unfortunately, I ran into a problem: the disassembly window shows garbage. The opcodes are mostly ‘ori’ and ‘illegal’, and I couldn’t make head or tail of my code:

After some digging around and tearing my hair out, the guys at pointed out that the first 15 bytes of my ROM didn’t belong there (I’m assuming the assembler added some sort of meta data to the start of the binary, perhaps for the SN debugger), and would need removing before it would work. After deleting those bytes using a hex editor (or assembling with the /p option), the ROM seems to work:

Much better, the opcodes are recognisable now. Time to test it out – MESS uses the same keyboard shortcuts for debugging as Visual Studio:

  • F9 – Set/unset breakpoint
  • F10 – Step over
  • F11 – Step into
  • SHIFT + F11 – Step out

So, after a single step (F10) the program counter moves straight to the address specified as the entry point in the header and executes it, and the value 0xF is moved into register d0. After a second step the contents of d0 (still 0xF) are moved into register d1, and after a third step the program counter jumps back up to the first line again:

It’s not exactly Crysis, but it demonstrates that everything is in the right place and ready for the next part – initialising the Megadrive.