Sega Megadrive – 2: So, assembly language, then…

I’ve been toying with the idea of learning an assembly language for some considerable time. I tried – and failed – to get to grips with 68k ASM on the Atari STe, but that was mostly down to not being able to stop the DevPac IDE from crashing. Perhaps I had a bad disk, or not enough RAM in my STe (I think it was the measly 512k model). I’ve since given 68k a second shot, on the Megadrive, and this stuff is finally beginning to sink in. This post shows the things I’ve learned so far, some of the troubles I ran into, and some of the things I still find confusing.

I’ve already got 10 or so years (three of those professionally) of C and C++ programming under my belt, so I’ve had a good head start, and I’m hopeful that this won’t be too tricky to learn. I’m already familiar with some of the more advanced concepts of programming, such as working with raw bytes, bitwise operations, address alignment, and the best types of coffee to buy to make coding sessions more productive. So, here goes…

68k Assembly – The Basics

One line of 68k assembly code equals one CPU instruction: an opcode (the operation to perform) plus its parameters. That makes it an almost bare-metal experience working directly with the hardware, just one step up from writing machine code by hand. The programs used to create binaries (assemblers) only translate the code into CPU instructions; there’s no real compiling involved. Fortunately, that means assembling is really fast, and you know exactly what you’re getting. Unfortunately, it also means you have to do all of the hard work yourself. There are very few language ‘features’ to help out: functions, enums, classes and structs, templates, just forget about them.

The purpose of most opcodes is to perform an operation on one or more bytes of data. This could be to move bytes from one location to another, or to perform some arithmetic on them. Plenty of 68k instructions can work on data in main RAM directly, but the CPU also has its own small, fast storage spaces (physically on the chip) where data can be held while it’s being manipulated, and many instructions need at least one of their parameters to be in one of them. These spaces are called registers, and the 68000 has 16 general registers (not counting special ones like the program counter and status register), each 32 bits in size. 8 of them are data registers: general purpose registers where the majority of arithmetic work will be done. The other 8 are address registers, which are mostly used for holding addresses in main memory to fetch data from or store data to, so they’re basically pointers that are attached to the CPU.

The data registers are named d0 – d7, and the address registers a0 – a7. So, the fourth data register is called d3, and the second address register is called a1. Some registers have aliases for ease of use. For example, a7 is the stack pointer (the CPU uses it automatically when calling and returning from subroutines), and can also be referred to in code as ‘sp’.

Opcodes can perform operations using data from varying sources: main memory, one or more registers, or an immediate value (a constant written directly into the instruction, in decimal, hex or binary). Here are a few examples of the MOVE opcode, which copies its first parameter (the source) to the register or address given as its second parameter (the destination):

 move.l #$10, d0   ; Moves the hex value 0x10 (decimal 16) to register d0
 move.l #%0101, d0 ; Moves the binary value 0101 (decimal 5) to register d0
 move.l #12, d0    ; Moves the decimal value 12 to register d0
 move.l d1, d0     ; Moves the value stored in register d1 to register d0
 move.l 0x8000, d0 ; Moves the value stored at address 0x8000 to register d0
 move.l d0, 0x8000 ; Moves the value stored in register d0 to address 0x8000
 move.l (a0), d0   ; Moves the value stored at the address in a0 to register d0
 move.l d0, (a0)   ; Moves the value stored in register d0 to the address stored in register a0

The first three examples show how to move immediate values to a register, signified by the # symbol before the value. An immediate value can be a hex value (prefixed with either $ or 0x), a binary value (prefixed with %), or a decimal value (no prefix). So to specify the immediate hex value 12 (decimal 18), use #$12 or #0x12; to specify the binary value 0011, use #%0011; and for the decimal value 128, use #128. Example 4 shows how to copy the contents of one register to another, and examples 5 and 6 show how to move the contents stored at an address in main memory to a register, and vice versa. Note that there’s no # in examples 5 and 6: without it, 0x8000 is treated as an address to read from or write to rather than as a value, and since these are .l moves, 4 bytes are transferred starting at that address. Examples 7 and 8 do the same thing, but the main memory address is held in register a0. The brackets around a0 specify that the value at the address stored in a0 is to be moved, not the address itself, similar to the dereference operator in C/C++. Omitting the brackets would just move the address.
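
For anyone coming from C, here’s a rough C model of the eight examples. It’s only a sketch under some loud assumptions: memory is a made-up 64 KB byte array called mem, the registers are plain uint32_t variables, and read_long/write_long are hypothetical helpers, not anything the hardware or assembler provides. The one genuine 68k detail it captures is that the CPU is big-endian, so the most significant byte of a longword sits at the lowest address.

```c
#include <stdint.h>

/* Hypothetical toy model: a flat 64 KB memory and 32-bit registers.
 * (A real 68000 also insists that word and longword accesses use even
 * addresses; this sketch doesn't check that.) */
uint8_t  mem[0x10000];
uint32_t d0, d1, a0;

/* Big-endian read: the byte at the lowest address is the most
 * significant. Addresses wrap at 64 KB to stay inside mem[]. */
uint32_t read_long(uint32_t addr)
{
    uint32_t value = 0;
    for (uint32_t i = 0; i < 4; i++)
        value = (value << 8) | mem[(addr + i) & 0xFFFFu];
    return value;
}

/* Big-endian write: the most significant byte goes to the lowest address. */
void write_long(uint32_t addr, uint32_t value)
{
    for (uint32_t i = 0; i < 4; i++)
        mem[(addr + i) & 0xFFFFu] = (uint8_t)(value >> (24u - 8u * i));
}

/* Rough C equivalents of the eight MOVE examples, in order. */
void move_examples(void)
{
    d0 = 0x10;              /* move.l #$10, d0                          */
    d0 = 0x5;               /* move.l #%0101, d0                        */
    d0 = 12;                /* move.l #12, d0                           */
    d0 = d1;                /* move.l d1, d0                            */
    d0 = read_long(0x8000); /* move.l 0x8000, d0  (no #, so an address) */
    write_long(0x8000, d0); /* move.l d0, 0x8000                        */
    d0 = read_long(a0);     /* move.l (a0), d0    (like d0 = *a0)       */
    write_long(a0, d0);     /* move.l d0, (a0)    (like *a0 = d0)       */
}
```

The brackets boil down to the difference between using a0 as a number and passing it to read_long/write_long as an address.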

Not all opcodes can deal with data from all sources. Some can only operate on data in registers, some may or may not be able to use immediate values, and many restrict which combinations of source and destination are allowed (the plain ADD opcode, for example, needs at least one of its two parameters to be a data register). A list of all of the 68k’s opcodes, including details of their usage and which source/destination values are permitted, is in the 68k Instruction Set PDF in the references section below.

The .l after the opcode is the size of the operation, in this case moving a longword of data (4 bytes). Opcodes can operate on bytes (.b, 8 bits), words (.w, 16 bits or 2 bytes) or longwords (.l, 32 bits or 4 bytes). Not all opcodes can operate on all data sizes, so I’ve been checking the Instruction Set for which sizes are supported.
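
One way to picture the size suffix is as a mask. When a .b or .w operation writes its result to a data register, only the bottom 8 or 16 bits are replaced and the rest of the register keeps its old contents, so move.b #0x12, d0 with d0 = 0xAABBCCDD leaves 0xAABBCC12. The C sketch below models that; OpSize, size_mask and write_dreg are made-up names, and address registers (where word-size writes are sign-extended to the full 32 bits) are deliberately left out.

```c
#include <stdint.h>

/* Operation sizes, as given by the .b/.w/.l suffix (value = bytes). */
typedef enum { SIZE_B = 1, SIZE_W = 2, SIZE_L = 4 } OpSize;

/* Mask covering the bits an operation of the given size touches. */
uint32_t size_mask(OpSize size)
{
    switch (size) {
    case SIZE_B: return 0x000000FFu;
    case SIZE_W: return 0x0000FFFFu;
    default:     return 0xFFFFFFFFu;
    }
}

/* Write a result into a data register: only the low byte or word changes
 * for .b/.w, and the upper bits of the register are left as they were. */
uint32_t write_dreg(uint32_t reg, uint32_t value, OpSize size)
{
    uint32_t mask = size_mask(size);
    return (reg & ~mask) | (value & mask);
}
```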

A few opcodes

I’ve been doing this for three months, and so far I’ve only used about 10 opcodes. It’s impressive how simple low-level computing like this can be, and even more impressive looking at some of the amazing games created with so few building blocks. Here’s a small guide to some of the opcodes I’ve found to be most useful:

ADD, SUB, MUL and DIV

The four basic arithmetic opcodes are add, subtract, multiply and divide. ADD does exactly what it says on the tin: it adds the value in the first parameter (an immediate value, register contents, or data in memory) to the second parameter, and stores the result there. There are a few variants of it: ADDI (add immediate) adds an immediate value to a data register or to memory, ADDA adds a value to an address register (changing the address itself, NOT the value stored at that address), ADDQ (add quick) can very quickly add small immediate values (1 to 8), and then there’s ADDX, which I’ve yet to figure out. Part of the reason there are several variants is that some are cheaper than others. I haven’t yet done any real optimisation to my code, but I guess paying attention to these small differences in opcode variants would be a good start when I get round to it. If I only needed to add 4 to a value, for example, ADDQ would be faster than ADDI.

add.l #0x10, d1  ; Adds the value 0x10 to register d1, and stores the result in d1
                 ; - longword size operation, so it uses all of the data in
                 ; the register

add.w d1, d2     ; Adds the contents of d1 to the contents of d2, and stores the
                 ; result in d2 - word size operation, so the top two bytes of both
                 ; registers are not referenced, and remain intact

addq.b #0x5, d3  ; Quickly adds 5 to the value in d3, storing the result in d3
                 ;  - byte size operation, so the upper three bytes are not
                 ; referenced, and stay intact

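The last example is a byte-size operation, so if d3 contained 0x000000FF the result would be 0x00000004, NOT 0x00000104: the carry out of the bottom byte doesn’t spill into the next byte of the register (it’s recorded in the carry flag of the status register instead). It would need to be a word or longword size operation to get 0x00000104.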
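
Here’s that byte-size example written out as a C sketch. add_sized is a hypothetical helper that only models the result and the carry flag (the real ADD also sets the X, N, Z and V flags): the sum is worked out on the bottom 8, 16 or 32 bits only, and anything that spills out of the top goes to the carry rather than into the rest of the register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of ADD into a data register for a size in bytes (1, 2 or 4).
 * Only the low 8/16/32 bits take part and get replaced; the carry out of
 * the top bit of that size goes to *carry instead of into the register. */
uint32_t add_sized(uint32_t src, uint32_t dst, int size, bool *carry)
{
    uint32_t mask = (size == 1) ? 0xFFu : (size == 2) ? 0xFFFFu : 0xFFFFFFFFu;
    uint64_t sum  = (uint64_t)(src & mask) + (uint64_t)(dst & mask);

    *carry = sum > mask;                           /* the C flag          */
    return (dst & ~mask) | ((uint32_t)sum & mask); /* upper bits untouched */
}
```

With dst = 0x000000FF and src = 5, size 1 gives 0x04 with the carry set, while size 2 gives 0x104 with no carry.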

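SUB is used pretty much the same as ADD, and has the same set of variants (SUBI, SUBA, SUBQ and SUBX). Multiply and divide are a little different on the 68000: they come in unsigned and signed flavours (MULU/MULS and DIVU/DIVS), take a word-size source, and always put their result in a data register. MULU multiplies two 16-bit values to give a 32-bit result, and DIVU divides a 32-bit register by a 16-bit value, leaving the 16-bit quotient in the bottom word of the register and the remainder in the top word. The Instruction Set doc shows each of their nuances and acceptable operation sizes.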
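
Since multiply and divide are the odd ones out, here’s a C sketch of how the unsigned word versions, mulu.w and divu.w, behave on the 68000. The function names are hypothetical, the signed MULS/DIVS versions are left out, and divide-by-zero (which makes the real CPU take an exception) is simply assumed not to happen. The main thing to notice is where DIVU puts its two results.

```c
#include <stdbool.h>
#include <stdint.h>

/* mulu.w <src>, dN : 16 x 16 -> 32-bit unsigned product. Only the low
 * words of both operands are used; the whole of dN is overwritten. */
uint32_t mulu_w(uint32_t src, uint32_t dst)
{
    return (src & 0xFFFFu) * (dst & 0xFFFFu);
}

/* divu.w <src>, dN : the full 32-bit dN divided by a 16-bit divisor.
 * Quotient goes in the low word, remainder in the high word. If the
 * quotient won't fit in 16 bits, the real CPU sets the V (overflow) flag
 * and leaves dN unchanged, which is what *overflow models here.
 * Assumes the divisor is non-zero. */
uint32_t divu_w(uint32_t src, uint32_t dst, bool *overflow)
{
    uint32_t divisor   = src & 0xFFFFu;
    uint32_t quotient  = dst / divisor;
    uint32_t remainder = dst % divisor;

    *overflow = quotient > 0xFFFFu;
    if (*overflow)
        return dst;
    return (remainder << 16) | quotient;
}
```

For example, dividing 100 by 7 leaves 0x0002000E in the register: remainder 2 in the top word, quotient 14 in the bottom word.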

CLR

CLR stands for clear. It sets a data register (or data at an address), or part of a register depending on the operation size, to zero. It only takes one parameter, the register or address to clear, and it can’t be used on address registers:

clr.l d0     ; Clears the whole of d0
clr.w (a0)   ; Clears the word (2 bytes) stored at the address in a0
clr.b d0     ; Clears the bottom byte of d0, leaving the rest intact

JMP

JMP means jump. It sets the program counter (the CPU’s pointer to the current instruction) to another address, and execution carries on from there. The address can be specified in hex, or more conveniently, using a label:

SomeLabel:
   jmp SomeLabel   ; An infinite loop!

JSR and RTS

JSR means jump to subroutine. It does the same as JMP, but first stores the return address (the address of the instruction after the JSR) by pushing it onto the stack, so that the subroutine can return later. RTS, meaning return from subroutine, pops that address off the stack and jumps back to it:

   move.l #0x8, d0  ; Do something useful
   jsr Label        ; Jump to Label
   move.l #0x12, d0 ; Will return here when RTS is called
   ; ...the rest of the program, which must not run on into Label

Label:
   move.l #0x04, d0 ; Do something else
   rts              ; Return back
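
The stack is what makes this work, so here’s a C sketch of what JSR and RTS do with a7. Everything in it is made up for illustration (the 4 KB ram array, the starting sp of 0x1000, the function names, and passing the return address in by hand); the real details it follows are that the 68k stack grows downwards and that JSR pushes a 4-byte return address which RTS pops back off.

```c
#include <stdint.h>

/* Hypothetical toy machine: 4 KB of big-endian memory, the stack
 * pointer (a7, aka sp) and the program counter. */
uint8_t  ram[0x1000];
uint32_t sp = 0x1000;   /* stack starts at the top and grows downwards */
uint32_t pc;

/* Push a longword: move sp down by 4, then store it high byte first. */
void push_long(uint32_t value)
{
    sp -= 4;
    for (uint32_t i = 0; i < 4; i++)
        ram[sp + i] = (uint8_t)(value >> (24u - 8u * i));
}

/* Pop a longword: read it back high byte first, then move sp up by 4. */
uint32_t pop_long(void)
{
    uint32_t value = 0;
    for (uint32_t i = 0; i < 4; i++)
        value = (value << 8) | ram[sp + i];
    sp += 4;
    return value;
}

/* jsr: push the return address (the instruction after the JSR, passed
 * in by hand here) and jump to the target. */
void jsr(uint32_t target, uint32_t return_address)
{
    push_long(return_address);
    pc = target;
}

/* rts: pop the return address back off the stack and jump to it. */
void rts(void)
{
    pc = pop_long();
}
```

Calling jsr twice and then rts twice brings pc back through both return addresses in reverse order, which is what makes nested subroutine calls work. It also shows why anything a subroutine pushes onto the stack has to be popped again before its RTS, or RTS will jump to the wrong place.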

DBRA

DBRA means decrement and branch. It decrements the bottom word of a data register by 1 and then branches, unless the register has just gone from 0 to -1 (0xFFFF), in which case it doesn’t branch and just continues to the next line. Because the loop only ends once the counter passes zero, a counter starting at n runs the loop n + 1 times. It’s a common tool used for implementing loops:

   move.w #0x6, d0 ; Looping round 7 iterations (includes the 0th iteration)
                   ; .w, not .b: DBRA counts with the whole bottom word of d0
Label:
   add.l #0x1, d1  ; Add 1 to register d1
   dbra d0, Label  ; Decrement d0, and jump back up to Label unless it has
                   ; just gone past zero to -1
   clr.l d1        ; Loop has finished, clear d1
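
Here’s a C sketch of DBRA’s counter handling, going by the Instruction Set doc’s description; dbra and run_example_loop are made-up names, the branch itself is modelled as a do/while, and d1 is assumed to start at 0. It shows why a starting count of 6 gives 7 passes, and that d0 ends up as 0xFFFF rather than 0.

```c
#include <stdint.h>

/* Model of "dbra dN, Label": decrement the low word of dN and report
 * whether to branch, which happens unless the low word has wrapped from
 * 0 to 0xFFFF (-1). The upper word of dN is never touched. */
int dbra(uint32_t *dn)
{
    uint16_t counter = (uint16_t)(*dn & 0xFFFFu);
    counter--;                                 /* wraps 0 -> 0xFFFF      */
    *dn = (*dn & 0xFFFF0000u) | counter;
    return counter != 0xFFFFu;                 /* 1 = branch, 0 = fall on */
}

/* The loop from the example: d0 counts down, d1 counts passes. */
uint32_t run_example_loop(uint32_t *d0_out)
{
    uint32_t d0 = 6;      /* move.w #0x6, d0 */
    uint32_t d1 = 0;      /* assumed to start at 0 */
    do {
        d1 += 1;          /* add.l #0x1, d1  */
    } while (dbra(&d0));  /* dbra d0, Label  */
    *d0_out = d0;
    return d1;            /* number of times the loop body ran */
}
```

This is also why the count should be set with move.w: a move.b #0x6, d0 on a register that happened to hold 0x0000FF00 would give a count of 0xFF06, and the loop would run 0xFF07 times.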

CMP and Bcc

Bcc, meaning branch on condition, is a collection of branch opcodes which only branch if the condition codes in the status register meet some condition. The status register holds information about the state of the CPU, and its bottom byte holds the condition codes: flags such as Z (the last result was zero), N (it was negative), C (there was a carry or borrow) and V (there was a signed overflow). Most opcodes update these flags when they execute, as a sort of return value. For example, the CMP opcode (meaning compare) subtracts its first parameter from its second, throws the result away, and keeps only the flags. After that, the Bcc variant BEQ (branch if equal, i.e. if the Z flag is set) can test the result of that comparison, and branch or not based on it. It’s a common way to implement an IF statement.
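
To see what CMP actually leaves behind, here’s a C sketch of cmp.b. The Flags struct, cmp_b and the beq_taken/bne_taken helpers are hypothetical names; the flag rules are the standard ones for the subtraction dst - src, and since CMP doesn’t touch the X flag it isn’t modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* The four condition code flags that CMP sets. */
typedef struct {
    bool n;  /* Negative: top bit of the result is set            */
    bool z;  /* Zero: the result was zero, i.e. the values matched */
    bool v;  /* oVerflow: the signed result didn't fit             */
    bool c;  /* Carry: an unsigned borrow was needed               */
} Flags;

/* Model of "cmp.b src, dst": work out dst - src on bytes, throw the
 * result away and keep only the flags. */
Flags cmp_b(uint8_t src, uint8_t dst)
{
    uint8_t result = (uint8_t)(dst - src);
    Flags f;
    f.n = (result & 0x80u) != 0;
    f.z = result == 0;
    f.v = ((dst ^ src) & (dst ^ result) & 0x80u) != 0;
    f.c = src > dst;
    return f;
}

/* BEQ branches when Z is set; BNE when it's clear. */
bool beq_taken(Flags f) { return f.z; }
bool bne_taken(Flags f) { return !f.z; }
```

In these terms, cmp.b #0x0, d1 followed by beq.b @End amounts to beq_taken(cmp_b(0, d1)), which is true exactly when d1 is zero.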

Here’s a demonstration of most of the above opcodes, including a CMP and BEQ. It’s a subroutine which counts the number of characters in a null-terminated string, by iterating through each byte and checking if it is 0, whilst keeping count of each iteration:

GetStringLength:
   clr.l d0          ; Clear d0, ready to begin counting

@FindTerm:
   move.b (a0)+, d1  ; Move byte from address in a0 to d1, and then increment the address by 1 byte
   cmp.b #0x0, d1    ; Test if byte is zero
   beq.b @End        ; If byte was zero, branch to end
   addq.l #0x1, d0   ; Increment counter
   jmp @FindTerm     ; Jump back to FindTerm to loop round again

@End:
   rts               ; End of search, return back. Result is in d0

Example usage:

   move.l #StringAddr, a0  ; Move address of string to a0
   jsr GetStringLength     ; Jump to the GetStringLength subroutine
                           ; Length of string will now be stored in d0

StringAddr:
   dc.b "HELLO WORLD", 0   ; A zero-terminated string

In the example, I’ve also introduced two new concepts. One is the + symbol after (a0). This means post-increment: after the value has been read, the address in a0 is incremented by the size of the operation (1 byte here, 2 for a word, 4 for a longword), similar to d1 = *a0++; in C/C++. (The stack pointer a7 is the exception: a byte-size post-increment moves it by 2, to keep it on an even address.) The second concept is the @ symbol before the labels FindTerm and End. This means the labels are local: they’re only visible between the ordinary (non-local) labels either side of them, so here they belong to GetStringLength. This means you can reuse the same label names in different subroutines (loop could be a common name, perhaps) without any ambiguity.
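
For comparison, here’s roughly what GetStringLength looks like in C, with each line commented with the instruction it stands in for. It’s a hand translation rather than anything an assembler or compiler produces, and the name get_string_length is made up.

```c
#include <stdint.h>

/* C version of GetStringLength: a0 walks through the string with a
 * post-increment, d1 holds the current byte and d0 is the counter. */
uint32_t get_string_length(const uint8_t *a0)
{
    uint32_t d0 = 0;              /* clr.l d0          */
    for (;;) {                    /* @FindTerm:        */
        uint8_t d1 = *a0++;       /* move.b (a0)+, d1  */
        if (d1 == 0)              /* cmp.b #0x0, d1    */
            break;                /* beq.b @End        */
        d0 += 1;                  /* addq.l #0x1, d0   */
    }                             /* jmp @FindTerm     */
    return d0;                    /* @End: rts         */
}
```

Passing "HELLO WORLD" gives 11, the same value the assembly version leaves in d0.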

That’s it for now. It doesn’t look like much, but I’ve managed to get as far as drawing text and sprites with no other opcodes than the ones listed, so they’re pretty powerful. There are a few others I’ve touched briefly, like ROL and ROR, which rotate bits left or right (bits pushed out of one end come back in at the other), but I haven’t needed them until dealing with VDP addresses.
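
A rotate isn’t quite the same as a plain shift, so here’s a quick C sketch of the difference for longwords. rol_l, ror_l and lsl_l are made-up names, only the result is modelled (not the flags), and since rotating a longword by 32 gets you back where you started, rotate counts are reduced modulo 32.

```c
#include <stdint.h>

/* rol.l: rotate left; bits pushed out of the top come back in at the
 * bottom. */
uint32_t rol_l(uint32_t value, unsigned n)
{
    n &= 31u;
    if (n == 0)
        return value;
    return (value << n) | (value >> (32u - n));
}

/* ror.l: rotate right; bits pushed out of the bottom come back in at
 * the top. */
uint32_t ror_l(uint32_t value, unsigned n)
{
    n &= 31u;
    if (n == 0)
        return value;
    return (value >> n) | (value << (32u - n));
}

/* lsl.l, for contrast: a plain shift, so bits pushed out are lost.
 * Register shift counts are taken modulo 64 on the 68000. */
uint32_t lsl_l(uint32_t value, unsigned n)
{
    n &= 63u;
    return n >= 32u ? 0u : value << n;
}
```

So rotating 0x12345678 left by 8 gives 0x34567812, with the top byte wrapping round to the bottom, whereas shifting it left by 8 would just drop the 0x12.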


4 thoughts on “Sega Megadrive – 2: So, assembly language, then…”

  1. Hey!
    I just started reading the 68k reference manual, and it states that there are 16 registers, d0-d7 and a0-a7, and that a7 is the sp. Is there a difference in the Sega MD?


  2. I’m new to all this and trying to get my head around the address registers.

    As the guide mentions bytes, words or longwords, where one longword would be a register’s maximum storage capacity, but with bytes or words they fill from the right first to the left? Like 0001 0011 where the left is the “upper byte”?

    Quoting “upper three bytes are not referenced, and stay intact”

    So it sort of makes sense to me with the whole base2 thing why there’s nothing that will fill 3 bytes (other than 3 separate bytes of data being pushed into the register?)

    But with address registers I was getting really confused with the whole putting a bracket around the address register (a0) to stop the actual address moving, but just the value that address holds. I looked up the Megadrive memory map: would it range from 0000h to 1FFFh (RAM), or 0 to 8191 in decimal? I mean, how do we assign values to addresses? Do we have any control of the address the data goes to? If it is random, how do we know where it goes?

    So specifically what I can’t get my head around is this:

    “move.b (a0)+, d1 ; Move byte from address in a0 to d1, and then increment the address by 1 byte”

    So @FindTerm is a label created for the conditional loop/branch loop, sort of like creating an IF function where it’s searching for a specific value and either loops IF it doesn’t find it or branches to the end label IF it does find it…

    And from what I can figure out, each time it doesn’t find the value it’s after, it just searches the next address and loops until it finds the value it wants. What’s really doing my head in is the address registers, i.e. why is there a byte in the address register a0? Is a0 not just pointing to a RAM address that has data stored at that location? And how does this relate to the memory map of the Megadrive mentioned? With only 8 address registers, how is this enough to cover the entire amount of RAM, so that we know exactly where in RAM certain data is stored, and can then point to it using the address register? Or is this restricted, and the same RAM shared with the VDP (which I haven’t gotten to yet)? Also, what if the address register holds an entire 4 bytes and we just want to reference one of those single bytes? I’m guessing if you only want to store a byte or a word, we lose the capacity to store the other 2 or 3 bytes in that register?

    Probably jumping way ahead of myself but I just want it to make 100% sense before I move on. TBH I even got confused when you said each register can hold a maximum of 32bits and a long word would fill a register which is 4 bytes but then I realised 32bits = 4 bytes 😛

    Finally, does this mean each memory address stored in the register is 32 bits / 4 bytes? Like the actual address data is what fills a0 – a7, rather than the data AT the address filling that register? (Which would just be a double up of that data.) I think I’ve confused myself.

    • Hey Sprax, did you figure this out yet? I can answer some of it from my memory, although don’t take this as gospel, it has been a while!

      I think some of the confusion lies around what the brackets mean in the instruction you mentioned. (A0)+ means look at the memory address A0 is pointing to, not the address in A0 itself.

      So imagine A0 is pointing to address 0000FF00 and you call
      move.b (a0)+, d1 ;
      Afterwards, D1 will contain whatever value was stored at 0000FF00, and A0 will now point to 0000FF01, because of the + sign for post increment. Since you are only moving a byte, the address only goes up by 1 (a word move would add 2, and a longword move 4). The one exception is A7, the stack pointer, which always moves by at least 2 so it stays on an even address.

      So next you’d likely have some instruction that does a compare (CMP) and follows that up with a branch (BEQ) and finally a jump (JMP) back to @FindTerm if it hasn’t branched.

      This time round your move.b instruction will take the byte from 0000FF01 and put it in D1, and then increase the address in A0 to 0000FF02.

      Make sense?

      Hopefully this also explains how, with only 8 address registers, we can cover the entire amount of RAM; we just change the address each of the registers is pointing to when we want to read a specific spot. How you keep track of what location is storing what, I have no idea! I guess labels are the way to go, but I can still see it getting very messy when you have lots of things to store!

      I can’t remember much about the byte, word, longword thing and how it affects different parts. If you can’t find it in the documentation, I recommend creating yourself a little test program and just experimenting with the debugger; you’ll learn a lot through experimentation.
