I’ve been toying with the idea of learning an assembly language for some considerable time. I tried – and failed – to get to grips with 68k ASM on the Atari STe, but that was mostly not being able to figure out how to get the DevPac IDE to stop crashing. Perhaps I had a bad disk, or not enough RAM in my STe (I think it was the measly 512k model). I’ve since given 68k a second shot, on the Megadrive, and this stuff is finally beginning to sink in. This post shows the things I’ve learned so far, some of the troubles I ran into, and some of things I still find confusing.
I’ve already got 10 or so years (three of those professionally) of C and C++ programming under my belt, so I’ve had a good head start, and I’m hopeful that this won’t to be too tricky to learn. I’m already familiar with some of the more advanced concepts of programming, such as working with raw bytes, bitwise operations, address alignment, and the best types of coffee to buy to make coding sessions more productive. So, here goes…
68k Assembly – The Basics
One line of 68k assembly code equals one CPU instruction (called an opcode) plus its parameters, so it’s an almost bare-metal experience working directly with the hardware. It’s one step up from working with machine code directly. Therefore, the programs used to create binaries only assemble the code into CPU instructions, there’s no real compiling involved. Fortunately, that means assembling is really fast, and you know exactly what you’re getting. Unfortunately, that means you have to do all of the hard work yourself, there’s limited language ‘features’ to help out – functions, enums, classes and structs, templates – just forget about them.
The purpose of most opcodes is to perform an operation on one or more bytes of data. This could be to move bytes from one location to another, or perform some arithmetic on them. The CPU is incapable of performing most tasks on the data whilst it is in main RAM, instead it has its own localised storage spaces (physically on the chip) where data is temporarily stored so it can be manipulated. These spaces are called registers, and the 68000 has 16 of them. 8 of them are general purpose registers – this is where the majority of arithmetic work will be done. Each general purpose register is 32 bits in size. The other 8 are address registers, and are only used for storing addresses of main memory for fetching or returning data from it, so they’re basically pointers that are attached to the CPU.
The general purpose registers have names d0 – d7, and the address registers a0 – a7. So, the fourth general purpose register is called d3, and the second address register is called a1. Some registers have aliases for ease of use. For example, a7 is commonly used as the stack pointer, and can also be referred to in code as ‘sp’.
Opcodes can perform operations using data from varying sources – main memory, one or more registers, or an immediate value (an integer, hex value, or binary value). Here’s a few examples of the MOVE opcode, it takes the first parameter, and moves it to the register or address in the second parameter:
move.l #$10, d0 ; Moves the hex value 0x10 (decimal 16) to register d0 move.l #%0101, d0 ; Moves the binary value 0101 (decimal 5) to register d0 move.l #12, d0 ; Moves the decimal value 12 to register d0 move.l d1, d0 ; Moves the value stored in register d1 to register d0 move.l 0x8000, d0 ; Moves the value stored at address 0x8000 to register d0 move.l d0, 0x8000 ; Moves the value stored in register d0 to address 0x8000 move.l (a0), d0 ; Moves the value stored at the address in a0 to register d0 move.l d0, (a0) ; Moves the value stored in register d0 to the address stored in register a0
The first three examples show how to move immediate values to a register, signified by the # symbol before the value. An immediate value can be a hex value (prefixed with either $ or 0x), a binary value (prefixed with %), or a decimal value (no prefix). So to specify the immediate hex value 12, use #$12 or #0x12, to specify the binary value 0011 use #%0011, or for the decimal value 128 use #128. Example 4 shows how to move the contents of a register to another, and examples 5 and 6 show how to move the contents stored at an address in main memory to a register, and vice versa. Examples 7 and 8 show the same thing, but that main memory address is stored in the register a0. The brackets around register (a0) specify that the value at the address stored in a0 is to be moved, not the address itself, similar to the dereference operator in C/C++. Omitting the brackets would just move the address.
Not all opcodes can deal with data from all sources. Some can only operate on data in registers, some may or may not be able to use immediate values, and only select few opcodes can deal with data straight from main memory. A list of all of the 68k’s opcodes, including details of their usage and which source/destination values are permitted, are in the 68k Instruction Set PDF in the references section below.
The .l after the opcode is the size of the operation, in this case moving a longword of data (4 bytes). Opcodes can operate on bytes (.b, 8 bits), words (.w, 2 bytes) or longwords (.l, 4 bytes). Not all opcodes can operate on all data sizes, I’ve been checking the Instruction Set for which sizes are supported.
A few opcodes
I’ve been doing this for three months, and so far I’ve only used about 10 opcodes. It’s impressive how simple low-level computing like this can be, and even more impressive looking at some of the amazing games created with so few building blocks. Here’s a small guide to some of the opcodes I’ve found to be most useful:
ADD, SUB, MUL, DIV
The four basic arithmetic opcodes – add, subtract, multiply and divide. Add does exactly what it says on the tin. It adds the value in the first parameter (immediate value or register contents) to the register in the second parameter, and stores the result in that register. There’s a couple of variants of it – ADDI means add immediate, which only adds an immediate value to the contents of a register, ADDA adds a value to an address (NOT the value stored at the address, just the address itself), ADDQ which can very quickly add small immediate values (1 – 8), and ADDX which I’ve yet to figure out. There’s several variants because some are more expensive than others. I haven’t yet done any real optimisation to my code, but I guess paying attention to these small differences in opcode variants would be a good start when I get round to it. If I only needed to add 4 to a value, ADDQ would be faster than ADDI, for example.
add.l #0x10, d1 ; Adds the value 0x10 to register d1, and stores the result in d1 ; - longword size operation, so it uses all of the data in ; the register add.w d1, d2 ; Adds the contents of d1 to the contents of d2, and stores the ; result in d2 - word size operation, so the top two bytes of both ; registers are not referenced, and remain intact addq.b #0x5, d3 ; Quickly adds 5 to the value in d3, storing the result in d3 ; - byte size operation, so the upper three bytes are not ; referenced, and stay intact
The last example is of byte size, so if d3 contained 0x000000FF the result would become 0x00000004, and would NOT roll to 0x00000100. It would need to be a word or longword size operation to do that.
MUL, SUB and DIV are used pretty much the same as ADD, and also have several variants. The Instruction Set doc shows each of their nuances and acceptable operation sizes.
CLR stands for clear. It sets a register (or data at an address), or part of a register depending on the operation size, to zero. It only takes one parameter, and that’s the register or address:
clr.l d0 ; Clears the whole of d0 clr.w (a0) ; Clears the bottom word (2 bytes) of the data at the address in a0 clr.b d0 ; Clears the bottom byte of d0, leaving the rest intact
JMP means jump. It moves the program counter (the pointer to the current instruction) to another location, and continues executing. The address can be specified in hex, or more conveniently, using a label:
SomeLabel: jmp SomeLabel ; An infinite loop!
JSR and RTS
JSR means to jump to subroutine. It does the same as JMP, but stores the original address of the program counter (by pushing it to the stack) before jumping, so that it can return later. RTS, meaning return to subroutine, pops the original address from the stack and does the jump back:
move.l #0x8 d0 ; Do something useful jsr Label ; Jump to Label move.l #0x12 d0 ; Will return here when RTS is called Label: move.l #0x04, d0 ; Do something else rts ; Return back
This means decrement and branch. It does the same as a jump, but tests to see if a register is zero first. If that register is non-zero, it decrements that register by 1, and then branches. If the register is zero, it doesn’t branch, and just continues to the next line. It’s a common tool used for implementing loops:
move.b #0x6, d0 ; Looping round 7 iterations (includes the 0th iteration) Label: add.l #0x1, d1 ; Add 1 do register d1 dbra d0, Label ; Test to see if d0 is zero yet, and if not decrement it and ; jump back up to Label clr.l d1 ; Loop has finished, clear d1
CMP and Bcc
Bcc, meaning branch on condition, is a collection of various branch opcodes which only branch if the condition code of the status register adheres to some condition. The status register seems to be the state of the CPU after an operation, and each opcode leaves its condition code in a different state after execution, as a sort of return value. For example, the CMP opcode (meaning compare) will store the result of subtracting two values into the status register’s condition code. After that, the Bcc variant BEQ (branch if equal to zero) can test the result of that comparison, and branch or not based on it. It’s a common way to implement an IF statement.
Here’s a demonstration of most of the above opcodes, including a CMP and BEQ. It’s a subroutine which counts the number of characters in a null-terminated string, by iterating through each byte and checking if it is 0, whilst keeping count of each iteration:
GetStringLength: clr.l d0 ; Clear d0, ready to begin counting @FindTerm: move.b (a0)+, d1 ; Move byte from address in a0 to d1, and then increment the address by 1 byte cmp.b #0x0, d1 ; Test if byte is zero beq.b @End ; If byte was zero, branch to end addq.l #0x1, d0 ; Increment counter jmp @FindTerm ; Jump back to FindTerm to loop round again @End: rts ; End of search, return back. Result is in r0
move.l #StringAddr, a0 ; Move address of string to a0 jsr GetStringLength ; Jump to the GetStringLength subroutine ; Length of string will now be stored in d0 StringAddr: dc.b "HELLO WORLD", 0 ; A zero-terminated string
In the example, I’ve also introduced two new concepts. One is the + symbol after moving a value from (a0). This means post-increment; the address in a0 will be incremented by 1 byte after it has been read, similar to int a = b++ in the C++ language. The second concept is the @ symbol before the label FindTerm. This means the label is local – when referencing the label @FindTerm, it uses the address of the most recently defined @FindTerm label. This means you can have duplicate label names (loop could be a common name, perhaps) without any ambiguity.
That’s it for now. It doesn’t look like much, but I’ve managed to get as far as drawing text and sprites with no other opcodes than the ones listed, so they’re pretty powerful. There’s a few others I’ve touched briefly, like ROL and ROR, which shift bits left or right, but they don’t become useful until dealing with VDP addresses.