If you examine the source code of many different emulators, you will usually find a big morass of embedded macros. At first, I thought the people who wrote them were trying to be clever and avoid duplicate code, and that the cleverness greatly obfuscated the code. But as I debugged all the typos and cut-and-paste errors in my "clearly coded" 8080 emulator, it occurred to me that a big advantage of the embedded macro approach is that it cuts down on mistakes by virtually eliminating cut-and-paste errors.
So I tried an approach that aims for the best of both worlds: straightforward, clean code with the mistake avoidance of the macro approach. I wrote a file full of data about the 6502 opcodes, and wrote scripts to generate several different things from it. As long as the source data is correct, the generated output will also be correct. The same data can generate:
The disassembler
Tables of instructions by opcode
Table of instructions in alphabetical order
Table of instructions grouped by addressing mode
Most of the emulator
The scripts are written in Python (tested in Python 2.7). The data is formatted as CSV, since Python has a built-in csv module for reading it. The data in the file looks like this:
    0x4c,JMP,ABS,3,3,czidbvn
    0x69,ADC,IMM,2,2,CZidbVN
    0x0a,ASL,ACC,1,2,CZidbvN
The fields are: opcode, mnemonic, addressing mode, instruction bytes, cycles to execute, and affected flags.
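To make the six fields concrete, here is a minimal sketch that parses the three sample rows into named records. It is written for Python 3 (the scripts in this article were tested on Python 2.7), and the names `OpInfo`, `parse_ops`, and `flags_changed` are hypothetical, chosen only for illustration. It also assumes that an uppercase letter in the flags field marks a flag the instruction changes, which is how the sample rows read.

```python
import csv
import io
from collections import namedtuple

# Field names are illustrative labels for the six columns described above.
OpInfo = namedtuple("OpInfo", "opcode mnemonic mode size cycles flags")

SAMPLE = """\
0x4c,JMP,ABS,3,3,czidbvn
0x69,ADC,IMM,2,2,CZidbVN
0x0a,ASL,ACC,1,2,CZidbvN
"""

def parse_ops(text):
    """Return a dict mapping opcode number -> OpInfo for each 6-field row."""
    table = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 6:
            continue  # skip blank lines, comments, and anything malformed
        opcode = int(row[0], 16)  # base 16 accepts the leading "0x"
        table[opcode] = OpInfo(opcode, row[1], row[2],
                               int(row[3]), int(row[4]), row[5])
    return table

def flags_changed(info):
    """Assumed convention: an uppercase letter marks a flag the op changes."""
    return "".join(ch for ch in info.flags if ch.isupper())

ops = parse_ops(SAMPLE)
print(ops[0x69].mnemonic, ops[0x69].mode, flags_changed(ops[0x69]))
# prints: ADC IMM CZVN
```

Converting the byte and cycle counts to integers up front is a design choice of this sketch; the script in the article keeps every field as a string.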
The disassembler can be generated completely. The data tells the disassembler how many bytes are in each instruction (this could also be inferred from the addressing mode), and the addressing mode tells it how to format the assembly instruction.
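Since the byte count can be inferred from the addressing mode, the two fields can be used to cross-check each other. The following is a rough sketch of such a check, not part of the scripts described here. The `MODE_SIZE` table and the `check_sizes` helper are hypothetical, and the table only covers the mode names that appear in this article; the real CSV file uses more modes, which would need adding.

```python
# Instruction length implied by each addressing mode.
# Only mode names that appear in this article are listed (an assumption
# of this sketch); unknown modes are simply not checked.
MODE_SIZE = {
    "IMP": 1,   # implied: opcode only
    "ACC": 1,   # accumulator: opcode only
    "IMM": 2,   # immediate: opcode + 1 data byte
    "ZP": 2,    # zero page: opcode + 1 address byte
    "INDX": 2,  # (zp,X) indexed indirect: opcode + 1 address byte
    "ABS": 3,   # absolute: opcode + 2 address bytes
}

def check_sizes(rows):
    """Return the rows whose byte-count field disagrees with their mode."""
    bad = []
    for opcode, mnemonic, mode, size in rows:
        expected = MODE_SIZE.get(mode)
        if expected is not None and expected != size:
            bad.append((opcode, mnemonic, mode, size, expected))
    return bad

rows = [
    (0x4C, "JMP", "ABS", 3),
    (0x69, "ADC", "IMM", 2),
    (0x0A, "ASL", "ACC", 1),
    (0x05, "ORA", "ZP", 3),   # deliberately wrong, to show a mismatch
]
for opcode, mnemonic, mode, size, expected in check_sizes(rows):
    print("opcode 0x%02x %s: mode %s implies %d bytes, data says %d"
          % (opcode, mnemonic, mode, expected, size))
```

A check like this catches typos in the source data, which is the one place the generated code cannot protect against.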
The Python code to generate the disassembler is pretty simple. First, it builds a dictionary where the key is the opcode and the value is a list with the rest of the data.
    import csv
    import sys

    # 'rb' is the Python 2 way to open a file for the csv module
    ops = csv.reader(open(sys.argv[1], 'rb'))
    opdict = {}
    for op in ops:
        # ignore lines of formats I don't recognize
        # allows for blanks and comments
        if len(op) == 6:
            opcode = int(op[0], 16)
            opdict[opcode] = [op[1], op[2], op[3], op[4], op[5]]
The contents of the dictionary after this code executes:
    {
     0: ['BRK', 'IMP', '1', '7', 'czidbvn'],
     1: ['ORA', 'INDX', '2', '6', 'cZidbvN'],
     5: ['ORA', 'ZP', '2', '3', 'cZidbvN'],
     6: ['ASL', 'ZP', '2', '5', 'CZidbvN'],
     8: ['PHP', 'IMP', '1', '3', 'czidbvn'],
     9: ['ORA', 'IMM', '2', '2', 'cZidbvN'],
     #etc....
Then I sort the keys, loop through them, and use the addressing mode to decide how to print the disassembler code for each op. Here is a flavor of that Python code:
    opnums = opdict.keys()
    opnums.sort()
    for op in opnums:
        #disassemble depends on addressing mode
        addressingmode = opdict[op][1]
        if addressingmode == "IMP":
            print 'case 0x%02x: sprintf(opstr, "%s"); break;' % (op, opdict[op][0])
        elif addressingmode == "ACC":
            print 'case 0x%02x: sprintf(opstr, "%s A"); break;' % (op, opdict[op][0])
        elif addressingmode == "IMM":
            print 'case 0x%02x: sprintf(opstr, "%s %s", opcodes[1]); count = 2; break;' % (op, opdict[op][0],"#$%02x")
        elif addressingmode == "ABS":
            #etc....
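The if/elif chain above could also be driven by a lookup table keyed on addressing mode. The following is a minimal Python 3 sketch of that alternative, not the script described in this article. `MODE_FORMAT` and `emit_case` are hypothetical names, and the table only covers one- and two-byte modes whose output format is visible in this article.

```python
# Operand format (as a C format-string fragment) and instruction length
# for each addressing mode. This table is an illustrative restructuring,
# limited to modes whose generated output appears in the article.
MODE_FORMAT = {
    "IMP": ("", 1),
    "ACC": (" A", 1),
    "IMM": (" #$%02x", 2),
    "ZP": (" $%02x", 2),
    "INDX": (" ($%02x,X)", 2),
}

def emit_case(opcode, mnemonic, mode):
    """Return one C `case` line for the disassembler's switch statement."""
    operand, size = MODE_FORMAT[mode]
    if size == 1:
        # No operand byte: the mnemonic (plus " A" for ACC) is the whole text.
        return ('case 0x%02x: sprintf(opstr, "%s%s"); break;'
                % (opcode, mnemonic, operand))
    # One operand byte: it is passed to the C sprintf as opcodes[1].
    return ('case 0x%02x: sprintf(opstr, "%s%s", opcodes[1]); count = 2; break;'
            % (opcode, mnemonic, operand))

opdict = {0x00: ("BRK", "IMP"), 0x01: ("ORA", "INDX"), 0x09: ("ORA", "IMM")}
for op in sorted(opdict):
    print(emit_case(op, *opdict[op]))
```

The trade-off is the usual one: the table is shorter and harder to mistype, while the explicit chain is easier to special-case when one mode needs unusual handling.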
Running the script prints out C code that looks like this:
    case 0x00: sprintf(opstr, "BRK"); break;
    case 0x01: sprintf(opstr, "ORA ($%02x,X)", opcodes[1]); count = 2; break;
    case 0x05: sprintf(opstr, "ORA $%02x", opcodes[1]); count = 2; break;
    case 0x06: sprintf(opstr, "ASL $%02x", opcodes[1]); count = 2; break;
    case 0x08: sprintf(opstr, "PHP"); break;
    case 0x09: sprintf(opstr, "ORA #$%02x", opcodes[1]); count = 2; break;
I paste that into a C wrapper ("shell") program that is very similar to the code I wrote for the 8080 disassembler.
Here is some code printed out by my disassembler. This is the beginning of the program for the Missile Command arcade game (the first instruction of the program starts in memory at $7B4C).
    7B4C A2 FF     LDX #$ff
    7B4E 9A        TXS
    7B4F D8        CLD
    7B50 A2 00     LDX #$00
    7B52 8A        TXA
    7B53 78        SEI
    7B54 95 00     STA $00,X
    7B56 9D 00 01  STA $0100,X
    7B59 CA        DEX
    7B5A D0 F8     BNE $f8
    7B5C A9 40     LDA #$40
    7B5E 8D 00 48  STA $4800
    7B61 AD 00 49  LDA $4900
    7B64 29 40     AND #$40
    7B66 F0 04     BEQ $04
    7B68 58        CLI
    7B69 4C 13 50  JMP $5013
    7B6C 4C 87 7C  JMP $7c87
    7B6F 8D 00 4C  STA $4c00
    7B72 E6 E9     INC $e9
    7B74 AD 00 49  LDA $4900
    7B77 10 34     BPL $34
    7B79 E6 8D     INC $8d
    7B7B A5 C6     LDA $c6
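In this listing, branch instructions such as BNE $f8 show the raw operand byte rather than a destination address. On the 6502 that byte is a signed offset from the address of the instruction following the branch. As a small worked example (the `branch_target` helper is hypothetical and not part of the disassembler), the destinations can be computed like this:

```python
def branch_target(address, offset_byte):
    """Address a 6502 relative branch at `address` jumps to when taken."""
    # The operand is a signed 8-bit offset: 0x80..0xff mean -128..-1.
    offset = offset_byte - 0x100 if offset_byte >= 0x80 else offset_byte
    # The offset is relative to the instruction after the 2-byte branch.
    return (address + 2 + offset) & 0xFFFF

print("BNE at $7B5A -> $%04X" % branch_target(0x7B5A, 0xF8))  # $7B54
print("BEQ at $7B66 -> $%04X" % branch_target(0x7B66, 0x04))  # $7B6C
```

So the BNE at $7B5A loops back to the STA $00,X at $7B54, and the BEQ at $7B66 skips ahead to the JMP at $7B6C.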
You can download a copy of my CSV file, the Python script to generate the disassembler, and the C source (including the shell) to the disassembler below. The source code will compile for the command line as described here. The Python script was tested on Python 2.7 but is really basic, so it will probably work at least back to Python 2.4.
I also used the same technique and data to generate the 6502 reference page that I added to the site. I'll include those files too.
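As an illustration of how the same dictionary can feed a reference table, here is a minimal sketch that groups instructions by addressing mode. It is not the reference page generator itself: `group_by_mode` is a hypothetical helper, and the sample entries are the ones shown in the dictionary dump earlier in this article.

```python
from collections import defaultdict

def group_by_mode(opdict):
    """Return {addressing mode: [(opcode, mnemonic), ...]} in opcode order."""
    groups = defaultdict(list)
    for opcode in sorted(opdict):
        mnemonic, mode = opdict[opcode][0], opdict[opcode][1]
        groups[mode].append((opcode, mnemonic))
    return groups

# Same shape as the opcode dictionary built from the CSV file.
opdict = {
    0x00: ['BRK', 'IMP', '1', '7', 'czidbvn'],
    0x01: ['ORA', 'INDX', '2', '6', 'cZidbvN'],
    0x05: ['ORA', 'ZP', '2', '3', 'cZidbvN'],
    0x06: ['ASL', 'ZP', '2', '5', 'CZidbvN'],
    0x08: ['PHP', 'IMP', '1', '3', 'czidbvn'],
    0x09: ['ORA', 'IMM', '2', '2', 'cZidbvN'],
}

for mode, entries in sorted(group_by_mode(opdict).items()):
    listing = ", ".join("%s ($%02X)" % (m, op) for op, m in entries)
    print("%s: %s" % (mode, listing))
```

Sorting by a different field, such as the mnemonic, would give the alphabetical table in the same way.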
View my 6502 disassembler (also in the github project)
View project of the 6502 reference material generator
Post questions or comments on Twitter @realemulator101, or if you find issues in the code, file them on the GitHub repository.