6502 Disassembler

If you examine the source code of many emulators, you will usually find a big morass of embedded macros. At first, I thought the people who wrote them were trying to be clever and avoid duplicate code, and that the cleverness greatly obfuscated the code. But as I debugged all the typos and cut-and-paste errors in my "clearly coded" 8080 emulator, it occurred to me that a big advantage of the embedded macro approach is that it virtually eliminates cut-and-paste errors.

So I tried an approach that aims for the best of both worlds: straightforward, clean code, but with the mistake avoidance of the macro approach. I wrote a file full of data about the 6502 opcodes, and wrote scripts to generate several different things from it. As long as the source data is correct, the generated output will also be correct. The same data can generate more than one output, including the disassembler described here and a 6502 reference page.

The scripts are written in Python (tested in Python 2.7). The data is formatted as CSV, since Python has built-in functions to manipulate CSV. The data in the file looks like this:

   0x4c,JMP,ABS,3,3,czidbvn    
   0x69,ADC,IMM,2,2,CZidbVN    
   0x0a,ASL,ACC,1,2,CZidbvN    

The fields are: opcode, mnemonic, addressing mode, instruction bytes, cycles to execute, affected flags.
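As a rough illustration of the field layout, the sketch below parses one row into a named record and lists the flags the instruction changes. It is written for Python 3, and the names `OpInfo`, `parse_row`, and `affected_flags` are hypothetical, not taken from the original script. It also assumes, judging from the sample rows, that an uppercase letter in the flags field marks a flag the instruction modifies.

```python
import csv
import io
from collections import namedtuple

# Hypothetical record type mirroring the six CSV fields described above.
OpInfo = namedtuple("OpInfo", "opcode mnemonic mode size cycles flags")

def parse_row(row):
    """Convert one six-field CSV row into an OpInfo record."""
    opcode, mnemonic, mode, size, cycles, flags = (field.strip() for field in row)
    return OpInfo(int(opcode, 16), mnemonic, mode, int(size), int(cycles), flags)

def affected_flags(info):
    """Assumption: an uppercase letter means the flag is modified."""
    return [flag for flag in info.flags if flag.isupper()]

# One of the sample rows from the article, read through the csv module.
sample = "0x69,ADC,IMM,2,2,CZidbVN\n"
row = next(csv.reader(io.StringIO(sample)))
adc = parse_row(row)
print(adc.mnemonic, adc.mode, adc.size, affected_flags(adc))
```

Under that assumption, the ADC row reports C, Z, V, and N as the modified flags, while the JMP row (all lowercase) reports none.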

The disassembler can be generated completely. Each instruction's addressing mode tells the disassembler how to format the assembly instruction and how many bytes the instruction occupies (the byte count is also stored explicitly in the data).
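Because the byte count is available both as its own field and implicitly through the addressing mode, the two can be cross-checked to catch typos in the data file. The following Python 3 sketch shows one way to do that; `MODE_SIZES` and `find_size_mismatches` are hypothetical names, and the table only covers the addressing modes that appear in the sample rows of this article.

```python
# Byte counts per addressing mode, taken only from the sample rows shown
# in this article; a full table would need every 6502 addressing mode.
MODE_SIZES = {"IMP": 1, "ACC": 1, "IMM": 2, "ZP": 2, "INDX": 2, "ABS": 3}

def find_size_mismatches(rows):
    """Return rows whose byte-count field disagrees with their addressing mode."""
    mismatches = []
    for opcode, mnemonic, mode, size, cycles, flags in rows:
        expected = MODE_SIZES.get(mode)
        # Modes missing from the table are skipped rather than guessed.
        if expected is not None and expected != int(size):
            mismatches.append((opcode, mnemonic, mode, size))
    return mismatches

rows = [
    ("0x4c", "JMP", "ABS", "3", "3", "czidbvn"),
    ("0x69", "ADC", "IMM", "2", "2", "CZidbVN"),
    ("0x0a", "ASL", "ACC", "1", "2", "CZidbvN"),
]
print(find_size_mismatches(rows))
```

An empty list means every checked row is consistent; a row such as an IMM instruction marked as 3 bytes would be returned for inspection.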

The python code to generate the disassembler is pretty simple. First, it builds a dictionary where the key is the opcode and the value is an array with the rest of the data.

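   import csv
   import sys
   # Python 2.7: the csv module expects the file opened in binary mode
   ops = csv.reader(open(sys.argv[1], 'rb'))
   opdict = {}
   for op in ops:
       # ignore lines of formats I don't recognize;
       # allows for blanks and comments
       if len(op) == 6:
           opcode = int(op[0], 16)
           # value holds mnemonic, addressing mode, bytes, cycles, flags
           opdict[opcode] = [op[1], op[2], op[3], op[4], op[5]]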

The contents of the dictionary after this code executes:

   {    
   0: ['BRK', 'IMP', '1', '7', 'czidbvn'],    
   1: ['ORA', 'INDX', '2', '6', 'cZidbvN'],    
   5: ['ORA', 'ZP', '2', '3', 'cZidbvN'],    
   6: ['ASL', 'ZP', '2', '5', 'CZidbvN'],    
   8: ['PHP', 'IMP', '1', '3', 'czidbvn'],    
   9: ['ORA', 'IMM', '2', '2', 'cZidbvN'],    
   #etc....    
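The loader targets Python 2.7, where the csv module expects a file opened in binary mode. Under Python 3 the file should instead be opened in text mode with `newline=""`. A minimal Python 3 sketch of the same step follows; the function name `load_opcodes` is hypothetical, and an in-memory string stands in for the real CSV file.

```python
import csv
import io

def load_opcodes(csvfile):
    """Build {opcode: [mnemonic, mode, bytes, cycles, flags]} from an open CSV file."""
    opdict = {}
    for row in csv.reader(csvfile):
        # Skip blank lines, comments, and anything without exactly six fields.
        if len(row) == 6:
            fields = [field.strip() for field in row]
            opdict[int(fields[0], 16)] = fields[1:]
    return opdict

# With a real file, use open(path, newline="") instead of io.StringIO.
data = io.StringIO(
    "# comment lines are ignored\n"
    "0x00,BRK,IMP,1,7,czidbvn\n"
    "\n"
    "0x09,ORA,IMM,2,2,cZidbvN\n"
)
opdict = load_opcodes(data)
for opcode in sorted(opdict):
    print(opcode, opdict[opcode])
```

The loop uses `sorted(opdict)` because in Python 3 `dict.keys()` returns a view object that has no `sort()` method.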

Then I sort the keys, loop through them, and use the addressing mode to decide how to print the disassembler case for each op. Here is a flavor of that Python code:

   opnums = opdict.keys()
   opnums.sort()
   for op in opnums:
       # the disassembly format depends on the addressing mode
       addressingmode = opdict[op][1]
       if addressingmode == "IMP":
           print 'case 0x%02x: sprintf(opstr, "%s"); break;' % (op, opdict[op][0])
       elif addressingmode == "ACC":
           print 'case 0x%02x: sprintf(opstr, "%s A"); break;' % (op, opdict[op][0])
       elif addressingmode == "IMM":
           print 'case 0x%02x: sprintf(opstr, "%s %s", opcodes[1]); count = 2; break;' % (op, opdict[op][0],"#$%02x")
       elif addressingmode == "ABS":
           # ...the remaining addressing modes are handled the same way

Running the script prints out C code that looks like this:

   case 0x00: sprintf(opstr, "BRK"); break;    
   case 0x01: sprintf(opstr, "ORA ($%02x,X)", opcodes[1]); count = 2; break;    
   case 0x05: sprintf(opstr, "ORA $%02x", opcodes[1]); count = 2; break;    
   case 0x06: sprintf(opstr, "ASL $%02x", opcodes[1]); count = 2; break;    
   case 0x08: sprintf(opstr, "PHP"); break;    
   case 0x09: sprintf(opstr, "ORA #$%02x", opcodes[1]); count = 2; break;    
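The chain of `if`/`elif` branches in the generator could also be expressed as a lookup table keyed by addressing mode. The Python 3 sketch below is one possible variation, not the original script; `OPERAND_TEMPLATES` and `emit_case` are hypothetical names, and only the addressing modes whose generated output appears in this article are included.

```python
# Operand templates per addressing mode. Only modes whose generated output
# appears in this article are listed; the rest are deliberately left out.
OPERAND_TEMPLATES = {
    "IMP": "",
    "ACC": " A",
    "IMM": " #$%02x",
    "ZP": " $%02x",
    "INDX": " ($%02x,X)",
}

def emit_case(opcode, mnemonic, mode):
    """Return one C case line for the generated disassembler."""
    operand = OPERAND_TEMPLATES[mode]
    if "%" in operand:
        # One operand byte follows the opcode, so the instruction is 2 bytes.
        return 'case 0x%02x: sprintf(opstr, "%s%s", opcodes[1]); count = 2; break;' % (
            opcode, mnemonic, operand)
    return 'case 0x%02x: sprintf(opstr, "%s%s"); break;' % (opcode, mnemonic, operand)

for opcode, mnemonic, mode in [(0x00, "BRK", "IMP"), (0x01, "ORA", "INDX"), (0x09, "ORA", "IMM")]:
    print(emit_case(opcode, mnemonic, mode))
```

With a table like this, supporting another addressing mode means adding one entry rather than another branch, which keeps with the goal of reducing cut-and-paste mistakes.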

I paste that output into a skeleton ("shell") C program that is very similar to the code I wrote for the 8080 disassembler.

Here is some code printed out by my disassembler. This is the beginning of the program for the Missile Command arcade game (the first instruction of the program is at address $7B4C).

   7B4C A2 FF    LDX #$ff    
   7B4E 9A       TXS    
   7B4F D8       CLD    
   7B50 A2 00    LDX #$00    
   7B52 8A       TXA    
   7B53 78       SEI    
   7B54 95 00    STA $00,X    
   7B56 9D 00 01 STA $0100,X    
   7B59 CA       DEX    
   7B5A D0 F8    BNE $f8    
   7B5C A9 40    LDA #$40    
   7B5E 8D 00 48 STA $4800    
   7B61 AD 00 49 LDA $4900    
   7B64 29 40    AND #$40    
   7B66 F0 04    BEQ $04    
   7B68 58       CLI    
   7B69 4C 13 50 JMP $5013    
   7B6C 4C 87 7C JMP $7c87    
   7B6F 8D 00 4C STA $4c00    
   7B72 E6 E9    INC $e9    
   7B74 AD 00 49 LDA $4900    
   7B77 10 34    BPL $34    
   7B79 E6 8D    INC $8d    
   7B7B A5 C6    LDA $c6    
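Note that the branch instructions in this listing (BNE, BEQ, BPL) show the raw operand byte rather than a destination address. On the 6502 that byte is a signed offset measured from the address of the instruction following the branch. Here is a small Python 3 sketch of the arithmetic, with a hypothetical helper name `branch_target`, assuming the standard two-byte relative branch encoding:

```python
def branch_target(address, offset_byte):
    """Resolve a 6502 relative branch located at `address` to its destination."""
    # Interpret the operand as a signed 8-bit value (0x80..0xff are negative).
    offset = offset_byte - 0x100 if offset_byte & 0x80 else offset_byte
    # The offset is relative to the instruction after the 2-byte branch.
    return (address + 2 + offset) & 0xFFFF

# The three branches that appear in the listing above.
for address, offset_byte in [(0x7B5A, 0xF8), (0x7B66, 0x04), (0x7B77, 0x34)]:
    print("%04X -> $%04x" % (address, branch_target(address, offset_byte)))
```

Under these assumptions the BNE at $7B5A loops back to the STA $00,X at $7B54, and the BEQ at $7B66 skips ahead to the JMP at $7B6C.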

You can download a copy of my CSV file, the Python script that generates the disassembler, and the C source for the disassembler (including the shell) below. The C source compiles as a command-line program, as described here. The Python script was tested on Python 2.7 but is really basic; it will probably work at least as far back as Python 2.4.

I also used the same technique and data to generate the 6502 reference page that I added to the site. I'll include those files too.

View my 6502 disassembler (also in the github project)

View project of the 6502 reference material generator

Post questions or comments on Twitter @realemulator101, or if you find issues in the code, file them on the github repository.