Disassembler, part 1

A disassembler is a program that translates a stream of machine-code bytes back into assembly language source. This is exactly what we did by hand in the last section, which makes it a perfect candidate for automation. By writing it we get familiar with the processor, and we end up with a handy debugging tool that we'll need when we write the CPU emulator.

Here is an algorithm for disassembling 8080 code:

  1. Read the code into a buffer

  2. Get a pointer to the beginning of the buffer

  3. Use the byte at the pointer to determine the opcode

  4. Print out the name of the opcode using the bytes after the opcode as data, if applicable

  5. Advance the pointer the number of bytes used by that instruction (1, 2, or 3 bytes)

  6. If not at the end of the buffer, go to step 3
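
To make steps 2 through 6 concrete before looking at the real routine, here is a minimal sketch of the walking loop on its own. The names `op_length` and `walk_buffer` are made up for this illustration, and the length table only knows the four multi-byte opcodes that appear later on this page; every other opcode is assumed to be one byte, which is not true of the full 8080 instruction set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: instruction length in bytes for a handful of opcodes.
   Anything not listed is assumed to be 1 byte, which is only good enough
   for this sketch. */
int op_length(unsigned char opcode)
{
    switch (opcode)
    {
        case 0x01: return 3;   /* LXI B, 16-bit immediate */
        case 0x06: return 2;   /* MVI B, 8-bit immediate  */
        case 0x3e: return 2;   /* MVI A, 8-bit immediate  */
        case 0xc3: return 3;   /* JMP, 16-bit address     */
        default:   return 1;
    }
}

/* Steps 2-6: start at offset 0, look at the opcode, advance by its length,
   and repeat until the end of the buffer. Returns the number of
   instructions visited. */
int walk_buffer(const unsigned char *buffer, size_t size)
{
    assert(buffer != NULL);

    size_t pc = 0;
    int count = 0;
    while (pc < size)
    {
        printf("%04zx opcode %02x\n", pc, buffer[pc]);
        pc += (size_t)op_length(buffer[pc]);
        count++;
    }
    return count;
}
```

For the nine bytes `00 01 34 12 3e 05 c3 00 20`, this loop should stop at offsets 0, 1, 4, and 6, because the operand bytes are skipped rather than being treated as opcodes.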

I'll include a couple of instructions below to get the flavor of the routine. I will make the full routine available for download, but I encourage you to write it yourself. It won't take that long and you'll learn the 8080 instruction set along the way.

   #include <stdio.h>

   /*
    * codebuffer is a valid pointer to a buffer of 8080 machine code
    * pc is the current offset into that buffer
    *
    * returns the number of bytes used by the instruction at pc
    */

   int Disassemble8080Op(unsigned char *codebuffer, int pc)    
   {    
    unsigned char *code = &codebuffer[pc];    
    int opbytes = 1;    
    printf ("%04x ", pc);    
    switch (*code)    
    {    
        case 0x00: printf("NOP"); break;    
        case 0x01: printf("LXI    B,#$%02x%02x", code[2], code[1]); opbytes=3; break;    
        case 0x02: printf("STAX   B"); break;    
        case 0x03: printf("INX    B"); break;    
        case 0x04: printf("INR    B"); break;    
        case 0x05: printf("DCR    B"); break;    
        case 0x06: printf("MVI    B,#$%02x", code[1]); opbytes=2; break;    
        case 0x07: printf("RLC"); break;    
        case 0x08: printf("NOP"); break;    
        /* ........ */    
        case 0x3e: printf("MVI    A,#$%02x", code[1]); opbytes = 2; break;
        /* ........ */    
        case 0xc3: printf("JMP    $%02x%02x",code[2],code[1]); opbytes = 3; break;    
        /* ........ */    
    }    

    printf("\n");    

    return opbytes;    
   }    
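
The LXI and JMP cases print code[2] before code[1] because the 8080 stores 16-bit operands low byte first. If you would rather work with the operand as a single number, something like the sketch below would do it. `operand16` is a hypothetical helper name, not part of the routine above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: combine the two operand bytes of a 3-byte
   instruction into one 16-bit value. code[0] is the opcode, code[1] is
   the low byte of the operand, and code[2] is the high byte. */
unsigned int operand16(const unsigned char *code)
{
    assert(code != NULL);
    return ((unsigned int)code[2] << 8) | code[1];
}
```

With the bytes `c3 d4 18`, `operand16` gives 0x18d4, which matches the `JMP    $18d4` the routine above would print for those bytes.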

I learned a lot of things about the 8080 while writing this and examining every opcode.

  1. Most instructions are one byte, but some are two bytes and some are three. The code above starts by assuming the instruction is one byte; the 2- and 3-byte cases overwrite the "opbytes" variable so that an accurate instruction size is returned.

  2. The 8080 has registers named A, B, C, D, E, H, and L. There is also a program counter (PC) and a dedicated stack pointer (SP).

  3. Some instructions work on registers in pairs: BC, DE, and HL.

  4. A is special - a lot of instructions operate on it.

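  5. HL is special too: it holds the address for the many instructions that read or write memory through the "M" operand. (A few instructions, such as STAX and LDAX, use BC or DE as the address instead.)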

  6. I got curious about the "RST" instruction, so I read the data book a little. I noticed that RST executes code at a fixed location and that the data book mentions interrupt handling. More reading told me that the code at the beginning of the ROM consists of interrupt service routines (ISRs). Interrupts can be generated in software via the RST instruction, or by sources external to the 8080.
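
Two of these observations can be read straight off the opcode bits. The sketch below assumes the standard 8080 encoding, in which MOV opcodes (0x40 to 0x7f, except 0x76, which is HLT) hold the destination in bits 5-3 and the source in bits 2-0, and in which RST n has the bit pattern 11nnn111 and jumps to address 8*n. The names `reg_names`, `format_mov`, and `rst_target` are invented for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Register names in the order of the 3-bit register field.
   "M" is not a real register: it means the byte in memory at address HL. */
static const char *const reg_names[8] = {"B", "C", "D", "E", "H", "L", "M", "A"};

/* Hypothetical helper: write the text of a MOV instruction into out.
   Returns 1 if opcode is a MOV, 0 otherwise (out is left untouched). */
int format_mov(unsigned char opcode, char *out, size_t out_size)
{
    assert(out != NULL);

    if (opcode < 0x40 || opcode > 0x7f || opcode == 0x76)   /* 0x76 is HLT */
        return 0;

    snprintf(out, out_size, "MOV    %s,%s",
             reg_names[(opcode >> 3) & 7],   /* destination: bits 5-3 */
             reg_names[opcode & 7]);         /* source: bits 2-0      */
    return 1;
}

/* Hypothetical helper: address that an RST opcode jumps to,
   or -1 if opcode is not an RST. */
int rst_target(unsigned char opcode)
{
    if ((opcode & 0xc7) != 0xc7)
        return -1;
    return opcode & 0x38;   /* the nnn field, already multiplied by 8 */
}
```

For instance, 0x7e decodes as `MOV    A,M` (load A from the address in HL), and 0xcf (RST 1) targets address 0x0008, which is one of those fixed locations at the start of ROM. A table like this could stand in for the 63 separate MOV cases in the switch, though writing them out by hand is a better way to learn the instruction set.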

To finish this up into a working program, I'll just whip up a routine that does the following:

  1. Opens a file full of assembled 8080 code

  2. Reads it into a memory buffer

  3. Steps through the memory buffer calling Disassemble8080Op

  4. Advances the PC by the amount returned by Disassemble8080Op

  5. Quits at the end of the buffer

It might go a little something like this:

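   #include <stdio.h>
   #include <stdlib.h>

   int main (int argc, char **argv)
   {
    if (argc < 2)
    {
        printf("usage: %s <8080 binary file>\n", argv[0]);
        exit(1);
    }

    FILE *f = fopen(argv[1], "rb");
    if (f == NULL)
    {
        printf("error: Couldn't open %s\n", argv[1]);
        exit(1);
    }

    //Get the file size and read it into a memory buffer
    fseek(f, 0L, SEEK_END);
    int fsize = (int)ftell(f);
    fseek(f, 0L, SEEK_SET);

    unsigned char *buffer = malloc(fsize);
    if (buffer == NULL || fread(buffer, 1, fsize, f) != (size_t)fsize)
    {
        printf("error: Couldn't read %s\n", argv[1]);
        exit(1);
    }
    fclose(f);

    //Disassemble one instruction at a time until the end of the buffer
    int pc = 0;

    while (pc < fsize)
    {
        pc += Disassemble8080Op(buffer, pc);
    }

    free(buffer);
    return 0;
   }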

In part 2 we will examine the output of disassembling the Space Invaders ROM.


