Debugging Tales

I'm going to tell you two debugging stories. Debugging code like this can be hard. I want to illustrate some of the approaches you might use if you have problems with your code.

Story One, bug in LXI B,(word)

It was really awesome when I got my platform code working and I could see the game drawing for the first time. It would run for 30 seconds or so before it crashed (obviously there were still some bugs). The first thing I noticed was that part of the Attract screen wasn't drawing. There should be a part where it says *SCORE ADVANCE TABLE*, and below that a table where it says Saucer=? Mystery, alien 1 = 30, etc. The table of scores wasn't drawing.

How I debugged it

This was many many instructions into the game, probably tens of millions. I couldn't compare the result manually to the emulator. I dinked with it for a while with no success. I looked through the code analysis from Computer Archeology, and searched for "SCORE ADVANCE TABLE". I found it at address 0x1ca3, and code that used 0x1ca3 at address 0x1815.

I went into my emulator and added a line of code like this at the top of Emulate8080Op:

    if (state->pc == 0x1815)    
        printf("0x1815\n");    

Then I set a breakpoint on the printf in the debugger. When that breakpoint was hit, I started stepping through the program. I watched the PC and each instruction, comparing it with the commented program from Computer Archeology. When I got to 0x1825 something went wrong. My next instruction was DCX D (0x1D). The commented program said it should be CALL. Looking at the implementation of my LXI B it was clear what I'd done wrong. I forgot to advance the PC past the 2 data bytes of the LXI instruction. So I fixed that up and re-ran the program. The SCORE ADVANCE TABLE drew fine - success!

        case 0x01:                          //LXI   B,word    
            state->c = opcode[1];    
            state->b = opcode[2];    
            //I'd forgotten to advance the PC or had accidentally deleted
            //this line
            state->pc += 2;    
            break;    

The program didn't go off the rails because the code that it executed when my PC got off-track was legal code. It just did something different.
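To make that concrete, here is a minimal sketch of the same bug in isolation. The trimmed-down State8080, the helper name ExecLxiB, the skip_operands flag, and the byte values are all assumptions for illustration; the only facts relied on are that LXI B is opcode 0x01 followed by two data bytes, that 0x00 is NOP, and that 0xcd is CALL.

```c
#include <assert.h>
#include <stdint.h>

// Trimmed-down state: only the fields this sketch needs (assumption).
typedef struct State8080 {
    uint8_t  b, c;
    uint16_t pc;
    uint8_t *memory;
} State8080;

// Execute LXI B,word at state->pc.
// skip_operands = 1 is the correct behavior; 0 reproduces the bug.
static void ExecLxiB(State8080 *state, int skip_operands)
{
    uint8_t *opcode = &state->memory[state->pc];
    assert(opcode[0] == 0x01);   // this sketch only handles LXI B

    state->pc += 1;              // step past the opcode byte
    state->c = opcode[1];        // low byte of the word
    state->b = opcode[2];        // high byte of the word
    if (skip_operands)
        state->pc += 2;          // step past the two data bytes
}
```

With the bytes 0x01 0x00 0x1b 0xcd in memory, the buggy version leaves the PC pointing at 0x00, which decodes as a perfectly legal NOP, while the fixed version leaves it at 0xcd, the CALL that should run next. Nothing crashes in the buggy case; the program just quietly does something different.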

This bug probably only took me an hour to find and fix.

Story Two, garbage and crashes

After I fixed LXI, I was still having problems. I was seeing random symptoms, including garbage on the screen and crashes.

Strike 1

I suspected that memory was getting corrupted somehow, so I set about trying to fix it. First, I made a function to write the game's RAM. Then I went into every opcode that wrote to memory and piped it through the function. This function would prevent an opcode from writing into the ROM area or outside the RAM area. I could also use it to catch any writes to suspicious addresses.

   static void WriteMem(State8080* state, uint16_t address, uint8_t value)    
   {    
       if (address < 0x2000)    
       {    
           // printf("Writing ROM not allowed %x\n", address);    
           return;    
       }    
       if (address >=0x4000)    
       {    
           // printf("Writing out of Space Invaders RAM not allowed %x\n", address);    
           return;    
       }

       state->memory[address] = value;    
   }    

With this function, I discovered that the game does indeed sometimes write above its RAM area, just above 0x4000. Those writes would not explain the garbage on the screen, and I wasn't positive that they don't happen on the real hardware anyway. Since the source for other Space Invaders emulators clamps the writes, I suspect that this is normal behavior.
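As a rough illustration of what "piping an opcode through the function" could look like, here is a minimal sketch. The trimmed-down State8080 and the helper name StoreAtHL are assumptions made for this example, not the emulator's actual code; WriteMem is the same guard shown above with the commented-out printfs removed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Trimmed-down state: only the fields this sketch needs (assumption).
typedef struct State8080 {
    uint8_t  a, h, l;
    uint8_t *memory;
} State8080;

// Same guard as above: silently drop writes to ROM (below 0x2000)
// and to anything at or above 0x4000.
static void WriteMem(State8080 *state, uint16_t address, uint8_t value)
{
    if (address < 0x2000)
        return;
    if (address >= 0x4000)
        return;
    state->memory[address] = value;
}

// Hypothetical helper: the store that an opcode such as MOV M,A performs,
// routed through WriteMem instead of touching state->memory directly.
static void StoreAtHL(State8080 *state, uint8_t value)
{
    assert(state != NULL && state->memory != NULL);
    uint16_t address = (uint16_t)((state->h << 8) | state->l);
    WriteMem(state, address, value);
}
```

Every opcode that stores to memory then calls a helper like this, so there is exactly one place to put a breakpoint or a printf when a suspicious address shows up.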

Strike 2

I was still suspecting that some of my opcodes had bugs. I worked for another day, examining failure after failure, making little progress.

I decided to integrate another emulator into my project. There are several 8080 emulators out there, but a lot of them are in other languages, and some of them spread the emulation out over 10+ files. I found one written in C that was only a few files and integrated it. Alas, it had not been tested very well. I fixed several bugs in it before I finally gave up on it and started searching around again.

Finally I settled on the Z80 emulator from a very old version of MAME. Once I started with that one, I was able to integrate it into my project and write wrapping code for it in a couple of hours. After a few fits and starts I was able to run the game with it, but I found that it misbehaved in the same way my own emulator did.

After watching crash after crash, I started to think maybe the problem wasn't opcodes, maybe it was something else. I started noticing that the Stack Pointer, which starts at 0x2400, was getting awfully low, less than 0x2300.

So even though integrating the emulator was a swing and a miss, I was getting closer.

Tip foul

I suspected that the stack or the SP was getting corrupted somehow. I added some code to the top of the Emulate8080Op routine:


    if ((state->sp != 0) && (state->sp < 0x2300))
        printf("Stack getting dangerously low %04x\n", state->sp);

    // Alert if more than 2 bytes have changed in the stack since last time
    if (abs(lastSP - state->sp) > 2)
        printf("Stack Squash?\n");
    lastSP = state->sp;

The SP shouldn't change by more than 2 bytes per instruction. This code would tell me if the SP changed drastically. The Computer Archeology code tells me that if the stack gets under 0x22ff, it will be overwriting game data structures.

The next run told me two things: the stack was growing down into the game data (it was 0x22e4 when I looked), and the SP was not getting corrupted. Getting closer.

I added more debugging code to the emulator:

   int     last1000index=0;    
   uint16_t last1000[1000];    
   uint16_t last1000sp[1000];    
   uint16_t lastSP;

   void PrintLast1000(void)    
   {    
       int i;    
       for (i=0; i<100;i++)    
       {    
           int j;    
           printf("%04d ", i*10);    
           for (j=0; j<10; j++)    
           {    
               int n = i*10 + j;    
               printf("%04x %04x  ", last1000[n], last1000sp[n]);    
               if (n==last1000index)    
                   printf("**");    
           }    
           printf ("\n");    
       }    
   }    

   int Emulate8080Op(State8080* state)    
   {    
    unsigned char *opcode = &state->memory[state->pc];    

       last1000[last1000index] = state->pc;    
       last1000sp[last1000index] = state->sp;    
       last1000index++;    
       if (last1000index>=1000)    //wrap before the index runs off the end of the arrays
           last1000index = 0;    

    if ((state->sp != 0) &&(state->sp < 0x2300))    
           printf("Stack getting dangerously low %04x\n", state->sp);    

This code would record the PC and SP for the last 1000 instructions executed, so I could stop at a breakpoint in gdb and issue call (void) PrintLast1000() to see them at any time. I set the breakpoint on the "Stack getting dangerously low" printf line. When it got hit the first time, I looked at the last 1000 instructions. I noticed that the stack was growing down mostly from instructions at low memory addresses, which I knew were interrupt handlers.
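If you want to try the same trick, here is one way the history buffer could be written as a small circular buffer that can be read back oldest first. This is a sketch, not the code I used: the names TRACE_LEN, TraceEntry, TraceRecord, TraceGet, and TracePrint are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define TRACE_LEN 1000           // number of instructions remembered

typedef struct TraceEntry {
    uint16_t pc;
    uint16_t sp;
} TraceEntry;

static TraceEntry trace[TRACE_LEN];
static int trace_next  = 0;      // slot the next entry will be written to
static int trace_count = 0;      // valid entries so far, capped at TRACE_LEN

// Record one instruction; call this at the top of the emulation step.
static void TraceRecord(uint16_t pc, uint16_t sp)
{
    trace[trace_next].pc = pc;
    trace[trace_next].sp = sp;
    trace_next = (trace_next + 1) % TRACE_LEN;   // wrap around at the end
    if (trace_count < TRACE_LEN)
        trace_count++;
}

// Fetch the entry that is `age` instructions old (0 = most recent).
// Returns 0 if that much history has not been recorded yet.
static int TraceGet(int age, TraceEntry *out)
{
    assert(out != NULL);
    if (age < 0 || age >= trace_count)
        return 0;
    *out = trace[(trace_next - 1 - age + TRACE_LEN) % TRACE_LEN];
    return 1;
}

// Print the whole history oldest first, ten entries per row.
void TracePrint(void)
{
    int col = 0;
    for (int age = trace_count - 1; age >= 0; age--) {
        TraceEntry e;
        TraceGet(age, &e);
        printf("%04x %04x  ", e.pc, e.sp);
        if (++col % 10 == 0)
            printf("\n");
    }
    printf("\n");
}
```

Because TracePrint takes no arguments, it can be invoked from a gdb breakpoint the same way as before, with call (void) TracePrint(). Printing oldest first means the last row is always the code that ran just before the breakpoint.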

Found it!

Now I thought I knew what the problem was. If one interrupt handler didn't finish before the next one fired, then registers would just continue to get pushed on the stack and never popped. So I added one last piece of debugging code to the top of Emulate8080Op:

    if (state->pc == 0x8)    
        printf("RST 1\n");    
    else if (state->pc == 0x10)    
        printf("RST 2\n");    
    else if (state->pc == 0x87)    
        printf("Leaving RST\n");    

0x8 is the target for RST 1, and 0x10 is the target for RST 2. Reading the code, they both exit at 0x87. If interrupts were working the way I thought they should, then I'd see:

   RST 1    
   Leaving RST    
   RST 2    
   Leaving RST    
   RST 1    
   Leaving RST    
   RST 2    
   Leaving RST    

But what I saw was:

   RST 1    
   RST 2    
   RST 1    
   Leaving RST    
   Leaving RST    
   Leaving RST    
   RST 2    
   RST 1    
   RST 2    
   RST 1    
   RST 2    
   Leaving RST    

The interrupt instructions were getting called on top of each other. While I was looking through the code for the address of the interrupt I noticed this:

   0081 NOP    
   0082 POP    H    
   0083 POP    D    
   0084 POP    B    
   0085 POP    PSW    
   0086 EI    
   0087 RET    

Then I knew - I had made a mistake in my interrupt code. The EI instruction at 0x0086 told me that the code expects interrupts to be held off through the interrupt processing. I added state->int_enable = 0; to my GenerateInterrupt code in the 8080 emulator:

   void GenerateInterrupt(State8080* state, int interrupt_num)    
   {    
       //perform "PUSH PC"    
       Push(state, (state->pc & 0xFF00) >> 8, (state->pc & 0xff));

       //Set the PC to the low memory vector    
       state->pc = 8 * interrupt_num;    

       //mimic "DI"    
       state->int_enable = 0;    
   }    

Since the machine code checks the interrupt state before it generates another one, we can't get another one until EI gets called. After this change the game was running 100% correctly. It was a good day.
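Here is a minimal sketch of how the pieces fit together. The trimmed-down State8080, this version of Push, and the helpers TryInterrupt and ExecuteEI are assumptions for illustration; in the real project the enable check lives in the machine code and EI is just another case in the opcode switch.

```c
#include <assert.h>
#include <stdint.h>

// Trimmed-down state: only the fields this sketch needs (assumption).
typedef struct State8080 {
    uint16_t pc, sp;
    uint8_t  int_enable;
    uint8_t *memory;
} State8080;

// Push a 16-bit value as two bytes, high byte at the higher address.
static void Push(State8080 *state, uint8_t high, uint8_t low)
{
    state->memory[(uint16_t)(state->sp - 1)] = high;
    state->memory[(uint16_t)(state->sp - 2)] = low;
    state->sp = (uint16_t)(state->sp - 2);
}

static void GenerateInterrupt(State8080 *state, int interrupt_num)
{
    assert(interrupt_num >= 0 && interrupt_num <= 7);   // RST 0..7
    Push(state, (state->pc & 0xFF00) >> 8, state->pc & 0xff);
    state->pc = (uint16_t)(8 * interrupt_num);   // low memory vector
    state->int_enable = 0;                       // mimic "DI"
}

// Hypothetical machine-side check: deliver the interrupt only if
// interrupts are enabled. Returns 1 if it was delivered, 0 if dropped.
static int TryInterrupt(State8080 *state, int interrupt_num)
{
    if (!state->int_enable)
        return 0;
    GenerateInterrupt(state, interrupt_num);
    return 1;
}

// What the EI opcode does to the state: allow interrupts again.
static void ExecuteEI(State8080 *state)
{
    state->int_enable = 1;
}
```

With this gating, a second interrupt that arrives while the first handler is still running is simply dropped, and nothing more gets pushed on the stack until the handler's EI at 0x0086 has executed.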

I probably looked for this bug for 20-30 hours altogether.
