Disassembler pt 2

Run your disassembler on the invaders.h ROM file and let's look at the output.

   0000 NOP    
   0001 NOP    
   0002 NOP    
   0003 JMP    $18d4    
   0006 NOP    
   0007 NOP    
   0008 PUSH   PSW    
   0009 PUSH   B    
   000a PUSH   D    
   000b PUSH   H    
   000c JMP    $008c    
   000f NOP    
   0010 PUSH   PSW    
   0011 PUSH   B    
   0012 PUSH   D    
   0013 PUSH   H    
   0014 MVI    A,#$80    
   0016 STA    $2072    
   0019 LXI    H,#$20c0    
   001c DCR    M    
   001d CALL   $17cd    
   0020 IN     #$01    
   0022 RRC    
   0023 JC     $0067    
   0026 LDA    $20ea    
   0029 ANA    A    
   002a JZ     $0042    
   002d LDA    $20eb    
   0030 CPI    #$99    
   0032 JZ     $003e    
   0035 ADI    #$01    
   0037 DAA    
   0038 STA    $20eb    
   003b CALL   $1947    
   003e XRA    A    
   003f STA    $20ea    

   /*    
   0000000 00 00 00 c3 d4 18 00 00 f5 c5 d5 e5 c3 8c 00 00    
   0000010 f5 c5 d5 e5 3e 80 32 72 20 21 c0 20 35 cd cd 17    
   0000020 db 01 0f da 67 00 3a ea 20 a7 ca 42 00 3a eb 20    
   0000030 fe 99 ca 3e 00 c6 01 27 32 eb 20 cd 47 19 af 32    
   */    

The first instructions match what we hand assembled before. After that, you can see some new instructions. I pasted the hex data below the listing for reference. If you compare the memory with the instructions, it looks like the addresses are stored backward in memory. They are. This is called little endian: little-endian machines like the 8080 store the least significant byte of a number first. For example, the JMP at $0003 is encoded as c3 d4 18, and its target is $18d4. (See below for more on endianness.)
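To make the byte order concrete, here is a minimal C sketch of how a disassembler might rebuild a 16-bit operand from the two bytes that follow an opcode. The names read_le16 and operand16 are made up for this illustration and are not from the disassembler in this series.

```c
#include <stdint.h>

/* Combine two bytes stored low-byte-first into one 16-bit value.
   read_le16 is a hypothetical helper name used only in this sketch. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Operand of a 3-byte 8080 instruction laid out as:
   opcode, low byte of address, high byte of address. */
static uint16_t operand16(const uint8_t *rom, uint16_t pc)
{
    return read_le16(&rom[pc + 1]);
}
```

Applied to the bytes c3 d4 18 at offset $0003 in the dump above, operand16 gives $18d4, which is the JMP target shown in the listing.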

In Part 1 I mentioned that this code is the ISR code for Space Invaders. The code for interrupts 0, 1, 2, ... 7 starts at addresses $0, $8, $10, ... $38. It looks like the 8080 gives just 8 bytes to each ISR. Space Invaders sometimes gets around this by jumping to another address with more space. (It does that at $000c.)
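The ISR start addresses follow a simple pattern: interrupt number times 8. A small sketch in C, where rst_vector is a hypothetical name chosen for this example:

```c
#include <stdint.h>

/* Start address of the handler for interrupt number n (0..7).
   Each handler gets an 8-byte slot, so the address is n * 8.
   rst_vector is a hypothetical helper name for this sketch. */
static uint16_t rst_vector(unsigned n)
{
    return (uint16_t)((n & 7u) * 8u);
}
```

So interrupt 1 lands on $0008 and interrupt 2 on $0010, the two places in the listing where the PUSH sequences begin.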

It also appears that ISR 2 is longer than the space allocated to it. Its code starts at $0010 and runs past $0018, where ISR 3 would start. I guess Space Invaders doesn't expect to see anything using interrupt #3.

The Space Invaders ROM file you find on the internet has 4 parts. I'll explain this later, but for now if you want to follow the next section, you need to combine the 4 files into one. On Unix:

   cat invaders.h > invaders    
   cat invaders.g >> invaders    
   cat invaders.f >> invaders    
   cat invaders.e >> invaders    

Now run your disassembler on the resulting "invaders" file. When the program starts from $0000, the first thing it does is jump to $18d4. I'd consider this the start of the program. Let's take a quick look at that code.

   18d4 LXI    SP,#$2400    
   18d7 MVI    B,#$00    
   18d9 CALL   $01e6    

OK - it does 2 things and calls $01e6. I'm going to just paste some of the jumpy code into one code section here:

   01e6 LXI    D,#$1b00    
   01e9 LXI    H,#$2000    
   01ec JMP    $1a32    
   .....    
   1a32 LDAX   D    
   1a33 MOV    M,A    
   1a34 INX    H    
   1a35 INX    D    
   1a36 DCR    B    
   1a37 JNZ    $1a32    
   1a3a RET    

As you saw in the Space Invaders memory map, some of these addresses are interesting. $2000 is the start of the program's "work ram". $2400 is the start of the video memory.

Let's annotate this code a little for what happens right at startup:

   18d4 LXI    SP,#$2400  ; SP=$2400 - Establish stack for whole program    
   18d7 MVI    B,#$00     ; B=0    
   18d9 CALL   $01e6    
   .....    
   01e6 LXI    D,#$1b00   ; DE=$1B00    
   01e9 LXI    H,#$2000   ; HL=$2000    
   01ec JMP    $1a32    
   .....    
   1a32 LDAX   D          ; A = (DE), so whatever was in memory at $1B00    
   1a33 MOV    M,A        ; Store A into (HL), so to $2000    
   1a34 INX    H          ; HL = HL + 1 (now $2001)    
   1a35 INX    D          ; DE = DE + 1 (now $1B01)    
   1a36 DCR    B          ; B = B - 1 (now 0xff because it wrapped around from 0)    
   1a37 JNZ    $1a32      ; loop, will be taken until b=0    
   1a3a RET    

This code looks like it is going to copy 256 bytes from $1b00 to $2000. Why? I don't know. It is possible for you to follow through this program a long way and speculate on what it is doing.
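If the 256 count seems surprising given that B starts at 0, it may help to model the loop in C. This is only a sketch: copy_block is a hypothetical name, and mem stands in for a 64K array of 8080 memory. The decrement happens before the test, so starting at 0 the counter wraps to 0xff and the loop body runs 256 times.

```c
#include <stdint.h>

/* C model of the copy loop at $1a32.
   mem is assumed to be a 64K array standing in for 8080 memory.
   Returns the number of bytes copied. */
static unsigned copy_block(uint8_t *mem, uint16_t de, uint16_t hl, uint8_t b)
{
    unsigned copied = 0;
    do {
        mem[hl] = mem[de];  /* LDAX D ; MOV M,A */
        hl++;               /* INX H */
        de++;               /* INX D */
        b--;                /* DCR B (8-bit, so 0 wraps to 0xff) */
        copied++;
    } while (b != 0);       /* JNZ $1a32 */
    return copied;
}
```

Calling copy_block(mem, 0x1b00, 0x2000, 0) mirrors the register values set up at $18d7 and $01e6 and reports 256 bytes copied.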

There is a problem here. If you have an arbitrary chunk of memory that includes 8080 code, it probably has data interleaved in it.

For example, the sprites for the characters in a game may be mixed in with the code. When your disassembler hits that chunk of memory, it is going to think it's code and continue to chew on it. Unless it gets lucky and lands back on an instruction boundary, the code disassembled right after that chunk of data will be wrong too.

For now, there isn't a whole lot you can do about this. Just be aware the problem exists. If you see stretches of disassembly that make no sense, there is probably data in there that renders some portion of your disassembly unusable. You might have to restart it at an offset if this happens to you.

It turns out that Space Invaders has some runs of zeros in it periodically. If our disassembly ever gets off, the zeros will force it to sort of reset itself.
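The reason zeros help is that $00 is the one-byte NOP. A misaligned 3-byte instruction can swallow at most two of the zeros as its operand, and the rest decode as single NOPs, so the first non-zero byte after a long enough run is where the disassembler picks up again. If you want to restart by hand, a helper like the following sketch can suggest a candidate offset. The name resync_after_zeros and the min_run parameter are made up for this example, and the result is only a guess: the zeros could themselves be data or operands.

```c
#include <stddef.h>
#include <stdint.h>

/* Scan buf from start and return the offset of the first non-zero byte
   that follows a run of at least min_run zero bytes.
   Returns len if no such position exists.
   Hypothetical helper for this sketch, not part of the tutorial's code. */
static size_t resync_after_zeros(const uint8_t *buf, size_t len,
                                 size_t start, size_t min_run)
{
    size_t run = 0;
    for (size_t i = start; i < len; i++) {
        if (buf[i] == 0x00) {
            run++;              /* still inside a run of zeros */
        } else {
            if (run >= min_run)
                return i;       /* first non-zero byte after the run */
            run = 0;            /* run was too short, start over */
        }
    }
    return len;
}
```

A min_run of 3 is a reasonable starting point, since no 8080 instruction is longer than 3 bytes.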

For a thorough analysis of the Space Invaders code, look here.

Endianness

Processors differ in the order in which they store the bytes of a multi-byte value in memory. Big endian machines store data from the most significant byte to the least significant byte. Little endian machines store from least significant to most significant. If you write a 32-bit integer 0xAABBCCDD to memory on each machine, it will look like this in memory:

Little endian: $DD $CC $BB $AA

Big endian: $AA $BB $CC $DD
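The two layouts can be reproduced on any host by writing the bytes out one at a time with shifts. A short sketch in C, with store_le32 and store_be32 as hypothetical names for this example:

```c
#include <stdint.h>

/* Write v into out[] least significant byte first (little endian).
   Hypothetical helper name for this sketch. */
static void store_le32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v & 0xff);
    out[1] = (uint8_t)((v >> 8) & 0xff);
    out[2] = (uint8_t)((v >> 16) & 0xff);
    out[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Write v into out[] most significant byte first (big endian).
   Hypothetical helper name for this sketch. */
static void store_be32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)((v >> 24) & 0xff);
    out[1] = (uint8_t)((v >> 16) & 0xff);
    out[2] = (uint8_t)((v >> 8) & 0xff);
    out[3] = (uint8_t)(v & 0xff);
}
```

Because the shifts work on the value rather than on its in-memory representation, both functions produce the same bytes no matter which kind of machine runs them.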

I started out programming Motorola processors, which were all big endian, so that seems more "natural" to me, but I've gotten used to little endian.

My disassembler and emulator completely avoid the endian issue by only reading and writing 1 byte at a time. If you want to use, say, a 16-bit read to read an address from the ROM, be aware that the code is not portable between host CPU architectures.


