More about binary numbers

When you're writing a computer program, one decision you have to make is what data type to use for your numbers - whether they need to be able to go negative, and how big they can get. For a CPU emulator, you have to match the data type to the data type of the target CPU.

Signed vs Unsigned

When we started talking about hex numbers, we treated them as unsigned - that is, each binary digit of the number had a positive value, each one a power of two (ones, twos, fours, etc.).
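For example, 0xC5 is 1100 0101 in binary, so as an unsigned number it is 128 + 64 + 4 + 1 = 197 decimal.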

We didn't touch on how the computer would store a negative number. If you know the data you're looking at is signed - that is, it could be negative - you can spot a negative number by looking at the number's highest bit (most significant bit, or MSB). If the data size is one byte, every number with the MSB set is actually negative, and every number with the MSB clear is positive.

The value of the negative number is stored in two's complement form. If you have a signed number with its MSB set and you want to know what the number is, you can convert it like this: binary "not" the number, then add one. The result is the magnitude of the negative number.

For example: the hex number 0x80 has its MSB set, so it's negative. The binary "not" of 0x80 is 0x7F, or decimal 127. 127 + 1 is 128, so 0x80 is decimal -128. A second example is 0xC5: Not(0xC5) = 0x3A = decimal 58, and 58 + 1 = 59, so 0xC5 is decimal -59.
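
Here's that same conversion as a minimal C sketch (decode_signed_byte is just a name I made up for illustration):

   #include <stdio.h>
   #include <stdint.h>

   /* Decode an 8-bit value by hand: if the MSB is set, the value is
      negative and its magnitude is not(value) + 1. */
   int decode_signed_byte(uint8_t value)
   {
       if (value & 0x80)                      /* MSB set: negative   */
           return -(int)((uint8_t)~value + 1);
       return value;                          /* MSB clear: positive */
   }

   int main(void)
   {
       printf("0x80 = %d\n", decode_signed_byte(0x80));   /* prints -128 */
       printf("0xC5 = %d\n", decode_signed_byte(0xC5));   /* prints -59  */
       return 0;
   }

In practice you'd usually just cast the byte to a signed type and let the compiler do this for you - more on that in the Data types section below.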

The amazing thing about two's complement numbers is that you can do math with them just like unsigned numbers and it just works. The computer doesn't have to do anything special about the signs. I'll do a couple of examples just to prove it.

   Example 1

     decimal   hex        binary
        -3     0xFD       1111 1101
    +   10     0x0A     + 0000 1010
    ------              -----------
         7     0x07     1 0000 0111
                        ^ This one gets put to the carry bit

   Example 2

     decimal   hex        binary
       -59     0xC5       1100 0101
    +   33     0x21     + 0010 0001
    ------              -----------
       -26     0xE6       1110 0110

In Example 1, you can see that adding 10 and -3 gives 7. There was a carry out of the top bit of the addition, so the C flag might get set. In Example 2, the result of the addition was negative, so to decode it: Not(0xE6) = 0x19 = 25, and 25 + 1 = 26, so 0xE6 = -26. Mind blown!
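
You can convince yourself of this in C as well. Here's a quick sketch of the two examples above; the conversion back to uint8_t is what drops the ninth bit (the carry):

   #include <stdio.h>
   #include <stdint.h>

   int main(void)
   {
       /* Example 1: -3 + 10, done as plain unsigned byte math */
       uint8_t a = 0xFD;                  /* -3 in two's complement */
       uint8_t b = 0x0A;                  /* 10                     */
       uint8_t sum = a + b;               /* 0x07, carry dropped    */
       printf("0x%02X = %d\n", sum, (int8_t)sum);    /* 0x07 = 7    */

       /* Example 2: -59 + 33 */
       uint8_t sum2 = 0xC5 + 0x21;        /* 0xE6                   */
       printf("0x%02X = %d\n", sum2, (int8_t)sum2);  /* 0xE6 = -26  */
       return 0;
   }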

Read more about two's complement on Wikipedia if you want.

Data types

There is a mapping in C between data types and how many bytes are used for that type. We are really only interested in integers. The standard/old-school C data types for integers are char, int, and long and their buddies unsigned char, unsigned int, and unsigned long. The problem is that these types can be different sizes on different platforms and compilers.

So it is best practice to seek out data types for your platform that declare the data size explicitly. If you have stdint.h on your platform, you can use int8_t, uint8_t, etc.
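
Here's a small sketch showing the difference - the sizeof results for int and long will vary by platform and compiler, which is exactly the problem:

   #include <stdio.h>
   #include <stdint.h>

   int main(void)
   {
       /* Old-school types: sizes depend on the platform/compiler */
       printf("int:      %zu bytes\n", sizeof(int));      /* often 4, not guaranteed */
       printf("long:     %zu bytes\n", sizeof(long));     /* 4 or 8, it depends      */

       /* stdint.h types: the size is right there in the name */
       printf("uint8_t:  %zu bytes\n", sizeof(uint8_t));  /* always 1 */
       printf("uint16_t: %zu bytes\n", sizeof(uint16_t)); /* always 2 */
       printf("uint32_t: %zu bytes\n", sizeof(uint32_t)); /* always 4 */
       return 0;
   }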

The size of the integer determines the maximum number that can be stored in it. For unsigned integers, 8 bits can store numbers from 0 to 255. If you think in hex, this corresponds to 0x00 to 0xFF. Since 0xFF is "all bits set" and corresponds to decimal 255, it makes perfect sense that the range of a 1-byte unsigned number is 0-255. The ranges for the other unsigned sizes work the same way - the maximum is whatever number is represented when all the bits are set (for an n-bit unsigned integer, that's 2^n - 1).

   Type              Range                        Hex
   8-bit unsigned    0 to 255                     0x00 to 0xFF
   8-bit signed      -128 to 127                  0x80 to 0x7F
   16-bit unsigned   0 to 65535                   0x0000 to 0xFFFF
   16-bit signed     -32768 to 32767              0x8000 to 0x7FFF
   32-bit unsigned   0 to 4294967295              0x00000000 to 0xFFFFFFFF
   32-bit signed     -2147483648 to 2147483647    0x80000000 to 0x7FFFFFFF

It is further interesting to note that -1 in each signed data type is the number with all bits set (0xFF for a signed byte, 0xFFFF for a signed 16-bit number, and 0xFFFFFFFF for a signed 32-bit number). If the data is treated as unsigned, all bits set is the maximum possible number for the data type.
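
If you want to check this yourself, stdint.h also defines limit macros for each type. A quick sketch:

   #include <stdio.h>
   #include <stdint.h>

   int main(void)
   {
       int8_t  m8  = -1;
       int16_t m16 = -1;

       /* -1 in a signed type is "all bits set"... */
       printf("-1 as 8 bits:  0x%02X\n", (uint8_t)m8);     /* 0xFF   */
       printf("-1 as 16 bits: 0x%04X\n", (uint16_t)m16);   /* 0xFFFF */

       /* ...and "all bits set" is the maximum of the unsigned type */
       printf("UINT8_MAX  = %d\n", UINT8_MAX);             /* 255    */
       printf("UINT16_MAX = %d\n", UINT16_MAX);            /* 65535  */
       printf("INT8_MIN = %d, INT8_MAX = %d\n", INT8_MIN, INT8_MAX);  /* -128, 127 */
       return 0;
   }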

For emulation of CPU registers, you choose a data type to match the size of that register. You probably want to choose unsigned types by default and cast them when you want to treat them as signed. For instance, use the data type uint8_t to represent an 8-bit register.
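
As a rough sketch of what that might look like - the register names and the branch_target helper here are purely illustrative, not from any particular CPU:

   #include <stdint.h>

   /* Registers sized to match the (hypothetical) target CPU, unsigned by default */
   typedef struct {
       uint8_t  a;      /* 8-bit accumulator       */
       uint8_t  b, c;   /* 8-bit general registers */
       uint16_t pc;     /* 16-bit program counter  */
       uint16_t sp;     /* 16-bit stack pointer    */
   } Registers;

   /* When an instruction needs a signed view (say, a relative branch offset),
      cast at that point instead of storing the register as signed. */
   uint16_t branch_target(uint16_t pc, uint8_t offset)
   {
       return (uint16_t)(pc + (int8_t)offset);   /* offset treated as -128..127 */
   }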

Tip: Use the debugger to convert data types

If you are on a platform with gdb installed, it is really useful for working with binary numbers. I'll show you below - in the session, lines that start with a # are comments I added after the session.

   #use the /c modifier to get gdb to interpret the input as signed    
   (gdb) print /c 0xFD    
   $1 = -3 '?'    

   #0x0A has its MSB clear, so /c prints it as a positive number ('\n' is just 0x0A's character)    
   #switching to "p" instead of typing out "print"    
   (gdb) p /c 0xA    
   $2 = 10 '\n'    

   #These are the numbers from Example 2 in the two's complement section    
   (gdb) p /c 0xC5    
   $3 = -59 '?'    
   (gdb) p /c 0xC5+0x21    
   $4 = -26 '?'    

   #if you print without a modifier, gdb will respond in decimal    
   (gdb) p 0x21    
   $9 = 33    

   #These are the negative numbers from above, but if I don't tell gdb    
   #they are signed, it treats them as unsigned    
   (gdb) p  0xc5    
   $5 = 197     #unsigned    
   (gdb) p /c 0xc5    
   $3 = -59 '?' #signed    
   (gdb) p 0xfd    
   $6 = 253    

   #use the /x modifier to get gdb to print the result as hexadecimal - it will show you    
   #the two's complement representation (it defaults to a 32-bit integer)    
   (gdb) p /x -3    
   $7 = 0xfffffffd    

   # 1 byte-sized data treated as signed    
   (gdb) print (char) 0xff    
   $1 = -1 '?'    
   # 1 byte-sized data treated as unsigned    
   (gdb) print (unsigned char) 0xff    
   $2 = 255 '?'    


When I am working with hex numbers, I always do it in gdb - I do this almost every day. It is way easier than opening up a GUI programmer's calculator. On Linux machines (and Mac OS X) you can just open up a terminal and type "gdb" to start a gdb session. If you are using Xcode on OS X, you can use the console inside Xcode (the one where the printf output comes out) once the program has started. For Windows, Cygwin has gdb available.


