Intro

OmegaGB is a Nintendo Game Boy emulator I am developing using the Haskell programming language. This Devlog tracks my progress.

Enjoy,
    -- bit

Map Viewer - Part 2

April 22 2006

The code snapshot relevant to this post is snapshot 20060422

My previous post ended with me still on my short-term quest to get a map viewer working. I had discovered that I would need to emulate the V-Blank interrupt in order to proceed. That is what I set out to do.

The code I had for periodically incrementing the LY register would be the base for emulation of the V-Blank interrupt. I added to the IrqStates record a field for counting down to the V-Blank cycle. I updated the irqUpdate function so that it would decrement the counter, and when it reaches zero, trigger the V-Blank interrupt by pushing the Program Counter to the stack and then modify the Program Counter to the V-Blank jump address. The already ugly irqUpdate function became even uglier. Querying and modifying state is not very pretty when done functional style. I plan to later convert this function to use a state monad.

But at least it seems to work! Games no longer seem to get stuck in an infinite loop waiting for the V-Blank. However, my program soon crashes with a friendly error message: <<loop>>. Apparently my program is getting stuck in what I think they call a blackhole. The GHC runtime is detecting that a computation contains an infinite loop, and so it bails out. When I run my program from within ghci, I don't get this error, the program simply freezes.

Here begins my first great haskell debugging saga.

I've never really had to debug this kind of problem in haskell before, so I'm not sure what to do. There don't seem to really be any debuggers for haskell, for some reason people claim to not need them. So I decide to use the old fashioned "printf" method of debugging: everywhere you can in the code, put calls to print something to the console. Then when the crash happens, you look at the last thing that was printed. You now know that the error happened somewhere between the point in the code that prints this message, and the next point in code that prints a message.

I scatter putStrLn calls in my step function, and figure out the the crash is occuring somewhere inside my updateDebugPanel function, so I scatter putStrLn calls in there. This function is responsible for updating the GUI so that all of the displayed values of the registers are updated. My "printf" debugging tells me that the error is happening somewhere in here:

reg_a `entrySetText` showHex1 (getRegState regS M_A)

I find this totally baffling. I ask myself how this simple call can be causing an infinite loop? It doesn't make any sense. All it's doing is taking the value of the A register from the state, and updating the GUI with it. How could this be causing some blackhole error?

The obvious answer soon hits me: haskell is lazy! The reading of the A register can cause an entire chain of computations to be executed, computations that have been delayed until the value of A is actually requested. Now I have a problem. How am I supposed to find out where the bug is? My code base is no longer trivially short, so there is no way I'm going through it line by line, looking for a needle in a haystack. The "printf" debugging method won't help me, since the bug is in "pure" functional code, where printf is not allowed.

I could probably use Debug.trace, which is like printf, but it's allowed also in pure functional code. But I have a better idea. I switch from my graphical debugger back to my previous text emulator, the one that simply prints the CPU instructions as they are executed. I want to know what's going on that causes the strange behaviour in the A register. So I modify the program to also print the value of the A register along with each instruction.

I run the program, it dies with the <<loop>> error. I look at the last executed CPU instruction before the crash, and it's an ADD instruction that I had just recently implemented. I look at my code for this ADD instruction, and sure enough, there it is:

let v'' = v + v''

Arrgh! How could I make such a stupid mistake? And why did it take me so long to find it? (In imperative programming languages, like C, code like this is used all the time for incrementing variables. But haskell doesn't have variables, and is more similar to mathematics, where such an equation makes no sense)

Overall my debugging session went pretty well. The important thing is that I eventually found the bug :). This experience helped make lazy evaluation more "real" for me. My way of going about the debugging wasn't that great, but I feel that from now on, I should be able to nail similar bugs quicker.

After examining some game roms, I realize that I'm going to need to emulate more then just the V-Blank interrupt if I want to be able to execute them up to the beginning of the title screen even. I read through the pan docs gameboy interrupts sections over and over again. Some aspects aren't totally clear to me. "joat" from a gameboy irc channel helps me out, and gives me the following algorithm for dealing with gameboy interrupts:

joat: how to work it good:
joat:   whenever a peripheral triggers an irq, set the bit in the IF latch
joat:   whenever the cpu writes to IF, latch the value
joat:   every cycle (this is the slow way, depends on your emu arch on how to
      speed it up), if IME is 1 and IE & IF != 0, trigger the highest
      priority interrupt set in IF, and clear that bit in IF
joat: servicing the interrupt also disables IME
joat: the bits that don't get servied remain set in IF
joat: waiting for a future time when their corresponding IE bit gets set (if it
      isn't), and they too get serviced, or the user clears them manually

Thanks joat! Things are a lot clearer now. It's time to clean up the irqUpdate function. I switch to using a state monad, clean things up a bit, and implement full emulation support of the STAT register, including all of it's various interrupts.

Everything I need for the the map viewer should be working now! I hack up the debugger, and add a viewport for displaying a real-time dump of the tiles in video ram:

Dev Image

Yes! Finally some graphics! A confirmation that my emulator actually does something, and doesn't just print random numbers to the screen :). Gameboy background maps index into these tiles. I'm just one step away to getting such a map viewer working, and here it is:

Dev Image

That is the map that "Dropzone" uses for its title screen. Kind of boring, isn't it? The previous screenshot showing the tile patterns dump is a bit cooler I think. Now that I have this map viewer working I realize that my emulator should in fact be complete enough in order to actually display the intro and title screen of some games!

I quickly put together a new minimal gui, and hack up some code to render the display of the gameboy. I only implement the minimum of the minimum needed to display the background map, which should be enough for the simple scroll effects that games use for their intros. It seems to work great! Here is a screenshot and two animated gifs. Cowabunga!

Dev Image Dev Image Dev Image

"Brainbender" has got some pretty funky scaling going on, and I seem to be emulating it properly :). It appears that the emulator is working awesome so far! It still needs a lot of work though. Out the games I've tested, the emulator only succeeded in running and displaying the title screen of these two. The majority of games will not work for the simple reason that they use cartridges with memory banking support, and I have not implemented emulation for this yet. This is why I've been using "Dropzone", "Brainbender" and "Asteroids" for testing. They are of the few 32Kb(meaning no banking) games that I have. By the way, "Asteroids" sort of shows the title screen, but it's a bit garbled. This is almost definitely because of lack of rendering support for sprites or the gameboy "window".

The big problem though is speed. The emulator is slow :(. I was expecting it to be slow, since my strategy so far has been to not think about performance at all; just get things working. At least things are working! The emulator runs at about 1/10th the speed it needs to for real time emulation. I will eventually need to do some optimizations. That might be the theme of the next post.

Peace,
    -- bit

Map Viewer - Part 1

April 17 2006

The code snapshot relevant to this post is snapshot 20060417

Now that I can execute game ROM code, I set a short term goal to get a gameboy background map-tile viewer working. It should have been simple enough.

Gameboy has video ram that contains tile data and also a tilemap indexing the tiles. All I would need to do to get the map viewer working would be to read this video ram every V-Blank period(LCD display refresh period) and draw it.

First I implemented a missing cpu instruction, so that "Asteroids" wouldn't bail out after a few hundred instructions. Now this sucker executes, and executes... and sucks up an assload of my system ram!

I use the iterate function to make an infinite list of the machine state as it changes after each CPU instruction, and then i simply fetch the instruction from each state and print it. I was hoping that GHC would be able to figure out that it only needs to have one state in memory that can be mutated, but I guess this is not happening: it seems that all of the old states are building up in ram, until all available ram runs out and I get a "Killed" message - I'm guessing as a result of the operating system shutting down the process.

Someone from #haskell mentions using STArray instead of regular Array which I've been using. The array is used for storing the gameboy's RAM data. So I look through the GHC library docs, and along with STArray, I also see UArray, which is supposed to be unboxed arrays.

According to what it says there, my regular Array that I've been using boxes all of it's elements! Since my array contains thousands of Word8 elements, this is not cool! The docs say that switching from boxed Array to unboxed UArray can be done by simply changing the type signature of your array.

I replace "Array" with "UArray" in a few places in my code, and all of my problems seem to be solved! Memory usage now stays constant, and I can execute roms without the process getting "Killed". I'm not sure why GHC was able to figure out that it can mutate unboxed arrays, while boxed arrays it couldn't. Puzzling indeed, but I don't spend too much time thinking about it since everything works now, and I want to continue and get the map viewer working!

Before I can start working on the map viewer, I notice that all roms that I test get stuck in a loop like this one:

$60e LDH A,($FF00+$44)
$610 CP A,$91
$612 JR NZ,-6

These 3 instructions are executed over and over again. Memory address $FF44 on the gameboy is called the LY register. According to the Pan Docs:

The LY indicates the vertical line to which the present data is transferred to the LCD Driver. The LY can take on any value between 0 through 153. The values between 144 and 153 indicate the V-Blank period. Writing will reset the counter.

So this loop monitors the LY register in order to wait for the V-Blank(Why do they wait for LY to be 145($91 = 145) and not 144? I guess to play it safe or something ;)). Unfortunately for the game, this will never happen since I haven't implemented emulation for any of the features of the gameboy hardware except for it's CPU :)

The big question is: How should I implement the "ticking" of the LY register? This is actually part of the bigger question: How should I implement interrupts and joypad input and sound output and all these other things things? I start thinking about some sort of design that would allow pluggable memory-register-injection routines and registering to notifications of register changes, and all kinds of other support systems so that everything needed to be done would be doable, yet the system would still be flexible.

I forget about this idea and go with the more practical, "Do The Simplest Thing That Could Possibly Work". This post finally gets some code:

hBlankPeriod = 456

data IrqStates = IrqStates {
  irqStateIME :: Bool,         -- Interrupt Master Enable
  irqStateHBlankCounter :: Int -- CPU cycles until next hblank
}

initialIrqStates = IrqStates {
  irqStateIME = False,
  irqStateHBlankCounter = hBlankPeriod
}

IrqStates is additional state data that must now be a part of the machine state. Not sure if "Irq" is the greatest name for it, since the H-Blank counter(which will be used to incremement the LY register) isn't really related to interrupts in this context. But I couldn't think of any other name, and one could conceptually think about LY incrementing as some sort of internal interrupt that executes builtin code for incrementing the LY register. Anyway... I need to transform this state after each CPU instruction is executed, so I come up with this really ugly function:

irqUpdate :: IrqStates ->
             CycleCount ->
             (IrqStates,
              (RegisterStates, Memory) -> (RegisterStates, Memory))
irqUpdate is c =
  let is' = is { irqStateHBlankCounter = (irqStateHBlankCounter is)-c } in
  if irqStateHBlankCounter is' <= 0
  then (is' { irqStateHBlankCounter = (irqStateHBlankCounter is')+hBlankPeriod },
        \(r, m) -> let oldLY = (memRam m)!0xFF44
                       newLY = incWrap 153 oldLY in
                   (r, m { memRam = (memRam m)//[(0xFF44, newLY)] })
       )
  else (is', id)
  where incWrap limit n = if (n+1) > limit then 0 else n+1

Notice that this works with the Memory type directly, instead of the abstract MemoryModel type class. This is because I need direct access to the memRam field; I can't use the writeMem function because it will filter writes to the various special memory areas that must be modified. (Check out Memory.hs to see exactly what's going on.)

I Modified the updateMachine function so that it works with and updates this new additional state:

updateMachine :: ((RegisterStates, Memory), IrqStates) ->
                 ((RegisterStates, Memory), IrqStates)
updateMachine (state@(regS, memS), irqS) =
  let stepInstruction = machineStepInstruction state
      pc = getReg2State regS M_PC
      opcode = readMem memS pc
      cycles = opcodeCycleCount opcode
      (irqS', interrupt) = irqUpdate irqS cycles in
  (interrupt stepInstruction, irqS')

This allows the H-Blank counter to decrease the amount of cpu cycles that the executed instruction takes. The machine state is now: ((RegisterStates, Memory), IrqStates). I use a nested tuple like this, since the few other existing functions work with (RegisterStates, Memory). It's kind of ugly. This and that new irqUpdate function is causing severe uglification of the code! I'm not too worried though, since Haskell is great for code refactoring: The language is powerful enough to keep code short, so the ugly code is only like 20 lines. And all of the code that directly interfaces with the ugly code is only another 20 lines. In addition, Haskell's strong type system helps you feel confident that your refactoring is correct.

Enough rambling! LY should be working! I run the "Dropzone" game, and it does indeed get past that previously problematic loop! But now it gets stuck in another loop:

$27c4 LD A,($c103)
$27c7 OR A,A
$27c8 JR Z,-6

This apparently is a loop that waits for memory address $C103 to be non-zero. $C103 is not an IO Register or anything like that. It's in the gameboy's internal RAM area. The only explanation I can think of is that an interrupt routine is supposed to modify $C103

I put an error function in my memory writing routine to catch when there is a write to the Interrupt Enable register(IE), so I'll be able to find out which interrupt the game wants enabled(V-Blank? Timer? Joypad?). Unfortunately the game zeroes the IE register near start up so this triggers the error and I am stuck with no new clues. I could hack up some Debug.Trace calls and scatter some putStrLns in order to get some information, but I decide that I might as well do things right.

I fire up glade-2, and design some sort of gameboy debugger GUI, inspired from the NO$GMB GameBoy emulator/debugger. Unfortunately, glade freezes up just before I save and I lose all my work. Bad glade! I design the whole thing again from scratch saving every 5 seconds.

Now I start coding the debugger, using gtk2hs, of course, which is a gtk+ binding for haskell, with support for loading glade user interfaces. I encounter some bizarre haskell behaviour. For some reason, this code doesn't compile:

do
  let bindWidget = xmlGetWidget windowXml
  window_main <- bindWidget castToWindow "window_main"
  button_open <- bindWidget castToButton "button_open"

If I change that bindWidget line to let bindWidget x y = xmlGetWidget windowXml x y then it compiles and works! I'm not sure why I can't curry the function in this way, probably because of some weirdness of do notation. I don't have time to get to the bottom of this though, it works, and I am eager to get the debugger into working shape! I soon encounter another bizarre oddity. An error that is triggered from a write to an undocumented IO register seems to shutdown/crash ghci! I try to catch it as an exeption but it's useless. Not sure why this is happening. I find it very disturbing.

It doesn't matter though, because the debugger works! Check it:

Dev Image

Looks pretty cool, right? :) It's pretty minimal, but it gets the job done, and it's less then 200 lines of haskell. Programming with gtk2hs has a similar feel to programming in Python PyGtk, but with haskell you have all these really cool monad combinators and very well defined mutable state. The resulting code, even though it's imperative, looks like no other gui code I've seen in other languages. When they say, "Haskell is the best imperative language", it's not just a joke!

Let's get back to $C103. In the screenshot you can see that I've run "Dropzone" and paused it once it got stuck in the $C103 loop. The debug panel on the lower left shows me what I need to know. The Interrupt Master Enable(IME) is set, confirming that interrupts have been enabled with the EI CPU instruction. The Interrupt Enable(IE) Register shows $03 meaning that bits 0 and 1 are set. This means that the only interrupts that are enabled are V-Blank and LCDC. LCDC is actually several interrupts related to the LCD display controller. The STAT IO Register is used to select which LCD controller interrupts are enabled. The debug panel shows that STAT equals $80 which means that only bit 7 is set. According to the Pan Docs, bit 7 isn't used. Since no other bits are set, all of the LCDC interrupts are disabled. This means that the only interrupt that we are dealing with right now is V-Blank! The mystery is solved.

I will need to implement emulation of the V-Blank interrupt in order to continue. But before I do that, I hexdump the game rom. The V-Blank interrupt start address is $0040, and here we have: C3 C0 1F. C3 is the opcode for JP, so this code disassembles to the instruction JP $1FC0. I scroll down the hexdump of the game rom till I get to $1FC0. I scan the following bytes, and sure enough, I see the sequence EA 03 C1, which disassembles to LD ($C103),A. The V-Blank interrupt does indeed seem to write to our good friend $C103 :)

Implementation of the V-Blank interrupt will have to wait until next time. What was supposed to be a simple task -- getting a map viewer working -- turned out to be littered with obstacles, that forced me even to build a graphical debugger. Will I succeed in getting the anticipated map viewer working in time for the next post?

Tune in next time to find out!
    -- bit

Hello, World!

April 16 2006

I finally decided to learn Haskell, that weird programming language that is supposed to be so great. I played around with it for a while and did a few small tests and discovered that it really is great.

So I was looking for a cool project idea to practice my Haskell. Something that would be serious, fun to do, and useful. A Nintendo Game Boy emulator may not be the most useful program in the universe, but everyone loves the gameboy, and writing an emulator is considered a pretty serious programming job. I deliberated for a while between gameboy or NES, and decided to go with gameboy since it's simpler.

I actually started coding the emulator about a week ago, so I've already got a bit of code done. I'll describe what I've got so far. (Source code is in snapshot 20060414)

CPU Emulation

CPU Emulation is pretty much working, except for a few instructions left that need to be implemented. I use an Instruction type that looks like this:

data Instruction =
  LDR Register Register             | -- LD r,r
  LDRN Register Word8               | -- LD r,n
  {- ... -}
  PUSH StackRegister                | -- PUSH sr
  POP StackRegister                 | -- POP sr
  ADD Register                      | -- ADD A,r
  {- ... -}
  AND Register                      | -- AND A,r
  ANDN Word8                        | -- AND A,n
  ANDHL                             | -- AND A,(HL)
  {- ... -}
  SCF                               | -- SCF
  NOP                               | -- NOP
  HALT                              | -- * HALT
  STOP                              | -- * STOP
  DI                                | -- * DI
  EI                                | -- * EI
  {- ... -}
  JP (Maybe FlagCondition) Word16   | -- JP nn               JP cc,nn
  JPHL                              | -- JP (HL)
  JR (Maybe FlagCondition) Int8     | -- JR a                JR cc,a
  CALL (Maybe FlagCondition) Word16 | -- CALL a              CALL cc,a
  RST RestartAddress                | -- RST n
  RET (Maybe FlagCondition)         | -- RET                 RET cc
  RETI                                -- * RETI

It has a total of 94 constructors. Yikes! But at least it staticly guarantees that all instructions are valid. Then there is a function machineCodeToInstruction that assembles an instruction from machine code bytes.

In order to execute an Instruction, I wanted to be able to describe, using some sort of mini-language, the operation of each cpu instruction. This is where haskell shows its power. With the help of chatting with Philippa on #haskell, and her MonadArticleThingy, I was able to get this ExecutionAST type:

data ExecutionAST result where
  Return :: result -> ExecutionAST result
  Bind :: (ExecutionAST oldres) -> (oldres -> ExecutionAST result) ->
          ExecutionAST result
  WriteRegister :: M_Register -> Word8 -> ExecutionAST ()
  ReadRegister :: M_Register -> ExecutionAST Word8
  WriteRegister2 :: M_Register2 -> Word16 -> ExecutionAST ()
  ReadRegister2 :: M_Register2 -> ExecutionAST Word16
  WriteMemory :: Word16 -> Word8 -> ExecutionAST ()
  ReadMemory :: Word16 -> ExecutionAST Word8
instance Monad ExecutionAST where
  return = Return
  (>>=) = Bind

Thanks Philippa! It took me a while to get this right, but in the process, my understanding of monads improved a lot, and I also learned about GADTs(Generic Algebraic Data Types). This type does exactly what I wanted, it describes a mini language with a minimal set of operations for reading/writing registers and memory. Now I can implement an executeInstruction function:

executeInstruction :: Instruction -> ExecutionAST ()
executeInstruction ins = case ins of
  LDR r1 r2 -> do
    a <- readRegister (mRegister r2)
    writeRegister (mRegister r1) a
    incPC1
  {- ... -}
  PUSH sr -> do
    a <- readRegister2 (mRegister2 SP)
    let a' = a - 1
    let a'' = a - 2
    v <- readRegister2 (mRegisterS sr)
    let (hi, lo) = splitWord16 v
    writeMemory a' hi
    writeMemory a'' lo
    writeRegister2 (mRegister2 SP) a''
    incPC1
  {- ... -}
  where
    incPC1 :: ExecutionAST ()
    incPC1 = do
      pc <- readRegister2 M_PC
      let pc' = pc + 1
      writeRegister2 M_PC pc'

The do haskell syntax sugar makes writing this code easy, in an imperative style, but of course it's all still purely functional. Now comes the fun part, the last step of the CPU emulation. I need an interpreter for the ExecutionAST "mini-language". Philippa's pretty much showed us how to do it:

class MemoryModel a where
  readMem :: a -> Word16 -> Word8
  writeMem :: a -> Word16 -> Word8 -> a

machineCpuExecute :: (MemoryModel m) =>
                 (RegisterStates, m) ->
                 ExecutionAST () ->
                 (RegisterStates, m)
machineCpuExecute s e = fst (machineCpuExecute' s e)

machineCpuExecute' :: (MemoryModel m) =>
                  (RegisterStates, m) ->
                  ExecutionAST a ->
                  ((RegisterStates, m), a)
machineCpuExecute' state@(regS, memS) e = case e of
  Return result -> (state, result)
  Bind l r -> let (s, result) = machineCpuExecute' state l in
              machineCpuExecute' s (r result)
  WriteRegister reg n -> ((setRegState regS reg n, memS), ())
  ReadRegister reg -> (state, getRegState regS reg)
  WriteRegister2 reg2 nn -> ((setReg2State regS reg2 nn, memS), ())
  ReadRegister2 reg2 -> (state, getReg2State regS reg2)
  WriteMemory a n -> ((regS, writeMem memS a n), ())
  ReadMemory a -> (state, readMem memS a)

This code is pretty complicated, at least for me :). It works by transforming the state of the registers and memory, plus returning an additional computation result. (I use a MemoryModel type class because I wanted to test the CPU with a ReallySimpleMemory to make sure that it works, before implementing gameboy's complicated memory system, with its switchable banks, echoed areas and IO registers.) machineCpuExecute' pattern matches against the CpuExecution constructors:

I really love this function :) It shows off some of haskell's great features. Together, all of the code I've described so far shows how haskell helps you organize your software into layers. Anyway, I made two more functions, fetchInstruction and stepInstruction and I am able to execute some test programs.

Game Boy ROM loading

I coded a simple function for loading rom images into memory, and extracting the header information. It works great for 32KB and 64KB roms, but for some reason it takes a while for it to load a 128KB rom, and loading a 256KB rom is real slow. 512KB roms take forever. Ok, it's not super bad, but loading of all of these should be instantaneous.

Now that I can load gameboy ROM images, I try and succeed in executing them! Totally sweet. Check it:

Dev Image

Here I am reminded of another great quality of Haskell: Your programs work the first time you run them. Haskell's awesome type system and it's way of encouraging(forcing?) you to code in a certain(functional?) style keeps bugs to a minimum. No hours of debugging, only to find that you made some stupid coding mistake. I, a Haskell newbie, was able to get an almost fully functional gameboy CPU emulator working with only a few intense hours of work. Pretty impressive I think.

This is the end of the first post to this devlog! I hope you enjoyed it. I had to cover quite a bit of development with this post, so I didn't get to go into as much detail as I would have liked to. Next time I will talk in depth about problems encountered during development, design thoughts, my struggles with understanding parts of the gameboy hardware, and more!

Stay tuned...
    -- bit


Game Boy is a registered trademark of Nintendo. The author is in no way affiliated with Nintendo.