Composable CPU descriptions in CλaSH, and wrap-up of RetroChallenge 2018/09 (30 September 2018)

Composable CPU descriptions in CλaSH, and wrap-up of RetroChallenge 2018/09

30 September 2018 (programming haskell fpga electronics retrochallenge retro clash chip-8)

My last post ended with some sample CλaSH code illustrating the spaghetti-ness you can get yourself into if you try to describe a CPU directly as a function (CPUState, CPUIn) -> (CPUState, CPUOut). I promised some ideas for improving that code.

To start off gently, first of all we can give names to some common internal operations to help readability. Here's the code from the previous post rewritten with a small kit of functions:

intensionalCPU (s0, CPUIn{..}) = case phase s of
    WaitKeyPress reg ->
        let s' = case cpuInKeyEvent of
                Just (True, key) -> goto Fetch1 . setReg reg key $ s
                _ -> s
            out = def{ cpuOutMemAddr = pc s' }
        in (s', out)                   
    Fetch1 ->
        let s' = goto Exec s{ opHi = cpuInMem, pc = succ $ pc s }
            out = def{ cpuOutMemAddr = pc s' }
        in (s', out)
  where
    s | cpuInVBlank = s0{ timer = fromMaybe 0 . predIdx $ timer s0 }
      | otherwise = s0

Using a `State` monad

To avoid the possibility of screwing up the threading of the internal state, we can use the State CPUState monad:

stateCPU :: CPUIn -> State CPUState CPUOut      
stateCPU CPUIn{..} = do
    when cpuInVBlank $
        modify $ \s -> s{ timer = fromMaybe 0 . predIdx $ timer s }
    gets phase >>= \case 
      WaitKeyPress reg -> do
        forM_ cpuInKeyEvent $ \(pressed, key) -> when pressed $ do
            setReg reg key
            goto Fetch1
        return def{ cpuOutMemAddr = pc s' }
      Fetch1 -> do
        modify $ \s -> s{ opHi = cpuInMem, pc = succ $ pc s }
        goto Exec
        return def{ cpuOutMemAddr = pc s' }

At this point, the state-accessing actions either come from the kit, because they are useful in many parts of our CPU description (like setReg), or they read or change individual fields of the CPUState record. This second group can benefit from using lenses:

stateCPU CPUIn{..} = do
    when cpuInVBlank $
        timer %= fromMaybe 0 . predIdx
    use phase >>= \case 
      WaitKeyPress reg -> do
        forM_ cpuInKeyEvent $ \(pressed, key) -> when pressed $ do
            setReg reg key
            goto Fetch1
        return def{ cpuOutMemAddr = pc s' }
      Fetch1 -> do
        opHi .= cpuInMem
        pc %= succ
        goto Exec
        return def{ cpuOutMemAddr = pc s' }

Problems with composability

While this code is much more readable than the original one, and allows describing the internal state changes piecewise, it still has a problem with composability. To illustrate this, let's add some more functionality by implementing some CPU instructions (simplified a bit from a real CHIP-8 which can only write to memory through a save-registers instruction and even that is addressed by a dedicated pointer register; and accessing the framebuffer is also done via multiple-byte blitting only).

stateCPU CPUIn{..} = do
    when cpuInVBlank $
        timer %= fromMaybe 0 . predIdx
    use phase >>= \case 
      WaitKeyPress reg -> do
        forM_ cpuInKeyEvent $ \(pressed, key) -> when pressed $ do
            setReg reg key
            goto Fetch1
        pc <- use pc    
        return def{ cpuOutMemAddr = pc }
      Idle -> do
        goto Fetch1
        pc <- use pc    
        return def{ cpuOutMemAddr = pc }
      Fetch1 -> do
        opHi .= cpuInMem
        pc %= succ
        goto Exec
        pc <- use pc    
        return def{ cpuOutMemAddr = pc }
      Exec -> do
        opLo .= cpuInMem
        pc %= succ
        goto Fetch1
        decode >>= \case
          WaitKey reg -> do
            goto $ WaitKeyPress reg
            pc <- use pc    
            return def{ cpuOutMemAddr = pc }
          LoadImm reg imm -> do
            setReg reg imm
            pc <- use pc    
            return def{ cpuOutMemAddr = pc }
          WriteMem addr reg -> do
            val <- getReg reg
            goto Idle -- flush write to RAM before starting next fetch
            return def{ cpuOutMemAddr = addr, cpuOutMemWrite = Just val }           
          SetPixel regX regY -> do
            x <- getReg regX
            y <- getReg regY
            pc <- use pc
            return def{ cpuOutMemAddr = pc, cpuOutFBAddr = (x, y), cpuOutFBWrite = Just True }          
          -- And a lot more opcodes here

Since there are going to be a lot of opcodes to support, it could seem like a good idea to refactor the Exec state to be a separate function, leaving only the implementation of the other control phases in the main function:

stateCPU cpuIn@CPUIn{..} = do
    when cpuInVBlank $
        timer %= fromMaybe 0 . predIdx
    use phase >>= \case 
      WaitKeyPress reg -> do
        forM_ cpuInKeyEvent $ \(pressed, key) -> when pressed $ do
            setReg reg key
            goto Fetch1
        pc <- use pc    
        return def{ cpuOutMemAddr = pc }
      Idle -> do
        goto Fetch1
        pc <- use pc    
        return def{ cpuOutMemAddr = pc }
      Fetch1 -> do
        opHi .= cpuInMem
        pc %= succ
        goto Exec
        pc <- use pc    
        return def{ cpuOutMemAddr = pc }
      Exec -> do
        opLo .= cpuInMem
        pc %= succ
        goto Fetch1
        decode >>= exec cpuIn
        
exec CPUIn{..} (WaitKey reg) = ... -- same as previous version
exec CPUIn{..} (LoadImm reg imm) = ... -- same as previous version
exec CPUIn{..} (WriteMem addr reg) = ...
exec CPUIn{..} (SetPixel regX regY) = ...

However, this becomes problematic if we want to set some parts of the CPUOut result in a generic way, outside exec. Suppose we implement sound support, which in CHIP-8 is done via another 60Hz timer:

stateCPU cpuIn@CPUIn{..} = do
    when cpuInVBlank $ do
        timer %= fromMaybe 0 . predIdx
        soundTimer %= fromMaybe 0 . predIdx
    soundOn <- uses soundTimer (/= 0)
    def <- return def{ cpuOutSound = soundOn }

This takes care of setting cpuOutSound in the branches for WaitKeyPress, Idle and Fetch1 (albeit in an ugly way, by shadowing def), but now this needs to be passed as an extra parameter to exec (or the code to set cpuOutSound needs to be duplicated).

Assembling the output piecewise

So what we want is a way to specify the resulting CPUOut such that

Reasonable defaults (like reading the next instruction from RAM via cpuOutMemAddr = pc) are applied automatically, without requiring any code. Note that the defaults shouldn't need to be static; they can depend on the state.
Various parts of the CPU implementation can contribute to the final CPUOut result that is visible from the outside.

This has led me to a design where the CPU is described by a type CPU i s o = RWS i (Endo o) s:

The State axis is, of course, the internal state, i.e. the CPU registers, as before.
The Reader axis provides easy access to the inputs (as of the current clock cycle) to (potentially deeply nested) functions, without having to pass anything around by hand.
The Writer axis collects an Endo CPUOut; in other words, a function CPUOut -> CPUOut with "later" functions applied after (and thus overriding) "earlier" functions. Thus, the final collected Endo CPUOut can be applied to a CPUOut computed as the default output for the final CPUState:
```
runCPU :: (s -> o) -> CPU i s o () -> (i -> State s o)
runCPU mkDef cpu inp = do
    s <- get
    let (s', f) = execRWS (unCPU cpu) inp s
    put s'
    def <- gets mkDef
    return $ appEndo f def
```

With this setup, we can rewrite our CPU implementation as:

cpu = do
    CPUIn{..} <- input
    when cpuInVBlank $
        timer %= fromMaybe 0 . predIdx
        soundTimer %= fromMaybe 0 . predIdx
    soundOn <- uses soundTimer (/= 0)
    when soundOn $ do
        tell $ \out -> out{ cpuOutSound = True }

    use phase >>= \case 
      WaitKeyPress reg -> do
        forM_ cpuInKeyEvent $ \(pressed, key) -> when pressed $ do
            setReg reg key
            goto Fetch1
      Idle -> do
        goto Fetch1
      Fetch1 -> do
        opHi .= cpuInMem
        pc %= succ
        goto Exec
      Exec -> do
        opLo .= cpuInMem
        pc %= succ
        goto Fetch1
        decode >>= exec

exec (WaitKey reg) = do
    goto $ WaitKeyPress reg
exec (LoadImm reg imm) = do
    setReg reg imm
exec (WriteMem addr reg) = do
    val <- getReg reg
    writeMem addr val
    goto Idle -- flush write to RAM before starting next fetch
exec (SetPixel regX regY) = do
    x <- getReg regX
    y <- getReg regY
    writeFB (x, y) True

Most branches don't need to do anything to get the correct final CPUOut; and just like we were able to name the State kit of setReg &c, we can define

The system described above is implemented here for the reusable library definitions and here for CHIP-8 specifically; the version that uses lenses is in the branch linked above, since I am not 100% convinced yet that the cognitive overhead of the lens library is worth it once enough kit is developed around state access.

Future directions

There are two questions about the above that I feel that haven't been resolved yet:

Is it a good idea to include ContT in the CPU monad stack? On one hand, it is known to be problematic to compose ContT with Reader/Writer; on the other hand, here we only need ask and tell, not local or pass. On the gripping hand, ContT might be useful, for example, to describe CPU sub-states that wait for some external output before proceeding normally, as an early exit from the CPU monad.
Can the CPUState and Phase types themselves be defined piecewise? Is there some unholy union of extensible effects, extensible records and extensible sum types that would allow us to, informally speaking, define the WaitKeyPress phase together with its handler, in the middle of exec where the WaitKey instruction is implemented?

RetroChallenge 2018/09 wrap-up

All in all, even though I couldn't get as far as I originally hoped, I am glad I at least managed to build a CHIP-8 computer in CλaSH that can boot up on real hardware and play original CHIP-8 games; that's pretty much what I originally set out to do. The RetroChallenge format of detailed blog posts concurrent with the actual development work being done is quite taxing on me, because I am slow to write these posts; overall, I have probably spent about half and half of my RetroChallenge time on coding/design vs. writing about it. But the fixed one-month timeframe and the great progress reports by the other participants have been great motivators; I would probably try something next time where I can expect less uknown unknowns.

There is still work to be done even on this CHIP-8 computer: I've found some games that don't work as expected when they try to use the built-in hexadecimal digit font; and there are of course the previously discussed performance problems inherent in addressing the frame buffer as single bits. Also, I would like to adapt it to use a real four-by-four keypad (by scanning the key rows from the FPGA) and a PCD8544 monochrome matrix LCD

And then, of course, further exploration of the design space of CPU descriptions discussed above; and then moving on to a "real" CPU, probably by rewriting my Lava 6502 core. Also, I will need to get a new FPGA dev board with a chip that is supported by a contemporary toolchain, so that I won't get bogged down by problems like the Spartan-6 synthesis bug.

« Integrating Verilator and Clash via Cabal

All posts

CPU modeling in CλaSH »