Composable CPU descriptions in CλaSH, and wrap-up of RetroChallenge 2018/09
30 September 2018 (programming haskell fpga electronics retrochallenge retro clash chip-8)My last post ended with some sample CλaSH code illustrating the spaghetti-ness you can get yourself into if you try to describe a CPU directly as a function (CPUState, CPUIn) -> (CPUState, CPUOut). I promised some ideas for improving that code.
To start off gently, first of all we can give names to some common internal operations to help readability. Here's the code from the previous post rewritten with a small kit of functions:
intensionalCPU (s0, CPUIn{..}) = case phase s of WaitKeyPress reg -> let s' = case cpuInKeyEvent of Just (True, key) -> goto Fetch1 . setReg reg key $ s _ -> s out = def{ cpuOutMemAddr = pc s' } in (s', out) Fetch1 -> let s' = goto Exec s{ opHi = cpuInMem, pc = succ $ pc s } out = def{ cpuOutMemAddr = pc s' } in (s', out) where s | cpuInVBlank = s0{ timer = fromMaybe 0 . predIdx $ timer s0 } | otherwise = s0
Using a State monad
To avoid the possibility of screwing up the threading of the internal state, we can use the State CPUState monad:
stateCPU :: CPUIn -> State CPUState CPUOut stateCPU CPUIn{..} = do when cpuInVBlank $ modify $ \s -> s{ timer = fromMaybe 0 . predIdx $ timer s } gets phase >>= \case WaitKeyPress reg -> do forM_ cpuInKeyEvent $ \(pressed, key) -> when pressed $ do setReg reg key goto Fetch1 return def{ cpuOutMemAddr = pc s' } Fetch1 -> do modify $ \s -> s{ opHi = cpuInMem, pc = succ $ pc s } goto Exec return def{ cpuOutMemAddr = pc s' }
At this point, the state-accessing actions either come from the kit, because they are useful in many parts of our CPU description (like setReg), or they read or change individual fields of the CPUState record. This second group can benefit from using lenses:
stateCPU CPUIn{..} = do when cpuInVBlank $ timer %= fromMaybe 0 . predIdx use phase >>= \case WaitKeyPress reg -> do forM_ cpuInKeyEvent $ \(pressed, key) -> when pressed $ do setReg reg key goto Fetch1 return def{ cpuOutMemAddr = pc s' } Fetch1 -> do opHi .= cpuInMem pc %= succ goto Exec return def{ cpuOutMemAddr = pc s' }
Problems with composability
While this code is much more readable than the original one, and allows describing the internal state changes piecewise, it still has a problem with composability. To illustrate this, let's add some more functionality by implementing some CPU instructions (simplified a bit from a real CHIP-8 which can only write to memory through a save-registers instruction and even that is addressed by a dedicated pointer register; and accessing the framebuffer is also done via multiple-byte blitting only).
stateCPU CPUIn{..} = do when cpuInVBlank $ timer %= fromMaybe 0 . predIdx use phase >>= \case WaitKeyPress reg -> do forM_ cpuInKeyEvent $ \(pressed, key) -> when pressed $ do setReg reg key goto Fetch1 pc <- use pc return def{ cpuOutMemAddr = pc } Idle -> do goto Fetch1 pc <- use pc return def{ cpuOutMemAddr = pc } Fetch1 -> do opHi .= cpuInMem pc %= succ goto Exec pc <- use pc return def{ cpuOutMemAddr = pc } Exec -> do opLo .= cpuInMem pc %= succ goto Fetch1 decode >>= \case WaitKey reg -> do goto $ WaitKeyPress reg pc <- use pc return def{ cpuOutMemAddr = pc } LoadImm reg imm -> do setReg reg imm pc <- use pc return def{ cpuOutMemAddr = pc } WriteMem addr reg -> do val <- getReg reg goto Idle -- flush write to RAM before starting next fetch return def{ cpuOutMemAddr = addr, cpuOutMemWrite = Just val } SetPixel regX regY -> do x <- getReg regX y <- getReg regY pc <- use pc return def{ cpuOutMemAddr = pc, cpuOutFBAddr = (x, y), cpuOutFBWrite = Just True } -- And a lot more opcodes here
Since there are going to be a lot of opcodes to support, it could seem like a good idea to refactor the Exec state to be a separate function, leaving only the implementation of the other control phases in the main function:
stateCPU cpuIn@CPUIn{..} = do when cpuInVBlank $ timer %= fromMaybe 0 . predIdx use phase >>= \case WaitKeyPress reg -> do forM_ cpuInKeyEvent $ \(pressed, key) -> when pressed $ do setReg reg key goto Fetch1 pc <- use pc return def{ cpuOutMemAddr = pc } Idle -> do goto Fetch1 pc <- use pc return def{ cpuOutMemAddr = pc } Fetch1 -> do opHi .= cpuInMem pc %= succ goto Exec pc <- use pc return def{ cpuOutMemAddr = pc } Exec -> do opLo .= cpuInMem pc %= succ goto Fetch1 decode >>= exec cpuIn exec CPUIn{..} (WaitKey reg) = ... -- same as previous version exec CPUIn{..} (LoadImm reg imm) = ... -- same as previous version exec CPUIn{..} (WriteMem addr reg) = ... exec CPUIn{..} (SetPixel regX regY) = ...
However, this becomes problematic if we want to set some parts of the CPUOut result in a generic way, outside exec. Suppose we implement sound support, which in CHIP-8 is done via another 60Hz timer:
stateCPU cpuIn@CPUIn{..} = do when cpuInVBlank $ do timer %= fromMaybe 0 . predIdx soundTimer %= fromMaybe 0 . predIdx soundOn <- uses soundTimer (/= 0) def <- return def{ cpuOutSound = soundOn }
This takes care of setting cpuOutSound in the branches for WaitKeyPress, Idle and Fetch1 (albeit in an ugly way, by shadowing def), but now this needs to be passed as an extra parameter to exec (or the code to set cpuOutSound needs to be duplicated).
Assembling the output piecewise
So what we want is a way to specify the resulting CPUOut such that
- Reasonable defaults (like reading the next instruction from RAM via cpuOutMemAddr = pc) are applied automatically, without requiring any code. Note that the defaults shouldn't need to be static; they can depend on the state.
- Various parts of the CPU implementation can contribute to the final CPUOut result that is visible from the outside.
This has led me to a design where the CPU is described by a type CPU i s o = RWS i (Endo o) s:
- The State axis is, of course, the internal state, i.e. the CPU registers, as before.
- The Reader axis provides easy access to the inputs (as of the current clock cycle) to (potentially deeply nested) functions, without having to pass anything around by hand.
-
The Writer axis collects an Endo
CPUOut; in other words, a function CPUOut ->
CPUOut with "later" functions applied after (and thus
overriding) "earlier" functions. Thus, the final collected
Endo CPUOut can be applied to a
CPUOut computed as the default output for the
final CPUState:
runCPU :: (s -> o) -> CPU i s o () -> (i -> State s o) runCPU mkDef cpu inp = do s <- get let (s', f) = execRWS (unCPU cpu) inp s put s' def <- gets mkDef return $ appEndo f def
With this setup, we can rewrite our CPU implementation as:
cpu = do CPUIn{..} <- input when cpuInVBlank $ timer %= fromMaybe 0 . predIdx soundTimer %= fromMaybe 0 . predIdx soundOn <- uses soundTimer (/= 0) when soundOn $ do tell $ \out -> out{ cpuOutSound = True } use phase >>= \case WaitKeyPress reg -> do forM_ cpuInKeyEvent $ \(pressed, key) -> when pressed $ do setReg reg key goto Fetch1 Idle -> do goto Fetch1 Fetch1 -> do opHi .= cpuInMem pc %= succ goto Exec Exec -> do opLo .= cpuInMem pc %= succ goto Fetch1 decode >>= exec exec (WaitKey reg) = do goto $ WaitKeyPress reg exec (LoadImm reg imm) = do setReg reg imm exec (WriteMem addr reg) = do val <- getReg reg writeMem addr val goto Idle -- flush write to RAM before starting next fetch exec (SetPixel regX regY) = do x <- getReg regX y <- getReg regY writeFB (x, y) True
Most branches don't need to do anything to get the correct final CPUOut; and just like we were able to name the State kit of setReg &c, we can define
The system described above is implemented here for the reusable library definitions and here for CHIP-8 specifically; the version that uses lenses is in the branch linked above, since I am not 100% convinced yet that the cognitive overhead of the lens library is worth it once enough kit is developed around state access.
Future directions
There are two questions about the above that I feel that haven't been resolved yet:
- Is it a good idea to include ContT in the CPU monad stack? On one hand, it is known to be problematic to compose ContT with Reader/Writer; on the other hand, here we only need ask and tell, not local or pass. On the gripping hand, ContT might be useful, for example, to describe CPU sub-states that wait for some external output before proceeding normally, as an early exit from the CPU monad.
- Can the CPUState and Phase types themselves be defined piecewise? Is there some unholy union of extensible effects, extensible records and extensible sum types that would allow us to, informally speaking, define the WaitKeyPress phase together with its handler, in the middle of exec where the WaitKey instruction is implemented?
RetroChallenge 2018/09 wrap-up
All in all, even though I couldn't get as far as I originally hoped, I am glad I at least managed to build a CHIP-8 computer in CλaSH that can boot up on real hardware and play original CHIP-8 games; that's pretty much what I originally set out to do. The RetroChallenge format of detailed blog posts concurrent with the actual development work being done is quite taxing on me, because I am slow to write these posts; overall, I have probably spent about half and half of my RetroChallenge time on coding/design vs. writing about it. But the fixed one-month timeframe and the great progress reports by the other participants have been great motivators; I would probably try something next time where I can expect less uknown unknowns.
There is still work to be done even on this CHIP-8 computer: I've found some games that don't work as expected when they try to use the built-in hexadecimal digit font; and there are of course the previously discussed performance problems inherent in addressing the frame buffer as single bits. Also, I would like to adapt it to use a real four-by-four keypad (by scanning the key rows from the FPGA) and a PCD8544 monochrome matrix LCD
And then, of course, further exploration of the design space of CPU descriptions discussed above; and then moving on to a "real" CPU, probably by rewriting my Lava 6502 core. Also, I will need to get a new FPGA dev board with a chip that is supported by a contemporary toolchain, so that I won't get bogged down by problems like the Spartan-6 synthesis bug.