A tiny computer for Tiny BASIC
17 November 2020 (programming haskell clash fpga retro)Just a quick post to peel back the curtain a bit on a Clash example I posted to Twitter. My tweet in question shows the following snippet, claiming it is "a complete Intel 8080-based FPGA computer that runs Tiny BASIC":
logicBoard :: (HiddenClockResetEnable dom) => Signal dom (Maybe (Unsigned 8)) -> Signal dom Bool -> Signal dom (Maybe (Unsigned 8)) logicBoard inByte outReady = outByte where CPUOut{..} = intel8080 CPUIn{..} interruptRequest = pure False (dataIn, outByte) = memoryMap _addrOut _dataOut $ do matchRight $ do mask 0x0000 $ romFromFile (SNat @0x0800) "_build/intel8080/image.bin" mask 0x0800 $ ram0 (SNat @0x0800) mask 0x1000 $ ram0 (SNat @0x1000) matchLeft $ do mask 0x10 $ port $ acia inByte outReady
I got almost a hundred likes on this tweet, which isn't too bad for a topic as niche as hardware design with Haskell and Clash. Obviously, the above tweet was meant as a brag, not as a detailed technical description; but given the traction it got, I thought I might as well try to expand a bit on it.
Now, the background to all this is that I'm working on a book on retrocomputing with Clash. So in this post, I will try to condense the 75k words of text and the 5k lines of code that I have written so far; it's going to be more of an extended table of contents than a Cliffs notes.
Tiny BASIC is an interactive BASIC interpreter; these days, we would call it a REPL. The computer above runs one of the original, Intel 8080-based versions of Tiny BASIC as its firmware. When the computer is turned on, it boots straight into a BASIC prompt, just like the Commodore PETs and Apple II's of yesteryear. The software assumes there is a peripheral controller connected to certain output ports of the CPU; all IO is done via a stream of bytes written to and read from this controller.
Input and output
Let's start with the interface. We see that logicBoard has two input signals and one output signal. The Maybe (Unsigned 8) input is a byte coupled with an "input ready" line; in other words, the value is Just a byte if there is new input coming in in a given clock cycle, and Nothing otherwise. The output of the same type is, unsurprisingly, the output of the whole computer in the same format. The extra Bool input is for backpressure: some output peripherals might need time to process an output byte before they are ready for the next one.
We will need to connect these incoming and outgoing bytes to the outside world somehow, and in a real hardware implementation, the IO controller actually contained a UART so that input and output was serialized into a stream of bits. However, by uncoupling the UART from the ACIA implementation, and exposing input and output as byte events, we make it easy to hook up alternative communication media:
-
We can run the whole logicBoard in the Clash
simulator, and connect its input and output to terminal
events:
main :: IO () main = do sim <- simulateIO_ @System (uncurry logicBoard . unbundle) (Nothing, True) withTerminal $ runTerminalT $ forever $ sim $ \outByte -> do traverse_ printByte outByte inByte <- sampleKey return (inByte, True)
-
For a simple full hardware implementation that is true to the
Altair 8800 roots of Tiny BASIC, we can hook up a serial
receiver and transmitter. Incidentially,
this is an example of why we added the outReady
input to logicBoard, since serializing a byte via
a 9600 bps UART takes much longer than a single clock cycle of
any reasonable FPGA implementation.
topEntity :: "CLK" ::: Clock System -> "RESET" ::: Reset System -> "RX" ::: Signal System Bit -> "TX" ::: Signal System Bit topEntity = withEnableGen board where board rx = tx where outByte = logicBoard inByte outReady inByte = fmap unpack <$> serialRx (SNat @9600) rx (tx, outReady) = serialTx (SNat @9600) (fmap pack <$> outByte)
-
If we want sophisticated and flashy, why not connect to a PS/2
keyboard and a VGA
monitor and recreate that late seventies-early eighties
home computer vibe? I have posted about about consuming
PS/2 input and video
signal generation before; all we need is a bit of extra logic to implement
a textual video mode, and we're good to go. Note that we
need a 25.175 MHz clock to match the pixel clock of 640⨯480@60
VGA mode.
topEntity :: "CLK_25MHZ" ::: Clock Dom25 -> "RESET" ::: Reset Dom25 -> "PS2" ::: PS2 Dom25 -> "VGA" ::: VGAOut Dom25 8 8 8 topEntity = withEnableGen board where board ps2 = vga where (frameEnd, vga) = video cursor vidWrite (vidReady, cursor, vidWrite) = screenEditor outByte outByte = logicBoard inByte vidReady inByte = keyboard ps2
The CPU
The Intel 8080 core we are using is also, of course, written in Clash. I can't really go through its implementation in this small space; the chapter where we construct it from scratch is at 13k words and builds heavily on techniques introduced in earlier chapters describing a CPU that uses Brainfuck as its machine code and a CHIP-8 machine. In this post, we'll just look at its interface, to see how it fits into the logicBoard:
declareBareB [d| data CPUIn = CPUIn { dataIn :: Maybe Value , interruptRequest :: Bool } |] declareBareB [d| data CPUOut = CPUOut { _addrOut :: Maybe (Either Port Addr) , _dataOut :: Maybe Value , _interruptAck :: Bool , _halted :: Bool } |] type Signals dom b = b Covered (Signal dom) type Pure b = b Bare Identity type Partial b = Barbie (b Covered) Last intel8080 :: (HiddenClockResetEnable dom) => Signals dom CPUIn -> Signals dom CPUOut
We use the wonderful Barbies library for CPUIn and CPUOut so that we can switch between a record of signals and a signal of a record. Outwards, Signals dom CPUOut gives easy access to each output pin separately; but internally, the code describing a single clock cycle's state transition creates a full Pure CPUOut inside a Writer monad, with composable Partial CPUOut-producing fragments.
Looking back at logicBoard, we can see that it keeps interruptRequest at False and feeds the dataIn bus from the result of the memory mapper/address decoder. On the output side, addrOut and dataOut is, unsurprisingly, fed into the address decoder; the other pins of CPUOut are not used in this particular computer.
Address decoding
The Intel 8080 has a 16-bit address bus that it uses both for accessing memory and for communication with up to 256 peripheral controllers. The actual details of how a real Intel 8080 signals memory vs. IO port access is quite baroque; but because we are building full computers using an Intel 8080 ISA-compatible CPU, we don't look for pin compatibility. Instead, we capture the morally right addressing interface of the 8080 by using the type Either Port Address (or, with the type synonys resolved, Either (Unsigned 8) (Unsigned 16)) for addrOut. In this particular computer, the details of the original Tiny BASIC firmware prescribe the following memory layout:
- The two registers of the ACIA are accessed through port numbers 0x10 and 0x11.
- Since a real hardware Intel 8080 always starts execution at address 0x0000, we have to put ROM there. Since Palo Alto Tiny BASIC v2.0 is 1900 bytes, we round it up to 2 KB, giving us ROM from 0x0000 to 0x07ff.
- Starting at 0x0800, various states of the BASIC interpreter are stored before the actual program text. We need RAM starting at this address.
- Initialization code sets the stack pointer to 0x2000. Recalling that on the 8080, the stack grows towards 0, that means we need RAM at least up to that address.
Putting it all together, we need 6 KB of RAM from 0x0800 to 0x1fff. We split it into two parts for easier address decoding: 2 KB from 0x0800 to 0x0fff, and 4 KB from 0x1000 to 0x1fff:
.--------. | ROM 2KB|<---. `--------' | .------. | | | .------. .--------. | | | | |<--------<( inByte | RAM 2KB|<---|----->| 8080 |<--->| ACIA | `--------' | | | | |>-------->( outByte | | | | |<--------<( outReady .--------. | `------' `------' | RAM 4KB|<---' `--------'
We want to write this, and exactly this in logicBoard, and the abstraction we use for this is memoryMap. In an address decoder, we start with an address of the whole memory space, and iteratively cut it down into smaller sub-spaces, until we get to something small enough that it maps to a single component. In this case, we start with Either Port Address, and cut it down into e.g. an Unsigned 11 to address a 2K memory component. The dataOut writes and the dataIn reads are restricted along the address decoding, so that the write request only goes to the selected component, and the read result is taken from that as well.
Under the hood, the memory map description uses a Reader of the address and the write, and a Writer of the read result:
newtype Addressing dom addr dat a = Addressing { unAddressing :: ReaderT (Signal dom (Maybe addr), Signal dom (Maybe dat)) (Writer (Ap (Signal dom) (First dat))) a } deriving newtype (Functor, Applicative, Monad) memoryMap :: Signal dom (Maybe addr) -> Signal dom (Maybe dat) -> Addressing dom addr dat a -> (Signal dom (Maybe dat), a)
Why the Maybe in the address? Because we want to know if the CPU doesn't need memory access in a given cycle at all; this is useful when memory is shared between the CPU and other components. We can resolve memory access contention by prioritizing some peripherals (returning Nothing in dataIn, forcing the CPU to stall) and de-prioritizing others (by only allowing them to access their memory component when the addressing of that sub-space is Nothing).
The implementation of memoryMap and its combinators is not particularly interesting; mask routes based on the high bits and leaves the low bits for the sub-space to handle:
mask :: (KnownNat k, KnownNat n) => (HiddenClockResetEnable dom) => Unsigned (n + k) -> Addressing dom (Unsigned k) dat a -> Addressing dom (Unsigned (n + k)) dat a
ram0 and romFromFile simply wrap Clash memory primitives (zero-initialized synchronous RAM, and synchronous ROM initialized from a bitfile image) into Addressing-compliant forms, and port hooks up an IO peripheral that responds to PortCommands:
type Port dom addr dat a = Signal dom (Maybe (PortCommand addr dat)) -> (Signal dom (Maybe dat), a) port :: (HiddenClockResetEnable dom, NFDataX dat) => Port dom addr dat a -> Addressing dom addr dat a
This is also where we find out where the second component of memoryMap's return vaue comes from: the whole point of IO peripherals is that they can have connections, and as such, output signals, going to other parts of the circuit (or directly the outside world), not just the CPU's data bus.
The full code
The full Clash source code of the Tiny BASIC computer, including the Intel 8080 core, is available on Github. There is no user documentation at all yet; I've been using all the time and energy I can put into hacking for writing my book instead of tidying up the related repositories. So that's still something I need to get around to eventually; pull requests are obviously welcome!