## Typed embedding of STLC into Haskell

5 February 2015 (programming haskell language correctness)Someone posted to the Haskell subreddit this blogpost of Lennart where he goes step-by-step through implementing an evaluator and type checker for CoC. I don't know why this post from 2007 showed up on Reddit this week, but it's a very good post, pedagogically speaking. Go and read it.

In this post, I'd like to elaborate on the simply-typed lambda calculus part of his blogpost. His typechecker defines the following types for representing STLC types, terms, and environments:

data Type = Base | Arrow Type Type deriving (Eq, Show) type Sym = String data Expr = Var Sym | App Expr Expr | Lam Sym Type Expr deriving (Eq, Show)

The signature of the typechecker presented in his post is as follows:

type ErrorMsg = String type TC a = Either ErrorMsg a newtype Env = Env [(Sym, Type)] deriving (Show) tCheck :: Env -> Expr -> TC Type

My approach is to instead create a representation of terms of STLC in such a way that only well-scoped, well-typed terms can be represented. So let's turn on a couple of heavy-weight language extensions from GHC 7.8 (we'll see how each of them is used), and define a typed representation of STLC terms:

{-# LANGUAGE GADTs, StandaloneDeriving #-} {-# LANGUAGE DataKinds, KindSignatures, TypeFamilies, TypeOperators #-} {-# LANGUAGE ScopedTypeVariables #-} {-# LANGUAGE TemplateHaskell #-} -- sigh... import Data.Singletons.Prelude import Data.Singletons.TH import Data.Type.Equality -- | A (typed) variable is an index into a context of types data TVar (ts :: [Type]) (a :: Type) where Here :: TVar (t ': ts) t There :: TVar ts a -> TVar (t ': ts) a deriving instance Show (TVar ctx a) -- | Typed representation of STLC: well-scoped and well-typed by construction data TTerm (ctx :: [Type]) (a :: Type) where TConst :: TTerm ctx Base TVar :: TVar ctx a -> TTerm ctx a TLam :: TTerm (a ': ctx) b -> TTerm ctx (Arrow a b) TApp :: TTerm ctx (Arrow a b) -> TTerm ctx a -> TTerm ctx b deriving instance Show (TTerm ctx a)

The idea is to represent the context of a term as a list of
types of variables in scope, and index into that list, de
Bruijn-style, to refer to variables. This indexing operation
maintains the necessary connection between the pointer and the
type that it points to. Note the type of the `TLam`
constructor, where we extend the context at the front for the
inductive step.

To give a taste of how convenient it is to work with this representation programmatically, here's a total evaluator:

-- | Interpretation (semantics) of our types type family Interp (t :: Type) where Interp Base = () Interp (Arrow t1 t2) = Interp t1 -> Interp t2 -- | An environment gives the value of all variables in scope in a given context data Env (ts :: [Type]) where Nil :: Env '[] Cons :: Interp t -> Env ts -> Env (t ': ts) lookupVar :: TVar ts a -> Env ts -> Interp a lookupVar Here (Cons x _) = x lookupVar (There v) (Cons _ xs) = lookupVar v xs -- | Evaluate a term of STLC. This function is total! eval :: Env ctx -> TTerm ctx a -> Interp a eval env TConst = () eval env (TVar v) = lookupVar v env eval env (TLam lam) = \x -> eval (Cons x env) lam eval env (TApp f e) = eval env f $ eval env e

Of course, the problem is that this representation is not at all convenient for other purposes. For starters, it is certainly not how we would expect human beings to type in their programs.

My version of the typechecker is such that instead of giving the
type of a term (when it is well-typed), it instead transforms
the loose representation (`Term`) into the tight one
(`TTerm`). A `Term` is well-scoped and
well-typed (under some binders) iff there is a
`TTerm` corresponding to it. Let's use singletons
to store type information in existential positions:

$(genSingletons [''Type]) $(singDecideInstance ''Type) -- | Existential version of 'TTerm' data SomeTerm (ctx :: [Type]) where TheTerm :: Sing a -> TTerm ctx a -> SomeTerm ctx -- | Existential version of 'TVar' data SomeVar (ctx :: [Type]) where TheVar :: Sing a -> TVar ctx a -> SomeVar ctx -- | A typed binder of variable names data Binder (ctx :: [Type]) where BNil :: Binder '[] BCons :: Sym -> Sing t -> Binder ts -> Binder (t ': ts)

Armed with these definitions, we can finally define the type inferer. I would argue that it is no more complicated than Lennart's version. In fact, it has the exact same shape, with value-level equality tests replaced with Data.Type.Equality-based checks.

-- | Type inference for STLC infer :: Binder ctx -> Term -> Maybe (SomeTerm ctx) infer bs (Var v) = do TheVar t v' <- inferVar bs v return $ TheTerm t $ TVar v' infer bs (App f e) = do TheTerm (SArrow t0 t) f' <- infer bs f TheTerm t0' e' <- infer bs e Refl <- testEquality t0 t0' return $ TheTerm t $ TApp f' e' infer bs (Lam v ty e) = case toSing ty of SomeSing t0 -> do TheTerm t e' <- infer (BCons v t0 bs) e return $ TheTerm (SArrow t0 t) $ TLam e' inferVar :: Binder ctx -> Sym -> Maybe (SomeVar ctx) inferVar (BCons u t bs) v | v == u = return $ TheVar t Here | otherwise = do TheVar t' v' <- inferVar bs u return $ TheVar t' $ There v' inferVar _ _ = Nothing

Note that pattern matching on `Refl` in the
`App` case brings in scope type equalities that are
crucial to making `infer` well-typed.

Of course, because of the existential nature of
`SomeVar`, we should provide a typechecker as well
which is a much more convenient interface to work with:

-- | Typechecker for STLC check :: forall ctx a. (SingI a) => Binder ctx -> Term -> Maybe (TTerm ctx a) check bs e = do TheTerm t' e' <- infer bs e Refl <- testEquality t t' return e' where t = singByProxy (Proxy :: Proxy a) -- | Typechecker for closed terms of STLC check_ :: (SingI a) => Term -> Maybe (TTerm '[] a) check_ = check BNil

(The `SingI a` constraint is an unfortunate
implementation detail; the kind of `a` is
`Type`, which is closed, so GHC should be able to
know there is always going to be a `SingI a`
instance).

To review, we've written a **typed** embedding of
STLC into Haskell, with a total evaluator and a typechecker, in
about 110 lines of code.

If we were doing this in something more like Agda, one possible
improvement would be to define a function `untype :: TTerm
ctx a -> Term` and use that to give `check`
basically a type of `Binder ctx -> (e :: Term) -> Either
((e' :: TTerm ctx a) -> untype e' == e -> Void) (TTerm ctx
a)`, i.e. to give a proof in the non-well-typed case as well.