Dromedary and a half - My First GADT

GADTs in OCaml… You probably don’t need them, but –who knows– maybe one day you will. Also they can make for fun puzzles to figure out. So here’s a small tutorial to get you started.

The primary use of GADTs is to delegate the maintenance of some invariants of your code to the OCaml compiler’s type checker. In other words, you can use GADTs to encode custom additional typing rules which relate to some aspect of your program’s logic (as opposed to the built-in typing rules which relate to the runtime safety of all programs).

Prologue

A few notes before we start.

You probably don’t need GADTs. That’s because you can use some other OCaml features to encode many of of the invariants that you might care about. For example, you can use a module with a small interface around an abstract or private type. You can also use runtime checks and return option or Either.t values.

What even are GADTs? “GADT” stands for Generalised Algebraic Data Type. I don’t know why it’s Generalised, but the ADT part is essentially your sum types and product types. Sums and products being the stuff of algebra, we have ADTs.

There are two syntaxes for sum types in OCaml, only the more recent one can be used in GADTs. Here’s a comparison of the two syntaxes for the option type:

(* Classic, terse syntax *)
type 'a option =
   | None
   | Some of 'a

(* Modern, verbose, GADT-compatible syntax*)
type _ option =
   | None : 'a option
   | Some : 'a -> 'a option

There are two kinds of product types in OCaml: tuples and records. This tutorial uses a mix of them just like most of the code out there.

This tutorial has several parts. This first part starts with a simple GADT definition and shows how to manipulate values of that type. It doesn’t delve into anything too advanced.

A second part focuses on different techniques to use when making GADTs. A third part focuses on real-world examples.

All the code is available to download and play with. Files are linked to at different points of the tutorial.

My Very First GADT

For your very first GADT, you will make a type which is analogous to int list but with a type parameter indicating whether the list is empty or not. In other words, you make a type where the possible emptiness of a list (of integers) can be tracked by the type system.

First, declare some types which represent the property that you want to track.

type empty = Empty
type nonempty = NonEmpty

Then, declare a type int_list with a single type parameter which is instantiated with the property types.

type _ int_list =
   | Nil : empty int_list
   | Cons : int * _ int_list -> nonempty int_list

Note how each constructor (Nil and Cons) instantiates the type parameter differently:
Nil : empty int_list
Cons : … -> nonempty int_list

Based on this information, the type checker is able to assign different property types for the parameter of different values.

let nil : empty int_list = Nil
let onetwo : nonempty int_list = Cons (1, Cons (2, Nil))
(* you don't need the type annotations,
   the compiler can infer them. *)

That’s it, your very first GADT.

Using the GADT: a specialised function

Because you can enforce emptiness/fullness, you can write exception free version of, say, Stdlib.List.hd:

let hd
: nonempty int_list -> int
= fun (Cons (x, _)) -> x

This function’s type restrict the input to only nonempty int_list. This has two effects: First, it prevents callers from passing an empty int_list value. In other words, you encoded an additional rule for the type system to track. Second, it allows you to only consider the Cons constructor within the body of the function.

You can try to break a few things: call hd Nil, add a Nil case to the function definition, change the type declaration of the function. Try it!

Using the GADT: a generic function

You can also write functions that are generic and accept both empty and nonempty parameters.

let is_empty
: type e. e int_list -> bool
= function
   | Nil -> true
   | Cons _ -> false

Note the type e. prefix inside the type annotation of the function. This is needed because the type parameter e is different in different parts of the function. Without it the compiler gives an error. Try it!

You can also write generic recursive functions. The type annotation uses the same prefix for the same reason.

let rec iter
: type e. (int -> unit) -> e int_list -> unit
= fun f l ->
   match l with
   | Nil -> ()
   | Cons (x, xs) -> f x; iter f xs

Try to define a few useful functions for this type or check out the file veryfirst.ml.

Using the GADT: existential constructor

You can write generic functions, but you cannot create a generic data-structure which hosts values of either property indiscriminately. If you try, you get an error message. Try the following value declaration:

(* broken code: type error *)
let kvs = [
   ("a", Nil);
   ("b", Cons (0, Nil));
   ("c", Cons (1, Cons (4, Nil)));
]

The error makes sense because the type of this value would be (string * _ int_list) List.t and what does that _ even is in this context? Another way to think about the error is to think about the return type for the following function:
let find_value key = List.assoc key kvs

When you need to store values with different property parameters into a single data-structure, you need to introduce an existential constructor. It’s a new type with a new constructor which wraps the GADT so that the property type parameter disappears. By convention, it’s often call Any (because it wraps any of the GADT value) or E (for “exists”).

type any_int_list = Any : _ int_list -> any_int_list

You can use it to host values with different property parameters into a single data-structure.

let kvs = [
   ("a", Any Nil);
   ("b", Any (Cons (0, Nil)));
   ("c", Any (Cons (1, Cons (4, Nil))));
]

You can recover the wrapped/hidden type parameter locally by matching:

let iter_any f xs =
   match xs with
   | Any Nil -> ()
   | Any (Cons _ as xs) ->
      (* in this branch only, [xs] has type [nonempty int_list] *)
      iter f xs

Kinda useless

This GADT is sort of useless. Sure you can write exception-less versions of List.hd and such, but it’s not practical.

The main issue is that int_list is a new type: you cannot use functions from the Stdlib or any existing library. You have to rewrite everything you need yourself. (You could convert to int Seq.t but then you lose the type information.)

Note that because of the didactic intent behind int_list, I cut some corners in making it. It can be made somewhat practical. But as is it is not.

A better alternative is to use a private type alias. See altveryfirst.ml for an example.

Part 2

Now that you know the basics of how to define and use a GADT, check out the next part of this tutorial:
GADT Tips and Tricks.

My First GADT