Testing Seqes
Seqes
The Seqes library
provides monadic traversors for the Seq
module of
the OCaml Stdlib. Specifically, the Seqes library provides functors
parametrised over a monad, returning monad-friendly
Seq
-like modules and monad-friendly
Seq
-compatible modules.
module IO = struct
type 'a t = …
let return x = …
let bind x f = …
end
module SeqIO = Seqes.Standard.Make1(IO)
let dump strings =
lines
|> Seq.filter (fun string -> string <> "")
|> SeqIO.iter
(fun line ->
IO.bind
(write_string line)
(fun () -> write_char '\n'))
Motivation
The Octez project uses
the Lwt monad for IO, the
Result monad for
error-management, and the combination of the two for error-managed IO.
There is a
significant part of the source code providing monadic helpers for
different data-structures of the Stdlib: List
,
Map
,
Set
,
Hashtbl
,
etc. These modules provide the normal functionality from the Stdlib as
well as monadic traversors. They can be seen as a generalisation of Lwt_list
.
There is also a Seq
module. However, the Stdlib’s Seq
module exports types
which cannot be mixed freely with monads. Consequently, there are also
companion modules to Seq
in Octez: Seq_s
,
Seq_e
,
and Seq_es
.
Each one is a module similar to Seq
, but with monads baked
directly into the type. For example:
Stdlib.Seq : |
Seq_s : |
---|---|
|
|
These Seq*
modules were added at the time of OCaml
version 4.13, when the Seq
module was relatively small. The
code was written out entirely by hand — or, rather, by the use of
advanced text-editor features: a meta-programming of
sort.
With the release of OCaml 4.14, the Seq
module has grown
significantly. It has become unreasonable to maintain the
monad-specialised code. Hence Seqes.
Implementation
The development of Seqes started at the mirage-os retreat of 2022.
Discussions during this retreat have contributed greatly to the library.
The implementation is based on functors. Each functor of the library
takes a monad as parameter and returns a module which is identical to
Stdlib’s Seq
but with the added monad.
There are several functors because there are different kinds of
monads and you might want different things from them. This is explained
in more details in the documentation of the library. But all the
functors produce modules that are identical to the whole or to a subset
of the Seq
module with some monad in it.
There are a few cool tricks in the implementation of this library. But this post is focused on a different matter: the tests.
Testing the implementation
As mentioned above, discussions during the mirage-os retreat contributed to the library. Some of the most important discussions happened with Jan Midtgaard who knows a lot about randomised testing. He encouraged me to base the tests on a formal description of the library’s API (something he has done for other projects), very much shaping the tests of Seqes.
What to test?
As mentioned above the Seqes library produces modules which are
similar to the Stdlib Seq
module, with a light peppering of
monad. This is true of the signatures of the modules: they contain
exactly the same function types as the Stdlib Seq
module
with some monad type constructors here and there. This is also true of
the implementation of the modules: they contain exactly the same
function definitions as the Stdlib Seq
with some monadic
bind
and return
here and there.
(The library is released under the LGPL license because the bulk of
the content of the functors is based on the code from the Stdlib
Seq
module, with the added monadifications.)
In other words, the modules produced by the library’s functors are
supposed to be equivalent to the Stdlib Seq
module, modulo
the monad parameter. This property is easy to test, provided the monad
is simple:
module Identity = struct
type 'a t = 'a
let return x = x
let bind x f = f x
end
With this specific monad, the produced modules should be indistinguishable from the Stdlib’s. In other words,
given a module produced by the library
module SeqId = Seqes.Monadic1.Make(Identity)
given a function that appears in this module
Seq.find : 'a Seq.t -> ('a -> bool) -> 'a option SeqId.find : 'a SeqId.t -> ('a -> bool) -> 'a option Identity.t
(Note: the type
'a option Identity.t
is an alias for the type'a option
.)and given inputs that are the equivalent modulo the monad
s_stdlib : 'a Seq.t s_seqid : 'a SeqId.t f : 'a -> bool
the applications in
Seq
andSeqId
should lead to the same result modulo the monad:Seq.find f s_stdlib = SeqId.find f s_seqid
The “modulo the monad” wording is somewhat hand-wavy here. But in practice it’s easy to write down the exact property for any given function of the interface.
All in all, the Seqes library lends itself to property-based testing.
Property-based testing: how
The first time I encountered property-based testing in OCaml was when Gabriel Scherer wrote tests for a library I maintain: data-encoding. I was impressed with the amount of testing that could be done from so little code and I have been using this kind of tests a lot since then.
In OCaml there are several libraries dedicated to writing property-based tests.
- Crowbar: a solid library with support for the AFL fuzzer.
- Monolith: a library specialised in testing functions against a reference implementation.
- QCheck: a generic property-based testing library with good tooling integration and a large set of value generators.
I went for QCheck because it supports generators for functions, I was already familiar with this library, it has good integration with Alcotest, and I had an expert on hands from the very start (the aforementioned Jan).
In QCheck, a test is fully described by
- a generator for input values,
- a property over the inputs (i.e., a function from the inputs to a boolean),
- some optional parameters for tweaking the test,
- some optional metadata for making the test report readable.
QCheck2.Test.make :
(* some optional parameters here *)
'a QCheck2.Gen.t (* generator *)
-> ('a -> bool) (* property *)
-> QCheck2.Test.t
A formal description of the API
The property mentioned above (equivalence of Seq.find
and SeqId.find
) is easy to describe in QCheck. However, the
property is only one of many: one for each function of the signature.
And the whole suite of properties must be checked for each functor of
the library.
Writing this test suite by hand is unreasonably verbose. This verbosity would impact maintainability. But it could also hide some issues with the code: it makes it harder to check that the test suite is complete and correct.
Thankfully, it is possible to minimise boilerplate by the clever
application of GADTs. More
concretely, consider the signature of the Seq
module: only
a handful of types and type constructors are used; they can be described
by a data-type.
E.g., for the function
find_map : ('a -> 'b option) -> 'a Seq.t -> 'b option
The type-description data-type needs to include the following constructors:
type ty =
| Option of ty
| Seq of ty
| Lambda of ty * ty
(* … *)
Type parameters (the 'a
and 'b
) are
somewhat difficult to handle and so I simplified the problem by
considering that all the polymorphic parts of the interface are
instantiated with the char
type. (I chose char
because it doesn’t appear in the interface of Seq
at all
and because it is simple without being trivial.)
type ty =
(* … *)
| Char
(* … *)
As is, this ty
is not very usable. That’s because I
cannot write functions that consume values of this type to return
interestingly typed values. For example, I want to generate equality
functions based on those type descriptions. Such an equality function
should have a type that matches the values being checked. E.g.,
eq_of_ty (Option Char)
should have type
char option -> char option -> bool
. And so I need to
keep track of the type of the values the type of which is being
described.
After some experimentation the following type ended up covering all the testing needs:
type (_, _) ty =
| Unit : (unit, unit) ty
| Char : (char, char) ty
| Int : (int, int) ty
| Nat : (int, int) ty
| Bool : (bool, bool) ty
| Tup2 : ('va, 'ma) ty * ('vb, 'mb) ty ->
(('va * 'vb), ('ma * 'mb)) ty
| Option : ('v, 'm) ty -> ('v option, 'm option) ty
| Either : ('va, 'ma) ty * ('vb, 'mb) ty ->
(('va, 'vb) Either.t, ('ma, 'mb) Either.t) ty
| Lambda : ('vk, 'vr, 'mk, 'mr) params * ('vr, 'mr) ty ->
('vk, 'mk) ty
| Seq : ('va, 'ma) ty -> ('va Seq.t, 'ma SeqId.t) ty
| Monad : ('va, 'ma) ty -> ('va, 'ma) ty
and (_, _, _, _) params =
| [] : ('vr, 'vr, 'mr, 'mr) params
| ( :: ) : ('vp, 'mp) ty * ('vk, 'vr, 'mk, 'mr) params ->
('vp -> 'vk, 'vr, 'mp -> 'mk, 'mr) params
The type constructor ty
carries two type parameters for
the two sides of the property being checked: the Stdlib side and the
Seqes side. The Stdlib side parameters use v
as a prefix
(for vanilla), and the Seqes side parameters use m
(for monadic). E.g., the type descriptor
Option (Monad Char)
has the type
(char option, char option) ty
.
Note that the with the Identity
monad, the
Identity.t
type constructor can be omitted. And, in fact,
the Monad
constructor could be removed altogether. It is
left because of future plans to test against monads other than
Identity
. (Stay tuned for a future post about that.)
The Lambda
constructor has one argument for parameters
(of type params
) and one for return. The argument for
parameters describes the lambda’s parameters, using the
params
type. The params
type has four type
parameters because it needs to keep track of both the parameters types
as well as the return types, for both the Stdlib side and the monadic
side. For example,
let params
: ( (char -> char -> int Seq.t)
, int Seq.t
, (char -> char -> int SeqId.t)
, int SeqId.t
) params
= [Char; Char]
in
Lambda (params, Seq Int)
: ( (char -> char -> int Seq.t)
, (char -> char -> int SeqId.t)
) ty
From the formal description to the tests
Equipped with this formal type-description type, it is possible to
extract the building blocks of a QCheck tests, namely an equality
function and an input generator. The equality function
(eq_of_ty : ('v, 'm) ty -> ('v -> 'm -> bool)
) has
trivial ground types, simple type constructors, trivial lambdas, and
interesting sequences.
let eq_of_ty =
| Unit -> fun () () -> true
| Int -> fun a b -> a = b
(* … more ground types *)
| Option ty -> (fun a b -> match a, b with
| Some a, Some b -> eq_of_ty ty a b
| None, None -> true
| Some _, None -> false
| None, Some _ -> false
)
(* … more type constructors *)
| Lambda _ -> invalig_arg "eq_of_ty.Lambda"
| Seq ty -> (fun v m ->
let eq = eq_of_ty ty in
let rec loop n v m =
if v < 0 then
true (* only compare to a certain length *)
else
match v (), m () with
| Seq.Nil, SeqId.Nil -> true
| Seq.Cons (x, v), SeqId.Cons (y, m) ->
eq x v && loop (n-1) v m
| _ -> false
in
loop 100 v m)
| Monad ty -> eq_of_ty ty
The input generator function
(gen_of_ty : ('v, 'm) ty -> (v * m) QCheck2.Gen.t
)
generates a pair of equivalent-modulo-the-monad values. It has trivial
ground types, simple type constructor, complex lambdas, and simple
sequences.
let gen_of_ty =
| Unit -> QCheck2.Gen.map (fun x -> (x, x)) QCheck2.Gen.unit
| Int -> QCheck2.Gen.map (fun x -> (x, x)) QCheck2.Gen.int
(* … more group types *)
| Option ty ->
QCheck2.Gen.oneof [
QCheck2.Gen.return (None, None);
QCheck2.Gen.map (fun (v, m) -> (Some v, Some m)) (gen_of_ty ty);
]
(* … more type constructors *)
For the sequence, the function generates a list of values and then
converts it to a pair of the different sequence types — see
code. For the lambdas, the function generates a single vanilla
function, and then patches it by monadifying the parameters and return
value — see
code. This part is fiddly and it took several attempt to get right.
It’s also complicated to generalise based on the arity of the functions.
As a result, this part of the test helpers is somewhat verbose and it is
hard-coded to support only the arities actually used in the
Seq
interface.
Equipped with these two function, I can write
test_of_ty
:
let test_of_ty
: type vk vr mk mr.
string (* name of test *)
-> ( (vk, vr, mk, mr) params (* description of arguments *)
* (vr, mr) ty ) (* description of return values *)
-> vk (* stdlib function *)
-> mk (* monadic function *)
-> QCheck2.Test.t
= fun name (params, tyr) vf mf ->
match params with
| [] -> assert false (* no parameterless functions *)
| [tya] ->
QCheck2.Test.make
~name
(gen_of_ty tya) (* inputs generator *)
(fun (va, ma) ->
(eq_of_ty tyr) (* equality for return values *)
(vf va) (* value returned by stdlib *)
(mf ma) (* value returned by seqes *)
)
(* … support for more arities *)
Actually writing the tests down
To avoid some verbosity, I define a small DSL:
module DSL = struct
let unit = Unit
let data = Char
(* … more ground types *)
let ( * ) a b = Tup2 (a, b)
(* … more type constructors *)
let ( @-> ) p r =
(* used for inner lambdas *)
Lambda (p, r)
let ( --> ) p r =
(* used for top-level lambdas *)
let _ =
(* force type constraints *)
Lambda (p, r)
in
(p, r)
end
This makes test declarations easy:
let test_fold_left =
test_of_ty "fold_left"
DSL.([ [data; data] @-> data; data; seq data ] --> monad data)
Seq.fold_left
SeqId.fold_left
Testing the test suite
The test-suite is built on robust foundations which ensure some level of correctness. For example, a failure of a test cannot be due to a mismatch between the inputs generator and the function being checked.
Moreover, the ty
-based construction allows to separate
the tests into different components: the inputs generator, the equality
checker, the test declarations. Reviewing the test suite can be done one
component at a time. This is much easier than revieweing a test suite
without ty
: each test has its own generator which needs to
be reviewed.
Nonetheless, there could easily be issues in the test suite. The
simplest issue could be a missing test (say forgetting to test
iter
). Another issue would be to test the library function
(say exists
) against a different yet type-compatible
reference function (say for_all
); such an issue could hide
an implementation bug. A further issue would be that the tests would not
cover a good range of input values (say never generating empty
sequences).
So I wanted to test the test suite!
In another conversation with Jan he mentioned mutaml: a mutation testing tool for OCaml.
Mutation testing
Mutation testing is a technique wherein you modify your code and
check that the tests catch the modification. For example, you change
if n = 0 then …
into if n <> 0 then …
,
you run your test suite, and you check that the tests fail.
If a mutation goes undetected by your test suite, then the test suite is likely incomplete. (Alternatively, some of the code is dead-code.)
With a large enough set of mutations, we can increase confidence in the test suite. In order to obtain a large set of mutations, we need to automate the process of mutating and running the test suite. This is what mutaml provides.
mutaml
The mutaml project provides multiple tools which, taken together, let you exercise your test suite against many mutations.
The first tool is a ppx-rewriter which introduces mutations in your code.
dune build --instrument-with mutaml
This will generate environment-based conditional mutations: something
akin to
if (if env_has "MUTATION_XX" then n <> 0 else n = 0) then …
.
This approach allows the second tool to check many mutations without the
need to recompile each time.
mutaml-runner "dune exec --no-build test/pbt/test_monadic1.exe"
This mutaml-runner
generates a report: listing the test
status for each mutation. The third tool analyses this report. It shows
a summary of the findings (essentially an estimate of the coverage of
the test suite) as well as each of the mutations that went undetected
(if any). These mutations are presented as a simple diff, eliminating
the noisy environment-based approach that helps the runner:
E.g.,
Mutation "lib/standard.ml-mutant6" passed
--- lib/standard.ml
+++ lib/standard.ml-mutant6
@@ -163,7 +163,7 @@
return false
| Cons (y, ys) ->
let* b = f x y in
- if b then
+ if not b then
return true
else
exists2 f xs ys
Miscellaneous advice
Make sure to disable your dune-cache when running the commands above to avoid interferences with previous work.
Set the environment variables
QCHECK_LONG=true
andQCHECK_LONG_FACTOR=50
to make the tests more thorough and avoid false successes.Edit the
_build/default/mutaml-mut-files.txt
files to only run through a subset of the mutations.
Status of mutaml
The mutaml project is in alpha. It needs some more work (on the UX side most notably) but it is already very useful.
Using mutaml, I had a similar feeling as when I first started using property-based testing: that of having found a very useful and practical tool that can change my programming habits significantly and for the better. I fully intend to use mutaml in more and more projects.
I would recommend you give it a try.