From 9e0a77282691b6a3b9de61a0cd0bfb36975b1851 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Tue, 14 May 2019 15:03:30 +0200 Subject: [PATCH] blog post on the typeclass index issue --- .../_drafts/typeclasses-exponential-blowup.md | 214 +++++++++++++++++- 1 file changed, 209 insertions(+), 5 deletions(-) diff --git a/ralf/_drafts/typeclasses-exponential-blowup.md b/ralf/_drafts/typeclasses-exponential-blowup.md index bfff0ae..0343bb5 100644 --- a/ralf/_drafts/typeclasses-exponential-blowup.md +++ b/ralf/_drafts/typeclasses-exponential-blowup.md @@ -1,13 +1,217 @@ --- -title: "Exponential blowup when using typeclasses with nested indices" +title: "Exponential blowup when using unbundled typeclasses to model algebraic hierarchies" categories: research coq --- -When formalizing a proof in an interactive theorem prover like [Coq](https://coq.inria.fr/), one reoccuring issue is handling algebraic hierarchies. -Such hierarchies are everywhere: Some operations are associative, while others also commute; some types have an equivalence relation, while others also have a (pre-)order or maybe even a well-ordering; and so on. +When formalizing a proof in an interactive theorem prover like [Coq](https://coq.inria.fr/), one reoccuring issue is the handling of algebraic hierarchies. +Such hierarchies are everywhere: some operations are associative, while others commute; some types have an equivalence relation, while others also have a (pre-)order or maybe even a well-ordering; and so on. So the question arises: What is the best way to actually encode these hierarchies in Coq? Coq offers two mechanism that are suited to solve this task: typeclasses and canonical structures. Both can be instrumented in different ways to obtain a (more or less) convenient-to-use algebraic hierarchy. -A common approach using typeclasses is the ["unbundled" approach by Bas Spitters and Eelis van der Weegen](https://arxiv.org/abs/1102.1323). -However, as [has been observed before](https://pastel.archives-ouvertes.fr/pastel-00649586), and as learned the hard way in the Coq formalization of the [original Iris paper](https://iris-project.org/pdfs/2015-popl-iris1-final.pdf), this approach quickly leads to terms that seem to be exponential in size. +A common approach using typeclasses is the ["unbundled" approach by Bas Spitters and Eelis van der Weegen](http://www.eelis.net/research/math-classes/mscs.pdf). +However as we learned the hard way in the Coq formalization of the [original Iris paper](https://iris-project.org/pdfs/2015-popl-iris1-final.pdf), this approach quickly leads to terms that seem to be exponential in size. +In this post, I will explain the cause of this exponential blowup. + + +I should note that this observation is not new, it already occurs in [François Garillot's PhD thesis](https://pastel.archives-ouvertes.fr/pastel-00649586). +The goal of this post is to provide a more self-contained presentation, not requiring all the context set up in that thesis. + +## Unbundled groups + +A formalization of the algebraic hierarchy for groups using the unbundled approach could look as follows: +{% highlight coq %} +Require Import Lia ZArith. + +(** The operations as typeclasses for overloading, without any axioms. *) +Class Op (A: Type) := op : A -> A -> A. +Infix "+" := op. + +Class Unit (A: Type) := unit : A. + +Class Inverse (A: Type) := inv : A -> A. + +(** The algebraic classes adding the axioms, taking the operations as indices. *) +Class Semigroup (A: Type) `{Op A} := { + assoc a b c : (a + b) + c = a + (b + c); +}. + +Class Monoid (A: Type) `{Semigroup A, Unit A} := { + id_r a : a + unit = a; + id_l a : unit + a = a; +}. + +Class Group (A: Type) `{Monoid A, Inverse A} := { + inv_r a : a + inv a = unit; + inv_l a : inv a + a = unit; +}. +{% endhighlight %} + +As usual for the unbundled approach, we end up with lots of tiny typeclasses. +`Op`, `Unit` and `Inverse` serve as operators that can be overloaded. +They each just take a type `A` as argument (we say they are *indexed* by the type: we can do typeclass search that, given the type, finds the instance). +`Semigroup`, `Monoid` and `Group` supply the laws for these operators; they are indexed by the operators they involve and the typeclasses from which they "inherit". +Taking the superclasses as indices instead of embedding them as fields simplifies typeclass search and is necessary to support diamonds in the hierarchy; see [the original paper](http://www.eelis.net/research/math-classes/mscs.pdf) for more details. + +We use Coq's feature for anonymous implicit generalization: `` `{Semigroup A} `` adds not only a parameter of type `Semigroup A` that we don't have to name; furthermore all typeclass arguments of `Semigroup` (in this case, `Op A`) are also implicitly added as parameter. +This avoids lots of repetition that the unbundled style would usually incur, [but the semantics of implicit generalization are quirky and poorly documented](https://coq.discourse.group/t/documentation-of-foo/129). + +Next, we can show that the integers `Z` form a group: +{% highlight coq %} +Instance Z_op: Op Z := Z.add. +Instance Z_unit: Unit Z := 0%Z. +Instance Z_inv: Inverse Z := Z.opp. + +Instance Z_semigroup: Semigroup Z. +Proof. split. intros. unfold op, Z_op. lia. Qed. +Instance Z_monoid: Monoid Z. +Proof. split; intros; unfold op, Z_op, unit, Z_unit; lia. Qed. +Instance Z_group: Group Z. +Proof. split; intros; unfold op, Z_op, unit, Z_unit, inv, Z_inv; lia. Qed. +{% endhighlight %} +We let `lia` do most of the proving, but we have to unfold some definitions first. + +We can also show that a product of two {semigroups, monoids, groups} is itself a {semigroup, monoid, group}: +{% highlight coq %} +Section prod. + Context {A B: Type}. + + Global Instance prod_op `{Op A, Op B}: Op (A * B) := + fun '(a1, b1) '(a2, b2) => (a1 + a2, b1 + b2). + Global Instance prod_unit `{Unit A, Unit B}: Unit (A * B) := + (unit, unit). + Global Instance prod_inv `{Inverse A, Inverse B}: Inverse (A * B) := + fun '(a, b) => (inv a, inv b). + + Global Instance prod_semigroup `{Semigroup A, Semigroup B}: Semigroup (A * B). + Proof. + split. intros [a1 b1] [a2 b2] [a3 b3]. unfold op, prod_op. + rewrite !assoc. reflexivity. + Qed. + Global Instance prod_monoid `{Monoid A, Monoid B}: Monoid (A * B). + Proof. + split; intros [a b]; unfold op, prod_op, unit, prod_unit; + rewrite ?id_l, ?id_r; reflexivity. + Qed. + Global Instance prod_group `{Group A, Group B}: Group (A * B). + Proof. + split; intros [a b]; unfold op, prod_op, unit, prod_unit, inv, prod_inv; + rewrite ?inv_l, ?inv_r; reflexivity. + Qed. +End prod. +{% endhighlight %} + +## Unexpected complexities + +Seems reasonable? +Unfortunately, we have set ourselves up for some unexpectedly large terms: +the instance of something like `Group (Z*Z*Z*Z*Z*Z)` grows with the fifth power as the number of `Z` increase! + +{% highlight coq %} +Definition test: Group (Z*Z*Z*Z*Z*Z) := _. +Set Printing All. +Print test. +(* Lots of output. *) +{% endhighlight %} + +What is happening here? +We can see that when we try to construct the proof terms more explicitly. +We start by defining `Zn`, which is basically `Z^(n+1)`: a product of `n+1` times `Z`. +{% highlight coq %} +Fixpoint Zn (n: nat): Type := + match n with + | O => Z + | S n => @prod (Zn n) Z + end. +{% endhighlight %} +Unsurprisingly, this term has size linear in `n`: the recursive case adds `@prod • Z` around the term produced for `n-1` (represented by the `•`). +This means the term grows by a constant when `n` increases by 1, leading to linear complexity overall. + +Next, we define `Op (Zn n)`: +{% highlight coq %} +Fixpoint Zn_op (n: nat): Op (Zn n) := + match n with + | O => Z_op + | S n => @prod_op (Zn n) Z (Zn_op n) Z_op + end. +{% endhighlight %} +In the recursive case, we use `prod_op` as one might expect. +Coq could infer all the arguments here, but let us take a closer look at what it would infer: what is the size of `Zn_op` as `n` grows? +In the recursive case, we add `@prod_op (Zn n) Z • Z_op` around the term produced for `n-1`. +This term includes `Zn n`, which we established is linear in size. +When `n` increases by 1, the term grows by a linear amount. +Thus, the overall term size is quadratic in `n`! +Ouch. +I don't know about you, but I found this surprising when I realized it for the first time. + +The index basically repeats all the information that is already present in the recursive call: +to extend a `Zn_op` by one more element, we have to repeat the full type for which we already constructed an `Op`. + +But, you might say, quadratic complexity is not so bad. +Computers are fast, we can handle that. +Unfortunately, I got bad news: things get worse as our indices start nesting. +Let us look at the instance for `Semigroup (Zn n)`: +{% highlight coq %} +Fixpoint Zn_semigroup (n: nat): Semigroup (Zn n) := + match n return @Semigroup (Zn n) (Zn_op n) with + | O => Z_semigroup + | S n => @prod_semigroup (Zn n) Z (Zn_op n) (Zn_semigroup n) Z_op Z_semigroup + end. +{% endhighlight %} +How big is this term? +Well, the term added as `n` increases by 1 is `@prod_semigroup (Zn n) Z (Zn_op n) • Z_op Z_semigroup`. +This includes `Zn_op n`, which is quadratic -- so `Zn_semigroup` is cubic! + +`Zn_semigroup` has an `Op (Zn n)` as index, and again the full nested term describing the operator of both semigroups that we take the product of is getting repeated each time we add just one more `Z`. +So, while `Op (Zn n)` has an index that makes it grow quadratically, `Semigroup (Zn n)` *has an index that has an index*, and grows cubically. +To compute the complexity of our terms, we have to figure out *how nested* our indices are: with no indices (such as `Zn n` itself), the term is linear; for each extra nested index the power increases by 1.[^1] +If we look at `Group`, we see that it has a `Monoid` index which has a `Semigroup` index which has an `Op` index which has a `Type` index. +That's 4 nested indices, and hence the terms have size `O(n^5)`. +More generally speaking, the term size is *exponential* in the height of our hierarchy. +That hurts. + +[^1]: Note that the *number* of indices does not matter, like the fact that `Monoid` has both an `Op` and a `Unit` index. That just affects the constant factor. It's the nesting of indices that gets us here. + +## Conclusion + +We have seen that the number of nested indices of a typeclass has exponential impact on the size of proof terms that arise when deriving typeclass instances for compound types. +This effect has grinded the original Iris coq development back in 2015 to a halt: compiling the correctness proof of the logic itself took around 3.5 hours and needed more than the 4GB of RAM than my laptop had back then. + +In my view, this basically rules out the unbundled approach for any serious formalization effort. +But what are the alternatives? + +Instead of making superclasses indices, we could make them fields, as in: +{% highlight coq %} +Class Monoid (A: Type) `{Op A, Unit A} := { + monoid_semigroup :> Semigroup A; + id_r a : a + unit = a; + id_l a : unit + a = a; +}. +{% endhighlight %} + +This limits the "index depth" to 1, and thus limits term complexity to `O(n^2)`. +However, this scheme does not support diamonds in the hierarchy as instances will be duplicated. +Also, the instance we implicitly added by writing `:>`, namely that `Monoid A -> Semigroup A`, means we'll check all `Monoid` instances any time we just want a `Semigroup`. +This leads to exponential search times when performing backtracking typeclass search: a `Semigroup (A*A)` can be obtained either by searching for a `Monoid (A*A)` or via `prod_semigroup`, and in case of backtracking Coq will go through a combinatorial explosion when trying all possible ways to derive `Semigroup (A*A*A*...)`.[^2] + +[^2]: In genaral, such combinatorial blowups happen very easily with backtracking, and they are hard to detect in advance. This is one reason why I believe that backtracking-per-default is a bad choice for proof search. + +Maybe there is a way to make the unbundled approach viable with some help from the proof assistant. +Some way to eliminate indices would help (similar to primitive projections, but also covering operations such as `prod_op`). +Or maybe the proof assistant could exploit the huge amounts of sharing that are present in these large terms. +If all subterms are properly shared, I think the overall term size remains linear. +Coq deduplicates term representation in memory, but (to my knowledge) does not exploit sharing when traversing the term (for unification or type-checking or any other purpose). +Defeating the exponential complexity here would require exploiting the sharing in *every* pass over the term. + +In Iris, we instead went with an adaptation of [this approach based on canonical structures](https://hal.inria.fr/hal-00816703v1/document). +This fully bundled approach avoids indices entirely and thus maintains a linear term size. +On the other hand, it comes with a lot of boilerplate code, and mixing canonical structures with typeclasses can be tricky to debug. (Unfortunately we cannot avoid typeclasses entirely as we need to have `Proper` instances in our algebraic classes.) +When composing types, Iris users have to write things like `prodC (listC natC) (optionC fracC)` instead of `(list nat) * (option frac)`, which is an additional barrier of entry to an already complicated development. +Also, I have to say I don't really have an intuitive grasp of canonical structure resolution, so when something does not work I don't even know how to start debugging. +On the other hand, the correctness proof of the logic *and* the general proof mode infrastructure *and* a few examples verified in the logic take less than five minutes to compile in our latest Coq formalization. +That's quite some speedup! + +All in all, I think the question of how to best represent algebraic hierarchies in a proof assistant is still wide open. +I am curious which ideas people will come up with! + +#### Footnotes -- 2.30.2