X-Git-Url: https://git.ralfj.de/web.git/blobdiff_plain/a859220c739395803573893936de9e11da335e64..refs/heads/master:/ralf/_posts/2019-05-15-typeclasses-exponential-blowup.md diff --git a/ralf/_posts/2019-05-15-typeclasses-exponential-blowup.md b/ralf/_posts/2019-05-15-typeclasses-exponential-blowup.md deleted file mode 100644 index bc9a8c9..0000000 --- a/ralf/_posts/2019-05-15-typeclasses-exponential-blowup.md +++ /dev/null @@ -1,221 +0,0 @@ ---- -title: "Exponential blowup when using unbundled typeclasses to model algebraic hierarchies" -categories: research coq -forum: https://coq.discourse.group/t/blog-post-exponential-blowup-when-using-unbundled-typeclasses-to-model-algebraic-hierarchies/289 ---- - -When formalizing a proof in an interactive theorem prover like [Coq](https://coq.inria.fr/), one reoccurring issue is the handling of algebraic hierarchies. -Such hierarchies are everywhere: some operations are associative, while others commute; some types have an equivalence relation, while others also have a (pre-)order or maybe even a well-ordering; and so on. -So the question arises: What is the best way to actually encode these hierarchies in Coq? -Coq offers two mechanisms that are suited to solve this task: typeclasses and canonical structures. -Both can be instrumented in different ways to obtain a (more or less) convenient-to-use algebraic hierarchy. -A common approach using typeclasses is the ["unbundled" approach by Bas Spitters and Eelis van der Weegen](http://www.eelis.net/research/math-classes/mscs.pdf). -However as we learned the hard way in the Coq formalization of the [original Iris paper](https://iris-project.org/pdfs/2015-popl-iris1-final.pdf), this approach quickly leads to terms that seem to be exponential in size. -In this post, I will explain the cause of this exponential blowup. - - - -I should note that this observation is not new, it already occurs in [François Garillot's PhD thesis](https://pastel.archives-ouvertes.fr/pastel-00649586), a [paper by the same author](https://hal.inria.fr/inria-00368403v2/document) and even the authors of the "unbundled" paper already note having performance problems when scaling up. -The goal of this post is to provide a more self-contained presentation, not requiring all the context set up in that thesis and going into more details than the brief remarks the two papers. - -## Unbundled groups - -A formalization of the algebraic hierarchy for groups using the unbundled approach could look as follows: -{% highlight coq %} -Require Import Lia ZArith. - -(** The operations as typeclasses for overloading, without any axioms. *) -Class Op (A: Type) := op : A -> A -> A. -Infix "+" := op. - -Class Unit (A: Type) := unit : A. - -Class Inverse (A: Type) := inv : A -> A. - -(** The algebraic classes adding the axioms, taking the operations as indices. *) -Class Semigroup (A: Type) `{Op A} := { - assoc a b c : (a + b) + c = a + (b + c); -}. - -Class Monoid (A: Type) `{Semigroup A, Unit A} := { - id_r a : a + unit = a; - id_l a : unit + a = a; -}. - -Class Group (A: Type) `{Monoid A, Inverse A} := { - inv_r a : a + inv a = unit; - inv_l a : inv a + a = unit; -}. -{% endhighlight %} - -As usual for the unbundled approach, we end up with lots of tiny typeclasses. -`Op`, `Unit` and `Inverse` serve as operators that can be overloaded. -They each just take a type `A` as argument (we say they are *indexed* by the type: we can do typeclass search that, given the type, finds the instance). -`Semigroup`, `Monoid` and `Group` supply the laws for these operators; they are indexed by the operators they involve and the typeclasses from which they "inherit". -Taking the superclasses as indices instead of embedding them as fields simplifies typeclass search and is necessary to support diamonds in the hierarchy; see [the original paper](http://www.eelis.net/research/math-classes/mscs.pdf) for more details. - -We use Coq's feature for anonymous implicit generalization: `` `{Semigroup A} `` adds not only a parameter of type `Semigroup A` that we don't have to name; furthermore all typeclass arguments of `Semigroup` (in this case, `Op A`) are also implicitly added as parameter. -This avoids lots of repetition that the unbundled style would usually incur, [but the semantics of implicit generalization are quirky and poorly documented](https://coq.discourse.group/t/documentation-of-foo/129). - -Next, we can show that the integers `Z` form a group: -{% highlight coq %} -Instance Z_op: Op Z := Z.add. -Instance Z_unit: Unit Z := 0%Z. -Instance Z_inv: Inverse Z := Z.opp. - -Instance Z_semigroup: Semigroup Z. -Proof. split. intros. unfold op, Z_op. lia. Qed. -Instance Z_monoid: Monoid Z. -Proof. split; intros; unfold op, Z_op, unit, Z_unit; lia. Qed. -Instance Z_group: Group Z. -Proof. split; intros; unfold op, Z_op, unit, Z_unit, inv, Z_inv; lia. Qed. -{% endhighlight %} -We let `lia` do most of the proving, but we have to unfold some definitions first. - -We can also show that a product of two {semigroups, monoids, groups} is itself a {semigroup, monoid, group}: -{% highlight coq %} -Section prod. - Context {A B: Type}. - - Global Instance prod_op `{Op A, Op B}: Op (A * B) := - fun '(a1, b1) '(a2, b2) => (a1 + a2, b1 + b2). - Global Instance prod_unit `{Unit A, Unit B}: Unit (A * B) := - (unit, unit). - Global Instance prod_inv `{Inverse A, Inverse B}: Inverse (A * B) := - fun '(a, b) => (inv a, inv b). - - Global Instance prod_semigroup `{Semigroup A, Semigroup B}: Semigroup (A * B). - Proof. - split. intros [a1 b1] [a2 b2] [a3 b3]. unfold op, prod_op. - rewrite !assoc. reflexivity. - Qed. - Global Instance prod_monoid `{Monoid A, Monoid B}: Monoid (A * B). - Proof. - split; intros [a b]; unfold op, prod_op, unit, prod_unit; - rewrite ?id_l, ?id_r; reflexivity. - Qed. - Global Instance prod_group `{Group A, Group B}: Group (A * B). - Proof. - split; intros [a b]; unfold op, prod_op, unit, prod_unit, inv, prod_inv; - rewrite ?inv_l, ?inv_r; reflexivity. - Qed. -End prod. -{% endhighlight %} - -## Unexpected complexities - -Seems reasonable? -Unfortunately, we have set ourselves up for some unexpectedly large terms: -the size of an instance of something like `Group (Z*Z*Z*Z*Z*Z)` grows with the fifth power as the number of `Z` increase! - -{% highlight coq %} -Definition test: Group (Z*Z*Z*Z*Z*Z) := _. -Set Printing All. -Print test. -(* Lots of output. *) -{% endhighlight %} - -What is happening here? -We can see that when we try to construct the proof terms more explicitly. -We start by defining `Zn`, which is basically `Z^(n+1)`: a product of `n+1` times `Z`. -{% highlight coq %} -Fixpoint Zn (n: nat): Type := - match n with - | O => Z - | S n => @prod (Zn n) Z - end. -{% endhighlight %} -Unsurprisingly, this term has size linear in `n`: the recursive case adds `@prod • Z` around the term produced for `n-1` (represented by the `•`). -This means the term grows by a constant when `n` increases by 1, leading to linear complexity overall. - -Next, we define `Op (Zn n)`: -{% highlight coq %} -Fixpoint Zn_op (n: nat): Op (Zn n) := - match n with - | O => Z_op - | S n => @prod_op (Zn n) Z (Zn_op n) Z_op - end. -{% endhighlight %} -In the recursive case, we use `prod_op` as one might expect. -Coq could infer all the arguments here, but let us take a closer look at what it would infer: what is the size of `Zn_op` as `n` grows? -In the recursive case, we add `@prod_op (Zn n) Z • Z_op` around the term produced for `n-1`. -This term includes `Zn n`, which we established is linear in size. -When `n` increases by 1, the term grows by a linear amount. -Thus, the overall term size is quadratic in `n`! -Ouch. -I don't know about you, but I found this surprising when I realized it for the first time. - -The index basically repeats all the information that is already present in the recursive call: -to extend a `Zn_op` by one more element, we have to repeat the full type for which we already constructed an `Op`. - -But, you might say, quadratic complexity is not so bad. -Computers are fast, we can handle that. -Unfortunately, I got bad news: things get worse as our indices start nesting. -Let us look at the instance for `Semigroup (Zn n)`: -{% highlight coq %} -Fixpoint Zn_semigroup (n: nat): Semigroup (Zn n) := - match n return @Semigroup (Zn n) (Zn_op n) with - | O => Z_semigroup - | S n => @prod_semigroup (Zn n) Z (Zn_op n) (Zn_semigroup n) Z_op Z_semigroup - end. -{% endhighlight %} -How big is this term? -Well, the term added as `n` increases by 1 is `@prod_semigroup (Zn n) Z (Zn_op n) • Z_op Z_semigroup`. -This includes `Zn_op n`, which is quadratic -- so `Zn_semigroup` is cubic! - -`Zn_semigroup` has an `Op (Zn n)` as index, and again the full nested term describing the operator of both semigroups that we take the product of is getting repeated each time we add just one more `Z`. -So, while `Op (Zn n)` has an index that makes it grow quadratically, `Semigroup (Zn n)` *has an index that has an index*, and grows cubically. - -In general, to compute the complexity of our terms, we have to figure out *how nested* our indices are: with no indices (such as `Zn n` itself), the term is linear; for each extra nested index the power increases by 1.[^1] -If we look at `Group`, we see that it has a `Monoid` index which has a `Semigroup` index which has an `Op` index which has a `Type` index. -That's 4 nested indices, and hence the terms have size `O(n^5)`. -More generally speaking, the term size is *exponential* in the height of our hierarchy. -That hurts. - -[^1]: Note that the *number* of indices does not matter, like the fact that `Monoid` has both an `Op` and a `Unit` index. That just affects the constant factor. It's the nesting of indices that gets us here. - -## Conclusion - -We have seen that the number of nested indices of a typeclass has exponential impact on the size of proof terms that arise when deriving typeclass instances for compound types. -This effect has grinded the original Iris coq development back in 2015 to a halt: compiling the correctness proof of the logic itself took around 3.5 hours and needed more than the 4GB of RAM than my laptop had back then. - -In my view, this basically rules out the unbundled approach for any serious formalization effort. -But what are the alternatives? - -Instead of making superclasses indices, we could make them fields, as in: -{% highlight coq %} -Class Monoid (A: Type) `{Op A, Unit A} := { - monoid_semigroup :> Semigroup A; - id_r a : a + unit = a; - id_l a : unit + a = a; -}. -{% endhighlight %} - -This limits the "index depth" to 2, and thus limits term complexity to `O(n^3)`. -Still not great, but at least it doesn't grow further as we add more subclasses. -However, this scheme does not support diamonds in the hierarchy as instances will be duplicated. -Also, the instance we implicitly added by writing `:>`, namely that `Monoid A -> Semigroup A`, means we'll check all `Monoid` instances any time we just want a `Semigroup`. -This leads to exponential complexity when performing backtracking typeclass search: a `Semigroup (A*A)` can be obtained either by searching for a `Monoid (A*A)` or via `prod_semigroup`, so in case of backtracking Coq will go through a combinatorial explosion when trying all possible ways to derive `Semigroup (A*A*A*...)`.[^2] - -[^2]: In genaral, such combinatorial blowups happen very easily with backtracking, and they are hard to detect in advance. This is one reason why I believe that backtracking-per-default is a bad choice for proof search. - -Maybe there is a way to make the unbundled approach viable with some help from the proof assistant. -Some way to eliminate indices would help (similar to primitive projections, but also covering operations such as `prod_op`). -Or maybe the proof assistant could exploit the huge amounts of sharing that are present in these large terms. -If all subterms are properly shared, I think the overall term size remains linear. -Coq deduplicates term representation in memory, but (to my knowledge) does not exploit sharing when traversing the term (for unification or type-checking or any other purpose). -Defeating the exponential complexity here would require exploiting the sharing in *every* pass over the term. -An alternative to opportunistically discovering and exploiting sharing might be to [bake it more deeply into metavariable inference](https://eutypes.cs.ru.nl/eutypes_pmwiki/uploads/Meetings/Kovacs_slides.pdf). - -In Iris, we instead went with an adaptation of [this approach based on canonical structures](https://hal.inria.fr/hal-00816703v1/document). -This fully bundled approach avoids indices entirely and thus maintains a linear term size. -On the other hand, it comes with a lot of boilerplate code, and mixing canonical structures with typeclasses can be tricky to debug. (Unfortunately we cannot avoid typeclasses entirely as we need to have `Proper` instances in our algebraic classes.) -When composing types, Iris users have to write things like `prodC (listC natC) (optionC fracC)` instead of `(list nat) * (option frac)`, which is an additional barrier of entry to an already complicated development. -Also, I have to say I don't really have an intuitive grasp of canonical structure resolution, so when something does not work I don't even know how to start debugging. -On the other hand, the correctness proof of the logic *and* the general proof mode infrastructure *and* a few examples verified in the logic take less than five minutes to compile in our latest Coq formalization. -That's quite some speedup! - -All in all, I think the question of how to best represent algebraic hierarchies in a proof assistant is still wide open. -I am curious which ideas people will come up with! - -#### Footnotes