ralf/_drafts/typeclasses-exponential-blowup.md

   1 ---
   2 title: "Exponential blowup when using unbundled typeclasses to model algebraic hierarchies"
   3 categories: research coq
   4 ---
   5
   6 When formalizing a proof in an interactive theorem prover like [Coq](https://coq.inria.fr/), one reoccuring issue is the handling of algebraic hierarchies.
   7 Such hierarchies are everywhere:  some operations are associative, while others commute; some types have an equivalence relation, while others also have a (pre-)order or maybe even a well-ordering; and so on.
   8 So the question arises:  What is the best way to actually encode these hierarchies in Coq?
   9 Coq offers two mechanism that are suited to solve this task: typeclasses and canonical structures.
  10 Both can be instrumented in different ways to obtain a (more or less) convenient-to-use algebraic hierarchy.
  11 A common approach using typeclasses is the ["unbundled" approach by Bas Spitters and Eelis van der Weegen](http://www.eelis.net/research/math-classes/mscs.pdf).
  12 However as we learned the hard way in the Coq formalization of the [original Iris paper](https://iris-project.org/pdfs/2015-popl-iris1-final.pdf), this approach quickly leads to terms that seem to be exponential in size.
  13 In this post, I will explain the cause of this exponential blowup.
  14
  15 <!-- MORE -->
  16
  17 I should note that this observation is not new, it already occurs in [François Garillot's PhD thesis](https://pastel.archives-ouvertes.fr/pastel-00649586).
  18 The goal of this post is to provide a more self-contained presentation, not requiring all the context set up in that thesis.
  19
  20 ## Unbundled groups
  21
  22 A formalization of the algebraic hierarchy for groups using the unbundled approach could look as follows:
  23 {% highlight coq %}
  24 Require Import Lia ZArith.
  25
  26 (** The operations as typeclasses for overloading, without any axioms. *)
  27 Class Op (A: Type) := op : A -> A -> A.
  28 Infix "+" := op.
  29
  30 Class Unit (A: Type) := unit : A.
  31
  32 Class Inverse (A: Type) := inv : A -> A.
  33
  34 (** The algebraic classes adding the axioms, taking the operations as indices. *)
  35 Class Semigroup (A: Type) `{Op A} := {
  36   assoc a b c : (a + b) + c = a + (b + c);
  37 }.
  38
  39 Class Monoid (A: Type) `{Semigroup A, Unit A} := {
  40   id_r a : a + unit = a;
  41   id_l a : unit + a = a;
  42 }.
  43
  44 Class Group (A: Type) `{Monoid A, Inverse A} := {
  45   inv_r a : a + inv a = unit;
  46   inv_l a : inv a + a = unit;
  47 }.
  48 {% endhighlight %}
  49
  50 As usual for the unbundled approach, we end up with lots of tiny typeclasses.
  51 `Op`, `Unit` and `Inverse` serve as operators that can be overloaded.
  52 They each just take a type `A` as argument (we say they are *indexed* by the type: we can do typeclass search that, given the type, finds the instance).
  53 `Semigroup`, `Monoid` and `Group` supply the laws for these operators; they are indexed by the operators they involve and the typeclasses from which they "inherit".
  54 Taking the superclasses as indices instead of embedding them as fields simplifies typeclass search and is necessary to support diamonds in the hierarchy; see [the original paper](http://www.eelis.net/research/math-classes/mscs.pdf) for more details.
  55
  56 We use Coq's feature for anonymous implicit generalization: `` `{Semigroup A} `` adds not only a parameter of type `Semigroup A` that we don't have to name; furthermore all typeclass arguments of `Semigroup` (in this case, `Op A`) are also implicitly added as parameter.
  57 This avoids lots of repetition that the unbundled style would usually incur, [but the semantics of implicit generalization are quirky and poorly documented](https://coq.discourse.group/t/documentation-of-foo/129).
  58
  59 Next, we can show that the integers `Z` form a group:
  60 {% highlight coq %}
  61 Instance Z_op: Op Z := Z.add.
  62 Instance Z_unit: Unit Z := 0%Z.
  63 Instance Z_inv: Inverse Z := Z.opp.
  64
  65 Instance Z_semigroup: Semigroup Z.
  66 Proof. split. intros. unfold op, Z_op. lia. Qed.
  67 Instance Z_monoid: Monoid Z.
  68 Proof. split; intros; unfold op, Z_op, unit, Z_unit; lia. Qed.
  69 Instance Z_group: Group Z.
  70 Proof. split; intros; unfold op, Z_op, unit, Z_unit, inv, Z_inv; lia. Qed.
  71 {% endhighlight %}
  72 We let `lia` do most of the proving, but we have to unfold some definitions first.
  73
  74 We can also show that a product of two {semigroups, monoids, groups} is itself a {semigroup, monoid, group}:
  75 {% highlight coq %}
  76 Section prod.
  77   Context {A B: Type}.
  78
  79   Global Instance prod_op `{Op A, Op B}: Op (A * B) :=
  80     fun '(a1, b1) '(a2, b2) => (a1 + a2, b1 + b2).
  81   Global Instance prod_unit `{Unit A, Unit B}: Unit (A * B) :=
  82     (unit, unit).
  83   Global Instance prod_inv `{Inverse A, Inverse B}: Inverse (A * B) :=
  84     fun '(a, b) => (inv a, inv b).
  85
  86   Global Instance prod_semigroup `{Semigroup A, Semigroup B}: Semigroup (A * B).
  87   Proof.
  88     split. intros [a1 b1] [a2 b2] [a3 b3]. unfold op, prod_op.
  89     rewrite !assoc. reflexivity.
  90   Qed.
  91   Global Instance prod_monoid `{Monoid A, Monoid B}: Monoid (A * B).
  92   Proof.
  93     split; intros [a b]; unfold op, prod_op, unit, prod_unit;
  94       rewrite ?id_l, ?id_r; reflexivity.
  95   Qed.
  96   Global Instance prod_group `{Group A, Group B}: Group (A * B).
  97   Proof.
  98     split; intros [a b]; unfold op, prod_op, unit, prod_unit, inv, prod_inv;
  99       rewrite ?inv_l, ?inv_r; reflexivity.
 100   Qed.
 101 End prod.
 102 {% endhighlight %}
 103
 104 ## Unexpected complexities
 105
 106 Seems reasonable?
 107 Unfortunately, we have set ourselves up for some unexpectedly large terms:
 108 the size of an instance of something like `Group (Z*Z*Z*Z*Z*Z)` grows with the fifth power as the number of `Z` increase!
 109
 110 {% highlight coq %}
 111 Definition test: Group (Z*Z*Z*Z*Z*Z) := _.
 112 Set Printing All.
 113 Print test.
 114 (* Lots of output. *)
 115 {% endhighlight %}
 116
 117 What is happening here?
 118 We can see that when we try to construct the proof terms more explicitly.
 119 We start by defining `Zn`, which is basically `Z^(n+1)`: a product of `n+1` times `Z`.
 120 {% highlight coq %}
 121 Fixpoint Zn (n: nat): Type :=
 122   match n with
 123   | O => Z
 124   | S n => @prod (Zn n) Z
 125   end.
 126 {% endhighlight %}
 127 Unsurprisingly, this term has size linear in `n`: the recursive case adds `@prod • Z` around the term produced for `n-1` (represented by the `•`).
 128 This means the term grows by a constant when `n` increases by 1, leading to linear complexity overall.
 129
 130 Next, we define `Op (Zn n)`:
 131 {% highlight coq %}
 132 Fixpoint Zn_op (n: nat): Op (Zn n) :=
 133   match n with
 134   | O => Z_op
 135   | S n => @prod_op (Zn n) Z (Zn_op n) Z_op
 136   end.
 137 {% endhighlight %}
 138 In the recursive case, we use `prod_op` as one might expect.
 139 Coq could infer all the arguments here, but let us take a closer look at what it would infer: what is the size of `Zn_op` as `n` grows?
 140 In the recursive case, we add `@prod_op (Zn n) Z • Z_op` around the term produced for `n-1`.
 141 This term includes `Zn n`, which we established is linear in size.
 142 When `n` increases by 1, the term grows by a linear amount.
 143 Thus, the overall term size is quadratic in `n`!
 144 Ouch.
 145 I don't know about you, but I found this surprising when I realized it for the first time.
 146
 147 The index basically repeats all the information that is already present in the recursive call:
 148 to extend a `Zn_op` by one more element, we have to repeat the full type for which we already constructed an `Op`.
 149
 150 But, you might say, quadratic complexity is not so bad.
 151 Computers are fast, we can handle that.
 152 Unfortunately, I got bad news: things get worse as our indices start nesting.
 153 Let us look at the instance for `Semigroup (Zn n)`:
 154 {% highlight coq %}
 155 Fixpoint Zn_semigroup (n: nat): Semigroup (Zn n) :=
 156   match n return @Semigroup (Zn n) (Zn_op n) with
 157   | O => Z_semigroup
 158   | S n => @prod_semigroup (Zn n) Z (Zn_op n) (Zn_semigroup n) Z_op Z_semigroup
 159   end.
 160 {% endhighlight %}
 161 How big is this term?
 162 Well, the term added as `n` increases by 1 is `@prod_semigroup (Zn n) Z (Zn_op n) • Z_op Z_semigroup`.
 163 This includes `Zn_op n`, which is quadratic -- so `Zn_semigroup` is cubic!
 164
 165 `Zn_semigroup` has an `Op (Zn n)` as index, and again the full nested term describing the operator of both semigroups that we take the product of is getting repeated each time we add just one more `Z`.
 166 So, while `Op (Zn n)` has an index that makes it grow quadratically, `Semigroup (Zn n)` *has an index that has an index*, and grows cubically.
 167
 168 In general, to compute the complexity of our terms, we have to figure out *how nested* our indices are: with no indices (such as `Zn n` itself), the term is linear; for each extra nested index the power increases by 1.[^1]
 169 If we look at `Group`, we see that it has a `Monoid` index which has a `Semigroup` index which has an `Op` index which has a `Type` index.
 170 That's 4 nested indices, and hence the terms have size `O(n^5)`.
 171 More generally speaking, the term size is *exponential* in the height of our hierarchy.
 172 That hurts.
 173
 174 [^1]: Note that the *number* of indices does not matter, like the fact that `Monoid` has both an `Op` and a `Unit` index. That just affects the constant factor. It's the nesting of indices that gets us here.
 175
 176 ## Conclusion
 177
 178 We have seen that the number of nested indices of a typeclass has exponential impact on the size of proof terms that arise when deriving typeclass instances for compound types.
 179 This effect has grinded the original Iris coq development back in 2015 to a halt: compiling the correctness proof of the logic itself took around 3.5 hours and needed more than the 4GB of RAM than my laptop had back then.
 180
 181 In my view, this basically rules out the unbundled approach for any serious formalization effort.
 182 But what are the alternatives?
 183
 184 Instead of making superclasses indices, we could make them fields, as in:
 185 {% highlight coq %}
 186 Class Monoid (A: Type) `{Op A, Unit A} := {
 187   monoid_semigroup :> Semigroup A;
 188   id_r a : a + unit = a;
 189   id_l a : unit + a = a;
 190 }.
 191 {% endhighlight %}
 192
 193 This limits the "index depth" to 2, and thus limits term complexity to `O(n^3)`.
 194 Still not great, but at least it doesn't grow further as we add more sublcasses.
 195 However, this scheme does not support diamonds in the hierarchy as instances will be duplicated.
 196 Also, the instance we implicitly added by writing `:>`, namely that `Monoid A -> Semigroup A`, means we'll check all `Monoid` instances any time we just want a `Semigroup`.
 197 This leads to exponential complexity when performing backtracking typeclass search: a `Semigroup (A*A)` can be obtained either by searching for a `Monoid (A*A)` or via `prod_semigroup`, so in case of backtracking Coq will go through a combinatorial explosion when trying all possible ways to derive `Semigroup (A*A*A*...)`.[^2]
 198
 199 [^2]: In genaral, such combinatorial blowups happen very easily with backtracking, and they are hard to detect in advance. This is one reason why I believe that backtracking-per-default is a bad choice for proof search.
 200
 201 Maybe there is a way to make the unbundled approach viable with some help from the proof assistant.
 202 Some way to eliminate indices would help (similar to primitive projections, but also covering operations such as `prod_op`).
 203 Or maybe the proof assistant could exploit the huge amounts of sharing that are present in these large terms.
 204 If all subterms are properly shared, I think the overall term size remains linear.
 205 Coq deduplicates term representation in memory, but (to my knowledge) does not exploit sharing when traversing the term (for unification or type-checking or any other purpose).
 206 Defeating the exponential complexity here would require exploiting the sharing in *every* pass over the term.
 207 An alternative to opportunistically discovering and exploiting sharing might be to [bake it more deeply into metavariable inference](https://eutypes.cs.ru.nl/eutypes_pmwiki/uploads/Meetings/Kovacs_slides.pdf).
 208
 209 In Iris, we instead went with an adaptation of [this approach based on canonical structures](https://hal.inria.fr/hal-00816703v1/document).
 210 This fully bundled approach avoids indices entirely and thus maintains a linear term size.
 211 On the other hand, it comes with a lot of boilerplate code, and mixing canonical structures with typeclasses can be tricky to debug. (Unfortunately we cannot avoid typeclasses entirely as we need to have `Proper` instances in our algebraic classes.)
 212 When composing types, Iris users have to write things like `prodC (listC natC) (optionC fracC)` instead of `(list nat) * (option frac)`, which is an additional barrier of entry to an already complicated development.
 213 Also, I have to say I don't really have an intuitive grasp of canonical structure resolution, so when something does not work I don't even know how to start debugging.
 214 On the other hand, the correctness proof of the logic *and* the general proof mode infrastructure *and* a few examples verified in the logic take less than five minutes to compile in our latest Coq formalization.
 215 That's quite some speedup!
 216
 217 All in all, I think the question of how to best represent algebraic hierarchies in a proof assistant is still wide open.
 218 I am curious which ideas people will come up with!
 219
 220 #### Footnotes