Haskell for Maths

CHAs V: More Hopf Algebra morphisms

2012-06-10T18:42:00.000+01:00

Last time we looked at the descending tree morphism between the combinatorial Hopf algebras SSym and YSym with fundamental bases consisting of (indexed by) permutations and binary trees respectively. We previously also looked at a Hopf algebra QSym with a basis consisting of compositions.

There are also morphisms between SSym/YSym and QSym. However, before we look at these, we need to look at an alternative basis for QSym.

When I introduced QSym, I defined a type QSymM for the basis, without explaining what the M stands for. It actually stands for "monomial" (but I'm not going to explain why quite yet). Now, of course it is possible to construct any number of alternative bases for QSym, by taking linear combinations of the QSymM basis elements. However, most of these alternative bases are not likely to be very mathematically useful. (By mathematically useful, I mean, for example, that it leads to a simple expression for the multiplication rule.) When looking at the relation between QSym and SSym/YSym, there is another basis for QSym that leads to a clearer picture of their relationship, called the fundamental basis.

We will represent the fundamental basis by a new type:

newtype QSymF = QSymF [Int] deriving (Eq)

instance Ord QSymF where
    compare (QSymF xs) (QSymF ys) = compare (sum xs, xs) (sum ys, ys)

instance Show QSymF where
    show (QSymF xs) = "F " ++ show xs

qsymF :: [Int] -> Vect Q QSymF
qsymF xs | all (>0) xs = return (QSymF xs)
         | otherwise = error "qsymF: not a composition"

In a moment, I'll describe the relationship between the monomial and fundamental bases, but first, there's something I need to explain.

If the monomial and fundamental bases are bases for the same Hopf algebra (QSym), how can they be different types? So I think what it comes down to is that if we have different types then we get to have different Show instances. So we will be able to choose whether to view an element of QSym in terms of the monomial or the fundamental basis.

We could have achieved this in other ways, say by designating the monomial basis as the "true" basis, and then providing functions to input and output using the fundamental basis. Giving the fundamental basis its own type is more egalitarian: it puts the two bases on an equal footing.

Okay then, so in order to make this all work, we need to define the relationship between the two bases, and provide functions to convert between them. Let's take a look.

A (proper) refinement of a composition is any composition which can be obtained from the first composition by splitting one or more of the parts of the first composition.

refinements (x:xs) = [y++ys | y <- compositions x, ys <- refinements xs]
refinements [] = [[]]

> refinements [1,3]
[[1,1,1,1],[1,1,2],[1,2,1],[1,3]]

Then the fundamental basis can be expressed in terms of the monomial basis, as follows:

qsymFtoM :: (Eq k, Num k) => Vect k QSymF -> Vect k QSymM
qsymFtoM = linear qsymFtoM' where
    qsymFtoM' (QSymF alpha) = sumv [return (QSymM beta) | beta <- refinements alpha]

For example:

> qsymFtoM (qsymF [1,3])
M [1,1,1,1]+M [1,1,2]+M [1,2,1]+M [1,3]

Conversely, elements of the monomial basis can be expressed as sums of elements of the fundamental basis, as follows:

qsymMtoF :: (Eq k, Num k) => Vect k QSymM -> Vect k QSymF
qsymMtoF = linear qsymMtoF' where
    qsymMtoF' (QSymM alpha) = sumv [(-1) ^ (length beta - length alpha) *> return (QSymF beta) | beta <- refinements alpha]

> qsymMtoF (qsymM [1,3])
F [1,1,1,1]-F [1,1,2]-F [1,2,1]+F [1,3]

So we can input elements of QSym using either the monomial or fundamental basis (using the qsymM and qsymF constructors). Shortly, we'll define Algebra and Coalgebra instances for QSymF, so that we can perform arithmetic in either basis. Finally, we can output in either basis, by using the conversion functions if necessary.

How do we know that QSymF is a basis? How do we know that its elements are linearly independent, and span the space? In Vect Q QSymF, this is guaranteed by the free vector space construction. But what the question is really asking is, how do we know that the image of the "basis" QSymF in Vect Q QSymM (via qsymFtoM) is a basis?

Well, it will be linearly independent if qsymFtoM is injective, and spanning if qsymFtoM is surjective. So we require that qsymFtoM is bijective. This follows if we can show that qsymFtoM and qsymMtoF are mutual inverses. Well, quickCheck seems to think so:

> quickCheck (\x -> x == (qsymMtoF . qsymFtoM) x)
+++ OK, passed 100 tests.
> quickCheck (\x -> x == (qsymFtoM . qsymMtoF) x)
+++ OK, passed 100 tests.

(For the cognoscenti: The reason this works is that qsymMtoF' is the Mobius inversion of qsymFtoM' in the poset of compositions ordered by refinement.)

Okay, so we have an alternative basis for QSym as a vector space. What do the multiplication and comultiplication look like relative to this new basis? Now, it is possible to define the algebra, coalgebra and Hopf algebra structures explicitly in terms of the QSymF basis, but I'm going to cheat, and just round-trip via QSymM:

instance (Eq k, Num k) => Algebra k QSymF where
    unit x = x *> return (QSymF [])
    mult = qsymMtoF . mult . (qsymFtoM `tf` qsymFtoM)

instance (Eq k, Num k) => Coalgebra k QSymF where
    counit = unwrap . linear counit' where counit' (QSymF xs) = if null xs then 1 else 0
    comult = (qsymMtoF `tf` qsymMtoF) . comult . qsymFtoM

instance (Eq k, Num k) => Bialgebra k QSymF where {}

instance (Eq k, Num k) => HopfAlgebra k QSymF where
    antipode = qsymMtoF . antipode . qsymFtoM

(Recall that `tf` is the tensor product of linear maps.)

It's kind of obvious from the definitions that the algebra, coalgebra and Hopf algebra laws will be satisfied. (It's obvious because we already know that these laws are satisfied in Vect Q QSymM, and the definitions for Vect Q QSymF are just the same, but under the change of basis.) However, for additional confidence, we can for example:

> quickCheck (prop_Algebra :: (Q, Vect Q QSymF, Vect Q QSymF, Vect Q QSymF) -> Bool)
+++ OK, passed 100 tests.

Okay, so the reason for introducing the fundamental basis for QSym is that there is a Hopf algebra morphism from SSym to QSym, which is easiest to express in terms of their respective fundamental bases. Specifically, we can define a map between the bases, SSymF -> QSymF, which lifts (using fmap, ie using the free vector space functor) to a map between the Hopf algebras.

Given a permutation p of [1..n], a descent is an index i such that p(i) > p(i+1). For example, the permutation [2,3,5,1,6,4] has descents from the 5 to the 1 and from the 6 to the 4. We can think of the descents as splitting the permutation into segments, each of which is strictly ascending. Thus 235164 splits into 235-16-4. If we count the lengths of these segments, we get a composition, which I call the descent composition. Here's the code:

descentComposition [] = []
descentComposition xs = descComp 0 xs where
    descComp c (x1:x2:xs) = if x1 < x2 then descComp (c+1) (x2:xs) else (c+1) : descComp 0 (x2:xs)
    descComp c [x] = [c+1]

> descentComposition [2,3,5,1,6,4]
[3,2,1]

We can lift this map between the bases to a map between the Hopf algebras.

descentMap :: (Eq k, Num k) => Vect k SSymF -> Vect k QSymF
descentMap = nf . fmap (\(SSymF xs) -> QSymF (descentComposition xs))

Now, it turns out that this is a Hopf algebra morphism. That is, it commutes with the algebra, coalgebra and Hopf algebra structures.

> quickCheck (prop_AlgebraMorphism descentMap)
+++ OK, passed 100 tests.
> quickCheck (prop_CoalgebraMorphism descentMap)
+++ OK, passed 100 tests.
> quickCheck (prop_HopfAlgebraMorphism descentMap)
+++ OK, passed 100 tests.

Why does this work? Well, let's work through an example, for comultiplication. (In the following I omit brackets and commas for brevity.) If we do the descent map before the comultiplication, we get:

2341 (SSymF)
-> (descentMap)
31 (QSymF)
-> (qsymFtoM - sum of refinements)
31+211+121+1111 (QSymM)
-> (comult - deconcatenations)
[]⊗31 + 3⊗1 + 31⊗[] +
[]⊗211 + 2⊗11 + 21⊗1 + 211⊗[] +
[]⊗121 + 1⊗21 + 12⊗1 + 121⊗[] +
[]⊗1111 + 1⊗111 + 11⊗11 + 111⊗1 + 1111⊗[]
(QSymM⊗QSymM)

(We convert to QSymM at the second step because it's in QSymM that we know how to comultiply. It is possible to give an explicit expression for the comultiplication in terms of the QSymF basis, but I wanted to keep things simple.)

Conversely, if we do the comultiplication before the descent map:

2341 (SSymF)
-> (comult - flattened deconcatenations)
[]⊗2341 + 1⊗231 + 12⊗21 + 123⊗1 + 2341⊗[] (SSymF⊗SSymF)
-> (descentMap⊗descentMap)
[]⊗31 + 1⊗21 + 2⊗11 + 3⊗1 + 31⊗[] (QSymF⊗QSymF)
-> (qsymFtoM⊗qsymFtoM - sum of refinements)
[]⊗(31+211+121+1111) +
1⊗(21+111) +
(2+11)⊗11 +
(3+21+12+111)⊗1 +
(31+211+121+1111)⊗[]
(QSymM⊗QSymM)

The result comes out the same, whichever way round you go, as required. But why does it work? Well, you can imagine the inventor going through the following thought process:

Comult in SSymF is by flattened deconcatenations (of permutations), and in QSymM is by deconcatenations (of compositions). If we split a permutation at a descent, the descents on either side are preserved. So we could try sending a permutation in SSymF to its descent composition in QSymM. For example, ssymF [2,3,4,1] -> qsymM [3,1], which deconcatenates to [2,3,4]⊗[1] -> [3]⊗[1].
However, a deconcatenation in SSymF might split a permutation partway through an ascending segment. For example, [2,3,4,1] -> [2,3]⊗[4,1] (which flattens to [1,2]⊗[2,1]). Taking this to descent compositions would give [2]⊗[1,1]. This is not a deconcatenation of qsymM [3,1] - it is however a deconcatenation of [2,1,1], which is a one-step refinement of [3,1].
So we could try sending a permutation in SSymF to its descent composition in QSymM, and its one-step refinements. For example, ssymF [2,3,4,1] -> qsymM [3,1] + qsymM [2,1,1] + qsymM [1,2,1].
But now that means that [2,3]⊗[4,1] (flattening omitted for clarity) -> [2]⊗[1,1] + [1,1]⊗[1,1]. The second term is a deconcatenation of [1,1,1,1], a two-step refinement of [3,1].
It's pretty obvious that the way to make it all work out is to send a permutation in SSymF to its descent composition in QSymM, and all its proper refinements.
But this sum, of a composition and all its refinements (in QSymM) is just exactly how we defined the QSymF basis.

Exercise: Explain why descentMap commutes with mult.

Exercise: Last time we looked at a descendingTreeMap : SSym -> YSym. Show that the descentMap : SSym -> QSym factors through the descendingTreeMap, and describe the other factor f : YSym -> QSym.

CHAs IV: Hopf Algebra Morphisms

2012-04-30T21:31:00.000+01:00

In the last few posts, we've been looking at combinatorial Hopf algebras, specifically:
- SSym, a Hopf algebra with a basis of (indexed by) permutations
- YSym, with a basis of binary trees
- QSym, with a basis of compositions

It turns out that these three Hopf algebras are quite closely related (and indeed, there are a few others in the same "family"). Let's start with SSym and YSym.

Given a permutation p of [1..n], we can construct a binary tree called the descending tree as follows:
- Split the permutation as p = ls ++ [n] ++ rs
- Place n at the root of the tree, and recursively place the descending trees of ls and rs as the left and right children of the root
- To bottom out the recursion, the descending tree of the empty permutation is of course the empty tree

For example, the following diagram shows the descending tree of [3,5,1,4,2].

Here's the Haskell code:

descendingTree [] = E
descendingTree [x] = T E x E
descendingTree xs = T l x r
    where x = maximum xs
          (ls,_:rs) = L.break (== x) xs
          l = descendingTree ls
          r = descendingTree rs

> :m Math.Algebras.Structures Math.Combinatorics.CombinatorialHopfAlgebra
> descendingTree [3,5,1,4,2]
T (T E 3 E) 5 (T (T E 1 E) 4 (T E 2 E))

Now you'll recall that for the Hopf algebra YSym, although we sometimes carry around the node labels to help us see what is going on, we're really only interested in the shapes of the trees. So here's a function to erase the labels:

shape :: PBT a -> PBT ()
shape t = fmap (\_ -> ()) t

> shape $ T (T E 3 E) 5 (T (T E 1 E) 4 (T E 2 E))
T (T E () E) () (T (T E () E) () (T E () E))

Thus (shape . descendingTree) is a function from permutations to unlabelled trees. We can consider this as a map between the fundamental bases of SSym and YSym, which therefore induces a linear map between the corresponding Hopf algebras:

descendingTreeMap :: (Eq k, Num k) => Vect k SSymF -> Vect k (YSymF ())
descendingTreeMap = nf . fmap (\(SSymF xs) -> (YSymF . shape .  descendingTree) xs)

Now it turns out that this map is in fact a Hopf algebra morphism. What does that mean? Basically it means that descendingTreeMap plays nicely ("commutes") with the unit, mult, counit, comult, and antipode maps in the two Hopf algebras.

For example, for an algebra morphism, we require:
f (x1*x2) = f x1 * f x2
f . unit = unit

It's not immediately clear why descendingTreeMap should have these properties. The unit property is clear:
> descendingTreeMap 1 == 1

True

or put another way
> descendingTreeMap (ssymF []) == ysymF E

True

But what about the mult property?

Recall that in SSymF, we multiply permutations by shifting the right operand, and then summing all possible shuffles of the two lists:

> ssymF [3,2,1] * ssymF [1,2]
F [3,2,1,4,5]+F [3,2,4,1,5]+F [3,2,4,5,1]+F [3,4,2,1,5]+F [3,4,2,5,1]+F [3,4,5,2,1]+F [4,3,2,1,5]+F [4,3,2,5,1]+F [4,3,5,2,1]+F [4,5,3,2,1]

On the other hand, in YSymF, we multiply by multi-splitting the left operand, and then grafting the parts onto the leafs of the right operand, in all possible ways. The following diagram shows one possible multi-split-grafting, corresponding to one of the summands in [3,5,1,4,2] * [1,2]:

The numbers along the tops of the trees are the node labels generated by the descending tree construction. We can see from these that there is an exact correspondence between a shifted shuffle in SSymF and a multi-split grafting in YSymF. The asymmetry in the mult for YSymF, where we graft multi-splits of the left operand onto the right operand, corresponds to the asymmetry in the mult for SSymF, where we shift the right operand. This shifting in SSymF ensures that the nodes for the right operand end up at the root of the descending tree, as required by the grafting in YSymF. When we defined shuffling, we gave a recursive definition, but it's fairly clear that the grafting of the parts of the multi-split onto the right tree is accomplishing the same thing.

Similarly, a coalgebra morphism is a linear map which commutes with counit and comult:

In SSymF, counit is 1 on the empty permutation, 0 on anything else. In YSymF, counit is 1 on the empty tree, 0 on anything else. The descendingTreeMap sends the empty permutation to the empty tree, so it's clear that it commutes with counit in the required way.

What about comult? In SSymF, the comult of a permutation is the sum of its two-part deconcatenations (with each part flattened back to a permutation).

> comult $ ssymF [3,5,1,4,2]
(F [],F [3,5,1,4,2])+(F [1],F [4,1,3,2])+(F [1,2],F [1,3,2])+(F [2,3,1],F [2,1])+(F [2,4,1,3],F [1])+(F [3,5,1,4,2],F [])

In YSymF, the comult of a tree is the sum of its two-part splits. Now it's clear that flattening makes no difference to the descending tree. Then, it's also clear that if you take descending trees of the two parts of a deconcatenation, this corresponds to a split of the descending tree of the whole.

Finally, a Hopf algebra morphism is a bialgebra morphism which in addition commutes with antipode.

In SSymF and YSymF, we didn't give an explicit expression for the antipode, but instead derived it from mult and comult using the fact that they are both graded connected bialgebras. So it's actually kind of obvious that descendingTreeMap will be a Hopf algebra morphism, but just to check:

prop_HopfAlgebraMorphism f x = (f . antipode) x == (antipode . f) x

> quickCheck (prop_HopfAlgebraMorphism descendingTreeMap)
+++ OK, passed 100 tests.

Given any tree, there are many ways to label the nodes so that the tree is descending (ie such that the label on each child node is less than the label on its parent). For example, we could first label all the leaf nodes [1..k], and then all their immediate parents [k+1..], and so on. (For our example tree, this would lead to the alternative labelling [1,5,2,4,3].)

This shows that the descending tree map is surjective, but not injective. Hence YSym is a quotient of SSym.

CHAs III: QSym, a combinatorial Hopf algebra on compositions

2012-04-23T21:29:00.000+01:00

The compositions of a number n are the different ways that it can be expressed as an ordered sum of positive integers. For example, the compositions of 4 are 1+1+1+1, 1+1+2, 1+2+1, 2+1+1, 1+3, 2+2, 3+1, 4. Equivalently, we can forget about the plus signs, and just consider the ordered lists of positive integers that sum to n. Here's the Haskell code:

compositions 0 = [[]]
compositions n = [i:is | i <- [1..n], is <- compositions (n-i)]

> :m Math.Algebras.Structures Math.Combinatorics.CombinatorialHopfAlgebra
> compositions 4
[[1,1,1,1],[1,1,2],[1,2,1],[1,3],[2,1,1],[2,2],[3,1],[4]]

We will define a CHA QSym whose basis elements are (indexed by) compositions:

newtype QSymM = QSymM [Int] deriving (Eq)

instance Ord QSymM where
    compare (QSymM xs) (QSymM ys) = compare (sum xs, xs) (sum ys, ys)

instance Show QSymM where
    show (QSymM xs) = "M " ++ show xs

qsymM :: [Int] -> Vect Q QSymM
qsymM xs | all (>0) xs = return (QSymM xs)
         | otherwise = error "qsymM: not a composition"

We will use QSymM as the basis for a Hopf algebra, indexed by compositions. (In practice, we're going to stick with smallish compositions, as the calculations would take too long otherwise.) We form the free vector space over this basis. An element of the free vector space is a linear combination of compositions. For example:

> qsymM [1,2] + 2 * qsymM [3,1]

M [1,2]+2M [3,1]

The algebra structure on QSymM is similar to the algebra structure we saw last time on SSymM. Instead of shifted shuffles, we will use overlapping shuffles or quasi-shuffles. This means that when shuffling (x:xs) and (y:ys), then instead of just choosing between taking x first or taking y first, we also have a third choice to take them both and add them together.

quasiShuffles (x:xs) (y:ys) = map (x:) (quasiShuffles xs (y:ys)) ++
                              map ((x+y):) (quasiShuffles xs ys) ++
                              map (y:) (quasiShuffles (x:xs) ys)
quasiShuffles xs [] = [xs]
quasiShuffles [] ys = [ys]

For example:

> quasiShuffles [1,2] [3]
[[1,2,3],[1,5],[1,3,2],[4,2],[3,1,2]]

For our algebra structure, we say that the product of two compositions is the sum of all quasi-shuffles of the compositions:

instance (Eq k, Num k) => Algebra k QSymM where
    unit x = x *> return (QSymM [])
    mult = linear mult' where
        mult' (QSymM alpha, QSymM beta) = sumv [return (QSymM gamma) | gamma <- quasiShuffles alpha beta]

It's fairly obvious that this is associative and satisfies the algebra requirements. (And there are quickCheck tests in the package to confirm it.)

The coalgebra structure on QSymM is also similar to the coalgebra structure on SSymM. (We'll see in due course that SSym and QSym are closely related.) For the comultiplication of a composition, we'll just use the sum of the deconcatenations of the composition (without the flattening that we did with SSymM):

instance (Eq k, Num k) => Coalgebra k QSymM where
    counit = unwrap . linear counit' where counit' (QSymM alpha) = if null alpha then 1 else 0
    comult = linear comult' where
        comult' (QSymM gamma) = sumv [return (QSymM alpha, QSymM beta) | (alpha,beta) <- deconcatenations gamma]

For example:

> comult $ qsymM [1,2,3]
(M [],M [1,2,3])+(M [1],M [2,3])+(M [1,2],M [3])+(M [1,2,3],M [])

(Recall that (x,y) should be read as x⊗y.)

This comultiplication, along with those we have seen for other combinatorial Hopf algebras, is obviously coassociative - but perhaps I should spell out what that means. Coassociativity says that:
(comult⊗id) . comult = (id⊗comult) . comult

In other words, if you split a composition in two, and then split the left part in two - in all possible ways - then you get the same sum of possibilities as if you had done the same but splitting the right part in two. This is obvious, because it's just the sum of possible ways of splitting the composition in three (modulo associativity).

Then for a coalgebra we also require:
(id⊗counit) . comult = id = (counit⊗id) . comult

But this is clear. For example:
((id⊗counit) . comult) [1,2,3] =
(id⊗counit) ([]⊗[1,2,3] + [1]⊗[2,3] + [1,2]⊗[3] + [1,2,3]⊗[]) =
0 + 0 + 0 + [1,2,3]
(modulo the isomorphism C⊗k = C)

Then, as we did previously, we can quickCheck that the algebra and coalgebra structures are compatible, and hence that we have a bialgebra. (See the detailed explanation last time: mainly it comes down to the fact that deconcatenating quasi-shuffles is the same as quasi-shuffling deconcatenations.)

instance (Eq k, Num k) => Bialgebra k QSymM where {}

As before, this is a connected graded bialgebra in an obvious way (using the sum of the composition as the grading), and hence it automatically has an antipode. In this case we can give an explicit expression for the antipode. The coarsenings of a composition are the compositions which are "more coarse" than the given composition, in the sense that they can be obtained by combining two or more adjacent parts of the given composition.

coarsenings (x1:x2:xs) = map (x1:) (coarsenings (x2:xs)) ++ coarsenings ((x1+x2):xs)
coarsenings xs = [xs] -- for xs a singleton or null

For example:

> coarsenings [1,2,3]
[[1,2,3],[1,5],[3,3],[6]]

Then the antipode can be defined as follows:

instance (Eq k, Num k) => HopfAlgebra k QSymM where
    antipode = linear antipode' where
        antipode' (QSymM alpha) = (-1)^length alpha * sumv [return (QSymM beta) | beta <- coarsenings (reverse alpha)]

Why does this work? Remember that - in the case of a combinatorial Hopf algebra, where counit picks out the empty structure - antipode has to perform the disappearing trick of making everything cancel out.

That is, we require that
mult . (id⊗antipode) . comult = unit . counit
where the right hand side is zero for any non-empty composition.

Now, in order to try to figure out how it works, I decided to go through an example. However, in the cold light of day I have to admit that perhaps this is one of those times when maths is not a spectator sport. So the rest of this blog post is "optional".

Ok, so let's look at an example:

[1,2,3]

-> (comult = deconcatenations)

[]⊗[1,2,3] + [1]⊗[2,3] + [1,2]⊗[3] + [1,2,3]⊗[]

-> (id⊗antipode = reversal coarsening of right hand side, with alternating signs)

- []⊗([3,2,1] + [5,1] + [3,3] + [6])
+ [1]⊗([3,2] + [5])
- [1,2]⊗[3]
+ [1,2,3]⊗[]

-> (mult = quasi-shuffles)

- [3,2,1] - [5,1] - [3,3] - [6]
+ [1,3,2] + [4,2] + [3,1,2] + [3,3] + [3,2,1] + [1,5] + [6] + [5,1]
- [1,2,3] - [1,5] - [1,3,2] - [4,2] - [3,1,2]
+ [1,2,3]

= 0

(The way I like to think of this is:
- comult splits the atom into a superposition of fragment states
- antipode turns the right hand part of each fragment pair into a superposition of anti-matter states
- mult brings the matter and anti-matter parts together, causing them to annihilate
However, I think that is just fanciful - there's no flash of light - so I'll say no more.)

So why does it work? Well, in the final sum, the terms have to cancel out in pairs. If we look at some of the terms, it appears that there are three possibilities:

Case 1:

Consider the case of [4,2]. It arises in two ways in the final sum:
123 -> 1,23 -> 1,32 -> 42
123 -> 12,3 -> -12,3 -> -42

In both cases, the 1 and the 3 are combined during the quasi-shuffle phase to make a 4. In order to be combined in this way, they have to end up on opposite sides of the split during the deconcatenation phase. Because there is a 2 separating them, there are two ways this can happen, with the 2 ending up on either the left or right hand side of the split. And then the alternating signs ensure that the two outcomes cancel out at the end.

(In this case there was just one part separating them, but the same thing happens if there are more. For example:
1234 -> 1,234 -> -1,432 -> -532 + other terms
1234 -> 12,34 -> 12,43 -> 532 + 523
1234 -> 123,4 -> -123,4 -> -523)

Case 2:

Consider the case of [3,3]. This arises in two ways in the final sum:
123 -> [],123 -> -[],33 -> -33
123 -> 1,23 -> 1,32 -> 33

In this case it is the 1 and the 2 that combine. Unlike the 42 case, in this case the parts that combine are adjacent to one another at the beginning. In the top interaction, they combine during the reverse coarsening phase, in the bottom interaction, they combine during the quasi-shuffle phase. If x and y are adjacent, then the only way they can combine during the quasi-shuffle phase is if the split happens between them during the deconcatenation phase. This will always cancel with the term we get by splitting just before both x and y, and then combining them during coarsening.

wxyz -> w,xyz -> +/- w,z(y+x) -> {wz}(y+x)
wxyz -> wx,yz -> -/+ wx,zy -> {wz}(x+y)

Case 3:

Finally, it may happen that x and y don't combine. This can happen in two different ways, depending whether x and y are adjacentparts or not.

If they're not adjacent, then we get a failed case 1. For example, the [3,1,2] and [1,3,2] terms arise when the 1 and 3 fail to combine into a 4:
123 -> 1,23 -> 1,32 -> 132 + 312
123 -> 12,3 -> -12,3 -> -132-312

If they're adjacent, we get a failed case 2. For example, the [3,2,1] terms arise when the 1 and 2 fail to combine into a 3:
123 -> [],123 -> -[],321 -> -321
123 -> 1,23 -> 1,32 -> 321

This isn't quite a proof, of course. In a longer composition, there might be an opportunity for more than one of these cases to happen at the same time, and we need to show how that all works out. Also, I'm not sure that the [1,2,3] term is either a failed case 1 or a failed case 2. I hope at least though that this analysis has shed some light on why it works. (Exercise: Complete the proof.)

CHAs II: The Hopf Algebra SSym of Permutations

2012-03-25T21:55:00.001+01:00

Last time we looked at YSym, the (dual of the) Loday-Ronco Hopf algebra of binary trees. This time I want to look at SSym, the Malvenuto-Reutenauer Hopf algebra of permutations. (In due course we'll see that YSym and SSym are quite closely related.)

The fundamental basis for SSym is (indexed by) the permutations of [1..n] for n <- [0..]. As usual, we work in the free vector space over this basis. Hence a typical element of SSym might be something like [1,3,2] + 2 [4,1,3,2]. Here's some code:

newtype SSymF = SSymF [Int] deriving (Eq)

instance Ord SSymF where
    compare (SSymF xs) (SSymF ys) = compare (length xs, xs) (length ys, ys)

instance Show SSymF where
    show (SSymF xs) = "F " ++ show xs

ssymF :: [Int] -> Vect Q SSymF
ssymF xs | L.sort xs == [1..n] = return (SSymF xs)
         | otherwise = error "Not a permutation of [1..n]"
         where n = length xs

(The "F" in SSymF stands for fundamental basis: we may have cause to look at other bases in due course.)

Let's try it out:

$ cabal update
$ cabal install HaskellForMaths
$ ghci
> :m Math.Algebras.Structures Math.Combinatorics.CombinatorialHopfAlgebra
> ssymF [1,3,2] + 2 * ssymF [4,1,3,2]
F [1,3,2]+2F [4,1,3,2]

Ok, so how can we define an algebra structure on this basis? How can we multiply permutations? (We want to consider permutations as combinatorial rather than algebraic objects here. So no, the answer isn't the group algebra.) One possibility would be to use the following shifted concatenation operation:

shiftedConcat (SSymF xs) (SSymF ys) = let k = length xs in SSymF (xs ++ map (+k) ys)

For example:

> shiftedConcat (SSymF [1,2]) (SSymF [2,1,3])
F [1,2,4,3,5]

This has the required properties. It's associative:

> quickCheck (\x y z -> shiftedConcat (shiftedConcat x y) z == shiftedConcat x (shiftedConcat y z))
+++ OK, passed 100 tests.

And it's pretty obvious that the empty permutation, SSymF [], is a left and right identity. Hence we could form a monoid algebra using this operation.

However, for the Hopf algebra we're going to look at, we will use a slightly more complicated multiplication. We will retain the idea of shifting the second permutation, so that the two lists are disjoint. However, instead of just concatenating them, we will form the sum of all possible "shuffles" of the two lists. Here's the shuffle code:

shuffles (x:xs) (y:ys) = map (x:) (shuffles xs (y:ys)) ++ map (y:) (shuffles (x:xs) ys)
shuffles xs [] = [xs]
shuffles [] ys = [ys]

So shuffles takes two input "decks of cards", and it outputs all possible ways that they can be shuffled together, while preserving the order between cards from the same deck. For example:

> shuffles [1,2] [3,4,5]
[[1,2,3,4,5],[1,3,2,4,5],[1,3,4,2,5],[1,3,4,5,2],[3,1,2,4,5],[3,1,4,2,5],[3,1,4,5,2],[3,4,1,2,5],[3,4,1,5,2],[3,4,5,1,2]]

Notice how in each of the output shuffles, we have 1 before 2, and 3 before 4 before 5. This enables us to define an algebra structure on permutations as follows:

instance (Eq k, Num k) => Algebra k SSymF where
    unit x = x *> return (SSymF [])
    mult = linear mult'
        where mult' (SSymF xs, SSymF ys) =
                  let k = length xs
                  in sumv [return (SSymF zs) | zs <- shuffles xs (map (+k) ys)]

For example:

> ssymF [1,2] * ssymF [2,1,3]
F [1,2,4,3,5]+F [1,4,2,3,5]+F [1,4,3,2,5]+F [1,4,3,5,2]+F [4,1,2,3,5]+F [4,1,3,2,5]+F [4,1,3,5,2]+F [4,3,1,2,5]+F [4,3,1,5,2]+F [4,3,5,1,2]

It's clear that ssymF [] is indeed a left and right unit for this multiplication. It's also fairly clear that this multiplication is associative (because both shifting and shuffling are). Let's just check:

> quickCheck (prop_Algebra :: (Q, Vect Q SSymF, Vect Q SSymF, Vect Q SSymF) -> Bool)
+++ OK, passed 100 tests.

(The test code isn't exposed in the package, so you'll have to dig around in the source if you want to try this. It takes a minute or two, because every time we multiply we end up with lots of terms.)

What about a coalgebra structure? As I mentioned last time, in a combinatorial Hopf algebra, the comultiplication is usually a sum of the different ways to take apart our combinatorial object into two parts. In this case, we take a permutation apart by "deconcatenating" it (considered as a list) into two pieces. This is like cutting a deck of cards:

deconcatenations xs = zip (inits xs) (tails xs)

For example:

> deconcatenations [2,3,4,1]
[([],[2,3,4,1]),([2],[3,4,1]),([2,3],[4,1]),([2,3,4],[1]),([2,3,4,1],[])]

However, most of those parts are no longer permutations of [1..n] (for any n), because they are missing some numbers. In order to get back to permutations, we need to "flatten" each part:

flatten xs = let mapping = zip (L.sort xs) [1..]
        in [y | x <- xs, let Just y = lookup x mapping]

For example:

> flatten [3,4,1]
[2,3,1]

Putting the deconcatenation and flattening together we get the following coalgebra definition:

instance (Eq k, Num k) => Coalgebra k SSymF where
    counit = unwrap . linear counit' where counit' (SSymF xs) = if null xs then 1 else 0
    comult = linear comult'
        where comult' (SSymF xs) = sumv [return (SSymF (flatten us), SSymF (flatten vs))
                                        | (us, vs) <- deconcatenations xs]

For example:

> comult $ ssymF [2,3,4,1]
(F [],F [2,3,4,1])+(F [1],F [2,3,1])+(F [1,2],F [2,1])+(F [1,2,3],F [1])+(F [2,3,4,1],F [])

(Recall that the result should be read as F[]⊗F[2,3,4,1] + F[1]⊗F[2,3,1] + ...)

It's fairly clear that this comultiplication is coassociative. The counit properties are equally straightforward. Hence:

> quickCheck (prop_Coalgebra :: Vect Q SSymF -> Bool)
+++ OK, passed 100 tests.

Is it a bialgebra? Do the algebra and coalgebra structures commute with one another? In previous posts I've been a bit hand-wavy about this, so let's take a short time out to look at what this actually means. For example, what does it mean for mult and comult to commute? Well, it means that the following diagram commutes:

But hold on a sec - what is the mult operation on (B⊗B)⊗(B⊗B) - is B⊗B even an algebra? And similarly, what is the comult operation on B⊗B - is it even a coalgebra?

Yes they are. Given any algebras A and B, we can define an algebra structure on A⊗B via (a1⊗b1) * (a2⊗b2) = (a1*a2)⊗(b1*b2). In code:

instance (Eq k, Num k, Ord a, Ord b, Algebra k a, Algebra k b) => Algebra k (Tensor a b) where
    unit x = x *> (unit 1 `te` unit 1)
    mult = (mult `tf` mult) . fmap (\((a,b),(a',b')) -> ((a,a'),(b,b')) )

Similarly, given coalgebras A and B we can define a coalgebra structure on A⊗B as follows:

instance (Eq k, Num k, Ord a, Ord b, Coalgebra k a, Coalgebra k b) => Coalgebra k (Tensor a b) where
    counit = unwrap . linear counit'
        where counit' (a,b) = (wrap . counit . return) a * (wrap . counit . return) b
    comult = nf . fmap (\((a,a'),(b,b')) -> ((a,b),(a',b')) ) . (comult `tf` comult)

(Recall that those pesky wrap and unwrap calls are the isomorphisms k <-> Vect k (). What the counit definition really says is that counit (a⊗b) = counit a * counit b.)

Notice how in both the mult and comult definitions, we have to swap the middle two terms of the fourfold tensor product over, in order to have something of the right type.

So how does this work for SSym? Well, we start in the top left with SSym⊗SSym. You can think of that as two decks of cards.

If we go along the top and down the right, then we first shuffle the two decks together (in all possible ways), and then deconcatenate the results (in all possible ways).

If we go down the left and along the bottom, then we first deconcatenate each deck independently (in all possible ways), and then shuffle the first parts of both decks together and separately shuffle the second parts together (in all possible ways).

You can kind of see that this is going to lead to the same result. It's just saying that it doesn't matter whether you shuffle before you cut or cut before you shuffle. (There's also shifting and flattening going on, of course, but it's clear that doesn't affect the result.)

For a bialgebra, we require that each of unit and mult commutes with each of counit and comult - so four conditions in all. The other three are much easier to verify, so I'll leave them as an exercise.

Just to check:

> quickCheck (prop_Bialgebra :: (Q, Vect Q SSymF, Vect Q SSymF) -> Bool)
+++ OK, passed 100 tests.

Okay so what about a Hopf algebra structure? Is there an antipode operation?

Well, recall that when we looked last time at YSym, we saw that it was possible to give a recursive definition of the antipode map. This isn't always possible. The reason it was possible for YSym was:
- We saw that the comultiplication of a tree t is a sum of terms u⊗v, where with the exception of the term 1⊗t, all the other terms have a smaller tree on the right hand side.
- We saw that the counit is 1 on the smallest tree, and 0 otherwise.

It turns out that these are instances of a more general concept, of a graded and connected coalgebra.
- A graded vector space means that there is a concept of the size or degree of the basis elements. For YSym, the degree was the number of nodes in the tree. A graded coalgebra means that the comultiplication respects the degree, in the sense that if comult t is a sum of terms u⊗v, then degree u + degree v = degree t.
- A connected coalgebra means that the counit is 1 on the degree zero piece, and 0 otherwise.

(There is a basis-independent way to explain this, for the purists.)

Now, SSym is also a graded connected coalgebra:
- the degree of SSymF xs is simply length xs. Comultiplication respects the degree.
- counit is 1 on the degree zero piece, 0 otherwise.

Hence we can again give a recursive definition for the antipode:

instance (Eq k, Num k) => HopfAlgebra k SSymF where
    antipode = linear antipode'
        where antipode' (SSymF []) = return (SSymF [])
              antipode' x@(SSymF xs) = (negatev . mult . (id `tf` antipode) . removeTerm (SSymF [],x) . comult . return) x

For example:

> antipode $ ssymF [1,2,3]
-F [3,2,1]
> antipode $ ssymF [1,3,2]
-F [2,1,3]-F [2,3,1]+F [3,1,2]
> antipode $ ssymF [2,1,3]
-F [1,3,2]+F [2,3,1]-F [3,1,2]

It is possible to give an explicit expression for the antipode. However, it's a little complicated, and I haven't got around to coding it yet.

Let's just check:

> quickCheck (prop_HopfAlgebra :: Vect Q SSymF -> Bool)
+++ OK, passed 100 tests.

However, this isn't quite satisfactory. I'd like a little more insight into what the antipode actually does.

Recall that the definition requires that (mult . (id ⊗ antipode) . comult) == (unit . counit).

The left hand side is saying:
- first cut (deconcatenate) the deck of cards into two parts (in all possible ways)
- next apply the antipode to just one of the two parts
- finally shuffle the two parts together (in all possible ways)

The right hand side, remember, sends the empty permutation SSymF [] to itself, and every other permutation to zero.

So what this is saying is, you cut a (non-empty) deck of cards, wave a wand over one part, shuffle the two parts together again, and the cards all disappear!

Or is it? No, not quite. Remember that comult (resp. mult) is a sum of all possible deconcatenations (resp. shuffles). So what this is saying is that the antipode arranges things so that when you sum over all possible deconcatenations and shuffles, they cancel each other out. Cool!

(This sounds like it might have something to do with renormalization in physics, where we want to get a bunch of troublesome infinities to cancel each other out. There is apparently a connection between Hopf algebras and renormalization, but I don't know if this is it. Unfortunately my understanding of physics isn't up to figuring this all out.)

So we've now seen how to define Hopf algebra structures on two different sets of combinatorial objects: trees and permutations. We'll see that these two Hopf algebras are actually quite closely related, and that there is something like a family of combinatorial Hopf algebras with interesting connections to one another.

My main source for this article was Aguiar, Sottile, Structure of the Malvenuto-Reutenauer Hopf algebra of permutations.

Combinatorial Hopf Algebras I: The Hopf Algebra YSym of Binary Trees

2012-03-18T19:21:00.001+00:00

Last time we looked at the definition of Hopf algebras, using the group algebra as a motivating example. This time I want to look at YSym, a Hopf algebra of binary trees. This is an example of a Combinatorial Hopf Algebra (CHA), meaning a Hopf algebra defined over some combinatorial object such as partitions, compositions, permutations, trees.

The binary trees we're concerned with are the familiar binary trees from computer science. In the math literature on this, however, they're called (rooted) planar binary trees. As far as I can tell, that's because in math, a tree means a simple graph with no cycles. So from that point of view, a CS binary tree is in addition rooted - it has a distinguished root node - and planar - so that you can distinguish between left and right child nodes.

Here's a Haskell type for (rooted, planar) binary trees:

data PBT a = T (PBT a) a (PBT a) | E deriving (Eq, Show, Functor)



instance Ord a => Ord (PBT a) where

    ...

As a convenience, the trees have labels at each node, and PBT is polymorphic in the type of the labels. The labels aren't really necessary: the Hopf algebra structure we're going to look at depends only on the shapes of the trees, not the labels. However, it will turn out to be useful to be able to label them, to see more clearly what is going on.

There's more than one way to set up a Hopf algebra structure on binary trees, so we'll use a newtype wrapper to identify which structure we're using.

newtype YSymF a = YSymF (PBT a) deriving (Eq, Ord, Functor)



instance Show a => Show (YSymF a) where

    show (YSymF t) = "F(" ++ show t ++ ")"



ysymF :: PBT a -> Vect Q (YSymF a)

ysymF t = return (YSymF t)

(The "F" in YSymF signifies that we're looking at the fundamental basis. We may have occasion to look at other bases later.)

So as usual, we're going to work in the free vector space on this basis, Vect Q (YSymF a), consisting of linear combinations of binary trees. Here's what a typical element might look like:

$ cabal update

$ cabal install HaskellForMaths

$ ghci

> :m Math.Algebras.Structures Math.Combinatorics.CombinatorialHopfAlgebra

> ysymF (E) + 2 * ysymF (T (T E () E) () E)

F(E)+2F(T (T E () E) () E)

Ok, so how do we multiply two binary trees together? Well actually, let's look at comultiplication first, because it's a little easier to explain. In CHAs (combinatorial Hopf algebras), the comultiplication is often a sum of the different ways of taking a combinatorial structure apart into two pieces. In the case of binary trees, we take them apart by splitting down the middle, starting at a leaf and continuing down to the root. Each node that we pass through goes to the side which has both its branches.

The diagram shows one possible split of the tree. Each leaf of the tree gives rise to a split. Hence the possible splits are:

> mapM_ print $ splits $ T (T E 1 E) 2 (T (T E 3 E) 4 (T E 5 E))

(E,T (T E 1 E) 2 (T (T E 3 E) 4 (T E 5 E)))

(T E 1 E,T E 2 (T (T E 3 E) 4 (T E 5 E)))

(T (T E 1 E) 2 E,T (T E 3 E) 4 (T E 5 E))

(T (T E 1 E) 2 (T E 3 E),T E 4 (T E 5 E))

(T (T E 1 E) 2 (T (T E 3 E) 4 E),T E 5 E)

(T (T E 1 E) 2 (T (T E 3 E) 4 (T E 5 E)),E)

The definition of splits is wonderfully simple:

splits E = [(E,E)]

splits (T l x r) = [(u, T v x r) | (u,v) <- splits l] ++ [(T l x u, v) | (u,v) <- splits r]

We use this to define a coalgebra structure on the vector space of binary trees as follows:

instance (Eq k, Num k, Ord a) => Coalgebra k (YSymF a) where

    counit = unwrap . linear counit' where counit' (YSymF E) = 1; counit' (YSymF (T _ _ _)) = 0

    comult = linear comult'

        where comult' (YSymF t) = sumv [return (YSymF u, YSymF v) | (u,v) <- splits t]

In other words, the counit is the indicator function for the empty tree, and the comult sends a tree t to the sum of u⊗v for all splits (u,v).

> comult $ ysymF $ T (T E 1 E) 2 (T E 3 E)

(F(E),F(T (T E 1 E) 2 (T E 3 E)))+(F(T E 1 E),F(T E 2 (T E 3 E)))+(F(T (T E 1 E) 2 E),F(T E 3 E))+(F(T (T E 1 E) 2 (T E 3 E)),F(E))

Let's just check that this satisfies the coalgebra conditions:

> quickCheck (prop_Coalgebra :: Vect Q (YSymF ()) -> Bool)

+++ OK, passed 100 tests.

[I should say that although the test code is included in the HaskellForMaths package, it is not part of the exposed modules, so if you want to try this you will have to fish around in the source.]

Multiplication is slightly more complicated. Suppose that we are trying to calculate the product of trees t and u. Suppose that u has k leaves. Then we look at all possible "multi-splits" of t into k parts, and then graft the parts onto the leaves of u (in order). This is probably easiest explained with a diagram:

The diagram shows just one possible multi-split, but the multiplication is defined as the sum over all possible multi-splits. Here's the code:

multisplits 1 t = [ [t] ]

multisplits 2 t = [ [u,v] | (u,v) <- splits t ]

multisplits n t = [ u:ws | (u,v) <- splits t, ws <- multisplits (n-1) v ]



graft [t] E = t

graft ts (T l x r) = let (ls,rs) = splitAt (leafcount l) ts

                     in T (graft ls l) x (graft rs r)



instance (Eq k, Num k, Ord a) => Algebra k (YSymF a) where

    unit x = x *> return (YSymF E)

    mult = linear mult'

        where mult' (YSymF t, YSymF u) = sumv [return (YSymF (graft ts u)) | ts <- multisplits (leafcount u) t]

For example:

> ysymF (T (T E 1 E) 2 E) * ysymF (T E 3 E)

F(T (T (T E 1 E) 2 E) 3 E)+F(T (T E 1 E) 3 (T E 2 E))+F(T E 3 (T (T E 1 E) 2 E))

It's fairly clear that the empty tree E is a left and right identity for this multiplication. It seems plausible that the multiplication is also associative - let's just check:

> quickCheck (prop_Algebra :: (Q, Vect Q (YSymF ()), Vect Q (YSymF ()), Vect Q (YSymF ())) -> Bool)

+++ OK, passed 100 tests.

So we've defined both algebra and coalgebra structures on the free vector space of binary trees. For a bialgebra, the algebra and coalgebra structures need to satisfy compatibility conditions: roughly, the multiplication and comultiplication need to commute, ie comult (mult x y) == mult (comult x) (comult y); plus similar conditions involving unit and counit.

Given the way they have been defined, it seems plausible that the structures are compatible (roughly, because it doesn't matter whether you split before or after grafting), but let's just check:

> quickCheck (prop_Bialgebra :: (Q, Vect Q (YSymF ()), Vect Q (YSymF ())) -> Bool)

+++ OK, passed 100 tests.

This entitles us to declare a Bialgebra instance:

instance (Eq k, Num k, Ord a) => Bialgebra k (YSymF a) where {}

(The Bialgebra class doesn't define any "methods". So this is just a way for us to declare in the code that we have a bialgebra. For example, we could write functions which require Bialgebra as a context.)

Finally, for a Hopf algebra, we need an antipode operation. Recall that an antipode must satisfy the following diagram:

In particular,

mult . (id ⊗ antipode) . comult = unit . counit

Let's assume that an antipode for YSym exists, and see what we can deduce about it. The right hand side of the above equation is the function that takes a linear combination of trees, and drops everything except the empty tree. Informally:
(unit . counit) E = E
(unit . counit) (T _ _ _) = 0

For example:

> (unit . counit) (ysymF E) :: Vect Q (YSymF ())

F(E)

> (unit . counit) (ysymF (T E () E)) :: Vect Q (YSymF ())

0

Since we also know that comult E = E⊗E, and mult E⊗E = E, it follows that antipode E = E.

Now what about the antipode of non-empty trees? We know that
comult t = E⊗t + ... + t⊗E
where the sum is over all splits of t.

Hence
((id ⊗ antipode) . comult) t = E⊗(antipode t) + ... + t⊗(antipode E)
and
(mult . (id ⊗ antipode) . comult) t = E * antipode t + ... + t * antipode E
where the right hand side of each multiplication symbol has antipode applied.

Now, E is the identity for multiplication of trees, so
(mult . (id ⊗ antipode) . comult) t = antipode t + ... + t * antipode E

Now, the Hopf algebra condition requires that (mult . (id ⊗ antipode) . comult) = (unit . counit). And we saw that for a non-empty tree, (unit . counit) t = 0. Hence:

antipode t + ... + t * antipode E = 0

Notice that all the terms after the first involve the antipodes of trees "smaller" than t (ie with fewer nodes). As a consequence, we can use this equation as the basis of a recursive definition of the antipode. We recurse through progressively smaller trees, and the recursion terminates because we know that antipode E = E. Here's the code:

instance (Eq k, Num k, Ord a) => HopfAlgebra k (YSymF a) where

    antipode = linear antipode'

        where antipode' (YSymF E) = return (YSymF E)

              antipode' x = (negatev . mult . (id `tf` antipode) . removeTerm (YSymF E,x) . comult . return) x

For example:

> antipode $ ysymF (T E () E)

-F(T E () E)

> antipode $ ysymF (T (T E () E) () E)

F(T E () (T E () E))



> quickCheck (prop_HopfAlgebra :: Vect Q (YSymF ()) -> Bool)

+++ OK, passed 100 tests.

It is also possible to give an explicit definition of the antipode (exercise: find it), but I thought it would be more illuminating to do it this way.

It's probably silly, but I just love being able to define algebraic structures on pictures (ie of trees).

Incidentally, I understand that there is a Hopf algebra structure similar to YSym underlying Feynman diagrams in physics.

I got the maths in the above mainly from Aguiar, Sottile, Structure of the Loday-Ronco Hopf algebra of trees. Thanks to them and many other researchers for coming up with all this cool maths, and for making it freely available online.

What is a Hopf algebra?

2012-03-03T18:49:00.003+00:00

A while ago I looked at the concepts of an algebra and a coalgebra, and showed how to represent them in Haskell. I was intending to carry on to look at bialgebras and Hopf algebras, but I realised that I wasn't sufficiently clear in my own mind about the motivation for studying them. So, I confess that I have been keeping this blog just about afloat with filler material, while behind the scenes I've been writing loads of code - some fruitful, some not - to figure out how to motivate the concept of Hopf algebra.

Some concepts in mathematics have an obvious motivation at the outset: groups arise from thinking about (eg geometric) symmetry, (normed) vector spaces are a representation of physical space. (Both concepts turn out to have a usefulness that goes far beyond the initial intuitive motivation.) With Hopf algebras this doesn't really appear to be the case.

Hopf algebras appear to have been invented for narrow technical reasons to solve a specific problem, and then over time, it has turned out both that they are far more widespread and far more useful than was initially realised. So for me, there are two motivations for looking at Hopf algebras:
- There are some really interesting structures that are Hopf algebras. I'm particularly interested in combinatorial Hopf algebras, which I hope to cover in this blog in due course. But there are also examples in physics, associated with Lie algebras, for example.
- They have some interesting applications. I'm interested in the applications to knot theory (which unfortunately are a little technical, so I'll have to summon up some courage if I want to go over them in this blog). But again, there seem to be applications to physics, such as renormalization in quantum field theory (which I don't claim to understand, btw).

As a gentle introduction to Hopf algebras, I want to look at the group Hopf algebra, which has some claim to be the fundamental example, and provides a really good anchor for the concept.

So last time we looked at the group algebra - the free k-vector space of k-linear combinations of elements of a group G. We saw that many elements in the group algebra have multiplicative inverses, and we wrote code to calculate them. However, if you look again at that code, you'll see that it doesn't actually make use of group inverses anywhere. The code really only relies on the fact that finite groups are finite monoids. We could say that actually, the code we wrote is for finding inverses in finite monoid algebras. (But of course it just so happens that all finite monoids are groups.)

[Edit: Oops - that last claim in brackets is not true. Thanks to reddit readers for pointing it out.]

So the group algebra is an algebra, indeed a monoid algebra, but with the special property that the basis elements (the group elements) all have multiplicative inverses.

Now, mathematicians always prefer, if possible, to come up with definitions that are basis-independent. (In HaskellForMaths, we've been working with free vector spaces, which encourages us to think of vector spaces in terms of a particular basis. But many interesting properties of vector spaces are true regardless of the choice of basis, so it is better to find a way to express them that doesn't involve the basis.)

Can we find a basis-independent way to characterise this special property of the group algebra?

Well, our first step has to be to find a way to talk about the group inverse without having to mention group elements. We do that by encapsulating the group inverse in a linear map, which we call the antipode:

antipode x = nf (fmap inverse x)

So in other words the antipode operation just inverts each group element in a group algebra element (a k-linear combination of group elements). (The nf call just puts the vector in normal form.) For example:

> antipode $ p [[1,2,3]] + 2 * p [[2,3,4]]
[[1,3,2]]+2[[2,4,3]]

The key point about the antipode is that we now have a linear map on the group algebra, rather just an operation on the group alone. Although we defined the antipode in terms of a particular basis, linear maps fundamentally don't care about the basis. If you choose some other basis for the group algebra, I can tell you how the antipode transforms your basis elements.

Ok, so what we would like to do is find a way to express the group inverse property (ie the property that every group element has an inverse) as a property of the group algebra, in terms of the antipode.

Well, no point beating about the bush. Define:

trace (V ts) = sum [x | (g,x) <- ts]
diag = fmap (\g -> (g,g))

(So in maths notation, diag is the linear map which sends g to g⊗g.)

Note that these are both linear maps. In particular, once again, although we have defined them in terms of our natural basis (the group elements), as linear maps they don't really care what basis we choose.

Then it follows from the group inverse property that the following diagram commutes:

Why will it commute? Well, think about what happens to a group element, going from left to right:
- Going along the top, we have g -> g⊗g -> g⊗g^-1 -> g*g^-1 = 1
- Going along the middle, we have g -> 1 -> 1
- Going along the bottom, we have g -> g⊗g -> g^-1⊗g -> g^-1*g = 1
All of the maps are linear, so they extend from group elements to arbitrary k-linear combinations of group elements.

(Shortly, I'll demonstrate that it commutes as claimed, using a quickcheck property.)

We're nearly there. We've found a way to express the special property of the group algebra, in a basis-independent way. We can say that what is special about the group algebra is that there is a linear antipode map, such that the above diagram commutes.

[In fact I think it may be true that if you have a monoid algebra such that the above diagram commutes, then it follows that the monoid is a group. This would definitely be true if we constrained antipode to be of the form (fmap f), for f a function on the monoid, but I'm not absolutely sure that it's true if antipode is allowed to be an arbitrary linear function.]

Now, the concept of a Hopf algebra is just a slight generalization of this.

Observe that trace and diag actually define a coalgebra structure:
- diag is clearly coassociative
- the left and right counit properties are also easy to check

So we can define:


instance (Eq k, Num k) => Coalgebra k (Permutation Int) where

    counit = unwrap . linear counit' where counit' g = 1  -- trace

    comult = fmap (\g -> (g,g))                           -- diagonal

(In fact, trace and diag define a coalgebra structure on the free vector space over any set. Of course, some free vector spaces also have other more interesting coalgebra structures.)

Let's just quickcheck:

> quickCheck (prop_Coalgebra :: GroupAlgebra Q -> Bool)
+++ OK, passed 100 tests.

So for the definition of a Hopf algebra, we allow the antipode to be defined relative to other coalgebra structures besides the trace-diag structure (with some restrictions to be discussed later). So a Hopf algebra is a vector space having both an algebra and a coalgebra structure, such that there exists an antipode map that makes the following diagram commute:

Thus we can think of Hopf algebras as a generalisation of the group algebra. As we'll see (in future posts), there are Hopf algebras with rather more intricate coalgebra structures and antipodes than the group algebra.

Here's a Haskell class for Hopf algebras:


class Bialgebra k b => HopfAlgebra k b where

    antipode :: Vect k b -> Vect k b

(A bialgebra is basically an algebra plus coalgebra - but with one more condition that I'll explain in a minute.)

We've already seen the antipode for the group algebra, but here it is again in its proper home, as part of a Hopf algebra instance:


instance (Eq k, Num k) => HopfAlgebra k (Permutation Int) where

    antipode = nf . fmap inverse

And here's a quickCheck property:


prop_HopfAlgebra x =

    (unit . counit) x == (mult . (antipode `tf` id) . comult) x &&

    (unit . counit) x == (mult . (id `tf` antipode) . comult) x



> quickCheck (prop_HopfAlgebra :: GroupAlgebra Q -> Bool)

+++ OK, passed 100 tests.

So there you have it. That's what a Hopf algebra is.

Except that I've cheated slightly. What I've defined so far is actually only a weak Hopf algebra. There is one other condition that is needed, called the Hopf compatibility condition. This requires that the algebra and coalgebra structures are "compatible" in the following sense:
- counit and comult are algebra morphisms
- unit and mult are coalgebra morphisms

I don't want to dwell too much on this. It seems a pretty reasonable requirement, although other compatibility conditions are possible (eg Frobenius algebras). An algebra plus coalgebra satisfying these conditions (even if it doesn't have an antipode) is called a bialgebra. And it turns out that the group algebra is one.

> quickCheck (prop_Bialgebra :: (Q, GroupAlgebra Q, GroupAlgebra Q) -> Bool)
+++ OK, passed 100 tests.

Introducing the Group Algebra

2012-02-10T21:23:00.000+00:00

Here's an interesting example of an algebra.

Given a group, form the free vector space on the elements of the group. For example, if g and h are elements of the group, then the following are some elements of the free vector space:
- g + h
- 1 + 2*g
- 2 + g*h + h/3

It's pretty obvious how to define an algebra structure on this vector space:
- the unit is 1, the identity element of the group
- the multiplication is the multiplication in the group, lifted to the vector space by linearity.

So for example:
(1 + 2g)(g + h/3) = g + 2g^2 + h/3 + 2gh/3

This is called the group algebra. (It's a special case of the monoid algebra construction that we looked at previously.) Given some particular field k and group G, it is usually written as kG.

How can we represent this in Haskell? Well in HaskellForMaths, we already have code for working in permutation groups, and code for forming free vector spaces. So it's fairly straightforward:


module Math.Algebras.GroupAlgebra where



-- ... imports ...



instance (Eq k, Num k) => Algebra k (Permutation Int) where

    unit x = x *> return 1

    mult = nf . fmap (\(g,h) -> g*h)



type GroupAlgebra k = Vect k (Permutation Int)



p :: [[Int]] -> GroupAlgebra Q

p = return . fromCycles

Then for example we can do the following:

$ cabal update
$ cabal install HaskellForMaths
$ ghci
> :m Math.Core.Utils Math.Algebras.GroupAlgebra
> (1 + p[[1,2,3],[4,5]])^2
1+2[[1,2,3],[4,5]]+[[1,3,2]]

(Actually, in HaskellForMaths <= 0.4.3, the first term will be shown as [] instead of 1. That's just a "bug" in the Show instance, which I have a fix for in the next release.)

For reference, in a maths book, the same result would be written:
( 1 + (1 2 3)(4 5) )^2 = 1 + 2(1 2 3)(4 5) + (1 3 2)

So I guess one thing to point out is that in effect this code defines the group algebra for the group of all permutations of the integers. In practice however, we can always think of ourselves as working in some finite subgroup of this group. For example, if we want to work in the group of symmetries of a square, generated by a rotation (1 2 3 4) and a reflection (1 2)(3 4), then we just need to consider only sums of the eight elements in the generated group.

Another thing to point out is that this code could easily be modified to allow permutations over an arbitrary type, since that is supported by the underlying permutation code.

So what is this group algebra then? What sort of thing is it, and how should one think about it?

Well, first, as an algebra, it has zero divisors. For example:

> (1+p[[1,2]])*(1-p[[1,2]])
0

However, a lot of the elements aren't zero divisors, and whenever they're not, they have inverses. The group elements themselves have inverses of course, but so do many sums of group elements. For example:

> (1+p[[1,2,3]])^-1
1/2-1/2[[1,2,3]]+1/2[[1,3,2]]
> (1+2*p[[1,2,3]])^-1
1/9-2/9[[1,2,3]]+4/9[[1,3,2]]

Just to check:

> (1+p[[1,2,3]]) * (1-p[[1,2,3]]+p[[1,3,2]])
2
> (1+2*p[[1,2,3]]) * (1-2*p[[1,2,3]]+4*p[[1,3,2]])
9

How do we calculate the inverses? Well it's quite clever actually. Let's work through an example. Suppose we want to find an inverse for
x = 1+2*p[[1,2]]+3*p[[1,2,3]]
The inverse, if it exists, will be a linear combination of elements of the group generated by 1, p[[1,2]] and p[[1,2,3]]. So it will be a sum
y = a*1 + b*p[[1,2]] + c*p[[1,3]] + d*p[[2,3]] + e*p[[1,2,3]] + f*p[[1,3,2]]
By supposition x*y = 1, so
(1+2*p[[1,2]]+3*p[[1,2,3]]) * (a*1+b*p[[1,2]]+c*p[[1,3]]+d*p[[2,3]]+e*p[[1,2,3]]+f*p[[1,3,2]]) =
1 + 0*p[[1,2]] + 0*p[[1,3]] + 0*p[[2,3]] + 0*p[[1,2,3]] + 0*p[[1,3,2]]

If we multiply out and equate coefficients, we will get a linear system in a,b,c,d,e,f. Something like:


 a+2b      +3e    = 1 (coefficients of 1)

2a+ b+3c          = 0 (coefficients of p[[1,2]])

       c+3d+2e    = 0 (coefficients of p[[1,3]])

   3b   + d   +2f = 0 (coefficients of p[[2,3]])

etc

So we just solve the linear system to find a,b,c,d,e,f.

Here's the code:


newtype X a = X a deriving (Eq,Ord,Show)




instance HasInverses (GroupAlgebra Q) where

    inverse x@(V ts) =

        let gs = elts $ map fst ts -- all elements in the group generated by the terms

            n = length gs

            y = V $ zip gs $ map (glexvar . X) [1..n] -- x1*1+x2*g2+...+xn*gn

            x' = V $ map (\(g,c) -> (g, unit c)) ts -- lift the coefficients in x into the polynomial algebra

            one = x' * y

            m = [ [coeff (mvar (X j)) c | j <- [1..n]] | i <- gs, let c = coeff i one] -- matrix of the linear system

            b = 1 : replicate (n-1) 0

        in case solveLinearSystem m b of -- find v such that m v == b

            Just v -> nf $ V $ zip gs v

            Nothing -> error "GroupAlgebra.inverse: not invertible"

I won't explain it in detail. I'll just remark that this is one of the places where HaskellForMaths shows its power. We happen to have a type for polynomials lying around. They're a Num instance, so we can use them as the coefficients in GroupAlgebra k. We can create new variables x1, x2, ... (using glexvar . X), lift field elements into the polynomial algebra (using unit), multiply x and y in the group algebra, extract the coefficients into a matrix, and solve the linear system.

I should point out that unfortunately, since this method involves solving a linear system in |G| variables, it's only going to be efficient for small groups.

So what is the group algebra useful for? Well actually quite a lot. It's fundamental to the study of representation theory - representing groups as matrices. It's also used for "Fourier analysis of groups" - though I don't know much about that. But those will have to wait for another time.

New release of HaskellForMaths

2011-11-12T21:24:00.001+00:00

I've just uploaded a new version v0.4.1 of HaskellForMaths, containing three new modules and a couple of other improvements. The additions are as follows:

Math.Algebras.Quaternions

This module was already present: it defines the quaternion algebra on the basis {1,i,j,k}, where multiplication is defined by:
i^2 = j^2 = k^2 = ijk = -1

This is enough information to figure out the full multiplication table. For example:
ijk = -1
=> (ijk)k = -k
=> ij(kk) = -k (associativity of multiplication)
=> ij = k
It turns out that the basis elements i,j,k anti-commute in pairs, eg ij = -ji, etc.

In this release I've added a couple of new things.

First, the quaternions are a division algebra, so I've added a Fractional instance.

Specifically, we can define a conjugation operation on the quaternions (similar to complex conjugation) via
conj (w+xi+yj+zk) = w-xi-yj-zk
Then we can define a quadratic norm via
sqnorm q = q * conj q = w^2+x^2+y^2+z^2
Since the norm is always a scalar, we can define a multiplicative inverse by
q^-1 = conj q / sqnorm q

For example:

$ cabal install HaskellForMaths
$ ghci
> :m Math.Algebras.Quaternions
> (2*i+3*j)^-1 :: Quaternion Q
-2/13i-3/13j

(If you leave out the type annotation, you'll be working in Quaternion Double.)

Second, the quaternions have an interesting role in 3- and 4-dimensional geometry.

Given any non-zero quaternion q, the map x -> q^-1 x q turns out to be a rotation of the 3-dimensional space spanned by {i,j,k}. To multiply rotations together (ie do one then another), just multiply the quaternions. This turns out to be a better way to represent rotations than 3*3 matrices:
- It's more compact - four scalars rather than nine
- They're faster to multiply - 16 scalar multiplications versus 27
- It's more robust against rounding error - whatever quaternion you end up with will still represent a rotation, whereas a sequence of matrix multiplications of rotations might not be quite a rotation any more, due to rounding error.

If you're curious, the function reprSO3 converts a quaternion to the corresponding 3*3 matrix:

> reprSO3 (1+2*i) :: [[Q]]
[[1,0,0],[0,-3/5,-4/5],[0,4/5,-3/5]]

(Exercise: Figure out why we got this matrix.)

Quaternions can also be used to represent rotations of 4-dimensional space - see the documentation.

Math.Algebras.Octonions

This is a new module, providing an implementation of the 8-dimensional non-associative division algebra of octonions. I follow Conway's notation [1], so the octonions have basis {1,e0,e1,e2,e3,e4,e5,e6}, with multiplication defined by:
e_i * e_i = -1, for i in [0..6]
e_i+1 * e_i+2 = e_i+4, where the indices are taken modulo 7.

The octonions are not associative, but they are an inverse loop, so they satisfy x^-1(xy) = y = (yx)x^-1. This is enough to enable us to deduce the full multiplication table from the relations above.

Like the quaternions, the octonions have conjugation and a norm, and multiplicative inverses:

> :l Math.Algebras.Octonions
> (2+i0+2*i3)^-1
2/9-1/9i0-2/9i3

The octonions are an exceptional object in mathematics: there's nothing else quite like them. They can be used to construct various other exceptional objects, such as the root lattice E8, or the Lie group G2. Hopefully I'll be able to cover some of that stuff in a future installment.

[1] Conway and Smith, On Quaternions and Octonions

Math.NumberTheory.Prime

The main function in this module is isPrime :: Integer -> Bool, which tells you whether a number is prime or not. It's implemented using the Miller-Rabin test.

The basic idea of the test is:
- If p is prime, then Zp is a field
- In a field, the equation x^2 = 1 has only two solutions, 1 and -1
- Given an arbitrary b coprime to p, we know from Fermat's little theorem that b^(p-1) = 1 (mod p)
- So if p-1 = q * 2^s, with q odd, then either b^q = 1 (mod p), or there is some r, 0 <= r < s with b^(q*2^r) = -1 (mod p)

The idea of the algorithm is to try to show that p isn't prime by trying to find a b where the above is not true. We take several different values of b at random, and repeatedly square b^q, to see whether we get -1 or not.

The advantage of the Miller-Rabin test, as compared to trial division say, is that it has a fast running time even for very large numbers. For example:

> :m Math.NumberTheory.Prime
> :set +s
> nextPrime $ 10^100
10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000267
(0.09 secs, 14904632 bytes)

The potential disadvantage of the Miller-Rabin test is that it is probabilistic: There is a very small chance (1 in 10^15 in this implementation) that it could just fail to hit on a b which disproves n's primeness, so that it would say n is prime when it isn't. In practice, at those odds it's not worth worrying about.

Math.NumberTheory.Factor

The main function in this module is pfactors :: Integer -> [Integer], which returns the prime factors of a number (with multiplicity). It uses trial division to try to find prime factors less than 10000. After that, it uses the elliptic curve method to try to split what remains. The elliptic curve method relies on some quite advanced maths, but the basic idea is this:
- If p is a prime, then Zp is a field
- Given a field, we can do "arithmetic on elliptic curves" over the field.
- So to factor n, pretend that n is prime, try doing arithmetic on elliptic curves, and wait till something goes wrong.
- It turns out that if we look at what went wrong, we can figure out a non-trivial factor of n

Here it is in action:

> :m Math.NumberTheory.Factor
> pfactors $ 10^30+5
[3,5,4723,1399606163,10085210079364883]
(0.55 secs, 210033624 bytes)
> pfactors $ 10^30+6
[2,7,3919,758405810021,24032284101871]
(2.31 secs, 883504748 bytes)

I love the way it can crunch through 12-digit prime factors with relative ease.

Commutative Algebra and Algebraic Geometry

2011-09-18T19:22:00.000+01:00

Last time we saw how to create variables for use in polynomial arithmetic. This time I want to look at some of the things we can do next.

First, let's define the variables we are going to use:

> :l Math.CommutativeAlgebra.GroebnerBasis
> let [t,u,v,x,y,z,x',y',z'] = map glexvar ["t","u","v","x","y","z","x'","y'","z'"]

So now we can do arithmetic in the polynomial ring Q[t,u,v,x,y,z,x',y',z']. For example:

> (x+y)^2
x^2+2xy+y^2

The branch of mathematics dealing with the theory of polynomial rings is called commutative algebra, and it was "invented" mainly in support of algebraic geometry. Algebraic geometry is roughly the study of the curves, surfaces, etc that arise as the solution sets of polynomial equations. For example, the solution-set of the equation x^2+y^2=1 is the unit circle.

If we are given any polynomial equation f = g, then we can rewrite it more conveniently as f-g = 0. In other words, we only need to track individual polynomials, rather than pairs of polynomials. Call the solution set of f = 0 the zero-set of f.

Sometimes we're interested in the intersection of two or more curves, surfaces, etc. For example, it is well known that the hyperbola, parabola and ellipse all arise as "conic sections" - that is, as the intersection of a cone with a plane. So define the zero-set of a collection (or system) of polynomials to be the set of points which are zeros of all the polynomials simultaneously. For example, the zero-set of the system [x^2+y^2-z^2, z-1] is the unit circle x^2+y^2=1 situated on the plane z=1.

Okay, so how can commutative algebra help us to investigate curves and surfaces? Is there a way for us to "do geometry by doing algebra"? Well, first, what does "doing geometry" consist of? Well, at least some of the following:
- Looking at the shapes of curves and surfaces
- Looking at intersections, unions, differences and products of curves and surfaces
- Looking at when curves or surfaces can be mapped into or onto other curves or surfaces
- Looking at when two different curves or surfaces are equivalent, in some sense (for example, topologically equivalent)

(That phrase "curves and surfaces" is not only clumsy but also inaccurate, so from now on I'll use the proper term, "variety", for the zero-set of a system of polynomials, whether it's a set of isolated points, a curve, a surface, some higher dimensional thing, or a combination of some of the preceding.)

So can we do all those things using algebra? Well, let's have a go.

Let's start by looking at intersections and unions of varieties (remember, that's just the fancy name for curves, surfaces, etc.).

Well, we've already seen how to do intersections. If a variety V1 is defined by a system of polynomials [f1...fm], and a variety V2 is defined by [g1...gn], then their intersection is defined by the system [f1...fm,g1...gn] - the zero-set of both sets of polynomials simultaneously. We'll call this the "sum" of the systems of polynomials. (Note to the cognoscenti: yes, I'm really talking about ideals here.)

sumI fs gs = gb (fs ++ gs)

Don't worry too much about what that "gb" (Groebner basis) call is doing. Let's just say that it's choosing the best way to represent the system of polynomials. For example:

> sumI [x^2+y^2-z^2] [z-1]
[x^2+y^2-1,z-1]

Notice how the gb call has caused the first polynomial to be simplified slightly. The same variety might arise as the zero-set of many different systems of polynomials. That's something that we're going to need to look into - but later.

Okay, so what about unions of varieties. So suppose V1 is defined by [f1...fm], V2 is defined by [g1...gn]. A point in their union is in either V1 or V2, so it is in the zero-set of either [f1...fm] or [g1...gn]. So how about multiplying the polynomials together in pairs. That is, let's look at the system [fi*gj | fi <- fs, gj <- gs]. Call the zero-set of this system V. Then clearly, any point in either V1 or V2 is in V, since we then know that either all the fs or all the gs vanish at that point, and hence so do all the products. Conversely, suppose that some point is not in the union of V1 and V2. Then there must exist some fi, and some gj, which are non-zero at that point. Hence there is an fi*gj which is non-zero, so the point is not in V.

This construction is called, naturally enough, the product of the systems of polynomials.

productI fs gs = gb [f * g | f <- fs, g <- gs]

> productI [x^2+y^2-z^2] [z-1]
[x^2z+y^2z-z^3-x^2-y^2+z^2]

Just in case you're still a little unsure, let's confirm that a few arbitrary points in the union are in the zero-set of this polynomial:

> eval (x^2*z+y^2*z-z^3-x^2-y^2+z^2) [(x,100),(y,-100),(z,1)]
0
> eval (x^2*z+y^2*z-z^3-x^2-y^2+z^2) [(x,3),(y,4),(z,5)]
0

The first expression evaluates the polynomial at the point (100,-100,1), an arbitrary point on the plane z=1. The second evaluates at (3,4,5), an arbitrary point on the cone x^2+y^2=z^2. Both points are in the zero-set of our product polynomial.

Since we're in the neighbourhood, let's have a look at the other conic sections. First, let's rotate our coordinate system by 45 degrees, using the substitution x'=x+z, z'=z-x. (Okay, so this also scales - to save us having to handle a sqrt 2 factor.)

> let cone' = subst (x^2+y^2-z^2) [(x,(x'-z')/2),(y,y'),(z,(x'+z')/2)]
> cone'
-x'z'+y'^2

In these coordinates, the intersection of the cone with the plane z'=1 is the parabola x'=y'^2:

> sumI [cone'] [z'-1]
[y'^2-x',z'-1]

Alternatively, the intersection with the plane y'=1 is the hyperbola x'z'=1:

> sumI [cone'] [y'-1]
[x'z'-1,y'-1]

Okay, so we've made a start on seeing how to do geometry by doing algebra, by looking at unions and intersections of varieties. There's still plenty more to do. We mustn't forget that we have some unfinished business: we need to understand when different polynomial systems can define the same variety, and in what sense the gb (Groebner basis) function finds the "best" representation. That will have to wait for another time.

Incidentally, for the eval and subst functions that I used above, you will need to take the new release HaskellForMaths v0.4.0. In this release I also removed the older commutative algebra modules, so I revved the minor version number.

Commutative Algebra in Haskell, part 1

2011-09-04T21:11:00.000+01:00

Once again, it's been a little while since my last post, and once again, my excuse is partly that I've been too busy writing code.

I've just uploaded a new release, HaskellForMaths 0.3.4, which contains the following new modules:

Math.Core.Utils - this is a collection of utility functions used throughout the rest of the library. I've belatedly decided that it's better to put them all in one place rather than scattered here and there throughout other modules.

Math.Core.Field - this provides new, more efficient implementations of several finite fields. There already were implementations of these finite fields, in the Math.Algebra.Field.Base and ...Extension modules, as discussed here and here. However, that code was written to make the maths clear, rather than for speed. This new module is about speed. For the prime power fields in particular (eg F4, F8, F9), these implementations are significantly faster.

Math.Combinatorics.Matroid - Matroids are a kind of combinatorial abstraction of the concept of linear independence. They're something that I heard about years ago - both of my favourite combinatorics books have brief introductions - but I never bothered to follow up. Well anyway, so something finally piqued my curiosity, and I got Oxley's Matroid Theory. It turned out to be really interesting stuff, and this module is pretty much a translation of a large part of that book into Haskell code, written as I taught myself all about matroids.

Math.CommutativeAlgebra.Polynomial - Although I hadn't yet got around to discussing them in the blog, HaskellForMaths has always had modules for working with multivariate polynomials, namely Math.Algebra.Commutative.Monomial and ...MPoly. However, these were some of the earliest code I wrote, before my more recent free vector space and algebra code. So I saw an opportunity to simplify and improve this code, by building it on top of the free vector space code. Also, I'm trying to rationalise the module naming convention in HaskellForMaths, to more closely follow the categories used in arxiv.org or mathoverflow.net . In the long run, I expect this module to supercede the older modules.

Math.CommutativeAlgebra.GroebnerBasis - Again, there was already code for Groebner bases in Math.Algebra.Commutative.GBasis. This is pretty much the same code, ported to the new polynomial implementation, but I've also begun to build on this, with code to find the sum, product, intersection, and quotient of ideals.

So the matroid code was just new code that I wrote while teaching myself some new maths. But most of the other code comes from an ambition to organise and simplify the HaskellForMaths library. I've also been trying to improve the documentation.

My ultimate ambition is to get more people using the library. To do that, the structure of the library needs to be clearer, the documentation needs to be better, and I need to explain how to use it. So I thought I'd start by explaining how to use the new commutative algebra modules.

(So this is a bit of a digression from the series on quantum algebra that I've been doing the last few months. However, in terms of the cumulative nature of maths, it's probably better to do this first.)

Okay, so suppose we want to do some polynomial arithmetic. Well, first we need to create some variables to work with. How do we do that?

First, decide on a monomial ordering - that is, we need to decide in what order monomials are to be listed within a polynomial. For the moment, let's use "graded lexicographic" or Glex order. This says that you should put monomials of higher degree before those of lower degree (eg y^3 before x^2), and if two monomials have the same degree, you should use lexicographic (dictionary) order (eg xyz before y^3).

Next, decide on a field to work over. Most often, we'll want to work over Q, the rationals.

Then, our variables themselves can be of any Haskell type - but there are usually only two sensible choices:

The easiest way is to use String as the type for our variables.

Then we could make some variables like this:

> :l Math.CommutativeAlgebra.Polynomial
> let [x,y,z] = map glexvar ["x","y","z"]

And then we can do polynomial arithmetic:

> (x+y+z)^3
x^3+3x^2y+3x^2z+3xy^2+6xyz+3xz^2+y^3+3y^2z+3yz^2+z^3

If we want to use any other field besides Q, then we will have to use a type annotation to tell the compiler which field we're working over:

> let [x,y,z] = map var ["x","y","z"] :: [GlexPoly F3 String]
> (x+y+z)^3
x^3+y^3+z^3

The alternative to using String for our variables is to define our own type. For example



data Var = X | Y | Z | W deriving (Eq,Ord)



instance Show Var where

    show X = "x"

    show Y = "y"

    show Z = "z"

    show W = "w"



[x,y,z,w] = map glexvar [X,Y,Z,W]

So there you have it - now you can do polynomial arithmetic in Haskell.

So how does it work?

Well, fundamentally, k-polynomials are a free k-vector space on the basis of monomials. So we define a type to implement monomials:



data MonImpl v = M Int [(v,Int)] deriving (Eq)

-- The initial Int is the degree of the monomial. Storing it speeds up equality tests and comparisons



instance Show v => Show (MonImpl v) where

    show (M _ []) = "1"

    show (M _ xis) = concatMap (\(x,i) -> if i==1 then showVar x else showVar x ++ "^" ++ show i) xis

        where showVar x = filter ( /= '"' ) (show x) -- in case v == String

Notice that our monomial implementation is polymorphic in v, the type of the variables.

Next, monomials form a monoid, so we make them an instance of Mon (the HaskellForMaths class for monoids):



instance (Ord v) => Mon (MonImpl v) where

    munit = M 0 []

    mmult (M si xis) (M sj yjs) = M (si+sj) $ addmerge xis yjs

In principle, all we need to do now is define an Ord instance, and then an Algebra instance, using the monoid algebra construction.

However, for reasons that will become clear in future postings, we want to be able to work with various different orderings on monomials, such as Lex, Glex, or Grevlex. So we provide various newtype wrappers around this basic monomial implementation. Here's the code for the Glex ordering that we used above:



newtype Glex v = Glex (MonImpl v) deriving (Eq, Mon) -- GeneralizedNewtypeDeriving



instance Show v => Show (Glex v) where

    show (Glex m) = show m



instance Ord v => Ord (Glex v) where

    compare (Glex (M si xis)) (Glex (M sj yjs)) =

        compare (-si, [(x,-i) | (x,i) <- xis]) (-sj, [(y,-j) | (y,j) <- yjs])



type GlexPoly k v = Vect k (Glex v)



glexvar :: v -> GlexPoly Q v

glexvar v = return $ Glex $ M 1 [(v,1)]



instance (Num k, Ord v, Show v) => Algebra k (Glex v) where

    unit x = x *> return munit

    mult xy = nf $ fmap (\(a,b) -> a `mmult` b) xy

We also have similar newtypes for Lex and Grevlex orderings, which I'll discuss another time.

And that's pretty much it. Now that we have an instance of Algebra k (Glex v), we get a Num instance for free, so we get +, -, *, and fromInteger. That means we can enter expressions like the following:

> (2*x^2-y*z)^2
4x^4-4x^2yz+y^2z^2

Note that division is not supported: you can't write x/y, for example. However, as a convenience, I have defined a partial instance of Fractional, which does let you divide by scalars. That means that it's okay to write x/2, for example.

Next time, some more things you can do with commutative algebra.

The Tensor Algebra Monad

2011-07-10T20:09:00.001+01:00

It's been a little while since my last post. That's partly because I've been busy writing new code. I've put up a new release, HaskellForMaths 0.3.3, which contains three new modules:
- Math.Combinatorics.Digraph
- Math.Combinatorics.Poset
- Math.Combinatorics.IncidenceAlgebra

I'll go through their contents at some point, but this time I want to talk about the tensor algebra.

So recall that previously we defined the free vector space over a type, tensor products, algebras and coalgebras in Haskell code.

In HaskellForMaths, we always work with the free vector space over a type: that means, we take some type b as a basis, and form k-linear combinations of elements of b. This construction is represented by the type Vect k b.

Given two vector spaces A = Vect k a, B = Vect k b, we can form their tensor product A⊗B = Vect k (Tensor a b). So Tensor is a type constructor on basis types, which takes basis types a, b for vector spaces A, B, and returns a basis type for the tensor product A⊗B.

We also defined a type constructor DSum, which returns a basis type for the direct sum A⊕B.

Now, we saw that tensor product is a monoid (at the type level, up to isomorphism):
- it is associative: (A⊗B)⊗C is isomorphic to A⊗(B⊗C)
- it has a unit: the field k itself is an identity for tensor product, in the sense that k⊗A is isomorphic to A, is isomorphic to A⊗k

Given some specific vector space V, we can consider the tensor powers of V:
k, V, V⊗V, V⊗V⊗V, ...
(We can omit brackets in V⊗V⊗V because tensor product is associative.)

And indeed we can form their direct sum:
T(V) = k ⊕ V ⊕ V⊗V ⊕ V⊗V⊗V ⊕ ...
(where an element of T(V) is understood to be a finite sum of elements of the tensor powers.)

This is a vector space, since tensor products and direct sums are vector spaces. If V has a basis e1,e2,e3,..., then a typical element of T(V) might be something like 3 + 5e2 + 2e1⊗e3⊗e1.

Now the interesting thing is that T(V) can be given the structure of an algebra, as follows:
- for the unit, we use the injection of k into the first direct summand
- for the mult, we use tensor product

For example, we would have
e2 * (2 + 3e1 + e4⊗e2) = 2e2 + 3e2⊗e1 + e2⊗e4⊗e2

With this algebra structure, T(V) is called the tensor algebra.

So how should we represent the tensor algebra in HaskellForMaths? Suppose that V is the free vector space Vect k a over some basis type a. (Recall also that the field k itself can be represented as the free vector space Vect k () over the unit type.) Can we use the DSum and Tensor type constructors to build the tensor algebra? Something like:
Vect k (DSum () (DSum a (DSum (Tensor a a) (DSum ...))))

Hmm, that's not going to work - we can't build the whole of what we want that way. (Unless some type system wizard knows otherwise?) So instead of representing the direct sum and tensor product at the type level, we're going to have to do it at the value level. Here's the definition:

data TensorAlgebra a = TA Int [a] deriving (Eq,Ord)

Given the free vector space V = Vect k a over basis type a, then TensorAlgebra a is the basis type for the tensor algebra over a, so that Vect k (TensorAlgebra a) is the tensor algebra T(V). The Int in TA Int [a] tells us which direct summand we're in (ie which tensor power), and the [a] tells us the tensor multiplicands. So for example, e2⊗e1⊗e4 would be represented as TA 3 [e2,e1,e4]. Then Vect k (TensorAlgebra a) consists of k-linear combinations of these basis elements, so it is the vector space T(V) that we are after.

Here's a Show instance:

instance Show a => Show (TensorAlgebra a) where
    show (TA _ []) = "1"
    show (TA _ xs) = filter (/= '"') $ concat $ L.intersperse "*" $ map show xs

It will be helpful to have a vector space basis to work with, so here's one that we used previously:

newtype EBasis = E Int deriving (Eq,Ord)



instance Show EBasis where show (E i) = "e" ++ show i

Then, for example, our Show instance gives us:

> :l Math.Algebras.TensorAlgebra

> return (TA 0 []) <+> return (TA 2 [E 1, E 3])

1+e1*e3

(Recall that the free vector space is a monad, hence our use of return to put a basis element into the vector space.)

So note that in the show output, the "*" is meant to represent tensor product, so this is really 1+e1⊗e3. You can actually get Haskell to output the tensor product symbol - just replace "*" by "\x2297" in the definition of show - however I found that it didn't look too good in the Mac OS X terminal, and I wasn't sure it would work on all OSes.

Ok, how about an Algebra instance? Well, TensorAlgebra a is basically just a slightly frilly version of [a], so it's a monoid, and we can use the monoid algebra construction:

instance Mon (TensorAlgebra a) where
    munit = TA 0 []
    mmult (TA i xs) (TA j ys) = TA (i+j) (xs++ys)

instance (Num k, Ord a) => Algebra k (TensorAlgebra a) where
    unit x = x *> return munit
    mult = nf . fmap (\(a,b) -> a `mmult` b)

So now we can do arithmetic in the tensor algebra:

> let e_ i = return (TA 1 [E i]) :: Vect Q (TensorAlgebra EBasis)

> let e1 = e_ 1; e2 = e_ 2; e3 = e_ 3; e4 = e_ 4

> (e1+e2) * (1+e3*e4)

e1+e2+e1*e3*e4+e2*e3*e4

We've got into the habit of using QuickCheck to check algebraic properties. Let's just check that the tensor algebra, as we've defined it, is an algebra:

instance Arbitrary b => Arbitrary (TensorAlgebra b) where
    arbitrary = do xs <- listOf arbitrary :: Gen [b] -- ScopedTypeVariables
                   let d = length xs
                   return (TA d xs)

prop_Algebra_TensorAlgebra (k,x,y,z) = prop_Algebra (k,x,y,z)
    where types = (k,x,y,z) :: ( Q, Vect Q (TensorAlgebra EBasis), Vect Q (TensorAlgebra EBasis), Vect Q (TensorAlgebra EBasis) )

> quickCheck prop_Algebra_TensorAlgebra
+++ OK, passed 100 tests.

Ok, so what's so special about the tensor algebra? Well, it has a rather nice universal property:
Suppose A = Vect k a, B = Vect k b are vector spaces, and we have a linear map f : A -> B. Suppose that B is also an algebra. Then we can "lift" f to an algebra morphism f' : T(A) -> B, such that the following diagram commutes.

In other words, f' agrees with f on the copy of A within T(A): f = f' . i

Ah, but hold on, I didn't say what an algebra morphism is. Well, it's just the usual: a function which "commutes" with the algebra structure. Specifically, it's a linear map (so that it commutes with the vector space structure), which makes the following diagrams commute:

So how does this universal property work then? Here's the code:

injectTA :: Num k => Vect k a -> Vect k (TensorAlgebra a)
injectTA = fmap (\a -> TA 1 [a])

liftTA :: (Num k, Ord b, Show b, Algebra k b) =>
     (Vect k a -> Vect k b) -> Vect k (TensorAlgebra a) -> Vect k b
liftTA f = linear (\(TA _ xs) -> product [f (return x) | x <- xs])

In other words, any tensor product u⊗v⊗... is sent to f(u)*f(v)*...

Let's look at an example. Recall that the quaternion algebra H has the basis {1,i,j,k}, with i^2 = j^2 = k^2 = ijk = -1.

> let f = linear (\(E n) -> case n of 1 -> 1+i; 2 -> 1-i; 3 -> j+k; 4 -> j-k; _ -> zerov)

> let f' = liftTA f

> e1*e2

e1*e2

> f' (e1*e2)

2

Recall that we usually define a linear map by linear extension from its action on a basis - that's what the "linear" is doing in the definition of f. It's fairly clear what f' is doing: it's basically just variable substitution. That is, we can consider the basis elements ei as variables, and the tensor algebra as the algebra of non-commutative polynomials in the ei. Then the linear map f assigns a substitution to each basis element, and f' just substitutes and multiplies out in the target algebra. In this case, we have:
e1⊗e2 -> (1+i)*(1-i) = 1-i^2 = 2

We can use QuickCheck to verify that liftTA f is indeed the algebra morphism required by the universal property. Here's a QuickCheck property for an algebra morphism. (We don't bother to check that f is a linear map, since it's almost always clear from the definition. If in doubt, we can test that separately.)

prop_AlgebraMorphism f (k,x,y) =
    (f . unit) k == unit k &&
    (f . mult) (x `te` y) == (mult . (f `tf` f)) (x `te` y)

This is just a transcription of the diagrams into code.

In order to test the universal property, we have to check that liftTA f is an algebra morphism, and that it agrees with f on (the copy of) V (in T(V)):

prop_TensorAlgebra_UniversalProperty (fmatrix,(k,x,y),z) =
    prop_AlgebraMorphism f' (k,x,y) &&
    (f' . injectTA) z == f z
    where f = linfun fmatrix
          f' = liftTA f
          types = (fmatrix,(k,x,y),z) :: (LinFun Q EBasis HBasis,
                                         (Q,Vect Q (TensorAlgebra EBasis), Vect Q (TensorAlgebra EBasis)),
                                         Vect Q EBasis)

So the key to this code is the parameter fmatrix, which is an arbitrary (sparse) matrix from Q^n to H, the quaternions, from which we build an arbitrary linear function. Note that the universal property of course implies that we can choose any algebra as the target for f - I just chose the quaternions because they're familiar.

> quickCheck prop_TensorAlgebra_UniversalProperty

+++ OK, passed 100 tests.

With this construction, tensor algebra is in fact a functor from k-Vect to k-Alg. The action on objects is V -> T(V), Vect k a -> Vect k (TensorAlgebra a). But a functor also acts on the arrows of the source category.

How do we get an action on arrows? Well, we can use the universal property to construct one. If we have an arrow f: A -> B, then (injectTA . f) is an arrow A -> T(B). Then we use the universal property to lift to an arrow f': T(A) -> T(B).

Here's the code:

fmapTA :: (Num k, Ord b, Show b) =>
    (Vect k a -> Vect k b) -> Vect k (TensorAlgebra a) -> Vect k (TensorAlgebra b)
fmapTA f = liftTA (injectTA . f)

For example:

newtype ABasis = A Int deriving (Eq,Ord,Show)
newtype BBasis = B Int deriving (Eq,Ord,Show)

> let f = linear (\(A i) -> case i of 1 -> return (B 1) <+> return (B 2);
                                      2 -> return (B 3) <+> return (B 4);
                                      _ -> zerov :: Vect Q BBasis)
> let f' = fmapTA f
> return (TA 2 [A 1, A 2]) :: Vect Q (TensorAlgebra ABasis)
A 1*A 2
> f' it
B 1*B 3+B 1*B 4+B 2*B 3+B 2*B 4

So this is variable substitution again. In this case, as f is just a linear map between vector spaces, we can think of it as something like a change of basis of the underlying space. Then f' shows us how the (non-commutative) polynomials defined over the space transform under the change of basis.

Let's just verify that this is a functor. We have to show:
- That fmapTA f is an algebra morphism (ie it is an arrow in k-Alg)
- That fmapTA commutes with the category structure, ie fmapTA id = id, and fmapTA (g . f) = fmapTA g . fmapTA f.

Here's a QuickCheck property:

prop_Functor_Vect_TensorAlgebra (f,g,k,x,y) =
    prop_AlgebraMorphism (fmapTA f') (k,x,y) &&
    (fmapTA id) x == id x &&
    fmapTA (g' . f') x == (fmapTA g' . fmapTA f') x
    where f' = linfun f
          g' = linfun g
          types = (f,g,k,x,y) :: (LinFun Q ABasis BBasis, LinFun Q BBasis CBasis,
                                  Q, Vect Q (TensorAlgebra ABasis), Vect Q (TensorAlgebra ABasis) )

> quickCheck prop_Functor_Vect_TensorAlgebra
+++ OK, passed 100 tests.

So can we declare a Functor instance? Well no, actually. Haskell only allows us to declare type constructors as Functor instances, whereas what we would want to do is declare the type function (\Vect k a -> Vect k (TensorAlgebra a)) as a Functor, which isn't allowed.

Ok, so we have a functor T: k-Vect -> k-Alg, the tensor algebra functor. We also have a forgetful functor going the other way, k-Alg -> k-Vect, which consists in taking an algebra, and simply forgetting that it is an algebra, and seeing only the vector space structure. (As it does at least remember the vector space structure, perhaps we should call this a semi-forgetful, or merely absent-minded functor.)

The cognoscenti will no doubt have seen what is coming next: we have an adjunction, and hence a monad.

How so? Well, it's obvious from its type signature that injectTA is return. For (>>=) / bind, we can define the following:

bindTA :: (Num k, Ord b, Show b) =>
    Vect k (TensorAlgebra a) -> (Vect k a -> Vect k (TensorAlgebra b)) -> Vect k (TensorAlgebra b)
bindTA = flip liftTA

Note that in addition to flipping the arguments, bindTA also imposes a more restrictive signature than liftTA: the target algebra is constrained to be a tensor algebra.

> let f = linear (\(A i) -> case i of 1 -> return (TA 2 [B 1, B 2]);
                                      2 -> return (TA 1 [B 3]) + return (TA 1 [B 4]);
                                      _ -> zerov :: Vect Q (TensorAlgebra BBasis))
> return (TA 2 [A 1, A 2]) `bindTA` f
B 1*B 2*B 3+B 1*B 2*B 4

So the effect of bind is to feed a non-commutative polynomial through a variable substitution.

Monads are meant to satisfy the following monad laws:
- "Left identity": return a >>= f == f a
- "Right identity": m >>= return == m
- "Associativity": (m >>= f) >>= g == m >>= (\x -> f x >>= g)

As usual, we write a QuickCheck property:

prop_Monad_Vect_TensorAlgebra (fmatrix,gmatrix,a,ta)=
    injectTA a `bindTA` f == f a                                  && -- left identity
    ta `bindTA` injectTA == ta                                    && -- right identity
    (ta `bindTA` f) `bindTA` g == ta `bindTA` (\a -> f a `bindTA` g) -- associativity
    where f = linfun fmatrix
          g = linfun gmatrix
          types = (fmatrix,gmatrix,a,ta) :: (LinFun Q ABasis (TensorAlgebra BBasis),
                                             LinFun Q BBasis (TensorAlgebra CBasis),
                                             Vect Q ABasis, Vect Q (TensorAlgebra ABasis) )

> quickCheck prop_Monad_Vect_TensorAlgebra
+++ OK, passed 100 tests.

Once again, we can't actually declare a Monad instance, because our type function (\Vect k a -> Vect k (TensorAlgebra a)) is not a type constructor.

So, we have a functor, and indeed a monad, T: k-Vect -> k-Alg. Now recall that the free vector space construction (\a -> Vect k a) was itself a functor, indeed a monad, from Set -> k-Vect. What happens if we compose these two functors? Why then of course we get a functor, and a monad, from Set -> k-Alg. In Haskell terms, this is a functor a -> Vect k (TensorAlgebra a).

What does this functor look like? Well, relative to a, Vect k (TensorAlgebra a) is the free algebra on a, consisting of all expressions in which the elements of k and the elements of a are combined using (commutative) addition and (non-commutative) multiplication. In other words, the elements of a can be thought of as variable symbols, and Vect k (TensorAlgebra a) as the algebra of non-commutative polynomials in these variables.

Here's the code:

injectTA' :: Num k => a -> Vect k (TensorAlgebra a)
injectTA' = injectTA . return

liftTA' :: (Num k, Ord b, Show b, Algebra k b) =>
     (a -> Vect k b) -> Vect k (TensorAlgebra a) -> Vect k b
liftTA' = liftTA . linear

fmapTA' :: (Num k, Ord b, Show b) =>
    (a -> b) -> Vect k (TensorAlgebra a) -> Vect k (TensorAlgebra b)
fmapTA' = fmapTA . fmap

bindTA' :: (Num k, Ord b, Show b) =>
    Vect k (TensorAlgebra a) -> (a -> Vect k (TensorAlgebra b)) -> Vect k (TensorAlgebra b)
bindTA' = flip liftTA'

The only one of these which might require a little explanation is liftTA'. This works by applying a universal property twice, as shown by the diagram below: first, the universal property of free vector spaces is used to lift a function a -> Vect k (TensorAlgebra b) to a function Vect k a -> Vect k (TensorAlgebra b); then the universal property of the tensor algebra is used to lift that to a function Vect k (TensorAlgebra a) -> Vect k (TensorAlgebra b).

Here's an example, which shows that in the free algebra as in the tensor algebra, bind corresponds to variable substitution:

> let [t,x,y,z] = map injectTA' ["t","x","y","z"] :: [Vect Q (TensorAlgebra String)]

> let f "x" = 1-t^2; f "y" = 2*t; f "z" = 1+t^2

> (x^2+y^2-z^2) `bindTA'` f

0

What is a Coalgebra?

2011-04-23T21:48:00.000+01:00

Last time we saw how to define an algebra structure on a vector space, in terms of category theory. I think perhaps some readers wondered what we gained by using category theory. The answer may be: not much, yet. However, in due course, we would like to understand the connection between quantum algebra and knot theory, and for that, category theory is essential.

This week, I want to look at coalgebras. We already saw, in the case of products and coproducts, how given a structure in category theory, you can define a dual structure by reversing the directions of all the arrows. So it is with algebras and coalgebras.

Recall that an algebra consisted of a k-vector space A together with linear functions
unit :: k -> A
mult :: A⊗A -> A
satisfying two commutative diagrams, associativity and unit.

Well, a coalgebra consists of a k-vector space C together with two linear functions:
counit :: C -> k
comult :: C -> C⊗C
satisfying the following two commutative diagrams:

Coassociativity:

This diagram is actually shorthand for the following diagram:

The isomorphisms at the top are the assocL and assocR isomorphisms that we defined here.

Counit:

These are just the associativity and unit diagrams, but with arrows reversed and relabeled.

Recall that when we say that a diagram commutes, we mean that it doesn't matter which way you follow the arrows, you end up with the same result. In other words, what these diagrams are saying is:

comult⊗id . comult == id⊗comult . comult  (Coassoc)

counit⊗id . comult == unitInL             (Left counit)

id⊗comult . comult == unitInR             (Right counit)

(where unitInL, unitInR are the isomorphisms that we defined in here.)

In HaskellForMaths, recall that we work with free vector spaces over a basis type, so the definition comes out slightly different:

module Math.Algebras.Structures where

...

class Coalgebra k c where
    counit :: Vect k c -> k
    comult :: Vect k c -> Vect k (Tensor c c)

What this definition really says is that c is the basis for a coalgebra. As before, we could try using type families to make this look more like the mathematical definition:

type TensorProd k u v =
    (u ~ Vect k a, v ~ Vect k b) => Vect k (Tensor a b)

class Coalgebra2 k c where
    counit2 :: c -> k
    comult2 :: c -> TensorProd k c c

In this definition, c is the coalgebra itself. However, I'm not going to pursue this approach for now.

What do coalgebras look like? Well, they look a bit like algebras would, if you looked at them through the wrong end of the telescope.

More specifically, given any basis b, define a dual basis as follows:

newtype Dual basis = Dual basis deriving (Eq,Ord)

instance Show basis => Show (Dual basis) where
    show (Dual b) = show b ++ "'"

(For those who know what a dual vector space is - this is it. For those who don't, I'll explain in a minute.)

Then, given an Algebra instance on some finite-dimensional basis b, we can define a Coalgebra instance on Dual b as follows:

1. Write out the unit and mult operations in the algebra as matrices.

For example, in the case of the quaternions, we have
unit:

     1 i j k

1 -> 1 0 0 0

mult:

        1  i  j  k

1⊗1 ->  1  0  0  0
1⊗i ->  0  1  0  0
1⊗j ->  0  0  1  0
1⊗k ->  0  0  0  1
i⊗1 ->  0  1  0  0
i⊗i -> -1  0  0  0
i⊗j ->  0  0  0  1
i⊗k ->  0  0 -1  0
j⊗1 ->  0  0  1  0
j⊗i ->  0  0  0 -1
j⊗j -> -1  0  0  0
j⊗k ->  0  1  0  0
k⊗1 ->  0  0  0  1
k⊗i ->  0  0  1  0
k⊗j ->  0 -1  0  0
k⊗k -> -1  0  0  0

2. Then transpose these two matrices, and use them as the definitions for counit and comult, but replacing each basis element by its dual.

So, in the case of the quaternions, we would get
counit:

      1'

1' -> 1
i' -> 0
j' -> 0
k' -> 0

comult:

     1'⊗1' 1'⊗i' 1'⊗j' 1'⊗k' i'⊗1' i'⊗i' i'⊗j' i'⊗k' j'⊗1' j'⊗i' j'⊗j' j'⊗k' k'⊗1' k'⊗i' k'⊗j' k'⊗k'

1' ->  1     0     0     0      0    -1     0     0     0     0    -1     0     0     0     0    -1
i' ->  0     1     0     0      1     0     0     0     0     0     0     1     0     0    -1     0
j' ->  0     0     1     0      0     0     0    -1     1     0     0     0     0     1     0     0
k' ->  0     0     0     1      0     0     1     0     0    -1     0     0     1     0     0     0

In code, we get:

instance Num k => Coalgebra k (Dual HBasis) where
    counit = unwrap . linear counit'
        where counit' (Dual One) = return ()
              counit' _ = zero
    comult = linear comult'
        where comult' (Dual One) = return (Dual One, Dual One) <+>
                  (-1) *> ( return (Dual I, Dual I) <+> return (Dual J, Dual J) <+> return (Dual K, Dual K) )
              comult' (Dual I) = return (Dual One, Dual I) <+> return (Dual I, Dual One) <+>
                  return (Dual J, Dual K) <+> (-1) *> return (Dual K, Dual J)
              comult' (Dual J) = return (Dual One, Dual J) <+> return (Dual J, Dual One) <+>
                  return (Dual K, Dual I) <+> (-1) *> return (Dual I, Dual K)
              comult' (Dual K) = return (Dual One, Dual K) <+> return (Dual K, Dual One) <+>
                  return (Dual I, Dual J) <+> (-1) *> return (Dual J, Dual I)

unwrap :: Num k => Vect k () -> k
unwrap (V []) = 0
unwrap (V [( (),x)]) = x

(Recall that when we want to think of k as a vector space, we have to represent it as Vect k ().)

We should check that this does indeed define a coalgebra. It's clear, by definition, that counit and comult are linear, as required. Here's a quickcheck property for coassociativity and counit:

prop_Coalgebra x =
    ((comult `tf` id) . comult) x == (assocL . (id `tf` comult) . comult) x  && -- coassociativity
    ((counit' `tf` id) . comult) x == unitInL x                              && -- left counit
    ((id `tf` counit') . comult) x == unitInR x                                 -- right counit
        where counit' = wrap . counit

wrap :: Num k => k -> Vect k ()
wrap 0 = zero
wrap x = V [( (),x)]

(It's a bit awkward that we have to keep wrapping and unwrapping between k and Vect k (). I think we could have avoided this if we had defined counit to have signature Vect k c -> Vect k () instead of Vect k c -> k. To be honest, I think that is probably the right thing to do, so perhaps I'll change it in some future release of HaskellForMaths. What does anyone else think?)

Anyway, here's a quickCheck property to test that the dual quaternions are a coalgebra:

instance Arbitrary b => Arbitrary (Dual b) where
    arbitrary = fmap Dual arbitrary

prop_Coalgebra_DualQuaternion x = prop_Coalgebra x
    where types = x :: Vect Q (Dual HBasis)

> quickCheck prop_Coalgebra_DualQuaternion
+++ OK, passed 100 tests.

So, given an algebra structure on some vector space (basis), we can define a coalgebra structure on the dual vector space (dual basis). But why did we use the dual? Why didn't we just use the above construction to define a coalgebra structure on the vector space itself? For example, for the quaternions, that would look like this:

instance Num k => Coalgebra k HBasis where
    counit = unwrap . linear counit'
        where counit' One = return ()
              counit' _ = zero
    comult = linear comult'
        where comult' One = return (One,One) <+> (-1) *> ( return (I,I) <+> return (J,J) <+> return (K,K) )
              comult' I = return (One,I) <+> return (I,One) <+> return (J,K) <+> (-1) *> return (K,J)
              comult' J = return (One,J) <+> return (J,One) <+> return (K,I) <+> (-1) *> return (I,K)
              comult' K = return (One,K) <+> return (K,One) <+> return (I,J) <+> (-1) *> return (J,I)

Well, we could indeed do that. However, it obscures the underlying mathematics. The point is that an algebra structure on a finite-dimensional vector space gives rise "naturally" to a coalgebra structure on the dual space. It is then also true that a finite-dimensional vector space is naturally isomorphic to its dual, which is I guess why the second construction works.

Okay, so why is the coalgebra structure on the dual vector space "natural"?

Well first, what is a dual vector space anyway? Given a vector space V, the dual space V* is Hom(V,k), the space of linear maps from V to k. If V is finite-dimensional with basis b, then as we have seen, we can define a basis Dual b for V* which is in one-to-one correspondence with b. Given a basis element ei, then the dual basis element Dual ei represents the linear map that sends ei to 1, and any other basis element ej, j /= i, to 0.

It is convenient to define a linear map called the evaluation map, ev: V*⊗V -> k, with ev(a⊗x) = a(x):

ev :: (Num k, Ord b) => Vect k (Tensor (Dual b) b) -> k

ev = unwrap . linear (\(Dual bi, bj) -> delta bi bj *> return ())



-- where delta i j = if i == j then 1 else 0

Then given an element a in V* = Vect k (Dual b), and an element x in V = Vect k b, we can evaluate a(x) by calling ev (a `te` x).

For example:

dual = fmap Dual



> ev $ dual e1 `te` (4 *> e1 <+> 5 *> e2)

4

> ev $ dual e2 `te` (4 *> e1 <+> 5 *> e2)

5

> ev $ dual (e1 <+> 2 *> e2) `te` (4 *> e1 <+> 5 *> e2)

14

Provided V is finite-dimensional, then every element of V* can be expressed in the form dual v, for some v in V.

If we want to turn an element of Vect k (Dual b) into a real Haskell function Vect k b -> k, we can use the following code:

reify :: (Num k, Ord b) => Vect k (Dual b) -> (Vect k b -> k)

reify a x = ev (a `te` x)

For example:

> let f = reify (dual e2)

> f (4 *> e1 <+> 5 *> e2)

5

Now, suppose that we have a linear map f: U -> V between vector spaces. This gives rise to a linear map f*: V* -> U*, by defining:
ev (f* a ⊗ x) = ev (a ⊗ f x)
It turns out that the matrix for f* will be the transpose of the matrix for f.

For, suppose f(ui) = sum [mij vj | j <- ...] and f*(vi*) = sum [m*ij uj* | j <- ...]. Then

ev (f* vk* ⊗ ui) = ev (vk* ⊗ f ui)                                           -- definition of f*

=> ev (sum [m*kl ul* | l <- ...] ⊗ ui) = ev (vk* ⊗ sum [mij vj | j <- ...])  -- expanding f and f*

=> sum [m*kl ev(ul* ⊗ ui) | l <- ...] = sum [mij ev (vk* ⊗ vj) | j <- ...]   -- linearity of ev

=> sum [m*kl (delta l i) | l <- ...] = sum [mij (delta k j) | j <- ...]       -- definition of ev

=> m*ki = mik                                                                 -- definition of delta

So m* is the transpose of m.

Hence we have a contravariant functor * from the category of finite-dimensional vector spaces to itself, which takes a space V to the dual space V*, and a linear map f: U -> V to the dual map f*: V* -> U*. ("Contravariant" just means that it reverses the directions of arrows.)

Hopefully this explains why it is natural to think of an algebra structure on V as giving rise to a coalgebra structure on V*: a coalgebra is basically an algebra but with arrows reversed - and in going from vector spaces to their duals, we reverse arrows.

By the way, the converse is also true: A coalgebra structure on V gives rise to an algebra structure on V*, in the same way.

What is an Algebra?

2011-04-16T20:21:00.001+01:00

Over the last few months, we've spent somewhat longer than I originally expected looking at vector spaces, direct sums and tensor products. I hope you haven't forgotten that the reason we were doing this is because we want to look at quantum algebra, and "quantum groups". What are quantum groups? Well, one thing they are is algebras - so the next thing we need to do is define algebras.

Informally, an algebra is just a vector space which is also a ring (with unit) - or to put it another way, a ring (with unit) which is also a vector space. So a straightforward definition would be, A is an algebra if
(i) A is an additive group (this is required by both vector spaces and rings)
(ii) There is a scalar multiplication smult :: k×A -> A (satisfying some laws, as discussed in a previous post)
(iii) There is a multiplication mult :: A×A -> A, satisfying some laws:
- mult is associative: a(bc) = (ab)c
- mult distributes over addition: a(b+c) = (ab)+(ac), (a+b)c = (ac)+(bc)
(iv) There is a unit :: A, which is an identity for mult: 1a = a = a1

Some examples:
- C is an algebra over R (2-dimensional as a vector space)
- 2×2 matrices over a field k form a k-algebra (4-dimensional)
- polynomials over a field k form a k-algebra (infinite-dimensional)

It would be fairly straightforward to translate these definitions into a Haskell type class as they are. However, we're going to do things slightly differently, for two reasons.

Firstly, we would like to use the language of category theory, and define multiplication and unit in terms of linear maps (arrows in the category of vector spaces). Specifically, we define linear maps:
unit :: k -> A
mult :: A⊗A -> A

We then require that the following diagrams commute:

Associativity:

When we say that this diagram commutes, what it means is that it doesn't matter which way you decide to follow the arrows, the result is the same. Specifically:
mult . mult⊗id == mult . id⊗mult

Unit:

In this case there are two commuting triangles, so we are saying:
mult . unit⊗id == unitOutL
mult . id⊗unit == unitOutR
(where unitOutL, unitOutR are the relevant isomorphisms, which we defined last time.)

But hold on, haven't we forgotten one of the requirements? What about distributivity? Well, notice that the signature of mult was A⊗A -> A, not A×A -> A. A linear map from the tensor product is bilinear in each component (by definition of tensor product) - and bilinearity implies distributivity. So in this version, distributivity is built into the definition of mult. Neat, eh?

The second reason our definition will be different is that in HaskellForMaths, all our vector spaces are free vector spaces over some basis type b: V = Vect k b. Consequently, our algebras will be A = Vect k a, where a is a k-basis for the algebra. Because of this, it will turn out to be more natural to express some things in terms of a, rather than A.

With those forewarnings, here's the HaskellForMaths definition of an algebra:

module Math.Algebras.Structures where

import Math.Algebras.VectorSpace
import Math.Algebras.TensorProduct

class Algebra k a where
    unit :: k -> Vect k a
    mult :: Vect k (Tensor a a) -> Vect k a

In this definition, a represents the basis of the algebra, not the algebra itself, which is A = Vect k a. Recall that we defined a type Tensor a b, which is a basis for Vect k a ⊗ Vect k b.

If we wanted to stay a bit closer to the category theory definition, we could try continuing with the type family approach that we looked at last time:

type TensorProd k u v =
    (u ~ Vect k a, v ~ Vect k b) => Vect k (Tensor a b)

class Algebra2 k a where
    unit2 :: k -> a
    mult2 :: TensorProd k a a -> a

In this definition a is the algebra itself. I'm not going to pursue that approach any further here, but if anyone fancies giving it a go, I'd be interested to hear how they get on.

Anyway, as discussed, unit and mult are required to be linear maps. We could check this using QuickCheck, but in practice we will always define unit and mult in such a way that they are clearly linear.

However, we can write a QuickCheck property to check the other requirements:

prop_Algebra (k,x,y,z) =
    (mult . (id `tf` mult)) (x `te` (y `te` z)) ==
         (mult . (mult `tf` id)) ((x `te` y) `te` z)              && -- associativity
    unitOutL (k' `te` x) == (mult . (unit' `tf` id)) (k' `te` x)  && -- left unit
    unitOutR (x `te` k') == (mult . (id `tf` unit')) (x `te` k')     -- right unit
    where k' = k *> return ()

(Recall that when we wish to consider k as a vector space, we represent it as the free vector space Vect k ().)

When we have an algebra, then we have a ring, so we can define a Num instance:

instance (Num k, Eq b, Ord b, Show b, Algebra k b) => Num (Vect k b) where
    x+y = x <+> y
    negate x = neg x
    x*y = mult (x `te` y)
    fromInteger n = unit (fromInteger n)
    abs _ = error "Prelude.Num.abs: inappropriate abstraction"
    signum _ = error "Prelude.Num.signum: inappropriate abstraction"

This means that when we have an algebra, we'll be able to write expressions using the usual arithmetic operators.

Okay, so how about some examples of algebras. Well, we mentioned the complex numbers as an algebra over the reals, but let's go one better and define the quaternion algebra. This is a four dimensional algebra over any field k, which is generated by {1, i, j, k}, satisfying the relations i^2 = j^2 = k^2 = ijk = -1. (It follows, for example that (ijk)k = (-1)k, so ij(k^2) = -k, so ij = k.)

Here's the code. First, we define our basis. The quaternions are traditionally denoted H, after Hamilton, who discovered them:

data HBasis = One | I | J | K deriving (Eq,Ord)

type Quaternion k = Vect k HBasis

i,j,k :: Num k => Quaternion k
i = return I
j = return J
k = return K

instance Show HBasis where
    show One = "1"
    show I = "i"
    show J = "j"
    show K = "k"

instance (Num k) => Algebra k HBasis where
    unit x = x *> return One
    mult = linear m
         where m (One,b) = return b
               m (b,One) = return b
               m (I,I) = unit (-1)
               m (J,J) = unit (-1)
               m (K,K) = unit (-1)
               m (I,J) = return K
               m (J,I) = -1 *> return K
               m (J,K) = return I
               m (K,J) = -1 *> return I
               m (K,I) = return J
               m (I,K) = -1 *> return J

Note that unit and mult are both linear by definition.

Let's just check that the code works as expected:

> :l Math.Algebras.Quaternions

> i^2

-1

> j^2

-1

> i*j

k

Now, are we sure that the quaternions are an algebra? Well, it's clear from the definition that the left and right unit conditions hold - see the lines m (One,b) = m (b,One) = return b. But it's not obvious that the associativity condition holds, so perhaps we should quickCheck:

instance Arbitrary HBasis where
    arbitrary = elements [One,I,J,K]

prop_Algebra_Quaternion (k,x,y,z) = prop_Algebra (k,x,y,z)
    where types = (k,x,y,z) :: (Q, Quaternion Q, Quaternion Q, Quaternion Q)

> quickCheck prop_Algebra_Quaternion
+++ OK, passed 100 tests.

Ok, how about 2*2 matrices. These form a four dimensional algebra generated by the elementary matrices {e11, e12, e21, e22}, where eij is the matrix with a 1 in the (i,j) position, and 0s elsewhere:

data Mat2 = E2 Int Int deriving (Eq,Ord,Show)

instance Num k => Algebra k Mat2 where
   unit x = x *> V [(E2 i i, 1) | i <- [1..2] ]
   mult = linear mult' where
        mult' (E2 i j, E2 k l) = delta j k *> return (E2 i l)

delta i j | i == j    = 1
          | otherwise = 0

Notice the way that we only have to define multiplication on our basis elements, the elementary matrices eij, and the rest follows by bilinearity. Notice also that unit and mult are linear by definition. Let's just sanity check that this works as expected:

> :l Math.Algebras.Matrix

> unit 1 :: Vect Q Mat2

E2 1 1+E2 2 2

> let a = 2 *> return (E2 1 2) + 3 *> return (E2 2 1)

> a^2

6E2 1 1+6E2 2 2

It's straightforward to define an Arbitrary instance for Mat2, and quickCheck that it satisfies the algebra conditions.

Finally, what about polynomials? Let's go one better, and define polynomials in more than one variable. An obvious basis for these polynomials as a vector space is the set of monomials. For example, polynomials in x,y,z are a vector space on the basis { x^i y^j z^k | i <- [0..], j <- [0..], k <- [0..] }.

Recall that our vector space code requires that the basis be an Ord instance, so we need to define an ordering on monomials. There are many ways to do this. We'll use the graded lex or glex ordering, which says that monomials of higher degree sort before those of lower degree, and among those of equal degree, lexicographic (alphabetical) ordering applies.

Given any set X of variables, we can construct the polynomial algebra over X as the vector space with basis the monomials in X. In the example above, we had X = {x,y,z}. For this reason, we'll allow our glex monomials to be polymorphic in the type of the variables. (In practice though, as you will see shortly, we will often just use String as the type of our variables.)

So a glex monomial over variables v is basically just a list of powers of elements of v:

data GlexMonomial v = Glex Int [(v,Int)] deriving (Eq)

-- The initial Int is the degree of the monomial. Storing it speeds up equality tests and comparisons

For example x^3 y^2 would be represented as Glex 5 [("x",3),("y",2)].

instance Ord v => Ord (GlexMonomial v) where
    compare (Glex si xis) (Glex sj yjs) =
        compare (-si, [(x,-i) | (x,i) <- xis]) (-sj, [(y,-j) | (y,j) <- yjs])
-- all the minus signs are to make things sort in the right order

[There's a bug in the HaskellForMaths v0.3.2 version of this Ord instance - the above code, which fixes it, will be in the next release.]

I won't bore you with the Show instance - it's a bit fiddly.

The Algebra instance uses the fact that monomials form a monoid under multiplication, with unit 1. Given any monoid, we can form the free vector space having the monoid elements as basis, and then lift the unit and multiplication in the monoid into the vector space, thus forming an algebra called the monoid algebra. Here's the construction:

instance (Num k, Ord v) => Algebra k (GlexMonomial v) where
    unit x = x *> return munit
        where munit = Glex 0 []
    mult xy = nf $ fmap (\(a,b) -> a `mmult` b) xy
        where mmult (Glex si xis) (Glex sj yjs) = Glex (si+sj) $ addmerge xis yjs

(In the addmerge function, we ensure that provided the variables were listed in ascending order in both inputs, then they are still so in the output.)

Finally, here's a convenience function for injecting a variable into the polynomial algebra:

glexVar v = V [(Glex 1 [(v,1)], 1)]

Then for example, we can do the following:

type GlexPoly k v = Vect k (GlexMonomial v)



> let x = glexVar "x" :: GlexPoly Q String

> let y = glexVar "y" :: GlexPoly Q String

> let z = glexVar "z" :: GlexPoly Q String

> (x+y+z)^3

x^3+3x^2y+3x^2z+3xy^2+6xyz+3xz^2+y^3+3y^2z+3yz^2+z^3

I hope you were impressed at how easy that all was. The foundation of free vector spaces, tensor products and algebras is only around a hundred lines of code. It then takes just another dozen lines or so to define an algebra: just define a basis, and define how the basis elements multiply - the rest follows by linearity.

Tensor Products, part 2: Monoids and Arrows

2011-03-18T10:23:00.000+00:00

[New release, HaskellForMaths v0.3.2, available on Hackage]

Last time we looked at the tensor product of free vector spaces. Given A = Vect k a, B = Vect k b, then the tensor product A⊗B can be represented as Vect k (a,b). As we saw, the tensor product is the "mother of all bilinear functions".

In the HaskellForMaths library, I have defined a couple of type synonyms for direct sum and tensor product:

type DSum a b = Either a b

type Tensor a b = (a,b)

This means that in type signatures we can write the type of a direct sum as Vect k (DSum a b), and of a tensor product as Vect k (Tensor a b). The idea is that this will remind us what we're dealing with, and make things clearer.

During development, I initially called the tensor type TensorBasis. In maths, tensor product is thought of as an operation on vector spaces - A⊗B - rather than on their bases. It would be nicer if we could define direct sum and tensor product as operators on the vector spaces themselves, rather than their bases.

Well, we can have a go, something like this:

{-# LANGUAGE TypeFamilies, RankNTypes #-}

type DirectSum k u v =
    (u ~ Vect k a, v ~ Vect k b) => Vect k (DSum a b)

type TensorProd k u v =
    (u ~ Vect k a, v ~ Vect k b) => Vect k (Tensor a b)

type En = Vect Q EBasis

This appears to work:

$ ghci -XTypeFamilies

...

> :l Math.Test.TAlgebras.TTensorProduct

...

> e1 `te` e2 :: TensorProd Q En En

(e1,e2)

I'll reserve judgement. (Earlier in the development of the quantum algebra code for HaskellForMaths, I tried something similar to this, and ran into problems later on - but I can't now remember exactly what I did, so perhaps this will work.)

Okay, so what can we do with tensor products. Well first, given vectors u in A = Vect k a and v in B = Vect k b, we can form their tensor product, u⊗v, an element of A⊗B = Vect k (Tensor a b). To calculate u⊗v, we use the bilinearity of tensor product to reduce the tensor product of arbitrary vectors to a linear combination of tensor products of basis elements:
(x1 a1 + x2 a2)⊗(y1 b1 + y2 b2) = x1 y1 a1⊗b1 + x1 y2 a1⊗b2 + x2 y1 a2⊗b1 + x2 y2 a2⊗b2
Here's the code:

te :: Num k => Vect k a -> Vect k b -> Vect k (Tensor a b)

te (V us) (V vs) = V [((a,b), x*y) | (a,x) <- us, (b,y) <- vs]

This is in essence just the "tensor" function from last time, but rewritten to take its two inputs separately rather than in a direct sum. Mnemonic: "te" stands for "tensor product of elements". Note that the definition respects normal form: provided the inputs are in normal form (the as and bs are in order, and the xs and ys are non-zero), then so is the output.

Associativity

We can form tensor products of tensor products, such as A⊗(B⊗C) = Vect k (a,(b,c)), and likewise (A⊗B)⊗C = Vect k ((a,b),c). These two are isomorphic as vector spaces. This is obvious if you think about it in the right way. Recall from last week that we can think of elements of A⊗B = Vect k (a,b) as 2-dimensional matrices with rows indexed by a, columns indexed by b, and entries in k. Well A⊗B⊗C (we can drop the parentheses as it makes no difference) is the space of three-dimensional matrices, with one dimension indexed by a, one by b, and one by c.

We can define isomorphisms either way with the following Haskell code:

assocL :: Vect k (Tensor u (Tensor v w)) -> Vect k (Tensor (Tensor u v) w)

assocL = fmap ( \(a,(b,c)) -> ((a,b),c) )



assocR :: Vect k (Tensor (Tensor u v) w) -> Vect k (Tensor u (Tensor v w))

assocR = fmap ( \((a,b),c) -> (a,(b,c)) )

It's clear that these functions are linear, since they're defined using fmap. It's also clear that they are bijections, since they are mutually inverse. Hence they are the required isomorphisms.

Unit

Last time we saw that the field k is itself a vector space, which can be represented as the free vector space Vect k (). What happens if we take the tensor product k⊗A of the field with some other vector space A = Vect k a? Well, if you think about it in terms of matrices, Vect k () is a one-dimensional vector space, so Vect k ((),a) will be a 1*n matrix (where n is the number of basis elements in a). But a 1*n matrix looks just the same as an n-vector:

    a1 a2 ...      a1 a2 ...
() ( .  .    ) ~= ( .  .    )

So we should expect that k⊗A = Vect k ((),a) is isomorphic to A = Vect k a. And indeed it is - here are the relevant isomorphisms:

unitInL = fmap ( \a -> ((),a) )



unitOutL = fmap ( \((),a) -> a )



unitInR = fmap ( \a -> (a,()) )



unitOutR = fmap ( \(a,()) -> a )

So tensor product is associative, and has a unit. In other words, vector spaces form a monoid under tensor product.

Tensor product of functions

Given linear functions f: A -> A', g: B -> B', we can define a linear function f⊗g: A⊗B -> A'⊗B' by
(f⊗g)(a⊗b) = f(a)⊗g(b)

Exercise: Prove that f⊗g is linear

Here's the Haskell code:

tf :: (Num k, Ord a', Ord b') => (Vect k a -> Vect k a') -> (Vect k b -> Vect k b')
   -> Vect k (Tensor a b) -> Vect k (Tensor a' b')
tf f g (V ts) = sum [x *> te (f $ return a) (g $ return b) | ((a,b), x) <- ts]
    where sum = foldl add zero

(Mnemonic: "tf" stands for "tensor product of functions".)

Let's just check that this is linear:

prop_Linear_tf ((f,g),k,(a1,a2,b1,b2)) = prop_Linear (linfun f `tf` linfun g) (k, a1 `te` b1, a2 `te` b2)
    where types = (f,g,k,a1,a2,b1,b2) :: (LinFun Q ABasis SBasis, LinFun Q BBasis TBasis, Q,
                                          Vect Q ABasis, Vect Q ABasis, Vect Q BBasis, Vect Q BBasis)

> quickCheck prop_Linear_tf
+++ OK, passed 100 tests.

So we now have tensor product operations on objects and on arrows. In each case, tensor product takes a pair of objects/arrows, and returns a new object/arrow.

There is a product category k-Vect×k-Vect, consisting of pairs of objects and pairs of arrows from k-Vect. The identity arrow is defined to be (id,id), and composition is defined by (f,g) . (f',g') = (f . f', g . g'). Given these definitions, it turns out that tensor product is a functor from k-Vect×k-Vect to k-Vect. (Another way to say this is that tensor product is a bifunctor in the category of vector spaces.)

Recall that a functor is just a map that "commutes" with the category operations, id and . (composition).
So the conditions for tensor product to be a functor are:
id⊗id = id
(f' . f)⊗(g' . g) = (f'⊗g') . (f⊗g)

Both of these follow immediately from the definition of f⊗g that was given above. However, just in case you don't believe me, here's a quickCheck property to prove it:

prop_TensorFunctor ((f1,f2,g1,g2),(a,b)) =
    (id `tf` id) (a `te` b) == id (a `te` b) &&
    ((f' . f) `tf` (g' . g)) (a `te` b) == ((f' `tf` g') . (f `tf` g)) (a `te` b)
    where f = linfun f1
          f' = linfun f2
          g = linfun g1
          g' = linfun g2
          types = (f1,f2,g1,g2,a,b) :: (LinFun Q ABasis ABasis, LinFun Q ABasis ABasis,
                                        LinFun Q BBasis BBasis, LinFun Q BBasis BBasis,
                                        Vect Q ABasis, Vect Q BBasis)

> quickCheck prop_TensorFunctor
+++ OK, passed 100 tests.

We can think of composition as doing things in series, and tensor as doing things in parallel.

Then the second bifunctor condition can be paraphrased as "Doing things in parallel, in series, is the same as doing things in series, in parallel", as represented by the following diagram.

You might recall that there are a couple of Haskell type classes for that kind of thing. The Category typeclass from Control.Category is about doing things in series. Here is the definition:

class Category cat where
    id :: cat a a
    (.) :: cat b c -> cat a b -> cat a c

The Arrow typeclass from Control.Arrow is about doing things in parallel:

class Category arr => Arrow arr where
    arr :: (a -> b) -> arr a b
    first :: arr a b -> arr (a,c) (b,c)
    second :: arr a b -> arr (c,a) (c,b)
    (***) :: arr a b -> arr a' b' -> arr (a,a') (b,b')
    (&&&) :: arr a b -> arr a b' -> arr a (b,b')

Intuitively, linear functions (Vect k a -> Vect k b) are arrows, via the definitions:
id = id
(.) = (.)
arr = fmap
first f = f `tf` id
second f = id `tf` f
f *** g = f `tf` g
(f &&& g) a = (f `tf` g) (a `te` a)

However, in order to define an Arrow instance we'll have to wrap the functions in a newtype.

import Prelude as P
import Control.Category as C
import Control.Arrow

newtype Linear k a b = Linear (Vect k a -> Vect k b)

instance Category (Linear k) where
    id = Linear P.id
    (Linear f) . (Linear g) = Linear (f P.. g)

instance Num k => Arrow (Linear k) where
    arr f = Linear (fmap f)
    first (Linear f) = Linear f *** Linear P.id
    second (Linear f) = Linear P.id *** Linear f
    Linear f *** Linear g = Linear (f `tf2` g)
        where tf2 f g (V ts) = V $ concat
                  [let V us = x *> te (f $ return a) (g $ return b) in us | ((a,b), x) <- ts]
    Linear f &&& Linear g = (Linear f *** Linear g) C.. Linear (\a -> a `te` a)

Note that we can't use tf directly, because it requires Ord instances for a and b, and Haskell doesn't give us a way to require these. For this reason we define a tf2 function, which is equivalent except that it doesn't guarantee that results are in normal form.

There is loads of other stuff I could talk about:
Exercise: Show that direct sum is also a monoid, with the zero vector space as its identity. (Write Haskell functions for the necessary isomorphisms.)
Exercise: Show that tensor product distributes over direct sum - A⊗(B⊕C) is isomorphic to (A⊗B)⊕(A⊗C). (Write the isomorphisms.)
Exercise: Show that given f: A->A', g: B->B', it is possible to define a linear function f⊕g: A⊕B->A'⊕B' by (f⊕g)(a⊕b) = f(a)⊕g(b). (Write a dsumf function analogous to tf.)

There is another arrow related typeclass called ArrowChoice. It represents arrows where you have a choice of doing either one thing or another thing:

class Arrow arr => ArrowChoice arr where
    left :: arr a b -> arr (Either a c) (Either b c)
    right :: arr a b -> arr (Either c a) (Either c b)
    (+++) :: arr a a' -> arr b b' -> arr (Either a b) (Either a' b')
    (|||) :: arr a c -> arr b c -> arr (Either a b) c

Exercise: Show that the dsumf function can be used to give an ArrowChoice instance for linear functions, where the left summand goes down one path and the right summand down another.

Tensor products of vector spaces, part 1

2011-02-21T20:13:00.001+00:00

A little while back on this blog, we defined the free k-vector space over a type b:

newtype Vect k b = V [(b,k)] deriving (Eq,Ord)
Elements of Vect k b are k-linear combinations of elements of b.

Whenever we have a mathematical structure like this, we want to know about building blocks and new-from-old constructions.

We already looked at one new-from-old construction: given free k-vector spaces A = Vect k a and B = Vect k b, we can construct their direct sum A⊕B = Vect k (Either a b).

We saw that the direct sum is both the product and the coproduct in the category of free vector spaces - which means that it is the object which satisfies the universal properties implied by the following two diagrams:

So we have injections i1, i2 : Vect k a, Vect k b -> Vect k (Either a b), to put elements of A and B into the direct sum A⊕B, and projections p1, p2 : Vect k (Either a b) -> Vect k a, Vect k b to take them back out again.

However, there is another obvious new-from-old construction: Vect k (a,b). What does this represent?

In order to answer that question, we need to look at bilinear functions. The basic idea of a bilinear function is that it is a function of two arguments, which is linear in each argument. So we might start by looking at functions f :: Vect k a -> Vect k b -> Vect k t.

However, functions of two arguments don't really sit very well in category theory, where arrows are meant to have a single source. (We can handle functions of two arguments in multicategories, but I don't want to go there just yet.) In order to stay within category theory, we need to combine the two arguments into a single argument, using the direct sum construction. So instead of looking at functions f :: Vect k a -> Vect k b -> Vect k t, we will look at functions f :: Vect k (Either a b) -> Vect k t.

To see that they are equivalent, recall from last time that Vect k (Either a b) is isomorphic to (Vect k a, Vect k b), via the isomorphisms:

to :: (Vect k a, Vect k b) -> Vect k (Either a b)

to = \(u,v) -> i1 u <+> i2 v

from :: Vect k (Either a b) -> (Vect k a, Vect k b)

from = \uv -> (p1 uv, p2 uv)

So in going from f :: Vect k a -> Vect k b -> Vect k t to f :: Vect k (Either a b) -> Vect k t, we're really just uncurrying.

Ok, so suppose we are given f :: Vect k (Either a b) -> Vect k t. It helps to still think of this as a function of two arguments, even though we've wrapped them up together in either side of a direct sum. Then we say that f is bilinear, if it is linear in each side of the direct sum. That is:
- for any fixed a0 in A, the function f_a0 :: Vect k b -> Vect k t, f_a0 = \b -> f (i1 a0 <+> i2 b) is linear
- for any fixed b0 in B, the function f_b0 :: Vect k a -> Vect k t, f_b0 = \a -> f (i1 a <+> i2 b0) is linear

Here's a QuickCheck property to test whether a function is bilinear:

prop_Bilinear :: (Num k, Ord a, Ord b, Ord t) =>
     (Vect k (Either a b) -> Vect k t) -> (k, Vect k a, Vect k a, Vect k b, Vect k b) -> Bool
prop_Bilinear f (k,a1,a2,b1,b2) =
    prop_Linear (\b -> f (i1 a1 <+> i1 b)) (k,b1,b2) &&
    prop_Linear (\a -> f (i1 a <+> i1 b1)) (k,a1,a2)

prop_BilinearQn f (a,u1,u2,v1,v2) = prop_Bilinear f (a,u1,u2,v1,v2)
    where types = (a,u1,u2,v1,v2) :: (Q, Vect Q EBasis, Vect Q EBasis, Vect Q EBasis, Vect Q EBasis)

What are some examples of bilinear functions?

Well, perhaps the most straightforward is the dot product of vectors. If our vector spaces A and B are the same, then we can define the dot product:

dot0 uv = sum [ if a == b then x*y else 0 | (a,x) <- u, (b,y) <- v]
    where V u = p1 uv
          V v = p2 uv

However, as it stands, this won't pass our QuickCheck property - because it has the wrong type! This has the type dot0 :: Vect k (Either a b) -> k, whereas we need something of type Vect k (Either a b) -> Vect k t.

Now, it is of course true that k is a k-vector space. However, as it stands, it's not a free k-vector space over some basis type t. Luckily, this is only a technicality, which is easily fixed. When we want to consider k as itself a (free) vector space, we will take t = (), the unit type, and equate k with Vect k (). Since the type () has only a single inhabitant, the value (), then Vect k () consists of scalar multiples of () - so it is basically just a single copy of k itself. The isomorphism between k and Vect k () is \k -> k *> return ().

Okay, so now that we know how to represent k as a free k-vector space, we can define dot product again:

dot1 uv = nf $ V [( (), if a == b then x*y else 0) | (a,x) <- u, (b,y) <- v]
    where V u = p1 uv
          V v = p2 uv

This now has the type dot1 :: Vect k (Either a b) -> Vect k (). Here's how you use it:

> dot1 ( i1 (e1 <+> 2 *> e2) <+> i2 (3 *> e1 <+> e2) )

5()

(So thinking of our function as a function of two arguments, what we do is use i1 to inject the first argument into the left hand side of the direct sum, and i2 to inject the second argument into the right hand side.)

So we can now use the QuickCheck property:

> quickCheck (prop_BilinearQn dot1)

+++ OK, passed 100 tests.

Another example of a bilinear function is polynomial multiplication. Polynomials of course form a vector space, with basis {x^i | i <- [0..] }. So we could define a type to represent the monomials x^i, and then form the polynomials as the free vector space in the monomials. In a few weeks we will do that, but for the moment, to save time, let's just use our existing EBasis type, and take E i to represent x^i. Then polynomial multiplication is the following function:

polymult1 uv = nf $ V [(E (i+j) , x*y) | (E i,x) <- u, (E j,y) <- v]
    where V u = p1 uv
          V v = p2 uv

Let's just convince ourselves that this is polynomial multiplication:

> polymult1 (i1 (e 0 <+> e 1) <+> i2 (e 0 <+> e 1))

e0+2e1+e2

So this is just our way of saying that (1+x)*(1+x) = 1+2x+x^2.

Again, let's verify that this is bilinear:

> quickCheck (prop_BilinearQn polymult1)

+++ OK, passed 100 tests.

So what's all this got to do with Vect k (a,b)? Well, here's another bilinear function:

tensor :: (Num k, Ord a, Ord b) => Vect k (Either a b) -> Vect k (a, b)
tensor uv = nf $ V [( (a,b), x*y) | (a,x) <- u, (b,y) <- v]
    where V u = p1 uv; V v = p2 uv

> quickCheck (prop_BilinearQn tensor)
+++ OK, passed 100 tests.

So this "tensor" function takes each pair of basis elements a, b in the input to a basis element (a,b) in the output. The thing that is interesting about this bilinear function is that it is in some sense "the mother of all bilinear functions". Specifically, you can specify a bilinear function completely by specifying what happens to each pair (a,b) of basis elements. It follows that any bilinear function f :: Vect k (Either a b) -> Vect k t can be factored as f = f' . tensor, where f' :: Vect k (a,b) -> Vect k t is the linear function having the required action on the basis elements (a,b) of Vect k (a,b).

For example:

bilinear :: (Num k, Ord a, Ord b, Ord c) =>
    ((a, b) -> Vect k c) -> Vect k (Either a b) -> Vect k c
bilinear f = linear f . tensor

dot = bilinear (\(a,b) -> if a == b then return () else zero)

polymult = bilinear (\(E i, E j) -> return (E (i+j)))

We can check that these are indeed the same functions as we were looking at before:

> quickCheck (\x -> dot1 x == dot x)

+++ OK, passed 100 tests.

> quickCheck (\x -> polymult1 x == polymult x)

+++ OK, passed 100 tests.

So Vect k (a,b) has a special role in the theory of bilinear functions. If A = Vect k a, B = Vect k b, then we write A⊗B = Vect k (a,b) (pronounced "A tensor B").

[By the way, it's possible that this diagram might upset category theorists - because the arrows in the diagram are not all arrows in the category of vector spaces. Specifically, note that bilinear maps are not, in general, linear. We'll come back to this in a moment.]

So a bilinear map can be specified by its action on the tensor basis (a,b). This corresponds to writing out matrices. To specify any bilinear map Vect k (Either a b) -> Vect k t, you write out a matrix with rows indexed by a, columns indexed by b, and entries in Vect k t.

      b1  b2 ...
 a1 (t11 t12 ...)
 a2 (t21 t22 ...)
... (...        )

So this says that (ai,bj) is taken to tij. Then given an element of A⊕B = Vect k (Either a b), which we can think of as a vector (x1 a1 + x2 a2 + ...) in A = Vect k a together with a vector (y1 b1 + y2 b2 + ...) in B = Vect k b, then we can calculate its image under the bilinear map by doing matrix multiplication as follows:

 a1 a2 ...        b1  b2 ...
(x1 x2 ...)  a1 (t11 t12 ...)  b1 (y1)
             a2 (t21 t22 ...)  b2 (y2)
            ... (...        ) ... (...)

(Sorry, this diagram might be a bit confusing. The ai, bj are labeling the rows and columns. The xi are the entries in a row vector in A, the yj are the entries in a column vector in B, and the tij are the entries in the matrix.)

So xi ai <+> yj bj goes to xi yj tij.

For example, dot product corresponds to the matrix:

(1 0 0)
(0 1 0)
(0 0 1)

Polynomial multiplication corresponds to the matrix:

    e0 e1 e2 ...
e0 (e0 e1 e2 ...)
e1 (e1 e2 e3 ...)
e2 (e2 e3 e4 ...)
...

A matrix with entries in T = Vect k t is just a convenient way of specifying a linear map from A⊗B = Vect k (a,b) to T.

Indeed, any matrix, provided that all the entries are in the same T, defines a bilinear function. So bilinear functions are ten-a-penny.

Now, I stated above that bilinear functions are not in general linear. For example:

> quickCheck (prop_Linear polymult)

*** Failed! Falsifiable (after 2 tests and 2 shrinks): 

(0,Right e1,Left e1)

What went wrong? Well:

> polymult (Right e1)

0

> polymult (Left e1)

0

> polymult (Left e1 <+> Right e1)

e2

So we fail to have f (a <+> b) = f a <+> f b, which is one of the requirements of a linear function.

Conversely, it's also important to realise that linear functions (on Vect k (Either a b)) are not in general bilinear. For example:

> quickCheck (prop_BilinearQn id)

*** Failed! Falsifiable (after 2 tests): 

(1,0,0,e1,0)

The problem here is:

> id $ i1 (zero <+> zero) <+> i2 e1

Right e1

> id $ (i1 zero <+> i2 e1) <+> (i1 zero <+> i2 e1)

2Right e1

So we fail to have linearity in the left hand side (or the right for that matter).

Indeed we can kind of see that linearity and bilinearity are in conflict.
- Linearity requires that f (a1 <+> a2 <+> b) = f a1 <+> f a2 <+> f b
- Bilinearity requires that f (a1 <+> a2 <+> b) = f (a1 <+> b) <+> f (a2 <+> b)

Exercise: Find a function which is both linear and bilinear.

Products of lists and vector spaces

2011-02-01T21:47:00.000+00:00

Last time, we looked at coproducts - of sets/types, of lists, and of free vector spaces. I realised afterwards that there were a couple more things I should have said, but forgot.

Recall that the coproduct of A and B is an object A+B, together with injections i1: A -> A+B, i2: B-> A+B, with the property that whenever we have arrows f: A -> T, g: B -> T, they can be factored through A+B to give an arrow f+g, satisfying f+g . i1 = f, f+g . i2 = g.

Firstly then, I forgot to say why we called the coproduct A+B with a plus sign. Well, it's because, via the injections i1 and i2, it contains (a copy of) A and (a copy of) B. So it's a bit like a sum of A and B.

Second, I forgot to say that in the case of vector spaces, the coproduct is called the direct sum, and has its own special symbol A⊕B.

Okay, so this time I want to look at products.

Suppose we have objects A and B in some category. Then their product (if it exists) is an object A×B, together with projections p1: A×B -> A, p2: A×B -> B, with the following universal property: whenever we have arrows f: S -> A and g: S -> B, then they can be factored through A×B to give an arrow f×g: S -> A×B, such that f = p1 . f×g, g = p2 . f×g.

(The definitions of product and coproduct are dual to one another - the diagrams are the same but with the directions of the arrows reversed.)

In the category Set, the product of sets A and B is their Cartesian product A×B. In the category Hask, of course, the product of types a and b is written (a,b), p1 is called fst, and p2 is called snd.

We can then define the required product map as:
(f .*. g) x = (f x, g x)

Then it should be clear that fst . (f .*. g) = f, and snd . (f .*. g) = g, as required.

Okay, so what do products look like in the category of lists (free monoids)? (Recall that in this category, the arrows are required to be monoid homomorphisms, meaning that f [] = [] and f (xs++ys) = f xs ++ f ys. It follows that we can express f = concatMap f', for some f'.)

Well, the obvious thing to try as the product is the Cartesian product ([a],[b]). Is the Cartesian product of two monoids a monoid? Well yes it is actually. We could give it a monoid structure as follows:

(as1, bs1) ++ (as2, bs2) = (as1++as2, bs1++bs2)

[] = ([],[])

This isn't valid Haskell code of course. It's just my shorthand way of expressing the following code from Data.Monoid:

instance Monoid [a] where
        mempty  = []
        mappend = (++)

instance (Monoid a, Monoid b) => Monoid (a,b) where
        mempty = (mempty, mempty)
        (a1,b1) `mappend` (a2,b2) =
                (a1 `mappend` a2, b1 `mappend` b2)

From these two instances, it follows that ([a],[b]) is a monoid, with monoid operations equivalent to those I gave above. (In particular, it's clear that the construction satisfies the monoid laws: associativity of ++, identity of [].)

But it feels like there's something unsatisfactory about this. Wouldn't it be better for the product of list types [a] and [b] to be another list type [x], for some type x?

Our first thought might be to try [(a.b)]. The product map would then need to be something like \ss -> zip (f ss) (g ss). However, we quickly see that this won't work: what if f ss and g ss are not the same length.

What else might work? Well, if you think of ([a],[b]) as some as on the left and some bs on the right, then the answer should spring to mind. Let's try [Either a b]. We can then define:

p1 xs = [x | Left x <- xs] -- this is doing a filter and a map at the same time

p2 xs = [x | Right x <- xs]

with the product map

f×g = \ss -> map Left (f ss) ++ map Right (g ss)

Then it is clear that p1 . f×g = f and p2 . f×g = g, as required.

What is the relationship between ([a],[b]) and [Either a b]? Well, ([a],[b]) looks a bit like a subset of [Either a b], via the injection i (as,bs) = map Left as ++ map Right bs. However, this injection is not a monoid homomorphism, since
i ([],[b1] ++ ([a1],[]) /= i ([],[b1]) ++ i ([a1],[])
So ([a],[b]) is not a submonoid of [Either a b].

On the other hand, there is a projection p :: [Either a b] -> ([a],[b]), p xs = (p1 xs, p2 xs). This is a monoid homomorphism, so ([a],[b]) is a quotient of [Either a b].

So which is the right answer? Which of ([a],[b]) and [Either a b] is really the product of [a] and [b]?

Well, it depends. It depends which category we think we're working in. If we're working in the category of monoids, then it is ([a],[b]). However, if we're working in the category of free monoids (lists), then it is [Either a b].

You see, ([a],[b]) is not a free monoid. What does this mean? Well, it basically means it's not a list. But how do we know that ([a],[b]) isn't equivalent to some list? And anyway, what does "free" mean in free monoid?

"Free" is a concept that can be applied to many algebraic theories, not just monoids. There is more than one way to define it.

An algebraic theory defines various constants and operations. In the case of monoids, there is one constant - which we may variously call [] or mempty or 0 or 1 - and one operation - ++ or mappend or + or *. Now, a given monoid may turn out to be generated by some subset of its elements - meaning that every element of the monoid can be equated with some expression in the generators, constants, and operations.

For example, the monoid of natural numbers is generated by the prime numbers: every natural number is equal to some expression in 1, *, and the prime numbers. The monoid [x] is generated by the singleton lists: every element of [x] is equal to some expression in [], ++, and the singleton lists. By a slight abuse of notation, we can say that [x] is generated by x - by identifying the singleton lists with the image of x under \x -> [x].

Then we say that a monoid is free on its generators if there are no relations among its elements other than those implied by the monoid laws. That is, no two expressions in the generators, constants, and operators are equal to one another, unless it is as a consequence of the monoid laws.

For example, suppose it happens that
(x ++ y) ++ z = x ++ (y ++ z)
That's okay, because it follows from the monoid laws. On the other hand, suppose that
x ++ y = y ++ x
This does not follow from the monoid laws (unless x = [] or y = []), so is a non-trivial relation. (Thus the natural numbers under multiplication are not a free monoid - because they're commutative.)

What about our type ([a],[b]) then? Well consider the following relations:

(as,[]) ++ ([],bs) = ([],bs) ++ (as,[])

(as1++as2,bs1) ++ ([],bs2) = (as1,bs1) ++ (as2,bs2) = (as1,[]) ++ (as2,bs1++bs2)

We have commutativity relations between the [a] and [b] parts of the product. Crucially, these relations are not implied by the monoid structure alone. So intuitively, we can see that ([a],[b]) is not free.

The "no relations" definition of free is the algebraic way to think about it. However, there is also a category theory way to define it. The basic idea is that if a monoid is free on its generators, then given any other monoid with the same generators, we can construct it as a homomorphic image of our free monoid, by "adding" the appropriate relations.

In order to express this properly, we're going to need to use some category theory, and specifically the concept of the forgetful functor. Recall that given any algebraic category, such as Mon (monoids), there is a forgetful functor U: Mon -> Set, which consists in simply forgetting the algebraic structure. U takes objects to their underlying sets, and arrows to the underlying functions. In Haskell, U: Mon -> Hask consists in forgetting that our objects (types) are monoids, and forgetting that our arrows (functions) are monoid homomorphisms. (As a consequence, U is syntactically invisible in Haskell. However, to properly understand the definition of free, we have to remember that it's there.)

Then, given an object x (the generators), a free monoid on x is a monoid y, together with a function i: x -> U y, such that whenever we have an object z in Mon and a function f': x -> U z, then we can lift it to a unique arrow f: y -> z, such that f' = Uf . i.

When we say that lists are free monoids, we mean specifically that (the type) [x] is free on (the type) x, via the function i = \x -> [x] (on values). This is free, because given any other monoid z, and function f' :: x -> z, then we can lift to a monoid homomorphism f :: [x] -> z, with f' = f . i. How? Well, the basic idea is to use concatMap. The type of concatMap is:
concatMap :: (a -> [b]) -> [a] -> [b]
So it's doing the lifting we want. However this isn't quite right, because this assumes that the target monoid z is a list. So we need this slight variant:

mconcatmap :: (Monoid z) => (x -> z) -> [x] -> z

mconcatmap f xs = mconcat (map f xs)

If we set f = mconcatmap f', then we will have

(f . i) x

= f (i x)

= f [x]

= mconcatmap f' [x]

= mconcat (map f' [x])

= mconcat [f' x]

= foldr mappend mempty [f' x]  -- definition of mconcat

= mappend mempty (f' x)  -- definition of foldr

= f' x  -- identity of mempty

Now, what would it mean for ([a],[b]) to be free? Well, first, what is it going to be free on? To be free on a and b is the same as being free on Either a b (the disjoint union of a and b). Then our function i is going to be

i (Left a) = ([a],[])

i (Right b) = ([],[b])

Then for ([a],[b]) to be free would mean that whenever we have a function f' :: Either a b -> z, with z a monoid, then we can lift it to a monoid homomorphism f : ([a],[b]) -> z, such that f' = f . i.

So can we?

Well, what if our target monoid z doesn't satisfy the a-b commutativity relations that we saw. That is, what if:
f' a1 `mappend` f' b1 /= f' b1 `mappend` f' a1 -- (A)
That would be a problem.

We are required to find an f such that f' = f . i.
We know that i a1 = ([a1],[]), i b1 = ([],[b1]). So we know that i a1 `mappend` i b1 = i b1 `mappend` i a1.
f is required to be a monoid homomorphism, so by definition:

f (i a1 `mappend` i b1) = f (i a1) `mappend` f (i b1)

f (i b1 `mappend` i a1) = f (i b1) `mappend` f (i a1)

But then since the two left hand sides are equal, then so are the two right hand sides, giving:
f (i a1) `mappend` f (i b1) = f (i b1) `mappend` f (i a1) -- (B)

But now we have a contradiction between (A) and (B), since f' = f . i.

So for a concrete counterexample, showing that ([a],[b]) is not free, all we need is a monoid z in which the a-b commutativity relations don't hold. Well that's easy: [Either a b]. Just take f' :: Either a b -> [Either a b], f' = \x -> [x]. Now try to find an f :: ([a],[b]) -> [Either a b], with f' = f . i.

The obvious f is f (as,bs) = map Left as ++ map Right bs.
But the problem is that this f isn't a monoid homomorphism:
f ( ([],[b1]) `mappend` ([a1],[]) ) /= f ([],[b1]) `mappend` f ([a1],[])

Notice the connection between the two definitions of free. It was because ([a],[b]) had non-trivial relations that we couldn't lift a function to a monoid homomorphism in some cases. The cases where we couldn't were where the target monoid z didn't satisfy the relations.

Okay, so sorry, that got a bit technical. To summarise, the product of [a], [b] in the category of lists / free monoids is [Either a b].

What about vector spaces? What is the product of Vect k a and Vect k b?

Well, similarly to lists, we can make (Vect k a, Vect k b) into a vector space, by defining
0 = (0,0)
(a1,b1) + (a2,b2) = (a1+a2,b1+b2)
k(a,b) = (ka,kb)

Exercise: Show that with these definitions, fst, snd and f .*. g are vector space morphisms (linear maps).

Alternatively, Vect k (Either a b) is of course a vector space. We can define:

p1 = linear p1' where
    p1' (Left a) = return a
    p1' (Right b) = zero

p2 = linear p2' where
    p2' (Left a) = zero
    p2' (Right b) = return b

prodf f g = linear fg' where
    fg' b = fmap Left (f (return b)) <+> fmap Right (g (return b))

In this case p1, p2, f×g are vector space morphisms by definition, since they were constructed using "linear". How do we know that they satisfy the product property? Well, this looks like a job for QuickCheck. The following code builds on the code we developed last time:

prop_Product (f',g',x) =
    f x == (p1 . fg) x &&
    g x == (p2 . fg) x
    where f = linfun f'
          g = linfun g'
          fg = prodf f g

newtype SBasis = S Int deriving (Eq,Ord,Show,Arbitrary)

prop_ProductQn (f,g,x) = prop_Product (f,g,x)
    where types = (f,g,x) :: (LinFun Q SBasis ABasis, LinFun Q SBasis BBasis, Vect Q SBasis)

> quickCheck prop_ProductQn
+++ OK, passed 100 tests.

As we did with lists, we can ask again, which is the correct definition of product, (Vect k a, Vect k b), or Vect k (Either a b)?

Well, in this case it turns out that they are equivalent to one another, via the mutually inverse isomorphisms

\(va,vb) -> fmap Left va <+> fmap Right vb

\v -> (p1 v, p2 v)

Unlike in the list case, these are both vector space morphisms (linear functions).

Why the difference? Why does it work out for vector spaces whereas it didn't for lists? Well, I think it's basically because vector spaces are commutative.

(It is also the case that vector spaces are always free on a basis. So since we have an obvious bijection between the bases of (Vect k a, Vect k b) and Vect k (Either a b), then we must have an isomorphism between the vector spaces.)

Now, we're left with a little puzzle. We have found that both the product and the coproduct of two vector spaces is Vect k (Either a b). So we still haven't figured out what Vect k (a,b) represents.

Coproducts of lists and free vector spaces

2011-01-21T16:35:00.001+00:00

Recently we've been looking at vector spaces. We defined a type Vect k b, representing the free k-vector space over a type b - meaning, the vector space consisting of k-linear combinations of the inhabitants of b - so b is the basis. Like any good mathematical structure, vector spaces admit various new-from-old constructions. Last time I posed the puzzle, what do Vect k (Either a b) and Vect k (a,b) represent? As we're aiming for quantum algebra, I'm going to frame the answers in the language of category theory.

Suppose we have objects A and B in some category. Then their coproduct (if it exists) is an object A+B, together with injections i1: A -> A+B, i2: B -> A+B, with the following universal property: whenever we have arrows f: A -> T and g: B -> T, they can be factored through A+B to give an arrow f+g: A+B -> T, such that f = f+g . i1, g = f+g . i2.

Notice that this definition does not give us a construction for the coproduct. In any given category, it doesn't tell us how to construct the coproduct, or even if there is one. Even if we have a construction for the coproduct in one category, there is no guarantee that it, or something similar, will work in another related category.

In the category Set, the coproduct of sets A and B is their disjoint union. In order to see this, we can work in the category Hask of Haskell types. We can regard Hask as a subcategory of Set, by identifying a type with its set of inhabitants. If a and b are Haskell types / sets of inhabitants, then their disjoint union is Either a b. The elements of Either a b can be from either a or b (hence, from their union), and they are kept disjoint in the left and right parts of the union (so that for example Either a a contains two copies of a, not just one). The injections i1 and i2 are then the value constructors Left and Right. Given f :: a -> t, g :: b -> t, we define:

(f .+. g) (Left a) = f a

(f .+. g) (Right b) = g b

Then it should be clear that (f .+. g) . Left = f, and (f .+. g) . Right = g, as required

In a moment, we'll look at coproducts of vector spaces, but first, as a warmup, let's think about the coproducts in a simpler category: lists / free monoids. Recall that a monoid is an algebraic structure having an associative operation ++, and an identity for ++ called []. (That is [] ++ x = x = x ++ [].)

A monoid homomorphism is a function f :: [a] -> [b] such that f [] = [] and f (a1 ++ a2) = f a1 ++ f a2. With a little thought, you should be able to convince yourself that all monoid homomorphisms are of the form concatMap f', where f' :: a -> [b]. (Which is, incidentally, the same as saying that they are of the form (>>= f').) In the category of free monoids, the arrows are constrained to be monoid homomorphisms.

So for our coproduct, we are looking for an object satisfying the universal property shown in the following diagram:

Perhaps the first thing to try is the disjoint union: Either [a] [b]. This is the coproduct of [a] and [b] as sets, but is it also their coproduct as monoids? Well, let's see.

Hmm, firstly, it's not a list (doh!): so you can't apply ++ to it, and it doesn't have a []. However, before we give up on that, let's consider whether we're asking the right question. Perhaps we should only be requiring that Either [a] [b] is (or can be made to be) a Monoid instance. Is the disjoint union of two monoids a monoid? Suppose we try to define ++ for it:
Left a1 ++ Left a2 = Left (a1++a2)
Right b1 ++ Right b2 = Right (b1++b2)
But now we begin to see the problem. What are we going to do for Left as ++ Right bs? There's nothing sensible we can do, because our disjoint union Either [a] [b] does not allow mixed lists of as and bs.

However, this immediately suggests that we would be better off looking at [Either a b] - the free monoid over the disjoint union of a and b. This is a list - and it does allow us to form mixed lists of as and bs.

We can then set i1 = map Left, i2 = map Right, and these are list homomorphisms (they interact with [] and ++ in the required way). Then we can define:

h = concatMap h' where
    h' (Left a) = f' a
    h' (Right b) = g' b

So our suspicion is that [Either a b] is the coproduct, with h the required coproduct map.

Let's just check the coproduct conditions:
h . i1
= concatMap h' . map Left
= concatMap f'
= f
and similarly, h . i2 = g, as required.

Notice that the disjoint union Either [a] [b] is (isomorphic to) a subset of the free monoid [Either a b], via Left as -> map Left as; Right bs -> map Right bs. So we were thinking along the right lines in suggesting Either [a] [b]. The problem was that Either [a] [b] isn't a monoid, it's only a set. We can regard [Either a b] as the closure of Either [a] [b] under the monoid operations. [Either a b] is the smallest free monoid containing the disjoint union Either [a] [b] (modulo isomorphism of Haskell types).

(This is a bit hand-wavy. This idea of closure under algebraic operations makes sense in maths / set theory, but I'm not quite sure how best to express it in Haskell / type theory. If anyone has any suggestions, I'd be pleased to hear them.)

Okay, so what about a coproduct in the category of k-vector spaces. First, recall that the arrows in this category are linear maps f satisfying f (a+b) = f a + f b, f (k*a) = k * f a. Again, it should be obvious that a linear map is fully determined by its action on basis elements - so every linear map f :: Vect k a -> Vect k b can be expressed as linear f' where f' :: a -> Vect k b.

Recall that we defined linear f' last time - it's really just (>>= f'), but followed by reduction to normal form:
linear f v = nf $ v >>= f

Okay, so vector spaces of course have underlying sets, so we will expect the coproduct of Vect k a and Vect k b to contain the disjoint union Either (Vect k a) (Vect k b). As with lists though, we will have the problem that this is not closed under vector addition - we can't add an element of Vect k a to an element of Vect k b within this type.

So as before, let's try Vect k (Either a b). Then we can set i1 = fmap Left, i2 = fmap Right, and they are both linear maps by construction. (We don't need to call nf afterwards, since Left and Right are order-preserving.)

i1 = fmap Left

i2 = fmap Right

Then we can define the coproduct map (f+g) as follows:

coprodf f g = linear fg' where
    fg' (Left a) = f (return a)
    fg' (Right b) = g (return b)

We need to verify that this satisfies the coproduct conditions: f+g . i1 = f and f+g . i2 = g. It would be nice to test this using a QuickCheck property. In order to do that, we need a way to construct arbitrary linear maps. (This is not the same thing as arbitrary functions Vect k a -> Vect k b, so I don't think that I can use QuickCheck's Coarbitrary class - but the experts may know better.) Luckily, that is fairly straightforward: we can construct arbitrary lists [(a, Vect k b)], and then each pair (a,vb) can be interpreted as saying that the basis element a is taken to the vector vb.

type LinFun k a b = [(a, Vect k b)]

linfun :: (Eq a, Ord b, Num k) => LinFun k a b -> Vect k a -> Vect k b
linfun avbs = linear f where
    f a = case lookup a avbs of
          Just vb -> vb
          Nothing -> zero

With that preparation, here is a QuickCheck property that expresses the coproduct condition.

prop_Coproduct (f',g',a,b) =
    f a == (fg . i1) a &&
    g b == (fg . i2) b
    where f = linfun f'
          g = linfun g'
          fg = coprodf f g

That property can be used for any vector spaces. Let's define some particular vector spaces to do the test on.

newtype ABasis = A Int deriving (Eq,Ord,Show,Arbitrary) -- GeneralizedNewtypeDeriving
newtype BBasis = B Int deriving (Eq,Ord,Show,Arbitrary)
newtype TBasis = T Int deriving (Eq,Ord,Show,Arbitrary)

instance (Num k, Ord b, Arbitrary k, Arbitrary b) => Arbitrary (Vect k b) where
    arbitrary = do ts <- arbitrary :: Gen [(b, k)] -- ScopedTypeVariables
                   return $ nf $ V ts

(I should emphasize that not all vector space bases are newtypes around Int - we can have finite bases, or bases with other interesting internal structure, as we will see in later installments. For the purposes of this test however, I think this is sufficient.)

prop_CoproductQn (f,g,a,b) = prop_Coproduct (f,g,a,b)
    where types = (f,g,a,b) :: (LinFun Q ABasis TBasis, LinFun Q BBasis TBasis, Vect Q ABasis, Vect Q BBasis)

> quickCheck prop_CoproductQn
+++ OK, passed 100 tests.

So we do indeed have a coproduct on vector spaces. To summarise: The coproduct of free vector spaces is the free vector space on the coproduct (of the bases).

Next time, we'll look at products - where there might be a small surprise.

The free vector space on a type, part 2

2011-01-10T21:26:00.000+00:00

Last time, I defined the free k-vector space over a type b:

data Vect k b = V [(b,k)]
Elements of Vect k b represent formal sums of scalar multiples of elements of b, where the scalars are taken from the field k. (For example, V [(E1,5),(E3,2)] represents the formal sum 5E1+2E3.) Thus b is the basis for the vector space.

We saw that there is a functor (in the mathematical sense) from the category Hask (of Haskell types and functions) to the category k-Vect (of k-vector spaces and k-linear maps). In maths, we would usually represent a functor by a capital letter, eg F, and apply it to objects and arrows by prefixing. For example, if a, b are objects in the source category, then the image objects in the target category would be called F a and F b. If f :: a -> b is an arrow in the source category, then the image arrow in the target category would be called F f.

Haskell allows us to declare a type constructor as a Functor instance, and give it an implementation of fmap. This corresponds to describing a functor (in the mathematical sense), but with a different naming convention. In our case, we declared the type constructor (Vect k) as a Functor instance. So the functor's action on objects is called (Vect k) - given any object b in Hask (ie a Haskell type), we can apply (Vect k) to get an object Vect k b in k-Vect (ie a k-vector space). However, the functor's action on arrows is called fmap - given any arrow f :: a -> b in Hask (ie a Haskell function), we can apply fmap to get an arrow fmap f :: Vect k a -> Vect k b in k-Vect (ie a k-linear map).

Haskell allows us to declare only type constructors as functors. In maths, there are many functors which are not of this form. For example, the simplest is the forgetful functor. Given any algebraic category A, we have the forgetful functor A -> Set, which simply forgets the algebraic structure. The forgetful functor takes the objects of A to their underlying sets, and the arrows of A to the underlying functions.

For example, in our case, the forgetful functor k-Vect -> Hask consists in forgetting that the objects Vect k b are vector spaces (with addition, scalar multiplication etc defined), and considering them just as Haskell types; and forgetting that the arrows Vect k a -> Vect k b are linear maps, and considering them just as Haskell functions.

(Notice that when working in Haskell, the category Hask acts as a kind of stand-in for the category Set. If we identify a type with its inhabitants (which form a set), then Hask is something like the computable subcategory of Set.)

So we actually have functors going both ways. We have the free functor F: Hask -> k-Vect - which takes types and functions to free k-vector spaces and linear maps. And we have the forgetful functor G: k-Vect -> Hask - which takes free k-vector spaces and linear maps, and just forgets their algebraic structure, so that they're just types and functions again.

Note that these two functors are not mutual inverses. (G . F) is not the identity on Hask - indeed it takes b to Vect k b, both considered as objects in Hask (and similarly with arrows). The two functors are however adjoint. (I'm not going to explain what this means, but see Wikipedia, or most books on category theory.)

Whenever we have an adjunction, then in fact we also have a monad. Here's the definition for our case:

instance Num k => Monad (Vect k) where
    return a = V [(a,1)]
    V ts >>= f = V $ concat [ [(b,y*x) | let V us = f a, (b,y) <- us] | (a,x) <- ts]

This monad is most easily understood using the monad as container analogy. A free k-vector space over b is just a container of elements of b. Okay, so it's a slightly funny sort of container. It most resembles a multiset or bag - an unordered container in which you can have more than one of each element. However, free k-vector spaces go further. A free Q-vector space is a container in which we're allowed to have fractional or negative amounts of each basis element, such as 1/2 e1 - 3 e2. In a free C-vector space, we're even allowed imaginary amounts of each element, such as i e3.

These oddities aside though, free k-vector spaces are monads in much the same way as any other container. For example, let's compare them to the list monad.
- For a container monad, return means "put into the container". For List, return a is [a]. For k-Vect, return a is 1 a (where 1 is the scalar 1 in our field k).
- For container monads, it's most natural to look next at the join operation, which combines a container of containers into a single container. For List, join = concat. For k-Vect, join combines a linear combination of linear combinations into a single linear combination (in the obvious way).
- Finally there is bind or (>>=). bind can be defined in terms of join and fmap:
x >>= f = join ((fmap f) x)
For lists, bind is basically concatMap. For k-Vect, bind corresponds to extending a function on basis elements to a function on vectors "by linearity". That is, if f :: a -> Vect k b, then (>>= f) :: Vect k a -> Vect k b is defined (in effect, by structural induction) so as to be linear, by saying that (>>= f) 0 = 0, (>>= f) (k a) = k (f a), (>>= f) (a + b) = f a + f b.

So k-Vect is like just a strange sort of list. Think of them as distant cousins. Incidentally, this is not only because they're both containers (if you believed my story about k-Vect being a container). There's another reason: Both the List and k-Vect monads arise from free-forgetful adjunctions. In the case of lists, the list datatype is the free monoid, and the list monad arises from the free-forgetful adjunction for monoids.

That's most of what I wanted to say for now. However, there's one small detail to add. Last time we defined a normal form for elements of Vect k b, in which the basis elements are in order, without repeats, and none of the scalars are zero. In order to calculate this normal form, we require an Ord instance for b. Unfortunately, Haskell doesn't let us specify that when defining the Monad instance for (Vect k). So whenever we use (>>=), we should call nf afterwards, to put the result in normal form.

For this reason, we define a convenience function:

linear :: (Ord b, Num k) => (a -> Vect k b) -> Vect k a -> Vect k b

linear f v = nf $ v >>= f

Given f :: a -> Vect k b, linear f :: Vect k a -> Vect k b is the extension of f from basis elements to vectors "by linearity". Hence, linear f is guaranteed to be linear, by construction. We can confirm this on an example using the QuickCheck properties we defined last time.

> let f (E 1) = e1; f (E 2) = e1 <+> e2; f _ = zero

> (linear f) (e1 <+> e2)

2e1+e2

> quickCheck (prop_LinearQn (linear f))

+++ OK, passed 100 tests.

Acknowledgement: I'm partially retreading in Dan Piponi's steps here. He first described the free vector space monad here. When I come to discuss the connection between quantum algebra and knot theory, I'll be revisiting some more material that Dan sketched out here.

Exercise for next time: What does Vect k (Either a b) represent? What does Vect k (a,b) represent?

The free vector space on a type, part 1

2010-12-13T19:53:00.000+00:00

As I mentioned last time, I want to spend the next few posts talking about quantum algebra. Well, we've got to start somewhere, so let's start with vector spaces.

You probably know what a vector space is. It's what is sounds like: a space of vectors, that you can add together, or multiply by scalars (that is, real numbers, or more generally, elements of some field k). Here's the official definition:
An additive (or Abelian) group is a set with a binary operation called addition, such that
- addition is associative: x+(y+z) = (x+y)+z
- addition is commutative: x+y = y+x
- there is an additive identity: x+0 = x = 0+x
- there are additive inverses: x+(-x) = 0 = (-x)+x
A vector space over a field k is an additive group V, together with an operation k * V -> V called scalar multiplication, such that:
- scalar multiplication distributes over vector addition: a(x+y) = ax+ay
- scalar multiplication distributes over scalar addition: (a+b)x = ax+bx
- associativity: (ab)x = a(bx)
- unit: 1a = a

There are some obvious examples:
- R^2 is a 2-dimensional vector space over the reals R, R^3 is a 3-dimensional vector space. (I'm not going to define dimension quite yet, but hopefully it's intuitively obvious what it means.)
- R^n is an R-vector space for any n.
- Indeed, k^n is a k-vector space for any field k.

Some slightly more interesting examples:
- C is a 2-dimensional vector space over R. (The reason it's more interesting is that of course C possesses additional algebraic structure, beyond the vector space structure.)
- 2*2 matrices over k form a 4-dimensional k-vector space.
- Polynomials in X with coefficients in k form an (infinite dimensional) k-vector space.

If we wanted to code the above definition into Haskell, probably the first idea that would come to mind would be to use type classes:

class AddGrp a where
    add :: a -> a -> a
    zero :: a
    neg :: a -> a -- additive inverse

class (Field k, AddGrp v) => VecSp k v where
    smult :: k -> v -> v -- scalar multiplication

(Type classes similar to these are defined are in the vector-space package.)

For most vector spaces that one encounters in "real life", there is some set of elements, usually obvious, which form a "basis" for the vector space, meaning that all elements can be expressed as linear combinations of basis elements. For example, in R^3, the obvious basis is {(0,0,1), (0,1,0), (1,0,0)}. Any element (x,y,z) of R^3 can be expressed as the linear combination x(1,0,0)+y(0,1,0)+z(0,0,1).

(Mathematicians would want to stress that there are other bases for R^3 that would serve equally well, and indeed, that a significant part of the theory of vector spaces can go through without even talking about bases. However, for our purposes - we want to write code to calculate in vector spaces - then working with a basis is natural.)

Okay, so we want a way to build a vector space from a basis. (More specifically, a k-vector space, for some given field k.) What sorts of things shall we allow as our basis? Well, why not just allow any type, whatsoever:

module Math.Algebras.VectorSpace where



import qualified Data.List as L



data Vect k b = V [(b,k)] deriving (Eq,Ord)

This says that a k-vector space over basis b consists of a linear combination of elements of b. (So the [(b,k)] is to be thought of as a sum, with each (b,k) pair representing a basis element in b multiplied by a scalar coefficient in k.)

For example, we can define the "boring basis" type, which just consists of numbered basis elements:

newtype EBasis = E Int deriving (Eq,Ord)



instance Show EBasis where show (E i) = "e" ++ show i



e i = return (E i) -- don't worry about what "return" is doing here for the moment

e1 = e 1

e2 = e 2

e3 = e 3

Then a typical element of Vect Double EBasis is:

> :load Math.Algebras.VectorSpace

> V [(E 1, 0.5), (E 3, 0.7)]

0.5e1+0.7e3

So of course, the Show instances for EBasis (see above), and Vect k b (not shown) are coming into play here.

How do we know that this is a vector space? Well actually, it's not yet, because we haven't defined the addition and scalar multiplication operations on it. So, without further ado:

infixr 7 *>
infixl 6 <+>

-- |The zero vector
zero :: Vect k b
zero = V []

-- |Addition of vectors
add :: (Ord b, Num k) => Vect k b -> Vect k b -> Vect k b
add (V ts) (V us) = V $ addmerge ts us

-- |Addition of vectors (same as add)
(<+>) :: (Ord b, Num k) => Vect k b -> Vect k b -> Vect k b
(<+>) = add

addmerge ((a,x):ts) ((b,y):us) =
    case compare a b of
    LT -> (a,x) : addmerge ts ((b,y):us)
    EQ -> if x+y == 0 then addmerge ts us else (a,x+y) : addmerge ts us
    GT -> (b,y) : addmerge ((a,x):ts) us
addmerge ts [] = ts
addmerge [] us = us

-- |Negation of vector
neg :: (Num k) => Vect k b -> Vect k b
neg (V ts) = V $ map (\(b,x) -> (b,-x)) ts

-- |Scalar multiplication (on the left)
smultL :: (Num k) => k -> Vect k b -> Vect k b
smultL 0 _ = zero -- V []
smultL k (V ts) = V [(ei,k*xi) | (ei,xi) <- ts]

-- |Same as smultL. Mnemonic is "multiply through (from the left)"
(*>) :: (Num k) => k -> Vect k b -> Vect k b
(*>) = smultL

A few things to mention:
- First, note that we required a Num instance for k. Strictly speaking, as we stated that k is a field, then we should have required a Fractional instance. However, on occasion we are going to break the rules slightly.
- Second, note that for addition, we required an Ord instance for b. We could have defined addition using (++) to concatenate linear combinations - however, the problem with that is that it wouldn't then easily follow that e1+e3 = e3+e1, or that e1+e1 = 2e1. By requiring an Ord instance, we can guarantee that there is a unique normal form in which to express any vector - namely, list the basis elements in order, combine duplicates, remove zero coefficients.
- Finally, note that I didn't define Vect k b as an instance of a vector space type class. That's just because I didn't yet see a reason to.

It will turn out to be useful to have a function that can take an arbitrary element of Vect k b and return a vector in normal form:

-- |Convert an element of Vect k b into normal form. Normal form consists in having the basis elements in ascending order,
-- with no duplicates, and all coefficients non-zero
nf :: (Ord b, Num k) => Vect k b -> Vect k b
nf (V ts) = V $ nf' $ L.sortBy compareFst ts where
    nf' ((b1,x1):(b2,x2):ts) =
        case compare b1 b2 of
        LT -> if x1 == 0 then nf' ((b2,x2):ts) else (b1,x1) : nf' ((b2,x2):ts)
        EQ -> if x1+x2 == 0 then nf' ts else nf' ((b1,x1+x2):ts)
        GT -> error "nf': not pre-sorted"
    nf' [(b,x)] = if x == 0 then [] else [(b,x)]
    nf' [] = []
    compareFst (b1,x1) (b2,x2) = compare b1 b2

Okay, so we ought to check that the Vect k b is a vector space. Let's write some QuickCheck properties:

prop_AddGrp (x,y,z) =
    x <+> (y <+> z) == (x <+> y) <+> z && -- associativity
    x <+> y == y <+> x                 && -- commutativity
    x <+> zero == x                    && -- identity
    x <+> neg x == zero                   -- inverse

prop_VecSp (a,b,x,y,z) =
    prop_AddGrp (x,y,z) &&
    a *> (x <+> y) == a *> x <+> a *> y && -- distributivity through vectors
    (a+b) *> x == a *> x <+> b *> x     && -- distributivity through scalars
    (a*b) *> x == a *> (b *> x)         && -- associativity
    1 *> x == x                            -- unit

instance Arbitrary EBasis where
    arbitrary = do n <- arbitrary :: Gen Int
                   return (E n)

instance Arbitrary Q where
    arbitrary = do n <- arbitrary :: Gen Integer
                   d <- arbitrary :: Gen Integer
                   return (if d == 0 then fromInteger n else fromInteger n / fromInteger d)

instance Arbitrary (Vect Q EBasis) where
    arbitrary = do ts <- arbitrary :: Gen [(EBasis, Q)]
                   return $ nf $ V ts

prop_VecSpQn (a,b,x,y,z) = prop_VecSp (a,b,x,y,z)
    where types = (a,b,x,y,z) :: (Q, Q, Vect Q EBasis, Vect Q EBasis, Vect Q EBasis)

> quickCheck prop_VecSpQn
+++ OK, passed 100 tests.

(I'm using Q instead of R as my field in order to avoid false negatives caused by the fact that arithmetic in Double is not exact.)

So it looks like Vect k b is indeed a vector space. In category theory, it is called the free k-vector space over b. "Free" here means that there are no relations among the basis elements: it will never turn out, for example, that e1 = e2+e3.

Vector spaces of course form a category, specifically an algebraic category (there are other types, as we'll see in due course). The objects in the category are the vector spaces. The arrows or morphisms in the category are the functions between vector spaces which "commute" with the algebraic structure. Specifically, they are the functions f such that:
- f(x+y) = f(x)+f(y)
- f(0) = 0
- f(-x) = -f(x)
- f(a.x) = a.f(x)

Such a function is called linear, the idea being that it preserves lines. This is because it follows from the the conditions that f(a.x+b.y) = a.f(x)+b.f(y) .

We can write a QuickCheck property to check whether a given function is linear:

prop_Linear f (a,x,y) =
    f (x <+> y) == f x <+> f y &&
    f zero == zero             &&
    f (neg x) == neg (f x)     &&
    f (a *> x) == a *> f x

prop_LinearQn f (a,x,y) = prop_Linear f (a,x,y)
    where types = (a,x,y) :: (Q, Vect Q EBasis, Vect Q EBasis)

For example:

> quickCheck (prop_LinearQn (2 *>))

+++ OK, passed 100 tests.

We won't need to use this quite yet, but it's handy to have around.

Now, in category theory we have the concept of a functor, which is a map from one category to another, which commutes with the category structure. Specifically, a functor F consists of a map from the objects of one category to the objects of the other, and from the arrows of one category to the arrows of the other, satisfying:
- F(id_A) = id_F(A)
- F(f . g) = F(f) . F(g) (where dot denotes function composition)

How does this relate to the Functor type class in Haskell? Well, the Haskell type class enables us to declare that a /type constructor/ is a functor. For example, (Vect k) is a type constructor, which acts on a type b to construct another type Vect k b. (Vect k) is indeed a functor, witness the following declaration:

instance Functor (Vect k) where
    fmap f (V ts) = V [(f b, x) | (b,x) <- ts]

This says that if we have a function f on our basis elements, then we can lift it to a function on linear combinations of basis elements in the obvious way.

In mathematics, we would think of the free vector space construction as a functor from Set (the category of sets) to k-Vect (the category of k-vector spaces). In Haskell, we need to think of the (Vect k) construction slightly differently. It operates on types, rather than sets, so the source category is Hask, the category of Haskell types.

What is the relationship between Hask and Set? Well, if we identify a type with the set of values which inhabit it, then we can regard Hask as a subcategory of Set, consisting of those sets and functions which can be represented in Haskell. (That would imply for example that we are restricted to computable functions.)

So (Vect k) is a functor from Hask to the subcategory of k-Vect consisting of vector spaces over sets/types in Hask.

So just to spell it out, the (Vect k) functor:
- Takes an object b in Hask/Set - ie a type, or its set of inhabitants - to an object Vect k b in k-Vect
- Takes an arrow f in Hask (ie a function f :: a -> b), to an arrow (fmap f) :: Vect k a -> Vect k b in k-Vect.

Now, there's just one small fly in the ointment. In order to get equality of vectors to work out right, we wanted to insist that they were expressed in normal form, which meant we needed an Ord instance for b. However, in the Functor instance for (Vect k), Haskell doesn't let us express this constraint, and our fmap is unable to use the Ord instance for b. What this means is that fmap f might return a vector which is not in normal form - so we need to remember to call nf afterwards. For example:

newtype FBasis = F Int deriving (Eq,Ord)



instance Show FBasis where show (F i) = "f" ++ show i



> let f = \(E i) -> F (10 - div i 2)

> let f' = fmap f :: Vect Q EBasis -> Vect Q FBasis

> f' (e1 <+> 2 *> e2 <+> e3)

f10+2f9+f9

> let f'' = nf . fmap f :: Vect Q EBasis -> Vect Q FBasis

> f'' (e1 <+> 2 *> e2 <+> e3)

3f9+f10

So it might be fairer to say that it is the combination of nf and fmap that forms the functor on arrows.

The definition of a functor requires that the target arrow is an arrow in the target category. In this case, the requirement is that it is a linear function, rather than just any function between vector spaces. So let's just check:

> quickCheck (prop_LinearQn f'')

+++ OK, passed 100 tests.

That's enough for now - more next time.

New modules - Quantum Algebra

2010-11-07T18:16:00.000+00:00

I've put up a new version of HaskellForMaths on Hackage, v0.3.1. It's quite a significant update, with more than a dozen new modules, plus improved documentation of several existing modules. I wrote the new modules in the course of reading Kassel's Quantum Groups. The modules are about algebras, coalgebras, bialgebras, Hopf algebras, tensor categories and quantum algebra.

The new modules fall into two groups:

Math.Algebras.* - Modules about algebras (and co-, bi- and Hopf algebras) in general
Math.QuantumAlgebra.* - Modules specifically about quantum algebra

In (slightly) more detail, here are the modules:

Math.Algebras.VectorSpace - defines a type for the free k-vector space over a basis set b
Math.Algebras.TensorProduct - defines tensor product of two vector spaces
Math.Algebras.Structures - defines a number of additional algebraic structures that can be given to vector spaces: algebra, coalgebra, bialgebra, Hopf algebra, module, comodule
Math.Algebras.Quaternions - a simple example of an algebra
Math.Algebras.Matrix - the 2*2 matrices - another simple example of an algebra
Math.Algebras.Commutative - commutative polynomials (such as x^2+3yz) - another algebra
Math.Algebras.NonCommutative - non-commutative polynomials (where xy /= yx) - another algebra
Math.Algebras.GroupAlgebra - a key example of a Hopf algebra
Math.Algebras.AffinePlane - the affine plane and its symmetries - more Hopf algebras, preparing for the quantum plane
Math.Algebras.TensorAlgebra
Math.Algebras.LaurentPoly - we use Laurent polynomials in q (that is, polynomials in q and q^-1) as our quantum scalars

Math.QuantumAlgebra.QuantumPlane - the quantum plane and its symmetries, as examples of non-commutative, non-cocommutative Hopf algebras
Math.QuantumAlgebra.TensorCategory
Math.QuantumAlgebra.Tangle - The tangle category (which includes knots, links and braids as subcategories), and some representations (from which we derive knot invariants)
Math.QuantumAlgebra.OrientedTangle

The following diagram is something like a dependency diagram, with "above" meaning "depends on".

Each layer depends on the layers beneath it. There are also a few dependencies within layers, that I've hinted at by proximity.

Some of these modules overlap somewhat in content with other modules that already exist in HaskellForMaths. In particular, there are already modules for commutative algebra, non-commutative algebra, and knot theory. For the moment, those existing modules still offer some features not offered by the new modules (for example, calculation of Groebner bases).

For the next little while in this blog, I want to start going through these new modules, and investigating quantum algebra. I should emphasize that this is still work in progress. For example, I'm intending to add modules for quantum enveloping algebras - but I have some reading to do first.

(Oh, and I know that I did previously promise to look at finite simple groups, and Coxeter groups, in this blog. I'll probably still come back to those at some point.)

Word length in the Symmetric group

2010-10-14T20:03:00.000+01:00

Previously on this blog, we saw how to think about groups abstractly via group presentations, where a group is given as a set of generators satisfying specified relations. Last time, we saw that questions about the length of reduced words in such a presentation can be visualised as questions about the length of paths in the Cayley graph of the group (relative to the generators).

This time, I want to focus on just one family of groups - the symmetric groups Sn, as generated by the adjacent transpositions {si = (i i+1)}. Here's the Haskell code defining this presentation of Sn:

newtype SGen = S Int deriving (Eq,Ord)

instance Show SGen where
    show (S i) = "s" ++ show i

_S n = (gs, r ++ s ++ t) where
    gs = map S [1..n-1]
    r = [([S i, S i],[]) | i <- [1..n-1]]
    s = [(concat $ replicate 3 [S i, S (i+1)],[]) | i <- [1..n-2]]
    t = [([S i, S j, S i, S j],[]) | i <- [1..n-1], j <- [i+2..n-1]]

The three sets of relations say: each generator si squares to the identity; if i, j are not adjacent, then si and sj commute; if i, j are adjacent, then (si*sj)^3 is the identity.

Here is the Cayley graph for S4 under this presentation:

The vertices are labeled with the reduced words in the generators si. How can we find out which permutations these correspond to? Well, that's easy:

fromTranspositions ts = product $ map (\(S i) -> p [[i,i+1]]) ts
For example:

> :load Math.Algebra.Group.CayleyGraph

> fromTranspositions [S 1, S 2, S 1, S 3, S 2, S 1]

[[1,4],[2,3]]

This is the permutation that reverses the list [1..4]

> map (.^ it) [1..4]

[4,3,2,1]

What about the other way round? Suppose we are given a permutation in Sn. How do we find its expression as a product of the transpositions si? Well the answer is (roughly): use bubblesort!

Here's bubblesort in Haskell:

bubblesort [] = []
bubblesort xs = bubblesort' [] xs where
    bubblesort' ls (r1:r2:rs) = if r1 <= r2 then bubblesort' (r1:ls) (r2:rs) else bubblesort' (r2:ls) (r1:rs)
    bubblesort' ls [r] = bubblesort (reverse ls) ++ [r]

So we sweep through the list from front to back, swapping any pairs that we find out of order - and then repeat. At the end of each sweep, we're guaranteed that the last element (in sort order) has reached the end of the list - so for the next sweep, we can leave it at the end and only sweep through the earlier elements. Hence the list we're sweeping through is one element shorter each time, so we're guaranteed to terminate. (We could terminate early, the first time a sweep makes no swaps - but I haven't coded that.)

Just to prove that the code works:

> bubblesort [2,3,1]

[1,2,3]

How does this help for turning a permutation into a sequence of transpositions? Well it's simple - every time we swap two elements, we are performing a transposition - so just record which swaps we perform. So here's a modified version of the above code, which records the swaps:

-- given a permutation of [1..n] (as a list), return the transpositions which led to it
toTrans [] = []
toTrans xs = toTrans' 1 [] [] xs where
    toTrans' i ts ls (r1:r2:rs) =
        if r1 <= r2
        then toTrans' (i+1) ts (r1:ls) (r2:rs)         -- no swap needed
        else toTrans' (i+1) (S i : ts) (r2:ls) (r1:rs) -- swap needed
    toTrans' i ts ls [r] = toTrans (reverse ls) ++ ts

Notice that the ts are returned in reverse to the order that they were used. This is because we are using them to undo the permutation - so we are performing the inverse of the permutation we are trying to find. Since each generator is its own inverse, we can recover the permutation we are after simply by reversing. In the code, we reverse as we go along.

For example:

> toTrans [2,3,1]

[s1,s2]

> toTrans [4,3,2,1]

[s1,s2,s1,s3,s2,s1]

Now, there's only one problem. As you can see, this code takes as input a rearrangement of [1..n]. This is a permutation, yes, but considered passively. Whereas in this blog we have been more accustomed to thinking of permutations actively, as something a bit like a function, which has an action on a graph, or other combinatorial structure, or if you like, just on the set [1..n]. In other words, our Permutation type represents the action itself, not the outcome of the action. (Recall that the implementation uses a Data.Map of (from,to) pairs.)

But of course it's easy to convert from one viewpoint to the other. So here's the code to take a permutation in cycle notation and turn it into a sequence of transpositions:

-- given a permutation action on [1..n], factor it into transpositions
toTranspositions 1 = []
toTranspositions g = toTrans [i .^ (g^-1) | i <- [1..n] ] where
    n = maximum $ supp g

For example:

> toTranspositions $ p [[1,4],[2,3]]

[s1,s2,s1,s3,s2,s1]

Why does the code have [i .^ (g^-1) | i <- [1..n]], rather than [i .^ g | i <- [1..n]]?
Well, suppose i .^ g = j. This says that g moves i to the j position. But we want to know what ends up in the i position. Suppose that j .^ g = i, for some j. Applying g^-1 to both sides, we see that j = i .^ (g^-1).

Okay, so given a permutation, in either form, we can reconstruct it as a reduced word in the generators.

We saw last time that the length of a reduced word is also the length of the shortest path from 1 to the element in the Cayley graph. Distance in the Cayley graph is a metric on the group, so the length of a reduced word tells us "how far" the element is from being the identity.

If it's only this distance that we're interested in, then there is a more direct way to work it out. Given a permutation g of [1..n], then an inversion is a pair (i,j) with i < j but i .^ g > j .^ g. In Haskell:

inversions g = [(i,j) | i <- [1..n], j <- [i+1..n], i < j, i .^ g > j .^ g]
    where n = maximum $ supp g

For example:

> inversions $ fromList [1,4,3,2]

[(2,3),(2,4),(3,4)]

With a little thought, you should be able to convince yourself that the number of inversions is equal to the length of the reduced word for g - because each swap that we perform during bubblesort corrects exactly one inversion.

Okay, so this is all very nice, but what use is it? Well, of course, maths doesn't have to be useful, any more than any other aesthetic pursuit. However, as it happens, in this case it is.

In statistics, the Kendall tau test gives an indicator of the correlation between two measured quantities (for example, the height and weight of the test subjects). Suppose that we are given a list of pairs (eg (height,weight) pairs), and we want to know how strongly correlated the first and second quantities are.

Ok, so what we do is, we rank the first quantities from lowest to highest, and replace each quantity by its rank (a number from 1 to n). We do the same for the second quantities. So we end up with a list of pairs of numbers from 1 to n. Now, we sort the list on the first element, and then count the number of inversions in the second element.

For example, suppose our original list was [(1.55m, 60kg), (1.8m, 80kg), (1.5m, 70kg), (1.6m, 72kg)]. Converting to ranks, we get [(2nd,1st),(4th,4th),(1st,2nd),(3rd,3rd)]. Sorting on fst, we get [(1,2),(2,1),(3,3),(4,4)]. Looking at snd, we see that we have just one inversion. The idea is that the fewer inversions we have, the better correlated the two quantities. (Of course in reality there's a bit more to it than that - to convert the number of inversions into a probability, we need to know the distribution of word lengths for Sn, where n is the number of pairs of test data that we have.)

So you can think of the Kendall tau test as saying: What permutation has been applied in moving from the first quantities (the heights) to the second quantities (the weights)? How far is that permutation from the identity (on the Cayley graph)? What proportion of all permutations (in Sn) lie at that distance or less from the identity? (Even more concretely, we can imagine colouring in successive shells of the Cayley graph, working out from the identity, until we hit the given permutation, and then asking what proportion of the "surface" of the graph has been coloured.)

Cayley graphs of groups

2010-09-20T21:06:00.000+01:00

[New version HaskellForMaths 0.2.2 released here]

Recently, we've been looking at group presentations, where a group is presented as a set of generators together with a set of relations that hold between those generators. Group elements are then represented as words in the generators.

One can then ask questions about these words, such as: What is the longest (reduced) word in the group? How many (reduced) words are there of each length?

This week I want to look at Cayley graphs, which are a way of visualising groups. Questions about word length translate to questions about path distance in the Cayley graph.

So, the Cayley graph of a group, relative to a generating set gs, is the graph
- with a vertex for each element of the group
- with an edge from x to y just whenever x*g = y for some generator g in gs

Notice that as we have defined it, the edges are directed (from x to y), so this is a directed graph, or digraph.

Here's the Haskell code:

module Math.Algebra.Group.CayleyGraph where

import Math.Algebra.Group.PermutationGroup as P
import Math.Algebra.Group.StringRewriting as SR
import Math.Combinatorics.Graph

import qualified Data.List as L
import qualified Data.Set as S


data Digraph a = DG [a] [(a,a)] deriving (Eq,Ord,Show)

-- Cayley digraph given a group presentation of generators and relations
cayleyDigraphS (gs,rs) = DG vs es where
    rs' = knuthBendix rs
    vs = L.sort $ nfs (gs,rs') -- reduced words
    es = [(v,v') | v <- vs, v' <- nbrs v ]
    nbrs v = L.sort [rewrite rs' (v ++ [g]) | g <- gs]

-- Cayley digraph given group generators as permutations
cayleyDigraphP gs = DG vs es where
    vs = P.elts gs
    es = [(v,v') | v <- vs, v' <- nbrs v ]
    nbrs v = L.sort [v * g | g <- gs]

As an example, let's look at the Cayley digraph of the dihedral group D8 (the symmetries of a square), generated by a rotation and a reflection:

> :load Math.Algebra.Group.CayleyGraph

> let a = p [[1,2,3,4]]

> let b = p [[1,2],[3,4]]

> a^3*b == b*a

True

> cayleyDigraphS (['a','b'],[("aaaa",""),("bb",""),("aaab","ba")])

DG ["","a","aa","aaa","aab","ab","b","ba"] [("","a"),("","b"),("a","aa"),("a","ab"),("aa","aaa"),("aa","aab"),("aaa",""),("aaa","ba"),("aab","aa"),("aab","ab"),("ab","a"),("ab","b"),("b",""),("b","ba"),("ba","aaa"),("ba","aab")]

> cayleyDigraphP [a,b]

DG [[],[[1,2],[3,4]],[[1,2,3,4]],[[1,3],[2,4]],[[1,3]],[[1,4,3,2]],[[1,4],[2,3]],[[2,4]]] [([],[[1,2],[3,4]]),([],[[1,2,3,4]]),([[1,2],[3,4]],[]),([[1,2],[3,4]],[[1,3]]),([[1,2,3,4]],[[1,3],[2,4]]),([[1,2,3,4]],[[2,4]]),([[1,3],[2,4]],[[1,4,3,2]]),([[1,3],[2,4]],[[1,4],[2,3]]),([[1,3]],[[1,4,3,2]]),([[1,3]],[[1,4],[2,3]]),([[1,4,3,2]],[]),([[1,4,3,2]],[[1,3]]),([[1,4],[2,3]],[[1,3],[2,4]]),([[1,4],[2,3]],[[2,4]]),([[2,4]],[[1,2],[3,4]]),([[2,4]],[[1,2,3,4]])]

These are of course the same Cayley digraph, just with different vertex labels. Here's a picture:

The picture might remind you of something. You can think of a Cayley digraph as a state transition diagram, where the states are the group elements, and the transitions are multiplication (on the right) by g, for each generator g. It might help to think of each edge as being labelled by the generator that caused it.

A few things to notice.

First, Cayley digraphs are always regular: the out-degree of each vertex, the number of edges leading out of it, will always equal the number of generators; and similarly for the in-degree, the number of edges leading into each vertex. (Exercise: Prove this.) In fact, we can say more - the graph "looks the same" from any vertex - this follows from the group properties. (Exercise: Explain.)

Second, notice how some of the edges come in pairs going in opposite directions. Why is that? In this case, it's because one of our generators is its own inverse (which one?) - so if it can take you from x to y, then it can take you back again. In general, whenever our set of generators contains a g such that g^-1 is also in the set, then the edges corresponding to g, g^-1 will come in opposing pairs.

Given this, we can omit the arrows on the edges if we adopt the convention that whenever we are given a set of generators, their inverses are also implied. In this way, we obtain an undirected or simple graph. Here's the code:

toSet = S.toList . S.fromList

-- The Cayley graph on the generators gs *and their inverses*, given relations rs
cayleyGraphS (gs,rs) = graph (vs,es) where
    rs' = knuthBendix rs
    vs = L.sort $ nfs (gs,rs') -- all reduced words
    es = toSet [ L.sort [v,v'] | v <- vs, v' <- nbrs v ] -- toSet orders and removes duplicates
    nbrs v = [rewrite rs' (v ++ [g]) | g <- gs]

cayleyGraphP gs = graph (vs,es) where
    vs = P.elts gs
    es = toSet [ L.sort [v,v'] | v <- vs, v' <- nbrs v ]
    nbrs v = [v * g | g <- gs]

For example:

> cayleyGraphS (['a','b'],[("aaaa",""),("bb",""),("aaab","ba")])

G ["","a","aa","aaa","aab","ab","b","ba"] [["","a"],["","aaa"],["","b"],["a","aa"],["a","ab"],["aa","aaa"],["aa","aab"],["aaa","ba"],["aab","ab"],["aab","ba"],["ab","b"],["b","ba"]]

One important point to note is that the Cayley graph of a group is relative to the generators. For example, we saw last time that the dihedral groups can also be generated by two reflections. In the case of D8, we can set r = (1 2)(3 4), s = (1 3).

Before scrolling down, see if you can guess what the Cayley graph looks like. I'll give you a clue: Cayley graphs are always regular - what is the valency of each vertex in this case?

> let r = p [[1,2],[3,4]]

> let s = p [[1,3]]

> (r*s)^4

[]

> cayleyGraphS (['r','s'],[("rr",""),("ss",""),("rsrsrsrs","")])

G ["","r","rs","rsr","rsrs","s","sr","srs"] [["","r"],["","s"],["r","rs"],["rs","rsr"],["rsr","rsrs"],["rsrs","srs"],["s","sr"],["sr","srs"]]

So the point to emphasize is that this graph and the one shown previously are Cayley graphs of the same group. The vertices represent the same elements (considered as permutations). However, because we have taken different sets of generators, we get different edges, and hence different graphs.

Ok, so what does the Cayley graph tell us about the group? Well, as an example, consider the Cayley graph of the Rubik cube group, as generated by the face rotations f, b, l, r, u, d (as defined previously). The vertices of the graph can be identified with the possible positions or states of the cube. The group element 1 corresponds to the solved cube. The edges correspond to single moves that can be made on the cube. If someone gives you a scrambled cube to solve, they are asking you to find a path from that vertex of the Cayley graph back to 1.

Given any graph, and vertices x and y, the distance from x to y is defined as the length of the shortest path from x to y. On the Rubik graph (ie, the Cayley graph of the Rubik cube), the distance from x to 1 is the minimum number of moves needed to solve position x. The HaskellForMaths library provides a distance function on graphs. Thus for example:

> let graphD8rs = cayleyGraphS (['r','s'],[("rr",""),("ss",""),("rsrsrsrs","")])

> distance graphD8rs "" "rsr"

3

The distance from 1 to an element g is of course the length of the reduced word for g.

The diameter of a graph is defined as the maximum distance between vertices. So the diameter of the Rubik graph is the maximum number of moves that are required to solve a scrambled position. It has recently been shown that this number is twenty.

> diameter graphD8rs

4

The diameter of a Cayley graph is the length of the longest reduced word.

That's really all I wanted to say for the moment. Next time, we'll take some of these ideas further.

Group presentations

2010-07-18T20:56:00.001+01:00

Last time we looked at string rewriting systems and the Knuth-Bendix completion algorithm. The motivation for doing that was to enable us to think about groups in a more abstract way than before.

The example we looked at last time was the symmetry group of the square. We found that this group could be generated by two elements a, b, satisfying the relations:
a^4 = 1
b^2 = 1
a^3 b = b a

This way of thinking about the group is called a group presentation, and the usual notation would be:
<a,b | a^4=1, b^2=1, a^3 b = b a>

In our Haskell code, we represent this as:

( ['a','b'], [("aaaa",""),("bb",""),("aaab","ba")] )

We saw how to use the Knuth-Bendix algorithm to turn the relations into a confluent rewrite system:

> :load Math.Algebra.Group.StringRewriting

> mapM_ print $ knuthBendix [("aaaa",""),("bb",""),("aaab","ba")]

("bb","")

("bab","aaa")

("baa","aab")

("aba","b")

("aaab","ba")

("aaaa","")

The rewrite system itself isn't particularly informative. Its importance lies in what it enables us to do. Given any word in the generators, we reduce it as follows: wherever we can find the left hand side of one of rules as a subword, we replace it by the right hand side of the rule. If we keep doing this until there are no more matches, then we end up with a normal form for the element - that is, another word in the generators, which represents the same group element, and is the smallest such word relative to the shortlex ordering. Several things follow from this.

First, the ability to find the shortest word is sometimes useful in itself. If we could do this for the Rubik cube group (taking the six face rotations as generators), then we would be able to code "God's algorithm" to find the shortest solution to any given cube position.

Second, any two words that represent the same group element will reduce to the same normal form. Hence, given any two words in the generators, we can tell whether they represent the same element. This is called "solving the word problem" for the group.

Third, this enables us to list (the normal forms of) all the elements of the group - and hence, among other things, to count them.

Fourth, it enables us to do arithmetic in the group:
- To multiply two elements, represented as words w1, w2, just concatenate them to w1++w2, then reduce using the rewrite system
- The identity element of the group is of course the empty word ""
- But what about inverses?

Strings (lists) under concatenation form a monoid, not a group. So what do we do about inverses?

Well, one possibility is to include them as additional symbols. So, suppose that our generators are a,b. Then we should introduce additional symbols a^-1, b^-1, and consider words over the four symbols {a,b,a^-1,b^-1}. (For brevity, it is customary to use the symbols A, B for a^-1, b^-1.)

If we take this approach, then we will need to add some new rules too. We will need the rules a a^-1 = 1, etc. We will probably also need the "inverses" of the relations in our presentation. For example, if we have a^4 = 1, then we should also have a rule (a^-1)^4 = 1.

It's going to be a bit of a pain. (And it's probably going to cause Knuth-Bendix to get indigestion, in some cases at least.) Luckily, for finite groups, we don't really need this. In a finite group, each generator must have finite order: in our example a^4 = 1, b^2 = 1. So the inverse of each generator is itself a power of that generator - a^-1 = a^3, b^-1 = b. So for a finite group - or in fact any group where the generators are all of finite order - then the inverses are already there, expressible as words in the generators.

So for most purposes, we have no need to introduce the inverses as new symbols. For example, if we want to list the elements of a finite group, or tell whether two words in the generators represent the same element, then we are fine. When it will matter is if we are specifically interested in the length of the words. For example, if we want God's algorithm for solving Rubik's cube, we are interested in the length of words in the generators and their inverses - the clockwise and anti-clockwise face rotations.

There is one situation when even this won't matter - and that is if the generators are their own inverses. If we have a generator g such that g^2 = 1, then it follows that g^-1 = g. Such an element is called an involution.

Are there groups which can be generated by involutions alone? Yes, there are. Let's have a look at a couple.

Consider the symmetry group of a regular polygon, say a pentagon. Consider the two reflections shown below. 'a' is the reflection in an axis through a vertex, and 'b' is the reflection in an axis through the midpoint of an adjacent edge. Hence the angle between the axes is pi/5 (or for an n-gon, pi/n).

It should be clear that ab is a 1/5 rotation of the pentagon. It follows that a,b generate the symmetry group of the pentagon, with a^2 = b^2 = (ab)^5 =1.

> elts (['a','b'], [("aa",""), ("bb",""), ("ababababab","")])

["","a","b","ab","ba","aba","bab","abab","baba","ababa"]

> length it

10

Next, consider the symmetric group S4 (or in general, Sn). It can be generated by the transpositions s1 = (1 2), s2 = (2 3), and s3 = (3 4), which correspond to the diagrams below:

Now, multiplication in the group corresponds to concatenation of the diagrams, going down the page. For example:

In fact, each of those diagrams represents the identity element - as you can check by following each point along the lines down the page, and checking that it ends up where it started. Hence each diagram represents a relation for S4. The diagrams show that s1^2 = 1, (s1s2)^3=1, and (s1s3)^2 = 1.

In the general case, it's clear that Sn can be generated by n-1 transpositions s_i of the form (i i+1), and that they satisfy the following relations:
si^2 = 1
(si sj)^3 = 1 if |i-j| = 1
(si sj)^2 = 1 if |i-j| > 1

Here's some Haskell code to construct these presentations of Sn. (Did I mention that all of the string rewriting code works on arbitrary lists, not just strings?)

newtype S = S Int deriving (Eq,Ord)

instance Show S where
    show (S i) = "s" ++ show i

_S n = (gs, r ++ s ++ t) where
    gs = map S [1..n-1]
    r = [([S i, S i],[]) | i <- [1..n-1]]
    s = [(concat $ replicate 3 [S i, S (i+1)],[]) | i <- [1..n-2]]
    t = [([S i, S j, S i, S j],[]) | i <- [1..n-1], j <- [i+2..n-1]]

And just to check:

> _S 4

([s1,s2,s3],[([s1,s1],[]),([s2,s2],[]),([s3,s3],[]),([s1,s2,s1,s2,s1,s2],[]),([s2,s3,s2,s3,s2,s3],[]),([s1,s3,s1,s3],[])])

> elts $ _S 4

[[],[s1],[s2],[s3],[s1,s2],[s1,s3],[s2,s1],[s2,s3],[s3,s2],[s1,s2,s1],[s1,s2,s3],[s1,s3,s2],[s2,s1,s3],[s2,s3,s2],[s3,s2,s1],[s1,s2,s1,s3],[s1,s2,s3,s2],[s1,s3,s2,s1],[s2,s1,s3,s2],[s2,s3,s2,s1],[s1,s2,s1,s3,s2],[s1,s2,s3,s2,s1],[s2,s1,s3,s2,s1],[s1,s2,s1,s3,s2,s1]]

> length it

24

Anyway, that's it for now. Where I'm heading with this stuff is finite reflection groups and Coxeter groups, but I might take a couple of detours along the way.

String rewriting and Knuth-Bendix completion

2010-05-28T22:08:00.001+01:00

Previously in this blog we have been looking at symmetry groups of combinatorial structures. We have represented these symmetries concretely as permutations - for example, symmetries of graphs as permutations of their vertices. However, mathematicians tend to think about groups more abstractly.

Consider the symmetry group of the square (the cyclic graph C4). It can be generated by two permutations:

> :load Math.Algebra.Group.PermutationGroup

> let a = p [[1,2,3,4]]

> let b = p [[1,2],[3,4]]

We can list various relations that are satisfied by these generators:
a^4 = 1
b^2 = 1
a^3 b = b a

Of course, there are other relations that hold between the generators. However, the relations above are in fact sufficient to uniquely identify the group (up to isomorphism).

Since a and b generate the group, any element in the group can be expressed as a product of a's and b's (and also their inverses, but we'll ignore that for now). However, there are of course an infinite number of such expressions, but only a finite number of group elements, so many of these expressions must represent the same element. For example, since b^2=1, then abba represents the same element as aa.

Given two expressions, it would obviously be helpful to have a method for telling whether they represent the same group element. What we need is a string rewriting system. We can think of expressions in the generators as words in the symbols 'a' and 'b'. And we can reinterpret the relations above as rewrite rules:
"aaaa" -> ""
"bb" -> ""
"aaab" -> "ba"

Each of these rules consists of a left hand side and a right hand side. Given any word in the generator symbols, if we find the left hand side anywhere in the word, we can replace it by the right hand side. For example, in the word "abba", we can apply the rule "bb" -> "", giving "aa".

So, the idea is that given any word in the generator symbols, we repeatedly apply the rewrite rules until we can go no further. The hope is that if two words represent the same group element, then we will end up with the same word after rewriting. We'll see later that there's a bit more to do before that will work, but for the moment, let's at least write some Haskell code to do the string rewriting.

So the first thing we are going to need to do is try to find the left hand side of a rule as a subword within a word. Actually, we want to do a bit more than that - if X is our word, and Y the subword, then we want to find the A and B such that X = AYB.

import qualified Data.List as L

splitSubstring xs ys = splitSubstring' [] xs where
    splitSubstring' ls [] = Nothing
    splitSubstring' ls (r:rs) =
        if ys `L.isPrefixOf` (r:rs)
        then Just (reverse ls, drop (length ys) (r:rs))
        else splitSubstring' (r:ls) rs

Then if our rewrite rule is L -> R, then a single application of the rule consists in replacing L by R within the word:

rewrite1 (l,r) xs =
    case xs `splitSubstring` l of
    Nothing -> Nothing
    Just (a,b) -> Just (a++r++b)

Okay, so suppose we have a rewrite system (that is, a collection of rewrite rules), and a word. Then we want to repeatedly apply the rules until we find that no rule applies:

rewrite rules word = rewrite' rules word where
    rewrite' (r:rs) xs =
        case rewrite1 r xs of
        Nothing -> rewrite' rs xs
        Just ys -> rewrite' rules ys
    rewrite' [] xs = xs

For example:

> :load Math.Algebra.Group.StringRewriting

> rewrite [("aaaa",""),("bb",""),("aaab","ba")] "abba"

"aa"

So far, so good. However, there are some problems with the rewrite system that we constructed above. Suppose that the word we wanted to reduce was "aaabb".
If we apply the rule "aaab" -> "ba", then we have "aaabb" -> "bab".
However, if we apply the rule "bb" -> "", then we have "aaabb" -> "aaa".
Neither "bab" nor "aaa" reduces any further. So we have two problems:
- The same starting word can end up at different end words, depending on the order in which we apply the rules
- We can see from the example that the words "bab" and "aaa" actually represent the same element in our group, but our rewrite system can't rewrite either of them

What can we do about this? Well here's an idea. Let's just add "bab" -> "aaa" as a new rewrite rule to our system. We know that they are equal as elements of the group, so this is a valid thing to do.

That's good, but we still have problems. What about the word "aaaab"?
If we apply the rule "aaaa" -> "", then "aaaab" -> "b"
On the other hand, if we apply the rule "aaab" -> "ba", then "aaaab" -> "aba"

So let's do the same again, and add a new rule "aba" -> "b".

What we're doing here is called the Knuth-Bendix algorithm. Let's take a step back. So in each case, I found a word that could be reduced in two different ways. How did I do that? Well, what I was looking for is two rules with overlapping left hand sides. That is, I was looking for rules L1 -> R1, L2 -> R2, with
L1 = AB
L2 = BC
A pair of rules like this is called a critical pair. If we can find a critical pair, then by looking at the word ABC, we see that
ABC = (AB)C = L1 C -> R1 C
ABC = A(BC) = A L2 -> A R2
So we are justified in adding a new rule R1 C -> A R2

So the Knuth-Bendix algorithm basically says, for each critical pair, introduce a new rule, until there are no more critical pairs. There's a little bit more to it than that:
- We want the rewrite system to reduce the word. That means that we want an ordering on words, and given a pair, we want to make them into a rule that takes the greater to the lesser, rather than vice versa. The most obvious ordering to use is called shortlex: take longer words to shorter words, and if the lengths are equal, use alphabetical ordering.
- Whenever we introduce a new rule, it might be that the left hand side of some existing rule becomes reducible. In that case, the existing rule becomes redundant, since any word that it would reduce can now be reduced by the new rule.

Here's the code:

-- given two strings x,y, find if possible a,b,c with x=ab y=bc
findOverlap xs ys = findOverlap' [] xs ys where
    findOverlap' as [] cs = Nothing
    findOverlap' as (b:bs) cs =
        if (b:bs) `L.isPrefixOf` cs
        then Just (reverse as, b:bs, drop (length (b:bs)) cs)
        else findOverlap' (b:as) bs cs

shortlex x y = compare (length x, x) (length y, y)

ordpair x y =
    case shortlex x y of
    LT -> Just (y,x)
    EQ -> Nothing
    GT -> Just (x,y)

knuthBendix1 rules = knuthBendix' rules pairs where
    pairs = [(lri,lrj) | lri <- rules, lrj <- rules, lri /= lrj]
    knuthBendix' rules [] = rules
    knuthBendix' rules ( ((li,ri),(lj,rj)) : ps) =
        case findOverlap li lj of
        Nothing -> knuthBendix' rules ps
        Just (a,b,c) -> case ordpair (rewrite rules (ri++c)) (rewrite rules (a++rj)) of
                        Nothing -> knuthBendix' rules ps -- they both reduce to the same thing
                        Just rule' -> let rules' = reduce rule' rules
                                          ps' = ps ++
                                                [(rule',rule) | rule <- rules'] ++
                                                [(rule,rule') | rule <- rules']
                                      in knuthBendix' (rule':rules') ps'
   reduce rule@(l,r) rules = filter (\(l',r') -> not (L.isInfixOf l l')) rules

For example:

> knuthBendix1 [("aaaa",""), ("bb",""), ("aaab","ba")]

[("baa","aab"),("bab","aaa"),("aba","b"),("aaaa",""),("bb",""),("aaab","ba")]

A few words about the Knuth-Bendix algorithm
- It is not guaranteed to terminate. Every time we introduce a new rule, we have the potential to create new critical pairs, and there are pathological examples where this goes on forever
- The algorithm can be made slightly more efficient, by doing things like choosing to process shorter critical pairs first. In the HaskellForMaths library, a more efficient version is given, called simply "knuthBendix"

Back to the example. So Knuth-Bendix has found three new rules. The full system, with these new rules added, has no more critical pairs. As a consequence, it is a confluent rewrite system - meaning that if you start at some given word, and reduce it using the rules, then it doesn't matter in what order you apply the rules, you will always end up at the same word. This word that you end up with can therefore be used as a normal form.

This allows us to "solve the word problem" for this group. That is, given any two words in the generator symbols, we can find out whether they represent the same group element by rewriting them both, and seeing if they end up at the same normal form. For example:

> let rules = knuthBendix [("aaaa",""), ("bb",""), ("aaab","ba")]

> rewrite rules "aaaba"

"aab"

> rewrite rules "baabb"

"aab"

> rewrite rules "babab"

"b"

So we see that "aaaba" and "baabb" represent the same group element, whereas "babab" represents a different one. (If you want, you could go back and check this using the original permutations.)

We can even list (the normal forms of) all elements of the group. What we do is start with the empty word (which represents the identity element of the group), and then incrementally build longer and longer words. At each stage, we look at all combinations that can be formed by pre-pending a generator symbol to a word from the preceding stage. However, if we ever come across a word which can be reduced, then we know that it - and any word that could be formed from it at a later stage - is not a normal form, and so can be discarded. Here's the code:

nfs (gs,rs) = nfs' [[]] where
    nfs' [] = [] -- we have run out of words
    nfs' ws = let ws' = [g:w | g <- gs, w <- ws, (not . isNewlyReducible) (g:w)]
              in ws ++ nfs' ws'
    isNewlyReducible w = any (`L.isPrefixOf` w) (map fst rs)

elts (gs,rs) = nfs (gs, knuthBendix rs)

For example:

> elts (['a','b'], [("aaaa",""), ("bb",""), ("aaab","ba")])

["","a","b","aa","ab","ba","aaa","aab"]

As expected, we have eight elements.

That's enough for now. Next time (hopefully) I'll look at some more examples.

Block systems and block homomorphism

2010-04-25T20:55:00.000+01:00

Recently in this blog, we looked at the strong generating set (SGS) algorithm for permutation groups, and how we can use it to investigate the structure of groups. Last time, we saw how to partially "factor" intransitive groups, using the transitive constituent homomorphism. (Recall that by "factoring" a group G, we mean finding a proper normal subgroup K, and consequently also a quotient group G/K - which is equivalent to finding a proper homomorphism from G.) This time, I want to do the same for imprimitive groups. So, what is an imprimitive group?

Well, given a permutation group acting on a set X, it can happen that X consists of "blocks" Y1, Y2, ... of points which always "move together". That is, a subset Y of X is a block if for all g in G, Y^g (the image of Y under the action of g) is either equal to Y or disjoint from it. A full set of blocks (that is, blocks Y1, Y2, ... which are disjoint, and whose union is the whole of X) is called a block system.

For example, suppose that X is the vertices of the hexagon. The symmetry group of the hexagon is the dihedral group D12, generated by a rotation and a reflection:

> :load Math.Algebra.Group.Subquotients

> mapM_ print $ _D 12

[[1,2,3,4,5,6]]

[[1,6],[2,5],[3,4]]

A block system for the hexagon is shown below. The blocks are the pairs of opposite vertices. You can verify that they satisfy the definition of blocks: any symmetry must take a pair of opposite points either to itself, or to another pair disjoint from it.

A given group can have more than one block system. Here is another block system for the hexagon. The blocks are the two equilateral triangles formed by the vertices.

There are also the trivial block systems, consisting of either just one block containing all the points, or a block for each point. From now on, we will silently exclude these.

So, I was meant to be telling you what an imprimitive group is. Well, it's just a group which has a non-trivial block system. Conversely, a primitive group is one which has no non-trivial block system.

When we have an imprimitive group, we will be able to form a homomorphism - and hence factor the group - by considering the induced action of the group on the blocks. But I'm jumping ahead. First we need to write some Haskell code - to find block systems.

The idea is to write a function that, given a pair of points Y = {y1,y2} in X (or indeed any subset Y of X), can find the smallest block containing Y. The way it works is as follows. We start by supposing that each point is in a block of its own, except for the points in Y. We initialise a map, with the points in X as keys, and the blocks as values, where we represent a block by its least element.

Now, suppose that we currently think that the minimal block is Y = {y1,y2,...}. What we're going to do is work through the elements of Y, and work through the generators of G, trying to find a problem. So suppose that we have got as far as some element y of Y, and some generator g of G. We know that y is in the same block as y1, and what we have to check is that y^g is in the same block as y1^g. So we look up their representatives in the map, and check that they're the same. If they're not, then we need to merge the two classes. Here's the code (it's a little opaque, but it's basically doing what I just described).

minimalBlock gs ys@(y1:yt) = minimalBlock' p yt gs where
    xs = foldl union [] $ map supp gs
    p = M.fromList $ [(yi,y1) | yi <- ys] ++ [(x,x) | x <- xs \\ ys]
    minimalBlock' p (q:qs) (h:hs) =
        let r = p M.! q         -- representative of class containing q
            k = p M.! (q .^ h)  -- rep of class (q^h)
            l = p M.! (r .^ h)  -- rep of class (r^h)
        in if k /= l -- then we need to merge the classes
           then let p' = M.map (\x -> if x == l then k else x) p
                    qs' = qs ++ [l]
                in minimalBlock' p' (q:qs') hs
           else minimalBlock' p (q:qs) hs
    minimalBlock' p (q:qs) [] = minimalBlock' p qs gs
    minimalBlock' p [] _ =
        let reps = toListSet $ M.elems p
        in L.sort [ filter (\x -> p M.! x == r) xs | r <- reps ]

Once we have this function, then finding the block systems is simple - just take each pair {x1,xi} from X, and find the minimal block containing it.

blockSystems gs
    | isTransitive gs = toListSet $ filter (/= [x:xs]) $ map (minimalBlock gs) [ [x,x'] | x' <- xs ]
    | otherwise = error "blockSystems: not transitive"
    where x:xs = foldl union [] $ map supp gs

If we have an SGS for G, then we can do slightly better. For suppose that within the stabiliser G_x1, there is an element taking xi to xj. Then clearly xi and xj must be in the same minimal block. So in fact, we need only consider pairs {x1,xi}, with xi the minimal element of each orbit in G_x1. (Of course, the point is that if we have an SGS for G, then we can trivially list a set of generators for G_x1.)

blockSystemsSGS gs = toListSet $ filter (/= [x:xs]) $ map (minimalBlock gs) [ [x,x'] | x' <- rs ]
    where x:xs = foldl union [] $ map supp gs
          hs = filter (\g -> x < minsupp g) gs -- sgs for stabiliser Gx
          os = orbits hs
          rs = map head os ++ (xs \\ L.sort (concat os)) -- orbit representatives, including singleton cycles

Let's test it:

> mapM_ print $ blockSystems $ _D 12

[[1,3,5],[2,4,6]]

[[1,4],[2,5],[3,6]]

Okay, so given a group, we can find its non-trivial block systems, if any. What next? Well, as I hinted earlier, this enables us to factor the group. For if there is a non-trivial block system, then the action of the group on the points induces a well-defined action on the blocks. This induced action gives us a homomorphism from our original group G, a subgroup of Sym(X), to another group H, a subgroup of Sym(B), where B is the set of blocks.

So as we did last time, we can find the kernel and image of the homomorphism, and thus factor the group. How do we do that?

Well, it's simple. In the following code, the function lr takes a group element acting on the points, and returns a group element acting on the blocks (in the Left side) and the points (in the Right side) in an Either union. If we do this to all the group generators, and then find an SGS, then as the Left blocks sort before the Right points, then the SGS will split neatly into two parts:
- The initial segment of the SGS will consist of elements which move the Left blocks. If we restrict their action to just the blocks, we will have an SGS for the image of the homomorphism, acting on the blocks.
- The final segment of the SGS will consist of elements which fix all the Left blocks. These elements move points but not blocks, so they form an SGS for the kernel of the homomorphism.

blockHomomorphism' gs bs = (ker,im) where
    gs' = sgs $ map lr gs
    lr g = fromPairs $ [(Left b, Left $ b -^ g) | b <- bs] ++ [(Right x, Right y) | (x,y) <- toPairs g]
    ker = map unRight $ dropWhile (isLeft . minsupp) gs' -- stabiliser of the blocks
    im = map restrictLeft $ takeWhile (isLeft . minsupp) gs' -- restriction to the action on blocks

blockHomomorphism gs bs
    | bs == closure bs [(-^ g) | g <- gs] -- validity check on bs
        = blockHomomorphism' gs bs

Let's try it out on our two block systems for the hexagon:

> blockHomomorphism (_D 12) [[1,4],[2,5],[3,6]]

([[[1,4],[2,5],[3,6]]],

 [[[[1,4],[2,5],[3,6]]],[[[2,5],[3,6]]]])

I've formatted the output for clarity. The first line is (an SGS for) the kernel, consisting of elements of D12 which permute points within the blocks, without permuting the blocks. In this case, the kernel is generated by the 180 degree rotation, which swaps the points within each pair. The second line is (an SGS for) the image, consisting of the induced action of D12 on the blocks. In this case, we have the full permutation group S3 acting on the three pairs of points.

> blockHomomorphism (_D 12) [[1,3,5],[2,4,6]]

([[[1,5,3],[2,6,4]],[[2,6],[3,5]]],

 [[[[1,3,5],[2,4,6]]]])

In this case the kernel is generated by a 120 degree rotation and a reflection, and consists of all group elements which send odd points to odd and even points to even, thus preserving the blocks. The image has only one non-trivial element, which just swaps the two blocks.

Armed with this new tool, let's have another look at Rubik's cube. Recall that we labelled the faces of the cube as follows:

Last time, we split the Rubik cube group into two homomorphic images - a group acting on just the corner faces, and a group acting on just the edge faces. Let's look for block systems in these groups:

> :load Math.Projects.Rubik

> let [cornerBlocks] = blockSystems imCornerFaces

> let [edgeBlocks] = blockSystems imEdgeFaces

> cornerBlocks

[[1,17,23],[3,19,41],[7,29,31],[9,33,47],[11,21,53],[13,43,51],[27,37,59],[39,49,57]]

> edgeBlocks

[[2,18],[4,26],[6,44],[8,32],[12,52],[14,22],[16,42],[24,56],[28,34],[36,48],[38,58],[46,54]]

It's obvious really - in the corner group, we have a block system with blocks consisting of the three corner faces that belong to the same corner piece, and in the edge group, we have a block system with blocks consisting of the two edge faces that belong to the same edge piece. Furthermore, these are the only block systems.

So we can form the kernel and image under the block homomorphism:

> let (kerCornerBlocks,imCornerBlocks) = blockHomomorphism imCornerFaces cornerBlocks

> let (kerEdgeBlocks,imEdgeBlocks) = blockHomomorphism imEdgeFaces edgeBlocks

If we look at the sizes of these groups, the structure will be obvious:

> orderSGS kerCornerBlocks

2187

> orderSGS imCornerBlocks

40320

These are 3^7, and 8! respectively. The kernel is the permutations of the corner faces which leave the corner blocks where they are. It turns out that whenever you twist one corner block, you must untwist another. So when you have decided what to do with seven corners, the eighth is determined - hence 3^7. For the image, we have eight blocks, and 8! permutations of them, so this must be the full symmetry group S8 - meaning that we can perform any rearrangement of the corner blocks that is desired.

> orderSGS kerEdgeBlocks

2048

> orderSGS imEdgeBlocks

479001600

These are 2^11 and 12! respectively. For the kernel, whenever we flip one edge piece we must also flip another. So when we have decided what to do with eleven edges, the twelfth is determined - hence 2^11. For the image, we have twelve pieces, and 12! permutations of them, so we have the full symmetry group S12 on edge blocks.

That's it.

Incidentally, my references for this material are:
- Holt, Handbook of Computational Group Theory
- Seress, Permutation Group Algorithms
both of which are very good - but expensive.

These books, particularly the latter, go on to describe further algorithms that can be used to factor even transitive primitive groups, enabling us to arrive at a full decomposition of a group into simple groups. Unfortunately, the algorithms get a bit more complicated after this, and I haven't yet implemented the rest in HaskellForMaths.