Monday, 13 December 2010

The free vector space on a type, part 1

As I mentioned last time, I want to spend the next few posts talking about quantum algebra. Well, we've got to start somewhere, so let's start with vector spaces.

You probably know what a vector space is. It's what it sounds like: a space of vectors, which you can add together, or multiply by scalars (that is, by real numbers, or more generally, by elements of some field k). Here's the official definition:
An additive (or Abelian) group is a set with a binary operation called addition, such that
- addition is associative: x+(y+z) = (x+y)+z
- addition is commutative: x+y = y+x
- there is an additive identity: x+0 = x = 0+x
- there are additive inverses: x+(-x) = 0 = (-x)+x
A vector space over a field k is an additive group V, together with an operation k * V -> V called scalar multiplication, such that:
- scalar multiplication distributes over vector addition: a(x+y) = ax+ay
- scalar multiplication distributes over scalar addition: (a+b)x = ax+bx
- associativity: (ab)x = a(bx)
- unit: 1x = x

There are some obvious examples:
- R^2 is a 2-dimensional vector space over the reals R, and R^3 is a 3-dimensional one. (I'm not going to define dimension quite yet, but hopefully it's intuitively obvious what it means.)
- R^n is an R-vector space for any n.
- Indeed, k^n is a k-vector space for any field k.

Some slightly more interesting examples:
- C is a 2-dimensional vector space over R. (The reason it's more interesting is that of course C possesses additional algebraic structure, beyond the vector space structure.)
- 2*2 matrices over k form a 4-dimensional k-vector space.
- Polynomials in X with coefficients in k form an (infinite dimensional) k-vector space.


If we wanted to code the above definition into Haskell, probably the first idea that would come to mind would be to use type classes:

class AddGrp a where
    add :: a -> a -> a
    zero :: a
    neg :: a -> a -- additive inverse

class (Field k, AddGrp v) => VecSp k v where
    smult :: k -> v -> v -- scalar multiplication

(Type classes similar to these are defined in the vector-space package.)
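
Just to illustrate how these classes might be used, here's a sketch of instances for R^2. (This is an assumption on my part - it presumes a Field class with an instance for Double, which isn't shown, plus the multi-parameter type class extension.)

-- hypothetical instances, for illustration only
instance AddGrp (Double, Double) where
    add (x1,y1) (x2,y2) = (x1+x2, y1+y2)
    zero = (0,0)
    neg (x,y) = (-x,-y)

instance VecSp Double (Double, Double) where
    smult k (x,y) = (k*x, k*y)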


For most vector spaces that one encounters in "real life", there is some set of elements, usually obvious, which form a "basis" for the vector space, meaning that all elements can be expressed as linear combinations of basis elements. For example, in R^3, the obvious basis is {(1,0,0), (0,1,0), (0,0,1)}. Any element (x,y,z) of R^3 can be expressed as the linear combination x(1,0,0)+y(0,1,0)+z(0,0,1).

(Mathematicians would want to stress that there are other bases for R^3 that would serve equally well, and indeed, that a significant part of the theory of vector spaces can go through without even talking about bases. However, for our purposes - we want to write code to calculate in vector spaces - working with a basis is natural.)


Okay, so we want a way to build a vector space from a basis. (More specifically, a k-vector space, for some given field k.) What sorts of things shall we allow as our basis? Well, why not just allow any type whatsoever:

module Math.Algebras.VectorSpace where

import qualified Data.List as L

data Vect k b = V [(b,k)] deriving (Eq,Ord)


This says that an element of the k-vector space over basis b is a linear combination of elements of b. (So the [(b,k)] is to be thought of as a sum, with each (b,k) pair representing a basis element in b multiplied by a scalar coefficient in k.)

For example, we can define the "boring basis" type, which just consists of numbered basis elements:

newtype EBasis = E Int deriving (Eq,Ord)

instance Show EBasis where show (E i) = "e" ++ show i

e i = return (E i) -- don't worry about what "return" is doing here for the moment
e1 = e 1
e2 = e 2
e3 = e 3


Then a typical element of Vect Double EBasis is:

> :load Math.Algebras.VectorSpace
> V [(E 1, 0.5), (E 3, 0.7)]
0.5e1+0.7e3


So of course, the Show instances for EBasis (see above) and Vect k b (not shown) are coming into play here.

How do we know that this is a vector space? Well actually, it's not yet, because we haven't defined the addition and scalar multiplication operations on it. So, without further ado:

infixr 7 *>
infixl 6 <+>

-- |The zero vector
zero :: Vect k b
zero = V []

-- |Addition of vectors
add :: (Ord b, Num k) => Vect k b -> Vect k b -> Vect k b
add (V ts) (V us) = V $ addmerge ts us

-- |Addition of vectors (same as add)
(<+>) :: (Ord b, Num k) => Vect k b -> Vect k b -> Vect k b
(<+>) = add

-- merge two ascending assoc lists, adding the coefficients of any basis
-- element that appears in both (and dropping the term if they cancel)
addmerge :: (Ord b, Num k) => [(b,k)] -> [(b,k)] -> [(b,k)]
addmerge ((a,x):ts) ((b,y):us) =
    case compare a b of
    LT -> (a,x) : addmerge ts ((b,y):us)
    EQ -> if x+y == 0 then addmerge ts us else (a,x+y) : addmerge ts us
    GT -> (b,y) : addmerge ((a,x):ts) us
addmerge ts [] = ts
addmerge [] us = us

-- |Negation of vector
neg :: (Num k) => Vect k b -> Vect k b
neg (V ts) = V $ map (\(b,x) -> (b,-x)) ts

-- |Scalar multiplication (on the left)
smultL :: (Num k) => k -> Vect k b -> Vect k b
smultL 0 _ = zero -- V []
smultL k (V ts) = V [(ei,k*xi) | (ei,xi) <- ts]

-- |Same as smultL. Mnemonic is "multiply through (from the left)"
(*>) :: (Num k) => k -> Vect k b -> Vect k b
(*>) = smultL

A few things to mention:
- First, note that we required a Num instance for k. Strictly speaking, since we stated that k is a field, we should have required a Fractional instance. However, on occasion we are going to break the rules slightly.
- Second, note that for addition, we required an Ord instance for b. We could have defined addition using (++) to concatenate linear combinations - however, the problem with that is that it wouldn't then easily follow that e1+e3 = e3+e1, or that e1+e1 = 2e1. By requiring an Ord instance, we can guarantee that there is a unique normal form in which to express any vector - namely, list the basis elements in order, combine duplicates, remove zero coefficients.
- Finally, note that I didn't define Vect k b as an instance of a vector space type class. That's just because I didn't yet see a reason to.
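
For example, here's a quick GHCi check that addition does put vectors into a unique normal form (assuming the Show instance renders vectors as before):

> e1 <+> e3 == e3 <+> e1
True
> e1 <+> e1
2e1
> 2 *> (e1 <+> e2) <+> neg (e1 <+> e2)
e1+e2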

It will turn out to be useful to have a function that can take an arbitrary element of Vect k b and return a vector in normal form:

-- |Convert an element of Vect k b into normal form. Normal form consists in having the basis elements in ascending order,
-- with no duplicates, and all coefficients non-zero
nf :: (Ord b, Num k) => Vect k b -> Vect k b
nf (V ts) = V $ nf' $ L.sortBy compareFst ts where
    nf' ((b1,x1):(b2,x2):ts) =
        case compare b1 b2 of
        LT -> if x1 == 0 then nf' ((b2,x2):ts) else (b1,x1) : nf' ((b2,x2):ts)
        EQ -> if x1+x2 == 0 then nf' ts else nf' ((b1,x1+x2):ts)
        GT -> error "nf': not pre-sorted"
    nf' [(b,x)] = if x == 0 then [] else [(b,x)]
    nf' [] = []
    compareFst (b1,x1) (b2,x2) = compare b1 b2
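
For example, given a vector with an out-of-order term, a zero coefficient and a duplicated basis element:

> nf $ V [(E 3, 2), (E 1, 0), (E 3, -1)]
e3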

Okay, so we ought to check that Vect k b is indeed a vector space. Let's write some QuickCheck properties:

prop_AddGrp (x,y,z) =
    x <+> (y <+> z) == (x <+> y) <+> z && -- associativity
    x <+> y == y <+> x                 && -- commutativity
    x <+> zero == x                    && -- identity
    x <+> neg x == zero                   -- inverse

prop_VecSp (a,b,x,y,z) =
    prop_AddGrp (x,y,z) &&
    a *> (x <+> y) == a *> x <+> a *> y && -- distributivity through vectors
    (a+b) *> x == a *> x <+> b *> x     && -- distributivity through scalars
    (a*b) *> x == a *> (b *> x)         && -- associativity
    1 *> x == x                            -- unit

instance Arbitrary EBasis where
    arbitrary = do n <- arbitrary :: Gen Int
                   return (E n)

instance Arbitrary Q where
    arbitrary = do n <- arbitrary :: Gen Integer
                   d <- arbitrary :: Gen Integer
                   -- guard against a zero denominator
                   return (if d == 0 then fromInteger n else fromInteger n / fromInteger d)

instance Arbitrary (Vect Q EBasis) where
    arbitrary = do ts <- arbitrary :: Gen [(EBasis, Q)]
                   return $ nf $ V ts

prop_VecSpQn (a,b,x,y,z) = prop_VecSp (a,b,x,y,z)
    where types = (a,b,x,y,z) :: (Q, Q, Vect Q EBasis, Vect Q EBasis, Vect Q EBasis)

> quickCheck prop_VecSpQn
+++ OK, passed 100 tests.

(I'm using Q instead of R as my field in order to avoid false negatives caused by the fact that arithmetic in Double is not exact.)

So it looks like Vect k b is indeed a vector space. In category theory, it is called the free k-vector space over b. "Free" here means that there are no relations among the basis elements: it will never turn out, for example, that e1 = e2+e3.


Vector spaces of course form a category, specifically an algebraic category (there are other kinds, as we'll see in due course). The objects in the category are the vector spaces. The arrows or morphisms in the category are the functions between vector spaces which "commute" with the algebraic structure. Specifically, they are the functions f such that:
- f(x+y) = f(x)+f(y)
- f(0) = 0
- f(-x) = -f(x)
- f(a.x) = a.f(x)

Such a function is called linear, the idea being that it preserves lines. This is because it follows from the conditions that f(a.x+b.y) = a.f(x)+b.f(y).

We can write a QuickCheck property to check whether a given function is linear:

prop_Linear f (a,x,y) =
    f (x <+> y) == f x <+> f y &&
    f zero == zero             &&
    f (neg x) == neg (f x)     &&
    f (a *> x) == a *> f x

prop_LinearQn f (a,x,y) = prop_Linear f (a,x,y)
    where types = (a,x,y) :: (Q, Vect Q EBasis, Vect Q EBasis)

For example:

> quickCheck (prop_LinearQn (2 *>))
+++ OK, passed 100 tests.


We won't need to use this quite yet, but it's handy to have around.


Now, in category theory we have the concept of a functor, which is a map from one category to another that commutes with the category structure. Specifically, a functor F consists of a map from the objects of one category to the objects of the other, and from the arrows of one category to the arrows of the other, satisfying:
- F(id_A) = id_F(A)
- F(f . g) = F(f) . F(g) (where dot denotes function composition)

How does this relate to the Functor type class in Haskell? Well, the Haskell type class enables us to declare that a /type constructor/ is a functor. For example, (Vect k) is a type constructor, which acts on a type b to construct another type Vect k b. (Vect k) is indeed a functor, witness the following declaration:

instance Functor (Vect k) where
    fmap f (V ts) = V [(f b, x) | (b,x) <- ts]

This says that if we have a function f on our basis elements, then we can lift it to a function on linear combinations of basis elements in the obvious way.

In mathematics, we would think of the free vector space construction as a functor from Set (the category of sets) to k-Vect (the category of k-vector spaces). In Haskell, we need to think of the (Vect k) construction slightly differently. It operates on types, rather than sets, so the source category is Hask, the category of Haskell types.

What is the relationship between Hask and Set? Well, if we identify a type with the set of values which inhabit it, then we can regard Hask as a subcategory of Set, consisting of those sets and functions which can be represented in Haskell. (That would imply for example that we are restricted to computable functions.)

So (Vect k) is a functor from Hask to the subcategory of k-Vect consisting of vector spaces over sets/types in Hask.

So just to spell it out, the (Vect k) functor:
- Takes an object b in Hask/Set - ie a type, or its set of inhabitants - to an object Vect k b in k-Vect
- Takes an arrow f in Hask (ie a function f :: a -> b), to an arrow (fmap f) :: Vect k a -> Vect k b in k-Vect.

Now, there's just one small fly in the ointment. In order to get equality of vectors to work out right, we wanted to insist that they were expressed in normal form, which meant we needed an Ord instance for b. However, in the Functor instance for (Vect k), Haskell doesn't let us express this constraint, and our fmap is unable to use the Ord instance for b. What this means is that fmap f might return a vector which is not in normal form - so we need to remember to call nf afterwards. For example:

newtype FBasis = F Int deriving (Eq,Ord)

instance Show FBasis where show (F i) = "f" ++ show i

> let f = \(E i) -> F (10 - div i 2)
> let f' = fmap f :: Vect Q EBasis -> Vect Q FBasis
> f' (e1 <+> 2 *> e2 <+> e3)
f10+2f9+f9
> let f'' = nf . fmap f :: Vect Q EBasis -> Vect Q FBasis
> f'' (e1 <+> 2 *> e2 <+> e3)
3f9+f10


So it might be fairer to say that it is the combination of nf and fmap that forms the functor on arrows.

The definition of a functor requires that the target arrow is an arrow in the target category. In this case, the requirement is that it is a linear function, rather than just any function between vector spaces. So let's just check:

> quickCheck (prop_LinearQn f'')
+++ OK, passed 100 tests.


That's enough for now - more next time.

Sunday, 7 November 2010

New modules - Quantum Algebra

I've put up a new version of HaskellForMaths on Hackage, v0.3.1. It's quite a significant update, with more than a dozen new modules, plus improved documentation of several existing modules. I wrote the new modules in the course of reading Kassel's Quantum Groups. The modules are about algebras, coalgebras, bialgebras, Hopf algebras, tensor categories and quantum algebra.

The new modules fall into two groups:

  • Math.Algebras.* - Modules about algebras (and co-, bi- and Hopf algebras) in general
  • Math.QuantumAlgebra.* - Modules specifically about quantum algebra

In (slightly) more detail, here are the modules:

  • Math.Algebras.VectorSpace - defines a type for the free k-vector space over a basis set b 
  • Math.Algebras.TensorProduct  - defines tensor product of two vector spaces
  • Math.Algebras.Structures - defines a number of additional algebraic structures that can be given to vector spaces: algebra, coalgebra, bialgebra, Hopf algebra, module, comodule
  • Math.Algebras.Quaternions - a simple example of an algebra
  • Math.Algebras.Matrix - the 2*2 matrices - another simple example of an algebra
  • Math.Algebras.Commutative - commutative polynomials (such as x^2+3yz) - another algebra
  • Math.Algebras.NonCommutative - non-commutative polynomials (where xy /= yx) - another algebra
  • Math.Algebras.GroupAlgebra - a key example of a Hopf algebra
  • Math.Algebras.AffinePlane - the affine plane and its symmetries - more Hopf algebras, preparing for the quantum plane
  • Math.Algebras.TensorAlgebra
  • Math.Algebras.LaurentPoly - we use Laurent polynomials in q (that is, polynomials in q and q^-1) as our quantum scalars



  • Math.QuantumAlgebra.QuantumPlane - the quantum plane and its symmetries, as examples of non-commutative, non-cocommutative Hopf algebras
  • Math.QuantumAlgebra.TensorCategory
  • Math.QuantumAlgebra.Tangle - The tangle category (which includes knots, links and braids as subcategories), and some representations (from which we derive knot invariants)
  • Math.QuantumAlgebra.OrientedTangle

The following diagram is something like a dependency diagram, with "above" meaning "depends on".


Each layer depends on the layers beneath it. There are also a few dependencies within layers, which I've hinted at by proximity.

Some of these modules overlap somewhat in content with other modules that already exist in HaskellForMaths. In particular, there are already modules for commutative algebra, non-commutative algebra, and knot theory. For the moment, those existing modules still offer some features not offered by the new modules (for example, calculation of Groebner bases).

For the next little while in this blog, I want to start going through these new modules, and investigating quantum algebra. I should emphasize that this is still work in progress. For example, I'm intending to add modules for quantum enveloping algebras - but I have some reading to do first.

(Oh, and I know that I did previously promise to look at finite simple groups, and Coxeter groups, in this blog. I'll probably still come back to those at some point.)

Thursday, 14 October 2010

Word length in the Symmetric group

Previously on this blog, we saw how to think about groups abstractly via group presentations, where a group is given as a set of generators satisfying specified relations. Last time, we saw that questions about the length of reduced words in such a presentation can be visualised as questions about the length of paths in the Cayley graph of the group (relative to the generators).

This time, I want to focus on just one family of groups - the symmetric groups Sn, as generated by the adjacent transpositions {si = (i i+1)}. Here's the Haskell code defining this presentation of Sn:

newtype SGen = S Int deriving (Eq,Ord)

instance Show SGen where
    show (S i) = "s" ++ show i

_S n = (gs, r ++ s ++ t) where
    gs = map S [1..n-1]
    r = [([S i, S i],[]) | i <- [1..n-1]]
    s = [(concat $ replicate 3 [S i, S (i+1)],[]) | i <- [1..n-2]]
    t = [([S i, S j, S i, S j],[]) | i <- [1..n-1], j <- [i+2..n-1]]

The three sets of relations say: each generator si squares to the identity; if i, j are not adjacent, then si and sj commute; if i, j are adjacent, then (si*sj)^3 is the identity.

Here is the Cayley graph for S4 under this presentation:


The vertices are labeled with the reduced words in the generators si. How can we find out which permutations these correspond to? Well, that's easy:

fromTranspositions ts = product $ map (\(S i) -> p [[i,i+1]]) ts
For example:

> :load Math.Algebra.Group.CayleyGraph
> fromTranspositions [S 1, S 2, S 1, S 3, S 2, S 1]
[[1,4],[2,3]]

This is the permutation that reverses the list [1..4]:

> map (.^ it) [1..4]
[4,3,2,1]
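
As a further check, S4 should have 4! = 24 elements (elts comes from the string rewriting module we defined previously, and should be in scope here):

> length $ elts $ _S 4
24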

What about the other way round? Suppose we are given a permutation in Sn. How do we find its expression as a product of the transpositions si? Well the answer is (roughly): use bubblesort!

Here's bubblesort in Haskell:

bubblesort [] = []
bubblesort xs = bubblesort' [] xs where
    bubblesort' ls (r1:r2:rs) = if r1 <= r2 then bubblesort' (r1:ls) (r2:rs) else bubblesort' (r2:ls) (r1:rs)
    bubblesort' ls [r] = bubblesort (reverse ls) ++ [r]

So we sweep through the list from front to back, swapping any pairs that we find out of order - and then repeat. At the end of each sweep, we're guaranteed that the last element (in sort order) has reached the end of the list - so for the next sweep, we can leave it at the end and only sweep through the earlier elements. Hence the list we're sweeping through is one element shorter each time, so we're guaranteed to terminate. (We could terminate early, the first time a sweep makes no swaps - but I haven't coded that.)
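
(For the curious, here's a sketch of that early-terminating variant - it's not part of the library, and for simplicity it re-sweeps the whole list each pass rather than fixing the final element:)

-- early-terminating bubblesort: stop as soon as a sweep makes no swaps
bubblesort2 :: Ord a => [a] -> [a]
bubblesort2 xs = let (xs', swapped) = sweep xs
                 in if swapped then bubblesort2 xs' else xs'
    where sweep (r1:r2:rs) | r1 <= r2  = let (rs', s) = sweep (r2:rs) in (r1:rs', s)
                           | otherwise = let (rs', _) = sweep (r1:rs) in (r2:rs', True)
          sweep rs = (rs, False)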

Just to prove that the code works:

> bubblesort [2,3,1]
[1,2,3]

How does this help for turning a permutation into a sequence of transpositions? Well it's simple - every time we swap two elements, we are performing a transposition - so just record which swaps we perform. So here's a modified version of the above code, which records the swaps:

-- given a permutation of [1..n] (as a list), return the transpositions which led to it
toTrans [] = []
toTrans xs = toTrans' 1 [] [] xs where
    toTrans' i ts ls (r1:r2:rs) =
        if r1 <= r2
        then toTrans' (i+1) ts (r1:ls) (r2:rs)         -- no swap needed
        else toTrans' (i+1) (S i : ts) (r2:ls) (r1:rs) -- swap needed
    toTrans' i ts ls [r] = toTrans (reverse ls) ++ ts

Notice that the ts are returned in reverse to the order that they were used. This is because we are using them to undo the permutation - so we are performing the inverse of the permutation we are trying to find. Since each generator is its own inverse, we can recover the permutation we are after simply by reversing. In the code, we reverse as we go along.

For example:

> toTrans [2,3,1]
[s1,s2]
> toTrans [4,3,2,1]
[s1,s2,s1,s3,s2,s1]

Now, there's only one problem. As you can see, this code takes as input a rearrangement of [1..n]. This is a permutation, yes, but considered passively. Whereas in this blog we have been more accustomed to thinking of permutations actively, as something a bit like a function, which has an action on a graph, or other combinatorial structure, or if you like, just on the set [1..n]. In other words, our Permutation type represents the action itself, not the outcome of the action. (Recall that the implementation uses a Data.Map of (from,to) pairs.)

But of course it's easy to convert from one viewpoint to the other. So here's the code to take a permutation in cycle notation and turn it into a sequence of transpositions:

-- given a permutation action on [1..n], factor it into transpositions
toTranspositions 1 = []
toTranspositions g = toTrans [i .^ (g^-1) | i <- [1..n] ] where
    n = maximum $ supp g

For example:

> toTranspositions $ p [[1,4],[2,3]]
[s1,s2,s1,s3,s2,s1]

Why does the code have [i .^ (g^-1) | i <- [1..n]], rather than [i .^ g | i <- [1..n]]?
Well, suppose i .^ g = j. This says that g moves i to the j position. But for the list representation, we want to know what ends up in the i position - that is, the j such that j .^ g = i. Applying g^-1 to both sides of that equation, we see that j = i .^ (g^-1).
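
As a sanity check, toTranspositions and fromTranspositions should be mutually inverse:

> fromTranspositions $ toTranspositions $ p [[1,4],[2,3]]
[[1,4],[2,3]]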


Okay, so given a permutation, in either form, we can reconstruct it as a reduced word in the generators.

We saw last time that the length of a reduced word is also the length of the shortest path from 1 to the element in the Cayley graph. Distance in the Cayley graph is a metric on the group, so the length of a reduced word tells us "how far" the element is from being the identity.

If it's only this distance that we're interested in, then there is a more direct way to work it out. Given a permutation g of [1..n], an inversion is a pair (i,j) with i < j but i .^ g > j .^ g. In Haskell:

inversions g = [(i,j) | i <- [1..n], j <- [i+1..n], i .^ g > j .^ g]
    where n = maximum $ supp g

For example:

> inversions $ fromList [1,4,3,2]
[(2,3),(2,4),(3,4)]

With a little thought, you should be able to convince yourself that the number of inversions is equal to the length of the reduced word for g - because each swap that we perform during bubblesort corrects exactly one inversion.
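
We can check this against the earlier example: [1,4,3,2] has three inversions, and sure enough its reduced word has length three:

> toTrans [1,4,3,2]
[s2,s3,s2]
> length it
3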

Okay, so this is all very nice, but what use is it? Well, of course, maths doesn't have to be useful, any more than any other aesthetic pursuit. However, as it happens, in this case it is.

In statistics, the Kendall tau test gives an indicator of the correlation between two measured quantities (for example, the height and weight of the test subjects). Suppose that we are given a list of pairs (eg (height,weight) pairs), and we want to know how strongly correlated the first and second quantities are.

Ok, so what we do is, we rank the first quantities from lowest to highest, and replace each quantity by its rank (a number from 1 to n). We do the same for the second quantities. So we end up with a list of pairs of numbers from 1 to n. Now, we sort the list on the first element, and then count the number of inversions in the second element.

For example, suppose our original list was [(1.55m, 60kg), (1.8m, 80kg), (1.5m, 70kg), (1.6m, 72kg)]. Converting to ranks, we get [(2nd,1st),(4th,4th),(1st,2nd),(3rd,3rd)]. Sorting on fst, we get [(1,2),(2,1),(3,3),(4,4)]. Looking at snd, we see that we have just one inversion. The idea is that the fewer inversions we have, the better correlated the two quantities. (Of course in reality there's a bit more to it than that - to convert the number of inversions into a probability, we need to know the distribution of word lengths for Sn, where n is the number of pairs of test data that we have.)
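
Here's a sketch of how we might count the discordant pairs in Haskell. (This helper isn't part of HaskellForMaths. Note that when the values are distinct, replacing them by their ranks doesn't change any of the comparisons, so we can count inversions directly on the sorted pairs.)

import Data.List (sortBy)
import Data.Ord (comparing)

-- sort the observations on the first coordinate, then count inversions
-- (discordant pairs) in the second coordinate
discordantPairs :: (Ord a, Ord b) => [(a,b)] -> Int
discordantPairs xys = length [() | (i,y1) <- iys, (j,y2) <- iys, i < j, y1 > y2]
    where ys  = map snd $ sortBy (comparing fst) xys
          iys = zip [1..] ys

On the height-weight data above:

> discordantPairs [(1.55,60),(1.8,80),(1.5,70),(1.6,72)]
1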

So you can think of the Kendall tau test as saying: What permutation has been applied in moving from the first quantities (the heights) to the second quantities (the weights)? How far is that permutation from the identity (on the Cayley graph)? What proportion of all permutations (in Sn) lie at that distance or less from the identity? (Even more concretely, we can imagine colouring in successive shells of the Cayley graph, working out from the identity, until we hit the given permutation, and then asking what proportion of the "surface" of the graph has been coloured.)

Monday, 20 September 2010

Cayley graphs of groups

[New version HaskellForMaths 0.2.2 released here]

Recently, we've been looking at group presentations, where a group is presented as a set of generators together with a set of relations that hold between those generators. Group elements are then represented as words in the generators.

One can then ask questions about these words, such as: What is the longest (reduced) word in the group? How many (reduced) words are there of each length?

This week I want to look at Cayley graphs, which are a way of visualising groups. Questions about word length translate to questions about path distance in the Cayley graph.

So, the Cayley graph of a group, relative to a generating set gs, is the graph
- with a vertex for each element of the group
- with an edge from x to y just whenever x*g = y for some generator g in gs

Notice that as we have defined it, the edges are directed (from x to y), so this is a directed graph, or digraph.

Here's the Haskell code:

module Math.Algebra.Group.CayleyGraph where

import Math.Algebra.Group.PermutationGroup as P
import Math.Algebra.Group.StringRewriting as SR
import Math.Combinatorics.Graph

import qualified Data.List as L
import qualified Data.Set as S


data Digraph a = DG [a] [(a,a)] deriving (Eq,Ord,Show)

-- Cayley digraph given a group presentation of generators and relations
cayleyDigraphS (gs,rs) = DG vs es where
    rs' = knuthBendix rs
    vs = L.sort $ nfs (gs,rs') -- reduced words
    es = [(v,v') | v <- vs, v' <- nbrs v ]
    nbrs v = L.sort [rewrite rs' (v ++ [g]) | g <- gs]

-- Cayley digraph given group generators as permutations
cayleyDigraphP gs = DG vs es where
    vs = P.elts gs
    es = [(v,v') | v <- vs, v' <- nbrs v ]
    nbrs v = L.sort [v * g | g <- gs]

As an example, let's look at the Cayley digraph of the dihedral group D8 (the symmetries of a square), generated by a rotation and a reflection:


> :load Math.Algebra.Group.CayleyGraph
> let a = p [[1,2,3,4]]
> let b = p [[1,2],[3,4]]
> a^3*b == b*a
True
> cayleyDigraphS (['a','b'],[("aaaa",""),("bb",""),("aaab","ba")])
DG ["","a","aa","aaa","aab","ab","b","ba"] [("","a"),("","b"),("a","aa"),("a","ab"),("aa","aaa"),("aa","aab"),("aaa",""),("aaa","ba"),("aab","aa"),("aab","ab"),("ab","a"),("ab","b"),("b",""),("b","ba"),("ba","aaa"),("ba","aab")]
> cayleyDigraphP [a,b]
DG [[],[[1,2],[3,4]],[[1,2,3,4]],[[1,3],[2,4]],[[1,3]],[[1,4,3,2]],[[1,4],[2,3]],[[2,4]]] [([],[[1,2],[3,4]]),([],[[1,2,3,4]]),([[1,2],[3,4]],[]),([[1,2],[3,4]],[[1,3]]),([[1,2,3,4]],[[1,3],[2,4]]),([[1,2,3,4]],[[2,4]]),([[1,3],[2,4]],[[1,4,3,2]]),([[1,3],[2,4]],[[1,4],[2,3]]),([[1,3]],[[1,4,3,2]]),([[1,3]],[[1,4],[2,3]]),([[1,4,3,2]],[]),([[1,4,3,2]],[[1,3]]),([[1,4],[2,3]],[[1,3],[2,4]]),([[1,4],[2,3]],[[2,4]]),([[2,4]],[[1,2],[3,4]]),([[2,4]],[[1,2,3,4]])]


These are of course the same Cayley digraph, just with different vertex labels. Here's a picture:


The picture might remind you of something. You can think of a Cayley digraph as a state transition diagram, where the states are the group elements, and the transitions are multiplication (on the right) by g, for each generator g. It might help to think of each edge as being labelled by the generator that caused it.

A few things to notice.

First, Cayley digraphs are always regular: the out-degree of each vertex, the number of edges leading out of it, will always equal the number of generators; and similarly for the in-degree, the number of edges leading into each vertex. (Exercise: Prove this.) In fact, we can say more - the graph "looks the same" from any vertex - this follows from the group properties. (Exercise: Explain.)

Second, notice how some of the edges come in pairs going in opposite directions. Why is that? In this case, it's because one of our generators is its own inverse (which one?) - so if it can take you from x to y, then it can take you back again. In general, whenever our set of generators contains a g such that g^-1 is also in the set, then the edges corresponding to g, g^-1 will come in opposing pairs.

Given this, we can omit the arrows on the edges if we adopt the convention that whenever we are given a set of generators, their inverses are also implied. In this way, we obtain an undirected or simple graph. Here's the code:

toSet = S.toList . S.fromList

-- The Cayley graph on the generators gs *and their inverses*, given relations rs
cayleyGraphS (gs,rs) = graph (vs,es) where
    rs' = knuthBendix rs
    vs = L.sort $ nfs (gs,rs') -- all reduced words
    es = toSet [ L.sort [v,v'] | v <- vs, v' <- nbrs v ] -- toSet orders and removes duplicates
    nbrs v = [rewrite rs' (v ++ [g]) | g <- gs]

cayleyGraphP gs = graph (vs,es) where
    vs = P.elts gs
    es = toSet [ L.sort [v,v'] | v <- vs, v' <- nbrs v ]
    nbrs v = [v * g | g <- gs]

For example:

> cayleyGraphS (['a','b'],[("aaaa",""),("bb",""),("aaab","ba")])
G ["","a","aa","aaa","aab","ab","b","ba"] [["","a"],["","aaa"],["","b"],["a","aa"],["a","ab"],["aa","aaa"],["aa","aab"],["aaa","ba"],["aab","ab"],["aab","ba"],["ab","b"],["b","ba"]]



One important point to note is that the Cayley graph of a group is relative to the generators. For example, we saw last time that the dihedral groups can also be generated by two reflections. In the case of D8, we can set r = (1 2)(3 4), s = (1 3).
Before scrolling down, see if you can guess what the Cayley graph looks like. I'll give you a clue: Cayley graphs are always regular - what is the valency of each vertex in this case?


> let r = p [[1,2],[3,4]]
> let s = p [[1,3]]
> (r*s)^4
[]
> cayleyGraphS (['r','s'],[("rr",""),("ss",""),("rsrsrsrs","")])
G ["","r","rs","rsr","rsrs","s","sr","srs"] [["","r"],["","s"],["r","rs"],["rs","rsr"],["rsr","rsrs"],["rsrs","srs"],["s","sr"],["sr","srs"]]



So the point to emphasize is that this graph and the one shown previously are Cayley graphs of the same group. The vertices represent the same elements (considered as permutations). However, because we have taken different sets of generators, we get different edges, and hence different graphs.

Ok, so what does the Cayley graph tell us about the group? Well, as an example, consider the Cayley graph of the Rubik cube group, as generated by the face rotations f, b, l, r, u, d (as defined previously). The vertices of the graph can be identified with the possible positions or states of the cube. The group element 1 corresponds to the solved cube. The edges correspond to single moves that can be made on the cube. If someone gives you a scrambled cube to solve, they are asking you to find a path from that vertex of the Cayley graph back to 1.

Given any graph, and vertices x and y, the distance from x to y is defined as the length of the shortest path from x to y. On the Rubik graph (ie, the Cayley graph of the Rubik cube), the distance from x to 1 is the minimum number of moves needed to solve position x. The HaskellForMaths library provides a distance function on graphs. Thus for example:

> let graphD8rs = cayleyGraphS (['r','s'],[("rr",""),("ss",""),("rsrsrsrs","")])
> distance graphD8rs "" "rsr"
3

The distance from 1 to an element g is of course the length of the reduced word for g. (Strictly, this relies on the generating set being closed under inverses - here both generators are involutions - so that traversing an edge in either direction corresponds to multiplying by a generator.)

The diameter of a graph is defined as the maximum distance between vertices. So the diameter of the Rubik graph is the maximum number of moves that are required to solve a scrambled position. It has recently been shown that this number is twenty.

> diameter graphD8rs
4

The diameter of a Cayley graph is the length of the longest reduced word (again, provided the generating set is closed under inverses).
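
We can see this in our example: the longest reduced word in this presentation is "rsrs", and its distance from the identity is exactly the diameter:

> distance graphD8rs "" "rsrs"
4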

That's really all I wanted to say for the moment. Next time, we'll take some of these ideas further.

Sunday, 18 July 2010

Group presentations

Last time we looked at string rewriting systems and the Knuth-Bendix completion algorithm. The motivation for doing that was to enable us to think about groups in a more abstract way than before.

The example we looked at last time was the symmetry group of the square. We found that this group could be generated by two elements a, b, satisfying the relations:
a^4 = 1
b^2 = 1
a^3 b = b a

This way of thinking about the group is called a group presentation, and the usual notation would be:
<a,b | a^4=1, b^2=1, a^3 b = b a>

In our Haskell code, we represent this as:

( ['a','b'], [("aaaa",""),("bb",""),("aaab","ba")] )

We saw how to use the Knuth-Bendix algorithm to turn the relations into a confluent rewrite system:


> :load Math.Algebra.Group.StringRewriting
> mapM_ print $ knuthBendix [("aaaa",""),("bb",""),("aaab","ba")]
("bb","")
("bab","aaa")
("baa","aab")
("aba","b")
("aaab","ba")
("aaaa","")


The rewrite system itself isn't particularly informative. Its importance lies in what it enables us to do. Given any word in the generators, we reduce it as follows: wherever we can find the left hand side of one of the rules as a subword, we replace it by the right hand side of the rule. If we keep doing this until there are no more matches, then we end up with a normal form for the element - that is, another word in the generators, which represents the same group element, and is the smallest such word relative to the shortlex ordering. Several things follow from this.

First, the ability to find the shortest word is sometimes useful in itself. If we could do this for the Rubik cube group (taking the six face rotations as generators), then we would be able to code "God's algorithm" to find the shortest solution to any given cube position.

Second, any two words that represent the same group element will reduce to the same normal form. Hence, given any two words in the generators, we can tell whether they represent the same element. This is called "solving the word problem" for the group.

Third, this enables us to list (the normal forms of) all the elements of the group - and hence, among other things, to count them.

Fourth, it enables us to do arithmetic in the group:
- To multiply two elements, represented as words w1, w2, just concatenate them to w1++w2, then reduce using the rewrite system
- The identity element of the group is of course the empty word ""
- But what about inverses?

Strings (lists) under concatenation form a monoid, not a group. So what do we do about inverses?

Well, one possibility is to include them as additional symbols. So, suppose that our generators are a,b. Then we should introduce additional symbols a-1, b-1, and consider words over the four symbols {a,b,a-1,b-1}. (For brevity, it is customary to use the symbols A, B for a-1, b-1.)

If we take this approach, then we will need to add some new rules too. We will need the rules a a-1 = 1, etc. We will probably also need the "inverses" of the relations in our presentation. For example, if we have a^4 = 1, then we should also have a rule (a-1)^4 = 1.

It's going to be a bit of a pain. (And it's probably going to cause Knuth-Bendix to get indigestion, in some cases at least.) Luckily, for finite groups, we don't really need this. In a finite group, each generator must have finite order: in our example a^4 = 1, b^2 = 1. So the inverse of each generator is itself a power of that generator - a-1 = a^3, b-1 = b. So for a finite group - or in fact any group where the generators are all of finite order - then the inverses are already there, expressible as words in the generators.
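
For example, in the presentation above, a has order 4, so a-1 = a^3 - which we can verify by checking that "aaa" followed by "a" rewrites to the empty word:

> let rules = knuthBendix [("aaaa",""), ("bb",""), ("aaab","ba")]
> rewrite rules ("aaa" ++ "a")
""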

So for most purposes, we have no need to introduce the inverses as new symbols. For example, if we want to list the elements of a finite group, or tell whether two words in the generators represent the same element, then we are fine. When it will matter is if we are specifically interested in the length of the words. For example, if we want God's algorithm for solving Rubik's cube, we are interested in the length of words in the generators and their inverses - the clockwise and anti-clockwise face rotations.

There is one situation when even this won't matter - and that is if the generators are their own inverses. If we have a generator g such that g^2 = 1, then it follows that g-1 = g. Such an element is called an involution.

Are there groups which can be generated by involutions alone? Yes, there are. Let's have a look at a couple.

Consider the symmetry group of a regular polygon, say a pentagon. Consider the two reflections shown below. 'a' is the reflection in an axis through a vertex, and 'b' is the reflection in an axis through the midpoint of an adjacent edge. Hence the angle between the axes is pi/5 (or for an n-gon, pi/n).



It should be clear that ab is a 1/5 rotation of the pentagon. It follows that a,b generate the symmetry group of the pentagon, with a^2 = b^2 = (ab)^5 =1.


> elts (['a','b'], [("aa",""), ("bb",""), ("ababababab","")])
["","a","b","ab","ba","aba","bab","abab","baba","ababa"]
> length it
10


Next, consider the symmetric group S4 (or in general, Sn). It can be generated by the transpositions s1 = (1 2), s2 = (2 3), and s3 = (3 4), which correspond to the diagrams below:


Now, multiplication in the group corresponds to concatenation of the diagrams, going down the page. For example:



In fact, each of those diagrams represents the identity element - as you can check by following each point along the lines down the page, and checking that it ends up where it started. Hence each diagram represents a relation for S4. The diagrams show that s1^2 = 1, (s1s2)^3=1, and (s1s3)^2 = 1.

In the general case, it's clear that Sn can be generated by n-1 transpositions si of the form (i i+1), and that they satisfy the following relations:
si^2 = 1
(si sj)^3 = 1 if |i-j| = 1
(si sj)^2 = 1 if |i-j| > 1

Here's some Haskell code to construct these presentations of Sn. (Did I mention that all of the string rewriting code works on arbitrary lists, not just strings?)


newtype S = S Int deriving (Eq,Ord)

instance Show S where
    show (S i) = "s" ++ show i

_S n = (gs, r ++ s ++ t) where
    gs = map S [1..n-1]
    r = [([S i, S i],[]) | i <- [1..n-1]]
    s = [(concat $ replicate 3 [S i, S (i+1)],[]) | i <- [1..n-2]]
    t = [([S i, S j, S i, S j],[]) | i <- [1..n-1], j <- [i+2..n-1]]


And just to check:

> _S 4
([s1,s2,s3],[([s1,s1],[]),([s2,s2],[]),([s3,s3],[]),([s1,s2,s1,s2,s1,s2],[]),([s2,s3,s2,s3,s2,s3],[]),([s1,s3,s1,s3],[])])
> elts $ _S 4
[[],[s1],[s2],[s3],[s1,s2],[s1,s3],[s2,s1],[s2,s3],[s3,s2],[s1,s2,s1],[s1,s2,s3],[s1,s3,s2],[s2,s1,s3],[s2,s3,s2],[s3,s2,s1],[s1,s2,s1,s3],[s1,s2,s3,s2],[s1,s3,s2,s1],[s2,s1,s3,s2],[s2,s3,s2,s1],[s1,s2,s1,s3,s2],[s1,s2,s3,s2,s1],[s2,s1,s3,s2,s1],[s1,s2,s1,s3,s2,s1]]
> length it
24


Anyway, that's it for now. Where I'm heading with this stuff is finite reflection groups and Coxeter groups, but I might take a couple of detours along the way.

Friday, 28 May 2010

String rewriting and Knuth-Bendix completion

Previously in this blog we have been looking at symmetry groups of combinatorial structures. We have represented these symmetries concretely as permutations - for example, symmetries of graphs as permutations of their vertices. However, mathematicians tend to think about groups more abstractly.

Consider the symmetry group of the square (the cyclic graph C4). It can be generated by two permutations:

> :load Math.Algebra.Group.PermutationGroup
> let a = p [[1,2,3,4]]
> let b = p [[1,2],[3,4]]




We can list various relations that are satisfied by these generators:
a^4 = 1
b^2 = 1
a^3 b = b a

Of course, there are other relations that hold between the generators. However, the relations above are in fact sufficient to uniquely identify the group (up to isomorphism).

Since a and b generate the group, any element in the group can be expressed as a product of a's and b's (and also their inverses, but we'll ignore that for now). However, there are of course an infinite number of such expressions, but only a finite number of group elements, so many of these expressions must represent the same element. For example, since b^2=1, then abba represents the same element as aa.

Given two expressions, it would obviously be helpful to have a method for telling whether they represent the same group element. What we need is a string rewriting system. We can think of expressions in the generators as words in the symbols 'a' and 'b'. And we can reinterpret the relations above as rewrite rules:
"aaaa" -> ""
"bb" -> ""
"aaab" -> "ba"

Each of these rules consists of a left hand side and a right hand side. Given any word in the generator symbols, if we find the left hand side anywhere in the word, we can replace it by the right hand side. For example, in the word "abba", we can apply the rule "bb" -> "", giving "aa".

So, the idea is that given any word in the generator symbols, we repeatedly apply the rewrite rules until we can go no further. The hope is that if two words represent the same group element, then we will end up with the same word after rewriting. We'll see later that there's a bit more to do before that will work, but for the moment, let's at least write some Haskell code to do the string rewriting.

So the first thing we are going to need to do is try to find the left hand side of a rule as a subword within a word. Actually, we want to do a bit more than that - if X is our word, and Y the subword, then we want to find the A and B such that X = AYB.




import qualified Data.List as L

splitSubstring xs ys = splitSubstring' [] xs where
    splitSubstring' ls [] = Nothing
    splitSubstring' ls (r:rs) =
        if ys `L.isPrefixOf` (r:rs)
        then Just (reverse ls, drop (length ys) (r:rs))
        else splitSubstring' (r:ls) rs


If our rewrite rule is L -> R, then a single application of the rule consists in replacing L by R within the word:




rewrite1 (l,r) xs =
    case xs `splitSubstring` l of
    Nothing -> Nothing
    Just (a,b) -> Just (a++r++b)


Okay, so suppose we have a rewrite system (that is, a collection of rewrite rules), and a word. Then we want to repeatedly apply the rules until we find that no rule applies:




rewrite rules word = rewrite' rules word where
    rewrite' (r:rs) xs =
        case rewrite1 r xs of
        Nothing -> rewrite' rs xs
        Just ys -> rewrite' rules ys
    rewrite' [] xs = xs


For example:

> :load Math.Algebra.Group.StringRewriting
> rewrite [("aaaa",""),("bb",""),("aaab","ba")] "abba"
"aa"


So far, so good. However, there are some problems with the rewrite system that we constructed above. Suppose that the word we wanted to reduce was "aaabb".
If we apply the rule "aaab" -> "ba", then we have "aaabb" -> "bab".
However, if we apply the rule "bb" -> "", then we have "aaabb" -> "aaa".
Neither "bab" nor "aaa" reduces any further. So we have two problems:
- The same starting word can end up at different end words, depending on the order in which we apply the rules
- We can see from the example that the words "bab" and "aaa" actually represent the same element in our group, but our rewrite system can't rewrite either of them

What can we do about this? Well here's an idea. Let's just add "bab" -> "aaa" as a new rewrite rule to our system. We know that they are equal as elements of the group, so this is a valid thing to do.

That's good, but we still have problems. What about the word "aaaab"?
If we apply the rule "aaaa" -> "", then "aaaab" -> "b"
On the other hand, if we apply the rule "aaab" -> "ba", then "aaaab" -> "aba"

So let's do the same again, and add a new rule "aba" -> "b".

What we're doing here is called the Knuth-Bendix algorithm. Let's take a step back. So in each case, I found a word that could be reduced in two different ways. How did I do that? Well, what I was looking for is two rules with overlapping left hand sides. That is, I was looking for rules L1 -> R1, L2 -> R2, with
L1 = AB
L2 = BC
A pair of rules like this is called a critical pair. If we can find a critical pair, then by looking at the word ABC, we see that
ABC = (AB)C = L1 C -> R1 C
ABC = A(BC) = A L2 -> A R2
So we are justified in adding a new rule relating R1 C and A R2 (after first reducing both sides as far as possible, and orienting the rule according to the word ordering - see below).

So the Knuth-Bendix algorithm basically says, for each critical pair, introduce a new rule, until there are no more critical pairs. There's a little bit more to it than that:
- We want the rewrite system to reduce the word. That means that we want an ordering on words, and given a pair, we want to make them into a rule that takes the greater to the lesser, rather than vice versa. The most obvious ordering to use is called shortlex: take longer words to shorter words, and if the lengths are equal, use alphabetical ordering.
- Whenever we introduce a new rule, it might be that the left hand side of some existing rule becomes reducible. In that case, the existing rule becomes redundant, since any word that it would reduce can now be reduced by the new rule.

Here's the code:




-- given two strings x,y, find if possible a,b,c with x=ab y=bc
findOverlap xs ys = findOverlap' [] xs ys where
    findOverlap' as [] cs = Nothing
    findOverlap' as (b:bs) cs =
        if (b:bs) `L.isPrefixOf` cs
        then Just (reverse as, b:bs, drop (length (b:bs)) cs)
        else findOverlap' (b:as) bs cs

shortlex x y = compare (length x, x) (length y, y)

ordpair x y =
    case shortlex x y of
    LT -> Just (y,x)
    EQ -> Nothing
    GT -> Just (x,y)

knuthBendix1 rules = knuthBendix' rules pairs where
    pairs = [(lri,lrj) | lri <- rules, lrj <- rules, lri /= lrj]
    knuthBendix' rules [] = rules
    knuthBendix' rules ( ((li,ri),(lj,rj)) : ps) =
        case findOverlap li lj of
        Nothing -> knuthBendix' rules ps
        Just (a,b,c) -> case ordpair (rewrite rules (ri++c)) (rewrite rules (a++rj)) of
                        Nothing -> knuthBendix' rules ps -- they both reduce to the same thing
                        Just rule' -> let rules' = reduce rule' rules
                                          ps' = ps ++
                                                [(rule',rule) | rule <- rules'] ++
                                                [(rule,rule') | rule <- rules']
                                      in knuthBendix' (rule':rules') ps'
    reduce rule@(l,r) rules = filter (\(l',r') -> not (L.isInfixOf l l')) rules


For example:


> knuthBendix1 [("aaaa",""), ("bb",""), ("aaab","ba")]
[("baa","aab"),("bab","aaa"),("aba","b"),("aaaa",""),("bb",""),("aaab","ba")]


A few words about the Knuth-Bendix algorithm:
- It is not guaranteed to terminate. Every time we introduce a new rule, we have the potential to create new critical pairs, and there are pathological examples where this goes on forever
- The algorithm can be made slightly more efficient, by doing things like choosing to process shorter critical pairs first. In the HaskellForMaths library, a more efficient version is given, called simply "knuthBendix"

Back to the example. So Knuth-Bendix has found three new rules. The full system, with these new rules added, has no more critical pairs. As a consequence, it is a confluent rewrite system - meaning that if you start at some given word, and reduce it using the rules, then it doesn't matter in what order you apply the rules, you will always end up at the same word. This word that you end up with can therefore be used as a normal form.

This allows us to "solve the word problem" for this group. That is, given any two words in the generator symbols, we can find out whether they represent the same group element by rewriting them both, and seeing if they end up at the same normal form. For example:


> let rules = knuthBendix [("aaaa",""), ("bb",""), ("aaab","ba")]
> rewrite rules "aaaba"
"aab"
> rewrite rules "baabb"
"aab"
> rewrite rules "babab"
"b"


So we see that "aaaba" and "baabb" represent the same group element, whereas "babab" represents a different one. (If you want, you could go back and check this using the original permutations.)

We can even list (the normal forms of) all elements of the group. What we do is start with the empty word (which represents the identity element of the group), and then incrementally build longer and longer words. At each stage, we look at all combinations that can be formed by pre-pending a generator symbol to a word from the preceding stage. However, if we ever come across a word which can be reduced, then we know that it - and any word that could be formed from it at a later stage - is not a normal form, and so can be discarded. Here's the code:




nfs (gs,rs) = nfs' [[]] where
    nfs' [] = [] -- we have run out of words
    nfs' ws = let ws' = [g:w | g <- gs, w <- ws, (not . isNewlyReducible) (g:w)]
              in ws ++ nfs' ws'
    isNewlyReducible w = any (`L.isPrefixOf` w) (map fst rs)

elts (gs,rs) = nfs (gs, knuthBendix rs)


For example:


> elts (['a','b'], [("aaaa",""), ("bb",""), ("aaab","ba")])
["","a","b","aa","ab","ba","aaa","aab"]


As expected, we have eight elements.

That's enough for now. Next time (hopefully) I'll look at some more examples.

Sunday, 25 April 2010

Block systems and block homomorphism



Recently in this blog, we looked at the strong generating set (SGS) algorithm for permutation groups, and how we can use it to investigate the structure of groups. Last time, we saw how to partially "factor" intransitive groups, using the transitive constituent homomorphism. (Recall that by "factoring" a group G, we mean finding a proper normal subgroup K, and consequently also a quotient group G/K - which is equivalent to finding a proper homomorphism from G.) This time, I want to do the same for imprimitive groups. So, what is an imprimitive group?

Well, given a permutation group acting on a set X, it can happen that X consists of "blocks" Y1, Y2, ... of points which always "move together". That is, a subset Y of X is a block if for all g in G, Y^g (the image of Y under the action of g) is either equal to Y or disjoint from it. A full set of blocks (that is, blocks Y1, Y2, ... which are disjoint, and whose union is the whole of X) is called a block system.

For example, suppose that X is the vertices of the hexagon. The symmetry group of the hexagon is the dihedral group D12, generated by a rotation and a reflection:

> :load Math.Algebra.Group.Subquotients
> mapM_ print $ _D 12
[[1,2,3,4,5,6]]
[[1,6],[2,5],[3,4]]


A block system for the hexagon is shown below. The blocks are the pairs of opposite vertices. You can verify that they satisfy the definition of blocks: any symmetry must take a pair of opposite points either to itself, or to another pair disjoint from it.

A given group can have more than one block system. Here is another block system for the hexagon. The blocks are the two equilateral triangles formed by the vertices.

There are also the trivial block systems, consisting of either just one block containing all the points, or a block for each point. From now on, we will silently exclude these.

So, I was meant to be telling you what an imprimitive group is. Well, it's just a group which has a non-trivial block system. Conversely, a primitive group is one which has no non-trivial block system.

When we have an imprimitive group, we will be able to form a homomorphism - and hence factor the group - by considering the induced action of the group on the blocks. But I'm jumping ahead. First we need to write some Haskell code - to find block systems.

The idea is to write a function that, given a pair of points Y = {y1,y2} in X (or indeed any subset Y of X), can find the smallest block containing Y. The way it works is as follows. We start by supposing that each point is in a block of its own, except for the points in Y. We initialise a map, with the points in X as keys, and the blocks as values, where we represent a block by its least element.

Now, suppose that we currently think that the minimal block is Y = {y1,y2,...}. What we're going to do is work through the elements of Y, and work through the generators of G, trying to find a problem. So suppose that we have got as far as some element y of Y, and some generator g of G. We know that y is in the same block as y1, and what we have to check is that y^g is in the same block as y1^g. So we look up their representatives in the map, and check that they're the same. If they're not, then we need to merge the two classes. Here's the code (it's a little opaque, but it's basically doing what I just described).


minimalBlock gs ys@(y1:yt) = minimalBlock' p yt gs where
    xs = foldl union [] $ map supp gs
    p = M.fromList $ [(yi,y1) | yi <- ys] ++ [(x,x) | x <- xs \\ ys]
    minimalBlock' p (q:qs) (h:hs) =
        let r = p M.! q         -- representative of class containing q
            k = p M.! (q .^ h)  -- rep of class (q^h)
            l = p M.! (r .^ h)  -- rep of class (r^h)
        in if k /= l -- then we need to merge the classes
           then let p' = M.map (\x -> if x == l then k else x) p
                    qs' = qs ++ [l]
                in minimalBlock' p' (q:qs') hs
           else minimalBlock' p (q:qs) hs
    minimalBlock' p (q:qs) [] = minimalBlock' p qs gs
    minimalBlock' p [] _ =
        let reps = toListSet $ M.elems p
        in L.sort [ filter (\x -> p M.! x == r) xs | r <- reps ]
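
For example, in the hexagon, a pair of opposite vertices generates the block system of opposite pairs, whereas a pair of adjacent vertices generates only the trivial single-block system. (Note that minimalBlock actually returns the whole block system generated by the pair.)

> minimalBlock (_D 12) [1,4]
[[1,4],[2,5],[3,6]]
> minimalBlock (_D 12) [1,2]
[[1,2,3,4,5,6]]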

Once we have this function, then finding the block systems is simple - just take each pair {x1,xi} from X, and find the minimal block containing it.

blockSystems gs
    | isTransitive gs = toListSet $ filter (/= [x:xs]) $ map (minimalBlock gs) [ [x,x'] | x' <- xs ]
    | otherwise = error "blockSystems: not transitive"
    where x:xs = foldl union [] $ map supp gs

If we have an SGS for G, then we can do slightly better. For suppose that within the stabiliser Gx1, there is an element taking xi to xj. Then clearly xi and xj must be in the same minimal block. So in fact, we need only consider pairs {x1,xi}, with xi the minimal element of each orbit in Gx1. (Of course, the point is that if we have an SGS for G, then we can trivially list a set of generators for Gx1.)

blockSystemsSGS gs = toListSet $ filter (/= [x:xs]) $ map (minimalBlock gs) [ [x,x'] | x' <- rs ]
    where x:xs = foldl union [] $ map supp gs
          hs = filter (\g -> x < minsupp g) gs -- sgs for stabiliser Gx
          os = orbits hs
          rs = map head os ++ (xs \\ L.sort (concat os)) -- orbit representatives, including singleton cycles


Let's test it:

> mapM_ print $ blockSystems $ _D 12
[[1,3,5],[2,4,6]]
[[1,4],[2,5],[3,6]]


Okay, so given a group, we can find its non-trivial block systems, if any. What next? Well, as I hinted earlier, this enables us to factor the group. For if there is a non-trivial block system, then the action of the group on the points induces a well-defined action on the blocks. This induced action gives us a homomorphism from our original group G, a subgroup of Sym(X), to another group H, a subgroup of Sym(B), where B is the set of blocks.

So as we did last time, we can find the kernel and image of the homomorphism, and thus factor the group. How do we do that?

Well, it's simple. In the following code, the function lr takes a group element acting on the points, and returns a group element acting on the blocks (in the Left side) and the points (in the Right side) in an Either union. If we do this to all the group generators, and then find an SGS, then as the Left blocks sort before the Right points, then the SGS will split neatly into two parts:
- The initial segment of the SGS will consist of elements which move the Left blocks. If we restrict their action to just the blocks, we will have an SGS for the image of the homomorphism, acting on the blocks.
- The final segment of the SGS will consist of elements which fix all the Left blocks. These elements move points but not blocks, so they form an SGS for the kernel of the homomorphism.

blockHomomorphism' gs bs = (ker,im) where
    gs' = sgs $ map lr gs
    lr g = fromPairs $ [(Left b, Left $ b -^ g) | b <- bs] ++ [(Right x, Right y) | (x,y) <- toPairs g]
    ker = map unRight $ dropWhile (isLeft . minsupp) gs' -- stabiliser of the blocks
    im = map restrictLeft $ takeWhile (isLeft . minsupp) gs' -- restriction to the action on blocks

blockHomomorphism gs bs
    | bs == closure bs [(-^ g) | g <- gs] -- validity check on bs
        = blockHomomorphism' gs bs


Let's try it out on our two block systems for the hexagon:

> blockHomomorphism (_D 12) [[1,4],[2,5],[3,6]]
([[[1,4],[2,5],[3,6]]],
 [[[[1,4],[2,5],[3,6]]],[[[2,5],[3,6]]]])


I've formatted the output for clarity. The first line is (an SGS for) the kernel, consisting of elements of D12 which permute points within the blocks, without permuting the blocks. In this case, the kernel is generated by the 180 degree rotation, which swaps the points within each pair. The second line is (an SGS for) the image, consisting of the induced action of D12 on the blocks. In this case, we have the full permutation group S3 acting on the three pairs of points.

> blockHomomorphism (_D 12) [[1,3,5],[2,4,6]]
([[[1,5,3],[2,6,4]],[[2,6],[3,5]]],
 [[[[1,3,5],[2,4,6]]]])


In this case the kernel is generated by a 120 degree rotation and a reflection, and consists of all group elements which send odd points to odd and even points to even, thus preserving the blocks. The image has only one non-trivial element, which just swaps the two blocks.

Armed with this new tool, let's have another look at Rubik's cube. Recall that we labelled the faces of the cube as follows:

[Image: the labelled faces of the cube]

Last time, we split the Rubik cube group into two homomorphic images - a group acting on just the corner faces, and a group acting on just the edge faces. Let's look for block systems in these groups:

> :load Math.Projects.Rubik
> let [cornerBlocks] = blockSystems imCornerFaces
> let [edgeBlocks] = blockSystems imEdgeFaces
> cornerBlocks
[[1,17,23],[3,19,41],[7,29,31],[9,33,47],[11,21,53],[13,43,51],[27,37,59],[39,49,57]]
> edgeBlocks
[[2,18],[4,26],[6,44],[8,32],[12,52],[14,22],[16,42],[24,56],[28,34],[36,48],[38,58],[46,54]]


It's obvious really - in the corner group, we have a block system with blocks consisting of the three corner faces that belong to the same corner piece, and in the edge group, we have a block system with blocks consisting of the two edge faces that belong to the same edge piece. Furthermore, these are the only block systems.

So we can form the kernel and image under the block homomorphism:

> let (kerCornerBlocks,imCornerBlocks) = blockHomomorphism imCornerFaces cornerBlocks
> let (kerEdgeBlocks,imEdgeBlocks) = blockHomomorphism imEdgeFaces edgeBlocks


If we look at the sizes of these groups, the structure will be obvious:

> orderSGS kerCornerBlocks
2187
> orderSGS imCornerBlocks
40320


These are 3^7 and 8! respectively. The kernel consists of the permutations of the corner faces which leave the corner blocks where they are. It turns out that whenever you twist one corner block, you must untwist another. So when you have decided what to do with seven corners, the eighth is determined - hence 3^7. For the image, we have eight blocks, and 8! permutations of them, so this must be the full symmetric group S8 - meaning that we can perform any rearrangement of the corner blocks that we desire.

> orderSGS kerEdgeBlocks
2048
> orderSGS imEdgeBlocks
479001600


These are 2^11 and 12! respectively. For the kernel, whenever we flip one edge piece we must also flip another. So when we have decided what to do with eleven edges, the twelfth is determined - hence 2^11. For the image, we have twelve pieces, and 12! permutations of them, so we have the full symmetric group S12 on the edge blocks.
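
These counts are easy to sanity-check in GHCi:

> (3^7, product [1..8], 2^11, product [1..12])
(2187,40320,2048,479001600)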

That's it.

Incidentally, my references for this material are:
- Holt, Handbook of Computational Group Theory
- Seress, Permutation Group Algorithms
both of which are very good - but expensive.

These books, particularly the latter, go on to describe further algorithms that can be used to factor even transitive, primitive groups - where neither of the two homomorphisms we have seen applies - enabling us to arrive at a full decomposition of a group into simple groups. Unfortunately, the algorithms get a bit more complicated after this, and I haven't yet implemented the rest in HaskellForMaths.

Tuesday, 23 March 2010

Transitive constituent homomorphism

[New version HaskellForMaths 0.2.1 available here]

Last time, we looked at what it means to be a strong generating set (SGS) for a permutation group, and how to find one. We saw that with an SGS we can easily calculate the number of elements in the group, and test whether an arbitrary permutation is a member of the group.

This time, I want to start showing how strong generating sets can be used as the basis for further algorithms to investigate the structure of groups. In a previous post, I explained about normal subgroups and quotient groups. Over the next couple of posts, I want to show how, for groups having the appropriate form, we can find some of the easier normal subgroups and quotient groups. Specifically, in this post I want to look at intransitive groups, and in the next at imprimitive groups.

Many of the groups we have looked at so far - such as the symmetries of the pentagon, cube, and so on - have been "transitive" - meaning that given any two points, there is a group element which takes one to the other. (Or to put it another way, all the points "look the same".)

However, it is easy to find groups that are not transitive on their points. For example, if we take any graph which is not regular (that is, it has vertices of different valencies), then the symmetry group is not transitive (because no symmetry can take a vertex of one valency to a vertex of another valency).
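
For example, the path graph on three vertices has end vertices of valency 1 and a centre vertex of valency 2, so its only non-trivial symmetry swaps the two ends. A quick check (assuming the graph constructor from Math.Combinatorics.Graph):

> :load Math.Combinatorics.GraphAuts
> mapM_ print $ graphAuts $ graph ([1,2,3], [[1,2],[2,3]])
[[1,3]]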

In order to demonstrate the code that I want to look at this week, I'm going to need an example - so here are the generators of a group G:

> :load Math.Algebra.Group.Subquotients
> let gs = map p [ [[1,2,3]], [[4,5,6]], [[1,2],[4,5]] ]


We can think of this group as the symmetry group of some kind of plastic puzzle, as shown below. The puzzle consists of a blue and a red triangle. Valid moves consist of rotating the blue triangle, rotating the red triangle, or performing a double flip, which swaps a pair of blue points and at the same time a pair of red points.

[Image: the puzzle - two triangles, one blue and one red, with points labelled 1 to 6]

It's a small group, so let's have a look at the elements:

> mapM_ print $ elts gs
[]
[[1,2],[4,5]]
[[1,2],[4,6]]
[[1,2],[5,6]]
[[1,2,3]]
[[1,2,3],[4,5,6]]
[[1,2,3],[4,6,5]]
[[1,3,2]]
[[1,3,2],[4,5,6]]
[[1,3,2],[4,6,5]]
[[1,3],[4,5]]
[[1,3],[4,6]]
[[1,3],[5,6]]
[[2,3],[4,5]]
[[2,3],[4,6]]
[[2,3],[5,6]]
[[4,5,6]]
[[4,6,5]]
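
Counting them, there are 18 elements, as we'd expect: 3 rotations of the blue triangle, times 3 rotations of the red triangle, times 2 for whether or not we have done the double flip.

> length $ elts gs
18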


Notice that the group is not transitive: there is no move that takes a blue point to a red point, or vice versa.

So we have a group G, acting on a set X. If G is not transitive on X, then we can write X as a disjoint union of subsets A, B, C, ..., such that G is transitive on each of the subsets. In our example, we can take A = [1,2,3], B = [4,5,6], and we see that G is transitive on A, and on B. A and B are called the transitive constituents (or the orbits) of G.

It's fairly straightforward to write Haskell code to find the orbits of a permutation group. For our example, this gives:

> orbits gs
[[1,2,3],[4,5,6]]


[The implementation is left as an exercise.]
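
For the curious, here is one way it might be done - a minimal sketch (the library's implementation may differ), reusing the closure function that appears in the code later in this post:

-- assumes supp, (.^) and closure from the library, and union, (\\) from Data.List
orbits :: (Ord a) => [Permutation a] -> [[a]]
orbits gs = orbits' (foldl union [] (map supp gs))
    where orbits' [] = []
          orbits' (x:xs) = let o = closure [x] [(.^ g) | g <- gs] -- the orbit of x under the group generated by gs
                           in o : orbits' (xs \\ o)

Note that fixed points never appear, since supp lists only the moved points - which is why singleton orbits don't show up in the output above.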

Okay, so what's so interesting about intransitive groups? Well, an intransitive group always has a non-trivial normal subgroup, so we can always "factor" it into smaller groups. How?

Well, take one of the transitive constituents. In our example, let's take [1,2,3]. Then we can consider the restriction of the action of the group to just this constituent. For our generators, the restriction gives us:

[[1,2,3]] -> [[1,2,3]]
[[4,5,6]] -> []
[[1,2],[4,5]] -> [[1,2]]


It might look as if the first generator got taken to itself - but it is important to realise that these are elements of different groups. This would have been clearer if we had written out the permutations in row notation:

1 2 3 4 5 6 -> 1 2 3
2 3 1 4 5 6    2 3 1


Or if we had included singleton cycles in the cycle notation:

[[1,2,3],[4],[5],[6]] -> [[1,2,3]]

So this restriction of the action is a map from our group - a subgroup of Sym([1..6]) - to another group - a subgroup of Sym([1..3]).

If we call this map f, then it should be obvious that f has the following properties:
- f(1) = 1 (where 1 here means the identity element in the respective groups, not the point 1)
- f(g*h) = f(g)*f(h) (consider their action on a point x)
- f(g^-1) = f(g)^-1

Hence f is a homomorphism - a function between groups that preserves the group structure.

The kernel of a homomorphism is the set {g <- G | f(g) = 1}. The kernel is always a normal subgroup:
It's obviously a subgroup. (Why?)
And f(h^-1 g h) = f(h)^-1 f(g) f(h) (by the homomorphism properties).
So if f(g) = 1, then f(h^-1 g h) = 1.
In other words, if g is in the kernel, then so is h^-1 g h - which is just the condition for the kernel to be a normal subgroup.

The image of a homomorphism is the set {f(g) | g <- G}. This is naturally isomorphic to the quotient group G/K, where K is the kernel: for k <- K we have f(gk) = f(g) f(k) = f(g) (since f(k) = 1), so elements of the image are in one-to-one correspondence with cosets of the kernel - and it is easy to check that the group operations correspond too.

Hence any (non-trivial) homomorphism induces a "factorisation" of the group into smaller groups K and G/K - the kernel and the image.

In our example, the kernel is < [[4,5,6]] > <= Sym([1..6]), and the image is < [[1,2,3]], [[1,2]] > = Sym([1..3]).
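
We can spot-check in GHCi that this kernel really is closed under conjugation (using elts from earlier - a check, not a proof):

> let ker = elts [p [[4,5,6]]]
> and [ (inverse h * k * h) `elem` ker | k <- ker, h <- elts gs ]
True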

We would like a Haskell function that can find the kernel and image of a transitive constituent homomorphism for us, like so:

> transitiveConstituentHomomorphism gs [1,2,3]
([[[4,5,6]]],[[[1,2,3]],[[1,2]]])


So how do we do that? Well, the basic idea is that we will convert all our permutations on X into permutations on Either A B, where A is the transitive constituent we are restricting to, and B is the rest. In our group, we have X = [1..6], A = [1,2,3], B = [4,5,6], so for example we will convert [[1,2],[4,5]] into [[Left 1, Left 2], [Right 4, Right 5]].

Next, we construct a strong generating set for the new group. Since the SGS algorithm looks for point stabilisers /in order/, it will first find point stabilisers for all the Left points, and then for all the Right points. So our SGS will split into two parts:
- A first part consisting of elements that move some of the Left points. If we restrict this part to its action on the Left points, we will have an SGS for the image.
- A second part consisting of elements that move only the Right points. This part is an SGS for the kernel.

For example:

> mapM_ print $ sgs $ map p [ [[Left 1, Left 2, Left 3]], [[Right 4, Right 5, Right 6]], [[Left 1, Left 2],[Right 4, Right 5]] ]
[[Left 1,Left 2,Left 3],[Right 4,Right 6,Right 5]]
[[Left 2,Left 3],[Right 4,Right 5]]
[[Right 4,Right 6,Right 5]]


[Note that as our SGS algorithm uses randomness, we might get a different set of generators each time we call it.]

Okay, so we know what to do. Time for some code. We need first of all a few utility functions for packing and unpacking our group elements into the Left / Right parts of the Either type.

isLeft (Left _) = True
isLeft (Right _) = False

isRight (Right _) = True
isRight (Left _) = False


unRight = fromPairs . map (\(Right a, Right b) -> (a,b)) . toPairs

restrictLeft g = fromPairs [(a,b) | (Left a, Left b) <- toPairs g]
-- note that this is doing a filter - taking only the left part of the action - and a map, unLefting


Then the code is simple:


transitiveConstituentHomomorphism gs delta
    | delta == closure delta [(.^ g) | g <- gs] -- check delta is a transitive constituent
        = transitiveConstituentHomomorphism' gs delta
    | otherwise = error "transitiveConstituentHomomorphism: delta is not a transitive constituent"

transitiveConstituentHomomorphism' gs delta = (ker, im) where
    gs' = sgs $ map (fromPairs . map (\(a,b) -> (lr a, lr b)) . toPairs) gs
    -- as delta is a transitive constituent, we will always have a and b either both Left or both Right
    lr x = if x `elem` delta then Left x else Right x
    ker = map unRight $ dropWhile (isLeft . minsupp) gs' -- pointwise stabiliser of delta
    im = map restrictLeft $ takeWhile (isLeft . minsupp) gs' -- restriction of the action to delta


That's it.


Okay, so the transitive constituent homomorphism gives us a way to "take apart" intransitive groups. Let's use it to take another look at the Rubik cube group.

Remember how we labelled the faces of the Rubik cube:

[Image: the labelled faces of the Rubik cube]

Then the generators of the Rubik cube group are:

f = p [[ 1, 3, 9, 7],[ 2, 6, 8, 4],[17,41,33,29],[18,44,32,26],[19,47,31,23]]
b = p [[51,53,59,57],[52,56,58,54],[11,27,39,43],[12,24,38,46],[13,21,37,49]]
l = p [[21,23,29,27],[22,26,28,24],[ 1,31,59,11],[ 4,34,56,14],[ 7,37,53,17]]
r = p [[41,43,49,47],[42,46,48,44],[ 3,13,57,33],[ 6,16,54,36],[ 9,19,51,39]]
u = p [[11,13,19,17],[12,16,18,14],[ 1,21,51,41],[ 2,22,52,42],[ 3,23,53,43]]
d = p [[31,33,39,37],[32,36,38,34],[ 7,47,57,27],[ 8,48,58,28],[ 9,49,59,29]]

rubikCube = [f,b,l,r,u,d]


What orbits does it have?

> :load Math.Projects.Rubik
> orbits rubikCube
[[1,3,7,9,11,13,17,19,21,23,27,29,31,33,37,39,41,43,47,49,51,53,57,59],[2,4,6,8,12,14,16,18,22,24,26,28,32,34,36,38,42,44,46,48,52,54,56,58]]


It's obvious really: the odd-numbered points are corner faces, and the even-numbered points are edge faces - and there is no valid move that can take a corner face to an edge face, or vice versa. So the group is intransitive, and we can define:

[cornerFaces,edgeFaces] = orbits rubikCube

We then have two different ways to split the group, depending which transitive constituent we take:

(kerCornerFaces,imCornerFaces) = transitiveConstituentHomomorphism rubikCube cornerFaces

(kerEdgeFaces,imEdgeFaces) = transitiveConstituentHomomorphism rubikCube edgeFaces


Note that:

> orderSGS imCornerFaces * orderSGS imEdgeFaces
86504006548979712000
> let rubikSGS = sgs rubikCube
> orderSGS rubikSGS
43252003274489856000
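
In fact, the first number is exactly twice the second:

> (orderSGS imCornerFaces * orderSGS imEdgeFaces) `div` orderSGS rubikSGS
2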


Why are they not equal? It is because in fact we can't operate on corners and edges totally independently. The following examples show that we can't swap a pair of corners without also swapping a pair of edges, and vice versa:

> isMemberSGS rubikSGS (p [[1,3],[17,41],[19,23],[2,6],[18,44]])
True
> isMemberSGS rubikSGS (p [[1,3],[17,41],[19,23]])
False
> isMemberSGS rubikSGS (p [[2,6],[18,44]])
False


(This is similar to the way that in our original example, we couldn't swap a pair of blue points without also swapping a pair of red points.)

That's it for now. Next time, I'll show how we can factor the corner and edge groups still further, to arrive at a fairly complete understanding of the structure of the Rubik cube group.

(By the way, I would like to acknowledge a debt to the following example from the GAP system: http://www.gap-system.org/Doc/Examples/rubik.html )

Sunday, 14 February 2010

How to find a strong generating set

Nearly three months since my last post - sorry! I have an excuse - we've been moving house - but the truth is I've had a bit of writer's block over this post. Anyway...

Previously in this blog we have been looking at permutation groups, and especially those which arise as the symmetry groups of graphs or other combinatorial objects. We developed a "graphAuts" function, which finds generators for the group of symmetries of a graph. In fact, "graphAuts" finds a strong generating set or SGS, which is a generating set of a special form, making various calculations in permutation groups relatively easy. What I want to look at this week, is how we can find a strong generating set for an arbitrary permutation group, given generators for the group.

Let's remind ourselves what a strong generating set looks like. Here's q 3, the graph of the cube.

[Image: q 3, the graph of the cube, with vertices numbered 0 to 7]

The graphAuts function returns us a strong generating set for the symmetry group of the cube:

> :load Math.Combinatorics.GraphAuts
> mapM_ print $ graphAuts $ q 3
[[0,1],[2,3],[4,5],[6,7]]
[[0,2,3,1],[4,6,7,5]]
[[0,4,6,7,3,1],[2,5]]
[[1,2],[5,6]]
[[1,4,2],[3,5,6]]
[[2,4],[3,5]]

So what is special about this set of symmetries? Well, first of all, the strong generating set is a generating set for the group - all symmetries of the cube can be expressed as products of these generators (and their inverses). But notice how the SGS is composed of a number of "levels". The first level consists of some elements that move 0. The second level consists of some elements that fix 0 but move 1. The third level consists of an element that fixes 0 and 1 but moves 2. So we can think of there being a 0-level, a 1-level, and a 2-level. The sequence [0,1,2] is called the base for the SGS.

Let's call our group G. Now, if we discard the first level of the SGS (the 0-level), the elements that remain form an SGS for the subgroup of elements that fix 0, which is called the point stabiliser of 0, denoted G0. If we also discard the second level (the 1-level), the element that remains is an SGS for the subgroup of elements that fix both 0 and 1, which is called the pointwise stabiliser of {0,1}, denoted G{0,1}. So an SGS defines a sequence of subgroups called a point stabiliser chain for the group:

G_0 = G                    >  G_1 = G_{0}        >  G_2 = G_{0,1}  >  G_{0,1,2} = 1
[[0,1],[2,3],[4,5],[6,7]]     [[1,2],[5,6]]         [[2,4],[3,5]]
[[0,2,3,1],[4,6,7,5]]         [[1,4,2],[3,5,6]]
[[0,4,6,7,3,1],[2,5]]         [[2,4],[3,5]]
[[1,2],[5,6]]
[[1,4,2],[3,5,6]]
[[2,4],[3,5]]

Now, consider a pair of successive subgroups within this chain, for example, G_{0} and G_{0,1}. G_{0} consists of some elements that fix 1, some elements that send 1 to 2, and some elements that send 1 to 4. The elements that fix 1 are just the subgroup G_{0,1}, the point stabiliser of 1 within G_{0}. The elements that send 1 to 2, and those that send 1 to 4, are cosets of G_{0,1} within G_{0}. We look for a representative of each coset: an element that sends 1 to 2 - [[1,2],[5,6]]; an element that sends 1 to 4 - [[1,4,2],[3,5,6]]; and for an element that sends 1 to 1 (that is, fixes 1), we can take the identity. Once we have done that, every element of G_{0} can be expressed as the product of an element of G_{0,1} with one of these representatives.

Just to show you what I mean, here are some representatives for each link in the chain for the cube group:

U_0                              U_1                      U_2
(0,[])                           (1,[])                   (2,[])
(1,[[0,1],[2,3],[4,5],[6,7]])    (2,[[1,2],[5,6]])        (4,[[2,4],[3,5]])
(2,[[0,2,3,1],[4,6,7,5]])        (4,[[1,4,2],[3,5,6]])
(3,[[0,3],[1,2],[4,7],[5,6]])
(4,[[0,4,6,7,3,1],[2,5]])
(5,[[0,5,6,3],[1,4,7,2]])
(6,[[0,6,3],[1,4,7]])
(7,[[0,7],[1,6],[2,5],[3,4]])

At each link in the point stabiliser chain, we have a base element b that we are going to stabilise. For each point within the orbit of b, we find a group element that takes b to that point. This set of group elements, U_b, is a set of coset representatives - a transversal for the cosets.

The Haskell code to find these transversals, given an SGS, is fairly straightforward:


import qualified Data.Map as M

baseTransversalsSGS gs = [let hs = filter ( (b <=) . minsupp ) gs in (b, cosetRepsGx hs b) | b <- bs]
    where bs = toListSet $ map minsupp gs

cosetRepsGx gs x = cosetRepsGx' gs M.empty (M.singleton x 1) where
    cosetRepsGx' gs interior boundary
        | M.null boundary = interior
        | otherwise =
            let interior' = M.union interior boundary
                boundary' = M.fromList [(p .^ g, h*g) | g <- gs, (p,h) <- M.toList boundary] M.\\ interior'
            in cosetRepsGx' gs interior' boundary'

Here it is in action:

> :load Math.Algebra.Group.RandomSchreierSims
> let gs = map p [ [[0,1],[2,3],[4,5],[6,7]], [[0,2,3,1],[4,6,7,5]], [[0,4,6,7,3,1],[2,5]], [[1,2],[5,6]], [[1,4,2],[3,5,6]], [[2,4],[3,5]] ]
> mapM_ print $ baseTransversalsSGS gs
(0, fromList [(0,[]), (1,[[0,1],[2,3],[4,5],[6,7]]), (2,[[0,2,3,1],[4,6,7,5]]), (3,[[0,3],[1,2],[4,7],[5,6]]), (4,[[0,4,6,7,3,1],[2,5]]), (5,[[0,5,6,3],[1,4,7,2]]), (6,[[0,6,3],[1,4,7]]), (7,[[0,7],[1,6],[2,5],[3,4]])])
(1, fromList [(1,[]), (2,[[1,2],[5,6]]), (4,[[1,4,2],[3,5,6]])])
(2, fromList [(2,[]), (4,[[2,4],[3,5]])])


Okay, so what do these transversals give us? Well, hopefully it is obvious (by induction) that every element in our original group G can be expressed as a product u2*u1*u0, taking an element ui from each transversal. Moreover, this expression is unique.

This is the key to the usefulness of strong generating sets. For example, it means we can easily calculate the order of the group (the number of elements): just multiply the sizes of the transversals. For the cube group, this tells us that we have 8 * 3 * 2 = 48 elements. We can also use the transversals to create a search tree for the group (more on this some other time). Of particular importance, we can test an arbitrary permutation for membership in the group, by "sifting" it through the transversals, as described below.
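
In code, the order calculation could be as simple as the following one-liner (a sketch - not necessarily how the library defines orderSGS - but consistent with the baseTransversalsSGS above):

orderSGS :: (Ord a) => [Permutation a] -> Integer
orderSGS gs = product [fromIntegral (M.size t) | (_b, t) <- baseTransversalsSGS gs]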

Given a permutation g, we look at its action on the first base, b1, and see if we can find an element in the b1-transversal with the same action. If we can, say u1, then we replace g by g*u1^-1, and proceed to the second base. We now look at the action of g*u1^-1 on b2, and see if there is an element in the b2-transversal with the same action. If g is in our group G, then g = un*...*u1, so at the end of this process, we will get 1. Conversely, if g is not in our group, then at some stage we will fail to find a transversal element matching the action of our g (including possibly the case where we go through all the transversals, but are still left at the end with a g /= 1).

Here's the code:


isMemberSGS gs h = sift bts h where
    bts = baseTransversalsSGS gs
    sift _ 1 = True
    sift ((b,t):bts) g = case M.lookup (b .^ g) t of
                         Nothing -> False
                         Just h -> sift bts (g * inverse h)
    sift [] _ = False
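
Trying it out on the cube SGS gs from above, we should see:

> isMemberSGS gs (p [[0,1],[2,3],[4,5],[6,7]])
True
> isMemberSGS gs (p [[0,1]])
False

(The second permutation swaps two adjacent vertices of the cube while fixing all the others - something no symmetry of the cube can do.)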

Okay, so we've seen what a strong generating set looks like, and why they're useful. Now, suppose someone gives us a set of generators for a group. How do we find a strong generating set for the group?

Well, it's actually fairly simple. We start with a set of empty transversals, corresponding to an empty SGS. What we're going to try to do is add elements to the SGS (and hence to the transversals), until we end up with a full SGS for the group. First of all, we need a slightly modified version of the sift function from above:


sift _ 1 = Nothing
sift ((b,t):bts) g = case M.lookup (b .^ g) t of
                     Nothing -> Just g
                     Just h -> sift bts (g * inverse h)
sift [] g = Just g -- g == 1 case already caught above

In this version, instead of returning either True or False, we return either Nothing, indicating success, or Just g, indicating failure, where g is the part-sifted element that we had at the point of failure.

Okay, so to find an SGS, what we do is feed random elements of the group through the transversals, using the sifting procedure. (How do we generate random elements of the group? - well roughly, we just form random products of the generators - but see below.) If our random element sifts through, then we move on to the next random element. If our random element doesn't sift through, then at some transversal, we have a group element whose action on a base is not matched by any element in the transversal. If this happens, what we have to do is add this group element as a new element to the SGS, and recalculate the transversals. In that way, we will gradually grow the SGS. What we're hoping is that if we look at enough random elements, we'll end up with an SGS for the whole group.

This algorithm is called random Schreier-Sims. It's a Monte Carlo algorithm - meaning it can return the wrong answer: we might stop too soon, and return an SGS which only generates a subgroup, not the whole group. However, we can make this unlikely by choosing enough random elements.

Okay, so first of all, given some generators for a group, we need to be able to generate random elements of the group. One way to do this is called the "product replacement algorithm". What we will do is create an array containing 10 random elements of the group. Then every time we're asked for another random element, we will randomly replace one element of the array, by multiplying it either on the left or the right, by either another element or its inverse. The replacement element will thus be a new random element of the group. Okay, but how do we get 10 random elements in the array in the first place? Well, what we do is start off by putting the generators into the array, repeated as many times as necessary - e.g. a,b,c,a,b,c,a,b,c,a. Then we just run the product replacement a few times to "mix up" the array.

Here's the code. (Note, in this version, we maintain an eleventh element (the zeroth) in the array as an accumulator for results we are going to return - this has been found empirically to give better results.)


import Control.Monad (replicateM_)
import Data.Array.IO (IOArray, newListArray, readArray, writeArray)
import System.Random (randomRIO)
import System.IO.Unsafe (unsafePerformIO) -- used by rss, below

initProdRepl :: (Ord a, Show a) => [Permutation a] -> IO (Int, IOArray Int (Permutation a))
initProdRepl gs =
    let n = length gs
        r = max 10 n -- if we have more than 10 generators, we have to use a larger array
        xs = (1:) $ take r $ concat $ repeat gs
    in do xs' <- newListArray (0,r) xs
          replicateM_ 60 $ nextProdRepl (r,xs') -- perform initial mixing
          return (r,xs')

nextProdRepl :: (Ord a, Show a) => (Int, IOArray Int (Permutation a)) -> IO (Maybe (Permutation a))
nextProdRepl (r,xs) = -- r will usually be 10
    do s <- randomRIO (1,r)
       t <- randomRIO (1,r)
       u <- randomRIO (0,3 :: Int)
       out <- updateArray xs s t u
       return out

updateArray xs s t u =
    let (swap,invert) = quotRem u 2 in
    if s == t
    then return Nothing
    else do
        x_0 <- readArray xs 0
        x_s <- readArray xs s
        x_t <- readArray xs t
        let x_s' = mult (swap,invert) x_s x_t
            x_0' = mult (swap,0) x_0 x_s'
        writeArray xs 0 x_0'
        writeArray xs s x_s'
        return (Just x_0')
    where mult (swap,invert) a b = case (swap,invert) of
                                   (0,0) -> a * b
                                   (0,1) -> a * b^-1
                                   (1,0) -> b * a
                                   (1,1) -> b^-1 * a

One thing to point out: we have to select a random element to replace - x_s - and another random element to multiply by - x_t. We're not allowed to have s=t. The way I get round this is to wrap Maybe around the return type, and return Nothing if I happen to pick s=t. With a little bit of work I could probably do better.

Okay, so we can generate random elements from the generators. Now, to find a strong generating set, we start out with an empty set of transversals, and keep sifting random elements through the transversals, updating the transversals when we find elements that don't sift:


sgs :: (Ord a, Show a) => [Permutation a] -> [Permutation a]
sgs gs = toListSet $ concatMap snd $ rss gs

rss gs = unsafePerformIO $
    do (r,xs) <- initProdRepl gs
       rss' (r,xs) (initLevels gs) 0

rss' (r,xs) levels i
    | i == 25 = return levels -- stop if we've had 25 successful sifts in a row
    | otherwise = do g <- nextProdRepl (r,xs)
                     let (changed,levels') = updateLevels levels g
                     rss' (r,xs) levels' (if changed then 0 else i+1)
-- if we currently have an sgs for a proper subgroup of the group, then that subgroup has index >= 2,
-- so the chance of a random elt sifting to identity is <= 1/2,
-- and hence the chance of 25 successful sifts in a row is <= 2^-25

initLevels gs = [((b,M.singleton b 1),[]) | b <- bs]
    where bs = toListSet $ concatMap supp gs

updateLevels levels Nothing = (False,levels)
updateLevels levels (Just g) =
    case sift (map fst levels) g of
    Nothing -> (False, levels)
    Just g' -> (True, updateLevels' [] levels g' (minsupp g'))

updateLevels' ls (r@((b,t),s):rs) h b' =
    if b == b'
    then reverse ls ++ ((b, cosetRepsGx (h:s) b), h:s) : rs
    else updateLevels' (r:ls) rs h b'

That's it.
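
As a quick sanity check, we can ask for an SGS for S4 (generated by a 4-cycle and a transposition), and use the orderSGS sketch above to count its elements. Since the algorithm is Monte Carlo, we should - with overwhelming probability - get:

> orderSGS $ sgs $ map p [ [[1,2,3,4]], [[1,2]] ]
24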

This algorithm makes it possible to work with much larger permutation groups. We already saw it in use in an earlier post, where I found an SGS for the Rubik's cube group, in order to calculate the number of possible positions (approximately 4*10^19). Over the next few weeks I want to look at how we can use strong generating sets to investigate the structure of groups.
