I understand quaternions!

So I just read the first chapter of John Stillwell’s Naive Lie Theory, and now I understand how quaternions can be used to represent spatial rotation! Previously, I could do the calculation but I didn’t understand why it was true. Here’s the way I understand it, which is inspired by but not identical to Stillwell. First of all, what exactly is the claim that I now understand?

A purely imaginary quaternion x\mathbf{i}+y\mathbf{j}+z\mathbf{k} can be thought of as a vector \mathbf{u}=(x,y,z) in \mathbb{R}^3. Thus we can think of a general quaternion as the sum of a scalar and a vector: w+x\mathbf{i}+y\mathbf{j}+z\mathbf{k}=w+\mathbf{u}. A unit quaternion satisfies w^2+|\mathbf{u}|^2=1, so after rescaling the vector part to unit length, any unit quaternion can be written q=\cos\theta+\mathbf{u}\sin\theta for some angle \theta and unit vector \mathbf{u}\in\mathbb{R}^3.

The claim concerns rotations in \mathbb{R}^3. The rotation by the angle 2\theta about the axis \mathbf{u}\in\mathbb{R}^3 is represented by the unit quaternion q=\cos\theta+\mathbf{u}\sin\theta. To apply this rotation to a vector \mathbf{v}\in\mathbb{R}^3, we compute q\mathbf{v}\overline{q}. People really like this because it turns 3D spatial rotation into quaternion multiplication, which is much easier and less messy than using 3\times 3 matrices. Computer graphics uses quaternions for this reason. Also, this gives a nice interpretation for quaternions, which otherwise have unclear meaning: rotations of \mathbb{R}^3. Later in this post we will see an even nicer interpretation.
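To make the claim concrete, here is a minimal Python sketch of it. The helper names qmul, conj, and rotate are my own (hypothetical), and quaternions are assumed to be stored as (w, x, y, z) tuples. It rotates the x-axis by 90 degrees about the z-axis, which should land on the y-axis up to rounding.

```python
import math

def qmul(p, q):
    """Hamilton product of quaternions stored as (w, x, y, z) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    )

def conj(q):
    """Quaternion conjugate: negate the vector part."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def rotate(vec, axis, angle):
    """Rotate the 3-vector vec by angle (radians) about the unit vector axis.

    The quaternion uses half the angle, matching q = cos(theta) + u sin(theta)
    representing a rotation by 2*theta.
    """
    half = angle / 2.0
    s = math.sin(half)
    q = (math.cos(half), axis[0]*s, axis[1]*s, axis[2]*s)
    _, x, y, z = qmul(qmul(q, (0.0, *vec)), conj(q))
    return (x, y, z)

# Rotate the x-axis by 90 degrees about the z-axis.
rotated = rotate((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), math.pi / 2)
print(rotated)
```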

Now why does this work?

First let’s see what happens when two vectors (purely imaginary quaternions) are multiplied.

(a\mathbf{i}+b\mathbf{j}+c\mathbf{k})(x\mathbf{i}+y\mathbf{j}+z\mathbf{k})=-(ax+by+cz)+(bz-cy)\mathbf{i}+(cx-az)\mathbf{j}+(ay-bx)\mathbf{k}

We can recognize the right-hand side as minus the dot product plus the cross product of the two vectors:

\mathbf{u}\mathbf{v}=-\mathbf{u}\cdot\mathbf{v}+\mathbf{u}\times\mathbf{v}
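This identity is easy to spot-check numerically. The sketch below uses hypothetical helpers qmul, dot, and cross, with quaternions stored as (w, x, y, z) tuples, and compares the quaternion product of two purely imaginary quaternions against minus the dot product plus the cross product for one pair of integer vectors.

```python
def qmul(p, q):
    """Hamilton product of quaternions stored as (w, x, y, z) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    )

def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

u = (1, 2, 3)
v = (4, 5, 6)

# Embed the vectors as purely imaginary quaternions and multiply.
product = qmul((0, *u), (0, *v))

# Scalar part: minus the dot product. Vector part: the cross product.
expected = (-dot(u, v), *cross(u, v))
print(product == expected)
```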

Two interesting special cases of this:

1) If \mathbf{u} and \mathbf{v} are parallel, then their cross product is 0, so \mathbf{u}\mathbf{v}=-\mathbf{u}\cdot\mathbf{v}. In particular, \mathbf{u}^2 = -\mathbf{u}\cdot\mathbf{u}=-|\mathbf{u}|^2. If \mathbf{u} is a unit vector, then its square is -1, just like \mathbf{i}, \mathbf{j}, and \mathbf{k}. This means that the set of quaternions a+b\mathbf{u} with a and b real is a replica of the complex plane: they add and multiply just like complex numbers a+b\mathbf{i}.

2) If \mathbf{u} and \mathbf{v} are orthogonal, then their dot product is 0, so \mathbf{u}\mathbf{v}=\mathbf{u}\times\mathbf{v}. In this case, let’s define \mathbf{w}=\mathbf{u}\times\mathbf{v}. Now the vectors \mathbf{u}, \mathbf{v}, and \mathbf{w} are mutually orthogonal. Let’s assume that \mathbf{u} and \mathbf{v} are both unit vectors, and see how \mathbf{u}, \mathbf{v}, and \mathbf{w} multiply:

\mathbf{u}^2=-1 (by the first special case)
\mathbf{v}^2=-1
\mathbf{w}^2=-1 (since \mathbf{w}=\mathbf{u}\times\mathbf{v} is a unit vector)
\mathbf{u}\mathbf{v}=\mathbf{w}
\mathbf{v}\mathbf{u}=-\mathbf{w} (because cross product is antisymmetric)
\mathbf{v}\mathbf{w}=-\mathbf{v}\mathbf{v}\mathbf{u}=\mathbf{u} (by previous line, associativity of quaternion multiplication, \mathbf{v}^2=-1)
\mathbf{w}\mathbf{v}=-\mathbf{u}
\mathbf{w}\mathbf{u}=-\mathbf{v}\mathbf{u}\mathbf{u}=\mathbf{v}
\mathbf{u}\mathbf{w}=-\mathbf{v}

Lo and behold, these are the exact same relations that hold among \mathbf{i}, \mathbf{j}, and \mathbf{k} and that are used to define the quaternions! In other words, it turns out that we have made a copy of the quaternions using \mathbf{u}, \mathbf{v}, and \mathbf{w} in place of \mathbf{i}, \mathbf{j}, and \mathbf{k}. This is an automorphism of the quaternions.
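Here is a small exact check of these relations for one particular choice of \mathbf{u} and \mathbf{v}. The vectors (1,2,2)/3 and (2,1,-2)/3 are my own example (orthogonal unit vectors with rational entries), and qmul and neg are hypothetical helpers. Using Fraction keeps the arithmetic exact, so the comparisons need no tolerance.

```python
from fractions import Fraction as F

def qmul(p, q):
    """Hamilton product of quaternions stored as (w, x, y, z) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    )

def neg(q):
    """Negate every component of a quaternion."""
    return tuple(-c for c in q)

# Two orthogonal unit vectors, written as purely imaginary quaternions.
u = (F(0), F(1, 3), F(2, 3), F(2, 3))
v = (F(0), F(2, 3), F(1, 3), F(-2, 3))
w = qmul(u, v)  # equals the cross product, since u and v are orthogonal

minus_one = (F(-1), F(0), F(0), F(0))

# The same relations that i, j, k satisfy.
relations_hold = all([
    qmul(u, u) == minus_one,
    qmul(v, v) == minus_one,
    qmul(w, w) == minus_one,
    qmul(v, u) == neg(w),
    qmul(v, w) == u,
    qmul(w, v) == neg(u),
    qmul(w, u) == v,
    qmul(u, w) == neg(v),
])
print(relations_hold)
```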

What makes this automorphism work? Well, it is essential that \mathbf{u}, \mathbf{v}, and \mathbf{w} be mutually orthogonal and that each of them have length 1. But this is not all: they must also have the correct orientation. For example, \mathbf{u}=\mathbf{i}, \mathbf{v}=\mathbf{j}, \mathbf{w}=-\mathbf{k} would not work, because then \mathbf{u}\mathbf{v}=\mathbf{k}=-\mathbf{w}. The automorphisms of the quaternions are therefore exactly the proper rotations of \mathbb{R}^3.

Now what happens to all the quaternions when we multiply them on the left by a unit vector \mathbf{u}? To answer this, let’s first construct \mathbf{v} and \mathbf{w} as we did above. We let \mathbf{v} be any unit vector orthogonal to \mathbf{u}, and we let \mathbf{w}=\mathbf{u}\mathbf{v}. Now any quaternion can be expressed as a linear combination of 1, \mathbf{u}, \mathbf{v}, and \mathbf{w}, and this is more convenient for our purposes than the conventional expression of a quaternion as a linear combination of 1, \mathbf{i}, \mathbf{j}, and \mathbf{k}. So:

\mathbf{u}(a+b\mathbf{u}+c\mathbf{v}+d\mathbf{w})=-b+a\mathbf{u}-d\mathbf{v}+c\mathbf{w}

What happened? Well, we simultaneously rotated by 90 degrees in the 1,\mathbf{u}-plane and in the \mathbf{v},\mathbf{w}-plane. This is called a double rotation, and it is not possible in 3-dimensional space because it requires two orthogonal planes. Here is a projection onto 3-space of a 4-dimensional hypercube undergoing a double rotation.

[Image: Tesseract]

It is simultaneously rotating and turning inside out. The turning inside out, however, is really a projection of a rotation in an orthogonal plane. The small cube in the center is not really smaller than the cube around it, it is just further away from the viewer. As the hypercube turns inside out, each constituent cube comes closer and goes farther away.

Now let’s take the general case of multiplying quaternions on the left by a unit quaternion q=\cos\theta+\mathbf{u}\sin\theta. For a quaternion x, we get

qx = x\cos\theta+ \mathbf{u}x\sin\theta

This is a linear combination of x and \mathbf{u}x, where \mathbf{u}x is x doubly rotated by 90 degrees. Combining them this way doubly rotates x by \theta. Indeed, if \theta=0 then qx=x and if \theta=90^\circ then qx=\mathbf{u}x. If you think about this you’ll see that as \theta increases qx doubly rotates by the angle \theta in the 1,\mathbf{u}-plane and in the \mathbf{v},\mathbf{w}-plane.

It is important to note that qx rotates at the same rate in both planes. A double rotation where both rotations are of the same angle is called an isoclinic rotation.
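The following sketch checks the left isoclinic rotation on one example. Everything specific here is my own choice: the helpers qmul and coords, the vectors \mathbf{u} and \mathbf{v}, and an angle with \cos\theta=3/5 and \sin\theta=4/5 so that the arithmetic stays exact. It expresses qx in the basis 1, \mathbf{u}, \mathbf{v}, \mathbf{w} and compares the coefficients with an ordinary rotation by \theta applied separately to (a,b) and to (c,d).

```python
from fractions import Fraction as F

def qmul(p, q):
    """Hamilton product of quaternions stored as (w, x, y, z) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    )

def coords(x, basis):
    """Coefficients of x with respect to an orthonormal basis of quaternions."""
    return tuple(sum(s * t for s, t in zip(x, e)) for e in basis)

one = (F(1), F(0), F(0), F(0))
u = (F(0), F(1, 3), F(2, 3), F(2, 3))
v = (F(0), F(2, 3), F(1, 3), F(-2, 3))
w = qmul(u, v)
basis = (one, u, v, w)

# Exact cosine and sine of some angle theta (3-4-5 triangle).
cos_t, sin_t = F(3, 5), F(4, 5)
q = tuple(cos_t * s + sin_t * t for s, t in zip(one, u))

# x = a + b u + c v + d w
a, b, c, d = F(1), F(2), F(3), F(4)
x = tuple(a*e0 + b*e1 + c*e2 + d*e3 for e0, e1, e2, e3 in zip(*basis))

left = coords(qmul(q, x), basis)

# Plane rotation by theta applied to (a, b) and, in the same sense, to (c, d).
expected_left = (a*cos_t - b*sin_t, a*sin_t + b*cos_t,
                 c*cos_t - d*sin_t, c*sin_t + d*cos_t)
print(left == expected_left)
```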

Now what about multiplying quaternions on the right by a unit quaternion q=\cos\theta+\mathbf{u}\sin\theta? Quaternion multiplication isn’t commutative, right? So maybe it will do something different? Well, it does! First let’s just multiply on the right by \mathbf{u}:

(a+b\mathbf{u}+c\mathbf{v}+d\mathbf{w})\mathbf{u} =-b+a\mathbf{u}+d\mathbf{v}-c\mathbf{w}

This is just like multiplying on the left, except the signs of the \mathbf{v} and \mathbf{w} terms have flipped. It is a double rotation again, but now the two planes rotate in opposite directions. The general case works as expected:

xq = x\cos\theta + x\mathbf{u}\sin\theta

So we doubly rotate x by the angle \theta in the 1,\mathbf{u}-plane and by the angle -\theta in the \mathbf{v},\mathbf{w}-plane.
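To see this explicitly, one can expand x=a+b\mathbf{u}+c\mathbf{v}+d\mathbf{w} using the right-multiplication formula for \mathbf{u} (a worked expansion that uses only the relations already derived):

xq=(a\cos\theta-b\sin\theta)+(a\sin\theta+b\cos\theta)\mathbf{u}+(c\cos\theta+d\sin\theta)\mathbf{v}+(d\cos\theta-c\sin\theta)\mathbf{w}

The coefficients of 1 and \mathbf{u} are (a,b) rotated by \theta, while the coefficients of \mathbf{v} and \mathbf{w} are (c,d) rotated by -\theta.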

We distinguish isoclinic rotations by whether they rotate the planes in the same or opposite directions. If they rotate the planes in the same direction, they are called left isoclinic rotations, and if opposite, they are called right isoclinic rotations. Left isoclinic rotations have come about through left quaternion multiplication, and right ones have come about through right quaternion multiplication, but is it possible to do the reverse? No. Every left multiplication yields a left isoclinic rotation, as we have seen, and a left isoclinic rotation can’t also be a right one. This is related to the previous point that \mathbf{u}, \mathbf{v}, and \mathbf{w} must have the same orientation as \mathbf{i}, \mathbf{j}, and \mathbf{k}.

Now, finally, let’s justify the claim above. What happens to a quaternion x when we left-multiply it by q and right-multiply it by \overline{q}? If q=\cos\theta+\mathbf{u}\sin\theta, then \overline{q}=\cos\theta-\mathbf{u}\sin\theta=\cos(-\theta)+\mathbf{u}\sin(-\theta). Thus the two isoclinic rotations performed on x are in the same planes and by the same angle but they are in different directions. Also, one of them is left and one is right. Let’s write down what they do:

Left-multiplication by q: Rotates by \theta in the 1,\mathbf{u}-plane and by \theta in the \mathbf{v},\mathbf{w}-plane.

Right-multiplication by \overline{q}: Rotates by -\theta in the 1,\mathbf{u}-plane and by \theta in the \mathbf{v},\mathbf{w}-plane.

So the combined effect is to rotate by 2\theta in the \mathbf{v},\mathbf{w}-plane and do nothing else! Since we don’t touch the real component of x, we are then rotating \mathbb{R}^3 exclusively. Since we don’t touch the \mathbf{u}-component of x, the vector \mathbf{u} must be our axis of rotation. Thus we did it! This is why conjugation by q is rotation by angle 2\theta about \mathbf{u}!
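The conclusion can be checked exactly on an example. In this sketch the helpers qmul, conj, lin, and rot are hypothetical names, and \mathbf{u}, \mathbf{v}, and the angle (with \cos\theta=3/5, \sin\theta=4/5) are my own choices made so that Fraction arithmetic stays exact. It confirms that conjugation by q fixes 1 and \mathbf{u} and rotates the \mathbf{v},\mathbf{w}-plane by 2\theta.

```python
from fractions import Fraction as F

def qmul(p, q):
    """Hamilton product of quaternions stored as (w, x, y, z) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    )

def conj(q):
    """Quaternion conjugate: negate the vector part."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def lin(a, p, b, q):
    """Return the quaternion a*p + b*q for scalars a and b."""
    return tuple(a * s + b * t for s, t in zip(p, q))

one = (F(1), F(0), F(0), F(0))
u = (F(0), F(1, 3), F(2, 3), F(2, 3))
v = (F(0), F(2, 3), F(1, 3), F(-2, 3))
w = qmul(u, v)

cos_t, sin_t = F(3, 5), F(4, 5)      # exact cos(theta), sin(theta)
cos_2t = cos_t**2 - sin_t**2         # double-angle formulas
sin_2t = 2 * sin_t * cos_t
q = lin(cos_t, one, sin_t, u)

def rot(x):
    """Conjugation x -> q x conj(q)."""
    return qmul(qmul(q, x), conj(q))

fixes_one = rot(one) == one
fixes_axis = rot(u) == u
turns_v = rot(v) == lin(cos_2t, v, sin_2t, w)
turns_w = rot(w) == lin(-sin_2t, v, cos_2t, w)
print(fixes_one, fixes_axis, turns_v, turns_w)
```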

Before I go, here’s some more cool stuff about this. Notice the factor of 2. This is interesting because it means that if \theta=180^\circ we won’t rotate at all. In this case, q=\cos 180^\circ + \mathbf{u}\sin 180^\circ=-1. So the unit quaternions 1 and -1 both correspond to not rotating at all. These are actually the only such unit quaternions, because for every other unit quaternion the angle 2\theta is not a multiple of 360^\circ. More generally, there are exactly two unit quaternions that correspond to any rotation: q=\cos\theta+\mathbf{u}\sin\theta and -q. This means that the 3-sphere of unit quaternions is a double cover of the space of proper rotations of 3-space. This fact was exploited in the very cool game Hypernom.
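A quick sketch of the double cover in action: q and -q rotate a vector identically, because the two sign flips cancel in q\mathbf{v}\overline{q}. The helper names and the sample quaternion (3/5)+(4/5)\mathbf{k} are my own choices, and Fraction is used so the comparison is exact.

```python
from fractions import Fraction as F

def qmul(p, q):
    """Hamilton product of quaternions stored as (w, x, y, z) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    )

def conj(q):
    """Quaternion conjugate: negate the vector part."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def rotate(q, vec):
    """Apply the rotation represented by the unit quaternion q to a 3-vector."""
    _, x, y, z = qmul(qmul(q, (0, *vec)), conj(q))
    return (x, y, z)

q = (F(3, 5), F(0), F(0), F(4, 5))   # cos(theta) + k sin(theta)
minus_q = tuple(-c for c in q)
vec = (F(1), F(2), F(3))

# Both quaternions represent the same rotation about the z-axis.
same = rotate(q, vec) == rotate(minus_q, vec)
print(same)
```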

Also, I’d like to mention rotations in the 4-space that quaternions live in. We know how to do left and right isoclinic rotations in this space, but how do we do general rotations? Well, it turns out we can put together a left and a right isoclinic rotation to do any rotation we want. This can be written qxr, where q and r are unit quaternions which are performing left and right isoclinic rotations on x respectively. I’ll show how to make any single rotation, and you can put these together to make double rotations (not necessarily isoclinic).

Let’s say we want to rotate by the angle 2\theta in the plane spanned by orthogonal unit quaternions y and z. If y and z are both fully imaginary, we already know how to do this. So let’s consider the case wherein y has non-zero real component. If y=1, then z must be fully imaginary. We then follow a procedure very similar to the previous one. We do a left and a right isoclinic rotation, both of which rotate in the same direction by \theta in the 1,z-plane and which cancel in the orthogonal plane. This can be written qxq, where q=\cos\theta+z\sin\theta. Notice the absence of the conjugate.
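Here is a sketch of the y=1 case on an example, assuming q=\cos\theta+\mathbf{u}\sin\theta where the unit vector \mathbf{u} plays the role of z. The helper names, the vectors, and the angle (with \cos\theta=3/5, \sin\theta=4/5) are my own choices. It checks that x\mapsto qxq rotates the 1,\mathbf{u}-plane by 2\theta and leaves the orthogonal \mathbf{v},\mathbf{w}-plane alone.

```python
from fractions import Fraction as F

def qmul(p, q):
    """Hamilton product of quaternions stored as (w, x, y, z) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    )

def lin(a, p, b, q):
    """Return the quaternion a*p + b*q for scalars a and b."""
    return tuple(a * s + b * t for s, t in zip(p, q))

one = (F(1), F(0), F(0), F(0))
u = (F(0), F(1, 3), F(2, 3), F(2, 3))    # plays the role of z in the text
v = (F(0), F(2, 3), F(1, 3), F(-2, 3))
w = qmul(u, v)

cos_t, sin_t = F(3, 5), F(4, 5)          # exact cos(theta), sin(theta)
cos_2t = cos_t**2 - sin_t**2
sin_2t = 2 * sin_t * cos_t
q = lin(cos_t, one, sin_t, u)

def qxq(x):
    """Left and right multiplication by the same q (no conjugate)."""
    return qmul(qmul(q, x), q)

turns_one = qxq(one) == lin(cos_2t, one, sin_2t, u)
turns_u = qxq(u) == lin(-sin_2t, one, cos_2t, u)
fixes_v = qxq(v) == v
fixes_w = qxq(w) == w
print(turns_one, turns_u, fixes_v, fixes_w)
```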

Now if y\neq 1, but still has non-zero real component, we can turn it to be 1, perform the rotation in the way just described, and then turn it back. In more detail:

Step 1. Singly rotate in the 1,y-plane so that y maps to 1. Let z' be the rotated version of z.

Step 2. Rotate in the 1,z'-plane by 2\theta.

Step 3. Perform the same rotation as step 1 but in reverse.

Now, I claimed that every rotation could be written as qxr, but now we have multiple steps. What’s going on? Well, we can write each step this way. Step 1 can be written q_1xr_1, step 2 q_2xr_2, and step 3 is just the inverse of step 1: q_1^{-1}xr_1^{-1}. Doing all these steps sequentially yields q_1^{-1}(q_2(q_1xr_1)r_2)r_1^{-1}=(q_1^{-1}q_2q_1)x(r_1r_2r_1^{-1}). This is of the required form because q_1^{-1}q_2q_1 and r_1r_2r_1^{-1} are both unit quaternions performing left and right isoclinic rotations on x respectively.
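The regrouping is just associativity, which the sketch below confirms on sample data. The four unit quaternions are arbitrary rational examples of my own, not the specific rotations of steps 1 to 3, and qmul and conj are hypothetical helpers. For a unit quaternion the inverse equals the conjugate, so conj stands in for the inverse.

```python
from fractions import Fraction as F

def qmul(p, q):
    """Hamilton product of quaternions stored as (w, x, y, z) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    )

def conj(q):
    """Conjugate, which is also the inverse of a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)

# Arbitrary unit quaternions with rational components (illustrative only).
q1 = (F(1, 2), F(1, 2), F(1, 2), F(1, 2))
r1 = (F(3, 5), F(4, 5), F(0), F(0))
q2 = (F(2, 3), F(1, 3), F(0), F(2, 3))
r2 = (F(0), F(1, 3), F(2, 3), F(2, 3))
x = (F(1), F(2), F(3), F(4))

# Apply the three steps one after another.
step1 = qmul(qmul(q1, x), r1)
step2 = qmul(qmul(q2, step1), r2)
step3 = qmul(qmul(conj(q1), step2), conj(r1))

# Collapse them into a single pair (q, r).
q = qmul(qmul(conj(q1), q2), q1)
r = qmul(qmul(r1, r2), conj(r1))
combined = qmul(qmul(q, x), r)
print(step3 == combined)
```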

So every rotation in 4-space can be accomplished with two unit quaternions, q and r. Rotating a quaternion x then yields qxr. This means that rotations in 4-space can be represented by pairs of unit quaternions. However, just like in the case of rotations in 3-space, there are actually exactly two pairs of unit quaternions corresponding to every 4-spatial rotation: (q,r) and (-q,-r). We can see this because (-q)x(-r)=qxr. But why are these the only two? Well, suppose that we had another pair (q\alpha, \beta r). Then q\alpha x\beta r=qxr, which simplifies to \alpha x\beta=x. This must hold for all x, so we can choose x=1 in particular, to obtain \alpha\beta=1. This means that \beta=\alpha^{-1}=\overline{\alpha}, so \alpha x\beta=\alpha x \overline{\alpha} is a rotation of \mathbb{R}^3. Since it doesn’t change anything, \alpha=\pm 1.
