# What is a unitary operator?

I shall make no attempt at maintaining a uniform level of mathematics in this blog. For instance, the previous post showed a simple approach to a relatively advanced (or, at least, new-ish) concept, and the one before that I found quite difficult, and am still not sure I get it entirely. In this post, I’m going to try to explain a very basic concept, which most people who read blogs like this may consider trivial. This would be somewhat like a talk I would give to students who have recently learnt what Hilbert spaces and bounded linear operators are. What I’m feeling for is a kind of intuition at a very basic level, which can be used to build upon for a more refined intuition in more complex cases. Unless I approach new concepts this way myself, I soon find myself lost. In a way, this post approximates the way I would approach a new concept, and thus is more about process than content.

The reason for this post is that I was reading a bit of ergodic theory again, with an eye to understanding the number-theoretic arguments that rely on it (Furstenberg’s proof of Szemerédi’s theorem, and the like). When I saw the use of unitary operators in the formulation of von Neumann’s ergodic theorem, I wondered how I would explain the use of them to a student with only a scant knowledge of operator theory. The discussion will initially just focus on the kind of unitary transformation a student should already have encountered in linear algebra, but later on involve some more abstract notions.

Let’s first settle on a suitable Hilbert space to use. For now, since we’re going to discuss physical ideas involving lengths and angles, we’ll stick with Euclidean space, and see how the ideas generalise to other spaces later. We start with $\mathbb{R}^3$, with the usual inner product $(x_1 ,y_1 ,z_1 )\cdot (x_2 ,y_2 ,z_2 ) = x_1 x_2 +y_1 y_2 +z_1 z_2.$

Any $3\times 3$ real matrix can be seen as a bounded linear operator $\mathbb{R}^3 \to \mathbb{R}^3$.  (Exercise: Show this.) Since the way in which we “measure” elements of the Hilbert space is through the inner product (and the norm deriving from it), interesting operators would be ones which have some quantifiable relationship with this. For instance, a subspace could be seen as the set of vectors that all satisfy some angular/directional relationship (such as being contained in a plane), and the projection onto this subspace isolates the parts of a vector that satisfy this relationship. So it would seem that a fairly natural question to ask is, which operators will preserve the length and angular relationships between vectors?

The simplest operations on a space might be considered to be translation, rotation and dilation (and the identity mapping, which is not that interesting). Starting with dilation, suppose that $D_{\lambda}$ is an operator that takes a vector $x$ and transforms it to a vector of length $\lambda \| x\|$ in the same direction as $x$. Obviously, this operator acts as $D_{\lambda} (x_1 ,x_2 ,x_3 ) = \lambda (x_1 ,x_2 ,x_3 ).$

The operator can of course be written as a diagonal $3\times 3$ matrix with all diagonal entries equal to $\lambda$. It is easily seen that unless $\lambda = 1$, this operator does not preserve inner products; in fact, $\langle D_{\lambda }x, D_{\lambda}y \rangle = \lambda^2 \langle x, y \rangle$. The same goes for translation, that is, a mapping that takes any $(x_1, x_2, x_3 )$ to $(x_1+ a, x_2+b, x_3 +c )$ (which, for a nonzero translation, is not even a linear operator).
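A quick numerical sanity check, using NumPy (overkill for a $3\times 3$ example, but convenient); the vectors and the value $\lambda = 2$ are arbitrary choices, not anything from the discussion above:

```python
import numpy as np

lam = 2.0
D = lam * np.eye(3)  # dilation: diagonal matrix with lambda on the diagonal

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, -1.0, 0.5])

# The dilation scales every inner product by lambda^2,
# so it does not preserve inner products unless lambda = 1.
print(np.dot(D @ x, D @ y))  # 14.0, which is lambda^2 * <x, y>
print(np.dot(x, y))          # 3.5
```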

You may now complain that these operators do indeed preserve length and angular relationships, and they do to a certain extent. However, they do not preserve those relationships with the origin, which is why the lengths differ. The third kind of transformation does this. Any rotation $U$ on $\mathbb{R}^3$ can be seen to preserve inner products. When you think geometrically of the inner product in terms of the projection of one vector on another, it is clear that the absolute orientation of these vectors does not play a part. Furthermore, given that the rotation can be represented by a $3\times 3$ matrix, we can now deduce the defining property of a unitary operator. Denoting the transpose of a matrix $U$ by $U^{\#}$, we see that $\langle x, y \rangle = \langle Ux ,Uy \rangle = \langle U^{\#} Ux, y \rangle = \langle x, U^{\#} Uy \rangle$

for all $x,y \in \mathbb{R}^3$.

Since this must hold for all relevant $x$ and $y$, it must be that $U^{\#}U = UU^{\#} = 1$. (Any matrix satisfying this property has determinant $-1$ or $1$, although the converse does not hold: not every matrix with determinant $\pm 1$ preserves inner products.)
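We can verify this property numerically for a concrete rotation, say a rotation about the $z$-axis (the angle $0.7$ below is just a placeholder; any angle works):

```python
import numpy as np

theta = 0.7  # arbitrary angle
# Rotation by theta about the z-axis of R^3
U = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])

# The defining property: U^T U = I (up to floating-point error)
print(np.allclose(U.T @ U, np.eye(3)))  # True

# Consequently, inner products are preserved
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, -1.0, 0.5])
print(np.isclose(np.dot(U @ x, U @ y), np.dot(x, y)))  # True
```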

(Another nice example to consider is that of a rotation in the complex plane, which is just multiplication by a complex number of modulus 1.)

With some idea of what a unitary operator in this space is, we can try to make sense of the von Neumann ergodic theorem.

Theorem. Let $U$ be a unitary operator on a Hilbert space $H$. Let $P$ be the orthogonal projection onto the subspace $\{ x\in H: Ux = x\}$. Then, for any $x \in H$, $\lim_{N\to \infty} \frac{1}{N}\sum_{n=0}^{N-1} U^n x = Px$.

Suppose that we were to apply this theorem to the case of rotations in $\mathbb{R}^2$, which is not much different from considering rotations in three dimensions. We immediately run into a problem. If $U$ is a rotation, what is its invariant subspace? Unless $U$ is a full rotation (by a multiple of $2\pi$), only the origin is invariant, and a full rotation is the same as the identity operator! However, this does not imply that the content of the theorem is entirely void. Since the projection onto the invariant subspace $\{ 0\}$ is the zero operator, we see that for any $x \in \mathbb{R}^2$, $\displaystyle \lim_{N\to \infty} \frac{1}{N}\sum_{n=0}^{N-1} U^n x =0.$

Hence, if we average the rotated copies $U^n x$ of any vector (under a rotation that is not full), we must get $0$ in the limit, which makes intuitive sense. In the case of rational rotations (i.e. rotations by rational multiples of $2\pi$), this is not too hard to prove, and in the case of irrational ones it would seem plausible from the Weyl equidistribution theorem.
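If you want to see the averaging in action, here is a small numerical experiment; the angle $\sqrt{2}\,\pi$ is just one convenient irrational multiple of $2\pi$, and the starting vector is arbitrary:

```python
import numpy as np

theta = np.sqrt(2) * np.pi  # an irrational multiple of 2*pi
# Rotation by theta in R^2
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x = np.array([1.0, 0.0])

# Compute the Cesaro average (1/N) * sum_{n=0}^{N-1} U^n x
N = 100_000
avg = np.zeros(2)
v = x.copy()
for n in range(N):
    avg += v
    v = U @ v
avg /= N

# The average should be close to the projection Px = 0
print(np.linalg.norm(avg))  # small, and tending to 0 as N grows
```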

One of the first examples one usually encounters in ergodic theory is the shift operator. Since my introduction to ergodic theory came via Furstenberg’s proof of Szemerédi’s theorem, this felt natural. However, supposing that you are used to unitarity meaning rotation, is there a way to interpret the shift operator as such?

To make sense of this, we have to slightly expand our idea of unitarity. Consider the following operator acting on vectors in $\mathbb{R}^2$: $\displaystyle A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$

It is easy to see that this matrix takes the unit vector $[1,0]$ to the unit vector $[0,1]$, and vice versa. Furthermore, the conjugate transpose of $A$ is itself, and $A^2 = 1$. Thus $A$ is unitary, and is in fact a permutation of the axes. Of course, in two dimensions there are only two permutations possible (it is the permutation group on two elements, after all!), so things will get more interesting in higher dimensions. (Exercise: Show that any permutation of axes in $\mathbb{R}^3$ is unitary. Can all of these permutations be seen as rotations?) In fact, any permutation of an orthonormal basis in a Hilbert space is unitary (think of how a permutation of the basis would affect the inner product). Thus, if we think of the shift operator as a permutation of a countable basis, we see that it fits into the general intuition of a unitary operator.
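As a partial check of the exercise, here is one cyclic permutation of the axes of $\mathbb{R}^3$ (my own choice of example). It is unitary, and its determinant is $+1$, consistent with it being realisable as a rotation, in this case by $120°$ about the axis through $(1,1,1)$:

```python
import numpy as np

# The cyclic permutation of axes (x1, x2, x3) -> (x2, x3, x1)
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])

# P^T P = I, so the permutation is unitary (orthogonal)
print(np.allclose(P.T @ P, np.eye(3)))  # True

# det P = +1: an even permutation, so it can be realised as a rotation.
# (An odd permutation, like a single swap, has det -1 and is a reflection.)
print(np.isclose(np.linalg.det(P), 1.0))  # True
```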

Clearly, there is still much to unpack here. What kind of unitary operators do we get on some common Hilbert spaces, such as $l^2$ or $L^2$? What does unitary equivalence mean? What does it mean when a unitary operator is self-adjoint? Posing questions like these is, I believe, more valuable than reading a single textbook’s introduction on the subject, because you’re trying to make sense of something in terms of what you already know, and you can allow curiosity to lead you. At some point, the concepts will take on a life of their own and your frame of reference will expand, ready to begin making sense of the next, more advanced concept.

Learning mathematics is a lot messier than most textbooks (and lectures) would have us believe. Embrace it.