# Solving symbolic matrix equations in Python with SymPy

I am no Python expert, and have only recently encountered SymPy for symbolic calculations. I have been trying to do some (relatively simple) matrix calculations, and it has taken me an embarrassingly long time to figure out how to do this. So, I am sharing what I have learned here to help someone else avoid the rather large number of internet searches I had to do to piece it together.

There is a lot out there on how to use SymPy to solve matrix equations of the form $Ax = b$. What I am interested in is taking a bunch of given matrices (with numerical values) and constants, performing some operations with an unknown matrix, and setting each entry of the final matrix equal to zero and solving. In other words, suppose we are given $n \times n$ matrices $A$ and $B$, which are determined beforehand. There is some constant $t$ which can be varied (this forms part of an iterative scheme), and an unknown matrix $C$, which is represented purely symbolically, as such:

$\begin{bmatrix} c_{0,0} & c_{0,1} & \cdots &c_{0, n-1} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n-1,0} & c_{n-1,1} & \cdots & c_{n-1,n-1} \end{bmatrix}$

There is a function $f_{A,B,t}: \mathbb{R}^{n\times n}\to\mathbb{R}^{n\times n}$, and we want to find the entries of $C$ for which

$f_{A,B,t}(C) = \mathbf{0},$

where $\mathbf{0}$ indicates the $n\times n$ zero matrix.

In order to solve an equation in SymPy, you have to declare the “symbols” that you are solving for. Now, defining a matrix symbol in SymPy is easy, but this did not help me in solving for the matrix, and I kept getting an empty output. I needed a way to iteratively declare each entry of the matrix as a symbol, whilst putting them together as a single matrix. This turned out to be the key to the whole thing. First, let us state the preamble:

```python
import sympy as sym
from sympy.interactive import printing
printing.init_printing(use_latex=True)
```

The last two lines have no effect on the calculations, but they do give the option of displaying your matrices very nicely. Next, let us define some function with which to work:

```python
def matrixfunction(A, B, C, t):
    mf = A - t*C + B.inv()
    return mf
```

(The final part of the last line is simply how we compute the inverse of $B$.) Here is one of the first things that tripped me up. In SymPy, you should distinguish between operations involving symbolic matrices and usual operations between matrices. For instance, if I were to declare $D$ and $E$ to be two arbitrary $5\times 5$ matrices and wanted, for instance, to multiply them, I would use

```python
D = sym.MatrixSymbol('D', 5, 5)
E = sym.MatrixSymbol('E', 5, 5)
sym.MatMul(D, E)
```

The output to this would be

```
D*E
```

and we would be able to see the symbolic entries of this matrix by using

```python
X = sym.MatMul(D, E)
X.as_explicit()
```

The same holds for MatAdd. However, if you have defined the matrix by declaring all of its entries to be symbols, there does not seem to be a need to use this method, and a simple `*` can be used for multiplication and `+` for addition.
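For instance, with matrices built entry-by-entry (a small sketch; the symbol names are my own), plain `*` and `+` behave as expected:

```python
import sympy as sym

# matrices whose entries are individual symbols, e.g. d0(0), ..., d1(1)
D = sym.Matrix(2, 2, sym.symbols('d0:2(0:2)'))
E = sym.Matrix(2, 2, sym.symbols('e0:2(0:2)'))

# no MatMul/MatAdd needed: ordinary arithmetic works on symbol-entry matrices
F = D*E + D
assert F.shape == (2, 2)
assert F[0, 0] == D[0, 0]*E[0, 0] + D[0, 1]*E[1, 0] + D[0, 0]
```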

Our objective is now to set each entry in the matrix obtained from the function “matrixfunction” equal to zero and solve for the unknown matrix $C$. To do so, we define

```python
def solvefor(A, B, t):
    n = A.shape[0]  # the (numerical) dimension of the square matrices
    C = sym.Matrix(n, n, sym.symbols(f'C0:{n}(0:{n})'))
    sol = sym.solve(matrixfunction(A, B, C, t))
    return sol
```

Of course, $n$ here must be an actual numerical value, namely the common dimension of the matrices. Once the functions are defined, we can assign values to $A$, $B$ and $t$ and run "solvefor". The output will be a set of values assigning the solution value to each entry of $C$. In this case, $B$ has to be invertible. Note that SymPy automatically sets the argument of "sym.solve" equal to zero unless otherwise instructed – in this case, it is set equal to the zero matrix. We are also not specifying which symbols to solve for, since SymPy will automatically solve for the free variables here.
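To see the whole pipeline in action, here is a minimal worked example (the matrices, the value of $t$ and the symbol names are made up for illustration):

```python
import sympy as sym

def matrixfunction(A, B, C, t):
    return A - t*C + B.inv()

n = 2
A = sym.Matrix([[1, 2], [3, 4]])
B = sym.Matrix([[2, 0], [0, 2]])  # invertible, so B.inv() exists
t = sym.Rational(1, 2)

# declare each entry of C as a separate symbol, assembled into one matrix
C = sym.Matrix(n, n, sym.symbols(f'C0:{n}(0:{n})'))
sol = sym.solve(matrixfunction(A, B, C, t))

# substituting the solution back in should give the zero matrix
Csol = C.subs(sol)
assert matrixfunction(A, B, Csol, t) == sym.zeros(n, n)
```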

# Szemerédi’s regularity lemma

I have taken an interest in regularity lemmas recently, starting with the weak regularity lemma of Frieze and Kannan (in order to understand Szemerédi’s theorem on arithmetic progressions), and decided that I needed to understand the proof of the full regularity lemma. The reference I’m using is Reinhard Diestel’s book Graph Theory. The regularity lemma seems almost magically powerful to me, and has many applications (and generalisations, but we’ll get to that), but the proof is not all that hard. This post won’t present the full proof, just an overview to get the feel of it. (For another perspective on the regularity lemma by someone who actually knows what he is doing, see Yufei Zhao’s blog entry https://yufeizhao.wordpress.com/2012/09/07/graph-regularity/.)

Much like certain proofs of Roth’s theorem, the idea behind the proof of the regularity lemma is to increase a bounded positive quantity by a constant in each iteration. There are only so many times you can do this, and when you stop, what is left must satisfy the conditions of the lemma. Firstly, some definitions.

Definition 1. Let $G=(V,E)$ be a graph. We say that a pair of disjoint subsets $U,W \subset V$ is $\varepsilon$-regular if

$|d(U,W) - d(U_1 ,W_1)| < \varepsilon$

for any subsets $U_1 \subseteq U$ and $W_1 \subseteq W$ for which $|U_1 |\geq \varepsilon |U|$ and $|W_1 |\geq \varepsilon |W|$.

In the above, $d(U,W)$ as usual denotes the edge density

$\frac{\| U,W\|}{|U||W|}$

between $U$ and $W$.

Definition 2. Let $\varepsilon >0$ be given and let $G=(V,E)$, $|V|=n$, be a graph. We say that a partition $\{ V_0 ,V_1 , \dots ,V_k \}$ of the vertices of the graph is $\varepsilon$-regular if

1. $|V_0| \leq \varepsilon n$
2. $|V_1| = |V_2| = \cdots =|V_k|$
3. All but at most $\varepsilon k^2$ of the pairs $(V_i ,V_j)$, $i\neq j$, are $\varepsilon$-regular.

The regularity lemma (in one form; there are several equivalent ones) can be stated as follows.

Regularity Lemma. Given any $\varepsilon >0$ and any $m\geq 1$, there exists an integer $M$ such that any graph $G = (V,E)$ of order at least $m$ admits an $\varepsilon$-regular partition $\{ V_0 , V_1 , \dots ,V_k \}$ where $m\leq k \leq M$.

The quantity that we will bound is a kind of $L^2$ norm on the graph. If $A,B \subseteq V$ are disjoint, set

$q(A,B) = \frac{|A| |B|}{n^2}d(A,B)^2 = \frac{\| A,B \|^2}{|A||B|n^2},$

where $n = |V|$. It should be clear that $q(A,B) \leq 1$. In effect, we consider $A\cup B$ as a bipartite graph. If this bipartite graph is complete, $q$ will be $|A||B|/(|A| +|B|)^2 \leq 1/4$, and $0$ when it has no edges. It does not seem to me that the quantity has a very intuitive meaning. However, it does satisfy the conditions of being bounded and increasing under successive partitions, which is what we’ll need. Suppose now that $C,D \subseteq V$ are disjoint, that $\mathcal{C}$ is a partition of $C$ and that $\mathcal{D}$ is a partition of $D$. We define

$q(\mathcal{C} , \mathcal{D} ) = \sum_{A \in \mathcal{C}, B \in \mathcal{D}} q(A,B)$.

The fundamental fact which we can prove (quite easily) is that, with all objects as above,

$q(\mathcal{C},\mathcal{D}) \geq q(C,D).$
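This inequality is easy to test numerically. Below is a rough sketch in plain Python (the random graph, the sets, and the helper functions `edge_density` and `q` are my own illustrative choices, not from the reference):

```python
import random

def edge_density(adj, U, W):
    # ||U,W|| / (|U||W|)
    return sum(adj[u][w] for u in U for w in W) / (len(U) * len(W))

def q(adj, n, A, B):
    d = edge_density(adj, A, B)
    return len(A) * len(B) / n**2 * d**2

# a small random graph standing in for G
random.seed(0)
n = 12
adj = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        adj[i][j] = adj[j][i] = int(random.random() < 0.5)

C = list(range(6))
D = list(range(6, 12))
# an arbitrary refinement of C and of D
CC, DD = [C[:3], C[3:]], [D[:2], D[2:]]

q_refined = sum(q(adj, n, A, B) for A in CC for B in DD)
assert q_refined >= q(adj, n, C, D)  # refining never decreases q
```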

If we have a partition $\mathcal{P} = \{V_1 , \dots ,V_k \}$ of the graph $G$, we define

$q(\mathcal{P}) = \sum_{i\neq j} q(V_i ,V_j).$

We can now prove from the above that if $\mathcal{P}'$ is a refinement of the partition $\mathcal{P}$, then

$q(\mathcal{P}') \geq q(\mathcal{P}).$

We can already start to spy the strategy of the proof here – we have a bounded quantity that increases with further partitions. What we have to prove is that such refinement of partitions will increase by a fixed amount, which means that it can be done only a bounded number of times, given that $q$ is bounded by $1$.

Suppose we are now given a partition of $V$, namely $\{ V_0 ,V_1 , \dots ,V_k \}$, where all the members except $V_0$ have the same size. We regard $V_0$ as a set of members of the partition itself, each a singleton. We set

$\mathcal{P} = \{ V_1, \dots , V_k \} \cup \{ \{ v \} : v\in V_0 \}$

and define the quantity $q$ for our partition $\{ V_0 ,V_1 , \dots ,V_k \}$ as $q(\mathcal{P} )$. With our definitions in place, we can start assembling the required lemmas.

Lemma 0. From the Cauchy-Schwarz inequality, we can conclude that

$\sum \frac{e_{i}^{2} }{\mu_i} \geq \frac{(\sum e_i )^2}{\sum \mu_i}.$

We won’t be using this lemma in this post (hence calling it Lemma 0), but it is essential in working out the details of the following lemmas.
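Lemma 0 itself can be sanity-checked in a couple of lines (a throwaway sketch with randomly chosen positive weights):

```python
import random

random.seed(1)
e = [random.uniform(0, 10) for _ in range(6)]
mu = [random.uniform(0.1, 5) for _ in range(6)]

# Engel/Cauchy-Schwarz form: sum e_i^2/mu_i >= (sum e_i)^2 / sum mu_i
lhs = sum(ei**2 / mi for ei, mi in zip(e, mu))
rhs = sum(e)**2 / sum(mu)
assert lhs >= rhs
```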

Lemma 1. If $C,D \subset V$ are disjoint and not $\varepsilon$-regular, there are partitions $\mathcal{C} = \{ C_1 ,C_2 \}$ of $C$ and $\mathcal{D} = \{ D_1 , D_2 \}$ of $D$ such that

$q(\mathcal{C} , \mathcal{D} ) \geq q(C,D) + \frac{\varepsilon^4 |C| |D|}{n^2}.$

For the proof, we pick the sets $C_1$ and $D_1$ to be two sets of sizes no less than $\varepsilon |C|$ and $\varepsilon |D|$, respectively, witnessing the irregularity of $(C,D)$, and $C_2$ and $D_2$ to be what is left over in $C$ and $D$.

Using Lemma 1, we can prove the following crucial lemma.

Lemma 2. Let $\varepsilon$ be such that $0<\varepsilon \leq 1/4$. If $\mathcal{P} = \{ V_0 ,V_1, \dots ,V_k \}$ is a partition of $V$ which is not $\varepsilon$-regular, where $|V_1 | = |V_2 | = \cdots = |V_k |$ and $|V_0| \leq \varepsilon n$, then there is a partition $\mathcal{P}' = \{ W_0 , W_1 , \dots ,W_l \}$ of $V$, with $k\leq l \leq k4^k$, $|W_1| = |W_2 | = \cdots = |W_l |$ and $|W_0 | \leq |V_0 | + n/2^k$, such that

$q( \mathcal{P} ' ) \geq q( \mathcal{P} ) + \frac{ \varepsilon^5 }{2}.$

The crucial part here is that the constant increase does not depend on any of the parameters except $\varepsilon$. The idea behind the proof of Lemma 2 is to compare all pairs of sets $V_i , V_j$ in the first partition that are irregular, and to use Lemma 1 to partition them. We take a common refinement of all of these partitions, then partition further to obtain sets of equal cardinality, shunting all the vertices left over into $V_0$.

What astounds me about Lemma 2 is how wasteful it seems to be. We just partition according to Lemma 1 until we get the desired refinement, and yet this is still powerful enough to lead directly to the result we’re after.

We can now carefully pick our constants and iterate Lemma 2 until we obtain the regularity lemma. Pick $\varepsilon$ such that $0< \varepsilon \leq 1/4$. We need to determine the constants $m$ and $M$, depending on $\varepsilon$ but not on $G$. These depend on how many times we have to apply Lemma 2, and on how many further parts each application of Lemma 2 creates. Because the $q$-value of a partition cannot exceed $1$, and each iteration increases it by at least $\varepsilon^5 /2$, the number of iterations is bounded by $s = 2/ \varepsilon^5$.

At each stage, the cardinality of the exceptional set may increase by $n/2^k$, if we start with a partition $\{ C_0 , C_1 , \dots ,C_k \}$. We want to prevent the size of the exceptional set from ever exceeding $\varepsilon n$; since we start with some $C_0$ with $|C_0 |\leq \frac{1}{2} \varepsilon n$, we have to make sure that $s$ iterations of adding $n/2^k$ contribute no more than $\frac{1}{2}\varepsilon n$ either. (This is not precise, since the exceptional set will not actually increase by the same amount at each stage, but this will certainly do.) We must also choose $k$ large enough that the initial exceptional set satisfies $|C_0| < k$ and $|C_0 |\leq \frac{1}{2} \varepsilon n$; the exceptional set should be allowed up to $k$ members, to ensure we have equal cardinality for all the other sets. If $k\geq m$ is large enough to allow $2^{k-1} \geq s/\varepsilon$, we have $s/2^k \leq \varepsilon /2$ and

$k+\frac{s}{2^k} n \leq \varepsilon n$

for $n\geq 2k/ \varepsilon$.

What is left is to choose $M$. We choose $M$ larger than the maximum of $2k/\varepsilon$ and the number of sets (non-exceptional) that can be generated by $s$ applications of the previous lemma, that is, $s$ iterations of $r$ elements of the partition increasing to $r4^r$ elements.

As long as our graph is relatively small, it is easy to show that a suitable partition exists. If the graph has order $n$, where $m\leq n \leq M$, we choose $V_0 = \emptyset$ and partition $V$ into singletons, so that $|V_1| = |V_2| = \cdots = |V_n| = 1$. For $n> M$, we want to start with a partition which satisfies $|W_0| \leq \frac{1}{2} \varepsilon n$ for Lemma 2 to be valid. Let $k$ be as we established above. Now, let $W_0$ be a minimal subset of $V$ so that $k$ divides $|V\setminus W_0 |$. We then partition $V\setminus W_0$ into $k$ sets of equal cardinality. Because $|W_0 | < k\leq \frac{1}{2} \varepsilon n$, Lemma 2 is applicable, and we simply apply it repeatedly to obtain the partition we are after.

# The most important thing I do every day. Almost (kind of).

This post started out in a more innocent time, but has become more relevant because of the global crisis. I have officially been on COVID-19 lockdown for four weeks now, and have to take my own advice here so as not to go stir-crazy.

The current post has its genesis in a relatively recent holiday. My wife and I spent nearly two weeks away from it all, or at least, most of it. For the most part we were in relative isolation, surrounded by nature.

This was one of those near-perfect experiences. We rested well, but were also active. We slept much and often, but also spent hours trekking through the mountains and the bush, and still had time to do a mountain of reading (because otherwise it doesn’t qualify as a holiday in my book, so to speak). I’ve had holidays that were fun while they lasted, but left me with a hollow feeling, as if I have neither rested properly nor done anything worthwhile. This was the opposite.

What set this one apart, then, were two factors:

1. We were surrounded by gorgeous natural scenes and very few people.
2. We were active every day, going on some taxing hikes and swimming in rock pools.

I returned from my holiday refreshed and mentally prepared for the blizzard of things I have to do this year. Yet, only one day after my holiday, I was sitting at home catching up on Netflix when I was shocked to realise that I did, in fact, feel like shit. There was absolutely no reason for me to feel anything but wonderful. I was well-rested, I was healthy, I was financially stable, did not have great stress, and I had a weekend of relaxing left with my wife before going back to work (which I was looking forward to).

It didn’t take me long to figure out why, because I could contrast my circumstances with those of two days before, when I was feeling great. Firstly, I hadn’t been out of the house yet, despite it being a gorgeous, sunny day. Secondly, I hadn’t done anything physical. So, opposite circumstances yielded opposite results.

I know this does not sound like a major epiphany. But as someone who spends a lot of time thinking about very abstract stuff and not as much on external circumstances, it was an important one. The lesson comes down to this: to feel mentally excellent, the easiest thing to do is to involve the physical. I am no stranger to physical exercise and train regularly, but usually in the evenings. I realised that to have an excellent day, I needed to feel excellent first thing in the morning.

To implement this, I came up with a little exercise programme I do first thing in the morning. It is important for me to do this before having my coffee, before checking my e-mail, or anything else. It’s not a massive work-out. My aim is to get thoroughly warmed up, to get out of breath and to sweat. There is an easy metric to determine when I have done enough: it is when the exercise stops feeling like a chore and starts being fun, and I don’t really want to stop.

The difference in my life is clearly measurable and significant. During the lockdown, there have been times when I have slept later than normal and entered the day with a fuzzy mind and low physical energy. The solution is to get out of bed (even when I’ve overslept) and get straight into exercise. I make the requirements for these sessions very low, and only require ten goblet squats for the session to be considered complete. Of course, I never stop there, but knowing that I can finish this in a minute helps me get started. When I start my day’s work, my mind feels clear and energised. If I could quantify my happiness, I would say that this simple habit has increased it by at least 10%.

I do still skip it sometimes because of sheer laziness or time pressure (mostly the former). Every time I do though, I regret it.

# What is a unitary operator?

I shall make no attempt at maintaining a uniform level of mathematics in this blog. For instance, the previous post showed a simple approach to a relatively advanced (or, at least, new-ish) concept, while the one before that I found to be quite difficult, and am still not sure I get it entirely. In this post, I’m going to try to explain a very basic concept, one which most people who read blogs like this may consider to be trivial. This would be somewhat like a talk I would give to students who have recently learnt what Hilbert spaces and bounded linear operators are. What I’m feeling for is a kind of intuition at a very basic level, which can be used to build upon for a more refined intuition in more complex cases. Unless I approach new concepts this way myself, I soon find myself lost. In a way, this post approximates the way I would approach a new concept, and thus is more about process than content.

The reason for this post is that I was reading a bit of ergodic theory again, with an eye to understanding the number-theoretic arguments that rely on it (Furstenberg’s proof of Szemerédi, and the like). When I saw the use of unitary operators in the formulation of von Neumann’s ergodic theorem, I wondered how I would explain the use of them to a student with only a scant knowledge of operator theory. The discussion will initially just focus on the kind of unitary transformation a student should already have encountered in linear algebra, but later on involve some more abstract notions.

Let’s first settle on a suitable Hilbert space to use. For now, since we’re going to discuss physical ideas involving lengths and angles, we’ll stick with Euclidean space, and see how the ideas generalise to other spaces later. We start with $\mathbb{R}^3$, with the usual inner product

$(x_1 ,y_1 ,z_1 )\cdot (x_2 ,y_2 ,z_2 ) = x_1 x_2 +y_1 y_2 +z_1 z_2.$

Any $3\times 3$ real matrix can be seen as a bounded linear operator $\mathbb{R}^3 \to \mathbb{R}^3$.  (Exercise: Show this.) Since the way in which we “measure” elements of the Hilbert space is through the inner product (and the norm deriving from it), interesting operators would be ones which have some quantifiable relationship with this. For instance, a subspace could be seen as the set of vectors that all satisfy some angular/directional relationship (such as being contained in a plane), and the projection onto this subspace isolates the parts of a vector that satisfy this relationship. So it would seem that a fairly natural question to ask is, which operators will preserve the length and angular relationships between vectors?

The simplest operations on a space might be considered to be translation, rotation and dilation (and the identity mapping, which is not that interesting). Starting with dilation, suppose that $D_{\lambda}$ is an operator that takes a vector $x$ and transforms it to a vector of length $\lambda \| x\|$ in the same direction as $x$. Obviously, this operator acts as

$D_{\lambda} (x_1 ,x_2 ,x_3 ) = \lambda (x_1 ,x_2 ,x_3 ).$

The operator can of course be written as a diagonal $3\times 3$ matrix with all diagonal entries equal to $\lambda$. It is easily seen that unless $\lambda = 1$, this operator does not preserve inner products, that is, $\langle D_{\lambda }x, D_{\lambda}y \rangle \neq \langle x, y \rangle$. The same goes for translation, that is, an operator that translates any $(x_1, x_2, x_3 )$ to $(x_1+ a, x_2+b, x_3 +c )$ (which, strictly speaking, is not even a linear operator unless $a=b=c=0$).

You may now complain that these operators do indeed preserve length and angular relationships, and they do to a certain extent. However, they do not preserve those relationships with the origin, which is why the lengths differ. The third kind of transformation does this. Any rotation $U$ on $\mathbb{R}^3$ can be seen to preserve inner products. When you think geometrically of the inner product in terms of the projection of one vector on another, it is clear that the absolute orientation of these vectors does not play a part. Furthermore, given that the rotation can be represented by a $3\times 3$ matrix, we can now deduce the defining property of a unitary operator. Denoting the transpose of a matrix $U$ by $U^{\#}$, we see that

$\langle x, y \rangle = \langle Ux ,Uy \rangle = \langle U^{\#} Ux, y \rangle = \langle x, U^{\#} Uy \rangle$

for all $x,y \in \mathbb{R}^3$.

Since this must hold for all relevant $x$ and $y$, it must be that $U^{\#}U = UU^{\#} = 1$. (Any matrix with this property, called an orthogonal matrix, necessarily has determinant $1$ or $-1$; the converse is false, since determinant $\pm 1$ alone does not make a matrix orthogonal.)
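Both directions are easy to check with SymPy (a quick sketch; the rotation matrix and the counterexample below are my own choices): a rotation satisfies $U^{\#}U = 1$, while a determinant-one matrix need not.

```python
import sympy as sym

theta = sym.symbols('theta', real=True)
# a rotation of the plane by angle theta
U = sym.Matrix([[sym.cos(theta), -sym.sin(theta)],
                [sym.sin(theta),  sym.cos(theta)]])
assert sym.simplify(U.T * U) == sym.eye(2)  # U is orthogonal for every theta

# determinant 1 alone does not give orthogonality:
M = sym.Matrix([[2, 0], [0, sym.Rational(1, 2)]])
assert M.det() == 1
assert M.T * M != sym.eye(2)
```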

(Another nice example to consider is that of a rotation in the complex plane, which is just multiplication by a complex number of modulus 1.)

With some idea of what a unitary operator in this space is, we can try to make sense of the von Neumann ergodic theorem.

Theorem. Let $U$ be a unitary operator on a Hilbert space $H$. Let $P$ be the orthogonal projection onto the subspace $\{ x\in H: Ux = x\}$. Then, for any $x \in H$,

$\lim_{N\to \infty} \frac{1}{N}\sum_{n=0}^{N-1} U^n x = Px$.

Suppose that we were to apply this theorem to the case of rotations in $\mathbb{R}^2$, which is not much different from considering rotations in three dimensions. We immediately run into a problem. If $U$ is a rotation, what is an invariant subspace? Unless $U$ is a full rotation, only the origin is invariant, and a full rotation is the same as the identity operator! However, this does not imply that the content of the theorem is entirely void. Since the zero operator is the projection onto the invariant subspace $\{ 0 \}$, we see that for any $x \in \mathbb{R}^2$,

$\displaystyle \lim_{N\to \infty} \frac{1}{N}\sum_{n=0}^{N-1} U^n x =0.$

Hence, if we average the iterates of any vector under a rotation (which is not a full rotation), we must get $0$ in the limit, which makes intuitive sense. In the case of rational rotations (i.e. rational multiples of $2\pi$), it is not too hard to prove, and in the case of irrational ones it would seem sensible from the Weyl equidistribution theorem.
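This intuition is easy to test numerically. Here is a rough sketch (the irrational angle $\sqrt{2}\,\pi$, the number of iterates and the tolerance are my own choices) that averages the iterates of a planar rotation:

```python
import math

def rotate(x, y, alpha):
    # rotate (x, y) by angle alpha
    return (x*math.cos(alpha) - y*math.sin(alpha),
            x*math.sin(alpha) + y*math.cos(alpha))

alpha = math.sqrt(2) * math.pi  # an irrational rotation
N = 100000
vx, vy = 1.0, 0.0
sx = sy = 0.0
for _ in range(N):
    sx += vx
    sy += vy
    vx, vy = rotate(vx, vy, alpha)

avg = (sx / N, sy / N)
# the ergodic averages tend to the projection onto the fixed space, here {0}
assert abs(avg[0]) < 0.01 and abs(avg[1]) < 0.01
```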

One of the first examples one usually encounters in ergodic theory is the shift operator. Since my introduction to ergodic theory came via Furstenberg’s proof of Szemerédi’s theorem, this felt natural. However, supposing that you are used to unitarity meaning rotation, is there a way to interpret the shift operator as such?

To make sense of this, we have to slightly expand our idea of unitarity. Consider the following operator acting on vectors in $\mathbb{R}^2$:

$\displaystyle A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$

It is easy to see that this matrix takes the unit vector $[1,0]$ to the unit vector $[0,1]$, and vice versa. Furthermore, the conjugate transpose of $A$ is itself, and $A^2 = 1$. Thus $A$ is unitary, and is in fact a permutation of the axes. Of course, in two dimensions there are only two permutations possible (it is the permutation group on two elements, after all!), so things will get more interesting in higher dimensions. (Exercise: Show that any permutation of axes in $\mathbb{R}^3$ is unitary. Can all of these permutations be seen as rotations?) In fact, any permutation of an orthonormal basis in a Hilbert space is unitary (think of how a permutation of the basis would affect the inner product). Thus, if we think of the shift operator as a permutation of a countable basis, we see that it fits into the general intuition of a unitary operator.
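The first part of the exercise can be checked mechanically (a short SymPy sketch enumerating all six axis permutations of $\mathbb{R}^3$):

```python
import itertools
import sympy as sym

# each permutation sigma of {0,1,2} gives a matrix permuting the axes
for sigma in itertools.permutations(range(3)):
    P = sym.zeros(3, 3)
    for i, j in enumerate(sigma):
        P[i, j] = 1
    # P is unitary: its transpose is its inverse
    assert P.T * P == sym.eye(3)
```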

Clearly, there is still much to unpack here. What kind of unitary operators do we get on some common Hilbert spaces, such as $l^2$ or $L^2$? What does unitary equivalence mean? What does it mean when a unitary operator is self-adjoint? Posing questions like these is, I believe, more valuable than reading a single textbook’s introduction on the subject, because you’re trying to make sense of something in terms of what you already know, and you can allow curiosity to lead you. At some point, the concepts will take on a life of their own and your frame of reference will expand, ready to begin making sense of the next, more advanced concept.

Learning mathematics is a lot messier than most textbooks (and lectures) would have us believe. Embrace it.

# Hypergraph complexity

There are occasional, serendipitous events that make one suspect that there may be some benevolent animating power behind the universe. (This only lasts until the next annoying thing happens.) In August 2017, I was leaving San Francisco for Calgary after attending a workshop at AIM. On the flight, across the aisle from me, I saw someone editing what were clearly mathematical papers. I introduced myself, and the other party turned out to be none other than Jacob Fox, on his way to a meeting in Banff. After parting ways, I started to look at some of his papers, and one immediately caught my eye, namely A relative Szemerédi theorem (with Conlon and Zhao). I had a problem that I had been working (and been stuck) on for a while, and this paper gave me a crucial theorem I needed to continue.

This paper also introduced me to the idea of hypergraph complexity, which I don’t yet understand, hence this post. We’ll start with some basic definitions and concepts.

Definition 1. Let $J$ be a set and let $\binom {J}{r}$ denote the set of all $r$-element subsets of $J$. An $r$-uniform hypergraph on $J$ is any subset $H \subseteq \binom{J}{r}$.

It should be clear that this just generalises the notion of an edge as a choice of two elements to a choice of an arbitrary (but constant) number of elements. Hypergraphs give us some nice generalisations of Ramsey-type results, for example:

Theorem 1. [2] An infinite $m$-uniform hypergraph contains an infinite clique or an infinite anticlique.

The nonstandard proof of this (which can be found in [2]) involves a beautiful use of iterated nonstandard extensions, a somewhat tricky (but very useful) subject I will probably address in a later post.

In what follows, we restrict our attention to finite hypergraphs. We firstly need to define a hypergraph system:

Definition 2. [1] A hypergraph system is a quadruple $V = (J, (V_j )_{j\in J}, r, H)$ where $J$ is a finite set, $(V_j )_{j\in J}$ is a collection of non-empty sets, $r\geq 1$ is a positive integer, and $H \subseteq \binom{J}{r}$ is an $r$-uniform hypergraph. For any $e\subseteq J$, set $V_e := \prod_{j\in e} V_j$.

What would be a reasonable hypergraph system, and what would it be good for? Since the concept occurs in a paper concerning a Szemerédi-type theorem, it seems reasonable that hypergraph systems have something to do with arithmetic progressions. To make this clear, we’ll need the notion of a weighted hypergraph, which we’ll illustrate by means of an example. Set $J = \{ 1,2,3\}$, $r=2$, and $V_j = \mathbb{Z}_N$ for each $j$. The weights on $V$ are functions $g_e : V_e \to \mathbb{R}^{\geq 0}$. For instance, for the edge $(1,2)$ we have a function $g_{(1,2)}: \mathbb{Z}_N \times \mathbb{Z}_N \to \mathbb{R}^{\geq 0}$. Suppose that $A \subseteq \mathbb{Z}_N$, and let $\chi_A$ denote its indicator function. We define the functions

$g_{1,2}(x,y) = \chi_A (x-y)$

$g_{2,3}(x,y) = \chi_A (x+y)$

$g_{1,3}(x,y) = \chi_A (x).$

The product $g_{1,2}(x,y) g_{2,3}(x,y) g_{1,3}(x,y)$ will only be greater than $0$ if there are some $x,y \in \mathbb{Z}_N$ such that $x-y, x, x+y$ are in $A$. Equivalently, if

$\mathbb{E}_{x,y \in \mathbb{Z}_N} g_{1,2}(x,y) g_{2,3}(x,y) g_{1,3}(x,y) >0,$

$A$ will contain a 3-arithmetic progression.
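In code, this expectation looks as follows (a sketch; the values of $N$ and $A$ are made up, and I skip $y=0$ to exclude the degenerate progression $x, x, x$):

```python
N = 13
A = {1, 5, 9}  # contains the 3-term progression 1, 5, 9

def chi(z):
    # indicator function of A in Z_N
    return 1 if z % N in A else 0

# the (unnormalised) expectation of g_{1,2} g_{2,3} g_{1,3} over x, y != 0
count = sum(chi((x - y) % N) * chi(x) * chi((x + y) % N)
            for x in range(N) for y in range(1, N))
assert count > 0  # A contains a 3-AP modulo N
```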

This is a simplistic example, and not nearly sufficient to characterise the existence of 3APs. For instance, we make no mention of the issue of wrap-around progressions in the cyclic group $\mathbb{Z}_N$. What is more, one would ideally like to have a result which takes into account values of $N$ of arbitrary size. For appropriate linear forms conditions that work for any size of AP, the paper [1] should be consulted. For now, we continue with trying to get a handle on the complexity.

Suppose we are given a hypergraph system as above.

Definition 3. [1] For any set $e$ of size $r$ and any $E_e \subseteq V_e = \prod_{j\in e} V_j$, define the complexity of $E_e$ to be the smallest integer $T$ such that there is a partition of $E_e$ into $T$ sets $E_{e,1}, \dots ,E_{e,T}$ such that each $E_{e,i}$ is the set of $r$-cliques of some $(r-1)$-uniform hypergraph.

Thus, in the above example we would let $E_e$ be a subset of $\mathbb{Z}_N \times \mathbb{Z}_N$ for some $e \in H$. Certainly, an $(r-1)$-uniform hypergraph for $r=2$ is not the most scintillating mathematical object, but let us persist with this simple case.

To make sense of the latter part of Definition 3, let us first discuss what the required cliques are (following [1]). If $x = (x_j)_{j\in J} \in V_J$ and $J' \subseteq J$, write $x_{J'} = (x_j)_{j\in J'} \in V_{J'}$ for the obvious projection of $x$ onto the coordinates $J'$. For any $e\subseteq J$, define

$\partial e = \{ f\subseteq e: |f| = |e|-1 \}.$

In a 2-uniform hypergraph as above, for $e = \{ 1,2\}$, $\partial e = \{ \{1\} , \{2\} \}$. Simply put, the boundary is seen as the set of edges of one dimension less that together uniquely determine the edge we are looking at.

If we now say that $E_{e,i}$ is the set of $r$-cliques of some $(r-1)$-uniform hypergraph, we mean that there is some $B_{f,i} \subseteq V_f$ for each $f\in \partial e$ so that

$\chi_{E_{e,i}} (x_e) = \prod_{f\in \partial e} \chi_{B_{f,i}}(x_f)$

for all $x_e \in V_e$.

To be fair, this does not make anything very clear, so let’s unpack this notion further with our very simple example of a (complete) 2-uniform hypergraph, by trying to understand what a satisfying notion of complexity should be. Take the edge $e = \{ 1,2\}$ as before, with $V_1 = V_2 = \mathbb{N}$, and let $E_e = \{1,2,\dots ,20 \} \times \{1,3,5, \dots ,19\} \subseteq V_e$. We want sets $E_{e,i}$ and $B_{1,i} , B_{2,i} \subseteq \mathbb{N}$ so that

$\chi_{E_{e,i}} (x_e) = \chi_{B_{1,i}}(x_1) \times \chi_{B_{2,i}}(x_2)$

for all $x_e \in V_{\{ 1,2\} }$. The triviality of this example now becomes obvious – setting $B_{1,1} = \{1,2,\dots ,20\}$ and $B_{2,1} = \{1,3,\dots ,19\}$ gives us a complexity of 1. We could see this example as an instance of a complete bipartite graph between the components making up $E_e$, which is fully described by the Cartesian product of the two sets.

Let’s see if the same reasoning applies to 3-uniform hypergraphs. Let $e = \{ 1,2,3\}$ be an edge in some 3-uniform hypergraph, with the associated set $V_e = \mathbb{N}\times \mathbb{N} \times \mathbb{N}$. Let the subset $E_e$ be $\{1,2,\dots ,10\} \times \{1,3,5,7 ,9\} \times \{2,4,6,8\}.$ Now we need sets $E_{e ,1}, \dots , E_{e, T}$ and subsets $B_{f_1 ,i}, B_{f_2, i}, B_{f_3, i}$ of $V_{f_1 }, V_{f_2 }, V_{f_3 } = \mathbb{N}\times \mathbb{N}$ such that

$\chi_{E_{e,i}} (x_e) = \chi_{B_{f_1,i}}(x_{f_1}) \times \chi_{B_{f_2,i}}(x_{f_2})\times \chi_{B_{f_3,i}}(x_{f_3}).$

A little consideration will show that the same trick employed above works here as well, because our set $E_e$ can be considered to be a complete tripartite 3-uniform hypergraph on the three factors of the Cartesian product. It seems clear that we will need to be slightly more creative in order to get higher complexity.

Consider now the following subset of the above set $V_e$:

$E_e =\{ (1,3,2), (1,3,4), (2,5,7), (2, 5, 9), (4,7,6), (3,7,6), (3,7,5), (4, 7,5), (10, 9, 8), (10,9,1)\}.$

This set can’t be written as a single Cartesian product of subsets of the sets we are considering, and seems slightly more random in character, so there is hope that the complexity should be higher. Setting $f_1 = (1,2)$, $f_2 = (2,3)$ and $f_3 = (1,3)$, we take

$B_{1,1} = \{(1,3)\}, B_{2,1} = \{ (3,2), (3,4) \} , B_{3,1} = \{(1,2), (1,4)\}$

$B_{1,2} = \{(2,5)\}, B_{2,2} = \{ (5,7), (5,9) \} , B_{3,2} = \{(2,7), (2,9)\}$

$B_{1,3} = \{(3,7), (4,7)\}, B_{2,3} = \{ (7,5), (7,6) \} , B_{3,3} = \{(3,5), (3,6), (4,5),(4,6)\}$

$B_{1,4} = \{(10,9)\}, B_{2,4} = \{ (9,1), (9,8) \} , B_{3,4} = \{(10,1), (10,8)\}$

(where the notation has been shortened in an obvious way). To avoid confusion, we note that there may be elements $x \in B_{1,i}$, $y \in B_{2,i}$, $z \in B_{3,i}$ with $\chi_{B_{1,i}}(x)\chi_{B_{2,i}}(y)\chi_{B_{3,i}}(z) = 1$ which are not the projections of any single point of $V_e$; since the defining identity only has to hold at points $x_e \in V_e$, such combinations can be ignored. We thus see that our set $E_e$ has a complexity of at most 4.
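We can verify this decomposition mechanically. The following Python sketch (the helper name `piece` is my own) reconstructs each $E_{e,i}$ from the three $B$ sets by gluing together consistent projections, and checks that their union is exactly $E_e$:

```python
# The edge set E_e and the four product pieces from the text.
E_e = {(1,3,2), (1,3,4), (2,5,7), (2,5,9), (4,7,6),
       (3,7,6), (3,7,5), (4,7,5), (10,9,8), (10,9,1)}

# B[i] = (B_{1,i}, B_{2,i}, B_{3,i}) for the faces
# f_1 = (1,2), f_2 = (2,3), f_3 = (1,3).
B = [
    ({(1,3)},        {(3,2),(3,4)}, {(1,2),(1,4)}),
    ({(2,5)},        {(5,7),(5,9)}, {(2,7),(2,9)}),
    ({(3,7),(4,7)},  {(7,5),(7,6)}, {(3,5),(3,6),(4,5),(4,6)}),
    ({(10,9)},       {(9,1),(9,8)}, {(10,1),(10,8)}),
]

def piece(B1, B2, B3):
    """All triples (a, b, c) whose projections onto the faces
    (1,2), (2,3) and (1,3) land in B1, B2 and B3 respectively."""
    return {(a, b, c)
            for (a, b) in B1 for (b2, c) in B2 for (a2, c2) in B3
            if b2 == b and a2 == a and c2 == c}

union = set().union(*(piece(*bs) for bs in B))
assert union == E_e   # the four product pieces cover E_e exactly
print("complexity at most 4 verified")
```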

Of course, the case of a 3-uniform hypergraph is quite easy to analyse this way, because the projections onto two of the faces already determine the projection onto the third, and the difficulty of generating such examples will increase rapidly for higher $r$.

Intuitively, the example at least makes clear that the complexity of the hypergraph is somehow determined by how easily the set $V_e$ can be described in terms of parts of a smaller “dimension”, and that highly random sets will have higher complexity (without specifying exactly what we mean by “random” here). The real application of this complexity is in hypergraph removal lemmas, which we’ll (possibly) discuss in another post.

Shameless self-promotion: If you enjoy Fourier series and Martin-Löf randomness (or Kolmogorov-Chaitin-Solomonoff randomness – take your pick), see my paper in APAL. For something similar, involving Carleson’s theorem and Schnorr randomness, see the paper by Franklin, McNicholl and Rute at https://arxiv.org/abs/1603.01778.

References:

1. Conlon, D., Fox, J. and Zhao, Y., A relative Szemerédi theorem
2. Di Nasso, M., Goldbring, I. and Lupini, M., Nonstandard Methods in Ramsey Theory and Combinatorial Number Theory

# Typical elements 2

Since we’ve already discussed the use of typical elements in the proof of the ergodic theorem, we will now address the existence of such elements.

Recall that we are working with a measure preserving system $([0,1]^{\mathbb{N}}, \mathcal{C}, T, \nu)$ as defined previously, where $T$ denotes the shift operator. We are looking for an element $\alpha$ of $[0,1]^{\mathbb{N}}$ such that

$\displaystyle \lim_{n\to \infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i \alpha ) = \int_{[0,1]^{\mathbb{N}}} f(y)\,d\nu$

for any $f \in C([0,1]^{\mathbb{N}})$. First, take a sequence $\{ \alpha_n \}_{n \in \mathbb{N}}$ in $[0,1]^{\mathbb{N}}$ of periodic elements; in other words, each element is of the form

$\displaystyle \alpha_n = (\alpha_{1}^{n} , \alpha_{2}^{n} , \dots , \alpha_{c_n }^{n} , \alpha_{1}^{n} , \alpha_{2}^{n} , \dots, \alpha_{c_n }^{n} , \dots )$

for some integer $c_n$. We set

$\displaystyle \mu_{\alpha_n } = \frac{1}{c_n } (\delta_{\alpha_n } + \delta_{T \alpha_n } + \cdots + \delta_{T^{c_{n}-1 }\alpha_n } ),$

where  $\delta$ denotes the Dirac measure, as usual. We require that the $\mu_{\alpha_n}$ converge weakly to $\nu$ as $n\to \infty$.
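To make the definition concrete, here is a small illustration in Python (the particular periodic point is my choice, not from the text): for the period-4 point $(0,0,1,1,0,0,1,1,\dots)$, the empirical measure $\mu_{\alpha}$ gives every length-2 block the frequency $1/4$, matching the i.i.d. fair-coin measure on those cylinders exactly.

```python
from fractions import Fraction
from itertools import product

# A periodic point of the shift, written as one period.
period = [0, 0, 1, 1]
c = len(period)

# mu_alpha puts mass 1/c on each orbit point; the mass of the
# length-2 cylinder [a] x [b] is the frequency of the block (a, b)
# along one period of the orbit.
def cylinder_mass(a, b):
    hits = sum(1 for i in range(c)
               if (period[i], period[(i + 1) % c]) == (a, b))
    return Fraction(hits, c)

# Each of the four blocks occurs exactly once per period.
for a, b in product([0, 1], repeat=2):
    assert cylinder_mass(a, b) == Fraction(1, 4)
print("empirical measure matches the fair-coin measure on 2-cylinders")
```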

Here we depart from the sequence of the proof in Kamae in order to first show how to construct such a sequence, before showing that we can use it to obtain a typical element. To show the existence of the sequence, it suffices to show that for any neighbourhood $U$ of $\nu$ in the weak topology of Borel measures on $[0,1]^{\mathbb{N}}$, we can find an element $\beta$ such that $\mu_{\beta}\in U$. The nature of the topology means that we can approximate the measure $\nu$ to an arbitrary degree by considering a finite amount of information. In other words, there are integers $m$, $n$ and some $\delta >0$ such that if $| \mu (B) - \nu (B) |<\delta$ for any subset $B$ of the form

$\displaystyle \left[ \frac{j_0}{m} , \frac{j_0 +1}{m} \right] \times \cdots \times \left[ \frac{j_{n-1}}{m} , \frac{j_{n-1}+1}{m} \right] \times [0,1] \times [0,1] \times \cdots$

where $j_0 , j_1 ,\dots ,j_{n-1}$ are integers ranging from $0$ to $m-1$, then $\mu \in U$.

Now set

$\displaystyle \Sigma = \left\{ \frac{1}{2m} , \frac{3}{2m} , \dots , \frac{2m-1}{2m} \right\} .$

We can define a measure $\lambda$ on $\Sigma^n$ by

$\displaystyle \lambda \left( \left( \frac{j_{0} + \frac{1}{2}}{m} , \dots , \frac{j_{n-1} +\frac{1}{2}}{m} \right) \right) = \nu \left( \left[ \frac{j_{0} }{m} , \frac{j_{0} +1}{m} \right] \times \dots \times \left[ \frac{j_{n-1} }{m} , \frac{j_{n-1} +1}{m} \right] \times [0,1] \times \dots \right)$

for the $j_i$ in the same range as before. We can see $\lambda$ as an encoding of the measure $\nu$ on cylinder sets of the form we are considering, and will clearly be used to construct a measure in the neighbourhood of $\nu$, once we have tweaked it a little.

We are given that $\nu$ is $T$-invariant. From this, we can conclude that

$\displaystyle \sum_{\xi_0 \in \Sigma} \lambda ((\xi_0 , \xi_1 ,\dots , \xi_{n-1} )) =\sum_{\xi_0 \in \Sigma} \lambda ((\xi_1 , \dots , \xi_{n-1} , \xi_{0} )) \quad (*)$

for any $\xi_1 , \dots , \xi_{n-1} \in \Sigma$. Why? We have that

$\displaystyle \bigcup_{j_0 = 0}^{m-1} \left[ \frac{j_0}{m} , \frac{j_0 +1}{m} \right] \times \cdots \times \left[ \frac{j_{n-1} }{m} , \frac{j_{n-1} +1}{m} \right] \times [0,1] \times [0,1] \times \cdots \\ = T^{-1} \left( \left[ \frac{j_1 }{m} , \frac{j_1 +1}{m} \right] \times \cdots \times \left[ \frac{j_{n-1} }{m} , \frac{j_{n-1} +1}{m} \right] \times [0,1] \times [0,1] \times \cdots \right) ,$

and the cylinder set to which $T^{-1}$ is applied can itself be written as

$\displaystyle \bigcup_{j_0 = 0}^{m-1} \left( \left[ \frac{j_1 }{m} , \frac{j_1 +1}{m} \right] \times \cdots \times \left[ \frac{j_{n-1} }{m} , \frac{j_{n-1} +1}{m} \right] \times \left[ \frac{j_0 }{m} , \frac{j_0 +1}{m} \right] \times [0,1] \times \cdots \right) .$

Applying $\nu$ to both sides and using the $T$-invariance of $\nu$ then yields $(*)$.

From $\lambda$, we want to obtain a measure $\eta$ on $\Sigma^n$ which satisfies:

(1) There is a positive integer $N$ such that, for any $\xi \in \Sigma^n$, $N\eta (\xi)$ is a positive integer and $|\lambda (\xi) - \eta (\xi) |<\delta$, where $\delta$ has been specified earlier.

(2) The relation $(*)$ holds for $\eta$ in the place of $\lambda$.

These do not seem like very great demands on $\eta$, but we should still confirm that they can be satisfied. The first condition requires that all the values $\eta (\xi)$ be rational, which is not much of a challenge, given the density of the rationals; we must just make sure that the adjustment from $\lambda$ to $\eta$ stays within the bound $\delta$ of condition (1). We also need to make sure that the rationals we pick not only satisfy $(*)$, but also

$\displaystyle \sum_{\xi_0 , \xi_1 , \dots , \xi_{n-1}\in \Sigma} \eta (\xi_0, \dots, \xi_{n-1}) =1 \quad (**)$

(obviously, since $\eta$ must still be a probability measure). Since there are only a finite number of elements of the measure space, we can first ensure that the latter condition holds, and then ensure the validity of $(*)$ by further adjustments, making sure any adjustments upwards are balanced by adjustments downwards so as to preserve $(**)$. This raises the question of whether one of these adjustments might turn the measure of some element negative. But since we are only dealing with a finite number of points of the measure space, and our adjustments are arbitrarily small, we need only make sure that no adjustment is as big as the least measure of any of the elements. If there are any elements of zero measure, we first adjust the measure $\lambda$ to avoid that, whilst still remaining close to $\nu$.
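The first stage of this adjustment, rounding $\lambda$ to rational values with a common denominator $N$ while preserving the total mass $(**)$, can be sketched in Python. This is a minimal illustration using the largest-remainder method; the function name and example values are mine, it does not perform the further balancing needed for $(*)$, and it assumes each point has measure well above $1/N$ so that no value rounds to zero.

```python
from fractions import Fraction

def round_to_rationals(lam, N):
    """Round a finitely supported probability lam (dict: point -> prob)
    to eta with each eta(x) a multiple of 1/N and total mass exactly 1.
    Largest-remainder method: |lam(x) - eta(x)| <= 1/N for each x."""
    # provisional counts: floor of N * lam(x)
    counts = {x: int(N * p) for x, p in lam.items()}
    deficit = N - sum(counts.values())
    # hand the leftover units to the points with the largest remainders
    by_remainder = sorted(lam, key=lambda x: N * lam[x] - counts[x],
                          reverse=True)
    for x in by_remainder[:deficit]:
        counts[x] += 1
    return {x: Fraction(c, N) for x, c in counts.items()}

lam = {"a": 0.3, "b": 0.3, "c": 0.4}
eta = round_to_rationals(lam, N=1000)
assert sum(eta.values()) == 1                       # condition (**)
assert all(abs(eta[x] - lam[x]) <= Fraction(1, 1000) for x in lam)
```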

Now pick a longest sequence $\xi^0, \xi^1, \dots ,\xi^{r-1}$ of elements of $\Sigma^n$ such that

1. $(\xi^{i}_{1}, \dots ,\xi^{i}_{n-1}) = (\xi^{i+1}_{0},\dots ,\xi^{i+1}_{n-2})$ for any $i=0,1, \dots ,r-1$, and
2. $|\{ i: 0\leq i < r, \xi^i = \xi \}| \leq N\eta(\xi)$ for any $\xi \in \Sigma^n$,

where $\xi^i = (\xi^{i}_{0}, \dots , \xi^{i}_{n-1})$ and $\xi^r = \xi^0$.

The first condition clearly has to do with the shift operator: it requires that the last $n-1$ coordinates of the $i$-th element coincide with the first $n-1$ coordinates of the $(i+1)$-th, so that consecutive elements overlap like shifted blocks, with only the final coordinate of each new element left free. Notice that there is nothing in the first condition which prevents “looping” the same sequence over and over again. If that were allowed, we could not speak of the longest sequence, and so the second condition is introduced in order to limit the lengths.

Now, suppose that equality does not hold in (2) for some $\xi$ with $\eta (\xi) \neq 0$, remembering that $N\eta (\xi)$ is an integer. In that case, we would be able to extend the sequence to increase the term on the left-hand side (by inserting closed loops), contradicting the maximality of the sequence. Thus, we can conclude that equality holds in (2) for each such $\xi$.

We define a periodic element $\beta$ of $[0,1]^{\mathbb{N}}$ with period $n+r-1$ by

$\displaystyle (\beta(0), \beta(1), \dots ,\beta(n+r-2)) = (\xi_{0}^{0}, \xi_{1}^{0}, \dots, \xi_{n-1}^{0}, \xi_{n-1}^{1}, \xi_{n-1}^{2},\dots ,\xi_{n-1}^{r-1}).$

Thus, we have found a periodic element in the neighbourhood $U$ of $\nu$, for each $U$. Taking neighbourhoods $U_n$ of $\nu$ such that $\bigcap_{n} U_n = \{ \nu \}$, denote each corresponding periodic element by $\alpha_n$. For $t_1, t_2, \dots$ a sequence of positive integers increasing sufficiently fast, set

$\displaystyle \alpha(n) = \alpha_m (n-T_m)$

for any $n\in \mathbb{N}$ with $T_m \leq n < T_{m+1}$, where $T_0 = 0$ and $T_i = T_{i-1} +c_i t_i$ (recall that $c_n$ denotes the period of $\alpha_n$). This gives us our desired typical element, and completes the argument.
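As a numerical aside (not part of Kamae’s argument; the function $f$ and the choice of $\nu$ as the i.i.d. uniform product measure are mine), one can watch typicality in action: for $\nu$-almost every $\alpha$, the ergodic averages of $f(y) = y_0 y_1$ along the shift orbit approach $\int f \, d\nu = 1/4$.

```python
import random

random.seed(0)  # deterministic illustration

# Sample a point alpha from the i.i.d.-uniform product measure on
# [0,1]^N (truncated to the coordinates we actually use).
n = 200_000
alpha = [random.random() for _ in range(n + 1)]

# Birkhoff average (1/n) sum_{i<n} f(T^i alpha) for f(y) = y_0 * y_1,
# where T is the shift; the integral of f against nu is 1/4.
avg = sum(alpha[i] * alpha[i + 1] for i in range(n)) / n
assert abs(avg - 0.25) < 0.01
print(f"ergodic average after {n} steps: {avg:.4f}")
```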

References:

Kamae, T., A simple proof of the ergodic theorem using nonstandard analysis, Israel Journal of Mathematics, Vol. 42, No. 4, 1982

# Focus, focus, focus

This is the sine qua non of mathematics. Yet even though many books have been written about this, and I have been doing maths for many years, it is still an elusive beast. My younger self would have agreed to its importance, but for some reason I have spent years trying to get by in my chosen profession without really developing this fundamental skill.

Our society values focus without encouraging it. Much is made of attention deficits in children, but how many adults work in surroundings that require absolute focus? We require children to sit quietly and pay attention in boring classes all day, but very few jobs would require that of an adult. I love learning, and I certainly enjoyed some classes at school, but most of the time I was bored out of my mind and kept quiet because I was afraid of getting low marks (which would ruin one’s life, I was told) or of getting sent to the principal or one of his minions, with the spectre of a caning always looming.  However, if you look at how so many office workers conduct their day, it is replete with interruptions, e-mails, Facebook posts, meetings, phone conversations, etc. How many professional people can go an hour without checking either their phone or their e-mail?

I would go so far as to say that many jobs actively discourage focus. The grown-up version of not doing your homework is not replying to your boss’s e-mail quickly enough, or missing a call from a client.

A lot of what I do is stultifying – student admissions, tuition meetings and administrative matters which have me squirming in my seat (and checking my e-mail). I am fortunate enough though that most of my job involves things I like. Working out second-year calculus problems may sometimes be a little frustrating if I want to get to research, but fundamentally I enjoy doing maths problems and it is often a relief to do something easy. Also, I have to admit, the boring, less challenging stuff has its own attraction. It is easier to have a discussion on the syllabus of some course when you are well and truly stuck on a problem; you at least feel that you have earned part of your keep.

And that is the trap, one I must confess to having fallen into. Bureaucracy increases exponentially – the more bureaucrats there are, the more bureaucracy they generate. As such, I am never without some task awaiting my attention or ten e-mails waiting to be answered. Most of these tasks do not take a lot of time. They can usually be done in less than half an hour, and you have that slight feeling of accomplishment at having something you can cross off your list. I think that continually facing such things has attenuated my mathematical attention. I cannot remember the last time I have sat for hours on a single paper or book, completely engrossed in trying to understand it. I do maths here and there, correcting this paper, moving on to clearing my inbox, then trying to read a new theorem, remembering something urgent about an exam paper that I have to discuss with a colleague, and so on. When your mind operates thus for a while, it becomes ingrained, and it gets uncomfortable when you focus on one thing for too long.

Clearly, once recognised, this problem could not be allowed to persist. Lines had to be drawn in the sand. The very first thing I did was to dedicate the first hour of my working day to reading mathematics with the fullest concentration I could muster. I have previously mentioned the book on nonstandard number theory I am reading, and decided to make that my focus. Unless summoned to a meeting, I would do this directly after arriving at work and clocking in (literally – at our university you have to). I set aside a desk in my office which has nothing on it but some scratch paper. This part is very important – when I work at my desk with my computer, I will inevitably glance up at it from time to time, and that breaks the flow. Next, I put on wireless headphones with very calm music, something which manages to block external sounds but is not itself interesting enough to distract you. I sometimes use the sounds of ocean waves or rain falling, to the same effect. For that one hour, I don’t have to check my phone and don’t get up at all.

The first realisation from this practice was that concentration is its own reward. Giving myself the mental space to devote solely to mathematics not only reminded me of why I do it, but the very act of being intensely focused is extremely pleasant. I’m not saying it is always particularly easy, but it is worth it. Soon after starting to do this every morning, I found myself craving it. The clarity of mind during that hour is wonderfully refreshing. You have one thing to do, and only that. And when you get to the rest of the day, with its e-mails and interruptions, you have already learned something valuable, and done it well.

This worked so well that I found myself sneaking in extra hours of concentrated work whenever I could manage. It should be mentioned that this does not work as well with shorter periods of time. Thirty minutes is still not useless, but because the end is in sight, I find that I do not immerse myself as deeply. I often find myself longing to go on at the end of my hour, but do not push it to the point of fatigue. If I still feel eager to go on, I am more likely to create more time for it later. It also helps to be in the middle of an interesting proof, so I can’t wait to get back to it.

Cultivating mathematical attention is not as simple as this, though. Reading and research are two different animals. Both are creative activities, since learning anything well cannot be done passively. But the same method does not quite work for doing original mathematics, and this I consider the difficult bit. Since this deserves an entire post of its own, I will continue this later (after a maths-related post, of course).

# Almost everything is wrong

Depending on your level of cynicism, this video either shows you one of the great problems with scientific publication, or shows you how to be a successful scientist…

For those interested in exploring the murky waters of academic publishing in more detail, Timothy Gowers has some opinions on the subject.

# The Laws of Meetings

1. They who schedule many meetings have obscenely little of value to do. They who enjoy meetings have nothing of value to do.
2. When you feel you have to schedule a meeting, you have already failed. Go into it with this knowledge, and atone.
3. If you schedule a meeting to accomplish anything that could have been done with free project software or otherwise accomplished online, you have failed doubly.
4. Capacity for productive work falls exponentially during a meeting, but only increases logarithmically afterwards.
5. If you have called the meeting and speak more than all others attending combined, it is not a meeting.
6. It is okay to go to a meeting just for the snacks.

# Pi day

$\pi$ day is not the only reason to celebrate today. Remember that it is also Einstein’s birthday!

In order to learn something new about $\pi$, go read this lovely post by John Baez.