# Hypergraph complexity

There are occasional, serendipitous events that make one suspect that there may be some benevolent animating power behind the universe. (This only lasts until the next annoying thing happens.) In August 2017, I was leaving San Francisco for Calgary after attending a workshop at AIM. On the flight, across the aisle from me, I saw someone editing what was clearly some mathematical papers. I introduced myself, and the other party turned out to be none other than Jacob Fox, on his way to a meeting in Banff. After parting ways, I started to look at some of his papers, and one immediately caught my eye, namely A relative Szemerédi theorem (with Conlon and Zhao). I had a problem that I’ve been working (and been stuck) on for a while, and this paper gave me a crucial theorem I needed to continue.

This paper also introduced me to the idea of hypergraph complexity, which I don’t yet understand, hence this post. We’ll start with some basic definitions and concepts.

Definition 1. Let $J$ be a set and for $H \subseteq J$, let $\binom {J}{r}$ denote the set of all $r$-element subsets of $J$. An $r$uniform hypergraph on $J$ is any subset of $H$.

It should be clear that this just generalises the notion of an edge as a choice or two elements, to a choice of an arbitrary (but constant) number of elements. Hypergraphs  give us some nice generalisations of Ramsey-type results, for example,

Theorem 1. [2] An infinite $m$-regular hypergraph contains an infinite clique or an infinite anticlique.

The nonstandard proof of this (which can be found in [2]) involves a beautiful use of iterated nonstandard extensions, a somewhat tricky (but very useful) subject I will probably address in a later post.

In what follows, we restrict our attention to finite hypergraphs. We firstly need to define a hypergraph system:

Definition 2. [1] A hypergraph system is a quadruple $V = (J, (V_j )_{j\in J}, r, H)$ where $J$ is a finite set, $(V_j )_{j\in J}$ is a collection of non-empty sets, $r\geq 1$ is a positive integer, and $H \subseteq \binom{J}{r}$ is an $r$-uniform hypergraph. For any $e\subseteq J$, set $V_e := \prod_{j\in e} V_j$.

What would be a reasonable hypergraph system, and what would it be good for? Since the concept occurs in a paper concern a Szemerédi-type theorem, it seems reasonable that they have something to do with arithmetic progressions. To make this clear, we’ll need the notion of a weighted hypergraph, which we’ll illustrate by means of an example. Set $J = \{ 1,2,3\}$, $r=2$, and $V_j = \mathbb{Z}_N$. The weights on $V$ are functions $g_e : V_e \to \mathbb{R}^{\geq 0}$. For instance, for the edge $(1,2)$ we have a function $g_{(1,2)}: \mathbb{Z}_N \times \mathbb{Z}_N \to \mathbb{R}^{\geq 0}$. Suppose that $A \subseteq \mathbb{Z}_N$, and let $\chi_A$ denote its indicator function. We define the functions

$g_{1,2}(x,y) = \chi_A (x-y)$

$g_{2,3}(x,y) = \chi_A (x+y)$

$g_{1,3}(x,y) = \chi_A (x).$

The product $g_{1,2}(x,y) g_{2,3}(x,y) g_{1,3}(x,y)$ will only be greater than $0$ if there are some $x,y \in \mathbb{Z}_N$ such that $x-y, x, x+y$ are in $A$. Equivalently, if

$\mathbb{E}_{x,y \in \mathbb{Z}_N} g_{1,2}(x,y) g_{2,3}(x,y) g_{1,3}(x,y) >0,$

$A$ will contain a 3-arithmetic progression.

This is a simplistic example, and not nearly sufficient to characterise the existence of 3APs. For instance, we make no mention of the issue of wrap-around progressions in the cyclic group $\mathbb{Z}_N$. What is more, one would ideally like to have a result which takes into account values of $N$ of arbitrary size. For appropriate linear forms conditions that work for any size of AP, the paper [1] should be consulted. For now, we continue with trying to get a handle on the complexity.

Suppose we are given a hypergraph system as above.

Definition 3. [1] For any set $e$ of size $r$ and any $E_e \subseteq V_e = \prod_{j\in e} V_j$, define the complexity of $E_e$ to be the smallest integer $T$ such that there is a partition of $E_e$ into $T$ sets $E_{e,1}, \dots ,E_{e,T}$ such that each $E_{e,i}$ is the set of $r$-cliques of some $(r-1)$-uniform hypergraph.

Thus, in the above example we would let $E_e$ be a subset of $\mathbb{Z}_N \times \mathbb{Z}_N$ for some $e \in H$. Certainly, an $(r-1)$-uniform hypergraph for $r=2$ is not the most scintillating mathematical object, but let us persist with this simple case.

To make sense of the latter part of the Definition 3, let us first discuss what the required cliques are (following [1]). If $x = (x_j)_{j\in J} \in V_j$ and $J' \subseteq J$, write $x_{J'} = (x_j)_{j\in J'} \in V_{J'}$ for the obvious projection of $x$ onto the coordinates $J'$. For any $e\subseteq J$, define

$\partial e = \{ f\subseteq e: |f| = |e|-1.\}$

In a 2-uniform hypergraph as above, for $e = (1,2)$, $\partial e = \{1,2\}$. Simply put, the boundary is seen as the set of edges of one dimension less that together uniquely determine the edge we are looking at.

If we now say that $E_{e,i}$ is the set of $(r-1)$-cliques of some $r$-uniform hypergraph, we mean that there is some $B_{f,i} \subseteq V_f$ for each $f\in \partial e$ so that

$\chi_{E_{e,i}} (x_e) = \prod_{f\in \partial e} \chi_{B_{f,i}}(x_f)$

for all $x_e \in V_e$.

To be fair, this does not make anything very clear, so let’s unpack this notion further with our very simple example of a (complete) 2-uniform hypergraph, by trying to understand what a satisfying notion of complexity should be. Taking the edge $\{ 1,2\}$ as before, and $V_1 = V_2 = \mathbb{N}\times \mathbb{N}$. Let the set $V_e = \{1,2,\dots ,20 \} \times \{1,3,5, \dots ,19\}$. We want sets $E_{e,i}$ and $B_{1,i} , B_{2,i} \subseteq \mathbb{N}$ so that

$\chi_{E_{e,i}} (x_e) = \chi_{B_{1,i}}(x_1) \times \chi_{B_{2,i}}(x_2)$

for all $x_e \in V_{\{ 1,2\} }$. The triviality of this example now becomes obvious – setting $B_{1,1} = \{1,2,\dots ,20\}$ and $B_{2,1} = \{1,3,\dots ,19\}$ gives us a complexity of 1. We could see this example as an instance of a complete bipartite graph between the components making up $V_e$, which is fully described by the Cartesian product of the two sets.

Let’s see if the same reasoning applies to 3-uniform hypergraphs. Let $e = \{ 1,2,3\}$ be an edge in some 3-uniform hypergraph, with the associated set $V_e = \mathbb{N}\times \mathbb{N} \times \mathbb{N}$. Let the subset $E_e$ be $\{1,2,\dots ,10\} \times \{1,3,5,7 ,9\} \times \{2,4,6,8\}.$ Now we need sets $E_{e ,1}, \dots , E_{e, T}$ and subsets $B_{f_1 ,i}, B_{f_2, i}, B_{f_3, i}$ of $V_{f_1 }, V_{f_2 }, V_{f_3 } = \mathbb{N}\times \mathbb{N}$ such that

$\chi_{E_{e,i}} (x_e) = \chi_{B_{f_1,i}}(x_1) \times \chi_{B_{2,i}}(x_2)\times \chi_{B_{f_3,i}}(x_3).$

A little consideration will show that the same trick employed above works here as well, because our set $V_e$ can be considered to be a complete 3-uniform tripartite graph on the parts $V_1, V_2, V_3$. It seems clear that we will need to be slightly more creative in order to get higher complexity.

Consider now the following subset of the above set $V_e$:

$E_e =\{ (1,3,2), (1,3,4), (2,5,7), (2, 5, 9), (4,7,6), (3,7,6), (3,7,5), (4, 7,5), (10, 9, 8), (10,9,1)\}.$

This set can’t be written as a single  Cartesian product of subsets of the sets we are considering, and seems slightly more random in character, so there is a hope that the complexity should be higher. Setting $f_1 = (1,2)$$f_2 = (2,3)$ and $f_3 = (1,2)$, we take

$B_{1,1} = \{(1,3)\}, B_{2,1} = \{ (3,2), (3,4) \} , B_{3,1} = \{(1,2), (1,4)\}$

$B_{1,2} = \{(2,5)\}, B_{2,2} = \{ (5,7), (5,9) \} , B_{3,2} = \{(2,7), (2,9)\}$

$B_{1,3} = \{(3,7), (4,7)\}, B_{2,3} = \{ (7,5), (7,6) \} , B_{3,3} = \{(3,5), (3,6), (4,5),(4,6)\}$

$B_{1,4} = \{(10,9)\}, B_{2,4} = \{ (9,1), (9,8) \} , B_{3,4} = \{(10,1), (10,8)\}$

(where the notation has been shortened in an obvious way). To avoid confusion, we note that there are elements $x,y,z \in B_{1,i}, B_{2,i}, B_{3,i}$ such that $\chi_{B_{1,i}}(x)\chi_{B_{2,i}}(y)\chi_{B_{3,i}}(z) = 1$, but which do not correspond to projections of an edge in $V_e$, and can therefore be ignored. We thus see that our set $E_e$ has a complexity of at most 4.

Of course, the case of a 3-uniform hypergraph is quite easy to analyse this way, because two of the edges determine the third, and the difficulty of generating such examples will increase rapidly for higher $r$.

Intuitively, the example at least makes clear that the complexity of the hypergraph is somehow determined by how easily the set $V_e$ can be described in terms of parts of a smaller “dimension”, and that highly random sets will have higher complexity (without specifying exactly what we mean by “random” here). The real application of this complexity is in hypergraph removal lemmas, which we’ll (possibly) discuss in another post.

Shameless self-promotion: If you enjoy Fourier series and Martin-Löf randomness (or Kolmogorov-Chaitin-Solomonoff randomness – take your pick), see my paper in APAL. For something similar, involving Carleson’s theorem and Schnorr randomness, see the paper by Franklin, McNicholl and Rute at https://arxiv.org/abs/1603.01778.

References:

1. A relative Szemerédi theorem, David Conlon, Jacob Fox and Yufei Zhao
2. Nonstandard Methods in Ramsey Theory and Combinatorial Number Theory, M Di Nasso, I Goldbring and M Lupini

# Typical elements 2

Since we’ve already discussed the use of simple element in the proof of the ergodic theorem, we will now address the existence of such elements.

Recall that we are working with a measure preserving system $([0,1]^{\mathbb{N}}, \mathcal{C}, T, \nu)$ as defined previously, where $T$ denotes the shift operator. We are looking for an element $\alpha$ of $[0,1]^{\mathbb{N}}$ such that

$\displaystyle \lim_{n\to \infty} \sum_{i=0}^{n-1} f(T^i \alpha ) = \int_{[0,1]^{\mathbb{N}}}f(y)d\nu.$

for any $f \in C([0,1]^{\mathbb{N}})$. First, take a sequence $\{ \alpha_n \}_{n \in \mathbb{N}}$ in $[0,1]^{\mathbb{N}}$ of elements  that are periodic; in other words, each element is of the form

$\displaystyle \alpha_n = (\alpha_{1}^{n}, \alpha_{2}^{n}, \dots , \alpha_{c_n }^{n}, \alpha_{1}^{n}, \alpha_{2}^{n}, \dots, \alpha_{c_n }^{n}, \dots )$

for some integer $c_n$. We set

$\displaystyle \mu_{\alpha_n } = \frac{1}{c_n }(\delta_{\alpha_n }+ \delta_{T \alpha_n }+ \cdots + \delta_{T^{c_{n-1} }\alpha_n }),$

where  $\delta$ denotes the Dirac measure, as usual. We require that the $\mu_{\alpha_n}$ converge weakly to $\nu$ as $n\to \infty$.

Here we depart from the sequence of the proof in Kamae in order to first show how to construct such a sequence, before showing that we can use it to obtain a typical element. To show the existence of the sequence, it suffices to show that for any neigbourhood $U$ of $\nu$ in the weak topology of Borel measures on $[0,1]^{\mathbb{N}}$, we can find an element $\beta$ such that $\mu_{\beta}\in U$. The nature of the topology means that we can approximate the measure $\nu$ to an arbitrary degree by considering a finite amount of information. In other words, there are integers $m$, $n$ and some positive $\delta >0$ such that if $| \mu (B) - \nu (B) |<0$  for any subset $B$ of the form

$\displaystyle \left[ \frac{j_0}{m}, \frac{j_0 +1}{m}\right] \times \cdots \times \left[ \frac{j_{n-1}}{m}, \frac{j_{n-1}+1}{m}\right] \times [0,1] \times [0,1] \times \cdots$

where $j_0 , j_1 ,\dots ,j_{n-1}$ are integers ranging from $0$ to $m-1$, then $\mu \in U$.

Now set

$\displaystyle \Sigma = \left\{ \frac{1}{2m}, \frac{3}{2m}, \dots, \frac{2m-1}{2m} \right\}.$

We can define a measure $\lambda$ on $\Sigma^n$ by

$\displaystyle \lambda \left( \left( \frac{j_{0} + \frac{1}{2}}{m}, \dots , \frac{j_{n-1} +\frac{1}{2}}{m} \right) \right) = \nu \left( \left[ \frac{j_{0} }{m}, \frac{j_{0} +1}{m} \right] \times \dots \times \left[ \frac{j_{n-1} }{m}, \frac{j_{n-1} +1}{m} \right] \times [0,1] \times \dots \right)$

for the $j_i$ in the same range as before. We can see $\lambda$ as an encoding of the measure $\nu$ on cylinder sets of the form we are considering, and will clearly be used to construct a measure in the neighbourhood of $\nu$, once we have tweaked it a little.

We are given that $\nu$ is $T$-invariant. From this, we can conclude that

$\displaystyle \sum_{\xi_0 \in \Sigma} \lambda ((\xi_0, \chi_1 ,\dots , \xi_{n-1})) =\sum_{\xi_0 \in \Sigma} \lambda ((\xi_1, \dots , \xi_{n-1} , \xi_{0})).\quad (*)$

for any $\xi_1 , \dots \xi_{n-1} \in \Sigma$. Why? We have that

$\displaystyle \bigcup_{\xi_0 \in \Sigma} \left[ \frac{j_0}{m}, \frac{j_0 +1}{m}\right] \times \cdots \times \left[ \frac{j_{n-1} }{m}, \frac{j_{n-1} +1}{m}\right] \times [0,1] \times [0,1] \times \cdots \\ = T^{-1} \left( \left[ \frac{j_1 }{m}, \frac{j_1 +1}{m}\right] \times \cdots \times \left[ \frac{j_{n-1} }{m}, \frac{j_{n-1} +1}{m}\right] \times [0,1] \times [0,1] \times \cdots \right),$

and the latter term is equal to

$\displaystyle \bigcup_{\xi_0 \in \Sigma} \left( \left[ \frac{j_1 }{m}, \frac{j_1 +1}{m}\right] \times \cdots \times \left[ \frac{j_{n-1} }{m}, \frac{j_{n-1} +1}{m}\right] \times \left[ \frac{j_0 }{m}, \frac{j_0 +1}{m}\right] \times [0,1] \times \cdots \right).$

From $\lambda$, we want to obtain a measure $\nu$ on $\Sigma^n$ which satisfies:

(1) For any $\xi \in \Sigma^n$, $N\eta (\xi)$ is a positive integer such that $|\lambda (\xi) - \eta (\xi) |<\delta$, where $\delta$ has been specified earlier.

(2) The relation $(*)$ holds for $\eta$ in the place of $\lambda$.

These do not seem like very great demands on $\eta$, but we should still confirm that it can satisfy them. The first condition above requires that all the values $\eta (\xi)$ are rational. This is not much of a challenge, because of the denseness of the rationals. We must just make sure that the adjustment from $\lambda$ to $\eta$ does not violate condition (1). We therefore need to make sure, also, that the rationals we pick not only satisfy $(*)$, but also

$\displaystyle \sum_{\xi_0 , \xi_1 , \dots , \xi_{n-1}\in \Sigma} \eta (\xi_0, \dots, \xi_{n-1}) =1 \quad (**)$

(obviously, since $\eta$ must still be a probability). Since there are only a finite number of elements of the measure space, we can ensure first that the latter condition holds, then ensure the validity of $(*)$ by more adjustments, making sure any adjustments upwards are balanced by adjustments downwards to guarantee $(**)$ This brings into question whether there will not be one of these adjustments that will turn the measure of one of these elements negative. But since we’re only dealing with a finite number of points of the measure space, and our adjustments are arbitrarily small, we must just make sure that none of the adjustments are as big as the least measure of any of the elements. If there are any elements of zero measure, we adjust the measure $\lambda$ to avoid that, whilst still remaining close to $\latex \nu$.

Now pick a longest sequence $\xi^0, \xi^1, \dots ,\xi^{r-1}$ of elements of $\Sigma^n$ such that

1. $(\xi^{i}_{1}, \dots ,\xi^{i}_{n-1}) = (\xi^{i+1}_{0},\dots ,\xi^{i+1}_{n-2}$ for any $i=0,1, \dots ,r-1$ and
2. $|\{ i: 0\leq i < r, \xi^i = \xi \}| \leq N\eta(\xi)$ for any $\xi \in \Sigma^n$,

where $\xi^i = (\xi^{i}_{0}, \dots , \xi^{i}_{n-1})$ and $\xi^r = \xi^0$.

The first condition clearly has to do with the translation operator, and requires that the $i$-th element of $\Sigma^n$ is the translate of the $(i+1)$-th element, with the last term of the element not dependent on the next element. Notice that there is nothing in the first condition which prevents “looping” the same sequence over and over again. If such were allowed, we would not be able to speak of the longest sequence, and so the second condition is introduced in order to limit the lengths.

Now, suppose that equality does not hold in (2), for some $\xi$, remembering that $N\eta (\xi)$ is an integer, and also supposing that $\eta (\xi) \neq 0$. In that case, we will be able to extend the sequence to increase the term on the left-hand side (by using closed loops), contradicting the assumption of the longest sequence. Thus, we can conclude that equality holds in (2) for each $\xi$.

We define a periodic element $\beta$ of $[0,1]^{\mathbb{N}}$ with period $n+r-1$ by

$\displaystyle (\beta(0), \beta(1), \dots ,\beta(n+r-2)) = (\xi_{0}^{0}, \xi_{1}^{0}, \dots, \xi_{n-1}^{0}, \xi_{n-1}^{1}, \xi_{n-1}^{2},\dots ,\xi_{n-1}^{r-1})$

Thus, we have found a periodic element in the neighbourhood $U$ of $\nu$ for each $U$. Taking neighbourhoods $U_n$ of $\nu$ such that $\cap_{n} U_n = \nu$, denote each corresponding periodic element by $\alpha_n$. For $t_1, t_2, \dots$ a sequence of positive integers increasing sufficiently fast, set

$\displaystyle \alpha(n) = \alpha_m (n-T_m)$

for any $n\in \mathbb{N}$ with $T_m \leq n \leq T_{m+1}$, where $T_0 = 0$ and $T_i = T_{i-1} +c_i t_i$ (recall that $c_n$ denotes the period of $\alpha_n$). This gives us our desired typical element, and completes the argument.

References:

Kamae, T., A simple proof of the ergodic theorem using nonstandard analysis, Israel Journal of Mathematics, Vol. 42, No. 4, 1982

# Focus, focus, focus

This is the sine qua non of mathematics. Yet even though many books have been written about this, and I have been doing maths for many years, it is still an elusive beast. My younger self would have agreed to its importance, but for some reason I have spent years trying to get by in my chosen profession without really developing this fundamental skill.

Our society values focus without encouraging it. Much is made of attention deficits in children, but how many adults work in surroundings that require absolute focus? We require children to sit quietly and pay attention in boring classes all day, but very few jobs would require that of an adult. I love learning, and I certainly enjoyed some classes at school, but most of the time I was bored out of my mind and kept quiet because I was afraid of getting low marks (which would ruin one’s life, I was told) or of getting sent to the principal or one of his minions, with the spectre of a caning always looming.  However, if you look at how so many office workers conduct their day, it is replete with interruptions, e-mails, Facebook posts, meetings, phone conversations, etc. How many professional people can go an hour without checking either their phone or their e-mail?

I would go so far as to say that many jobs actively discourage focus. The grown-up version of not doing your homework is not replying to your boss’s e-mail quickly enough, or missing a call from a client.

A lot of what I do is stultifying – student admissions, tuition meetings and administrative matters which have me squirming in my seat (and checking my e-mail). I am fortunate enough though that most of my job involves things I like. Working out second-year calculus problems may sometimes be a little frustrating if I want to get to research, but fundamentally I enjoy doing maths problems and it is often a relief to do something easy. Also, I have to admit, the boring, less challenging stuff has its own attraction. It is easier to have a discussion on the syllabus of some course when you are well and truly stuck on a problem; you at least feel that you have earned part of your keep.

And that is the trap, one I must confess to having fallen into. Bureaucracy increases exponentially – the more bureaucrats there are, the more bureaucracy they generate. As such, I am never without some task awaiting my attention or ten e-mails waiting to be answered. Most of these tasks do not take a lot of time. They can usually be done in less than half an hour, and you have that slight feeling of accomplishment at having something you can cross off your list. I think that continually facing such things has attenuated my mathematical attention. I cannot remember the last time I have sat for hours on a single paper or book, completely engrossed in trying to understand it. I do maths here and there, correcting this paper, moving on to clearing my inbox, then trying to read a new theorem, remembering something urgent about an exam paper that I have to discuss with a colleague, and so on. When your mind operates thus for a while, it becomes ingrained, and it gets uncomfortable when you focus on one thing for too long.

Clearly, once recognised this problem could not be allowed to persist. Lines had to be drawn in the sand. The very first thing I did was to dedicate the first hour of my working day to reading mathematics with the fullest concentration I could muster. I have previously mentioned the book on nonstandard number theory I am reading, and decided to make that my focus.  Unless summoned to a meeting I would do this directly after arriving at work and clocking in (literally – at our university you have to). I set aside a desk in my office which has nothing on it but some scratch paper. This part is very important – when I work at my desk with my computer, I will inevitable glance up at it from time to time, and that breaks the flow. Next, I put on wireless headphones with very calm music, something which manages to block external sounds but is not itself interesting enough to distract you. I sometimes use the sounds of ocean waves or rain falling, to the same effect. For that one hour, I don’t have to check my phone and don’t get up at all.

The first realisation from this practice was that concentration is its own reward. Giving myself the mental space to devote solely to mathematics not only reminded me of why I do it, but the very act of being intensely focused is extremely pleasant. I’m not saying it is always particularly easy, but it is worth it. Soon after starting to do this every morning, I found myself craving it. The clarity of mind during that hour is wonderfully refreshing. You have one thing to do, and only that. And when you get to the rest of the day, with its e-mails and interruptions, you have already learned something valuable, and done it well.

This worked so well that I found myself sneaking in extra hours of concentrated work whenever I could manage. It should be mentioned that this does not work as well with shorter periods of time. Thirty minutes is still not useless, but because the end is in sight, I find that I do not immerse myself as deeply. I often find myself longing to go on at the end of my hour, but do not push it to the point of fatigue. If I still feel eager to go on, I am more likely to create more time for it later. It also helps to be in the middle of an interesting proof, so I can’t wait to get back to it.

Cultivating mathematical attention is not as simple as this, though. Reading and research are two different animals. Both are creative activities, since learning anything well cannot be done passively. But the same method does not quite work for doing original mathematics, and this I consider the difficult bit. Since this deserves an entire post of its own, I will continue this later (after a maths-related post, of course).

# Typical elements

I promised myself that I would make at least every second post a proper mathematical one, so here goes… In fact, I will break this topic up into sections, so there will be a continuation of this post. Note that some of the links open up pdf files.

I am reading the proof of the ergodic theorem in the book Nonstandard Methods in Ramsey Theory and Combinatorial Number Theory by Mauro Di Nasso, Isaac Goldbring and Martino Lupini. Since the proof is clearly presented in the book and is freely available, I will not go into detail here. There is however one part of the proof which is not presented in the book – one assumes because it would take one too far afield. This concerns the existence of “typical elements”:

Definition 1. Let $([0,1]^{\mathbb{N}}, \mathcal{C}, T, \nu)$ be a measure-preserving dynamical system, where $\mathcal{C}$ is the Borel $\sigma$-algebra on $[0,1]^{\mathbb{N}}$, $T$ is the unilateral shift operator and $\nu$ is a $T$-invariant probability measure. An element $\alpha$ of $[0,1]^{\mathbb{N}}$ is called typical if, for any $f \in C([0,1]^{\mathbb{N}})$,

$\displaystyle \lim_{n\to \infty} \sum_{i=0}^{n-1} f(T^i \alpha ) = \int_{[0,1]^{\mathbb{N}}}f(y)d\nu.$

I have recently spent quite some time on ideas surrounding the concept of equidistribution, which is why I immediately found typical elements appealing. The idea that you can have a single element which can be used, via an ergodic average, to approximate the integral of any continuous function is perhaps not shocking, but pleasing nonetheless. My immediate reaction is to wonder what else we can say about these elements and collections of them. For instance, how many are there? How accessible are these elements in constructive terms? I have not yet explored these notions, but am eager to do so.

Existence can be proved with the ergodic theorem, but more interesting to me is that it can also be done without. The proof I will present here is from the paper by Kamae (who came up with the nonstandard proof of the ergodic theorem), who in turn states that he found it in Ville. The proof relies on little but some basic measure theory. I will stick closely to the Kamae paper, but hopefully clear up some details.

The key to the proof is to construct a sequence of periodic elements of $[0,1]^{\mathbb{N}}$ which can be used to approximate the measure $\nu$. We say that $\alpha = (\alpha_1, \alpha_2, \alpha_3, \dots) \in [0,1]^{\mathbb{N}}$ is periodic if there is some $k$ such that $\alpha_i = \alpha_{i+k}$ for all $i\in \mathbb{N}$. Given a periodic $\alpha$ with period $p$, we construct a measure on $[0,1]^{\mathbb{N}}$ by setting

$\displaystyle \mu_{\alpha} = \frac{1}{p}(\delta_{\alpha} + \delta_{T\alpha}+\dots +\delta_{T^{p-1}\alpha}),$

$\delta_{\beta}$ as usual denoting the Dirac measure at $\beta$. We want to show that we can find a sequence of periodic elements such that the associated measures converge weakly to $\nu$.

The idea of the proof is now to encode a cylinder set in $[0,1]^{\mathbb{N}}$ as a finite Cartesian product of a finite alphabet, then find a measure on that product which is very close to $\mu$. The new measure will yield a kind of “maximal sequence” in the product, which we can use to construct our periodic element. But for now I am going to skip straight to the end and show how we can use such elements to get a typical one. In a follow-up post, we will get to the construction of the periodic elements, which is the real meat of the proof.

Suppose now that we have a sequence $\alpha_1, \alpha_2, \alpha_3, \dots$ of periodic elements, $\alpha_i$ assumed to have period $c_i$, such that the associated measures $\mu_{\alpha_i}$ converge weakly to $\nu$ as $i\to \infty$. Since each of these elements is determined by a finite numbers of “bits”, we can get the full information of each one in a finite string. To get our typical element then, we might be tempted to stick the first $c_{i+1}$ bits of string $\alpha_i$ onto the end of the first $c_i$ bits of string $\alpha_i$, but this will lead to convergence problems when we look at $\int fd\nu$ again. Rather, we can think of the $\mu_{\alpha_i}$ as increasingly good approximations to $\nu$, and so would want $\alpha_i$ to play a greater role in $\alpha$ than $\alpha_j$, when $i>j$.  So we take the first $c_1 t_1$ elements of $\alpha_1$ to form the first  $c_1 t_1$ element of $\alpha$, follow them with the first $c_2 t_2$ bits of $\alpha_2$ and so on, where $t_1 < t_2 < t_3 < \cdots$. We will also require that $c_i t_i$ is sufficiently small with respect to $c_{i+1}t_{i+1}$. To be a little more formal about it, we set

$\displaystyle \alpha (n) = \alpha_m (n - T_m),$

for any $n\in \mathbb{N}$, with $T_m \leq n and $T_{m+1} = T_{m}+c_m t_m$, with $T_0 =0$.

In order to show that $\alpha$ is indeed typical, it seems reasonable to write the sum in Definition 1 some form of integral in $\mu_i$. Given that $f$ is a continuous function on a compact space, we know that the range is bounded and for convenience, positive.  What we want then is something of the form

$\displaystyle \int f d\mu_{\alpha_n} \approx \frac{1}{m}\sum f(T^{i} \alpha) \to \int f d\nu.$

We can now see why it is necessary for the $t_i$ to increase quickly. It is a useful exercise to write out the integral $\int f d \mu_n$ as a sum according to the definition of a Lebesgue integral, to see that, for large $m$

$\displaystyle \int f d \mu_m \approx \frac{1}{c_m}\sum_{i=0}^{c_m -1} f(T^i \alpha_m).$

Due to the use of the periods of the $\alpha_m$ in the construction of $\alpha$, we see that

$\displaystyle \frac{1}{t_m c_m}\sum_{i=0}^{t_m c_m -1} f(T^i \alpha) +\varepsilon,$

where $\varepsilon$ indicates the error due to the terms $f(T^i \alpha)$ for $i=0,\dots ,c_m t_m -1$. This can be made as small as we please by allowing the $t_i$ to increase rapidly, which yields the result.

Some of the fine detail has of course been left out here, but not, I think, anything that is too difficult to supply oneself. In a future post, I will discuss how we find $\alpha$ using the measure $\nu$.

# Trying to be a mathematician

The title of this blog has changed recently, as previous visitors might notice. I have been trying to decide on a direction for some time, and wanted it to be something close to the thing most important to me, namely mathematics. More than that, I want it to be about my mathematics. Not just technical exposition, although there will be some of that as well, but about my struggle to try to become a good mathematician, whatever that means.

The blog will therefore be of a somewhat mixed nature. Some of the posts will be more personal, and may not always be of interest to someone interested in something mathematical. But I suspect there are quite a few people out there in similar positions to mine, and may benefit from knowing that the struggle is a shared one. This is not limited to mathematicians, but to scientists and academics in general. Especially, this may help some of those new to the business to at least avoid some of the mistakes I have made…

In addition, there will be purely mathematical entries. I have discovered (at a late stage, it must be said) that to understand something I have to write about it. I think I have avoided doing so more because writing mathematical documents is not easy. Yet one cannot be said to understand a topic to a reasonable degree unless one can convey it in writing. Lecturing is also good, but one usually doesn’t have the leisure to lecture on everything you want to understand, and it does not leave a permanent record to consult at a later stage. Writing a blog on the subject is a good compromise, since one has to make the material presentable to an audience, who could (at best) offer insight and correction and (at worst) savage your lack of understanding.

I make no claims of originality here. Much of what I write will have been stated by others. For instance, Terry Tao has a blog post from years ago that pretty much gives the exact advice of the previous paragraph. He has a great page of career advice at https://terrytao.wordpress.com/career-advice/, which should be read. All I can offer here is offer my own take on things.

This all sound quite serious, which will hopefully not be the dominant tone of this blog. One piece of advice for budding mathematicians would be not to take yourself too seriously. I have to keep reminding myself of this. Mathematics should be fun, and if it is not, find yourself another job. I tend to fall into a deep depression and blame myself endlessly when a solution I have been working on for months turns out to have a trivial counterexample. The answer is to get away from the problem, take a walk in a park, and realise that you are not a failure as a human being because of this. Sure, not every minute of mathematics will be pleasant, but if you can’t periodically remind yourself that you are doing this because you like it, you will suffer under a great psychological burden.

Thus, if I post something which seems silly, realise that this is just me reminding myself to lighten the hell up.

# Almost everything is wrong

Depending on your level of cynicism, this video either shows you one of the great problems with scientific publication, or shows you how to be a successful scientist…

For those interested in exploring the murky waters of academic publishing in more detail, Timothy Gowers has some opinions on the subject.

# The Laws of Meetings

1. They who schedule many meetings have obscenely little of value to do. They who enjoy meetings have nothing of value to do.
2. When you feel you have to schedule a meeting, you have already failed. Go into it with this knowledge, and atone.
3. If you schedule a meeting to accomplish anything that could have been done with free project software or otherwise accomplished online, you have failed doubly.
4. Capacity for productive work falls exponentially during a meeting, but only increases logarithmically afterwards.
5. If you have called the meeting and speak more than all others attending combined, it is not a meeting.
6. It is okay to go to a meeting just for the snacks.