# Typical elements

I promised myself that I would make at least every second post a proper mathematical one, so here goes… In fact, I will break this topic up into sections, so there will be a continuation of this post. Note that some of the links open up pdf files.

I am reading the proof of the ergodic theorem in the book Nonstandard Methods in Ramsey Theory and Combinatorial Number Theory by Mauro Di Nasso, Isaac Goldbring and Martino Lupini. Since the proof is clearly presented in the book, which is freely available, I will not go into detail here. There is, however, one part of the proof which is not presented in the book, presumably because it would take one too far afield. This concerns the existence of “typical elements”:

Definition 1. Let $([0,1]^{\mathbb{N}}, \mathcal{C}, T, \nu)$ be a measure-preserving dynamical system, where $\mathcal{C}$ is the Borel $\sigma$-algebra on $[0,1]^{\mathbb{N}}$, $T$ is the unilateral shift operator and $\nu$ is a $T$-invariant probability measure. An element $\alpha$ of $[0,1]^{\mathbb{N}}$ is called typical if, for any $f \in C([0,1]^{\mathbb{N}})$,

$\displaystyle \lim_{n\to \infty} \frac{1}{n}\sum_{i=0}^{n-1} f(T^i \alpha ) = \int_{[0,1]^{\mathbb{N}}}f(y)\,d\nu(y).$

I have recently spent quite some time on ideas surrounding the concept of equidistribution, which is why I immediately found typical elements appealing. The idea that you can have a single element which can be used, via an ergodic average, to approximate the integral of any continuous function is perhaps not shocking, but pleasing nonetheless. My immediate reaction is to wonder what else we can say about these elements and collections of them. For instance, how many are there? How accessible are these elements in constructive terms? I have not yet explored these notions, but am eager to do so.
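As a toy illustration of the definition (my own, not part of the proof to come): take $\nu$ to be the i.i.d. product of Lebesgue measure on $[0,1]$. By the strong law of large numbers, almost every sequence is then typical, at least for functions $f$ depending on finitely many coordinates. A numerical sketch, with $f(x) = x_0 x_1$, whose integral against $\nu$ is $1/4$ (the function names are hypothetical):

```python
import random

def ergodic_average(alpha, f, n, depth):
    """(1/n) * sum over i < n of f(T^i alpha), where the shift T drops
    the first coordinate and f reads at most the first `depth` coordinates."""
    return sum(f(alpha[i:i + depth]) for i in range(n)) / n

random.seed(0)
n = 100_000
# A sample from the i.i.d. product of Lebesgue measure on [0,1]; by the
# strong law of large numbers such a sequence is almost surely typical
# for functions of finitely many coordinates.
alpha = [random.random() for _ in range(n + 2)]

f = lambda x: x[0] * x[1]      # continuous, depends on two coordinates
avg = ergodic_average(alpha, f, n, depth=2)
# avg should be close to the integral of x0*x1 dnu, which is 1/4
```

Of course this only illustrates the definition in the easiest case; the point of the proof below is to produce a typical element without invoking an ergodic or law-of-large-numbers argument.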

Existence can be proved with the ergodic theorem, but more interesting to me is that it can also be done without. The proof I will present here is from the paper by Kamae (who came up with the nonstandard proof of the ergodic theorem), who in turn states that he found it in Ville. The proof relies on little but some basic measure theory. I will stick closely to the Kamae paper, but hopefully clear up some details.

The key to the proof is to construct a sequence of periodic elements of $[0,1]^{\mathbb{N}}$ which can be used to approximate the measure $\nu$. We say that $\alpha = (\alpha_1, \alpha_2, \alpha_3, \dots) \in [0,1]^{\mathbb{N}}$ is periodic if there is some $k$ such that $\alpha_i = \alpha_{i+k}$ for all $i\in \mathbb{N}$. Given a periodic $\alpha$ with period $p$, we construct a measure on $[0,1]^{\mathbb{N}}$ by setting

$\displaystyle \mu_{\alpha} = \frac{1}{p}(\delta_{\alpha} + \delta_{T\alpha}+\dots +\delta_{T^{p-1}\alpha}),$

$\delta_{\beta}$ as usual denoting the Dirac measure at $\beta$. We want to show that we can find a sequence of periodic elements such that the associated measures converge weakly to $\nu$.
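To get a concrete feel for $\mu_{\alpha}$ (my own illustration, not from the paper): integrating $f$ against $\mu_{\alpha}$ is just averaging $f$ over the $p$ shifts of $\alpha$, and by periodicity these shifts simply cycle through the period block. A sketch, with hypothetical function names:

```python
def integral_mu(block, f, depth):
    """Integral of f against mu_alpha, where alpha is the periodic sequence
    repeating `block` (period p = len(block)) and f reads at most the first
    `depth` coordinates.  mu_alpha puts mass 1/p at each shift T^j alpha,
    and by periodicity these shifts just cycle through the block."""
    p = len(block)
    # enough of alpha to evaluate f on every shift T^j alpha with j < p
    alpha = [block[i % p] for i in range(p + depth)]
    return sum(f(alpha[j:j + depth]) for j in range(p)) / p

f = lambda x: x[0] * x[1]                  # depends on two coordinates
val = integral_mu([0.1, 0.5, 0.9], f, depth=2)
# average of f over the three shifts: 0.1*0.5, 0.5*0.9, 0.9*0.1
```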

The idea of the proof is now to encode a cylinder set in $[0,1]^{\mathbb{N}}$ using a finite Cartesian power of a finite alphabet, then find a measure on that product which is very close to $\nu$. The new measure will yield a kind of “maximal sequence” in the product, which we can use to construct our periodic element. But for now I am going to skip straight to the end and show how we can use such elements to get a typical one. In a follow-up post, we will get to the construction of the periodic elements, which is the real meat of the proof.

Suppose now that we have a sequence $\alpha_1, \alpha_2, \alpha_3, \dots$ of periodic elements, $\alpha_i$ assumed to have period $c_i$, such that the associated measures $\mu_{\alpha_i}$ converge weakly to $\nu$ as $i\to \infty$. Since each of these elements is determined by a finite number of “bits”, we can capture the full information of each one in a finite string. To get our typical element, then, we might be tempted to stick the first $c_{i+1}$ bits of string $\alpha_{i+1}$ onto the end of the first $c_i$ bits of string $\alpha_i$, but this will lead to convergence problems when we look at $\int f\,d\nu$ again. Rather, we can think of the $\mu_{\alpha_i}$ as increasingly good approximations to $\nu$, and so would want $\alpha_i$ to play a greater role in $\alpha$ than $\alpha_j$ when $i>j$. So we take the first $c_1 t_1$ bits of $\alpha_1$ to form the first $c_1 t_1$ bits of $\alpha$, follow them with the first $c_2 t_2$ bits of $\alpha_2$, and so on, where $t_1 < t_2 < t_3 < \cdots$. We will also require that $c_i t_i$ is sufficiently small with respect to $c_{i+1}t_{i+1}$. To be a little more formal about it, we set

$\displaystyle \alpha (n) = \alpha_m (n - T_m),$

for any $n\in \mathbb{N}$ with $T_m \leq n < T_{m+1}$, where $T_0 = 0$ and $T_{m+1} = T_{m}+c_m t_m$.
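The block construction above can be sketched in code (a hypothetical illustration of my own, with each $\alpha_m$ represented by one period block and everything 0-indexed):

```python
def alpha_at(n, blocks, ts):
    """alpha(n) = alpha_m(n - T_m) for T_m <= n < T_{m+1}, where T_0 = 0
    and T_{m+1} = T_m + c_m * t_m (everything 0-indexed here).
    blocks[m] is one period of alpha_m, so c_m = len(blocks[m]); ts[m] = t_m."""
    T = 0
    for block, t in zip(blocks, ts):
        if n < T + len(block) * t:
            # alpha_m is periodic with period c_m, so reduce mod c_m
            return block[(n - T) % len(block)]
        T += len(block) * t
    raise IndexError("n beyond the constructed range")

blocks = [[0.2, 0.8], [0.1, 0.4, 0.7]]     # hypothetical period blocks
ts = [2, 3]
seq = [alpha_at(n, blocks, ts) for n in range(13)]
# two periods of [0.2, 0.8], then three periods of [0.1, 0.4, 0.7]
```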

In order to show that $\alpha$ is indeed typical, it seems reasonable to write the sum in Definition 1 as some form of integral in the $\mu_{\alpha_i}$. Given that $f$ is a continuous function on a compact space, its range is bounded, and for convenience we may take it to be positive. What we want then is something of the form

$\displaystyle \frac{1}{n} \sum_{i=0}^{n-1} f(T^{i} \alpha) \approx \int f \,d\mu_{\alpha_m} \to \int f \,d\nu,$

where $m$ is chosen appropriately for each $n$.

We can now see why it is necessary for the $t_i$ to increase quickly. It is a useful exercise to write out the integral $\int f \,d\mu_{\alpha_m}$ as a sum according to the definition of the Lebesgue integral, to see that

$\displaystyle \int f \,d \mu_{\alpha_m} = \frac{1}{c_m}\sum_{i=0}^{c_m -1} f(T^i \alpha_m).$

Due to the use of the periods of the $\alpha_m$ in the construction of $\alpha$, we see that

$\displaystyle \frac{1}{T_{m+1}} \sum_{i=0}^{T_{m+1} -1} f(T^i \alpha) \approx \frac{1}{t_m c_m} \sum_{i=0}^{t_m c_m -1} f(T^i \alpha_m) + \varepsilon = \int f\,d\mu_{\alpha_m} + \varepsilon ,$

where $\varepsilon$ indicates the error due to the terms $f(T^i \alpha)$ for $i=0,\dots ,T_m -1$, that is, the contribution of the blocks coming before that of $\alpha_m$. This error can be made as small as we please by allowing the $t_i$ to increase rapidly, which yields the result.
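To see this estimate in action (a toy check of my own; the blocks below are arbitrary and are not approximants of any particular $\nu$): if $f$ depends on a single coordinate, then the average of $f$ over one period of $\alpha_m$ is exactly $\int f\,d\mu_{\alpha_m}$, and the ergodic average of the glued sequence lands close to it once the final $t_m$ dominates the earlier blocks.

```python
def build_alpha(blocks, ts):
    """The glued sequence up to T_{m+1}: t_m repetitions of each period block."""
    out = []
    for block, t in zip(blocks, ts):
        out.extend(block * t)
    return out

def block_average(block, f):
    """For f depending on one coordinate this is exactly the integral of f
    against mu_{alpha_m}: the shifts of alpha_m cycle through the block."""
    return sum(f(x) for x in block) / len(block)

blocks = [[0.2, 0.8], [0.1, 0.4, 0.7], [0.3, 0.9]]   # arbitrary period blocks
ts = [10, 100, 10_000]                               # rapidly increasing t_m
alpha = build_alpha(blocks, ts)

f = lambda x: x * x                                  # continuous, one coordinate
avg = sum(f(x) for x in alpha) / len(alpha)          # ergodic average up to T_{m+1}
target = block_average(blocks[-1], f)                # integral against mu_{alpha_m}
# avg - target is the epsilon contributed by the earlier, shorter blocks
```

Shrinking the first two entries of `ts` relative to the last shrinks the discrepancy further, which is exactly the role of the rapidly increasing $t_i$.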

Some of the fine detail has of course been left out here, but not, I think, anything that is too difficult to supply oneself. In a future post, I will discuss how we find the periodic elements $\alpha_i$ using the measure $\nu$.