Fourier analysis – the real story IV: How did it start?

Dedicated to the memory of Willem L. Fouché who, amongst many other stellar contributions to my life, told me to go read Dym & McKean, and also taught me the connections between Fourier analysis and descriptive set theory.

It’s all well and good to learn advanced mathematics because you find it interesting or useful. Not everyone is interested in the history or origin of a field, and that is perfectly fine. For myself, I do not feel I fully understand something until I have some sense of how it originated and, more importantly, why. I occasionally discuss Fourier analysis on this blog, because I think it is kind-of magical and because I feel I don’t yet have a deep, intuitive understanding of it. Recently, I’ve been considering the question of whether Fourier analysis was inevitable. Without necessarily going back through the historical record in detail, what would be my guess as to how it came about? This post should perhaps be regarded as historical fiction – an account of how things might have happened.

Since little (but not nothing) happens in a vacuum, let us start with the heat equation, which – as we’ve established – Fourier was obsessed with. Without giving initial or boundary conditions then, we’re looking at the equation

\frac{\partial u}{\partial t} = \alpha \frac{\partial^2 u}{\partial x^2}

when we’re only considering one dimension.

How do you solve a problem in mathematics? By some combination of two things:

  • solving an easier but related problem, or
  • breaking it up into bits you can solve.

We’ll use this combination to try to puzzle out how Fourier analysis could have come about.

The heat equation is not difficult to get a general solution for, if we leave out the initial and boundary conditions. By separating variables and setting u(x,t) = X(x)T(t), we get

\frac{T'(t)}{\alpha T(t)} = \frac{X''(x)}{X(x)} = -\lambda.

Since the first two expressions depend on different variables, they must both be equal to a constant. Using a minus sign in front of \lambda is a bit of a cheat, anticipating the form of the solution, but it will make things easier. We can then see that the following function is a solution:

u(x,t) = \sin (\sqrt{\lambda} x) e^{-\alpha \lambda t}.

Of course, this works for \cos as well, and we can multiply by arbitrary constants without breaking the solution, so we have

u(x,t) = A\sin (\sqrt{\lambda} x) e^{-\alpha \lambda t} + B\cos (\sqrt{\lambda} x) e^{-\alpha \lambda t}.
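These days, we can let a computer check the algebra. Here is a quick sympy sketch (a modern convenience, obviously not something Fourier had) confirming that this combination really does satisfy the heat equation:

```python
# Symbolic check that u = (A sin(sqrt(lambda) x) + B cos(sqrt(lambda) x)) e^{-alpha lambda t}
# satisfies u_t = alpha * u_xx.
import sympy as sp

x, t = sp.symbols("x t", real=True)
alpha, lam, A, B = sp.symbols("alpha lambda A B", positive=True)

u = (A * sp.sin(sp.sqrt(lam) * x) + B * sp.cos(sp.sqrt(lam) * x)) * sp.exp(-alpha * lam * t)

residual = sp.diff(u, t) - alpha * sp.diff(u, x, 2)
print(sp.simplify(residual))  # prints 0: the equation holds identically
```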

Now, a differential equation usually isn’t much use without some initial and boundary conditions, so let’s suppose we’re looking at something (a rod, perhaps) of unit length, whose ends are kept at zero degrees:

u(0,t) = u(1,t) = 0.

To ensure this is satisfied, we can set B = 0 and choose \lambda = 4\pi^2, so that the spatial factor becomes \sin 2\pi x, which vanishes at both x = 0 and x = 1. This doesn’t break our solution, because \lambda was arbitrary all along; the time factor simply becomes e^{-4\pi^2 \alpha t} to match. (Some members of the audience are screaming out that I’m missing a whole bunch of solutions – don’t worry, we’ll get to it!)

It’s going well so far, but what about the initial conditions? This is where it gets interesting. But making it too interesting makes it very difficult, so let’s start with the easiest possible case. Since the initial condition is given by

u(x,0) = f(x),

we surely can’t get any simpler than setting f(x) = \sin 2\pi x. The problem is solved! We already chose \lambda = 4\pi^2 above, so u(x,t) = \sin (2\pi x) e^{-4\pi^2 \alpha t} matches this initial condition exactly (taking A = 1). Unfortunately, things usually don’t stay this simple, but let’s make it incrementally more difficult. What if

f(x) = a_1 \sin 2\pi x + a_2 \sin 4\pi x?

(We only take integer multiples in the argument of \sin, since we still have to satisfy the boundary conditions.) That’s absolutely no problem, either. The sum of two solutions is still a solution (because our differential equation is linear), so we can solve separately for the two terms and just add them; each frequency simply carries its own decay rate, the term \sin 2\pi n x being paired with the factor e^{-4\pi^2 n^2 \alpha t}. We can do this for any number of terms. In other words, we can solve the problem for any initial condition of the form

\sum_{n=1}^{N} a_n \sin 2\pi n x.

Great! We’ve solved the problem for a whole class of functions. The question becomes: exactly how big is this class? And can we use this method to expand this class?
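To make this concrete, here is a minimal numerical sketch (my own, assuming a unit rod with ends held at zero and \alpha = 1) of the series solution we have just built, with the two-term initial condition from above:

```python
# Truncated sine-series solution of the heat equation on [0, 1]:
#   u(x, t) = sum_n a_n sin(2 pi n x) exp(-4 pi^2 n^2 alpha t)
import numpy as np
import matplotlib.pyplot as plt

def u(x, t, coeffs, alpha=1.0):
    """Evaluate the series solution for coefficients a_1, ..., a_N."""
    return sum(
        a_n * np.sin(2 * np.pi * n * x) * np.exp(-4 * np.pi**2 * n**2 * alpha * t)
        for n, a_n in enumerate(coeffs, start=1)
    )

x = np.linspace(0, 1, 500)
coeffs = [1.0, 0.5]  # initial condition f(x) = sin(2 pi x) + 0.5 sin(4 pi x)
for t in (0.0, 0.005, 0.02):
    plt.plot(x, u(x, t, coeffs), label=f"t = {t}")
plt.xlabel("x"); plt.legend(); plt.show()
```

Notice that the n = 2 term, with its factor e^{-16\pi^2 \alpha t}, dies off four times as fast as the fundamental: high frequencies smooth out first, exactly as physical intuition about heat suggests.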

Fourier’s audacity was to say that, if we allow N to be infinite, all functions can be written this way. Now, this isn’t exactly true; much of Fourier analysis has focused on finding out precisely how true it is, and more modern notions of convergence are needed even to frame the question properly (back in Fourier’s day, they played a bit fast and loose with these issues). Certainly, most “nice” functions on the unit interval can be written this way, meaning the heat equation is solved for all of them. That is a great accomplishment!

Fair enough, but this part of the story is still not obvious. Perhaps Fourier had a suspicion that functions can be expanded as sums of trigonometric functions, but how do you go about verifying that? I can only imagine that an enormous amount of work went into this. Nowadays, it is the work of a few minutes to write a script that will output the visualization of a trigonometric sum of the above type. In Fourier’s day, this all had to be done by hand. This might seem like a disadvantage, but I’m not so sure. You would need to think deeply about what to spend your time on, and choose your problems with care. I’m not against the widespread use of computing in mathematics, but I do think we can learn something from the work habits of the old masters.

So, we can assume that Fourier (like his contemporaries) was an absolute wizard at calculus. If I had to reconstruct his thought process – albeit through a modern lens – I would imagine it went something like the following.

Fourier didn’t have function spaces and orthonormal bases to play around with – even the basic concepts of set theory were still decades away (and Fourier analysis played a crucial role in Cantor’s work, too). But he probably would have been exquisitely aware of the following integrals:

\int_0^1 \sin 2\pi nx \sin 2\pi mx dx = 0 \textrm{ when } n \neq m

and

\int_0^1 \sin 2\pi nx \sin 2\pi nx dx = \frac{1}{2}.
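(A computer algebra system will happily confirm these. A throwaway sympy check, with the concrete frequencies 3 and 5 standing in for n and m:)

```python
# Orthogonality of the sine modes on [0, 1].
import sympy as sp

x = sp.symbols("x", real=True)
print(sp.integrate(sp.sin(2*sp.pi*3*x) * sp.sin(2*sp.pi*5*x), (x, 0, 1)))  # 0   (n != m)
print(sp.integrate(sp.sin(2*sp.pi*3*x) * sp.sin(2*sp.pi*3*x), (x, 0, 1)))  # 1/2 (n == m)
```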

If we now take f to be the trigonometric sum given above, we can say that

\int_0^1 f(x) \sin 2\pi m x dx = \sum_{n=1}^N a_n \int_0^1 \sin 2\pi n x \sin 2\pi m x dx.

Using the orthogonality relations above, we can conclude that (for 1 \leq m \leq N)

\int_0^1 f(x) \sin 2\pi m x dx = \frac{a_m}{2}.

In other words, if we assume that f is a linear combination of “nice” \sin terms, we can recover the coefficients of that linear combination by “projecting” f onto each term. If we allow N to be infinite, and assume that any function (appropriate to the boundary conditions) can be written as such an infinite linear combination, we have a way of finding each coefficient. This means that if we are able to disregard a whole mess of convergence issues, we can solve the heat equation for a very wide range of initial conditions. To make this more general, we can do something very similar with \cos, which will allow us to handle other boundary conditions.
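Here is a small numerical sketch of that projection trick, with made-up coefficients; the factor 2 undoes the 1/2 coming from the orthogonality relation:

```python
# Recover the coefficients of a trigonometric sum by projecting onto each mode.
import numpy as np

true_coeffs = [0.7, 0.0, -0.3, 0.1]  # a_1, ..., a_4, chosen arbitrarily
x = np.linspace(0, 1, 4096, endpoint=False)  # uniform grid over one period
f = sum(a * np.sin(2 * np.pi * n * x) for n, a in enumerate(true_coeffs, start=1))

for m in range(1, 5):
    # a_m = 2 * int_0^1 f(x) sin(2 pi m x) dx, approximated by a grid average
    a_m = 2 * np.mean(f * np.sin(2 * np.pi * m * x))
    print(m, round(a_m, 8))  # recovers 0.7, 0.0, -0.3, 0.1
```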

Of course, the mathematical world did not immediately accept Fourier’s methods, with good reason. A lot of work remained to be done. Even today, the convergence of Fourier-type series is an active area of investigation. I can imagine that those who had to use the heat equation in practice welcomed this advance, though. Indeed, it was only a short while before the consequences of this stretched far beyond application to the heat equation…

What’s up with convolution (the second part)?

Previously, we managed to get an idea of the justification for convolution by using probabilities, and it seemed to make some kind of intuitive sense. But do the same concepts translate to Fourier analysis?

With the convolution product defined as

(f \star g)(x) = \int f(y) g(x-y) dy,

(where we leave the limits of integration up to the specific implementation), the key property of the convolution product we will consider is the following:

\widehat{ (f \star g)}(n) = \hat{f} (n) \hat{g} (n).

One possibility that immediately presents itself is that, if the Fourier coefficients of f and g are easy to compute, we have a quicker way to compute the Fourier coefficients of the convolution, one that doesn’t involve computing some ostensibly more difficult integrals. That, in turn, requires us to answer the question: why do we want to calculate the convolution in the first place?

Let’s take some easy functions and see what happens when we take the convolution. Let

f (t) = \sin 2\pi t \textrm{ and } g (t) = \cos 2\pi t.

The Fourier coefficients of these functions are really easy to compute, since we have

\sin 2\pi  t = \frac{e^{2\pi i t} - e^{-2\pi i t}}{2i}

and

\cos 2\pi t = \frac{e^{2\pi i t} + e^{-2\pi i t}}{2},

meaning that \hat{f}(1) = 1/2i, \hat{f} (-1) = -1/2i, \hat{g} (1) = 1/2 and \hat{g} (-1) = 1/2. This implies that

\widehat{(f \star g)} (-1) = -\frac{1}{4i}, \widehat{(f \star g)}(1) = \frac{1}{4i}.

This allows us to immediately write down the convolution product:

(f \star g)(t) = \frac{1}{4i} \left( e^{2\pi i t} - e^{-2\pi i t} \right) = \frac{1}{2} \sin 2\pi t.

Clearly, our two signals modify each other. How is this significant? The best way to think of this is as an operation taking place in the frequency domain. Suppose f is our “original” signal, and it is being “modified” by g, yielding a signal h = f \star g. The contribution of a frequency n to h (in other words, the n-th Fourier coefficient of h) is the n-th Fourier coefficient of f multiplied by the n-th coefficient of g. We can see this as a way of affecting different frequencies in a specific way; something we often refer to as filtering.
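The whole story is easy to check numerically with the FFT. A sketch (bear in mind that numpy’s unnormalized fft means the Fourier coefficients are fft(f)/N, and discretizing the convolution integral brings in a further factor of 1/N):

```python
# Numerical check of the convolution theorem for f = sin, g = cos on [0, 1).
import numpy as np

N = 256
t = np.arange(N) / N
f, g = np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)

# Circular convolution (f*g)(x) = int_0^1 f(y) g(x - y) dy, as a Riemann sum:
h = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real / N

print(np.allclose(np.fft.fft(h) / N, (np.fft.fft(f) / N) * (np.fft.fft(g) / N)))  # True
print(np.allclose(h, 0.5 * np.sin(2 * np.pi * t)))  # True: h = (1/2) sin 2 pi t
```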

We often think of the function in the time domain as the primary object, but looking at the Fourier coefficients is, in a way, far more natural when the function is considered as a signal. You hear through the activation of certain hair cells in the inner ear, each of which responds to specific frequencies. In order to generate a specific sound, a loudspeaker has to combine certain frequencies. To modify a signal, then, it is by far easiest to work directly at the level of frequencies. Instead of trying to justify the usual definition of convolution, we see the multiplication of Fourier coefficients as the starting point, and then try to see what must be done to the “original” functions to make this possible. So, we suppose that we have made a new function h whose Fourier coefficients are the products of the Fourier coefficients of f and g, and try to find an expression for h:

h(x) = \int \hat{f} (\xi ) \hat{g} (\xi ) e^{2\pi i \xi x} d\xi.

If you simply write out the definitions in the above and remember that

\int e^{2\pi i y (\xi - \zeta)} dy = 0 \textrm{ when } \xi \neq \zeta,

you will get the expression for the convolution product of f and g. As such, the definition of convolution has to be seen as a consequence of what we would like to do to the signal, rather than the starting point.

We still have not related the definition to how convolution originated in probability, as detailed in the previous post. Unfortunately, the comparison between the two cases is not exact, because in the probabilistic case we obtain a completely new measure space after the convolution, whereas in the present case we require our objects to live in the same function space. Again, the solution is to think in frequency space: to find all the ways of getting e^{-2\pi i x \xi}, we need to multiply e^{-2\pi i (x-y) \xi} and e^{-2\pi i y \xi} for all values of y, which leads to our definition.

(As always, I have been rather lax with certain vitally important technicalities – such as the spaces we’re working in, and the measures, i.e., whether we’re working with a sum or an integral. I leave this for the curious reader to sort out.)

Fourier analysis – the real story III

We’re trying to get inside Fourier’s head, so to speak, and explore the origins of his methods. To do this, we’re going to look at his derivation of the identity

\frac{\pi}{4} = \cos x - \frac{1}{3} \cos 3x + \frac{1}{5} \cos 5x - \frac{1}{7} \cos 7x + \cdots.

(Again, this is from Paul J. Nahin’s wonderful “Hot Molecules, Cold Electrons”.)

First, we need the indefinite integral

\int \frac{1}{1+x^2} dx = \tan^{-1}x + C.

(Deriving this integral is an easy exercise, but worth knowing.) Now suppose you have a right-angled triangle with the two unspecified angles being \theta and, by necessity, \pi/2 - \theta. Letting the hypotenuse be 1, the side adjacent to \theta be x, and the remaining side (necessarily) \sqrt{1-x^2}, we have that

\frac{\pi}{2} = \tan^{-1} \left( \frac{\sqrt{1-x^2}}{x}\right)+ \tan^{-1} \left( \frac{x}{\sqrt{1-x^2}}\right).

Substituting u = x/\sqrt{1-x^2} (so that 1/u = \sqrt{1-x^2}/x), we get

\frac{\pi}{2} = \tan^{-1} u + \tan^{-1} \frac{1}{u}.

Now, using the “fact” that

\frac{1}{1+u^2} = 1 - u^2 + u^4 - u^6 + \cdots

(which you can get from, e.g., long division), we integrate to get the indefinite integral

\int \frac{du}{1+u^2} = \tan^{-1} u + C = u - \frac{1}{3}u^3 + \frac{1}{5}u^5 - \frac{1}{7}u^7 + \cdots.

Comparing the two expressions on the right, we see that C = 0. By replacing u with 1/u, we get

\tan^{-1} \left( \frac{1}{u} \right) = \frac{1}{u} - \frac{1}{3} \left( \frac{1}{u} \right)^3 + \frac{1}{5} \left( \frac{1}{u} \right)^5 - \frac{1}{7} \left( \frac{1}{u} \right)^7 + \cdots.

Combining our previous expressions yields

\frac{\pi}{2} = \left( u + \frac{1}{u} \right) - \frac{1}{3} \left( u^3 + \frac{1}{u^3} \right) + \frac{1}{5}\left( u^5 + \frac{1}{u^5} \right) - \cdots.

Replacing u by e^{ix}, so that u^n + 1/u^n = 2\cos nx, and dividing through by 2, we immediately get Fourier’s identity:

\frac{\pi}{4} = \cos x -\frac{1}{3}\cos 3x +\frac{1}{5}\cos 5x - \frac{1}{7} \cos 7x + \cdots.

This is a remarkable identity, but is it true? It is very instructive to graph the right-hand side to see what happens. As it turns out, the identity is only kind-of true. It can be improved, and I suggest finding the mistakes in the derivation as a first step to doing so.
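Here is a quick way to do that graphing today (a sketch; K is the number of terms kept):

```python
# Partial sums of cos x - (1/3) cos 3x + (1/5) cos 5x - ...
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 2000)

def partial_sum(x, K):
    s = np.zeros_like(x)
    for k in range(K):
        s += (-1)**k * np.cos((2*k + 1) * x) / (2*k + 1)
    return s

for K in (2, 10, 50):
    plt.plot(x, partial_sum(x, K), label=f"{K} terms")
plt.axhline(np.pi / 4, linestyle="--", color="gray")
plt.axhline(-np.pi / 4, linestyle="--", color="gray")
plt.legend(); plt.show()
```

The sums settle on \pi/4 only for |x| < \pi/2, and on -\pi/4 for \pi/2 < |x| < \pi: the series describes a square wave, which is a strong hint about where the derivation oversteps.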

Fourier analysis – the real story I

Instead of going on with nonstandard analysis (which I might do later in a new format), I thought I would try another series of posts, on something very near and dear to my heart – Fourier analysis. I have (partly) read many books on Fourier analysis, and even published some work in it. But I have struggled to understand the essence of it – not surprising, since it is a subject both vast and deep. At the moment, I am solving differential equations and having much fun with the process. In doing so, I realized that I build a much deeper understanding when I get my hands dirty, so to speak. This is not a great revelation, but it is a principle I have ignored for too long. To understand Fourier analysis, I shouldn’t be reading the most advanced, abstract books on the subject and looking at the theoretical research; I need to go back to the origin. Why was it formulated this way? Why were kernels introduced? Why do we use Poisson summation? Why is it important that convergence takes place in certain spaces?

In this series, I intend to go back through history, even though I do not intend this as a historical work. It is more about following the evolution of ideas, and so I do not intend it to be 100% chronological. Mathematics is a messy place, and the narrative is not always clear (just like history itself, really). Things are rarely as neat as we make them out to be in our textbooks, and I think we do our students a disservice by not exposing them to these ideas. Fourier techniques are usually presented to the student as if ex nihilo, but there is a fascinating evolution of thought to explore. One of the few books that do not try to hide this is that of Körner, and I’m sure I will be referring to it extensively as I write this.

Let’s go back to the beginning then, and see what Fourier actually did, why he did it and whether anyone had done it before. There is probably no place better to go than Fourier’s “The Analytical Theory of Heat”. Now, I’m obviously not going to read the whole thing, but looking through the table of contents we see that Section II of Chapter 3 starts with “First example of the use of trigonometric series in the theory of heat”, which seems like the kind of thing we’re after.

Perhaps the best place to start would be to discuss Fourier’s heat equation itself:

a^2 \frac{\partial^2 U}{\partial x^2} = \frac{\partial  U}{\partial t}.

Here, U is some function of distance, x, and time, t. We’re not going to discuss how this equation came about and will just accept it as it is. There’s a good (but short) post on some of the history here. I am only presenting the one-dimensional form of the equation, but of course the higher-dimensional version can be expressed in terms of the Laplacian. Interestingly, Fourier’s solution of the heat equation was inspired by that of Laplace, which in turn was inspired by work of Poisson.

For a moment, let us think about the content of this equation. What does it mean, and why should it apply to heat? Since we have a first derivative in one variable, which denotes a rate of change, and a second derivative in the other, we know that the change in time is somehow governed by the curvature of the function, viewed as a function of space. Instead of fully exploring this idea myself, I’ll direct you to the video at https://youtu.be/b-LKPtGMdss.
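If you would rather poke at the equation directly, here is a toy discretization (my own sketch, with a made-up grid, time step and initial bump of heat, and periodic ends purely to keep the code short):

```python
# Explicit finite-difference step for u_t = alpha * u_xx: each point moves
# toward the average of its neighbours, at a rate set by the local curvature.
import numpy as np

alpha, dx, dt = 1.0, 0.01, 4e-5  # dt below dx**2 / (2 * alpha) for stability
x = np.arange(0.0, 1.0, dx)
u = np.exp(-(x - 0.5)**2 / 0.005)  # a bump of heat in the middle

for _ in range(1000):
    curvature = (np.roll(u, 1) - 2*u + np.roll(u, -1)) / dx**2  # discrete u_xx
    u += dt * alpha * curvature

print(round(u.max(), 3))  # the peak has dropped: heat spreads from hot to cold
```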

If you have had any courses in differential equations, the solution is quite obvious. It is now completely accepted, and that obscures how much of a revolution it actually was. As explained in the post I referred to, mathematicians had certain ideas of what a function should be, and Fourier’s series didn’t look like they conformed to those ideas. One might dismiss this as foolish conservatism by the mathematicians of old, but we must never fall into the trap of thinking our ancestors were ignorant or stupid. Rather, it indicates that Fourier analysis was something truly revolutionary, with tremendous implications for the very foundations of mathematics (more on that later). Even today, simply answering whether a trigonometric series does indeed define a certain kind of function is no simple task. And one of the greatest theorems of twentieth-century mathematics answers an ostensibly simple question about the convergence of Fourier series of quite well-behaved functions…

Next time, we’ll look at the solutions of the equation.