Fourier analysis – the real story iv: How did it start?

Dedicated to the memory of Willem L Fouché who, amongst many other stellar contributions to my life, told me to go read Dym & Mckean, and also taught me the connections between Fourier analysis and descriptive set theory.

It’s all well and good to learn advanced mathematics because it’s interesting to you or useful. Not everyone is interested in the history or origin of a field, and that is perfectly fine. For myself, I do not feel I fully understand something until I have some sense of how it originated and more importantly, why. I occasionally discuss Fourier analysis on this blog, because I think it is kind-of magical and because I feel like I don’t have a deep, intuitive understanding of it yet. Recently, I’ve been considering the question whether Fourier analysis was inevitable. Without necessarily going back through the historical record in detail, what would be my guess as to how it came about? This post should perhaps be regarded as historical fiction – an account of how things might have happened.

Since little (but not nothing) happens in a vacuum, let us start with the heat equation, which – as we’ve established – Fourier was obsessed with. Without giving initial or boundary conditions then, we’re looking at the equation

$\frac{\partial u}{\partial t} = \alpha \frac{\partial^2 u}{\partial x^2}$

when we’re only considering one dimension.

How do you solve a problem in mathematics? By some combination of two things:

Solve an easier but related problem, or
breaking it up into bits you can solve.

We’ll use this combination to try to puzzle out how Fourier analysis could have come about.

The heat equation is not difficult to get a general solution for, if we leave out the initial and boundary conditions. By separating variables and setting $u(x,t) = X(x)T(t)$ , we get

$\frac{T'(t)}{\alpha T(t)} = \frac{X''(x)}{X(x)} = -\lambda$ .

Since the first two parts depend on different variables, they need to be equal to a constant. Using a minus sign in front of $\lambda$ is a bit of a cheat, anticipating the form of the solution, but it will make things easier. We can then see that the following function is a general solution:

$u(x,t) = \sin (\sqrt{\lambda} x) e^{-\alpha \lambda t}.$

Of course, this works for $\cos$ as well, and we can multiply by arbitrary constants without breaking the solution, so we have

$u(x,t) = A\sin (\sqrt{\lambda} x) e^{-\alpha \lambda t} + B\cos (\sqrt{\lambda} x) e^{-\alpha \lambda t}.$

Now, a differential equation usually isn’t much use without some initial and boundary conditions, so let’s suppose we’re looking at something (a rod, perhaps) of unit length, whose ends are kept a zero degrees:

$u(0,t) = u(1,t) = 0.$

To ensure this is satisfied, we can set $B = 0$ and replace $x$ by $2\pi x$ . This would break our solution, if it weren’t for the fact that we can compensate for it by adjusting $A$ , which we are free to do. (Some members of the audience are screaming out that I’m missing a whole bunch of solutions – don’t worry, we’ll get to it!)

It’s going well so far, but what about the initial conditions? This is where it gets interesting. But making it too interesting makes it very difficult, so let’s start with the easiest possible case. Since the initial condition is given by

$u(x,0) = f(x),$

we surely can’t get any simpler than setting $f(x) = \sin 2\pi x$ . The problem is solved! At least, as long as we can set $\lambda = 1$ , which we can do because it was, after all, arbitrary. Unfortunately, things usually don’t stay this simple, but let’s make it incrementally more difficult. What if

$f(x) = a_1 \sin 2\pi x + a_2 \sin 4\pi x?$

(We only take integer multiples in the argument of $\sin$ since we still have to satisfy the boundary conditions.) That’s absolutely no problem, either. The sum of two solutions is still a solution (because our differential equation is linear), so we can solve separately for the two terms and just add them, with the new frequency involved adjusted for in the coefficient of the term. We can do this for any number of terms. In other words, we can solve the problem for any initial condition of the form

$\Sigma_{n=1}^{N} a_n \sin 2\pi n x.$

Great! We’ve solved the problem for a whole class of functions. The question becomes: exactly how big is this class? And can we use this method to expand this class?

Fourier’s audacity was to say that, if we allow $N$ to be infinite, all functions can be written this way. Now, this isn’t exactly true, and much of Fourier analysis has focused on finding out specifically how true it is, and more modern notions of convergence are necessary to even frame the question properly (back in Fourier’s day, they played a bit fast and loose with these issues). Certainly, most “nice” functions on the unit interval can be written this way, meaning the heat equation is solved for all of them. That is a great accomplishment!

Fair enough, but this part of the story is still not obvious. Perhaps Fourier had a suspicion that functions can be expanded as sums of trigonometric functions, but how do you go about verifying that? I can only imagine that an enormous amount of work went into this. Nowadays, it is the work of a few minutes to write a script that will output the visualization of a trigonometric sum of the above type. In Fourier’s day, this all had to be done by hand. This might seem like a disadvantage, but I’m not so sure. You would need to think deeply on what you would spend your time on, and choose your problems with care. I’m not against the widespread use of computing in mathematics, but I do think we can learn something from the work habits of the old masters.

So, we can assume that Fourier (like his contemporaries) was an absolute wizard at calculus. If I had to reconstruct his thought process – albeit through a modern lens – I would imagine it went something like the following.

Fourier didn’t have function spaces and orthonormal bases to play around with – even the basic concepts of set theory were still decades away (and Fourier analysis played a crucial role in Cantor’s work, too). But he probably would have been exquisitely aware of the following integral:

$\int_0^1 \sin 2\pi nx \sin 2\pi mx dx = 0 \textrm{ when } n \neq m$

and

$\int_0^1 \sin 2\pi nx \sin 2\pi nx dx = \frac{1}{2}.$

If we now take the trigonometric series $f$ as given above, we can say that

$\int_0^1 f(x) \sin 2\pi m x dx = \Sigma_{n=1}^N \int_0^1 a_n \sin 2\pi n x \sin 2\pi m x dx.$

Using the identity above, we can conclude that

$\int_0^1 f(x) \sin 2\pi m x dx = \frac{a_m}{2}.$

In other words, if we assume that $f$ is a linear combination of “nice” $\sin$ terms, we can recover that linear combinations by “projecting” $f$ onto each term. If we assume that $N$ can be infinite, and that any function (appropriate to the boundary conditions) can be written as such an infinite linear combination, we have a way of finding each coefficient. This means that if we are able to disregard a whole mess of convergence issues, we can solve the heat equation for a very wide range of initial conditions. To make this more general, we can do something very similar with $\cos$ , which will allow us to handle other conditions.

Of course, the mathematical world did not immediately accept Fourier’s methods, with good reason. A lot of work remained to be done. Even today, the convergence of Fourier-type series is an active area of investigation. I can imagine that those who had to use the heat equation in practice welcomed this advance, though. Indeed, it was only a short while before the consequences of this stretched far beyond application to the heat equation…

Stop learning maths

I always thought I loved learning. The thought of knowing a lot excites me. So, for years I tried to come up with better ways to do it. I read many books and tried many techniques. I set schedules, made mind maps and read up on the neuroscience. Yet I never learned as much as I wanted to, and it remained difficult. Want to know topos theory? Find the best books and study the hell out of them, with all of these cool techniques I’ve learned. But nearly every book got abandoned. The same with papers. “If you want to get to the cutting edge, you have to read all the papers!” This went even worse than the books…

I blamed myself for a lack of discipline, or reasoned that perhaps this particular field isn’t for me. I felt that I wasn’t an expert in anything. But the desire to learn remained – or so I thought.

It took many decades of a futile cycle to find out the truth. I would make a commitment to some area, gather the books and papers and get to work, and it would fizzle out after a while. Some time later, I would repeat this with some other topic, with greater determination and vowing that this time I would see it through. This went on for years!

I love mathematics, so why did I not love studying it? Perhaps I was broken in some way, simply too weak to ever be good at the thing I love. Then I noticed some things about myself. Sometimes, I could spend hours, weeks or months pondering a mathematical thing, and it gave me great pleasure. It did not feel like discipline, or work. It was natural to do, and I hated anyone or anything interrupting this reverie.

Finally, the truth dawned: I do not love learning – if you consider effective learning as the rapid acquisition of information with the ability to provide an accurate representation of it at a later stage. What I do love is figuring stuff out! That’s why the books didn’t do it for me. I could read about differentiable manifolds all day long, doing all the exercises, and not be able to tell you a damn thing about them a week later. That is because I was trying to satisfy the author’s expectations (like you would a teacher’s) instead of getting it to make sense for me.

Instead of studying some mathematical topic now, I look for something that really tickles me. Something I want to understand. “That’s interesting – why does this thing do that?” I’ll try to come up with an answer based on my existing knowledge for a while, then realize it is hopelessly insufficient. Then I’ll go hunting for an explanation – scanning books, watching YouTube videos, doing Google searches (I tend to avoid AI for this, but I do use it for some other things). “Hmmm – it looks like I can’t understand what’s going on here without knowing more about manifolds…” The mistake would then be to grab a book on manifolds and try to study it. I want to know just enough to answer the question. Invariably though, that subject itself will interest me, because I would want to know why it is important in this case, and that isn’t really possible without knowing why it is important in other cases. I’ll often going into the history or applications of it. Then I would start pondering. Why were manifolds invented? Why would they come in handy? Who came up with them, and why? What kind of thing is not a manifold? How does a manifold relate to the idea of a derivative? But don’t get too deep into this rabbit hole; go back to the original question.

The reason I hadn’t done this more often is because I thought it was too slow. Taking a week to think about the point of a manifold is a week I could have spent going through an entire chapter in a book! Instead, this prejudice against pondering actually slowed me down, because I never developed a good internal representation of the concept, but plowed ahead with the technical stuff before I was ready for it. I would then get lost in the minutiae, and lose interest because I was trying to run before I could walk.

There are those that can happily plow through a book on Lie algebras and read it from start to finish. I am not one of those people. I cannot learn for the sake of learning, I need to have a compelling mystery which I have to solve, and I need to put together clues my way instead of following someone else’s teaching. This is the only way I can understand deeply, and remember well. I need to give myself the time to do this, instead of chasing results. It may not seem like it, but if you’re anything like me, prioritizing depth instead of speed will actually make you faster.

Not much help if you have a calculus exam tomorrow, though…

The importance of units and constants

Because I was young and stupid, I never used to pay much attention to constants in physics in my studies. They were just something you had to remember in order to do the calculations, or at least have stored in your calculator. And screw the units – you just needed the number.

Oh, what a sweet summer child I was. Constants and units are the building blocks of understanding physics. Much of physics is about understanding the relationships between various constants, and any new result that provides a link between previously unrelated constants is a major breakthrough. Considering units by themselves may lead to a major breakthrough, as Bohr showed in realizing that Planck’s constant had the units of angular momentum – it is as if nature was inviting us to figure out the principles of quantum mechanics. As for the relationships between constants, one of the most magical formulas of physics to me is the following consequence of Maxwell’s equations:

$c^2 = \frac{1}{\varepsilon_0 \mu_0}$ ,

where $\varepsilon_0$ is the permittivity of free space and $\mu_0$ is the magnetic permeability.

My aim here is not to come up with some amazing conclusion or derivation, but to show that contemplating constants and units might actually lead to a better understanding of the physics, or at the very least serve as a mnemonic aid.

Speaking of electromagnetism, let’s start with that as an exploration of units and constants. We use Coulomb’s law (non-vector form) for the magnitude of the force between two charges as our jumping off point:

$F = C\frac{q_1 q_2}{r^2}.$

(I’m not going to explain every part of every equation; Wikipedia has all the details.) What does this allow us to say about the Coulomb’s constant, $C$ ? Since force is measured in Newton ( $N$ ), charge in Coulomb ( $C$ ) and distance in metres ( $m$ ), we know immediately that Coulomb’s constant has units of

$\frac{Nm^2}{C^2}$ .

Now, we also know from Maxwell that $C = 1/4\pi \varepsilon_0$ , which means we can express permittivity in the units

$\frac{C^2}{Nm^2}.$

(The danger of removing units rears its ugly head here. Permittivity of a specific material is usually given as a relative permittivity, which is the ratio of the actual permittivity to that of vacuum. This is useful, but does not aid understanding.) Does this tell us anything about $\varepsilon$ ? Permittivity is supposed to measure the “polarizability” of a material: a low value means the material polarizes less easily. In order to polarize something, work has to be done, so energy is required, which means part of the potential energy due to two charges is used for the polarization. Therefore, a part of the potential energy is stored in the polarized medium, and the electric field is decreased. A high permittivity leads to a low field intensity, as is evident in a good conductor, which has almost no field inside the substance.

To create a picture of this, consider the atoms comprising our substance. Each atom consists of a positive nucleus surrounded by electrons in their orbital states. Applying an electric field will act “separately” on the protons and the electrons, by the principle of superposition, moving them apart and creating dipoles. This will partially cancel the field, and snap back when the field is removed. The discerning reader might now raise the following question: since vacuum has a non-zero value, does that mean it can be polarized, and energy stored in it? Well, yes – although it is not so easily explained with polarized molecules or atoms. However, there are substances that have even lower permittivities than the vacuum. In addition, the fact that the vacuum can do this is implied by the above relation between $c$ , $\mu_0$ and $\varepsilon_0$ . Thus, Maxwell’s equations plus the speed of light being finite must imply that there can be energy in free space, without matter to store it. One can appreciate the difficulty of this concept, and the tendency to want to describe this ability to some “aether”, as was common until the end of the nineteenth century.

But back to units. Does understanding the units help us in comprehending the meaning of $\varepsilon$ , which is material-specific? We can simplify the units of $\varepsilon$ slightly by remembering that force times distance is energy, and can write the units as

$\frac{C^2}{Jm}$

and interpret it as saying “charge squared per Joule per metre”. Since the permittivity of a substance is constant (not really – it depends on frequency), we can perhaps see it as the constant ratio between the product of charges (source of the electric field) and the energy stored through polarization, per metre. How does this help us understand anything? By considering the units of a ubiquitous constant, we have arrived at the doorstep of one of the great breakthroughs of nineteenth century physics, Maxwell’s displacement current. This was a deep insight, especially at a time when science did not have modern atomic concepts.

This isn’t rigorous, and might even be slightly incorrect. But it opens the door to thinking about the physical concepts involved, and leads to deeper insights. In other words, don’t neglect the constants, and don’t think units are superfluous!

Edit: Here’s something else to think about, which may lead to further insight. When you study propagation in cables using the telegrapher’s equations, you will find that the speed of propagation is given by

$\frac{1}{\sqrt{LC}}.$

Here, $L$ represents inductance and $C$ represents capacitance (look at the units for these as well – I’m glossing over some stuff here). Now, look again at the speed of light in terms of the permittivity and permeability of free space. This gives us a direct analogy: $\varepsilon_0$ can be seen as the capacitance and $\mu_0$ as the inductance of free space. At the very least, this is a useful mnemonic device.

Addendum: For another nice discussion on fundamental constants, see Sabine Hossenfelder’s video below.

What’s up with convolution (the second part)?

Previously, we managed to get an idea of the justification of using convolution by using probabilities, and it seems to make some kind of intuitive sense. But do the same concepts translate to Fourier analysis?

With the convolution product defined as

$(f \star g)(x) = \int f(y) g(x-y) dy,$

(where we leave the limits of integration up to the specific implementation), the key property of the convolution product we will consider is the following:

$\widehat{ (f \star g)}(n) = \hat{f} (n) \hat{g} (n).$

One possibility that immediately presents itself is that, if the Fourier coefficients of $f$ and $g$ are easy to compute, we have a quicker way to compute the Fourier coefficients of the convolution, which doesn’t involve computing some ostensibly more difficult integrals. Which requires us to answer the question: why do we want to calculate the convolution in the first place?

Let’s take some easy functions and see what happens when we take the convolution. Let

$f (t) = \sin 2\pi t \textrm{ and } g (t) = \cos 2\pi t.$

The Fourier coefficient of these functions are really easy to compute, since we have

$\sin 2\pi t = \frac{e^{2\pi i t} - e^{-2\pi i t}}{2i}$

and

$\cos 2\pi t = \frac{e^{2\pi i t} + e^{-2\pi i t}}{2},$

meaning that $\hat{f}(1) = 1/2i$ , $\hat{f} (-1) = -1/2i$ , $\hat{g} (1) = 1/2$ and $\hat{g} (-1) = 1/2$ . This implies that

$\widehat{(f \star g)} (-1) = -\frac{1}{4i}, \widehat{(f \star g)}(1) = \frac{1}{4i}.$

This allows us to immediately write down the convolution product:

$(f \star g)(t) = \frac{1}{4i} \left( e^{2\pi i t} - e^{-2\pi i t} \right) = \frac{1}{2} \sin 2\pi t.$

Clearly, our two signals modify each other. How is this significant? The best way to think of this is as an operation taking place in the frequency domain. Suppose $f$ is our “original” signal, and it is being “modified” by $g$ , yielding a signal $h = f \star g$ . The contribution of a frequency $n$ to $h$ (in other words, the $n$ -th Fourier coefficient), is the $n$ -th Fourier coefficient of $f$ multiplied by the $n$ -th coefficient of $g$ . We can see this as a way of affecting different frequencies in a specific way; something we often refer to as filtering.

We often think of the function in the time domain as the primary object, but looking at the Fourier coefficients is, in a way, far more natural when the function is considered as a signal. Your hear by activating certain hairs in the inner ear, which respond to specific frequencies. In order to generate a specific sound, a loudspeaker has to combine certain frequencies. To modify a signal then, it is therefore by far the easiest to work directly on the level of frequency. Instead of trying to justify the usual definition of convolution, we see the multiplication of Fourier coefficients as the starting point, and then try to see what must be done to the “original” functions to make this possible. So, we suppose that we have made a new function $h$ that has Fourier coefficients formed by the product of the Fourier coefficients of $f$ and $g$ , and try to find an expression for $h$ :

$h(x) = \int \hat{f} (\xi ) \hat{g} (\xi ) e^{2\pi i \xi x} d\xi.$

If you simply write out the definitions in the above, and you remember that

$\int e^{2\pi i y (\xi - \zeta)} dy = 0$ when $\xi \neq \zeta$ ,

you will get the expression for the convolution product of $f$ and $g$ . As such, the definition of convolution has to be seen as a consequence of what we would like to do to the signal, rather than the starting point.

We still have not related the definition to how convolution originated in probability, as detailed in the previous post. Unfortunately, the comparison between the two cases is not exact, because in the probabilistic case we obtain a completely new measure space after the convolution, whereas in the present case we require our objects to live in the same function space. Again, the solution is to think in frequency space: to find all ways of getting $e^{-2\pi i x \xi}$ , we need to multiply $e^{-2\pi (x-y)\xi}$ and $e^{-2\pi i y \xi}$ for all values of $y$ , which leads to our definition.

(As always, I have been rather lax with certain vitally important technicalities – such as the spaces we’re working in and the measures – such as whether we’re working with a sum or an integral. I leave this for the curious reader to sort out.)

The fractional derivative

I have recently had to study some models that use fractional derivatives, which I knew nothing of before. Turns out, these are a lot of fun, and deserve their own post. The notion was apparently already formulated by Leibniz, but there were difficulties involved that kept it from being widely used (such as being, well, bloody difficult). We’re mostly going to ignore those and dive in as if this is a normal thing for a human to do. I want to thank the excellent video at https://www.youtube.com/watch?v=A4sTAKN6yFA for introducing the topic to me this way.

The basis of the idea is simply that the derivative of the integral is the function itself, or

$\frac{d}{dx} \int_{-\infty}^x f(t) dt = f(x)$

(I’m simplifying here, so just assume that the function has all the nice properties to make this possible). Of course, we will need this to hold for higher orders as well. If we denote by $I^n f (x)$ the function obtained by integrating $f$ n times as follows:

$I^n f(x) = \int_{-\infty}^ x \int_{-\infty}^{x_{n}} \cdots \int_{-\infty}^{x_2} f(x_1) dx_1 dx_2 \dots dx_{n}$

Fortunately, iterated integrals are not a problem, thanks to the Cauchy integral formula (the other one):

$I^n f(x) = \frac{1}{(n-1)!} \int_{-\infty}^x (x-t)^{n-1} f(t) dt.$

Remembering that $(n-1)! = \Gamma (n)$ , we see there is nothing in the above expression that cannot be generalised to any positive real number (we’ll stick to these for now, lest we get lost too early). Supposing $y \in \mathbb{R}^{+}$ , we define

$I^y (x) = \frac{1}{\Gamma (y)} \int_{-\infty}^{x} (x-t)^{y-1} f(t)dt.$

There’s a bit of a problem. here regarding integrability. For instance, if you take $f(x) = x^2$ , the above does not work. Instead, we cheat a little by making the lower bound finite:

$I^y (x) = \frac{1}{\Gamma (y)} \int_{a}^{x} (x-t)^{y-1} f(t)dt.$

We lost uniqueness here, since now we also have to specify the constant $a$ used, but we gain the ability to integrate more functions this way. So, we have a fractional integral, but how do we get to the derivative? By remembering the above relation between derivative and integral, we can set

$\frac{d^{y}}{dx^{y}}f(x) = \frac{d^{\lceil y \rceil}}{dx^{\lceil y \rceil}} I^{\lceil y \rceil-y} f(x).$

If you look at this informally, you’ll see the derivatives and integrals “cancel out”, except for the integral of order $-y$ , which indicates that this is a derivative of order $y$ .

This particular approach is called the Riemann-Louisville derivative, and it’s not the only one. In fact, there are several definitions of the partial derivative using different kernels – and they’re not equivalent. For the moment though, I think this one illustrates the point well enough, so let’s suspend looking at the others. To see how this would work, let’s compute the $1\frac{1}{2}$ -th derivative of a simple function, say $f(x) = x^2$ , and see if it makes sense. Since the first derivative is $2x$ and the second is the constant 2, the $1\frac{1}{2}$ -derivative should in some sense be “between” these two functions. From the definition, we have to compute

$\frac{d^{3/2}}{dx^{3/2}}f(x) = \frac{d^{2}}{dx^{2}} I^{1/2} f(x).$

According to our definition, we now need to find

$I^{1/2}f(x) = \frac{1}{\Gamma (1/2)} \int_{a}^{x} \frac{t^2}{(x-t)^{1/2}}dt.$

$\Gamma (1/2)$ is easy enough to compute (use a substitution to turn it into a Gaussian) and is equal to $\sqrt{\pi}$ . The rest of the integral is not too difficult either – just use the substitution $y = (x-t)^{1/2}$ . This gives us

$I^{1/2}f(x) = \frac{16}{15}x^{5/2}.$

To check our integral, we do it numerically in Python with the script

import numpy as np
import matplotlib.pyplot as plt
import scipy.integrate as integrate

def fun1(t,a,x):
    return (1/np.sqrt(np.pi))*(t**2)/np.sqrt(x-t)

def int0(a,x):
    return integrate.quad(fun1,a,x,args = (a,x))[0]

def check(t):
    return (1/np.sqrt(np.pi))*16*(t**2.5)/15

xspace = np.linspace(1/100,1,1001)# - (5-1)/100,101)

valspace = []
for j in xspace:
    valspace.append(int0(0,j))

valspace0 = check(xspace)

plt.plot(xspace,valspace0, label = 'Int')
plt.plot(xspace,valspace, label = "Num")
plt.legend();

This gives us the following figure, where only one function can be seen, because the two fit snugly on top of each other.

Now that we have some confidence that we’ve evaluated the integral correctly, all we need to do is take the derivative twice to get to the desired 3/2 fractional derivative:

$\frac{d^{3/2}}{dx^{3/2}}f(x) = 4x^{1/2}.$

Still, does this make sense? There are certainly some intuitive rules that should be followed by our fractional derivative, namely that it should be somehow sandwiched between the integer derivatives, and that there should be a continuity here: we would expect the 0.75th derivative to be closer to the first derivative than the 0.5th one, for instance. Since I don’t want to do an extra bunch of integrals, let’s numerically differentiate some integrals like the above and see if we get closer to the first derivative. But since this post is getting out of hand, let’s leave that for the next one…

Interlude: What’s up with convolution? (Part the first)

You can’t even begin to do Fourier analysis without knowing about the convolution product. But this is a difficult thing to grasp intuitively. I know that I spent many hours trying to understand what’s really going on. There are, as always, really good videos on YouTube about this, which probably explain it better than I ever could, and I urge you to check them out. As for me, when I want to understand something I have to ask why this thing originated, and what it was initially used for. Something like the convolution product does not just drop out of the sky – it had to have some very definite and concrete purpose.

Fortunately, there’s a nice entry at https://math.stackexchange.com/questions/1348361/history-of-convolution that pointed me in the right way – specifically to Laplace, and the theory of probability. Let’s say you have two dice, with sides numbered from 1 to 6 as usual. What is the probability of rolling a seven? To count this we take, say, 1 and asked what should be added to make 7, which is of course 6. We do this for every other number from 2 to 6 (not taking symmetry in to account), and we add the products of the individual probabilities. In other words,

$p(1)p(6)+p(2)p(5)+\cdots +p(5)p(2)+p(6)p(1) = \sum_{n=1}^6 p(x)p(7-x).$

(In order to have this work for any number between 2 and 12, we would need to assign zero probability to negative numbers.). This at least looks like a convolution, and we can see if a generalization works. See if the same thing works if the second die is now 22-sided, with probabilities given by $p_2$ , and suppose the original dice has probability function $p_1$ . What if the probabilities are not equal (the die is weighted)?

Now we at least have a reasonably intuitive interpretation of convolution in discrete probability. Our goal is to get to the integral however, so let’s see if the same idea works for probability densities. That is, we need to explore whether we get something which computes the probability density or distribution for a sum of random variables $\chi_1$ and $\chi_2$ . We can try to make this work by analogy (which is how most mathematics is done). Suppose we have functions $p_1$ and $p_2$ , where

$F_i (y) = \mathbb{P}(\{x: \chi_i (x) \leq y \}) = \int_0^y p_i (x) dx, \quad i=1,2/$

(We’re assuming the usual Lebesgue measure here, but the same will hold for your other fancy measures – hopefully. I’m assuming the $\chi_i$ are random variables over the reals with reader’s choice of appropriate $\sigma$ -algebra.) We set

$p_1 * p_2 (x) = \int p_1 (y) p_2 (x-y) dx$

and try to see if there is an analogous interpretation to the discrete case. To get some results we can make sense of without too much effort, we pick two Gaussian distributions, different enough that they are easily distinguished. This would need to be reflected in the convolution product. Let our two probability density functions be given by

$p_i (x) = \frac{1}{\sigma_i\sqrt{2\pi} } e^{-\frac{(x-\mu_i)^2}{2\sigma_i^2}}.$

Setting $\mu_1 = 0, \sigma_1 = 0.2$ and $\mu_2 = 2, \sigma_2 = 0.25$ , we get the following graphs for the density functions:

To get the convolution of the density functions $p_1$ and $p_2$ , we need to evaluate

$p_1 * p_2 (x) = \int_{-\infty}^{\infty} p_1 (x-y) p_2 (y) dy,$

which gives us the rather nasty-looking

$\frac{1}{2\pi \sigma^2 }\int_{-\infty}^{\infty} e^{-\frac{(x-y)^2}{0.08}} e^{-\frac{(y-2)^2}{0.125}} dy.$

We could try to calculate this, but instead I’m going to be lazy and just see what it looks like, which is the real point. The Python code below won’t win any prizes, and I’m not using all the library functions I could, but I think it’s pretty clear. The only (perhaps) non-obvious bit is that you have to flip an array before applying the trapezoidal method, because it’s the wrong way round, and you’ll get a negative answer otherwise:

import matplotlib.pyplot as plt
import numpy as np

def normal_dist(x, mu, sd):
    prob_density = np.exp(-0.5*((x-mu)/sd)**2)/(np.sqrt(2*np.pi)*sd)
    return prob_density

xspace = np.linspace(-1,5,601)
yspace = np.linspace(0,3,301)
valspace = []
for t in xspace:
    xyspace = t - yspace
    vals0 = normal_dist(xyspace,0.0,0.2)
    vals1 = normal_dist(yspace,2.0,0.25)
    vals2 = np.multiply(vals0,vals1)
    yspace1 = np.flip(yspace)
    valspace.append(np.trapz(yspace1,vals2))

plt.plot(xspace,valspace)
plt.show

We get the output

Compare this with the two graphs above, and test if the idea of the convolution representing the density of the sum of random variables is still valid. (Not even mentioning that the variables have to be independent, so just be aware of that.)

All of this is excellent, but what does it have to do with Fourier analysis? We’ll get to that next.

Fourier analysis – the real story III

We’re trying to get inside Fourier’s head, so to speak, and explore the origins of his methods. To do this, we’re going to look at his derivation of the identity

$\frac{\pi}{4} = \cos x - \frac{1}{3} \cos 3x + \frac{1}{5} \cos 5x - \frac{1}{7} \cos 7x + \cdots.$

(Again, this is from Paul J. Nahin’s wonderful “Hot Molecules, Cold Electrons”.)

First, we need the indefinite integral

$\int \frac{1}{1+x^2} dx = \tan^{-1}x + C.$

(Deriving this integral is an easy exercise, but worth knowing.) Now suppose you have a right-angled triangle with the two unspecified angles being $\theta$ and, by necessity, $\pi/2 - \theta$ . Letting the hypotenuse be 1, the side adjacent to $\theta$ be $x$ and the remaining side (necessarily) being $\sqrt{1-x^2}$ , we have that

$\frac{\pi}{2} = \tan^{-1} \left( \frac{\sqrt{1-x^2}}{x}\right)+ \tan^{-1} \left( \frac{x}{\sqrt{1-x^2}}\right).$

By using the appropriate substitution, we get

$\frac{\pi}{2} = \tan^{-1} u + \tan^{-1} \frac{1}{u}.$

Now, using the “fact” that

$\frac{1}{1+u^2} = 1 - u^2 + u^4 - u^6 + \cdots$

(which you can get from, e.g., long division), we integrate to get the indefinite integral

$\int \frac{du}{1+u^2} = \tan^{-1} u + C = u - \frac{1}{3}u^3 + \frac{1}{5}u^5 - \frac{1}{7}u^7 + \cdots.$

Comparing the two expressions on the right, we see that $C = 0$ . By replacing $u$ with $1/u$ , we get

$\tan^{-1} \left( \frac{1}{u} \right) = \frac{1}{u} - \frac{1}{3} \left( \frac{1}{u} \right)^3 + \frac{1}{5} \left( \frac{1}{u} \right)^5 - \frac{1}{7} \left( \frac{1}{u} \right)^7 + \cdots.$

Combining our previous expressions yields

$\frac{\pi}{2} = \left( u + \frac{1}{u} \right) - \frac{1}{3} \left( u + \frac{1}{u} \right)^3 + \frac{1}{5}\left( u + \frac{1}{u} \right)^5 - \cdots.$

Replacing $u$ by $e^{ix}$ , we can immediately get Fourier’s identity:

$\frac{\pi}{4} = \cos x -\frac{1}{3}\cos 3x +\frac{1}{5}\cos 5x - \frac{1}{7} \cos 7x + \cdots.$

This is a remarkable identity, but is it true? It is very instructive to graph the right-hand side to see what happens. As it turns out, the identity is only kind-of true. It can be improved, and I suggest finding the mistakes in the derivation as a first step to doing so.

Fourier analysis – the real story II

One thing I didn’t know about Fourier was his obsession with heat – and not just in the mathematical sense. He literally thought that heat was some sort of panacea. So much so that it contributed to his death! See https://engines.egr.uh.edu/episode/186, for instance.

So, we have the heat equation. How do we solve it? More importantly, how did Fourier solve it? This is somewhat difficult to ascertain, in the general form, for Fourier’s book is written in a manner different from what is today accepted as a mathematical text. First of all, there are many more words and lengthy physical explanations – Fourier’s aim, after all, was not to promulgate a mathematical theory, but a real and useful way of solving the problem of heat in many bodies under many conditions. Of course, we also cannot expect a book from the early 19th century to adhere to our standards of mathematical rigour.

Fortunately, I have come across a great book which starts off with Fourier’s approach to the heat equation, namely “Hot molecules, cold electrons” by Paul J Nahin. If you haven’t read any of Nahin’s books, do yourself a favour. He’s an electrical engineer with a keen appreciation for mathematics, and he makes it a lot of fun. Over the next few posts then, since we’re exploring the origins of Fourier analysis, I thought I would present an argument by Fourier that is recounted in Nahin’s book. I’m not going to go into too much detail, because working that out is part of the fun.

What Fourier was so very good at was solving problems by representing functions as infinite series. It does not seem at first that this would make things simpler, but it does – a lot. Many (including Laplace and Lagrange) were skeptical of Fourier’s methods, and honestly, the techniques did not exist yet to fully justify the methods. Nevertheless, Fourier knew they worked, and he became a titan of mathematics because of them. Here, we will look at his proof for the identity

$\frac{\pi}{4} = \cos x - \frac{1}{3} \cos 3x + \frac{1}{5} \cos 5x - \frac{1}{7} \cos 7x + \cdots.$

Of course, the first striking thing about this identity is that the right side depends on $x$ , whilst the left does not. This immediately raises suspicion – as it should, since the identity is not actually correct. Before delving into the derivation, let us approximate the right hand side with some software to see how wrong the identity is. We can use the following in Matlab:

test = @piover4
figure
fplot(test,[-5 5])

function f = piover4(x)
    f = 0 
    for i = 1:100
        f = f + ((-1).^(i+1))*cos((2*i -1)*x)/(2*i-1);
    end
end

Clearly, the equation is not exactly correct, but it definitely works over a large range of $x$ . By the way, if we want to do this in Python we can use

import numpy as np
import matplotlib.pyplot as plt

def piover4(x):
    sum = 0
    for i in range(1,101):
    sum += (-1)*(i+1)np.cos((2i-1)x)/(2*i-1)
    return sum

xx = np.linspace(-5,5,1000)
yy = piover4(xx)
plt.plot(xx,yy)

The two different values the series converges to over most of the $x$ are probably not too concerning – those are just sign reversals in the series, and we’ll probably see what’s going on there during the derivation. What is much more interesting are the errors near the discontinuities. For the moment, we will park that discussion, although it turns out to be very important. In the next post, we will look at Fourier’s derivation of the identity, and figure out why it’s wrong, but also kind of correct.

Fourier analysis – the real story I

Instead of going on with nonstandard analysis (which I might do later in a new format), I thought I would try another series of posts, on something very near and dear to my heart – Fourier analysis. I have (partly) read many books on Fourier analysis, and even published some work in it. But I have struggled to understand the essence of it – not surprising, since it is a subject both vast and deep. At the moment, I am solving differential equations and having much fun with the process. In doing so, I realized that I build a much deeper understanding when I get my hands dirty, so to speak. This is not a great revelation, but it is a principle I have ignored for too long. To understand Fourier analysis, I shouldn’t be reading the most advanced, abstract books on the subject and looking at the theoretical research, I need to go back to the origin. Why was it formulated this way? Why were kernels introduced? Why do we use Poisson summation? Why is it important that convergence takes place in certain spaces?

In this series, I intend to go back through history, even though I do not intend this as a historical work. It is more about following the evolution of ideas, and so I do not intend it to be 100% chronological. Mathematics is a messy place, and the narrative is not always clear (just like history itself, really). Things are rarely as neat as we make them out to be in our textbooks, and I think we do our students a disservice by not exposing them to these ideas. Fourier techniques are usually presented to the student as if ex nihilo, but there is a fascinating evolution of thought to explore. One of the few books that do not try to hide this is that of Körner, and I’m sure I will be referring to it extensively as I write this.

Let’s go back to the beginning then, and see what Fourier actually did, why he did it and whether anyone had done it before. There is probably no place better to go than Fourier’s “The Analytical Theory of Heat”. Now, I’m obviously not going to read the whole thing, but looking through the table of contents we see that Section II of Chapter 3 starts with “First example of the use of trigonometric series in the theory of heat”, which seems like the kind of thing we’re after.

Perhaps the best place to start would be to discuss Fourier’s heat equation itself:

$a^2 \frac{\partial^2 U}{\partial x^2} = \frac{\partial U}{\partial t}.$

Here, $U$ is some function of distance, $x$ , and time. $t$ . We’re not going to discuss how this equation came about and are just going to accept it as it is. There’s a good (but short) post on some of the history here. I am only presenting the one-dimensional form of the equation here, but of course the higher-dimensional version can be expressed in terms of the Laplacian. Interestingly, Fourier’s solution of the heat equation was inspired by that of Laplace, which in turn was inspired by work of Poisson.

For a moment, let us think about the content of this equation. What does it mean, and why should it be applicable to heat? Since we have a first derivative in one variable, which denotes rate of change, and the second derivative in another, we know that somehow that change in time is influenced by the curvature of the function, seen as a function of space. Instead of fully exploring this idea myself, I’ll direct you to the video at https://youtu.be/b-LKPtGMdss.

If you have had any courses in differential equations, the solution is quite obvious. It is now completely accepted, and that obfuscates how much of a revolution it actually was. As explained in the post I referred to, mathematicians had certain ideas of what a function should be, and it didn’t look like Fourier’s series conformed to those ideas. One might dismiss this as foolish conservatism by the mathematicians of old, but we must never fall into the trap of thinking our ancestors were ignorant or stupid. Rather, it indicates that Fourier analysis was something truly revolutionary, with tremendous implications for the very foundations of mathematics (more on that later). Even today, simply answering whether a trigonometric series does indeed define a certain kind of function is no simple task. One of the greatest theorems of twentieth-century mathematics is an ostensibly simple question on the convergence of Fourier series of quite well-behaved functions…

Next time, we’ll look at the solutions of the equation.

Some mixed boundary conditions in FEniCS

When you first start to use a piece of software, it is good to have some really stupid examples to refer to, which expert users would probably laugh at. For the noob, like me, it is sometimes difficult to figure out how to do even the most basic things. Take the following example:

You have some kind of partial differential equation involving a function $F(z,t)$ , in one time variable $0\leq t \leq T$ and one space variable $z \in [0,L]$ . The given boundary and initial conditions are $F(0,t) = g(t)$ and $F(z,0) = 0$ . How to formulate these conditions in FEniCS?

Note that in the FEniCS implementation we are actually only considering a single variable, since the time differential will be incorporated by applying a finite difference-type calculation over specified time intervals.

First, we need to define the relevant boundaries. For the first one, $F(0,t) = g(t)$ , In FEniCS, we define a function that tests whether a point is on the boundary by

def GammaD(x, on_boundary):
    return near(x[0],0)

The use of “near” is because we are using floating point numbers, and can’t test for exact equality. Now that we can test for this boundary, we can specify the boundary condition:

g = Expression('...')
bc = DirichletBC(V, g, GammaD)

where $V$ was specified earlier in the code, as usual. In my case, I wanted $F(0,t) = \sin \omega t$ , so I set $g(x[0],t) = (1-x[0])*\sin \omega t$ . What about the initial condition $F(z,0) = 0$ ? That is also taken care of by the expression for $g$ . But does the restricted boundary condition take this initial condition into account? Since we later specify

u_n = interpolate(u_D, V)

and initially set $t=0$ in $u_D$ , it is indeed specified. If we want to add Neumann boundary conditions, we don’t need to do anything more – they are included by default.

Credit where it is due: the post at https://sites.math.rutgers.edu/~falk/math575/Boundary-conditions.html was very helpful, as was http://vefur.simula.no/~hpl/homepage/fenics-tutorial/release-1.0/webm/timedep.html.

As I need to add more and trickier conditions, I’ll update this post.

Not an ad, and I’m not getting paid to put it here. Just a thing that works, and that did wonders for my mother, when all the drugs couldn’t:

Trying to be a mathematician

Mathematics, optimisation and such