Archive for the ‘Play’ Category

The Log of Zeta

November 21, 2009

Last time (too long ago, sorry), I finally got around to talking about the function \zeta(s). Today, I think I’d like to talk about \log \zeta(s).

First, though, I should mention Euler’s product formula. The goal is to re-write the infinite series \zeta(s)=\sum_n n^{-s} as a product. For each prime p, we can tease out of the series above those terms where n is p to some power. That is, we can think about the sub-series \sum_{k\geq 0} (p^k)^{-s}. This is a geometric series that converges to (1-p^{-s})^{-1}. If we take two such series and multiply them together, which terms from our starting series for \zeta(s) will we recover?

Suppose p and q are two primes, and consider the product

\displaystyle\left(1+\frac{1}{p^s}+\frac{1}{p^{2s}}+\frac{1}{p^{3s}}+\cdots\right)\cdot \left(1+ \frac{1}{q^s}+\frac{1}{q^{2s}}+\frac{1}{q^{3s}}+\cdots\right).

The terms in \sum_n n^{-s} that we obtain from this product are those where n is a product of some power (possibly the 0 power) of p and some power (again, possibly 0) of q. We could then think about another prime, say r, and its geometric series, and multiply it by the above, and obtain all the terms in \sum_n n^{-s} where n=p^aq^br^c for some 0\leq a,b,c.

Continuing on, and recalling that every positive integer has a unique factorization as a product of primes, we obtain the magical formula (isn’t most of Euler’s work magical?):

\displaystyle \sum_n \frac{1}{n^s}=\prod_p (1-p^{-s})^{-1},

where the product is indexed by the primes. What’s nice about this, for us, at the moment, is that logarithms work well with products and powers, but not so well with sums. Recalling the Taylor series

\ln(1-x)=-x-\frac{x^2}{2}-\frac{x^3}{3}-\cdots=-\sum\limits_{n=1}^{\infty} \frac{x^n}{n},

we find

\displaystyle \ln \zeta(s)=\sum_p\sum_n \frac{1}{n}p^{-ns}.

That was fun. Incidentally, it’s not much of a jump from here to show that the series of prime reciprocals diverges. I mentioned it in a post a while ago.
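
This double sum is easy to sanity-check numerically. Here is a quick Python sketch, truncating the primes at 1000 and the partial sum of \zeta(2) at 10^5, so we only expect agreement to a few decimal places:

```python
import math

def primes_up_to(n):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            for m in range(p * p, n + 1, p):
                sieve[m] = False
    return [p for p, is_p in enumerate(sieve) if is_p]

s = 2.0

# Left side: log of a partial sum of zeta(s).
lhs = math.log(sum(n ** -s for n in range(1, 100001)))

# Right side: the double sum over primes p and powers n of (1/n) p^{-ns}.
rhs = sum((1 / n) * p ** (-n * s)
          for p in primes_up_to(1000)
          for n in range(1, 60))

assert abs(lhs - rhs) < 1e-3
```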

Let’s switch gears for a little bit. I’m going to define a function J(x) on the positive reals. If you’re following Edwards’ book with me (as I jump all over the place), this will be almost exactly his J(x), differing only at the integers (for today anyway :)). Let me start by setting J(0)=0. The function J(x) will be a step function, which I’ll define to mean piecewise linear with slope 0 on each piece. So to define J(x) I only need to tell you when to jump, and by how much. The rule is: when you get to an integer n, if n=p^k for some prime p, then you jump up by 1/k. So at 2,3,5,7,11,\ldots you jump by 1, at 4,9,25,49,\ldots you jump by 1/2, at 8, 27, \ldots you jump by 1/3, etc. Here’s a little picture of J(x), with a few values highlighted:

Slightly more precisely, we can write

\displaystyle J(x)=\sum_{p^n\leq x}\frac{1}{n}.
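
Here is a small Python sketch of J(x), computed exactly with fractions. At the jump points themselves it takes the upper value, which is the convention I'm using here (the post's J differs from Edwards' only at the integers anyway):

```python
from fractions import Fraction

def J(x):
    """Sum of 1/n over prime powers p^n <= x."""
    def is_prime(m):
        if m < 2:
            return False
        return all(m % d for d in range(2, int(m ** 0.5) + 1))

    total = Fraction(0)
    for p in range(2, int(x) + 1):
        if is_prime(p):
            n, power = 1, p
            while power <= x:
                total += Fraction(1, n)  # jump of 1/n at p^n
                n += 1
                power *= p
    return total

# Jumps by 1 at 2, 3, 5, 7; by 1/2 at 4 and 9; by 1/3 at 8.
assert J(10) == Fraction(16, 3)
assert J(1) == 0
```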

Now, let me see if I can convince you that there is some justification in writing

\displaystyle \ln \zeta(s)=s\int_0^{\infty} J(x)x^{-s-1}\ dx.

Let’s work with the right-hand side. Since J(x)=0 near x=0, I’ll actually start my integral at x=1. I think the way to go about it is as follows:

\begin{array}{rcl} \displaystyle s\int_1^{\infty}J(x)x^{-s-1}\ dx &=& \displaystyle s\sum_{m=1}^{\infty}\int_m^{m+1}J(x)x^{-s-1}\ dx \\ {} & = & \displaystyle \sum_{m=1}^{\infty}J(m)\left(\frac{1}{m^s}-\frac{1}{(m+1)^s}\right).\end{array}

Now suppose that J(m)=J(m+1) for some m. Then the terms corresponding to m and m+1 telescope as follows:

\begin{array}{c} \displaystyle J(m)\left(\frac{1}{m^s}-\frac{1}{(m+1)^s}\right)+J(m+1)\left(\frac{1}{(m+1)^s}-\frac{1}{(m+2)^s}\right)\\ =\displaystyle J(m)\left(\frac{1}{m^s}-\frac{1}{(m+2)^s}\right)\end{array}.

If, also, J(m)=J(m+2), then we can telescope another term into this one, and so on. So, really, the important m in this sum are those where J(x) jumps, which are the prime powers. Let m_i=p_i^{n_i} be the i-th prime power (i.e., the point where J(x) makes its i-th jump), starting with m_1=2. Then we can write

\begin{array}{rcl} \displaystyle s\int_1^{\infty}J(x)x^{-s-1}\ dx & = & \displaystyle \sum_i J(m_i)\left(\frac{1}{m_i^s}-\frac{1}{m_{i+1}^s}\right) \\ & = & \displaystyle \frac{1}{2^s}J(2)+\sum_{i=2}^{\infty} -J(m_{i-1})\frac{1}{m_i^s}+J(m_i)\frac{1}{m_i^s} \\ & = & \displaystyle \frac{1}{2^s}+\sum_{i=2}^{\infty} \frac{1}{m_i^s}\left(J(m_i)-J(m_{i-1})\right) \\ & = & \displaystyle \frac{1}{2^s}+\sum_{i=2}^{\infty} \frac{1}{m_i^s}\cdot \frac{1}{n_i} \\ & = & \displaystyle \sum_{p}\sum_n \frac{1}{n}p^{-ns} \\ & = & \ln \zeta(s).\end{array}
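
The telescoping sum above can be checked numerically. A Python sketch, taking s=2, truncating at M=200000, and comparing against \ln\zeta(2)=\ln(\pi^2/6):

```python
import math

M = 200000
s = 2.0

# jump[m] is the size of J's jump at m: 1/n if m = p^n is a prime power, else 0.
jump = [0.0] * (M + 1)
sieve = [True] * (M + 1)
for p in range(2, M + 1):
    if sieve[p]:
        for q in range(2 * p, M + 1, p):
            sieve[q] = False
        n, power = 1, p
        while power <= M:
            jump[power] = 1.0 / n
            n += 1
            power *= p

# Accumulate J(m) and the sum J(m)(m^{-s} - (m+1)^{-s}).
total, J = 0.0, 0.0
for m in range(1, M + 1):
    J += jump[m]
    total += J * (m ** -s - (m + 1) ** -s)

assert abs(total - math.log(math.pi ** 2 / 6)) < 1e-4
```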

What good is writing \ln \zeta(s) this way? I guess if you know something about Fourier inversion (which I don’t) then you get to say that

\displaystyle J(x)=\frac{1}{2\pi i}\int_{a-i\infty}^{a+i\infty}\log \zeta(s)x^s\frac{ds}{s},

where a>1 is fixed (the integral being taken up the vertical line \text{Re}(s)=a). What good is that? I think I’ll have to read some more and tell you about it tomorrow, but it’ll turn out to be useful once we have yet another description of \ln \zeta(s), in terms of the zeros of \zeta(s) (finally getting to those zeros everybody cares so much about).


Riemann’s Zeta Function

November 13, 2009

I guess it is about time to get to the zeta function side of this story, if we’re ever going to use Farey sequences to show how you could prove the Riemann hypothesis. I’ve been reading a bit of Edwards’ book, with the same title as this post, and thought I’d try to summarize the first chapter over the next few posts. I’m not sure that the content of the first chapter is hugely vital for the final goal of relating to the Farey sequences, but I wanted to try to learn some of it anyway.

I should mention, before talking about any of this, that I will not claim to know any complex analysis. The last complex analysis course I took was 5 years ago, I can’t be sure how much I paid attention at the time, and I haven’t used it since. I will be jumping over gaps of various sizes for quite a while in the upcoming posts. Perhaps sometimes I’ll mention that a gap is there. Mostly, though, what I’m after in this story is the outline, and how all of the parts fit together, as a big picture. Perhaps I’ll go back and fill in some gaps, as I understand more.

For today, let’s see if I can build up to an analytic expression of \zeta(s). Our starting point is the function

\zeta(s)=\displaystyle\sum_{n=1}^{\infty}\frac{1}{n^s},

which is defined for real values of s larger than 1. The goal is to find a nice expression that is defined on more of the complex plane, but agrees with this definition on the reals larger than 1.

To get to that point, we’ll use the \Gamma function:

\Gamma(s)=\displaystyle\int_0^{\infty} e^{-x}x^{s-1}\ dx

This is an analytic function defined everywhere except at the negative integers and 0. If you are only interested in real values of s, this function is a continuous extension of the factorial function, which is a priori only defined on the non-negative integers. Actually, there is a shift involved, so \Gamma(n)=(n-1)! (Edwards blames this shift on Legendre, whose reasons, he states, “are obscure”. Edwards uses \Pi(s)=\Gamma(s+1) throughout, so I might make some \pm 1 errors). In particular, s\Gamma(s)=\Gamma(s+1) when both sides make sense.
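
Python's math.gamma makes the shift and the functional equation easy to see:

```python
import math

# Gamma extends the factorial, with Legendre's shift: Gamma(n) = (n-1)!.
for n in range(1, 8):
    assert math.isclose(math.gamma(n), math.factorial(n - 1))

# The functional equation s * Gamma(s) = Gamma(s + 1), at a non-integer point.
s = 0.5
assert math.isclose(s * math.gamma(s), math.gamma(s + 1))
```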

Another relation that we’ll use is that

\pi=\Gamma(s)\Gamma(1-s)\sin(\pi s)

If memory serves (from reading Artin’s lovely little book on the \Gamma function), you can obtain this by determining that the derivative of the expression on the right is 0. This tells you that \sin(\pi s)\Gamma(s)\Gamma(1-s) is a constant, and so you can pick your favorite value of s to obtain the value of that constant. Anyway, the identity I’ll use is a rearrangement of the one above, namely

\Gamma(s)\sin(\pi s)=\dfrac{\pi}{\Gamma(1-s)}.
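
The reflection formula is also easy to check numerically:

```python
import math

# Reflection formula: Gamma(s) Gamma(1-s) sin(pi s) = pi, for non-integer s.
for s in (0.1, 0.3, 0.5, 0.9):
    value = math.gamma(s) * math.gamma(1 - s) * math.sin(math.pi * s)
    assert math.isclose(value, math.pi, rel_tol=1e-9)
```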

Now, in the expression

\Gamma(s)=\displaystyle\int_0^{\infty}e^{-x}x^{s-1}\ dx,

substitute x=nu for some positive n. Messing about with that substitution for a few minutes (and then re-writing u=x by abuse of notation), you can arrive at

\dfrac{\Gamma(s)}{n^s}=\displaystyle\int_0^{\infty} e^{-nx}x^{s-1}\ dx.

That n^s in the denominator is useful for us, as that’s how it shows up in \zeta(s). In particular, we can obtain

\begin{array}{rcl}\Gamma(s)\zeta(s) &=& \displaystyle \Gamma(s)\sum_{n=1}^{\infty}\dfrac{1}{n^s}=\sum_{n=1}^{\infty} \frac{\Gamma(s)}{n^s} \\ &=& \displaystyle \sum_{n=1}^{\infty}\int_0^{\infty}e^{-nx}x^{s-1}\ dx \\ &=& \displaystyle \int_0^{\infty}x^{s-1}\sum_{n=1}^{\infty} e^{-nx}\ dx\\ &=& \displaystyle \int_0^{\infty} \dfrac{x^{s-1}}{e^x-1}\ dx\end{array}

That last transition comes about by summing the geometric series. There are probably some things analysts like to check in here, moving infinite sums and improper integrals past each other… I’ll let them.
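
We can check the final identity numerically with a homemade Simpson's rule. This is just a sketch: I take s=2, where \Gamma(2)\zeta(2)=\pi^2/6, and truncate the integral at 60 (the tail is negligible):

```python
import math

def integrand(x, s):
    """x^{s-1} / (e^x - 1); the limit at x = 0 is 1 when s = 2."""
    if x == 0.0:
        return 1.0 if s == 2 else 0.0
    return x ** (s - 1) / math.expm1(x)

def simpson(f, a, b, n):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += f(a + i * h) * (4 if i % 2 else 2)
    return total * h / 3

s = 2.0
integral = simpson(lambda x: integrand(x, s), 0.0, 60.0, 6000)

# Gamma(2) * zeta(2) = 1 * pi^2 / 6
assert abs(integral - math.pi ** 2 / 6) < 1e-6
```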

Ok, we’re nearly there. Next up, you do some complex line integral and show that

\begin{array}{rcl} \displaystyle \int_{+\infty}^{+\infty} \dfrac{(-x)^s}{e^x-1}\ \dfrac{dx}{x} &=& \displaystyle (e^{i\pi s}-e^{-i\pi s})\int_0^{\infty}\dfrac{x^{s-1}}{e^x-1}\ dx \\ &=& 2i\sin(\pi s)\Gamma(s)\zeta(s)\end{array}.

That integral is a little weird, going “from +\infty to +\infty“. Really, we take a path that “starts at +\infty“, swings around 0, then goes back out to infinity. This is almost certainly one of the complex analysis things I should go back and learn more about. Since we are integrating (-x)^s=e^{s\log(- x)} along positive real values of x, we’re working with logs along the negative real axis. Perhaps I’ll return to this integral in a future post.

Using the identity about \sin(\pi s)\Gamma(s) from above, we obtain

\zeta(s)=\dfrac{\Gamma(1-s)}{2\pi i}\displaystyle \int_{+\infty}^{+\infty}\dfrac{(-x)^s}{e^x-1}\ \dfrac{dx}{x}.

We’re now at a point where we’ve (apparently) got an expression for \zeta(s) that is analytic in the complex plane except for a simple pole at s=1.


November 11, 2009

Yesterday I talked about the function ?(x). If you recall (or rather, if you don’t… if you do recall, you can probably skip to the next paragraph), I defined this function by lining up two infinite binary trees. One tree was obtained by taking mediants, giving us the Stern-Brocot tree, while the other was obtained by taking midpoints, giving us a tree of “dyadic” fractions (denominators are powers of two). To evaluate ?(x), you find x in the Stern-Brocot tree, and then ?(x) is the value in the corresponding position of the dyadic tree.

Instead of thinking about infinite binary trees, let us think about cutting each tree off at some level, n. The pictures from yesterday’s post are, if you ignore the vertical dots, the trees when n=2. In the dyadic tree, we will obtain all fractions m/2^p where 0<m<2^p is odd and 1\leq p\leq n+1. Said another way, we will get all of the values m/2^{n+1} where 0<m<2^{n+1} (not necessarily odd anymore). If you think about it for a few moments, you’ll notice that these points are equally spaced in the interval, each being 1/2^{n+1} away from its neighbors. This is not the case for the Stern-Brocot tree.

In the truncated Stern-Brocot tree, points are scattered through the interval (0,1). We can see that they are scattered in a way that is symmetric around 1/2, but what else can we say? Think about choosing some number, b, of buckets, call them B_0,\ldots,B_{b-1}. Let B_i collect all of the points from the truncated Stern-Brocot tree whose value lies between i/b and (i+1)/b. Of course, I’m being a little loose with my endpoints, but hopefully the idea comes across. If you choose an appropriate number of buckets, depending on the depth of your truncated tree, you can arrange that no point is an endpoint of a bucket’s interval.

Anyway, when you’re done with that, count how many points are in each bucket. This defines a function from the interval [0,1] to non-negative integers, where the value at all points in B_i is the count of the number of Stern-Brocot points in the bucket B_i. Here’s an example graph, obtained by taking the depth 10 Stern-Brocot tree, and putting points into 59 buckets (unless I coded something up wrong :)):

You could next scale this graph vertically by a constant so that the total area of all of the rectangles is 1. That is, you could make the graph that of a probability distribution. Of course, the distribution you get will depend on not only the depth of the tree, but also how many buckets you choose. Since I’m only vaguely sure what I’m talking about, let me not say much more about that.

A less graphical way to think about all of this is that, for each n, I define two random variables X_n and Y_n. The first, X_n, picks a value from all of the nodes in the Stern-Brocot tree of depth n, assuming each is equally likely. The second is basically the same, using the dyadic tree instead. I’m not particularly well-versed in probability, but I guess you are allowed to take the limit of these probability distributions, let’s say X=\lim X_n, and similarly Y=\lim Y_n. This defines two probability distributions on the interval [0,1] (well, I suppose technically it probably defines a distribution on the rationals there…).

How are the two distributions related? Well, basically the same way as the trees are related by ?(x). In particular, we have

P(a\leq X\leq b)=P(?(a)\leq Y\leq ?(b))

Since Y is just the uniform distribution (the dyadic points are equally distributed, recall), the probability on the right is just ?(b)-?(a). Let \mu be the probability density function for X, so that the probability on the left is the integral \int_a^b \mu\ dx. We have, then

\displaystyle \int_a^b \mu(x)\ dx=?(b)-?(a)

This looks so much like the fundamental theorem of calculus that we want to say that ?'(x)=\mu(x). You could also read it as saying that ?(x) is the cumulative distribution function of \mu(x). Thinking back to my buckets from earlier, this means that if I define the value over the interval covered by B_i to be the sum of the counts of all the values in all the buckets up to B_i, then I should get a graph that looks like ?(x). You be the judge:

Of course, we probably aren’t exactly allowed to say ?'(x)=\mu(x). Consider the rational s/t, and suppose that ?'(s/t) exists, i.e., that

\displaystyle\lim_{x\to s/t}\dfrac{?(x)-?(s/t)}{x-s/t}

exists. If this two sided limit exists, then we can calculate it by coming at it from either side. Let’s think about x\to s/t^+, that is, from the right. But let’s simplify some more, still. If the one-sided limit exists, then we can calculate it by calculating the limit of the difference quotient for any sequence of values x converging to s/t. The sequence of values, on the right of s/t, that I want to consider are those neighbors, on the right, of s/t, in the Stern-Brocot sequences.

Let’s say that the first neighbor is the right child of s/t in the Stern-Brocot tree, and call it p/q. Then all of the successive neighbors are the successive left children of p/q. These points are all of the points (ms+p)/(mt+q), for positive values m. If we let m\to \infty, these weighted mediants converge to s/t.

So let’s consider the following:

\displaystyle\lim_{m\to\infty}\dfrac{?\left(\frac{ms+p}{mt+q}\right)-?\left(\frac{s}{t}\right)}{\frac{ms+p}{mt+q}-\frac{s}{t}}.

Let’s start simplifying in the denominator. That fraction is (pt-sq)/(t(mt+q)), and since p/q and s/t are neighbors in some Farey sequence, pt-sq=1. So the denominator is 1/(t(mt+q)).

In the numerator, we are evaluating ?(x) at weighted mediants of s/t and p/q. The corresponding ?(x) values are then weighted averages. Playing with the formulas a little bit, you can determine that

?\left(\dfrac{ms+p}{mt+q}\right)=?\left(\dfrac{s}{t}\right)+\dfrac{?(p/q)-?(s/t)}{2^m}.

Combining all of the formulas so far, and simplifying, it turns out that the limit we are after is just

\displaystyle\lim_{m\to\infty}\left(?(p/q)-?(s/t)\right)\cdot t\cdot (mt+q)\cdot 2^{-m}

Of course, that power of two wins the race to infinity, and so this limit is 0.
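
We can watch these difference quotients collapse numerically. The sketch below evaluates ?(x) via the standard alternating closed form (an assumption of mine, not the formula derived in this post), at s/t=1/2 with right child p/q=2/3:

```python
from fractions import Fraction

def cf(x):
    """Continued fraction [0; a1, ..., an] of a rational x in (0, 1)."""
    p, q, terms = x.numerator, x.denominator, []
    while q:
        terms.append(p // q)
        p, q = q, p % q
    return terms[1:]   # drop the leading 0

def qm(x):
    """?(x) via the standard series 2(2^{-a1} - 2^{-(a1+a2)} + ...)."""
    total, exponent, sign = Fraction(0), 0, 1
    for a in cf(x):
        exponent += a
        total += sign * Fraction(2, 2 ** exponent)
        sign = -sign
    return total

def quotient(m):
    """|difference quotient| of ? at 1/2 along x_m = (m*1 + 2)/(m*2 + 3)."""
    x = Fraction(m + 2, 2 * m + 3)
    return abs((qm(x) - qm(Fraction(1, 2))) / (x - Fraction(1, 2)))

# The quotients shrink (like (2m+3)/2^{m+1}) as the mediants approach 1/2.
assert quotient(20) < quotient(5)
assert quotient(30) < Fraction(1, 10 ** 6)
```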

We have shown, then, that if ?'(s/t) exists, then it is 0. Since ?(x) isn’t a constant graph, we had better have ?'(x) non-zero (or non-existent) somewhere on the irrationals. Perhaps I’ll leave that to you.

I should, of course, mention that I learned about this reading the paper “On the Minkowski measure” by Linas Vepstas, available online here (along with a handful of other papers that look fun). Hopefully I haven’t mangled anything too badly.


November 10, 2009

Sorry about the lack of a MaBloWriMo post yesterday. I got a little behind, left resources I wanted at the office, and did some other things.

If you remember, my original goal in this process was to relate the Farey sequences to the Riemann hypothesis. I have some vague understanding that the relationship is based on the distribution of the terms in the Farey sequence, as points in the unit interval. It’s getting on toward time to talk about the Riemann hypothesis side of this picture, so I can start relating the two. But before I do, there’s one more diversion (hopefully it isn’t too far off base) I’d like to make related to Farey sequences. I’ve still got some learning to do about it, but let’s see what I can get down today, laying some foundation for the next day or two, perhaps.

Recall that the way to construct one Farey sequence, F_{n+1}, from the previous one, F_n, is to throw in all of the mediants of successive terms whose denominators aren’t too big. What if we didn’t bother with that last part, and just left in all the mediants? All the numbers we’ll write down are still reduced, and we’ll obtain all of the fractions in [0,1] this way, because they all show up in some Farey sequence, and we’re getting all the terms in all the Farey sequences, just sometimes a little earlier than Farey would.

Just to be clear, our first few sequences now look like this:

0/1\quad 1/1

0/1\quad 1/2\quad 1/1

0/1\quad 1/3\quad 1/2\quad 2/3\quad 1/1

0/1\quad 1/4\quad 1/3\quad 2/5\quad 1/2\quad 3/5\quad 2/3\quad 3/4\quad 1/1

and so on. We see that the n-th sequence in this list contains all of F_n, as well as some other numbers with bigger denominators. A nice property of these sequences, which isn’t shared with the Farey sequences, is that every other term in the n-th sequence is a new term that wasn’t in the n-1-st sequence. So when I go to make the next sequence in this process, I’m always taking the mediant of a “new” number and an “old” number. Moreover, each “new” number will be involved in two mediants… one taken “to the left” (with the “old” number on the left, making a smaller value) and one taken “to the right”.
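
These sequences are easy to generate with Python's Fraction; the sketch below rebuilds the fourth sequence above:

```python
from fractions import Fraction

def next_sequence(seq):
    """Insert the mediant between every pair of neighbors (no denominator cap)."""
    out = [seq[0]]
    for left, right in zip(seq, seq[1:]):
        mediant = Fraction(left.numerator + right.numerator,
                           left.denominator + right.denominator)
        out += [mediant, right]
    return out

seq = [Fraction(0, 1), Fraction(1, 1)]
for _ in range(3):
    seq = next_sequence(seq)

expected = [Fraction(n, d) for n, d in
            [(0, 1), (1, 4), (1, 3), (2, 5), (1, 2), (3, 5), (2, 3), (3, 4), (1, 1)]]
assert seq == expected
```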

This language lets me write down a binary tree (every node has two branches down) as follows: Begin with 1/2 as the root (at the top, I’m throwing out 0/1 and 1/1). This is a new fraction in the second sequence above, so to make the third sequence, I added in the “left mediant” (1/3) and “right mediant” (2/3) from 1/2. To make the sequence after that, 1/3 gives a “left mediant” (1/4) and a “right mediant” (2/5), and similarly 2/3 spawns 3/5 and 3/4. So I build up a tree that looks like this:

\begin{picture}(200,80) \put(85,65){1/2} \put(45,45){1/3}\put(125,45){2/3} \put(25,25){1/4} \put(65,25){2/5} \put(105,25){3/5} \put(145,25){3/4} \put(5,5){\vdots} \put(45,5){\vdots} \put(85,5){\vdots} \put(125,5){\vdots} \put(165,5){\vdots} \put(55,55){\line(2,1){20}} \put(35,35){\line(1,1){10}} \put(125,55){\line(-2,1){20}} \put(65,35){\line(-1,1){10}} \put(115,35){\line(1,1){10}} \put(145,35){\line(-1,1){10}} \end{picture}

I’ll call this the Stern-Brocot tree, though the actual definition may be slightly different.

Another way I could build up a sequence of sequences, and generate another binary tree, is to use midpoints, instead of mediants. If my first sequence is still 0/1 and 1/1, then taking midpoints will always give me denominators that are powers of 2. My tree will now look like:

\begin{picture}(200,80) \put(85,65){1/2} \put(45,45){1/4}\put(125,45){3/4} \put(25,25){1/8} \put(65,25){3/8} \put(105,25){5/8} \put(145,25){7/8} \put(5,5){\vdots} \put(45,5){\vdots} \put(85,5){\vdots} \put(125,5){\vdots} \put(165,5){\vdots} \put(55,55){\line(2,1){20}} \put(35,35){\line(1,1){10}} \put(125,55){\line(-2,1){20}} \put(65,35){\line(-1,1){10}} \put(115,35){\line(1,1){10}} \put(145,35){\line(-1,1){10}} \end{picture}

I could now line up these trees, and determine a function that takes one rational number, finds its position in the first tree, and spits out the value in the same position from the second tree. For example, using our pictures above, we see that this function will take 3/5 to 5/8. This function is known as Minkowski’s question mark function, ?(x), and Wikipedia has a graph, so you should click that last link. I’ve only defined it on rationals, but you can wave the “continuity” wand to extend my definition to a function on [0,1].

So, to evaluate ?(x) for some rational, you have to find that rational in the Stern-Brocot tree. How do you do that? You know that you don’t have to go further down in the tree than your denominator, but that’s still a lot of fractions to look at, potentially. Perhaps there’s a better way?

I guess one way is to play the over-under game. Start at 1/2. If that’s your fraction, you are done. Otherwise, if your fraction is bigger, move right, and if your fraction is smaller, move left. Repeat until you get to your fraction.
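
The over-under game is a few lines of Python (a sketch; x is assumed to be a rational in (0,1), so the search always terminates):

```python
from fractions import Fraction

def path_to(x):
    """L/R path from the root 1/2 of the Stern-Brocot tree (restricted to
    (0,1)) down to the rational x, by the over-under game."""
    lo, hi = Fraction(0, 1), Fraction(1, 1)
    moves = []
    while True:
        cur = Fraction(lo.numerator + hi.numerator,
                       lo.denominator + hi.denominator)
        if cur == x:
            return "".join(moves)
        if x > cur:
            lo = cur           # x is bigger: move right
            moves.append("R")
        else:
            hi = cur           # x is smaller: move left
            moves.append("L")

assert path_to(Fraction(3, 5)) == "RL"
assert path_to(Fraction(2, 5)) == "LR"
assert path_to(Fraction(1, 2)) == ""
```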

But there’s more math we can say if we think about this process slightly differently.

The point of the tree being a tree is that there is only one way to get from the root, 1/2, to any fraction. Working your way down, you make a sequence of moves, either left (L) or right (R). So you generate some string – LLRLRRRL, for example. Of course, that notation gets out of hand quickly, so we might also use exponents to write this example as L^2RLR^3L. To such a string we can associate a sequence of fractions by writing down whatever fraction we are at when we change direction. Let us write this sequence as p'_1/q'_1,p'_2/q'_2,\ldots, and call these “turning points”. I’ve used primes here, because the numbers I’m actually interested in are one before I make a turn. Let me call that sequence p_1/q_1,p_2/q_2,\ldots, calling these values parents (so p_k/q_k has a line directly down to p'_k/q'_k in the Stern-Brocot tree, connecting the two). So, for example, the sequence L^3R^2L^2 has turning points p'_1/q'_1=1/5, p'_2/q'_2=3/13, and finally p'_3/q'_3=7/31 and associated parents p_1/q_1=1/4, p_2/q_2=2/9, and 5/22.

What is the relationship among successive turning points and parents? Suppose you are at the turning point p'_k/q'_k, with parent p_k/q_k, looking to get to the next parent or turning point, and are going to do this with a sequence of a moves (left or right, it doesn’t actually matter). The next parent will be the (1,a-1) weighted mediant of the turning point p'_k/q'_k with its parent p_k/q_k. That is, p_{k+1}=p'_k+(a-1)p_k, and q_{k+1}=q'_k+(a-1)q_k. Similarly, the next turning point will be the (1,a) weighted mediant of p'_k/q'_k with p_k/q_k. This also means that it is the (1,1) weighted mediant of p_k/q_k with p_{k+1}/q_{k+1}.

Combining all of these relationships, messing with the algebra just slightly, you can determine that p_{k+1}=ap_k+p_{k-1} and q_{k+1}=aq_k+q_{k-1}. You may recall that these are exactly the relations between the convergents of a continued fraction (if you start putting subscripts on your a, writing it as a_{k+1}).

So, given a continued fraction [0;a_1,\ldots,a_n] for a rational x in (0,1), begin at 1/1 (which isn’t in the tree, I know) and move left a_1 times (the first move getting you to 1/2), then move a_2 right, and so on. In your final step, you only move a_n-1 (if you wonder about the “-1”, you might think that the n-th convergent is the fraction you are after, and that is the n-th parent, not the n-th turning point), and you’ll arrive at x. On your way, all of the fractions that you hit, one before a turning point, will be the convergents to x.

That only gets us half-way into evaluating ?(x), for this rational x you’ve picked. You first find its continued fraction and then, by the process above, you’ll see how to get down to x from the top of your tree. That means ?(x) will be the value in our second tree (taking averages, the tree with powers of two in the denominator) that follows the same path. The initial a_1 lefts, from 1/1, will get us to a value of 1/2^{a_1}. Then we are supposed to take a_2 rights. Taking 1 right will get us to 1/2^{a_1}+1/2^{a_1+1}. Another right will then be 1/2^{a_1}+1/2^{a_1+1}+1/2^{a_1+2}, which I’ll write 1/2^{a_1}+3/2^{a_1+2}. Yet another right gets us to 1/2^{a_1}+7/(2^{a_1+3}), and so on, until 1/2^{a_1}+(2^{a_2}-1)/2^{a_1+a_2}. After that, we move left a_3 times. The analysis of where we end up is basically the same, you just have to remember that you are subtracting 1/2^{a_1+a_2+i}s, instead of adding them.

Putting this all together, we obtain (remembering that in the last stage we only went a_n-1 moves):

?([0;a_1,\ldots,a_n])=\dfrac{1}{2^{a_1}}+\dfrac{2^{a_2}-1}{2^{a_1+a_2}}-\dfrac{2^{a_3}-1}{2^{a_1+a_2+a_3}}+\cdots\pm \dfrac{2^{a_n-1}-1}{2^{a_1+\cdots+a_n-1}}

Just as a warning, I may be off slightly in some of my formulas. I haven’t written down anything knowing that it was false, but I’m not convinced I obtain the correct formula at the end of the day (it looks slightly different from the formula at Wikipedia, for instance). If anybody sees an error, please let me know.
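
For comparison, here is the commonly stated alternating-series form of ?(x) (written slightly differently from my formula above), checked against values we can read directly off of the two trees, like ?(3/5)=5/8:

```python
from fractions import Fraction

def cf(x):
    """Continued fraction [0; a1, ..., an] of a rational x in (0, 1)."""
    p, q, terms = x.numerator, x.denominator, []
    while q:
        terms.append(p // q)
        p, q = q, p % q
    return terms[1:]   # drop the leading 0

def qm(x):
    """?([0; a1, a2, ...]) = 2 * (2^{-a1} - 2^{-(a1+a2)} + ...)."""
    total, exponent, sign = Fraction(0), 0, 1
    for a in cf(x):
        exponent += a
        total += sign * Fraction(2, 2 ** exponent)
        sign = -sign
    return total

# The trees line 3/5 up with 5/8, 1/3 with 1/4, and 2/3 with 3/4.
assert qm(Fraction(3, 5)) == Fraction(5, 8)
assert qm(Fraction(1, 3)) == Fraction(1, 4)
assert qm(Fraction(2, 3)) == Fraction(3, 4)
```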

That’s probably plenty for today. Up next (I think), I want to take the derivative (even though I can’t) of ?(x), because Wikipedia thinks I’ll get something related to the density of the Farey sequences.

By the way, a paper I was looking at to figure out about these things was “The Minkowski Question Mark, GL(2,\mathbb{Z}) and the Modular Group (Expository)” by Linas Vepstas. The printed copy I have, from some time summer 2009, is incomplete, but it looks to be available online.

The Only LFTs You’ll Ever Need

November 8, 2009

Let us consider linear fractional transformations \frac{az+b}{cz+d} where the coefficients are all integers and ad-bc=1. Since multiplying all of the coefficients by a constant doesn’t change the transformation, we may multiply everything by -1 whenever we want (for instance, to make a chosen coefficient positive). Recall, also, that writing the transformation as a matrix, \left(\begin{smallmatrix}a&b\\ c&d\end{smallmatrix}\right) is not an entirely unreasonable thing to do, as composites of transformations are correctly determined by matrix multiplication. I’ll be talking about sequences of these matrices, so let my original matrix be \left(\begin{smallmatrix}a_0&b_0\\ c_0&d_0\end{smallmatrix}\right).

If c_0=0, then the requirement a_0d_0-b_0c_0=1 means that a_0=d_0=\pm 1, and we may choose a_0=d_0=1. Then our transformation is simply \left(\begin{smallmatrix}1 & b_0 \\ 0 & 1\end{smallmatrix}\right). Since b_0 is an integer, we could write this as \left(\begin{smallmatrix}1 & 1\\ 0 & 1\end{smallmatrix}\right)^{b_0}.

If c_0\neq 0, then (multiplying everything by -1 if necessary, so that c_0>0) there is a q_0\in \mathbb{Z} so that 0\leq a_0-q_0c_0<c_0. I’ll leave you to check the following:

L_0 = \begin{pmatrix}1 & q_0\\ 0 & 1\end{pmatrix}\begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}\begin{pmatrix}c_0 & d_0 \\ a_0-q_0c_0 & b_0-q_0d_0\end{pmatrix}

Writing this right-most matrix as L_1=\left(\begin{smallmatrix}a_1 & b_1\\ c_1 & d_1\end{smallmatrix}\right), we are looking at another matrix with integer coefficients whose determinant is \pm 1 (the sign alternates, since \overline{S} has determinant -1). So we may repeat the process, finding integers q_k and LFTs L_k with L_k=\left(\begin{smallmatrix}1&q_k\\ 0&1\end{smallmatrix}\right)\left(\begin{smallmatrix}0&1\\ 1&0\end{smallmatrix}\right)L_{k+1}. The process must stop eventually, because c_0>c_1>c_2>\cdots are all non-negative integers. Eventually, one of them will be 0, and we already dealt with transformations like that in the previous paragraph.

Let T=\left(\begin{smallmatrix}1&1\\ 0&1\end{smallmatrix}\right), and note that T^b=\left(\begin{smallmatrix}1&b\\ 0&1\end{smallmatrix}\right), which we’ve used before. Also, let \overline{S}=\left(\begin{smallmatrix}0&1\\1&0\end{smallmatrix}\right) (you might note that \overline{S}^2 is the identity matrix). Then our relation above is L_k=T^{q_k}\overline{S}L_{k+1}. That is,

\begin{array}{rcl}\begin{pmatrix}a&b\\ c&d\end{pmatrix}=L_0 &=& T^{q_0}\overline{S}L_1 \\ &=&T^{q_0}\overline{S}T^{q_1}\overline{S}L_2\\ &=&\cdots=T^{q_0}\overline{S}T^{q_1}\overline{S}\cdots T^{q_k}.\end{array}
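
The whole procedure is a few lines of code. This sketch assumes the lower-left entry starts positive (multiply through by -1 first otherwise), peels off T^{q}\overline{S} factors, and then multiplies everything back together as a check (the terminal matrix can carry the leftover sign ambiguity):

```python
def mul(A, B):
    """2x2 matrix multiplication, matrices as nested tuples of rows."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))

Sbar = ((0, 1), (1, 0))

def Tpow(q):
    return ((1, q), (0, 1))

def factor(L):
    """Return (qs, terminal) with L = T^{q0} Sbar T^{q1} Sbar ... terminal."""
    qs = []
    while L[1][0] != 0:
        (a, b), (c, d) = L
        q = a // c                    # gives 0 <= a - q*c < c since c > 0
        qs.append(q)
        L = ((c, d), (a - q * c, b - q * d))
    return qs, L

L0 = ((7, 5), (4, 3))                 # determinant 7*3 - 5*4 = 1
qs, terminal = factor(L0)

# Rebuild the product and check we recover the original matrix.
rebuilt = terminal
for q in reversed(qs):
    rebuilt = mul(Tpow(q), mul(Sbar, rebuilt))
assert rebuilt == L0 and terminal[1][0] == 0
```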

The collection of matrices that we have been working with, 2×2 matrices with integer coefficients whose determinant is 1, tend to get called the modular group and we have just shown that T and \overline{S} can be multiplied together to give you any element of the group. More officially, they “generate” the group. You might fuss and say that technically these matrices are really the special linear group SL_2(\mathbb{Z}), and note that the modular group is really the quotient of this group where you say two matrices are the same if one is -1 times the other.

One interesting thing about this process is that it is really just the process you go through to find the greatest common divisor of a_0 and c_0. The standard algorithm (Euclid’s) is to note that, if c_0\neq 0, there is an integer q_0 such that a_0=q_0c_0+c_1 where 0\leq c_1<c_0. The next step is to repeat the process, writing c_0=q_1c_1+c_2, with 0\leq c_2<c_1. Iterating, you find q_k so that c_{k-1}=q_kc_k+c_{k+1}, and eventually a c_n is 0, and the process stops.

You might be a little worried (justifiably) that this process only seems to look at a_0 and c_0, leaving b_0 and d_0 out of the picture. If you look at what we did with the matrices, our process took \left(\begin{smallmatrix}a_0&b_0\\ c_0&d_0\end{smallmatrix}\right) and wrote it as T^{q_0}\overline{S}\left(\begin{smallmatrix}a_1 & b_1\\ c_1 & d_1\end{smallmatrix}\right), and the b_1 and d_1 depended on b_0 and d_0. In the final step of the process, where the c-coefficient is 0, we ended up with a d-coefficient of 1, but the b-coefficient will be a function of the values b_0 and d_0, so those values do show up and get used, eventually. Of course, since a_0d_0-b_0c_0=1, knowing one of b_0 or d_0 is enough to determine the other (assuming you know a_0 and c_0). You should be thinking that the b_i and d_i are the extra variables that run around in Euclid’s algorithm when you do the “extended” version.

We’ve been talking about all of this in terms of matrices, but remember that the matrices represent linear fractional transformations. The matrix T is the transformation z+1, and the power T^b is then z+b. These are just horizontal translations. The matrix \overline{S} is the transformation 1/z. With our relations among the L_i above, we see that we could write

\begin{array}{rcl}\dfrac{a_0z+b_0}{c_0z+d_0} &=& T^{q_0}\overline{S}L_1 = q_0 + \frac{1}{L_1} \\ &=& q_0+\frac{1}{T^{q_1}\overline{S}L_2}=q_0+\cfrac{1}{q_1+\cfrac{1}{L_2}} \\ &=& q_0+\cfrac{1}{q_1+\cfrac{1}{\ddots+\cfrac{1}{q_k+z}}}\end{array}

Hurray for continued fractions! Of course, I feel like I set something up backwards. Plugging in z=0 will give me a continued fraction for b_0/d_0, but I was saying that the q_k come from thinking about the greatest common divisor of a_0 and c_0. Ah well.

[Update 20091203: The error I’m concerned about above stems from some loose dealings with the end of the iterative procedure. Letting L_i=\left(\begin{smallmatrix}a_i & b_i\\ c_i & d_i\end{smallmatrix}\right), we will get to something like L_n=T^{q_n}\overline{S}L_{n+1} where L_{n+1}=\left(\begin{smallmatrix}1 & b_{n+1} \\ 0 & 1\end{smallmatrix}\right), i.e. L_{n+1}=T^{b_{n+1}}. So we end up saying L_0=T^{q_0}\overline{S}T^{q_1}\overline{S}\cdots T^{q_n}\overline{S}T^{b_{n+1}}. And that b_{n+1} does depend on the initial b_0, presumably in a manner such that plugging in z=0 makes the fraction make sense.]

What I’ve said is that every 2×2 matrix with coefficients in \mathbb{Z} and determinant 1 can be written in terms of T and \overline{S}. However, you might want to use S=\left(\begin{smallmatrix}0&-1\\ 1&0\end{smallmatrix}\right)=-1/z instead of \overline{S}. Going through essentially the same process as before, with some minus signs sprinkled in appropriately, one can determine that T and S can be used, instead of T and \overline{S}, to generate any of the matrices we are considering. What is the advantage of S over \overline{S}? One way to state the advantage, for our story, is that applying S to points in the upper half plane leaves them in the upper half plane (and similarly for the lower half), whereas \overline{S} flips points in the upper half plane to the lower half plane. We should think of this as an advantage, because all of our Ford circles lie in the upper half plane. If you go back to yesterday’s discussion, you’ll note that I used S in the factorization.

So, anyway, T and S are the only LFTs you need. You can quickly check that S^2=1 and also (not quite as quickly) that (TS)^3=1. If you write U=TS, then I guess you get to say that the modular group is generated by S and U, and can establish an isomorphism between the modular group and the free product of the cyclic group of order 2 with the cyclic group of order 3. That’s something.
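Since T and S are explicit integer matrices, these relations are easy to check by hand or by machine. Here is a minimal sketch (plain Python, with a 2×2 helper of my own) confirming that S^2 and (TS)^3 come out to -I as matrices, which is the identity as an LFT, since a matrix and its scalar multiples give the same transformation:

```python
def matmul(A, B):
    # multiply 2x2 matrices stored as ((a, b), (c, d))
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))

T = ((1, 1), (0, 1))    # z -> z + 1
S = ((0, -1), (1, 0))   # z -> -1/z
neg_I = ((-1, 0), (0, -1))

S2 = matmul(S, S)
TS = matmul(T, S)
TS3 = matmul(TS, matmul(TS, TS))
# both S^2 and (TS)^3 equal -I, the identity LFT
```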

LFTs and Ford Circles

November 7, 2009

Given 4 complex numbers, a,b,c,d, we may consider the linear fractional transformation (LFT)

\displaystyle \frac{az+b}{cz+d}.
Well, 4 numbers are enough to make a matrix, \left(\begin{smallmatrix} a & b\\c & d\end{smallmatrix}\right). Is there any better reason to relate the linear fractional transformation with this matrix?

Suppose you have two matrices, \left(\begin{smallmatrix}a&b \\ c&d\end{smallmatrix}\right) and \left(\begin{smallmatrix}a'&b'\\c'&d'\end{smallmatrix}\right). Then the product is as follows:

\begin{pmatrix}a&b\\c&d\end{pmatrix}\begin{pmatrix}a'&b'\\c'&d'\end{pmatrix}=\begin{pmatrix}aa'+bc' & ab'+bd' \\ ca'+dc' & cb'+dd'\end{pmatrix}.

If you take the composite of the two linear fractional transformations, i.e.,

\dfrac{a\cdot \frac{a'z+b'}{c'z+d'}+b}{c\cdot \frac{a'z+b'}{c'z+d'}+d}

and then play around simplifying that expression for a few minutes, you obtain the LFT

\displaystyle \frac{(aa'+bc')z+(ab'+bd')}{(ca'+dc')z+(cb'+dd')},
which is precisely the LFT corresponding to the product matrix above. So, if nothing else, writing LFTs as matrices this way won’t lead us astray when thinking about composites.
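If you would rather let a computer do the simplifying, here is a small sketch (helper names mine) checking at a test point that composing two LFTs agrees with applying the product matrix:

```python
def matmul(A, B):
    # multiply 2x2 matrices stored as ((a, b), (c, d))
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))

def apply_lft(M, z):
    # the LFT z -> (az + b)/(cz + d) attached to the matrix ((a, b), (c, d))
    (a, b), (c, d) = M
    return (a * z + b) / (c * z + d)

M1 = ((2, 1), (1, 3))
M2 = ((1, -1), (2, 5))
z = 0.3 + 0.7j
composite = apply_lft(M1, apply_lft(M2, z))   # first M2, then M1
product = apply_lft(matmul(M1, M2), z)        # the product matrix in one step
```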

This idea is not without its confusion, for me anyway. Generally when you think about a 2×2 matrix of complex values, you are thinking about that matrix as a linear map \mathbb{C}^2\to\mathbb{C}^2, which is not what we are doing above. Instead, I guess we are saying that the “monoid” (group without inverses) of 2×2 matrices, M_2(\mathbb{C}), acts (in the technical sense) on \mathbb{C} as linear fractional transformations. My guess is that there are even better ways to say what is going on.

I think it is also important to keep in mind that two different matrices may correspond to the same LFT. For example, \left(\begin{smallmatrix}1&2\\ 0&2\end{smallmatrix}\right) represents the same LFT as \left(\begin{smallmatrix} 1/2 & 1\\ 0 & 1\end{smallmatrix}\right). More generally, if \lambda is any nonzero complex value, then \left(\begin{smallmatrix}a&b\\c&d\end{smallmatrix}\right) represents the same LFT as \left(\begin{smallmatrix}\lambda a&\lambda b\\ \lambda c & \lambda d\end{smallmatrix}\right). I guess one can think of M_2(\mathbb{C}) as a \mathbb{C}-vector space (isomorphic to \mathbb{C}^4), and then think of its projective space (the quotient where two “vectors” (matrices here) are the same when they differ by a scalar (complex) multiple), which I’ll denote P(M_2(\mathbb{C})). Then I think I’m saying that the action of M_2(\mathbb{C}) on \mathbb{C} actually is an action of the quotient, P(M_2(\mathbb{C})). I’m not sure if this is a useful viewpoint (or, indeed, correct).

Yesterday, when I was talking about how to picture what an LFT does to \mathbb{C}, I wrote down a factorization of the LFT as a composite. Our new notation gives us another way to write that factorization (recall \alpha=-(ad-bc)/c^2, and that we had assumed c\neq 0):

\begin{pmatrix}a&b\\c&d\end{pmatrix}=\begin{pmatrix}\alpha & a/c\\ 0&1\end{pmatrix}\begin{pmatrix}0&1\\ 1&0\end{pmatrix}\begin{pmatrix}1&d/c\\ 0&1\end{pmatrix}.

As is frequently useful, we will assume that ad-bc\neq 0 (indeed, this factorization seems to require it – I think I’m missing something somewhere, anybody see it?). Notice that ad-bc is the determinant of the matrix representing our LFT. We may then multiply all entries in our matrix (without changing the LFT, as discussed above) by 1/(ad-bc), and obtain a matrix with determinant 1. Let’s do that, making \alpha=-1/c^2.

Yesterday, when I was working on the factorization above, I only had something like \epsilon idea where I was going. I think today I’ve got about twice that, so I want to re-write the factorization. Let me write it as

\begin{pmatrix}1 & a/c\\ 0 & 1\end{pmatrix}\begin{pmatrix}1&0 \\ 0& c^2\end{pmatrix}\begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix}\begin{pmatrix}1 & d/c\\ 0 & 1\end{pmatrix}.
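As a sanity check on this factorization, here is a sketch (helper and sample matrix mine) multiplying the four matrices out for a concrete determinant-1 integer matrix with c\neq 0; the result should be a scalar multiple of \left(\begin{smallmatrix}a&b\\c&d\end{smallmatrix}\right), hence the same LFT:

```python
from fractions import Fraction

def matmul(A, B):
    # multiply 2x2 matrices stored as ((a, b), (c, d))
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))

# an integer matrix with ad - bc = 1 and c != 0
a, b, c, d = 2, 1, 3, 2

F = matmul(
    matmul(((1, Fraction(a, c)), (0, 1)), ((1, 0), (0, c * c))),
    matmul(((0, -1), (1, 0)), ((1, Fraction(d, c)), (0, 1))),
)
# F works out to c times the original matrix, so it is the same LFT
expected = ((a * c, b * c), (c * c, d * c))
```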

So what’s the connection with Ford circles? Recall that for a reduced fraction h/k, the associated Ford circle is the circle centered at (h/k,1/(2k^2)) with radius 1/(2k^2). Following Rademacher (and, presumably, others), let us say that the “fraction” 1/0 also gets a Ford “circle”, the line y=1 in the plane. This isn’t such a nasty thing to do, as it has the tangency properties I talked about when talking about Ford circles. Anyway, let us think about applying our transformation \left(\begin{smallmatrix}a&b\\c&d\end{smallmatrix}\right), as the composite given above, and see what happens to this line y=1. We’ll assume that a,b,c, and d are all integers.

The first step, \left(\begin{smallmatrix}1&d/c\\ 0 &1\end{smallmatrix}\right) is the linear translation z+d/c. Since d/c is real (since c,d are integers), this translation is a horizontal shift, which leaves y=1 unchanged.

Next up, \left(\begin{smallmatrix}0&-1\\1&0\end{smallmatrix}\right), which is -1/z. Thinking of a point on the line y=1, you can quickly determine that its polar coordinates are (\csc \theta,\theta). The transformation -1/z is the composite of: (1) inversion with respect to the unit circle (the point becomes (\sin \theta,\theta)), (2) reflection across the horizontal axis (giving (\sin \theta,-\theta)), and finally (3) multiplication by -1 (giving (-\sin \theta,-\theta), since these are polar coordinates). This final point is (\sin \theta,\pi-\theta)=(\sin(\pi-\theta),\pi-\theta). As \theta varies through (0,\pi), \pi-\theta also varies through this interval, and so we get the graph of the polar curve r=\sin \theta. If you know your polar curves, you know what this looks like…

So, the first two transformations take the line y=1 to the circle with center (0,1/2) and radius 1/2. The next in our composite is multiplication by 1/c^2, which is just a scaling (since c\in \mathbb{R}). This scaling takes our circle to the circle with center (0,1/(2c^2)) and radius 1/(2c^2). Finally, the last transformation is another horizontal translation, leaving our circle centered at (a/c,1/(2c^2)). We recognize this as the Ford circle for the fraction a/c (as long as that fraction is reduced).
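The whole trip is easy to check numerically. Here is a sketch (helper and sample matrix mine): with integer entries, determinant 1, and c\neq 0, points on the line y=1 should land on the circle centered at a/c + i/(2c^2) with radius 1/(2c^2):

```python
def apply_lft(M, z):
    # the LFT z -> (az + b)/(cz + d) attached to the matrix ((a, b), (c, d))
    (a, b), (c, d) = M
    return (a * z + b) / (c * z + d)

a, b, c, d = 2, 1, 3, 2             # integer entries with ad - bc = 1
center = a / c + 1j / (2 * c * c)   # Ford circle data for a/c = 2/3
radius = 1 / (2 * c * c)

# push several points x + i on the line y = 1 through the LFT
images = [apply_lft(((a, b), (c, d)), x + 1j) for x in (-3.0, -0.5, 0.0, 1.7, 10.0)]
max_err = max(abs(abs(w - center) - radius) for w in images)
```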

Wasn’t that fun? If you want to think about it some more, you might convince yourself that any point above the line y=1 will get moved to a point inside the Ford circle resulting from this process.

Anyway, enough out of me. Hopefully tomorrow I’ll have slightly more of an idea what I’m talking about. Don’t count on it though.

Linear Fractional Transformations

November 6, 2009

a.k.a. Möbius Transformations, are a type of function. I’ll talk about them as functions from the complex plane to itself. Such functions are given by a formula

\displaystyle \frac{az+b}{cz+d},

where a,b,c,d are complex values. If c and d are both 0, this isn’t much of a function, so we’ll assume at least one isn’t 0.

I’d like to talk about what these functions do, how to have some hope of picturing them as transformations \mathbb{C}\to\mathbb{C}. To do this, let’s consider some easy cases first.

If c=0 (and so by assumption d\neq 0), then we may write the function \frac{a}{d}z+\frac{b}{d}, or simply a'z+b' for some complex values a',b'. This is now a linear (some might say affine) transformation of the complex plane. Think about it as the composite z\mapsto a'z\mapsto a'z+b', where the first map multiplies by a', and the second adds b'. Multiplying by a complex value a' is the same as scaling by the real value |a'| (the “norm” of a', distance from a' to the origin) and then rotating by the “argument” of a'. If you think about a' as a point (r,\theta) in polar coordinates, then the argument of a' is \theta (or so), and so multiplication by a' is multiplication by the real value r (which is just a stretching (or shrinking) of the complex plane away from (toward) the origin if r>1 (resp. 0\leq r<1)) and then rotation by the angle \theta. The second transformation in the composite, “add b'“, just shifts the entire plane (as a “rigid transformation”) in the direction of b'.
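The scale-and-rotate description of complex multiplication can be checked numerically; a minimal sketch (standard library only, the particular values are mine):

```python
import cmath

a_prime = 1 + 1j   # |a'| = sqrt(2), argument pi/4
z = 2 - 0.5j
w = a_prime * z

# multiplication scales the norm by |a'| ...
norm_ratio = abs(w) / abs(z)
# ... and adds arg(a') to the argument
angle_shift = cmath.phase(w) - cmath.phase(z)
```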

So the case when c=0 is just a linear transformation, which aren’t too difficult to picture. Another important case is 1/z, so the coefficients are a=0,b=1,c=1,d=0. To talk about what this does, let’s first talk about “inversion” with respect to a fixed circle.

Let C be a circle with radius r, in the plane, and z any point in the plane. Let O denote the center of the circle and d the distance from O to z. The inversion of z, with respect to C, is the point on the line through O and z (in the direction of z from O) whose distance from O is d'=r^2/d. This means that points near the center of C are sent far away, and vice versa. Points on C are unchanged. Technically I guess we should say that this function isn’t defined at O, but people like to say it is taken to “the point at infinity” and, conversely, that inversion takes \infty to O. These things can be made precise.

You might notice that doing inversion twice in a row gets you right back where you started. It also turns out that if C' is another circle in the plane, not passing through O, then the inversion of all of its points is another circle. If C' passes through O, then the inversion works out to be a line. Since doing inversion twice is the identity, inversion takes lines to circles through O. If you’re thinking about the comments about \infty above, this makes sense because every line “goes to \infty“, and so the inversion of a line will go through the inversion of \infty, which I said should be O.

All of this talk about inversion was to describe the function 1/z. This function is the composite of inversion with respect to the unit circle centered at the origin followed by a reflection across the horizontal axis (real line). Don’t believe me? The equation d'=r^2/d defining the relationship between distances when doing the inversion can be re-written as dd'=r^2. If we’re doing inversion with respect to a unit circle, then dd'=1. So let z' be the result of applying the inversion and then the reflection to z. The reflection doesn’t change norms, so the norm of zz' is dd'=1 (i.e., zz' is a point on the unit circle). Next up, multiplying z by z' produces a point whose angle from the positive real axis (which I called the argument before, the \theta from polar coordinates) is the sum of the angles for z and z'. Since we did the reflection across the horizontal axis, the argument for z' is precisely the negative of the argument for z, meaning their sum (the argument of their product) is 0. So zz' is a point on the unit circle making an angle of 0 with the positive real line, i.e., zz'=1. That makes z'=1/z, as promised.
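Here is that claim as code, a sketch with helper names of my own: compose inversion in the unit circle with reflection across the real axis and compare against 1/z at a few points:

```python
def invert_unit_circle(z):
    # same direction from the origin, distance d' = 1/d (so d * d' = 1)
    return z / abs(z) ** 2

def reflect_real_axis(z):
    # reflection across the horizontal axis is complex conjugation
    return z.conjugate()

test_points = [1 + 1j, -0.3 + 2j, 5 - 0.1j]
max_err = max(abs(reflect_real_axis(invert_unit_circle(z)) - 1 / z)
              for z in test_points)
```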

Let’s get back to the general setup, with the function


and let’s assume c\neq 0 (since we already handled the case c=0, it’s just a linear transformation). For some notational convenience, let me let \alpha=-(ad-bc)/c^2. Consider the following composite:

\begin{array}{rcl} z & \xrightarrow{w\mapsto w+\frac{d}{c}} & z+\dfrac{d}{c} \\ {} & \xrightarrow{w\mapsto \frac{1}{w}} & \dfrac{c}{cz+d} \\ {} & \xrightarrow{w\mapsto \alpha w+\frac{a}{c}} & \dfrac{\alpha c}{cz+d}+\dfrac{a}{c} \end{array}

If you check all of these steps, and then play around simplifying the final expression, then you obtain the original formula above. So we can think of any linear fractional transformation as a composite of some linear functions and an inversion, and we know how to picture all of those steps.

That’s maybe enough for today. It’s certainly enough for me for today. Before I go, I’ll leave you with a video that might be helpful, and is pretty either way.

Ford Circles

November 5, 2009

Now that we’ve got some foundations laid for Farey sequences and rational approximations, let’s put pictures to some of those claims, via Ford circles.

For every rational h/k, in reduced terms, draw a circle in the plane with center (h/k,1/(2k^2)) and radius 1/(2k^2). I’m not going to try to come up with a better picture for you than the one used on the Wikipedia page for Ford circles. (I tried embedding the picture below, but that didn’t work. So I’ll leave it to you to click the link.)

What’s so great about this construction? Well, you may recognize those radii as the error term in our discussion of rational approximations. The statement that for every irrational \omega there are infinitely many reduced fractions h/k such that |\omega-h/k|<1/(2k^2) is the same as saying that the vertical line x=\omega intersects infinitely many of the Ford circles. The improved bound 1/(\sqrt{5}k^2) can also be found by (very carefully) analyzing vertical lines passing through Ford circles (more specifically, the gaps between them), but I think I’m going to let that statement be for now. If you want, the proof via Ford circles is in [Rademacher], and a proof via continued fractions is in [Hardy and Wright]. The result is known as Hurwitz’s theorem.

What else can be said about Ford circles? Well, yesterday I was talking about neighbors in Farey sequences. It turns out that h/k and H/K are neighbors in a Farey sequence if and only if the Ford circles at h/k and at H/K are tangent to each other.

To see why this is true, consider the circles at h/k and H/K, with h/k<H/K (so Hk-hK>0). Since we know the location of their centers, we can find an expression for the distance between those centers. The two circles will intersect iff the distance between the centers is no bigger than the sum of the radii of the two circles. Re-arranging terms (and working with squares to avoid square roots), you might look at d^2-(r_k+r_K)^2, where d is the distance between the centers, and r_k and r_K denote the (hopefully obvious) radii. The two circles will be tangent if this is 0, and won’t intersect if the difference is positive. More care needs to be used when talking about what happens if the difference is negative, but we’ll see momentarily that this doesn’t happen, so let’s forget it. So, it turns out that d^2-(r_k+r_K)^2 is ((Hk-hK)^2-1)/(k^2K^2). Since all the letters there are positive integers, we see that d^2-(r_k+r_K)^2\geq 0, with equality iff Hk-hK=1. This says, based on our work yesterday, precisely that two Ford circles intersect iff the fractions they correspond to are neighbors in a Farey sequence, and when this happens, they intersect at a single point (they are tangent to each other).
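Exact rational arithmetic makes this computation painless; a sketch with the fractions module (function name mine):

```python
from fractions import Fraction

def tangency_gap(h, k, H, K):
    # d^2 - (r_k + r_K)^2 for the Ford circles at h/k and H/K, computed exactly
    r_k = Fraction(1, 2 * k * k)
    r_K = Fraction(1, 2 * K * K)
    d_sq = (Fraction(H, K) - Fraction(h, k)) ** 2 + (r_K - r_k) ** 2
    return d_sq - (r_k + r_K) ** 2

# neighbors in F_3: 1/3 and 1/2 have Hk - hK = 1, so the circles are tangent
gap_neighbors = tangency_gap(1, 3, 1, 2)
# non-neighbors 1/4 and 3/4: Hk - hK = 8, so the gap is (8^2 - 1)/(4^2 * 4^2)
gap_apart = tangency_gap(1, 4, 3, 4)
```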

This justifies the picture from above, then. The circles never overlap, and some of them are tangent.

It might be worth it to go back and justify why any vertical line at an irrational value slices through infinitely many of the Ford circles. To that end, let \omega\in (0,1) be irrational. The two circles at 0/1 and 1/1 both have radius 1/2, and so certainly x=\omega hits at least one Ford circle, kicking off an induction argument.

Suppose that x=\omega hits the Ford circle h/k. For the sake of argument, let us suppose that \omega>h/k, so the vertical line lies to the right of the center of the circle. Not much changes if the opposite inequality is true, so let’s just stick with this one. I aim to show that x=\omega hits a circle with a smaller radius, i.e. goes through a circle corresponding to a fraction with a bigger denominator than k. I claim that it hits one of the circles tangent to the circle at h/k. By the above, we know that these are the circles corresponding to neighbors of h/k in some Farey sequence. Since I’m thinking about circles to the right of h/k, these will be successors in the Farey sequence. Yesterday we said that all of these successors are (m,1)-weighted mediants of h/k with H/K where H/K is the successor of h/k in F_k (the first Farey sequence that h/k shows up in).

Consider, then, the circles corresponding to the (1,1), (2,1), (3,1),… weighted mediants of h/k with H/K. Let me call the circles C_1,C_2,\ldots, using C for the circle at h/k. It isn’t too much work to check the necessary inequalities to show that the center of C_1 is further right than the right-most point on C. I’ve already said, even if not directly, that for each n, C_n is tangent to C_{n+1}. So, if you want, you may start at the top of C_1, and move left along the tops of all the circles in turn, jumping from one to the next at points of tangency. How far can you go doing this? Well, if you take the limit as m\to\infty of the (m,1)-weighted mediant of h/k with H/K, you get h/k (l’Hospital!). So, the tops of the circles form an unbroken path from the top of C_1 to as close to h/k, on the right, as we care to get. The vertical line x=\omega, which passes through the right side of C, must surely pass through this path, and therefore at least one of the C_n.

That’s probably just about enough with Ford circles for now. I’m not sure what I’ll be writing about tomorrow. Hopefully something else fun. I would like to note that, as I was writing this post, I was excited to find a connection to continued fractions. I knew a connection should be there (and I’ll probably find plenty more later), and found it while asking “how can we tell which of the C_n does x=\omega pass through?”. I remembered from reading about continued fractions, [a_0;a_1,a_2,\ldots], that the “convergents”, [a_0;a_1,\ldots,a_n] are rationals p_n/q_n that give “best rational approximations”. Those should be the circles that x=\omega passes through, or so. And so if you pass through p_n/q_n, the next circle you should get to is p_{n+1}/q_{n+1}. Well, it turns out, there is a nice relationship between successive convergents. Namely, p_{n+1}=a_{n+1}p_n+p_{n-1} and q_{n+1}=a_{n+1}q_n+q_{n-1}. Look at that! It’s the (a_{n+1},1)-weighted mediant of p_n/q_n with p_{n-1}/q_{n-1}! Fantastic. Could this story get any better? I’m glad I’ve decided to find out this month.
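To see the convergent recurrence in action, here is a sketch (function mine) using \sqrt{2}=[1;2,2,2,\ldots]; each new convergent is built by exactly the weighted-mediant recurrence above, and the check |\omega-p/q|<1/(2q^2) is the statement that the line x=\omega passes through the Ford circle at p/q:

```python
from math import sqrt

def convergents(coeffs):
    # p_n = a_n * p_{n-1} + p_{n-2},  q_n = a_n * q_{n-1} + q_{n-2}
    p_prev, q_prev = 1, 0
    p, q = coeffs[0], 1
    result = [(p, q)]
    for a in coeffs[1:]:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        result.append((p, q))
    return result

# sqrt(2) = [1; 2, 2, 2, ...]
cs = convergents([1] + [2] * 8)   # 1/1, 3/2, 7/5, 17/12, ...
omega = sqrt(2)
# every convergent's Ford circle is hit by the vertical line x = sqrt(2)
hits = all(abs(omega - p / q) < 1 / (2 * q * q) for p, q in cs)
```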

Neighbors in Farey Sequences

November 5, 2009

I was reading about Ford circles today, and was planning on writing about them. But then I noticed that one of the things I wanted to say was based on some things I didn’t say about Farey sequences when I talked about them a few days ago. So I thought I’d go back and fill in that gap, and save Ford circles for tomorrow.

So what did I skip? Well, I mentioned that to get from one Farey sequence to the next, you throw in mediants whose denominators aren’t too big. Is there a way, given a fraction h/n in F_n, to figure out what terms in F_{n-1} it was the mediant of? Well, we could, of course, list the terms in F_{n-1}, in order, and then pick the two that surround the value h/n. I’m after another way.

Since h/n is supposed to be reduced, h and n are relatively prime, and so there are integers s and t such that hs-nt=1 (this usually (in my mind) gets written with a “+”, not a “-“, but just change the sign of your t). Once you’ve found such a pair (s,t), all of the other pairs that satisfy the same equation are of the form (s+mn,t+mh), where m is any integer. Among the s+mn, there is only one in the interval [0,n). Abusing notation a bit, or assuming you picked well to start, let’s call this value s, and its partner t. I claim that t/s is the term directly before h/n in F_n.

Before talking about t/s, I should have checked that s\neq 0. If this is not the case, then hs-nt=1 means n=1, in which case we are in F_1 which is just 0/1 and 1/1. Not much interesting going on there. So let’s go ahead and assume s>0.

Now, the linear equation hs-nt=1 gets re-written h/n-t/s=1/(ns) to show that h/n>t/s (so the fractions are at least in the correct order). Messing about with inequalities some more, you can check that 0\leq t<s, and since s and t are relatively prime (they satisfy the linear equation), the fraction t/s is actually in F_n. Let a/b be the predecessor of h/n in F_n (we want to show a=t,b=s). Since it is a neighbor, we showed last time that then hb-na=1. But now we’re saying that all solutions of the equation hx-ny=1 have a fixed form. That is, b=s+mn and a=t+mh for some integer m. But if m\neq 0 then b is not in [0,n), and so a/b is not in F_n. So m=0 and a=t,b=s, and we’re done.
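The search for (s,t) is just a modular inverse, so this is quick to code; a sketch (Python 3.8+ for the three-argument pow, function name mine):

```python
def farey_predecessor(h, n):
    # solve h*s - n*t = 1 with 0 < s < n; then t/s directly precedes h/n in F_n
    s = pow(h, -1, n)        # modular inverse: h*s = 1 (mod n)
    t = (h * s - 1) // n
    return t, s

# predecessor of 2/5 in F_5 (which runs 0/1, 1/5, 1/4, 1/3, 2/5, 1/2, ...)
t, s = farey_predecessor(2, 5)   # expect 1/3
# the successor is then (h - t)/(n - s)
successor = (2 - t, 5 - s)       # expect 1/2
```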

Man, ok, so, after all that, we now know how to find the term before h/n in F_n. I’ll call it t/s, sticking with our notation above. What about the term after h/n? You could play mostly the same games as above, I suppose. Or, if the successor is T/S, then we know that h=t+T and n=s+S, since we made h/n as a mediant. So T=h-t and S=n-s, easy enough. [Update 20091109: You have to be careful with this. The above method for finding the successor to h/n in F_n only works because you are in F_n. If you look for successors of h/k in F_n, where k<n, this method, using the predecessor, is wrong. For example, 1/4 is the predecessor of 1/3 in F_5, but the fraction (1-1)/(3-4) suggested by the previous method is clearly NOT the successor of 1/3 in F_5]

So we know the neighbors of h/n in F_n. What is the next Farey sequence when h/n gets a new neighbor? Let’s stick to the next time it gets a new neighbor “on the left” (less than itself). We know that when that happens, it had to be because we made the mediant with the old neighbor, our friend t/s. So the next neighbor is (t+h)/(s+n), which doesn’t show up until F_{s+n}. The newest neighbor after that will be the mediant with this new neighbor, i.e., (t+2h)/(s+2n), in F_{s+2n}. Continuing on, we see that all of the neighbors are of the form (t+mh)/(s+mn), for some positive integer m. I called these (1,m)-weighted mediants (of t/s and h/n) yesterday.

The story goes pretty much the same on the right side. I called the neighbor on the right, in F_n, T/S. The next neighbors will be (h+T)/(n+S), (2h+T)/(2n+S), etc. These are the (m,1)-weighted mediants of h/n and T/S.

We can put both of these into the same statement by noticing that the (1,-m) mediant of t/s with h/n is the same as the (m-1,1) mediant of h/n with T/S. In other words, all of the neighbors of h/n, in all of the Farey sequences, can be written as (1,m) weighted mediants of t/s with h/n. They will have numerators of the form a=t+mh and denominators of the form b=s+mn. Guess what? These are exactly all of the solutions to hx-ny=1. Go back a few paragraphs, it’s the same thing I said there.

In summary, the neighbors to h/n, in any Farey sequence, are in one-to-one correspondence with solutions to hx-ny=1.

If I knew what I were doing, I could probably have gotten to this statement a bit more quickly. But I don’t remember the last time I knew what I was doing, so there you have it.

Farey Sequences

November 3, 2009

Ok, so if I’m going to talk about Farey sequences and the Riemann hypothesis for MaBloWriMo, then Farey sequences are probably a good place to start.

Let us say that the n-th Farey sequence is the sequence of fractions h/k where 0\leq h\leq k\leq n and h and k are relatively prime (so that h/k is in “lowest terms”). Let’s denote this sequence by F_n. The first few are:

F_1: 0/1\quad 1/1

F_2: 0/1 \quad 1/2\quad 1/1

F_3: 0/1\quad 1/3\quad 1/2\quad 2/3\quad 1/1

F_4: 0/1\quad 1/4\quad 1/3\quad 1/2\quad 2/3\quad 3/4\quad 1/1.

There’s sort of a nice way to get from one Farey sequence to the next: add fractions incorrectly.

The mediant of the fractions h_1/k_1 and h_2/k_2 is defined to be the fraction (h_1+h_2)/(k_1+k_2). It turns out that the mediant of two fractions always lies between them, which you can check by taking differences and noticing they have the correct sign. Now, to make the sequence F_{n+1}, knowing the sequence F_n, you simply include all the existing terms, and any mediants of successive terms that have a denominator no bigger than n+1. So, for example, consider F_4 above. We consider the fractions obtained by adding to this sequence the mediants:

0/1\quad 1/5\quad 1/4\quad 2/7\quad 1/3\quad 2/5\quad 1/2\quad 3/5\quad 2/3\quad 5/7\quad 3/4 \quad 4/5\quad 1/1

and then we throw out any fraction whose denominator is bigger than 5, giving us

F_5: 0/1\quad 1/5\quad 1/4 \quad 1/3\quad 2/5\quad 1/2\quad 3/5\quad 2/3\quad 3/4 \quad 4/5\quad 1/1.
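The mediant construction is only a few lines of code. Here is a sketch using the fractions module (function names mine) that builds F_5 from F_1 by mediants and checks it against the direct definition:

```python
from fractions import Fraction

def next_farey(seq, new_max_denom):
    # insert the mediant of each pair of successive terms, keeping only
    # those whose denominator is small enough
    out = [seq[0]]
    for left, right in zip(seq, seq[1:]):
        den = left.denominator + right.denominator
        if den <= new_max_denom:
            out.append(Fraction(left.numerator + right.numerator, den))
        out.append(right)
    return out

def farey_direct(n):
    # reduced fractions h/k with 0 <= h <= k <= n, straight from the definition
    return sorted({Fraction(h, k) for k in range(1, n + 1) for h in range(k + 1)})

F = [Fraction(0, 1), Fraction(1, 1)]   # F_1
for n in range(2, 6):
    F = next_farey(F, n)               # F_2, ..., F_5
```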

There’s a little something to worry about in this process. Well, probably a few little somethings. The one I’ve got in mind is: are the mediants always reduced? However, you might also ask: are we sure we get all of the fractions this way?

To begin to answer these, let me change the subject. Look at any of the Farey sequences listed above, and pick any two successive terms in that sequence. Call them h_1/k_1 and h_2/k_2, and let’s say that we’ve picked our indices so that h_1/k_1 < h_2/k_2. Now compute h_2/k_2 - h_1/k_1. Done? You got 1/(k_1k_2), didn’t you? This happens no matter what Farey sequence you are in, and no matter which successive terms you take. You always find that, with the notation above, h_2k_1-h_1k_2=1. This is fairly interesting.

We already knew that, for example, h_1 and k_1 were relatively prime. And for a pair h_1,k_1 of relatively prime integers, there always exist integers s,t such that h_1s+k_1t=1. Conversely, if you can find such a pair of integers, then h_1 and k_1 are relatively prime. What’s perhaps a little surprising is that, in this case, the s and t happen to show up in the next term of the Farey sequence: s=-k_2 and t=h_2.

Armed with the observation that the difference between successive terms has a 1 in the numerator, in even just the first Farey sequence, F_1, we can proceed by induction on n to show that if h_1/k_1 < h_2/k_2 are successive terms in F_n then: (1) h_2k_1-h_1k_2=1, and also (2) the mediant of these fractions is reduced. Therefore, if the mediant has a denominator no bigger than n+1, it will get included in F_{n+1}.

Does this process of expanding one Farey sequence to the next by throwing in mediants actually give us all of the terms of the next Farey sequence? That is, if m/n is a fraction in reduced form, is it the mediant of two successive fractions from F_{n-1}?

Let me answer that by considering what I’ll call weighted mediants (a name I just made up, it may or may not show up in your favorite reference). Let me say that the (a,b)-weighted mediant of fractions h_1/k_1 < h_2/k_2 is the fraction (ah_1+bh_2)/(ak_1+bk_2). I’ll assume a,b,h_1,h_2,k_1,k_2>0. So, for example, the normal mediant, defined above, is just the (1,1)-weighted mediant. I claim that: (1) every weighted mediant lies between the two original fractions, (2) every fraction between two successive terms in a Farey sequence can be obtained as a weighted mediant.

Claim (1) is easy, just take appropriate differences and note that they have the appropriate sign, as before. For claim (2), suppose h_1/k_1<p/q<h_2/k_2, where h_1/k_1,h_2/k_2 are successive terms in a Farey sequence, so that h_2k_1-h_1k_2=1. Our goal is to find a,b so that p=ah_1+bh_2 and q=ak_1+bk_2. The assumption h_2k_1-h_1k_2=1 means that we can, indeed, find integers satisfying this condition, namely: a=-k_2p+h_2q and b=k_1p-h_1q.
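The formulas for a and b are easy to sanity-check; a sketch with the neighbors 1/3 < 1/2 (successive in F_3) and the in-between fraction 3/8 (my choice of example):

```python
# successive terms in a Farey sequence, so h2*k1 - h1*k2 = 1
h1, k1, h2, k2 = 1, 3, 1, 2

# any fraction strictly between them, here p/q = 3/8
p, q = 3, 8

# the weights from the text
a = -k2 * p + h2 * q
b = k1 * p - h1 * q
weighted_mediant = (a * h1 + b * h2, a * k1 + b * k2)
```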

So, back to the question about obtaining every fraction in F_n by taking mediants from F_{n-1}. We need to show that if m/n<1 and m is relatively prime to n, then m/n is the mediant of two successive fractions in F_{n-1}. Well, m/n certainly lies between two of them, say h_1/k_1 and h_2/k_2 in keeping with our notation, and we have just shown that it may therefore be obtained as some weighted mediant (we want it to be the (1,1)-weighted mediant). Since the denominator of the (a,b)-weighted mediant is ak_1+bk_2, the smallest denominator is obtained only with the (1,1)-weighted (i.e., normal) mediant. This denominator must be at least n, because otherwise the fraction would show up in some Farey sequence before F_n. As m/n is obtainable as a weighted mediant, it must be the (1,1)-weighted mediant, which is what we wanted to show. Let’s call it a day.