Posts Tagged ‘mablowrimo’

Möbius Inversion

November 25, 2009

Yesterday I mentioned Möbius inversion, and I decided that it was worth looking at a little bit more. First of all, it’s pretty fun. Secondly, it’ll give us a useful identity concerning \zeta(s).

Before I get too far, I should add another reference to my growing list. With Edwards’ book, and Hardy and Wright, include Apostol’s “Introduction to Analytic Number Theory.”

Anyway, yesterday I defined the Möbius function

\displaystyle \mu(n)=\begin{cases}1 & n=1 \\ 0 & n\text{ not squarefree} \\ (-1)^k & n=p_1\cdots p_k\text{ distinct primes}\end{cases}

and stated Möbius inversion as follows:

Thm: If f(n)=\sum\limits_{d|n}g(d) then g(n)=\sum\limits_{d|n}\mu(d)f(n).

Let’s begin today with a proof of this statement. Pf: Suppose f(n) is as stated, and consider \sum\limits_{d|n}\mu(d)f(n). By assumption on f, this is \sum\limits_{d|n}\mu(d)\sum\limits_{d'|(n/d)} g(d') which I’ll write as \sum\limits_{dd'|n}\mu(d)g(d'). Now in this sum, what is the coefficient of g(m)? Well, it is \sum\limits_{dm|n}\mu(d)=\sum\limits_{d|(n/m)}\mu(d). We saw yesterday that this sum is 0 unless n/m=1, i.e., unless m=n. So \sum\limits_{d|n}\mu(d)f(n)=g(n), as claimed.

That’s pretty fun, I think.

There’s another way to go about these things, and it is by way of what are called Dirichlet series (according to Wikipedia, Dirichlet had many names). Suppose f is a function defined on the positive integers (such a function seems to sometimes get called an arithmetic function). I’ll say that the Dirichlet series for f is the series F(s)=\sum_n f(n)n^{-s}. I’ve written it this way, so that I have a function defined whenever the series converges, but I’ll pretty much ignore convergence aspects in what follows.

Suppose F(s) and G(s) are the Dirichlet series associated to some f,g, respectively. What does the product of these two series look like? Well, since f(n)n^{-s}\cdot g(m)m^{-s}=f(n)g(m)(nm)^{-s}, you can determine that the product of these two Dirichlet series is another Dirichlet series, this one associated to the function c(n)=\sum\limits_{d|n}f(d)g(n/d). Indeed, just ask yourself, “what is the coefficient of n^{-s} in the product of the series F(s) and G(s)?”

By the way, I’ve called the product series c because it also gets called the convolution of f and g. The convolution of f and g is denoted f*g, and is simply the function (f*g)(n)=\sum\limits_{d|n}f(d)g(n/d).

Now you’ve got an operation, *, on arithmetic functions, and you can start asking what properties it has. It turns out that * gives the set of arithmetic functions (well, those functions that aren’t 0 on the input 1) the structure of a commutative group. The identity is the function I with I(1)=1 and I(n)=0 if n>1 (corresponding to the identity Dirichlet series, the constant function 1). It’s sort of a fun little induction problem to find a formula for the inverse of a given arithmetic series. We won’t need the formula exactly, so perhaps I’ll leave it for you.

Let z(n)=1, so that the associated Dirichlet series is \zeta(s).

We now have another language in which to state Möbius inversion:

Thm: If f=g*z then g=\mu*f.

In this statement, let g=I (the identity of convolution) so that f=z. Then we obtain I=\mu *z. On taking Dirichlet series, we find that

\displaystyle 1=\left(\sum_n \frac{\mu(n)}{n^s}\right)\zeta(s),

or, in other words,

\displaystyle \frac{1}{\zeta(s)}=\sum_n \frac{\mu(n)}{n^s}.

How’s that for an identity?

There’s another way to get at this identity, without the language of Dirichlet series (explicitly anyway), which I wrote “holy crap” next to in the book (Hardy and Wright) when I read it. So probably I should share. Using Euler’s product (recall the indexing is over primes)

\displaystyle \zeta(s)=\prod_p (1-p^{-s})^{-1}

we have

\displaystyle \frac{1}{\zeta(s)}=\prod_p (1-p^{-s}).

Here’s the part that rocked me: Look at 1-p^{-s} and write it as 1-p^{-s}+0p^{-2s}+0p^{-3s}+\cdots, i.e. as \sum_k \mu(p^k)p^{-ks}. Because the first thing I think to do when I see a binomial is to write it as an infinite series of mostly 0s. Like I said yesterday, I guess it helps if you know where you are going. So anyway, we’ve got

\displaystyle \frac{1}{\zeta(s)}=\prod_p \sum_k \frac{\mu(p^k)}{(p^k)^s}.

Now, like we’ve been doing, let’s ask, “what is the coefficient of n^{-s} when the product is all written out?” Well, the only way to get n^{-s} is the term corresponding to the prime factorization of n. If n=p_1^{e_1}\cdots p_k^{e_k}, then the coefficient is \mu(p_1^{e_1})\cdots \mu(p_k^{e_k}). But \mu is “multiplicative,” meaning that if a and b are relatively prime then \mu(a)\mu(b)=\mu(ab). So the coefficient of n^{-s} is just \mu(p_1^{e_1}\cdots p_k^{e_k})=\mu(n). This proves the identity.

That’s the identity we’ll use in the next post or two to finally tie together Farey sequences and the zeta function, believe it or not. So I’ll basically stop here. But I do want to mention that you can do some of the things we’ve done above in slightly more general contexts, and I remember really enjoying reading about them. Namely, you can work in other posets. In what we’ve been doing above, there has, somewhere in the background, been the poset of positive integers ordered by n\leq m iff n|m. Probably a good place to start reading about the generalization is Wikipedia, as usual. If I find, sometime, the paper I read where I first saw these things, I’ll update this post.

In the mean time…


A Formula for Primes

November 24, 2009

I wanted to write this post yesterday, but I couldn’t work out some of the steps that I wanted to be able to work out, so I had to wait a day and try again. But I think I’ve got it now, let’s see…

We’ve been working with the function

\displaystyle J(x)=\sum_{p^n\leq x}\frac{1}{n}.

Last time, we arrived (skipping over some rather large gaps) at an analytic formula for J(x), namely:

\begin{array}{l}\displaystyle J(x)=Li(x)-\sum_{\text{Im }\rho>0}\left(Li(x^{\rho})+Li(x^{1-\rho})\right)\\ \displaystyle\qquad\qquad +\int_x^{\infty}\frac{dt}{t(t^2-1)\ln t}-\ln 2,\end{array}

where (mod issues)

\displaystyle Li(x)=\int_0^x \frac{dt}{\ln t}.

The goal today is to use this formula for J(x) to find a formula for primes. In particular, we will find a formula for \pi(x), which is the number of primes no larger than x. These two functions are related, as follows: recall that J(x) is a step function which jumps by 1/n at every prime power p^n. Pick an x_0. How many jumps of size 1 does J(x) make before getting to x_0? Well, it jumps by 1 at the primes, so it makes \pi(x_0) jumps of size 1. How many jumps of size 1/2? These are jumps at prime squares, p^2\leq x_0. This inequality is the same as p\leq x_0^{1/2}, so we make \pi(x_0^{1/2}) jumps of size 1/2. Similarly, we’ll make \pi(x_0^{1/n}) jumps of size 1/n to calculate J(x_0). Eventually (for some large enough N) x_0^{1/N} will be less than 2, and so \pi(x_0^{1/N})=0, and we can stop looking for jumps.

Translating the above into symbols, we have just decided that


I’ve written it as an infinite sum, but eventually (like I said) those \pi values will be 0, so we’ve really got a finite sum.

Now wait, our goal was a formula for \pi(x), in terms of J(x), and I’ve got a formula the other way, J(x) in terms of \pi(x).

This is the part I got stuck on yesterday. Edwards’ book says (or perhaps doesn’t, causing my problems) that basically you do Möbius inversion to the sum above, and obtain a formula for \pi(x) in terms of J(x). He says slightly more than that, but I wasn’t able to piece it together as easily as he seemed to indicate it should work. But anyway, what’s this Möbius inversion thing? I’ll take a pretty low-level account, leaving lots of room for things to be jazzed up (which I may return to and do sometime soonish).

An integer is said to be “squarefree” if it factors into the product of distinct primes or, equivalently, it is not divisible by a square. Define the following function, called the Möbius function, on positive integers (p_i is supposed to indicate a prime):

\displaystyle \mu(n) = \begin{cases}1& n=1 \\ 0 & n\text{ not squarefree} \\ (-1)^k & n=p_1\cdots p_k\text{ squarefree}\end{cases}

So, for example, \mu is -1 at any prime, \mu(6)=\mu(10)=1, and \mu(8)=\mu(12)=0.

Möbius inversion is then the following:

If f(n)=\sum_{d|n}g(d) then g(d)=\sum_{d|n}\mu(d)f(n/d), and conversely.

Well, I tried to figure out how my expression for J(x), in terms of \pi(x^{1/n})s was such an f of g. I don’t see it, as stated. Perhaps somebody can point me in the right direction in the comments.

The observation you are supposed to make comes when you consider something like \frac{1}{2}J(x^{1/2}). Using our formula above, we can write


or, more generally, we have

\displaystyle \frac{1}{n}J(x^{1/n})=\sum_k \frac{1}{nk}\pi(x^{1/nk}).

If we subtract this equation (with n=2) from the equation for J(x) in terms of \frac{1}{n}\pi(x^{1/n})s, we can eliminate one term from the right-hand side. The idea is to iterate this process. I think a (reasonably, by my blog’s standards) clean way to write what is happening is as follows (it helps when you know the answer you are supposed to get :)):

Consider the sum

\displaystyle \sum_n \frac{\mu(n)}{n}J(x^{1/n}),

which by our most recent formula is the same as

\displaystyle \sum_{n,k}\frac{\mu(n)}{nk}\pi(x^{1/nk}).

Now, fix a positive integer m. The coefficient of \pi(x^{1/m}) in this last double sum is

\displaystyle \sum_{nk=m}\frac{\mu(n)}{m}=\frac{1}{m}\sum_{n|m}\mu(n).

Almost there… what is \sum_{n|m}\mu(n)? Turns out, it is 1 if m=1 (which is clear), and 0 otherwise. Indeed, suppose m has prime factorization p_1^{e_1}\cdots p_k^{e_k}. Then all of the divisors, n in the sum we’re after, are of the form p_1^{f_1}\cdots p_k^{f_k} where each 0\leq f_i\leq e_i. The only divisors, n, where \mu(n)\neq 0 are those where the f_i are all either 0 or 1. So the worthwhile n could be 1, or some p_i, or a p_ip_j (i\neq j), or p_ip_jp_{\ell} (i\neq j, i\neq \ell, j\neq \ell), etc. So

\begin{array}{l}\displaystyle \sum_{n|m}\mu(n)=1+\sum_i \mu(p_i)+\sum_{i,j} \mu(p_ip_j)+\cdots+\mu(p_1\cdots p_k) \\ \qquad \qquad \displaystyle =1-k+\binom{k}{2}-\binom{k}{3}+\cdots+(-1)^k=(1-1)^k=0.\end{array}

(By the way, this is (almost verbatim) the proof in Hardy and Wright’s “An Introduction to the Theory of Numbers”, and I’ll count that as one of my references for anything I say about \mu).

So, let’s gather back up here. We’ve now shown that in the sum

\displaystyle \sum_n \frac{\mu(n)}{n}J(x^{1/n})=\sum_{n,k}\frac{\mu(n)}{nk}\pi(x^{1/nk})

the coefficient of \pi(x^{1/m}) is 0 unless m=1, when the coefficient is 1. Therefore, we have obtained out formula for \pi(x):

\begin{array}{l} \pi(x)=\displaystyle \sum_n \frac{\mu(n)}{n}J(x^{1/n})\\ \qquad \displaystyle=J(x)-\frac{1}{2}J(x^{1/2})-\frac{1}{3}J(x^{1/3})-\frac{1}{5}J(x^{1/5})+\frac{1}{6}J(x^{1/6})-\cdots\end{array}

I guess the point of Riemann’s paper was that the terrible integral formula he obtained for J(x) could then be used in this expression for \pi(x), to give an exact analytic expression (that is, one only an analyst would find pretty) for \pi(x). Pretty wild, in my mind.

It seems that if you get some feeling for the terms in J(x), you can decide that the “dominant” term is the Li(x) term. And then that still ends up as the dominant term in the expression above for \pi(x). And so you get to say

\pi(x)\sim Li(x).

So you’ve got that going for you, which is nice.

Another Formula for J(x)

November 22, 2009

Yesterday, I related the logarithm of \zeta(s) to a piecewise linear function J(x). You may recall that J(x) was defined for positive reals by setting it equal to 0 at x=0, and then jumping by 1/n whenever x=p^n, for some prime p and integer n. At the end of the day, we got to

\displaystyle J(x)=\frac{1}{2\pi i}\int_{a-i\infty}^{a+i\infty}\ln \zeta(s) x^s\frac{ds}{s}

where a=\text{Re}(s)>1. Today, we’ll analyze \ln \zeta(s) some more, and re-write the formula above.

When I introduced \zeta(s), I ended with the following formula:

\displaystyle \zeta(s)=\frac{\Gamma(1-s)}{2\pi i}\int_{+\infty}^{+\infty} \frac{(-x)^s}{e^x-1} \frac{dx}{x}

where the bounds on that integral are supposed to represent a curve that “starts” at the right-hand “end” of the real line, loops around 0, and then goes back out the positive axis to infinity. I’m not good enough at complex line integrals at this point to say any more about this. But apparently if you are good at these sorts of integrals, using Cauchy’s integral formula and things you can find the so-called “functional equation”


If you then use the relation I mentioned previously:

\Gamma(s)\Gamma(1-s)\sin(\pi s)=\pi

(well, you use this for s=s/2), and one I haven’t mentioned:


and move some symbols around, you arrive at a more symmetric equation:


Notice that if you plug 1-s in the formula on the left-hand side, you obtain the right-hand side.

This function on the left-hand side apparently has poles at 0 and 1, so if we define


then we obtain an entire analytic function satisfying \xi(s)=\xi(1-s). Using the factorial relation for \Gamma, we can re-write \xi(s) as


I get the impression that if you know what you are doing, then the things above aren’t tooo hard to justify. Apparently the next part is a bit trickier. Apparently, you can write


where the product is indexed over the roots, \rho, of \xi (so \xi(\rho)=0).

If you’ve heard anything about the Riemann hypothesis, you know that the roots (the “non-trivial” ones, I didn’t talk about the trivial ones) of \zeta(s) are a big deal. Our second formula for \xi(s) shows that they are (basically) the same as the roots of \xi(s), and so they are the \rho that the sum above is indexed over. The symmetric equation from earlier has a little something to say about the zeroes, and it has been shown that all of the zeroes have real part bigger than 0 and less than 1 (this is called the “critical strip”). The hypothesis (whose truth won’t affect what we’re saying below) is that all of the zeroes have real part 1/2 (this is the “critical line”). Apparently Riemann didn’t need this hypothesis for the things in his paper that introduced it, so I don’t really have much more to say about it right now. Although, honestly, I still don’t see what all the fuss is about 🙂 The formulas we’ll get below and tomorrow work even if the roots aren’t on the critical line (unless I’m missing something important. If I am, please comment).

Anyway, back to the topic at hand. Let me try to convince you that it isn’t horribly unreasonable to think about writing a function as a product over its roots, as I’ve done above. For the sake of example, let f(x)=3x^3+3x^2-30x+24 (or pick your own favorite polynomial). The usual way this would get factored, in all the classes I’ve ever taken or taught, is (up to permutation) f(x)=3(x+4)(x-1)(x-2), showing that the roots are x=1,2,-4. However, if you factor a 4 out of the x+4 term, and -1 and -2 out of the other terms, you can also write f(x)=24(1-\frac{x}{-4})(1-x)(1-\frac{x}{2}). You still see all the zeroes when you write the polynomial this way. You can also see that the coefficient in the front is f(0). So we’ve written f(x)=f(0)\prod_{\rho}(1-x/\rho), which is the same goal as what we’re doing with \xi above. Incidentally, the idea of writing a function this way was also used by Euler to establish \zeta(2)=\sum 1/n^2=\pi^2/6 (I’ve mentioned this briefly elsewhere).

We now have two formulas for \xi(s), so we can put them together to get


Recalling that our formula for J(x), at the beginning, involved \ln\zeta(s), let’s take the log of the equation above and solve for the \zeta term:

\begin{array}{l} \ln \zeta(s)=\ln \xi(0)+\sum_{\rho}\ln(1-\frac{s}{\rho})\\ \qquad\qquad -\ln \Gamma(\frac{s}{2}+1)+\frac{s}{2}\ln \pi-\ln(s-1).\end{array}

The idea is now to plug this in the formula for J(x). Apparently if you do, though, you’ll have some issues with convergence. So actually try to do the integral in J(x), using integration by parts (hint: dv=x^s ds). The “uv” term goes to 0 and you obtain

\displaystyle J(x)=\frac{-1}{2\pi i}\cdot \frac{1}{\ln x}\int_{a-i\infty}^{a+i\infty}\frac{d}{dx}\left[\frac{\ln\zeta(s)}{s}\right]x^s\ ds,

where, as before a=\text{Re } s. Now plug in the 5 terms we’ve got above for \ln \zeta(s), and you get a formula for J(x). What happens to the terms? Can you actually work out any of the integrals?

Well, you might be able to. I’m not. Not right now anyway. But I can tell you about what others have figured out (rather like I’ve been doing all along, in fact)…

It’s clear that the \frac{s}{2}\ln \pi term drops out, because you divide by s and then take the derivative of a constant and just get 0. The term with \ln\xi(0) ends up just giving you \ln\xi(0), which is -\ln(2).

The term corresponding to the term with \Gamma in it can be rewritten as

\displaystyle \int_x^{\infty}\frac{dt}{t(t^2-1)\ln t}

(as if that were helpful).

The important terms seem to involve the function

\displaystyle Li(x)=\int_0^x \frac{dt}{\ln t}.

Of course, this integrand has a bit of an asymptote at 1, so really Li(x) (in Edwards’ book, anyway) is the “Cauchy principal value” of this integral, namely

\displaystyle Li(x)=\lim_{\epsilon\to 0^+}\int_0^{1-\epsilon}\frac{dt}{\ln t}+\int_{1+\epsilon}^x \frac{dt}{\ln t}.

This function is, rather famously, related to approximating the number of primes less than a given bound. In fact, tomorrow I plan on having more to say about this. But back to the terms in our integral for J(x).

The term corresponding to the sum over the roots ends up giving you

\displaystyle -\sum_{\text{Im }\rho>0}Li(x^{\rho})+Li(x^{1-\rho}).

But apparently the dominant term is the term corresponding to \ln (s-1). It actually gives you Li(x)

So, finally, we have written

\begin{array}{l}\displaystyle J(x)=Li(x)-\sum_{\text{Im }\rho>0}\left(Li(x^{\rho})+Li(x^{1-\rho})\right)\\ \displaystyle\qquad\qquad +\int_x^{\infty}\frac{dt}{t(t^2-1)\ln t}-\ln 2.\end{array}

Doesn’t that make you feel better? We started with the reasonably understandable

\displaystyle J(x)=\sum_{p^n\leq x}\frac{1}{n},

and created the monstrosity above. I guess this is why I’m not an analyst. To me, it seems worse to write J(x) as this terrible combination of lots of integrals. But apparently it’s useful in analysis to have such formulas. I guess we’ll see a use tomorrow…

The Log of Zeta

November 21, 2009

Last time (too long ago, sorry), I finally got around to talking about the function \zeta(s). Today, I think I’d like to talk about \log \zeta(s).

First, though, I should mention Euler’s product formula. The goal is to re-write the infinite series \zeta(s)=\sum_n n^{-s} as a product. For each prime p, we can tease out of the series above those terms where n is p to some power. That is, we can think about the sub-series \sum_{k\geq 0} (p^k)^{-s}. This is a geometric series that converges to (1-p^{-s})^{-1}. If we take two such series and multiply them together, which terms from our starting series for \zeta(s) will we recover?

Suppose p and q are two primes, and consider the product

\displaystyle\left(1+\frac{1}{p^s}+\frac{1}{p^{2s}}+\frac{1}{p^{3s}}+\cdots\right)\cdot \left(1+ \frac{1}{q^s}+\frac{1}{q^{2s}}+\frac{1}{q^{3s}}+\cdots\right).

The terms in \sum_n n^{-s} that we obtain from this product are those where n is a product of some power (possibly the 0 power) of p and some power (again, possibly 0) of q. We could then think about another prime, say r, and it’s geometric series, and multiply it by the above, and obtain all the terms in \sum_n n^{-s} where n=p^aq^br^c for some 0\leq a,b,c.

Continuing on, and recalling that every positive integer has a unique factorization as a product of primes, we obtain the magical formula (isn’t most of Euler’s work magical?):

\displaystyle \sum_n \frac{1}{n^s}=\prod_p (1-p^{-s})^{-1},

where the product is indexed by the primes. What’s nice about this, for us, at the moment, is that logarithms work well with products and powers, but not soo well for sums. Recalling the Taylor polynomial

\ln(1-x)=-1-x-x^2-x^3-\cdots=-\sum\limits_{n=0}^{\infty} x^n

we find

\displaystyle \ln \zeta(s)=\sum_p\sum_n \frac{1}{n}p^{-ns}.

That was fun. Incidentally, it’s not much of a jump from here to show that the series of prime reciprocals diverges. I mentioned it in a post a while ago.

Let’s switch gears for a little bit. I’m going to define a function J(x) on the positive reals. If you’re following Edwards’ book with me (as I jump all over the place), this will be almost exactly his J(x), differing only at the integers (for today anyway :)). Let me start by setting J(0)=0. The function J(x) will be a step function, which I’ll define to mean piecewise linear with slope 0 on each piece. So to define J(x) I only need to tell you when to jump, and by how much. The rule is: when you get to an integer n, if n=p^k for some prime p, then you jump up by 1/k. So at 2,3,5,7,11,\ldots you jump by 1, at 4,9,25,49,\ldots you jump by 1/2, at 8, 27, \ldots you jump by 1/3, etc. Here’s a little picture of J(x), with a few values highlighted:

Slightly more precisely, we can write

\displaystyle J(x)=\sum_{p^n\leq x}\frac{1}{n}.

Now, let me see if I can convince you that there is some justification in writing

\displaystyle \ln \zeta(s)=s\int_0^{\infty} J(x)x^{-s-1}\ dx.

Let’s work with the right-hand side. Since J(x)=0 near x=0, I’ll actually start my integral at x=1. I think the way to go about it is as follows:

\begin{array}{rcl} \displaystyle s\int_1^{\infty}J(x)x^{-s-1}\ dx &=& \displaystyle s\sum_{m=1}^{\infty}\int_m^{m+1}J(x)x^{-s-1}\ dx \\ {} & = & \displaystyle \sum_{m=1}^{\infty}J(m)\left(\frac{1}{m^s}-\frac{1}{(m+1)^s}\right).\end{array}

Now suppose that J(m)=J(m+1) for some m. Then the terms corresponding to m and m+1 telescope as follows:

\begin{array}{c} \displaystyle J(m)\left(\frac{1}{m^s}-\frac{1}{(m+1)^s}\right)+J(m+1)\left(\frac{1}{(m+1)^s}-\frac{1}{(m+2)^s}\right)\\ =\displaystyle J(m)\left(\frac{1}{m^s}-\frac{1}{(m+2)^s}\right)\end{array}.

If, also, J(m)=J(m+2), this we can telescope another term into this one, and so on. So, really, the important m in this sum are those where J(x) jumps, which are the prime powers. Let m_i=p_i^{n_i} be the i-th prime power (i.e., the point where J(x) makes its i-th jump), starting with m_1=2. Then we can write

\begin{array}{rcl} \displaystyle s\int_0^{\infty}J(x)x^{-s-1}\ dx & = & \displaystyle \sum_i J(m_i)\left(\frac{1}{m_i^s}-\frac{1}{m_{i+1}^s}\right) \\ & = & \displaystyle \frac{1}{2}J(2)+\sum_{i=2}^{\infty} -J(m_{i-1})\frac{1}{m_i^s}+J(m_i)\frac{1}{m_i^s} \\ & = & \displaystyle \frac{1}{2}+\sum_{i=2}^{\infty} \frac{1}{m_i^s}\left(J(m_i)-J(m_{i-1})\right) \\ & = & \displaystyle \frac{1}{2}+\sum_{i=2}^{\infty} \frac{1}{m_i^s}\cdot \frac{1}{n_i} \\ & = & \displaystyle \sum_{p}\sum_n \frac{1}{n}p^{-ns} \\ & = & \ln \zeta(s).\end{array}

What good is writing \ln \zeta(s) this way? I guess if you know something about Fourier inversion (which I don’t) then you get to say that

\displaystyle J(x)=\frac{1}{2\pi i}\int_{a-i\infty}^{a+i\infty}\log \zeta(s)x^s\frac{ds}{s},

for a=\text{Re}(s)>1. What good is that? I think I’ll have to read some more and tell you about it tomorrow, but it’ll turn out to be useful once we have yet another description of \ln \zeta(s), in terms of the 0s of \zeta(s) (finally getting to those 0s everybody cares so much about).

Riemann’s Zeta Function

November 13, 2009

I guess it is about time to get to the zeta function side of this story, if we’re ever going to use Farey sequences to show how you could prove the Riemann hypothesis. I’ve been reading a bit of Edwards’ book, with the same title as this post, and thought I’d try to summarize the first chapter over the next few posts. I’m not sure that the content of the first chapter is hugely vital for the final goal of relating to the Farey sequences, but I wanted to try to learn some of it anyway.

I should mention, before talking about any of this, that I will not claim to know any complex analysis. The last complex analysis course I took was 5 years ago, I can’t be sure how much I paid attention at the time, and I haven’t used it since. I will be jumping over gaps of various sizes for quite a while in the upcoming posts. Perhaps sometimes I’ll mention that a gap is there. Mostly, though, what I’m after in this story is the outline, and how all of the parts fit together, as a big picture. Perhaps I’ll go back and fill in some gaps, as I understand more.

For today, let’s see if I can build up to an analytic expression of \zeta(s). Our starting point is the function


which is defined for real values of s larger than 1. The goal is to find a nice expression that is defined on more of the complex plane, but agrees with this definition on the reals larger than 1.

To get to that point, we’ll use the \Gamma function:

\Gamma(s)=\displaystyle\int_0^{\infty} e^{-x}x^{s-1}\ dx

This is an analytic function defined everywhere except at the negative integers and 0. If you are only interested in real values of s, this function is a continuous extension of the factorial function, which is only defined on the positive integers. Actually, there is a shift involved, so \Gamma(n)=(n-1)! (Edwards blames this shift on Legendre, whose reasons, he states, “are obscure”. Edwards uses \Pi(s)=\Gamma(s+1) throughout, so I might make some \pm 1 errors). In particular, s\Gamma(s)=\Gamma(s+1) when both sides make sense.

Another relation that we’ll use is that

\pi=\Gamma(s)\Gamma(1-s)\sin(\pi s)

If memory serves (from reading Artin’s lovely little book on the \Gamma function), you can obtain this by determining that the derivative of the expression on the right is 0. This tells you that \sin(\pi s)\Gamma(s)\Gamma(1-s) is a constant, and so you can pick your favorite value of s to obtain the value of that constant. Anyway, the identity I’ll use is a rearrangement of the one above, namely

\Gamma(s)\sin(\pi s)=\dfrac{\pi}{\Gamma(1-s)}.

Now, in the expression

\Gamma(s)=\displaystyle\int_0^{\infty}e^{-x}x^{s-1}\ dx,

substitute x=nu for some positive n. Messing about with that substitution for a few minutes (and then re-writing u=x by abuse of notation), you can arrive at

\dfrac{\Gamma(s)}{n^s}=\displaystyle\int_0^{\infty} e^{-nx}x^{s-1}\ dx.

That n^s in the denominator is useful for us, as that’s how it shows up in \zeta(s). In particular, we can obtain

\begin{array}{rcl}\Gamma(s)\zeta(s) &=& \displaystyle \Gamma(s)\sum_{n=1}^{\infty}\dfrac{1}{n^s}=\sum_{n=1}^{\infty} \frac{\Gamma(s)}{n^s} \\ &=& \displaystyle \sum_{n=1}^{\infty}\int_0^{\infty}e^{-nx}x^{s-1}\ dx \\ &=& \displaystyle \int_0^{\infty}x^{s-1}\sum_{n=1}^{\infty} e^{-nx}\ dx\\ &=& \displaystyle \int_0^{\infty} \dfrac{x^{s-1}}{e^x-1}\ dx\end{array}

That last transition coming about by summing the geometric series. There are probably some things analysts like to check in here, moving infinite sums and improper integrals past each other… I’ll let them.

Ok, we’re nearly there. Next up, you do some complex line integral and show that

\begin{array}{rcl} \displaystyle \int_{+\infty}^{+\infty} \dfrac{(-x)^s}{e^x-1}\ \dfrac{dx}{x} &=& \displaystyle (e^{i\pi s}-e^{-i\pi s})\int_0^{\infty}\dfrac{x^{s-1}}{e^x-1}\ dx \\ &=& 2i\sin(\pi s)\Gamma(s)\zeta(s)\end{array}.

That integral is a little weird, going “from +\infty to +\infty“. Really, we take a path that “starts at +\infty“, swings around 0, then goes back out to infinity. This is almost certainly one of the complex analysis things I should go back and learn more about. Since we are integrating (-x)^s=e^{s\log(- x)} along positive real values of x, we’re working with logs along the negative real axis. Perhaps I’ll return to this integral in a future post.

Using the identity about \sin(\pi s)\Gamma(s) from above, we obtain

\zeta(s)=\dfrac{\Gamma(1-s)}{2\pi i}\displaystyle \int_{+\infty}^{+\infty}\dfrac{(-x)^s}{e^x-1}\ \dfrac{dx}{x}.

We’re now at a point where we’ve (apparently) got an expression for \zeta(s) that is analytic in the complex plane except for a simple pole at s=1.


November 11, 2009

Yesterday I talked about the function ?(x). If you recall (or rather, if you don’t… if you do recall, you can probably skip to the next paragraph), I defined this function by lining up two infinite binary trees. One tree was obtained by taking mediants, giving us the Stern-Brocot tree, while the other was obtained by taking midpoints, giving us a tree of “dyadic” fractions (denominators are powers of two). To evaluate ?(x), you find x in the Stern-Brocot tree, and then ?(x) is the value in the corresponding position of the dyadic tree.

Instead of thinking about infinite binary trees, let us think about cutting each tree off at some level, n. The pictures from yesterday’s post are, if you ignore the vertical dots, the trees when n=2. In the dyadic tree, we will obtain all fractions m/2^p where 0<m<2^p is odd and 1\leq p\leq n+1. Said another way, we will get all of the values m/2^{n+1} where 0<m<2^{n+1} (not necessarily odd anymore). If you think about it for a few moments, you’ll notice that these points are equally spaced in the interval, each being 1/2^{n+1} away from its neighbors. This is not the case for the Stern-Brocot tree.

In the truncated Stern-Brocot tree, points are scattered through the interval (0,1). We can see that they are scattered in a way that is symmetric around 1/2, but what else can we say? Think about choosing some number, b, of buckets, call them B_0,\ldots,B_{b-1}. Let B_i collect all of the points from the truncated Stern-Brocot tree whose value lies between i/b and (i+1)/b. Of course, I’m being a little loose with my endpoints, but hopefully the idea comes across. If you choose an appropriate number of buckets, depending on the depth of your truncated tree, you can arrange that no point is an endpoint of a bucket’s interval.

Anyway, when you’re done with that, count how many points are in each bucket. This defines a function from the interval [0,1] to non-negative integers, where the value at all points in B_i is the count of the number of Stern-Brocot points in the bucket B_i. Here’s an example graph, obtained by taking the depth 10 Stern-Brocot tree, and putting points into 59 buckets (unless I coded something up wrong :)):

You could next scale this graph vertically by a constant so that the total area of all of the rectangles is 1. That is, you could make the graph that of a probability distribution. Of course, the distribution you get will depend on not ony the depth of the tree, but also how many buckets you choose. Since I’m only vaguely sure what I’m talking about, let me not say much more about that.

A less graphical way to think about all of this is that, for each n, I define two random variables X_n and Y_n. The first, X_n picks a value from all of the nodes in the Stern-Brocot tree of depth n, assuming each is equally likely. The second is basically the same, using the dyadic tree instead. I’m not particularly well-versed in probability, but I guess you are allowed to take the limit of these probability distributions, let’s say X=\lim X_n, and similarly Y=\lim Y_n. This defines two probability functions on the interval [0,1] (well, I suppose technically it probably defines a probability function on the rationals there…).

How are the two distributions related? Well, basically the same way as the trees are related by ?(x). In particular, we have

P(a\leq X\leq b)=P(?(a)\leq Y\leq ?(b))

Since Y is just the uniform distribution (the dyadic points are equally distributed, recall), the probability on the right is just ?(b)-?(a). Let \mu be the graph of the probability function for X, so that the probability on the left is the integral \int_a^b \mu\ dx. We have, then

\displaystyle \int_a^b \mu(x)\ dx=?(b)-?(a)

This looks so much like the fundamental theorem of calculus that we want to say that ?'(x)=\mu(x). You could also read it as saying that ?(x) is the cumulative distribution function of \mu(x). Thinking back to my buckets from earlier, this means that if I define the value over the interval covered by B_i to be the sum of the counts of all the values in all the buckets up to B_i, then I should get a graph that looks like ?(x). You be the judge:

Of course, we probably aren’t exactly allowed to say ?'(x)=\mu(x). Consider the rational s/t, and suppose that ?'(s/t) exists, i.e., that

\displaystyle\lim_{x\to s/t}\dfrac{?(x)-?(s/t)}{x-s/t}

exists. If this two sided limit exists, then we can calculate it by coming at it from either side. Let’s think about x\to s/t^+, that is, from the right. But let’s simplify some more, still. If the one-sided limit exists, then we can calculate it by calculating the limit of the difference quotient for any sequence of values x converging to s/t. The sequence of values, on the right of s/t, that I want to consider are those neighbors, on the right, of s/t, in the Stern-Brocot sequences.

Let’s say that the first neighbor is the right child of s/t in the Stern-Brocot tree, and call it p/q. Then all of the successive neighbors are the successive left children of p/q. These points are all of the points (ms+p)/(mt+q), for positive values m. If we let m\to \infty, these weighted mediants converge to s/t.

So let’s consider the following:


Let’s start simplifying in the denominator. That fraction is (pt-sq)/(t(mt+q)), but since p/q and s/t are neighbors in some Farey sequence, this difference is 1. So the denominator is 1/(t(mt+q)).

In the numerator, we are evaluating ?(x) at weighted mediants of s/t and p/q. The corresponding ?(x) values are then weighted averages. Playing with the formulas a little bit, you can determine that


Combining all of the formulas so far, and simplifying, it turns out that the limit we are after is just

\displaystyle\lim_{m\to\infty}\left(?(p/q)-?(s/t)\right)\cdot t\cdot (mt+q)\cdot 2^{-m}

Of course, that power of two wins the race to infinity, and so this limit is 0.

We have shown then, that if ?'(p/q) exists, then it is 0. Since ?(x) isn’t a constant graph, we better have ?'(x) non-zero on the irrationals. Perhaps I’ll leave that to you.

I should, of course, mention that I learned about this reading the paper “On the Minkowski measure” by Linas Vepstas, available online here (along with a handful of other papers that look fun). Hopefully I haven’t mangled anything too badly.


November 10, 2009

Sorry about the lack of a MaBloWriMo post yesterday. I got a little behind, left resources I wanted at the office, and did some other things.

If you remember, my original goal in this process was to relate the Farey sequences to the Riemann hypothesis. I have some vague understanding that the relationship is based on the distribution of the terms in the Farey sequence, as points in the unit interval. It’s getting on toward time to talk about the Riemann hypothesis side of this picture, so I can start relating the two. But before I do, there’s one more diversion (hopefully it isn’t too far off base) I’d like to make related to Farey sequences. I’ve still got some learning to do about it, but let’s see what I can get down today, laying some foundation for the next day or two, perhaps.

Recall that the way to construct one Farey sequence, F_{n+1}, from the previous one, F_n, is to throw in all of the mediants of successive terms whose denominators aren’t too big. What if we didn’t bother with that last part, and just left in all the mediants? All the numbers we’ll write down are still reduced, and we’ll obtain all of the fractions in [0,1] this way, because they all show up in some Farey sequence, and we’re getting all the terms in all the Farey sequences, just sometimes a little earlier than Farey would.

Just to be clear, our first few sequences now look like this:

0/1\quad 1/1

0/1\quad 1/2\quad 1/1

0/1\quad 1/3\quad 1/2\quad 2/3\quad 1/1

0/1\quad 1/4\quad 1/3\quad 2/5\quad 1/2\quad 3/5\quad 2/3\quad 3/4\quad 1/1

and so on. We see that the n-th sequence in this list contains all of F_n, as well as some other numbers with bigger denominators. A nice property of these sequences, which isn’t shared with the Farey sequences, is that every other term in the n-th sequence is a new term that wasn’t in the n-1-st sequence. So when I go to make the next sequence in this process, I’m always taking the mediant of a “new” number and an “old” number. Moreover, each “new” number will be involved in two mediants… one taken “to the left” (with the “old” number on the left, making a smaller value) and one taken “to the right”.

This language let’s me write down a binary tree (every node has two branches down) as follows: Begin with 1/2 as the root (at the top, I’m throwing out 0/1 and 1/1). This is a new fraction in the second sequence above, so to make the third sequence, I added in the “left mediant” (1/3) and “right mediant” (2/3) from 1/2. To make the sequence after that, 1/3 gives a “left mediant” (1/4) and a “right mediant” (2/5), and similarly 2/3 spawns 3/5 and 3/4. So I build up a tree that looks like this:

\begin{picture}(200,80) \put(85,65){1/2} \put(45,45){1/3}\put(125,45){2/3} \put(25,25){1/4} \put(65,25){2/5} \put(105,25){3/5} \put(145,25){3/4} \put(5,5){\vdots} \put(45,5){\vdots} \put(85,5){\vdots} \put(125,5){\vdots} \put(165,5){\vdots} \put(55,55){\line(2,1){20}} \put(35,35){\line(1,1){10}} \put(125,55){\line(-2,1){20}} \put(65,35){\line(-1,1){10}} \put(115,35){\line(1,1){10}} \put(145,35){\line(-1,1){10}} \end{picture}

I’ll call this the Stern-Brocot tree, though the actual definition my be slightly different.

Another way I could build up a sequence of sequences, and generate another binary tree, is to use midpoints, instead of mediants. If my first sequence is still 0/1 and 1/1, then taking midpoints will always give me denominators that are powers of 2. My tree will now look like:

\begin{picture}(200,80) \put(85,65){1/2} \put(45,45){1/4}\put(125,45){3/4} \put(25,25){1/8} \put(65,25){3/8} \put(105,25){5/8} \put(145,25){7/8} \put(5,5){\vdots} \put(45,5){\vdots} \put(85,5){\vdots} \put(125,5){\vdots} \put(165,5){\vdots} \put(55,55){\line(2,1){20}} \put(35,35){\line(1,1){10}} \put(125,55){\line(-2,1){20}} \put(65,35){\line(-1,1){10}} \put(115,35){\line(1,1){10}} \put(145,35){\line(-1,1){10}} \end{picture}

I could now line up these trees, and determine a function that takes one rational number, finds it’s position in the first tree, and spits out the value in the same position from the second tree. For example, using our pictures above, we see that this function will take 3/5 to 5/8. This function is known as Minkowski’s question mark function, ?(x), and Wikipedia has a graph, so you should click that last link. I’ve only defined it on rationals, but you wave the “continuity” wand to extend my definition to a function on [0,1].

So, to evaluate ?(x) for some rational, you have to find that rational in the Stern-Brocot tree. How do you do that? You know that you don’t have to go further down in the tree than your denominator, but that’s still a lot of fractions to look at, potentially. Perhaps there’s a better way?

I guess one way is to play the over-under game. Start at 1/2. If that’s your fraction, you are done. Otherwise, if your fraction is bigger, move right, and if your fraction is smaller, move left. Repeat until you get to your fraction.

But there’s more math we can say if we think about this process slightly differently.

The point of the tree being a tree is that there is only one way to get from the root, 1/2, to any fraction. Working your way down, you make a sequence of moves, either left (L) or right (R). So you generate some string – LLRLRRRL, for example. Of course, that notation gets out of hand quickly, so we might also use exponents to write this example as L^2RLR^3L. To such a string we can associate a sequence of fractions by writing down whatever fraction we are at when we change direction. Let us write this sequence as p'_1/q'_1,p'_2/q'_2,\ldots, and call these “turning points”. I’ve used primes here, because the numbers I’m actually interested in are one before I make a turn. Let me call that sequence p_1/q_1,p_2/q_2,\ldots, calling these values parents (so p_k/q_k has a line directly down to p'_k/q'_k in the Stern-Brocot tree, connecting the two). So, for example, the sequence L^3R^2L^2 has turning points p'_1/q'_1=1/5, p'_2/q'_2=3/13, and finally p'_3/q'_3=7/31 and associated parents p_1/q_1=1/4, p_2/q_2=2/9, and 5/22.

What is the relationship among successive turning points and parents? Suppose you are at the turning point p'_k/q'_k, with parent p_k/q_k, looking to get to the next parent or turning point, and are going to do this with a sequence of a moves (left or right, it doesn’t actually matter). The next parent will be the (1,a-1) weighted mediant of the turning point p'_k/q'_k with its parent p_k/q_k. That is, p_{k+1}=p'_k+(a-1)p_k, and q_{k+1}=q'_k+(a-1)q_k. Similarly, the next turning point will be the (1,a) weighted mediant of p'_k/q'_k with p_k/q_k. This also means that it is the (1,1) weighted mediant of p_k/q_k with p_{k+1}/q_{k+1}.

Combining all of these relationships, messing with the algebra just slightly, you can determine that p_{k+1}=ap_k+p_{k-1} and q_{k+1}=aq_k+q_{k-1}. You may recall that these are exactly the relations between the convergents of a continued fraction (if you start putting subscripts on your a, writing it as a_{k+1}).

So, given a continued fraction [0;a_1,\ldots,a_n] for a rational x in (0,1), begin at 1/1 (which isn’t in the tree, I know) and move left a_1 times (the first move getting you to 1/2), then move a_2 right, and so on. In your final step, you only move a_n-1 (if you wonder about the “-1”, you might think that the n-th convergent is the fraction you are after, and that is the n-th parent, not the n-th turning point), and you’ll arrive at x. On your way, all of the fractions that you hit, one before a turning point, will be the convergents to x.

That only gets us half-way into evaluating ?(x), for this rational x you’ve picked. You first find its continued fraction and then, by the process above, you’ll see how to get down to x from the top of your tree. That means ?(x) will be the value in our second tree (taking averages, the tree with powers of two in the denominator) that follows the same path. The initial a_1 lefts, from 1/1, will get us to a value of 1/2^{a_1}. Then we are supposed to take a_2 rights. Taking 1 right will get us to 1/2^{a_1}+1/2^{a_1+1}. Another right will then be 1/2^{a_1}+1/2^{a_1+1}+1/2^{a_1+2}, which I’ll write 1/2^{a_1}+3/2^{a_1+2}. Yet another right gets us to 1/2^{a_1}+7/(2^{a_1+3}), and so on, until 1/2^{a_1}+(2^{a_2}-1)/2^{a_1+a_2}. After that, we move left a_3 times. The analysis of where we end up is basically the same, you just have to remember that you are subtracting 1/2^{a_1+a_2+i}s, instead of adding them.

Putting this all together, we obtain (remembering that in the last stage we only went a_n-1 moves.

?([0;a_1,\ldots,a_n])=\dfrac{1}{2^{a_1}}+\dfrac{2^{a_2}-1}{2^{a_1+a_2}}-\dfrac{2^{a_3}-1}{2^{a_1+a_2+a_3}}+\cdots\pm \dfrac{2^{a_n-1}-1}{2^{a_1+\cdots+a_n-1}}

Just as a warning, I may be off slightly in some of my formulas. I haven’t written down anything knowing that it was false, but I’m not convinced I obtain the correct formula at the end of the day (it looks slightly different from the formula at Wikipedia, for instance). If anybody sees an error, please let me know.

That’s probably plenty for today. Up next (I think), I want to take the derivative (even though I can’t) of ?(x), because Wikipedia thinks I’ll get something related to the density of the Farey sequences.

By the way, a paper I was looking at to figure out about these things was “The Minkowski Question Mark, GL(2,\mathbb{Z}) and the Modular Group (Expository)” by Linas Vepstas. The printed copy I have, from some time summer 2009, is incomplete, but it looks to be available online.

The Only LFTs You’ll Ever Need

November 8, 2009

Let us consider linear fractional transformations \frac{az+b}{cz+d} where the coefficients are all integers and ad-bc=1. Since multiplying all of the coefficients by a constant doesn’t change the transformation, we may multiply everything by -1 whenever we want (for instance, to make a chosen coefficient positive). Recall, also, that writing the transformation as a matrix, \left(\begin{smallmatrix}a&b\\ c&d\end{smallmatrix}\right) is not an entirely unreasonable thing to do, as composites of transformations are correctly determined by matrix multiplication. I’ll be talking about sequence of these matrices, so let my original matrix be \left(\begin{smallmatrix}a_0&b_0\\ c_0&d_0\end{smallmatrix}\right)

If c_0=0, then the requirement a_0d_0-b_0c_0=1 means that a_0=d_0=\pm 1, and we may choose a_0=d_0=1. Then our transformation is simply \left(\begin{smallmatrix}1 & b_0 \\ 0 & 1\end{smallmatrix}\right). Since b_0 is an integer, we could write this as \left(\begin{smallmatrix}1 & 1\\ 0 & 1\end{smallmatrix}\right)^{b_0}.

If c_0\neq 0, then there is a q_0\in \mathbb{Z} so that 0\leq a_0-q_0c_0<c_0. I’ll leave you to check the following:

L_0 = \begin{pmatrix}1 & q_0\\ 0 & 1\end{pmatrix}\begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}\begin{pmatrix}c_0 & d_0 \\ a_0-q_0c_0 & b_0-q_0d_0\end{pmatrix}

Writing this right-most matrix as L_1=\left(\begin{smallmatrix}a_1 & b_1\\ c_1 & d_1\end{smallmatrix}\right), we are looking at another matrix with integer coefficients whose determinant is 1. So we may repeat the process, finding integers q_k and LFTs L_k with L_k=\left(\begin{smallmatrix}1&q_k\\ 0&1\end{smallmatrix}\right)\left(\begin{smallmatrix}0&1\\ 1&0\end{smallmatrix}\right)L_{k+1}. The process must stop eventually, because c_0>c_1>c_2>\cdots are all non-negative integers. Eventually, one of them will be 0, and we already dealt with transformations like that in the previous paragraph.

Let T=\left(\begin{smallmatrix}1&1\\ 0&1\end{smallmatrix}\right), and note that T^b=\left(\begin{smallmatrix}1&b\\ 0&1\end{smallmatrix}\right), which we’ve used before. Also, let \overline{S}=\left(\begin{smallmatrix}0&1\\1&0\end{smallmatrix}\right) (you might note that \overline{S}^2 is the identity matrix). Then our relation above is L_k=T^{q_k}\overline{S}L_{k+1}. That is,

\begin{array}{rcl}\begin{pmatrix}a&b\\ c&d\end{pmatrix}=L_0 &=& T^{q_0}\overline{S}L_1 \\ &=&T^{q_0}\overline{S}T^{q_1}\overline{S}L_2\\ &=&\cdots=T^{q_0}\overline{S}T^{q_1}\overline{S}\cdots T^{q_k}.\end{array}

The collection of matrices that we have been working with, 2×2 matrices with integer coefficients whose determinant is 1, tend to get called the modular group and we have just shown that T and \overline{S} can be multiplied together to give you any element of the group. More officially, they “generate” the group. You might fuss and say that technically these matrices are really the special linear group SL_2(\mathbb{Z}), and note that the modular group is really the quotient of this group where you say two matrices are the same if one is -1 times the other.

One interesting thing about this process is that it is really just the process you go through to find the greatest common divisor of a_0 and c_0. The standard algorithm (Euclid’s) is to note that, if c_0\neq 0, there is an integer q_0 such that a_0=q_0c_0+c_1 where 0\leq c_1<c_0. The next step is to repeat the process, writing c_0=q_1c_1+c_2, with 0\leq c_2<c_1. Iterating, you find q_k so that c_{k-1}=q_kc_k+c_{k+1}, and eventually a c_n is 0, and the process stops.

You might be a little worried (justifiably) that this process only seems to look at a_0 and c_0, leaving b_0 and d_0 out of the picture. If you look at what we did with the matrices, our process took \left(\begin{smallmatrix}a_0&b_0\\ c_0&d_0\end{smallmatrix}\right) and wrote it as T^{q_0}\overline{S}\left(\begin{smallmatrix}a_1 & b_1\\ c_1 & d_1\end{smallmatrix}\right), and the b_1 and d_1 depended on b_0 and d_0. In the final step of the process, where the c-coefficient is 0, we ended up with a d-coefficient of 1, but the b-coefficient will be a function of the values b_0 and d_0, so those values do show up and get used, eventually. Of course, since a_0d_0-b_0c_0=1, knowing one of b_0 or d_0 is enough to determine the other (assuming you know a_0 and c_0). You should be thinking that the b_i and d_i are the extra variables that run around in Euclid’s alrgorithm when you do the “extended” version.

We’ve been talking about all of this in terms of matrices, but remember that the matrices represent linear fractional transformations. The matrix T is the transformation z+1, and the power T^b is then z+b. These are just horizontal translations. The matrix \overline{S} is the transformation 1/z. With our relations among the L_i above, we see that we could write

\begin{array}{rcl}\dfrac{a_0z+b_0}{c_0z+d_0} &=& T^{q_0}\overline{S}L_1 = q_0 + \frac{1}{L_1} \\ &=& q_0+\frac{1}{T^{q_1}\overline{S}L_2}=q_0+\cfrac{1}{q_1+\cfrac{1}{L_2}} \\ &=& q_0+\cfrac{1}{q_1+\cfrac{1}{\ddots+\cfrac{1}{q_k+z}}}\end{array}

Hurray for continued fractions! Of course, I feel like I set something up backwards. Plugging in z=0 will give me a continued fraction for b_0/d_0, but I was saying that the q_k come from thinking about the greatest common divisor of a_0 and c_0. Ah well.

[Update 20091203: The error I’m concerned about above stems from some loose dealings with the end of the iterative procedure. Letting L_i=\left(\begin{smallmatrix}a_i & b_i\\ c_i & d_i\end{smallmatrix}\right), we will get to something like L_n=T^{q_n}\overline{S}L_{n+1} where L_{n+1}=\left(\begin{smallmatrix}1 & b_{n+1} \\ 0 & 1\end{smallmatrix}\right), i.e. L_{n+1}=T^{b_{n+1}}. So we end up saying L_0=T^{q_0}\overline{S}T^{q_1}\overline{S}\cdots T^{q_n}\overline{S}T^{b_{n+1}}. And that b_{n+1} does depend on the initial b_0, presumably in a manner such that plugging in z=0 makes the fraction make sense.]

What I’ve said is that every 2×2 matrix with coefficients in \mathbb{Z} and determinant 1 can be written in terms of T and \overline{S}. However, you might want to use S=\left(\begin{smallmatrix}0&-1\\ 1&0\end{smallmatrix}\right)=-1/z instead of \overline{S}. Going through essentially the same process as before, with some minus signs sprinkled in appropriately, one can determine that T and S can be used, instead of T and \overline{S}, to generate any of the matrices we are considering. What is the advantage of S over \overline{S}? One way to state the advantage, for our story, is that applying S to points in the upper half plane leaves them in the upper half plane (and similaly for the lower half), whereas \overline{S} flips points in upper half plane to the lower half plane. We should think of this an advantage, because all of our Ford circles lie in the upper half plane. If you go back to yesterday’s discussion, you’ll note that I used S in the factorization.

So, anyway, T and S are the only LFTs you need. You can quickly check that S^2=1 and also (not quite as quickly) that (TS)^3=1. If you write U=TS, then I guess you get to say that the modular group is generated by S and U, and can establish an isomorphism between the modular group and the free product of the cyclic group of order 2 with the cyclic group of order 3. That’s something.

LFTs and Ford Circles

November 7, 2009

Given 4 complex numbers, a,b,c,d, we may consider the linear fractional transformation (LFT)


Well, 4 numbers are enough to make a matrix, \left(\begin{smallmatrix} a & b\\c & d\end{smallmatrix}\right). Is there any better reason to relate the linear fractional transformation with this matrix?

Suppose you have two matrices, \left(\begin{smallmatrix}a&b \\ c&d\end{smallmatrix}\right) and \left(\begin{smallmatrix}a'&b'\\c'&d'\end{smallmatrix}\right). Then the product is as follows:

\begin{pmatrix}a&b\\c&d\end{pmatrix}\begin{pmatrix}a'&b'\\c'&d'\end{pmatrix}=\begin{pmatrix}aa'+bc' & ab'+bd' \\ ca'+dc' & cb'+dd'\end{pmatrix}.

If you take the composite of the two linear fractional transformations, i.e.,

\dfrac{a\cdot \frac{a'z+b'}{c'z+d'}+b}{c\cdot \frac{a'z+b'}{c'z+d'}+d}

and then play around simplifying that expression for a few minutes, you obtain the LFT


which is precisely the LFT corresponding to the product matrix above. So, if nothing else, writing LFTs as matrices this way won’t lead us astray when thinking about composites.

This idea is not without its confusion, for me anyway. Generally when you think about a 2×2 matrix of complex values, you are thinking about that matrix as a linear map \mathbb{C}^2\to\mathbb{C}^2, which is not what we are doing above. Instead, I guess we are saying that the “monoid” (group without inverses) of 2×2 matrices, M_2(\mathbb{C}), acts (in the technical sense) on \mathbb{C} as linear fractional transformations. My guess is that there are even better ways to say what is going on.

I think it is also important to keep in mind that two different matrices may correspond to the same LFT. For example, \left(\begin{smallmatrix}1&2\\ 0&2\end{smallmatrix}\right) represents the same LFT as \left(\begin{smallmatrix} 1/2 & 1\\ 0 & 1\end{smallmatrix}\right). More generally, if \lambda is any complex value (nonzero), then \left(\begin{smallmatrix}a&b\\c&d\end{smallmatrix}\right) represents the same LFT as \left(\begin{smallmatrix}\lambda a&\lambda b\\ \lambda c & \lambda d\end{smallmatrix}\right). I guess one can think of M_2(\mathbb{C}) as a \mathbb{C}-vector space (isomorphic to \mathbb{C}^4), and then think of its projective space (the quotient where two “vectors” (matrices here) are the same when they differ by a scalar (complex) multiple), which I’ll denote P(M_2(\mathbb{C})). Then I think I’m saying that the action of M_2(\mathbb{C}) on \mathbb{C} actually is an action of the quotient, P(M_2(\mathbb{C})). I’m not sure if this is a useful viewpoint (or, indeed, correct).

Yesterday, when I was talking about how to picture what an LFT does to \mathbb{C}, I wrote down a factorization of the LFT as a composite. Our new notation gives us another way to write that factorization (recall \alpha=-(ad-bc)/c^2, and that we had assumed c\neq 0):

\begin{pmatrix}a&b\\c&d\end{pmatrix}=\begin{pmatrix}\alpha & a/c\\ 0&1\end{pmatrix}\begin{pmatrix}0&1\\ 1&0\end{pmatrix}\begin{pmatrix}1&d/c\\ 0&1\end{pmatrix}.

As is frequently useful, we will assume that ad-bc\neq 0 (indeed, this factorization seems to require it – I think I’m missing something somewhere, anybody see it?). Notice that ad-bc is the determinant of the matrix representing our LFT. We may then multiply all entries in our matrix (without changing the LFT, as discussed above) by 1/(ad-bc), and obtain a matrix with determinant 1. Let’s do that, making \alpha=-1/c^2.

Yesterday, when I was working on the factorization above, I only had something like \epsilon idea where I was going. I think today I’ve got about twice that, so I want to re-write the factorization. Let me write it as

\begin{pmatrix}1 & a/c\\ 0 & 1\end{pmatrix}\begin{pmatrix}1&0 \\ 0& c^2\end{pmatrix}\begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix}\begin{pmatrix}1 & d/c\\ 0 & 1\end{pmatrix}.

So what’s the connection with Ford circles? Recall that for a reduced fraction h/k, the associated Ford circle is the circle centered at (h/k,1/(2k^2)) with radius 1/(2k^2). Following Rademacher (and, presumably, others), let us say that the “fraction” 1/0 also gets a Ford “circle”, the line y=1 in the plane. This isn’t such a nasty thing to do, as it has the tangency properties I talked about when talking about Ford circles. Anyway, let us think about applying our transformation \left(\begin{smallmatrix}a&b\\c&d\end{smallmatrix}\right), as the composite given above, and see what happens to this line y=1. We’ll assume that a,b,c, and d are all integers.

The first step, \left(\begin{smallmatrix}1&d/c\\ 0 &1\end{smallmatrix}\right) is the linear translation z+d/c. Since d/c is real (since c,d are integers), this translation is a horizontal shift, which leaves y=1 unchanged.

Next up, \left(\begin{smallmatrix}0&-1\\1&0\end{smallmatrix}\right), which is -1/z. Thinking of a point on the line y=1, you can quickly determine that its polar coordinates are (\csc \theta,\theta). The transformation -1/z is the composite of: (1) inversion with respect to the unit circle (the point becomes (\sin \theta,\theta)), (2) reflection across the horizontal axis (giving (\sin \theta,-\theta)), and finally (3) multiplication by -1 (giving (-\sin \theta,-\theta), since these are polar coordinates). This final point is (\sin \theta,\pi-\theta)=(\sin \pi-\theta,\pi-\theta). As \theta varies through (0,\pi), \pi-\theta also varies through this interval, and so we get the graph of the polar curve r=\sin \theta. If you know your polar curves, you know what this looks like…

So, the first two transformations take the line y=1 to the circle with center (0,1/2) and radius 1/2. The next in our composite is multiplication by 1/c^2, which is just a scaling (since c\in \mathbb{R}). This scaling takes our circle to the circle with center (0,1/(2c^2)) and radius 1/(2c^2). Finally, the last transformation is another horizontal translation, leaving our circle centered at (a/c,1/(2c^2)). We recognize this as the Ford circle for the fraction a/c (as long as that fraction is reduced).

Wasn’t that fun? If you want to think about it some more, you might convince yourself that any point above the line y=1 will get moved to a point inside the Farey circle resulting from this process.

Anyway, enough out of me. Hopefully tomorrow I’ll have slightly more of an idea what I’m talking about. Don’t count on it though.

Linear Fractional Transformations

November 6, 2009

a.k.a. Möbius Transformations, are a type of function. I’ll talk about them as functions from the complex plane to itself. Such functions are given by a formula


where a,b,c,d are complex values. If c and d are both 0, this isn’t much of a function, so we’ll assume at least one isn’t 0.

I’d like to talk about what these functions do, how to have some hope of picturing them as transformations \mathbb{C}\to\mathbb{C}. To do this, let’s consider some easy cases first.

If c=0 (and so by assumption d\neq 0), then we may write the function \frac{a}{d}z+\frac{b}{d}, or simply a'z+b' for some complex values a',b'. This is now a linear (some might say affine) transformation of the complex plane. Think about it as the composite z\mapsto a'z\mapsto a'z+b, where the first map multiplies by a', and the second adds b'. Multiplying by a complex value a' is the same as scaling by the real value |a'| (the “norm” of a', distance from a' to the origin) and then rotating by the “argument” of a'. If you think about a' as a point (r,\theta) in polar coordinates, then the argument of a' is \theta (or so), and so multiplication by a' is multiplication by the real value r (which is just a stretching (or shrinking) of the complex plane away from (toward) the origin if r>1 (resp. 0\leq r<1)) and then rotation by the angle \theta. The second transformation in the composite, “add b'“, just shifts the entire plane (as a “rigid transformation”) in the direction of b'.

So the case when c=0 is just a linear transformation, which aren’t too difficult to picture. Another important case is 1/z, so the coefficients are a=0,b=1,c=1,d=0. To talk about what this does, let’s first talk about “inversion” with respect to a fixed circle.

Let C be a circle with radius r, in the plane, and z any point in the plane. Let O denote the center of the circle and d the distance from O to z. The inversion of z, with respect to C, is the point on the line through O and z (in the direction of z from O) whose distance from O is d'=r^2/d. This means that points near the center of C are sent far away, and vice versa. Points on C are unchanged. Technically I guess we should say that this function isn’t defined at O, but people like to say it is taken to “the point at infinity” and, conversely, that inversion takes \infty to O. These things can be made precise.

You might notice that doing inversion twice in a row gets you right back where you started. It also turns out that If C' is another circle in the plane, not passing through O, then the inversion of all of its points is another circle. If C' passes through O, then the inversion works out to be a line. Since doing inversion twice is the identity, inversion takes lines to circles through O. If you’re thinking about the comments about \infty above, this makes sense because every line “goes to \infty“, and so the inversion of a line will go through the inversion of \infty, which I said should be O.

All of this talk about inversion was to describe the function 1/z. This function is the composite of inversion with respect to the unit circle centered at the origin followed by a reflection across the horizontal axis (real line). Don’t believe me? The equation d'=r^2/d defining the relationship between distances when doing the inversion can be re-written as dd'=r^2. If we’re doing inversion with respect to a unit circle, then dd'=1. This means that when we multiply z with its inversion with respect to the unit circle, call it z', the result will be a point with norm 1 (i.e., a point on the unit circle). Next up, multiplying z by z' produces a point whose angle from the positive real axis (which I called the argument before, the \theta from polar coordinates) is the sum of the angles for z and z'. Since we did the reflection across the horizontal axis, the argument for z' is precisely the negative of the argument for z, meaning their sum (the argument of their product) is 0. So zz' is a point on the unit circle making an angle of 0 with the positive real line, i.e., zz'=1. That makes z'=1/z, as promised.

Let’s get back to the general setup, with the function


and let’s assume c\neq 0 (since we already handled the case c=0, it’s just a linear transformation). For some notational convenience, let me let \alpha=-(ad-bc)/c^2. Consider the following composite:

\begin{array}{rcl} z & \xrightarrow{w\mapsto w+\frac{d}{c}} & z+\dfrac{d}{c} \\ {} & \xrightarrow{w\mapsto \frac{1}{w}} & \dfrac{c}{cz+d} \\ {} & \xrightarrow{w\mapsto \alpha w+\frac{a}{c}} & \dfrac{\alpha c}{cz+d}+\dfrac{a}{c} \end{array}

If you check all of these steps, and then play around simplifying the final expression, then you obtain the original formula above. So we can think of any linear fractional transformation as a composite of some linear functions and an inversion, and we know how to picture all of those steps.

That’s maybe enough for today. It’s certainly enough for me for today. Before I go, I’ll leave you with a video that might be helpful, and is pretty either way.