A Favorite Theorem

November 30, 2009

The other day I was hanging out with some math folk, and one asked the group what our favorite theorems are. A tough question, clearly, since there are soo many fantastic theorems to choose from. For me, there’s a place in my heart for slightly esoteric theorems. They should be surprising and seem to come from basically nowhere (unless you’ve been looking at the subject, perhaps). They should be fairly easy to state… possibly with a few minutes introduction of some easy definitions or results. And I seem to have taken up thinking that continued fractions are pretty cool. With that in mind, I present one of my favorite theorems:

Theorem: There is a constant, K, such that for almost all \alpha in (0,1), if the continued fraction representation of \alpha is [0;a_1,a_2,\ldots], then \sqrt[n]{a_1\cdots a_n}\to K as n\to\infty.

It turns out the constant, dubbed Khinchin’s constant, is K\approx 2.685, according to Wikipedia anyway. An exact expression for K is

\displaystyle K=\prod_{r=1}^{\infty}\left(1+\frac{1}{r(r+2)}\right)^{\log_2(r)}.

I’d like to try to give some very brief background on this theorem, along with the “few minutes introduction of some easy definitions and results”.

I should mention, first, that I take Khinchin’s book “Continued Fractions” as my primary reference. Hardy and Wright also have some chapters on continued fractions. And, of course, there’s Wikipedia.

Let’s begin with the process of constructing the continued fraction for an \alpha\in (0,1). I’ll take \alpha irrational, for convenience. Since 0<\alpha<1, we have 1<1/\alpha, and we let a_1 be the largest integer less than 1/\alpha, denoted \lfloor 1/\alpha\rfloor, and we know that a_1\geq 1. That leaves a little bit over, which I’ll denote z_1=(1/\alpha)-a_1\in (0,1). We have \alpha=1/(a_1+z_1), the first step in our continued fraction. You now iterate this process, looking at 1/z_1, setting a_2=\lfloor 1/z_1\rfloor and z_2=(1/z_1)-a_2, and obtaining

\displaystyle \alpha=\cfrac{1}{a_1+\cfrac{1}{a_2+z_2}}.

Since I picked \alpha irrational, this process keeps going – you keep getting a_n (all positive integers) and z_n (all in (0,1)). And then instead of writing huge nested fractions, you trim it down to \alpha=[0;a_1,a_2,\ldots]. That initial zero is just because I started with \alpha\in (0,1), you could instead let a_0=\lfloor \alpha'\rfloor, if you started with an \alpha' outside (0,1). I’ll continue assuming my values are in (0,1), so I’ll frequently just write [a_1,a_2,\ldots], dropping the initial zero.
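
If it helps to see the iteration spelled out, here is a minimal sketch in Python (cf_digits is just a name I made up, and ordinary floating point only gives you a handful of reliable terms):

import math

def cf_digits(alpha, count):
    # a_1, ..., a_count for alpha in (0,1), following the iteration above;
    # floating point drifts after a few dozen terms, so keep count small
    digits = []
    z = alpha
    for _ in range(count):
        a = math.floor(1 / z)
        digits.append(a)
        z = 1 / z - a
    return digits

print(cf_digits((math.sqrt(5) - 1) / 2, 10))   # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

(That particular \alpha, with all of its elements equal to 1, is actually one of the measure-zero exceptions to the theorem above, since its geometric means are all 1.)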

This process gives us, for every \alpha\in(0,1), a sequence of values a_1,a_2,a_3,\ldots. Restated, we have functions a_n (and z_n as well) from (0,1) to positive integers. To start getting at Khinchin’s constant, you think about combinations of a_n^{-1}(k), for collections of ns and ks. That is, given n_1,n_2,\ldots,n_s and k_1,k_2,\ldots,k_s, all positive integers, what does the set of all \alpha=[a_1,a_2,\ldots] such that a_{n_i}=k_i for i=1,\ldots,s look like?

Let’s start with a_1, since it is the easiest. What does a_1^{-1}(k) look like? Well, a_1(\alpha)=k means that \lfloor 1/\alpha\rfloor=k. Which is to say, k\leq 1/\alpha<k+1, or 1/(k+1)<\alpha\leq 1/k. So the graph of a_1(x) is a step function, taking the value k on the interval (1/(k+1),1/k]. I picture this as a staircase descending as you move to the right. Of course, the stairs are pretty narrow on the left…

As an example, a_1^{-1}(3) is (1/4,1/3], the third stair from the right.

Now, on to a_2. What does a_2^{-1}(k) look like? Well, in finding the continued fraction for \alpha, we find first \alpha=1/(a_1+z_1), and if a_2(\alpha)=k then \lfloor 1/z_1\rfloor =k, or 1/(k+1)<z_1<1/k. So we find that for any a_1, all values between 1/(a_1+1/k) and 1/(a_1+1/(k+1)) have a_2=k. That is, in each of the “stairs” from the discussion of a_1, the graph of a_2 has another staircase, this time ascending as you move from left to right.

For example, a_2^{-1}(2) comprises, in the ascending staircases in each interval (1/(k+1),1/k], the second stair from the left. It is an infinite collection of intervals. If you also ask about those where a_1 is some constant, you pick out a single subinterval.

So we’ve got at least a little idea about how these functions a_n work. Given n_1,\ldots,n_s and k_1,\ldots,k_s as above, the set of all \alpha with a_{n_i}=k_i will be an infinite collection of subintervals of (0,1). Next you could ask, what is the measure (sum of lengths of intervals) of this set? I’ll mostly talk about the case when s=1, n_1=n, and k_1=k. I’ll denote the set of appropriate \alpha by E\binom{n}{k}, and the measure of that set by \mathcal{M}\binom{n}{k}.

Apparently Gauss had a go at questions like this. He defined m_n(x) to be the measure of the set of all \alpha such that z_n(\alpha)<x, and found that

\displaystyle \lim_{n\to\infty} m_n(x)=\frac{\ln(1+x)}{\ln 2}.

Crazy (brilliant) old Gauss (who apparently had an awesome signature). Khinchin suggests that Gauss probably arrived at this result based on a recurrence relation (functional equation, if you’d rather) among the m_i. I’ll let you read about it in Khinchin’s book. Anyway, in 1928, Kuz’min came along and found a generalization. Kuz’min’s theorem gives a more precise result than Gauss’ (at least, as stated above). Namely:

Theorem: There are positive constants, A and \lambda, such that

\displaystyle \left|m_n'(x)-\frac{1}{(1+x)\ln 2}\right|<Ae^{-\lambda\sqrt{n}}.

On integrating and taking limits, we obtain Gauss’ result. Alternatively, using the relation

\displaystyle \mathcal{M}\binom{n}{k}=m_{n-1}(\tfrac{1}{k})-m_{n-1}(\tfrac{1}{k+1})=\int_{1/(k+1)}^{1/k}m_{n-1}'(x)\ dx,

one can also obtain

\displaystyle \left|\mathcal{M}\binom{n}{k}-\log_2\left(1+\frac{1}{k(k+2)}\right)\right| < \frac{A}{k(k+1)}e^{-\lambda\sqrt{n-1}}.

Or, taking limits,

\displaystyle \lim_{n\to\infty} \mathcal{M}\binom{n}{k}=\log_2\left(1+\frac{1}{k(k+2)}\right).

This, by itself, seems pretty interesting. I find it a hard limit to interpret. After all, E\binom{n}{k} and E\binom{n+1}{k} have nothing to do with each other, so the measure that we are calculating is of a completely different set with each n. If, instead, we were taking measures of a sequence of nested subsets, or something, I feel like I could get a better handle on this theorem. But we’re not. If anybody has a nice interpretation of this limit, I’d love to hear it.

Anyway, it’s about time for Khinchin’s theorem:

Theorem: If f is a function from positive integers to positive reals such that f(r)<C\cdot r^{\frac{1}{2}-\delta} for some positive constants C and \delta, then for almost every \alpha=[a_1,a_2,\ldots] in (0,1),

\displaystyle \lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n}f(a_k)=\sum_{r=1}^{\infty}f(r)\log_2\left(1+\frac{1}{r(r+2)}\right).

Khinchin’s constant is obtained from the case when f(r)=\ln(r), simply messing with the equality using log and exponent rules.

Another interesting case is when you take f(r)=1 if r=\ell and 0 otherwise. Then the limit on the left of the equality in Khinchin’s theorem should be interpreted as the density of \ell among the values of the a_n in the continued fraction for \alpha. The expression on the right in the equality simplifies to a single term, instead of an infinite series. For example, with \ell=1 we find that approximately 42% of the terms (Khinchin calls them “elements”) in the continued fraction for just about any \alpha you pick will be 1. Khinchin summarizes the result:

“Thus, an arbitrary natural number occurs as an element in the expansion of almost all numbers with equal average frequency.”

Pretty awesome. Pick a positive integer. Pick any \alpha you want (pick several!). Find all of the elements in the continued fraction for \alpha, and the percentage that are your chosen integer (it will be a positive percentage!). You’ll get the same percentage for all of the \alpha you pick. Unless, of course, you have terrible luck (or are malicious) in picking \alpha. When you’re done with all of that, you might check, as a fun little algebra exercise, that the sum of all of the expected densities, over all choices of \ell, is 1, as it should be.
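
In case a hint for that algebra exercise is welcome: writing 1+\frac{1}{\ell(\ell+2)}=\frac{(\ell+1)^2}{\ell(\ell+2)}, the sum telescopes,

\displaystyle \sum_{\ell=1}^{L}\log_2\left(1+\frac{1}{\ell(\ell+2)}\right)=\sum_{\ell=1}^{L}\left(2\log_2(\ell+1)-\log_2\ell-\log_2(\ell+2)\right)=1+\log_2\frac{L+1}{L+2},

which goes to 1 as L\to\infty.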

There are a few other interesting results like this. For some reason I have a harder time remembering them, and so they didn’t get to be my favorite theorem. But they’re just as cool, really.

For the first, I should remind you that the n-th convergent for a continued fraction [a_1,\ldots] is the fraction [a_1,\ldots,a_n]. As this is a finite continued fraction, it represents a rational number, and it is typically denoted p_n/q_n (relatively prime). The growth of the denominators in the sequence of convergents in a continued fraction is a fairly worthwhile question. Using some elementary relationships among the a_i and q_i, you can find that a_1\cdots a_n\leq q_n\leq 2^na_1\cdots a_n. The interesting theorem I have in mind is another “there is a constant” theorem:

Theorem: For almost all \alpha=[a_1,\ldots],

\displaystyle \lim_{n\to \infty}\sqrt[n]{q_n}=e^{\pi^2/(12\ln 2)}.

This constant gets called the Lévy-Khinchin constant on Wikipedia.
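
Just for fun, here is a rough numerical look at that (Python; the 40-digit rational stand-in for \pi-3 and the 30-term cutoff are arbitrary choices of mine, and the convergence is slow, so only expect the right ballpark):

from fractions import Fraction
import math

# 40 decimal digits of pi - 3, as an exact rational stand-in for an irrational alpha;
# its first few dozen continued fraction elements agree with those of pi - 3
alpha = Fraction('0.1415926535897932384626433832795028841971')

def levy_check(alpha, count):
    # build a_1, ..., a_count exactly, tracking the denominators q_n of the convergents
    z = alpha
    q_prev, q_cur = 0, 1        # q_0 = 1, with q_{-1} = 0
    for _ in range(count):
        a = math.floor(1 / z)
        z = 1 / z - a
        q_prev, q_cur = q_cur, a * q_cur + q_prev
    return q_cur ** (1.0 / count)

print(levy_check(alpha, 30))                           # the n-th root of q_n, for n = 30
print(math.exp(math.pi ** 2 / (12 * math.log(2))))     # about 3.2758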

Another interesting result, which I stumbled across while preparing a talk about some of these things, is called Lochs’ theorem. I know even less about this than the previous theorems (it isn’t in Khinchin’s book), but apparently it basically says that each time you find a new a_n, in the process of finding the continued fraction for \alpha, the convergent has one better decimal place of accuracy than the previous convergent. So continued fractions are just as expressive as decimals. They just don’t add together quite so easily :)

So anyway, what’s your favorite theorem?

Finale!

November 27, 2009

I think I’m about ready to tie together Farey sequences and the Riemann zeta function (specifically, the Riemann hypothesis). I know there were lots of gaps along the way, and lots more to explore from here, so that all gets to be the dénouement, if/when I get to it.

Yesterday we ended with the identity

\displaystyle \frac{1}{\zeta(s)}=\sum_{n}\frac{\mu(n)}{n^s}.

Define M(x)=\sum_{n\leq x}\mu(n), which you might call the cumulative Möbius function. If you are comfortable with your measure theory, you can decide that there is a measure, which I’ll denote dM, such that \int_a^b dM=M(b)-M(a). And if you’re ok with that statement, you probably also don’t mind writing the identity for the reciprocal of zeta, above, as

\displaystyle \frac{1}{\zeta(s)}=\int_0^{\infty} x^{-s}\ dM.

If you then integrate by parts you find that

\displaystyle \frac{1}{\zeta(s)}=s\int_0^{\infty}x^{-s-1}M(x)\ dx.

Now if M(x)\leq x^a (we could say something more like M(x)=O(x^a)) for some positive real value a, then this most recent integral will converge for \text{Re }s>a. Which means that \zeta(s) would have no zeroes in the half-plane \text{Re }s>a. Since there are zeroes on the critical line, where \text{Re }s=1/2, we know that no value of a less than 1/2 will work. But if you can show that M(x)\leq x^{(1/2)+\epsilon} for every positive \epsilon, then you’ll have proven the Riemann hypothesis (and earned yourself a cool million).
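
Just to get a feel for M(x), here is a quick sieve computation (Python; nothing here proves anything, it just lets you watch |M(x)| sit well under \sqrt{x} for small x):

def mobius_up_to(N):
    # mu(n) for 1 <= n <= N, by sieving over primes
    mu = [1] * (N + 1)
    is_prime = [True] * (N + 1)
    for p in range(2, N + 1):
        if is_prime[p]:
            for m in range(p, N + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] *= -1
            for m in range(p * p, N + 1, p * p):
                mu[m] = 0
    return mu

N = 10000
mu = mobius_up_to(N)
running = 0
for n in range(1, N + 1):
    running += mu[n]
    if n in (10, 100, 1000, 10000):
        print(n, running, round(running / n ** 0.5, 3))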

Ok, so now we just have to relate the Farey sequences to this M(x) somehow.

It’s been a while, so perhaps we should recall some basics about the Farey sequences. The n-th Farey sequence, denoted F_n, consists of the reduced fractions p/q in [0,1] with q\leq n, ordered as usual. For today, let’s actually remove 0 from the sequences. Thinking about how many terms are in F_n, we quickly determine that there are \sum_{i\leq n}\phi(i) of them, where \phi(i) is the number of positive integers less than i that are relatively prime to i (with \phi(1)=1). Let me call this sum \Phi(n), so that F_n has \Phi(n) terms.

To keep my notation under control, fix an n. Then for 1\leq v\leq \Phi(n), let r_v be the v-th term in the Farey sequence F_n. The terms in F_n are not equally spaced in [0,1], so let us also consider the sequence of \Phi(n) equidistant points in [0,1]. These will be the values s_v=v/\Phi(n). We’re interested in how far the Farey sequence is from this equidistant sequence, so let’s set \delta_v=r_v-s_v.

If f:[0,1]\to \mathbb{R}, then you can show (essentially by Möbius inversion) that

\displaystyle \sum_{v=1}^{\Phi(n)}f(r_v)=\sum_{k=1}^{\infty}\sum_{j=1}^{k} f(j/k)M(n/k).

The idea is that the function D(m) that is 1 for m>1 and 0 for m<1 is also

D(m)=\sum_n M(m/n),

because you can write (fairly cleverly, I feel)

M(m)=\sum_n \mu(n)D(m/n).

By the way, I’m being a bit loose with my values of these stepwise things when they actually make their steps. Apparently the convention is to define the value at the change to be the midpoint of the values on either side. I feel like it’s a technicality I’m not hugely interested in right now.

Now we apply this identity to f(u)=e^{2\pi iu}, obtaining

\displaystyle \sum_v e^{2\pi ir_v}=\sum_k M(n/k)\sum_j f(j/k).

That inner sum on the right-hand side is the sum of k unit vectors equally distributed around the unit circle, and so is 0 except when k=1. So we obtain

M(n)=\sum_v e^{2\pi ir_v}.
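
As an aside, this identity is easy to check by brute force for small n (Python; F_5 has \Phi(5)=10 terms once 0 is dropped, and M(5)=1-1-1+0-1=-2 from the values of \mu):

from fractions import Fraction
import cmath

def farey(n):
    # the reduced fractions p/q in (0,1] with q <= n, in increasing order
    return sorted({Fraction(p, q) for q in range(1, n + 1) for p in range(1, q + 1)})

n = 5
total = sum(cmath.exp(2j * cmath.pi * float(r)) for r in farey(n))
print(total)   # real part should be M(5) = -2, imaginary part should be numerically 0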

Finally, replace r_v with s_v+\delta_v. After shuffling some symbols around, you can write

M(n)=\sum_v e^{2\pi is_v}(e^{2\pi i\delta_v}-1)+\sum_v e^{2\pi is_v}.

Taking absolute values on each side, using the triangle inequality, and the identity we already used about sums of equally spaced unit vectors, we can write

\begin{array}{l}|M(n)|\leq \displaystyle \sum_v \left|e^{2\pi i\delta_v}-1\right| =\sum_v \left| e^{\pi i\delta_v}-e^{-\pi i\delta_v}\right|=2\sum_v \left|\sin \pi\delta_v\right|\\ \displaystyle \qquad \leq 2\pi \sum_v \left|\delta_v\right|.\end{array}

Using our lemma from earlier, about how to obtain the Riemann hypothesis from M, we have, at last:

If \sum_v |\delta_v| is O(n^{(1/2)+\epsilon}) for every \epsilon>0, then the Riemann hypothesis is true.

Hurray! The result that encouraged me to wander down this path!

So, that was fun. It turns out that both of today’s results that imply the Riemann hypothesis are actually equivalent to the Riemann hypothesis, but perhaps that’s a topic for another day…

[Update 20091203: Noticed I had some absolute value bars in the wrong places in the line before the final theorem, and corrected them (hopefully)]

Möbius Inversion

November 25, 2009

Yesterday I mentioned Möbius inversion, and I decided that it was worth looking at a little bit more. First of all, it’s pretty fun. Secondly, it’ll give us a useful identity concerning \zeta(s).

Before I get too far, I should add another reference to my growing list. With Edwards’ book, and Hardy and Wright, include Apostol’s “Introduction to Analytic Number Theory.”

Anyway, yesterday I defined the Möbius function

\displaystyle \mu(n)=\begin{cases}1 & n=1 \\ 0 & n\text{ not squarefree} \\ (-1)^k & n=p_1\cdots p_k\text{ distinct primes}\end{cases}

and stated Möbius inversion as follows:

Thm: If f(n)=\sum\limits_{d|n}g(d) then g(n)=\sum\limits_{d|n}\mu(d)f(n/d).

Let’s begin today with a proof of this statement. Pf: Suppose f(n) is as stated, and consider \sum\limits_{d|n}\mu(d)f(n/d). By assumption on f, this is \sum\limits_{d|n}\mu(d)\sum\limits_{d'|(n/d)} g(d') which I’ll write as \sum\limits_{dd'|n}\mu(d)g(d'). Now in this sum, what is the coefficient of g(m)? Well, it is \sum\limits_{dm|n}\mu(d)=\sum\limits_{d|(n/m)}\mu(d). We saw yesterday that this sum is 0 unless n/m=1, i.e., unless m=n. So \sum\limits_{d|n}\mu(d)f(n/d)=g(n), as claimed.

That’s pretty fun, I think.
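
A standard example, if you want to see the theorem do something: grouping the numbers 1,\ldots,n according to their greatest common divisor with n gives n=\sum\limits_{d|n}\phi(d), so inversion says

\displaystyle \phi(n)=\sum_{d|n}\mu(d)\frac{n}{d},

which for n=12 reads 12-6-4+2=4, as it should.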

There’s another way to go about these things, and it is by way of what are called Dirichlet series (according to Wikipedia, Dirichlet had many names). Suppose f is a function defined on the positive integers (such a function seems to sometimes get called an arithmetic function). I’ll say that the Dirichlet series for f is the series F(s)=\sum_n f(n)n^{-s}. I’ve written it this way, so that I have a function defined whenever the series converges, but I’ll pretty much ignore convergence aspects in what follows.

Suppose F(s) and G(s) are the Dirichlet series associated to some f,g, respectively. What does the product of these two series look like? Well, since f(n)n^{-s}\cdot g(m)m^{-s}=f(n)g(m)(nm)^{-s}, you can determine that the product of these two Dirichlet series is another Dirichlet series, this one associated to the function c(n)=\sum\limits_{d|n}f(d)g(n/d). Indeed, just ask yourself, “what is the coefficient of n^{-s} in the product of the series F(s) and G(s)?”

By the way, I’ve called the product series c because it also gets called the convolution of f and g. The convolution of f and g is denoted f*g, and is simply the function (f*g)(n)=\sum\limits_{d|n}f(d)g(n/d).

Now you’ve got an operation, *, on arithmetic functions, and you can start asking what properties it has. It turns out that * gives the set of arithmetic functions (well, those functions that aren’t 0 on the input 1) the structure of a commutative group. The identity is the function I with I(1)=1 and I(n)=0 if n>1 (corresponding to the identity Dirichlet series, the constant function 1). It’s sort of a fun little induction problem to find a formula for the inverse of a given arithmetic function. We won’t need the formula exactly, so perhaps I’ll leave it for you.

Let z(n)=1, so that the associated Dirichlet series is \zeta(s).
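
Speaking of that inverse: here is a little sketch (Python) that computes the convolution inverse of z recursively; the numbers that come out are exactly the values of \mu defined above.

def dirichlet_inverse(f, N):
    # the convolution inverse g of f, assuming f(1) = 1 for simplicity:
    # g(1) = 1, and g(n) = -sum of f(n/d)*g(d) over the proper divisors d of n
    g = {1: 1}
    for n in range(2, N + 1):
        g[n] = -sum(f(n // d) * g[d] for d in range(1, n) if n % d == 0)
    return g

print(dirichlet_inverse(lambda n: 1, 12))
# {1: 1, 2: -1, 3: -1, 4: 0, 5: -1, 6: 1, 7: -1, 8: 0, 9: 0, 10: 1, 11: -1, 12: 0}, matching mu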

We now have another language in which to state Möbius inversion:

Thm: If f=g*z then g=\mu*f.

In this statement, let g=I (the identity of convolution) so that f=z. Then we obtain I=\mu *z. On taking Dirichlet series, we find that

\displaystyle 1=\left(\sum_n \frac{\mu(n)}{n^s}\right)\zeta(s),

or, in other words,

\displaystyle \frac{1}{\zeta(s)}=\sum_n \frac{\mu(n)}{n^s}.

How’s that for an identity?

There’s another way to get at this identity, without the language of Dirichlet series (explicitly anyway), which I wrote “holy crap” next to in the book (Hardy and Wright) when I read it. So probably I should share. Using Euler’s product (recall the indexing is over primes)

\displaystyle \zeta(s)=\prod_p (1-p^{-s})^{-1}

we have

\displaystyle \frac{1}{\zeta(s)}=\prod_p (1-p^{-s}).

Here’s the part that rocked me: Look at 1-p^{-s} and write it as 1-p^{-s}+0p^{-2s}+0p^{-3s}+\cdots, i.e. as \sum_k \mu(p^k)p^{-ks}. Because the first thing I think to do when I see a binomial is to write it as an infinite series of mostly 0s. Like I said yesterday, I guess it helps if you know where you are going. So anyway, we’ve got

\displaystyle \frac{1}{\zeta(s)}=\prod_p \sum_k \frac{\mu(p^k)}{(p^k)^s}.

Now, like we’ve been doing, let’s ask, “what is the coefficient of n^{-s} when the product is all written out?” Well, the only way to get n^{-s} is the term corresponding to the prime factorization of n. If n=p_1^{e_1}\cdots p_k^{e_k}, then the coefficient is \mu(p_1^{e_1})\cdots \mu(p_k^{e_k}). But \mu is “multiplicative,” meaning that if a and b are relatively prime then \mu(a)\mu(b)=\mu(ab). So the coefficient of n^{-s} is just \mu(p_1^{e_1}\cdots p_k^{e_k})=\mu(n). This proves the identity.

That’s the identity we’ll use in the next post or two to finally tie together Farey sequences and the zeta function, believe it or not. So I’ll basically stop here. But I do want to mention that you can do some of the things we’ve done above in slightly more general contexts, and I remember really enjoying reading about them. Namely, you can work in other posets. In what we’ve been doing above, there has, somewhere in the background, been the poset of positive integers ordered by n\leq m iff n|m. Probably a good place to start reading about the generalization is Wikipedia, as usual. If I find, sometime, the paper I read where I first saw these things, I’ll update this post.

In the mean time…

A Formula for Primes

November 24, 2009

I wanted to write this post yesterday, but I couldn’t work out some of the steps that I wanted to be able to work out, so I had to wait a day and try again. But I think I’ve got it now, let’s see…

We’ve been working with the function

\displaystyle J(x)=\sum_{p^n\leq x}\frac{1}{n}.

Last time, we arrived (skipping over some rather large gaps) at an analytic formula for J(x), namely:

\begin{array}{l}\displaystyle J(x)=Li(x)-\sum_{\text{Im }\rho>0}\left(Li(x^{\rho})+Li(x^{1-\rho})\right)\\ \displaystyle\qquad\qquad +\int_x^{\infty}\frac{dt}{t(t^2-1)\ln t}-\ln 2,\end{array}

where (mod issues)

\displaystyle Li(x)=\int_0^x \frac{dt}{\ln t}.

The goal today is to use this formula for J(x) to find a formula for primes. In particular, we will find a formula for \pi(x), which is the number of primes no larger than x. These two functions are related, as follows: recall that J(x) is a step function which jumps by 1/n at every prime power p^n. Pick an x_0. How many jumps of size 1 does J(x) make before getting to x_0? Well, it jumps by 1 at the primes, so it makes \pi(x_0) jumps of size 1. How many jumps of size 1/2? These are jumps at prime squares, p^2\leq x_0. This inequality is the same as p\leq x_0^{1/2}, so we make \pi(x_0^{1/2}) jumps of size 1/2. Similarly, we’ll make \pi(x_0^{1/n}) jumps of size 1/n to calculate J(x_0). Eventually (for some large enough N) x_0^{1/N} will be less than 2, and so \pi(x_0^{1/N})=0, and we can stop looking for jumps.

Translating the above into symbols, we have just decided that

J(x)=\pi(x)+\frac{1}{2}\pi(x^{1/2})+\frac{1}{3}\pi(x^{1/3})+\cdots+\frac{1}{n}\pi(x^{1/n})+\cdots.

I’ve written it as an infinite sum, but eventually (like I said) those \pi values will be 0, so we’ve really got a finite sum.

Now wait, our goal was a formula for \pi(x), in terms of J(x), and I’ve got a formula the other way, J(x) in terms of \pi(x).

This is the part I got stuck on yesterday. Edwards’ book says (or perhaps doesn’t, causing my problems) that basically you do Möbius inversion to the sum above, and obtain a formula for \pi(x) in terms of J(x). He says slightly more than that, but I wasn’t able to piece it together as easily as he seemed to indicate it should work. But anyway, what’s this Möbius inversion thing? I’ll take a pretty low-level account, leaving lots of room for things to be jazzed up (which I may return to and do sometime soonish).

An integer is said to be “squarefree” if it factors into the product of distinct primes or, equivalently, it is not divisible by any square bigger than 1. Define the following function, called the Möbius function, on positive integers (p_i is supposed to indicate a prime):

\displaystyle \mu(n) = \begin{cases}1& n=1 \\ 0 & n\text{ not squarefree} \\ (-1)^k & n=p_1\cdots p_k\text{ squarefree}\end{cases}

So, for example, \mu is -1 at any prime, \mu(6)=\mu(10)=1, and \mu(8)=\mu(12)=0.

Möbius inversion is then the following:

If f(n)=\sum_{d|n}g(d) then g(n)=\sum_{d|n}\mu(d)f(n/d), and conversely.

Well, I tried to figure out how my expression for J(x), in terms of \pi(x^{1/n})s was such an f of g. I don’t see it, as stated. Perhaps somebody can point me in the right direction in the comments.

The observation you are supposed to make comes when you consider something like \frac{1}{2}J(x^{1/2}). Using our formula above, we can write

\frac{1}{2}J(x^{1/2})=\frac{1}{2}\pi(x^{1/2})+\frac{1}{4}\pi(x^{1/4})+\frac{1}{6}\pi(x^{1/6})+\cdots

or, more generally, we have

\displaystyle \frac{1}{n}J(x^{1/n})=\sum_k \frac{1}{nk}\pi(x^{1/nk}).

If we subtract this equation (with n=2) from the equation for J(x) in terms of \frac{1}{n}\pi(x^{1/n})s, we can eliminate one term from the right-hand side. The idea is to iterate this process. I think a (reasonably, by my blog’s standards) clean way to write what is happening is as follows (it helps when you know the answer you are supposed to get :)):

Consider the sum

\displaystyle \sum_n \frac{\mu(n)}{n}J(x^{1/n}),

which by our most recent formula is the same as

\displaystyle \sum_{n,k}\frac{\mu(n)}{nk}\pi(x^{1/nk}).

Now, fix a positive integer m. The coefficient of \pi(x^{1/m}) in this last double sum is

\displaystyle \sum_{nk=m}\frac{\mu(n)}{m}=\frac{1}{m}\sum_{n|m}\mu(n).

Almost there… what is \sum_{n|m}\mu(n)? Turns out, it is 1 if m=1 (which is clear), and 0 otherwise. Indeed, suppose m has prime factorization p_1^{e_1}\cdots p_k^{e_k}. Then all of the divisors, n in the sum we’re after, are of the form p_1^{f_1}\cdots p_k^{f_k} where each 0\leq f_i\leq e_i. The only divisors, n, where \mu(n)\neq 0 are those where the f_i are all either 0 or 1. So the worthwhile n could be 1, or some p_i, or a p_ip_j (i\neq j), or p_ip_jp_{\ell} (i\neq j, i\neq \ell, j\neq \ell), etc. So

\begin{array}{l}\displaystyle \sum_{n|m}\mu(n)=1+\sum_i \mu(p_i)+\sum_{i,j} \mu(p_ip_j)+\cdots+\mu(p_1\cdots p_k) \\ \qquad \qquad \displaystyle =1-k+\binom{k}{2}-\binom{k}{3}+\cdots+(-1)^k=(1-1)^k=0.\end{array}

(By the way, this is (almost verbatim) the proof in Hardy and Wright’s “An Introduction to the Theory of Numbers”, and I’ll count that as one of my references for anything I say about \mu).

So, let’s gather back up here. We’ve now shown that in the sum

\displaystyle \sum_n \frac{\mu(n)}{n}J(x^{1/n})=\sum_{n,k}\frac{\mu(n)}{nk}\pi(x^{1/nk})

the coefficient of \pi(x^{1/m}) is 0 unless m=1, when the coefficient is 1. Therefore, we have obtained our formula for \pi(x):

\begin{array}{l} \pi(x)=\displaystyle \sum_n \frac{\mu(n)}{n}J(x^{1/n})\\ \qquad \displaystyle=J(x)-\frac{1}{2}J(x^{1/2})-\frac{1}{3}J(x^{1/3})-\frac{1}{5}J(x^{1/5})+\frac{1}{6}J(x^{1/6})-\cdots\end{array}
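
If you like, you can watch this happen numerically. A crude sketch (Python; for x=100 only n\leq 6 matters, since 100^{1/7}<2, so I have just typed in those values of \mu):

def is_prime(m):
    return m > 1 and all(m % d for d in range(2, int(m ** 0.5) + 1))

def J(x):
    # J(x) = sum of 1/k over prime powers p^k <= x
    total = 0.0
    for p in range(2, int(x) + 1):
        if is_prime(p):
            k = 1
            while p ** k <= x:
                total += 1.0 / k
                k += 1
    return total

x = 100.0
mu = {1: 1, 2: -1, 3: -1, 4: 0, 5: -1, 6: 1}
print(sum(mu[n] / n * J(x ** (1.0 / n)) for n in mu))   # 25.0 up to roundoff, and pi(100) = 25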

I guess the point of Riemann’s paper was that the terrible integral formula he obtained for J(x) could then be used in this expression for \pi(x), to give an exact analytic expression (that is, one only an analyst would find pretty) for \pi(x). Pretty wild, in my mind.

It seems that if you get some feeling for the terms in J(x), you can decide that the “dominant” term is the Li(x) term. And then that still ends up as the dominant term in the expression above for \pi(x). And so you get to say

\pi(x)\sim Li(x).

So you’ve got that going for you, which is nice.

Another Formula for J(x)

November 22, 2009

Yesterday, I related the logarithm of \zeta(s) to a piecewise linear function J(x). You may recall that J(x) was defined for positive reals by setting it equal to 0 at x=0, and then jumping by 1/n whenever x=p^n, for some prime p and integer n. At the end of the day, we got to

\displaystyle J(x)=\frac{1}{2\pi i}\int_{a-i\infty}^{a+i\infty}\ln \zeta(s) x^s\frac{ds}{s}

where a=\text{Re}(s)>1. Today, we’ll analyze \ln \zeta(s) some more, and re-write the formula above.

When I introduced \zeta(s), I ended with the following formula:

\displaystyle \zeta(s)=\frac{\Gamma(1-s)}{2\pi i}\int_{+\infty}^{+\infty} \frac{(-x)^s}{e^x-1} \frac{dx}{x}

where the bounds on that integral are supposed to represent a curve that “starts” at the right-hand “end” of the real line, loops around 0, and then goes back out the positive axis to infinity. I’m not good enough at complex line integrals at this point to say any more about this. But apparently if you are good at these sorts of integrals, using Cauchy’s integral formula and things you can find the so-called “functional equation”

\zeta(s)=\Gamma(1-s)(2\pi)^{s-1}2\sin(s\pi/2)\zeta(1-s).

If you then use the relation I mentioned previously:

\Gamma(s)\Gamma(1-s)\sin(\pi s)=\pi

(well, you use this for s=s/2), and one I haven’t mentioned:

\Gamma(1-s)=2^{-s}\Gamma(1-\frac{s}{2})\Gamma(\frac{1-s}{2})\pi^{-1/2}

and move some symbols around, you arrive at a more symmetric equation:

\Gamma(\frac{s}{2})\pi^{-s/2}\zeta(s)=\Gamma(\frac{1-s}{2})\pi^{-(1-s)/2}\zeta(1-s).

Notice that if you plug 1-s in the formula on the left-hand side, you obtain the right-hand side.

This function on the left-hand side apparently has poles at 0 and 1, so if we define

\xi(s)=\frac{s(s-1)}{2}\Gamma(\frac{s}{2})\pi^{-s/2}\zeta(s)

then we obtain an entire analytic function satisfying \xi(s)=\xi(1-s). Using the factorial relation for \Gamma, we can re-write \xi(s) as

\xi(s)=(s-1)\Gamma(\frac{s}{2}+1)\pi^{-s/2}\zeta(s).

I get the impression that if you know what you are doing, then the things above aren’t tooo hard to justify. Apparently the next part is a bit trickier. Apparently, you can write

\xi(s)=\xi(0)\prod_{\rho}(1-\frac{s}{\rho}),

where the product is indexed over the roots, \rho, of \xi (so \xi(\rho)=0).

If you’ve heard anything about the Riemann hypothesis, you know that the roots (the “non-trivial” ones, I didn’t talk about the trivial ones) of \zeta(s) are a big deal. Our second formula for \xi(s) shows that they are (basically) the same as the roots of \xi(s), and so they are the \rho that the product above is indexed over. The symmetric equation from earlier has a little something to say about the zeroes, and it has been shown that all of the zeroes have real part bigger than 0 and less than 1 (this is called the “critical strip”). The hypothesis (whose truth won’t affect what we’re saying below) is that all of the zeroes have real part 1/2 (this is the “critical line”). Apparently Riemann didn’t need this hypothesis for the things in his paper that introduced it, so I don’t really have much more to say about it right now. Although, honestly, I still don’t see what all the fuss is about :) The formulas we’ll get below and tomorrow work even if the roots aren’t on the critical line (unless I’m missing something important. If I am, please comment).

Anyway, back to the topic at hand. Let me try to convince you that it isn’t horribly unreasonable to think about writing a function as a product over its roots, as I’ve done above. For the sake of example, let f(x)=3x^3+3x^2-30x+24 (or pick your own favorite polynomial). The usual way this would get factored, in all the classes I’ve ever taken or taught, is (up to permutation) f(x)=3(x+4)(x-1)(x-2), showing that the roots are x=1,2,-4. However, if you factor a 4 out of the x+4 term, and -1 and -2 out of the other terms, you can also write f(x)=24(1-\frac{x}{-4})(1-x)(1-\frac{x}{2}). You still see all the zeroes when you write the polynomial this way. You can also see that the coefficient in the front is f(0). So we’ve written f(x)=f(0)\prod_{\rho}(1-x/\rho), which is the same goal as what we’re doing with \xi above. Incidentally, the idea of writing a function this way was also used by Euler to establish \zeta(2)=\sum 1/n^2=\pi^2/6 (I’ve mentioned this briefly elsewhere).

We now have two formulas for \xi(s), so we can put them together to get

\xi(0)\prod_{\rho}(1-\frac{s}{\rho})=(s-1)\Gamma(\frac{s}{2}+1)\pi^{-s/2}\zeta(s).

Recalling that our formula for J(x), at the beginning, involved \ln\zeta(s), let’s take the log of the equation above and solve for the \zeta term:

\begin{array}{l} \ln \zeta(s)=\ln \xi(0)+\sum_{\rho}\ln(1-\frac{s}{\rho})\\ \qquad\qquad -\ln \Gamma(\frac{s}{2}+1)+\frac{s}{2}\ln \pi-\ln(s-1).\end{array}

The idea is now to plug this in the formula for J(x). Apparently if you do, though, you’ll have some issues with convergence. So actually try to do the integral in J(x), using integration by parts (hint: dv=x^s ds). The “uv” term goes to 0 and you obtain

\displaystyle J(x)=\frac{-1}{2\pi i}\cdot \frac{1}{\ln x}\int_{a-i\infty}^{a+i\infty}\frac{d}{ds}\left[\frac{\ln\zeta(s)}{s}\right]x^s\ ds,

where, as before a=\text{Re } s. Now plug in the 5 terms we’ve got above for \ln \zeta(s), and you get a formula for J(x). What happens to the terms? Can you actually work out any of the integrals?

Well, you might be able to. I’m not. Not right now anyway. But I can tell you about what others have figured out (rather like I’ve been doing all along, in fact)…

It’s clear that the \frac{s}{2}\ln \pi term drops out, because you divide by s and then take the derivative of a constant and just get 0. The term with \ln\xi(0) ends up just giving you \ln\xi(0), which is -\ln(2).

The term corresponding to the term with \Gamma in it can be rewritten as

\displaystyle \int_x^{\infty}\frac{dt}{t(t^2-1)\ln t}

(as if that were helpful).

The important terms seem to involve the function

\displaystyle Li(x)=\int_0^x \frac{dt}{\ln t}.

Of course, this integrand has a bit of an asymptote at 1, so really Li(x) (in Edwards’ book, anyway) is the “Cauchy principal value” of this integral, namely

\displaystyle Li(x)=\lim_{\epsilon\to 0^+}\left(\int_0^{1-\epsilon}\frac{dt}{\ln t}+\int_{1+\epsilon}^x \frac{dt}{\ln t}\right).

This function is, rather famously, related to approximating the number of primes less than a given bound. In fact, tomorrow I plan on having more to say about this. But back to the terms in our integral for J(x).

The term corresponding to the sum over the roots ends up giving you

\displaystyle -\sum_{\text{Im }\rho>0}\left(Li(x^{\rho})+Li(x^{1-\rho})\right).

But apparently the dominant term is the term corresponding to \ln (s-1). It actually gives you Li(x).

So, finally, we have written

\begin{array}{l}\displaystyle J(x)=Li(x)-\sum_{\text{Im }\rho>0}\left(Li(x^{\rho})+Li(x^{1-\rho})\right)\\ \displaystyle\qquad\qquad +\int_x^{\infty}\frac{dt}{t(t^2-1)\ln t}-\ln 2.\end{array}

Doesn’t that make you feel better? We started with the reasonably understandable

\displaystyle J(x)=\sum_{p^n\leq x}\frac{1}{n},

and created the monstrosity above. I guess this is why I’m not an analyst. To me, it seems worse to write J(x) as this terrible combination of lots of integrals. But apparently it’s useful in analysis to have such formulas. I guess we’ll see a use tomorrow…

The Log of Zeta

November 21, 2009

Last time (too long ago, sorry), I finally got around to talking about the function \zeta(s). Today, I think I’d like to talk about \log \zeta(s).

First, though, I should mention Euler’s product formula. The goal is to re-write the infinite series \zeta(s)=\sum_n n^{-s} as a product. For each prime p, we can tease out of the series above those terms where n is p to some power. That is, we can think about the sub-series \sum_{k\geq 0} (p^k)^{-s}. This is a geometric series that converges to (1-p^{-s})^{-1}. If we take two such series and multiply them together, which terms from our starting series for \zeta(s) will we recover?

Suppose p and q are two primes, and consider the product

\displaystyle\left(1+\frac{1}{p^s}+\frac{1}{p^{2s}}+\frac{1}{p^{3s}}+\cdots\right)\cdot \left(1+ \frac{1}{q^s}+\frac{1}{q^{2s}}+\frac{1}{q^{3s}}+\cdots\right).

The terms in \sum_n n^{-s} that we obtain from this product are those where n is a product of some power (possibly the 0 power) of p and some power (again, possibly 0) of q. We could then think about another prime, say r, and its geometric series, and multiply it by the above, and obtain all the terms in \sum_n n^{-s} where n=p^aq^br^c for some 0\leq a,b,c.

Continuing on, and recalling that every positive integer has a unique factorization as a product of primes, we obtain the magical formula (isn’t most of Euler’s work magical?):

\displaystyle \sum_n \frac{1}{n^s}=\prod_p (1-p^{-s})^{-1},

where the product is indexed by the primes. What’s nice about this, for us, at the moment, is that logarithms work well with products and powers, but not soo well for sums. Recalling the Taylor series

\ln(1-x)=-x-\frac{x^2}{2}-\frac{x^3}{3}-\cdots=-\sum\limits_{n=1}^{\infty} \frac{x^n}{n}

we find

\displaystyle \ln \zeta(s)=\sum_p\sum_n \frac{1}{n}p^{-ns}.

That was fun. Incidentally, it’s not much of a jump from here to show that the series of prime reciprocals diverges. I mentioned it in a post a while ago.
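
Just to see that formula in numbers: at s=2, where \zeta(2)=\pi^2/6, truncating the double sum at primes below 1000 and n\leq 30 already matches \ln\zeta(2) to about three decimal places (Python):

import math

def is_prime(m):
    return m > 1 and all(m % d for d in range(2, int(m ** 0.5) + 1))

lhs = math.log(math.pi ** 2 / 6)    # ln zeta(2)
rhs = sum(p ** (-2.0 * n) / n for p in range(2, 1000) if is_prime(p) for n in range(1, 31))
print(lhs, rhs)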

Let’s switch gears for a little bit. I’m going to define a function J(x) on the positive reals. If you’re following Edwards’ book with me (as I jump all over the place), this will be almost exactly his J(x), differing only at the integers (for today anyway :)). Let me start by setting J(0)=0. The function J(x) will be a step function, which I’ll define to mean piecewise linear with slope 0 on each piece. So to define J(x) I only need to tell you when to jump, and by how much. The rule is: when you get to an integer n, if n=p^k for some prime p, then you jump up by 1/k. So at 2,3,5,7,11,\ldots you jump by 1, at 4,9,25,49,\ldots you jump by 1/2, at 8, 27, \ldots you jump by 1/3, etc. Here’s a little picture of J(x), with a few values highlighted:

Slightly more precisely, we can write

\displaystyle J(x)=\sum_{p^n\leq x}\frac{1}{n}.

Now, let me see if I can convince you that there is some justification in writing

\displaystyle \ln \zeta(s)=s\int_0^{\infty} J(x)x^{-s-1}\ dx.

Let’s work with the right-hand side. Since J(x)=0 near x=0, I’ll actually start my integral at x=1. I think the way to go about it is as follows:

\begin{array}{rcl} \displaystyle s\int_1^{\infty}J(x)x^{-s-1}\ dx &=& \displaystyle s\sum_{m=1}^{\infty}\int_m^{m+1}J(x)x^{-s-1}\ dx \\ {} & = & \displaystyle \sum_{m=1}^{\infty}J(m)\left(\frac{1}{m^s}-\frac{1}{(m+1)^s}\right).\end{array}

Now suppose that J(m)=J(m+1) for some m. Then the terms corresponding to m and m+1 telescope as follows:

\begin{array}{c} \displaystyle J(m)\left(\frac{1}{m^s}-\frac{1}{(m+1)^s}\right)+J(m+1)\left(\frac{1}{(m+1)^s}-\frac{1}{(m+2)^s}\right)\\ =\displaystyle J(m)\left(\frac{1}{m^s}-\frac{1}{(m+2)^s}\right)\end{array}.

If, also, J(m)=J(m+2), then we can telescope another term into this one, and so on. So, really, the important m in this sum are those where J(x) jumps, which are the prime powers. Let m_i=p_i^{n_i} be the i-th prime power (i.e., the point where J(x) makes its i-th jump), starting with m_1=2. Then we can write

\begin{array}{rcl} \displaystyle s\int_0^{\infty}J(x)x^{-s-1}\ dx & = & \displaystyle \sum_i J(m_i)\left(\frac{1}{m_i^s}-\frac{1}{m_{i+1}^s}\right) \\ & = & \displaystyle \frac{1}{2^s}J(2)+\sum_{i=2}^{\infty} -J(m_{i-1})\frac{1}{m_i^s}+J(m_i)\frac{1}{m_i^s} \\ & = & \displaystyle \frac{1}{2^s}+\sum_{i=2}^{\infty} \frac{1}{m_i^s}\left(J(m_i)-J(m_{i-1})\right) \\ & = & \displaystyle \frac{1}{2^s}+\sum_{i=2}^{\infty} \frac{1}{m_i^s}\cdot \frac{1}{n_i} \\ & = & \displaystyle \sum_{p}\sum_n \frac{1}{n}p^{-ns} \\ & = & \ln \zeta(s).\end{array}

What good is writing \ln \zeta(s) this way? I guess if you know something about Fourier inversion (which I don’t) then you get to say that

\displaystyle J(x)=\frac{1}{2\pi i}\int_{a-i\infty}^{a+i\infty}\log \zeta(s)x^s\frac{ds}{s},

for a=\text{Re}(s)>1. What good is that? I think I’ll have to read some more and tell you about it tomorrow, but it’ll turn out to be useful once we have yet another description of \ln \zeta(s), in terms of the 0s of \zeta(s) (finally getting to those 0s everybody cares so much about).

Riemann’s Zeta Function

November 13, 2009

I guess it is about time to get to the zeta function side of this story, if we’re ever going to use Farey sequences to show how you could prove the Riemann hypothesis. I’ve been reading a bit of Edwards’ book, with the same title as this post, and thought I’d try to summarize the first chapter over the next few posts. I’m not sure that the content of the first chapter is hugely vital for the final goal of relating to the Farey sequences, but I wanted to try to learn some of it anyway.

I should mention, before talking about any of this, that I will not claim to know any complex analysis. The last complex analysis course I took was 5 years ago, I can’t be sure how much I paid attention at the time, and I haven’t used it since. I will be jumping over gaps of various sizes for quite a while in the upcoming posts. Perhaps sometimes I’ll mention that a gap is there. Mostly, though, what I’m after in this story is the outline, and how all of the parts fit together, as a big picture. Perhaps I’ll go back and fill in some gaps, as I understand more.

For today, let’s see if I can build up to an analytic expression of \zeta(s). Our starting point is the function

\zeta(s)=\displaystyle\sum_{n=1}^{\infty}\frac{1}{n^s}

which is defined for real values of s larger than 1. The goal is to find a nice expression that is defined on more of the complex plane, but agrees with this definition on the reals larger than 1.

To get to that point, we’ll use the \Gamma function:

\Gamma(s)=\displaystyle\int_0^{\infty} e^{-x}x^{s-1}\ dx

This is an analytic function defined everywhere except at the negative integers and 0. If you are only interested in real values of s, this function is a continuous extension of the factorial function, which is only defined on the positive integers. Actually, there is a shift involved, so \Gamma(n)=(n-1)! (Edwards blames this shift on Legendre, whose reasons, he states, “are obscure”. Edwards uses \Pi(s)=\Gamma(s+1) throughout, so I might make some \pm 1 errors). In particular, s\Gamma(s)=\Gamma(s+1) when both sides make sense.

Another relation that we’ll use is that

\pi=\Gamma(s)\Gamma(1-s)\sin(\pi s)

If memory serves (from reading Artin’s lovely little book on the \Gamma function), you can obtain this by determining that the derivative of the expression on the right is 0. This tells you that \sin(\pi s)\Gamma(s)\Gamma(1-s) is a constant, and so you can pick your favorite value of s to obtain the value of that constant. Anyway, the identity I’ll use is a rearrangement of the one above, namely

\Gamma(s)\sin(\pi s)=\dfrac{\pi}{\Gamma(1-s)}.

Now, in the expression

\Gamma(s)=\displaystyle\int_0^{\infty}e^{-x}x^{s-1}\ dx,

substitute x=nu for some positive n. Messing about with that substitution for a few minutes (and then re-writing u=x by abuse of notation), you can arrive at

\dfrac{\Gamma(s)}{n^s}=\displaystyle\int_0^{\infty} e^{-nx}x^{s-1}\ dx.

That n^s in the denominator is useful for us, as that’s how it shows up in \zeta(s). In particular, we can obtain

\begin{array}{rcl}\Gamma(s)\zeta(s) &=& \displaystyle \Gamma(s)\sum_{n=1}^{\infty}\dfrac{1}{n^s}=\sum_{n=1}^{\infty} \frac{\Gamma(s)}{n^s} \\ &=& \displaystyle \sum_{n=1}^{\infty}\int_0^{\infty}e^{-nx}x^{s-1}\ dx \\ &=& \displaystyle \int_0^{\infty}x^{s-1}\sum_{n=1}^{\infty} e^{-nx}\ dx\\ &=& \displaystyle \int_0^{\infty} \dfrac{x^{s-1}}{e^x-1}\ dx\end{array}

That last transition coming about by summing the geometric series. There are probably some things analysts like to check in here, moving infinite sums and improper integrals past each other… I’ll let them.
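
As a quick sanity check on this formula, for whatever it’s worth: at s=2 it claims \int_0^{\infty}\frac{x}{e^x-1}\ dx=\Gamma(2)\zeta(2)=\pi^2/6, and a crude midpoint rule agrees (Python; the cutoff at 40 and the step size are arbitrary choices of mine):

import math

def integrand(x):
    # x/(e^x - 1); this tends to 1 as x -> 0, but the midpoint rule never lands on 0
    return x / math.expm1(x)

N = 200000
h = 40.0 / N
approx = h * sum(integrand((i + 0.5) * h) for i in range(N))
print(approx, math.pi ** 2 / 6)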

Ok, we’re nearly there. Next up, you do some complex line integral and show that

\begin{array}{rcl} \displaystyle \int_{+\infty}^{+\infty} \dfrac{(-x)^s}{e^x-1}\ \dfrac{dx}{x} &=& \displaystyle (e^{i\pi s}-e^{-i\pi s})\int_0^{\infty}\dfrac{x^{s-1}}{e^x-1}\ dx \\ &=& 2i\sin(\pi s)\Gamma(s)\zeta(s)\end{array}.

That integral is a little weird, going “from +\infty to +\infty“. Really, we take a path that “starts at +\infty“, swings around 0, then goes back out to infinity. This is almost certainly one of the complex analysis things I should go back and learn more about. Since we are integrating (-x)^s=e^{s\log(- x)} along positive real values of x, we’re working with logs along the negative real axis. Perhaps I’ll return to this integral in a future post.

Using the identity about \sin(\pi s)\Gamma(s) from above, we obtain

\zeta(s)=\dfrac{\Gamma(1-s)}{2\pi i}\displaystyle \int_{+\infty}^{+\infty}\dfrac{(-x)^s}{e^x-1}\ \dfrac{dx}{x}.

We’re now at a point where we’ve (apparently) got an expression for \zeta(s) that is analytic in the complex plane except for a simple pole at s=1.

?’(x)

November 11, 2009

Yesterday I talked about the function ?(x). If you recall (or rather, if you don’t… if you do recall, you can probably skip to the next paragraph), I defined this function by lining up two infinite binary trees. One tree was obtained by taking mediants, giving us the Stern-Brocot tree, while the other was obtained by taking midpoints, giving us a tree of “dyadic” fractions (denominators are powers of two). To evaluate ?(x), you find x in the Stern-Brocot tree, and then ?(x) is the value in the corresponding position of the dyadic tree.

Instead of thinking about infinite binary trees, let us think about cutting each tree off at some level, n. The pictures from yesterday’s post are, if you ignore the vertical dots, the trees when n=2. In the dyadic tree, we will obtain all fractions m/2^p where 0<m<2^p is odd and 1\leq p\leq n+1. Said another way, we will get all of the values m/2^{n+1} where 0<m<2^{n+1} (not necessarily odd anymore). If you think about it for a few moments, you’ll notice that these points are equally spaced in the interval, each being 1/2^{n+1} away from its neighbors. This is not the case for the Stern-Brocot tree.

In the truncated Stern-Brocot tree, points are scattered through the interval (0,1). We can see that they are scattered in a way that is symmetric around 1/2, but what else can we say? Think about choosing some number, b, of buckets, call them B_0,\ldots,B_{b-1}. Let B_i collect all of the points from the truncated Stern-Brocot tree whose value lies between i/b and (i+1)/b. Of course, I’m being a little loose with my endpoints, but hopefully the idea comes across. If you choose an appropriate number of buckets, depending on the depth of your truncated tree, you can arrange that no point is an endpoint of a bucket’s interval.

Anyway, when you’re done with that, count how many points are in each bucket. This defines a function from the interval [0,1] to non-negative integers, where the value at all points in B_i is the count of the number of Stern-Brocot points in the bucket B_i. Here’s an example graph, obtained by taking the depth 10 Stern-Brocot tree, and putting points into 59 buckets (unless I coded something up wrong :)):

You could next scale this graph vertically by a constant so that the total area of all of the rectangles is 1. That is, you could make the graph that of a probability distribution. Of course, the distribution you get will depend on not only the depth of the tree, but also how many buckets you choose. Since I’m only vaguely sure what I’m talking about, let me not say much more about that.
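
For what it’s worth, here is roughly the sort of bucket count I had in mind (Python; whether you call the root level 0 or level 1 shifts the number of points slightly, and 59 is just the number of buckets I happened to use):

from fractions import Fraction

def tree_values(lo, hi, depth):
    # the mediant (Stern-Brocot) tree between lo and hi, collected down to the given depth
    if depth == 0:
        return []
    med = Fraction(lo.numerator + hi.numerator, lo.denominator + hi.denominator)
    return [med] + tree_values(lo, med, depth - 1) + tree_values(med, hi, depth - 1)

points = tree_values(Fraction(0, 1), Fraction(1, 1), 10)   # 2^10 - 1 = 1023 points
buckets = [0] * 59
for x in points:
    buckets[min(58, int(59 * x))] += 1                     # which interval [i/59, (i+1)/59) x lands in
print(buckets)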

A less graphical way to think about all of this is that, for each n, I define two random variables X_n and Y_n. The first, X_n picks a value from all of the nodes in the Stern-Brocot tree of depth n, assuming each is equally likely. The second is basically the same, using the dyadic tree instead. I’m not particularly well-versed in probability, but I guess you are allowed to take the limit of these probability distributions, let’s say X=\lim X_n, and similarly Y=\lim Y_n. This defines two probability functions on the interval [0,1] (well, I suppose technically it probably defines a probability function on the rationals there…).

How are the two distributions related? Well, basically the same way as the trees are related by ?(x). In particular, we have

P(a\leq X\leq b)=P(?(a)\leq Y\leq ?(b))

Since Y is just the uniform distribution (the dyadic points are equally distributed, recall), the probability on the right is just ?(b)-?(a). Let \mu be the graph of the probability function for X, so that the probability on the left is the integral \int_a^b \mu\ dx. We have, then

\displaystyle \int_a^b \mu(x)\ dx=?(b)-?(a)

This looks so much like the fundamental theorem of calculus that we want to say that ?'(x)=\mu(x). You could also read it as saying that ?(x) is the cumulative distribution function of \mu(x). Thinking back to my buckets from earlier, this means that if I define the value over the interval covered by B_i to be the sum of the counts of all the values in all the buckets up to B_i, then I should get a graph that looks like ?(x). You be the judge:

Of course, we probably aren’t exactly allowed to say ?'(x)=\mu(x). Consider the rational s/t, and suppose that ?'(s/t) exists, i.e., that

\displaystyle\lim_{x\to s/t}\dfrac{?(x)-?(s/t)}{x-s/t}

exists. If this two sided limit exists, then we can calculate it by coming at it from either side. Let’s think about x\to s/t^+, that is, from the right. But let’s simplify some more, still. If the one-sided limit exists, then we can calculate it by calculating the limit of the difference quotient for any sequence of values x converging to s/t. The sequence of values, on the right of s/t, that I want to consider are those neighbors, on the right, of s/t, in the Stern-Brocot sequences.

Let’s say that the first neighbor is the right child of s/t in the Stern-Brocot tree, and call it p/q. Then all of the successive neighbors are the successive left children of p/q. These points are all of the points (ms+p)/(mt+q), for positive values m. If we let m\to \infty, these weighted mediants converge to s/t.

So let’s consider the following:

\displaystyle\lim_{m\to\infty}\dfrac{?\left(\frac{ms+p}{mt+q}\right)-?\left(\frac{s}{t}\right)}{\frac{ms+p}{mt+q}-\frac{s}{t}}.

Let’s start simplifying in the denominator. That fraction is (pt-sq)/(t(mt+q)), but since p/q and s/t are neighbors in some Farey sequence, this difference is 1. So the denominator is 1/(t(mt+q)).

In the numerator, we are evaluating ?(x) at weighted mediants of s/t and p/q. The corresponding ?(x) values are then weighted averages. Playing with the formulas a little bit, you can determine that

?\left(\frac{ms+p}{mt+q}\right)=\dfrac{(2^m-1)?(s/t)+?(p/q)}{2^m}.
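
Subtracting ?(s/t), written over the common denominator 2^m, the numerator of our difference quotient is therefore

\displaystyle ?\left(\frac{ms+p}{mt+q}\right)-?\left(\frac{s}{t}\right)=\frac{?(p/q)-?(s/t)}{2^m}.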

Combining all of the formulas so far, and simplifying, it turns out that the limit we are after is just

\displaystyle\lim_{m\to\infty}\left(?(p/q)-?(s/t)\right)\cdot t\cdot (mt+q)\cdot 2^{-m}

Of course, that power of two wins the race to infinity, and so this limit is 0.

We have shown then, that if ?'(p/q) exists, then it is 0. Since ?(x) isn’t a constant graph, ?'(x) had better fail to be 0 at (at least some of) the irrationals; there, the derivative either doesn’t exist or is infinite. Perhaps I’ll leave that to you.

I should, of course, mention that I learned about this reading the paper “On the Minkowski measure” by Linas Vepstas, available online here (along with a handful of other papers that look fun). Hopefully I haven’t mangled anything too badly.

?(x)

November 10, 2009

Sorry about the lack of a MaBloWriMo post yesterday. I got a little behind, left resources I wanted at the office, and did some other things.

If you remember, my original goal in this process was to relate the Farey sequences to the Riemann hypothesis. I have some vague understanding that the relationship is based on the distribution of the terms in the Farey sequence, as points in the unit interval. It’s getting on toward time to talk about the Riemann hypothesis side of this picture, so I can start relating the two. But before I do, there’s one more diversion (hopefully it isn’t too far off base) I’d like to make related to Farey sequences. I’ve still got some learning to do about it, but let’s see what I can get down today, laying some foundation for the next day or two, perhaps.

Recall that the way to construct one Farey sequence, F_{n+1}, from the previous one, F_n, is to throw in all of the mediants of successive terms whose denominators aren’t too big. What if we didn’t bother with that last part, and just left in all the mediants? All the numbers we’ll write down are still reduced, and we’ll obtain all of the fractions in [0,1] this way, because they all show up in some Farey sequence, and we’re getting all the terms in all the Farey sequences, just sometimes a little earlier than Farey would.

Just to be clear, our first few sequences now look like this:

0/1\quad 1/1

0/1\quad 1/2\quad 1/1

0/1\quad 1/3\quad 1/2\quad 2/3\quad 1/1

0/1\quad 1/4\quad 1/3\quad 2/5\quad 1/2\quad 3/5\quad 2/3\quad 3/4\quad 1/1

and so on. We see that the n-th sequence in this list contains all of F_n, as well as some other numbers with bigger denominators. A nice property of these sequences, which isn’t shared with the Farey sequences, is that every other term in the n-th sequence is a new term that wasn’t in the n-1-st sequence. So when I go to make the next sequence in this process, I’m always taking the mediant of a “new” number and an “old” number. Moreover, each “new” number will be involved in two mediants… one taken “to the left” (with the “old” number on the left, making a smaller value) and one taken “to the right”.

This language lets me write down a binary tree (every node has two branches down) as follows: Begin with 1/2 as the root (at the top, I’m throwing out 0/1 and 1/1). This is a new fraction in the second sequence above, so to make the third sequence, I added in the “left mediant” (1/3) and “right mediant” (2/3) from 1/2. To make the sequence after that, 1/3 gives a “left mediant” (1/4) and a “right mediant” (2/5), and similarly 2/3 spawns 3/5 and 3/4. So I build up a tree that looks like this:

\begin{picture}(200,80) \put(85,65){1/2} \put(45,45){1/3}\put(125,45){2/3} \put(25,25){1/4} \put(65,25){2/5} \put(105,25){3/5} \put(145,25){3/4} \put(5,5){\vdots} \put(45,5){\vdots} \put(85,5){\vdots} \put(125,5){\vdots} \put(165,5){\vdots} \put(55,55){\line(2,1){20}} \put(35,35){\line(1,1){10}} \put(125,55){\line(-2,1){20}} \put(65,35){\line(-1,1){10}} \put(115,35){\line(1,1){10}} \put(145,35){\line(-1,1){10}} \end{picture}

I’ll call this the Stern-Brocot tree, though the actual definition may be slightly different.

Another way I could build up a sequence of sequences, and generate another binary tree, is to use midpoints, instead of mediants. If my first sequence is still 0/1 and 1/1, then taking midpoints will always give me denominators that are powers of 2. My tree will now look like:

\begin{picture}(200,80) \put(85,65){1/2} \put(45,45){1/4}\put(125,45){3/4} \put(25,25){1/8} \put(65,25){3/8} \put(105,25){5/8} \put(145,25){7/8} \put(5,5){\vdots} \put(45,5){\vdots} \put(85,5){\vdots} \put(125,5){\vdots} \put(165,5){\vdots} \put(55,55){\line(2,1){20}} \put(35,35){\line(1,1){10}} \put(125,55){\line(-2,1){20}} \put(65,35){\line(-1,1){10}} \put(115,35){\line(1,1){10}} \put(145,35){\line(-1,1){10}} \end{picture}

I could now line up these trees, and determine a function that takes one rational number, finds its position in the first tree, and spits out the value in the same position from the second tree. For example, using our pictures above, we see that this function will take 3/5 to 5/8. This function is known as Minkowski’s question mark function, ?(x), and Wikipedia has a graph, so you should click that last link. I’ve only defined it on rationals, but you wave the “continuity” wand to extend my definition to a function on [0,1].

So, to evaluate ?(x) for some rational, you have to find that rational in the Stern-Brocot tree. How do you do that? You know that you don’t have to go further down in the tree than your denominator, but that’s still a lot of fractions to look at, potentially. Perhaps there’s a better way?

I guess one way is to play the over-under game. Start at 1/2. If that’s your fraction, you are done. Otherwise, if your fraction is bigger, move right, and if your fraction is smaller, move left. Repeat until you get to your fraction.

But there’s more math we can say if we think about this process slightly differently.

The point of the tree being a tree is that there is only one way to get from the root, 1/2, to any fraction. Working your way down, you make a sequence of moves, either left (L) or right (R). So you generate some string – LLRLRRRL, for example. Of course, that notation gets out of hand quickly, so we might also use exponents to write this example as L^2RLR^3L. To such a string we can associate a sequence of fractions by writing down whatever fraction we are at when we change direction. Let us write this sequence as p'_1/q'_1,p'_2/q'_2,\ldots, and call these “turning points”. I’ve used primes here, because the numbers I’m actually interested in are one before I make a turn. Let me call that sequence p_1/q_1,p_2/q_2,\ldots, calling these values parents (so p_k/q_k has a line directly down to p'_k/q'_k in the Stern-Brocot tree, connecting the two). So, for example, the sequence L^3R^2L^2 has turning points p'_1/q'_1=1/5, p'_2/q'_2=3/13, and finally p'_3/q'_3=7/31 and associated parents p_1/q_1=1/4, p_2/q_2=2/9, and 5/22.

What is the relationship among successive turning points and parents? Suppose you are at the turning point p'_k/q'_k, with parent p_k/q_k, looking to get to the next parent or turning point, and are going to do this with a sequence of a moves (left or right, it doesn’t actually matter). The next parent will be the (1,a-1) weighted mediant of the turning point p'_k/q'_k with its parent p_k/q_k. That is, p_{k+1}=p'_k+(a-1)p_k, and q_{k+1}=q'_k+(a-1)q_k. Similarly, the next turning point will be the (1,a) weighted mediant of p'_k/q'_k with p_k/q_k. This also means that it is the (1,1) weighted mediant of p_k/q_k with p_{k+1}/q_{k+1}.

Combining all of these relationships, messing with the algebra just slightly, you can determine that p_{k+1}=ap_k+p_{k-1} and q_{k+1}=aq_k+q_{k-1}. You may recall that these are exactly the relations between the convergents of a continued fraction (if you start putting subscripts on your a, writing it as a_{k+1}).

So, given a continued fraction [0;a_1,\ldots,a_n] for a rational x in (0,1), begin at 1/1 (which isn’t in the tree, I know) and move left a_1 times (the first move getting you to 1/2), then move a_2 right, and so on. In your final step, you only move a_n-1 (if you wonder about the “-1″, you might think that the n-th convergent is the fraction you are after, and that is the n-th parent, not the n-th turning point), and you’ll arrive at x. On your way, all of the fractions that you hit, one before a turning point, will be the convergents to x.

That only gets us halfway to evaluating ?(x) for this rational x you’ve picked. You first find its continued fraction and then, by the process above, you’ll see how to get down to x from the top of your tree. That means ?(x) will be the value in our second tree (taking averages, the tree with powers of two in the denominator) that follows the same path. The initial a_1 lefts, from 1/1, will get us to a value of 1/2^{a_1}. Then we are supposed to take a_2 rights. Taking 1 right will get us to 1/2^{a_1}+1/2^{a_1+1}. Another right will then be 1/2^{a_1}+1/2^{a_1+1}+1/2^{a_1+2}, which I’ll write 1/2^{a_1}+3/2^{a_1+2}. Yet another right gets us to 1/2^{a_1}+7/2^{a_1+3}, and so on, until 1/2^{a_1}+(2^{a_2}-1)/2^{a_1+a_2}. After that, we move left a_3 times. The analysis of where we end up is basically the same; you just have to remember that you are subtracting 1/2^{a_1+a_2+i}s instead of adding them.

Putting this all together, we obtain (remembering that in the last stage we only make a_n-1 moves):

?([0;a_1,\ldots,a_n])=\dfrac{1}{2^{a_1}}+\dfrac{2^{a_2}-1}{2^{a_1+a_2}}-\dfrac{2^{a_3}-1}{2^{a_1+a_2+a_3}}+\cdots\pm \dfrac{2^{a_n-1}-1}{2^{a_1+\cdots+a_n-1}}

Just as a warning, I may be off slightly in some of my formulas. I haven’t written down anything knowing that it was false, but I’m not convinced I obtain the correct formula at the end of the day (it looks slightly different from the formula at Wikipedia, for instance). If anybody sees an error, please let me know.
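If you want to poke at it numerically, here is a sketch that computes ?(x) two independent ways for a rational x: once by mirroring the L/R path in the “averages” tree, and once by the alternating series 2\sum_k(-1)^{k+1}/2^{a_1+\cdots+a_k}, which I believe is the shape the standard formula takes. The formula above can then be compared against either. All of the names here are mine.

```python
from fractions import Fraction

def cf(x):
    """Continued fraction [a_1, ..., a_n] of a rational x in (0,1), via Euclid."""
    a, p, q = [], x.numerator, x.denominator
    while p:
        a.append(q // p)
        p, q = q % p, p
    return a

def qm_series(a):
    """2 * sum_k (-1)^(k+1) / 2^(a_1 + ... + a_k)."""
    total, s = Fraction(0), 0
    for k, ak in enumerate(a, start=1):
        s += ak
        total += Fraction((-1) ** (k + 1), 2 ** s)
    return 2 * total

def qm_tree(a):
    """Mirror the path L^{a_1} R^{a_2} ... (last block one move short) in the averages tree."""
    path = "".join(("L" if k % 2 == 0 else "R") * (ak - 1 if k == len(a) - 1 else ak)
                   for k, ak in enumerate(a))
    lo, hi, cur = Fraction(0), Fraction(1), Fraction(1)   # start at the 1/1 above the root
    for move in path:
        if move == "L":
            hi = cur
        else:
            lo = cur
        cur = (lo + hi) / 2                               # children are averages in this tree
    return cur

for x in [Fraction(3, 5), Fraction(5, 8), Fraction(7, 31)]:
    a = cf(x)
    print(x, a, qm_series(a), qm_tree(a))   # the last two columns should agree
```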

That’s probably plenty for today. Up next (I think), I want to take the derivative (even though I can’t) of ?(x), because Wikipedia thinks I’ll get something related to the density of the Farey sequences.

By the way, a paper I was looking at to figure these things out was “The Minkowski Question Mark, GL(2,\mathbb{Z}) and the Modular Group (Expository)” by Linas Vepstas. The printed copy I have, from sometime in summer 2009, is incomplete, but it looks to be available online.

The Only LFTs You’ll Ever Need

November 8, 2009

Let us consider linear fractional transformations \frac{az+b}{cz+d} where the coefficients are all integers and ad-bc=1. Since multiplying all of the coefficients by a constant doesn’t change the transformation, we may multiply everything by -1 whenever we want (for instance, to make a chosen coefficient positive). Recall, also, that writing the transformation as a matrix, \left(\begin{smallmatrix}a&b\\ c&d\end{smallmatrix}\right), is not an entirely unreasonable thing to do, as composites of transformations are correctly determined by matrix multiplication. I’ll be talking about a sequence of these matrices, so let my original matrix be \left(\begin{smallmatrix}a_0&b_0\\ c_0&d_0\end{smallmatrix}\right).

If c_0=0, then the requirement a_0d_0-b_0c_0=1 means that a_0=d_0=\pm 1, and we may choose a_0=d_0=1. Then our transformation is simply \left(\begin{smallmatrix}1 & b_0 \\ 0 & 1\end{smallmatrix}\right). Since b_0 is an integer, we could write this as \left(\begin{smallmatrix}1 & 1\\ 0 & 1\end{smallmatrix}\right)^{b_0}.

If c_0\neq 0, then (taking c_0>0, multiplying everything by -1 first if necessary) there is a q_0\in \mathbb{Z} so that 0\leq a_0-q_0c_0<c_0. I’ll leave you to check the following:

L_0 = \begin{pmatrix}1 & q_0\\ 0 & 1\end{pmatrix}\begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}\begin{pmatrix}c_0 & d_0 \\ a_0-q_0c_0 & b_0-q_0d_0\end{pmatrix}

Writing this right-most matrix as L_1=\left(\begin{smallmatrix}a_1 & b_1\\ c_1 & d_1\end{smallmatrix}\right), we are looking at another matrix with integer coefficients whose determinant is \pm 1 (the middle matrix in the factorization above has determinant -1, so the sign of the determinant flips at each step, but that won’t bother us). So we may repeat the process, finding integers q_k and LFTs L_k with L_k=\left(\begin{smallmatrix}1&q_k\\ 0&1\end{smallmatrix}\right)\left(\begin{smallmatrix}0&1\\ 1&0\end{smallmatrix}\right)L_{k+1}. The process must stop eventually, because c_0>c_1>c_2>\cdots are all non-negative integers. Eventually, one of them will be 0, and we have already dealt with transformations like that in the previous paragraph.

Let T=\left(\begin{smallmatrix}1&1\\ 0&1\end{smallmatrix}\right), and note that T^b=\left(\begin{smallmatrix}1&b\\ 0&1\end{smallmatrix}\right), which we’ve used before. Also, let \overline{S}=\left(\begin{smallmatrix}0&1\\1&0\end{smallmatrix}\right) (you might note that \overline{S}^2 is the identity matrix). Then our relation above is L_k=T^{q_k}\overline{S}L_{k+1}. That is,

\begin{array}{rcl}\begin{pmatrix}a&b\\ c&d\end{pmatrix}=L_0 &=& T^{q_0}\overline{S}L_1 \\ &=&T^{q_0}\overline{S}T^{q_1}\overline{S}L_2\\ &=&\cdots=T^{q_0}\overline{S}T^{q_1}\overline{S}\cdots T^{q_k}.\end{array}
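Here is a rough Python sketch of that reduction; the example matrix is just something I made up with determinant 1. It peels off T^{q}\overline{S} factors until the lower-left entry is 0, then multiplies everything back together as a check.

```python
def mat_mul(A, B):
    return [[A[0][0]*B[0][0] + A[0][1]*B[1][0], A[0][0]*B[0][1] + A[0][1]*B[1][1]],
            [A[1][0]*B[0][0] + A[1][1]*B[1][0], A[1][0]*B[0][1] + A[1][1]*B[1][1]]]

SBAR = [[0, 1], [1, 0]]

def T_power(q):
    return [[1, q], [0, 1]]

def factor(M):
    """Return (qs, tail) with M = T^{q_0} Sbar T^{q_1} Sbar ... tail, tail having lower-left 0."""
    (a, b), (c, d) = M
    if c < 0:                        # the post's convention: multiply by -1 to arrange c >= 0
        a, b, c, d = -a, -b, -c, -d
    qs = []
    while c != 0:
        q = a // c                   # chosen so that 0 <= a - q*c < c
        qs.append(q)
        a, b, c, d = c, d, a - q*c, b - q*d
    return qs, [[a, b], [c, d]]

M = [[7, 2], [17, 5]]                # determinant 7*5 - 2*17 = 1
qs, tail = factor(M)
print(qs, tail)                      # [0, 2, 2, 3] and, for this example, the identity matrix

check = [[1, 0], [0, 1]]
for q in qs:
    check = mat_mul(mat_mul(check, T_power(q)), SBAR)
print(mat_mul(check, tail))          # recovers [[7, 2], [17, 5]]
```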

The collection of matrices that we have been working with, 2×2 matrices with integer coefficients whose determinant is 1, tends to get called the modular group, and we have just shown that T and \overline{S} can be multiplied together to give you any element of the group. More officially, they “generate” the group (strictly speaking, \overline{S} itself has determinant -1, so T and \overline{S} really generate the slightly larger group of integer matrices with determinant \pm 1, which contains everything we care about here). You might fuss and say that technically these matrices are really the special linear group SL_2(\mathbb{Z}), and note that the modular group is really the quotient of this group where you say two matrices are the same if one is -1 times the other.

One interesting thing about this process is that it is really just the process you go through to find the greatest common divisor of a_0 and c_0. The standard algorithm (Euclid’s) is to note that, if c_0\neq 0, there is an integer q_0 such that a_0=q_0c_0+c_1 where 0\leq c_1<c_0. The next step is to repeat the process, writing c_0=q_1c_1+c_2, with 0\leq c_2<c_1. Iterating, you find q_k so that c_{k-1}=q_kc_k+c_{k+1}, and eventually a c_n is 0, and the process stops.
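For instance, running Euclid’s algorithm on the 7 and 17 from the example matrix in the sketch above spits out the same quotients that the matrix reduction did.

```python
a, c = 7, 17       # a_0 and c_0 from the example matrix above
qs = []
while c != 0:
    qs.append(a // c)
    a, c = c, a % c
print(qs)          # [0, 2, 2, 3], the same q_k as before
```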

You might be a little worried (justifiably) that this process only seems to look at a_0 and c_0, leaving b_0 and d_0 out of the picture. If you look at what we did with the matrices, our process took \left(\begin{smallmatrix}a_0&b_0\\ c_0&d_0\end{smallmatrix}\right) and wrote it as T^{q_0}\overline{S}\left(\begin{smallmatrix}a_1 & b_1\\ c_1 & d_1\end{smallmatrix}\right), and the b_1 and d_1 depended on b_0 and d_0. In the final step of the process, where the c-coefficient is 0, we ended up with a d-coefficient of 1, but the b-coefficient will be a function of the values b_0 and d_0, so those values do show up and get used, eventually. Of course, since a_0d_0-b_0c_0=1, knowing one of b_0 or d_0 is enough to determine the other (assuming you know a_0 and c_0). You should be thinking that the b_i and d_i are the extra variables that run around in Euclid’s algorithm when you do the “extended” version.

We’ve been talking about all of this in terms of matrices, but remember that the matrices represent linear fractional transformations. The matrix T is the transformation z+1, and the power T^b is then z+b. These are just horizontal translations. The matrix \overline{S} is the transformation 1/z. With our relations among the L_i above, we see that we could write

\begin{array}{rcl}\dfrac{a_0z+b_0}{c_0z+d_0} &=& T^{q_0}\overline{S}L_1 = q_0 + \frac{1}{L_1} \\ &=& q_0+\frac{1}{T^{q_1}\overline{S}L_2}=q_0+\cfrac{1}{q_1+\cfrac{1}{L_2}} \\ &=& q_0+\cfrac{1}{q_1+\cfrac{1}{\ddots+\cfrac{1}{q_k+z}}}\end{array}

Hurray for continued fractions! Of course, I feel like I set something up backwards. Plugging in z=0 will give me a continued fraction for b_0/d_0, but I was saying that the q_k come from thinking about the greatest common divisor of a_0 and c_0. Ah well.

[Update 20091203: The error I'm concerned about above stems from some loose dealings with the end of the iterative procedure. Letting L_i=\left(\begin{smallmatrix}a_i & b_i\\ c_i & d_i\end{smallmatrix}\right), we will get to something like L_n=T^{q_n}\overline{S}L_{n+1} where L_{n+1}=\left(\begin{smallmatrix}1 & b_{n+1} \\ 0 & 1\end{smallmatrix}\right), i.e. L_{n+1}=T^{b_{n+1}}. So we end up saying L_0=T^{q_0}\overline{S}T^{q_1}\overline{S}\cdots T^{q_n}\overline{S}T^{b_{n+1}}. And that b_{n+1} does depend on the initial b_0, presumably in a manner such that plugging in z=0 makes the fraction make sense.]
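To see the continued fraction actually come out, here is a purely numerical check using the example factorization from the sketch above, where the tail happened to be the identity (so b_{n+1}=0 in the update’s notation): the transformation (7z+2)/(17z+5) should match the nested fraction q_0+1/(q_1+1/(q_2+1/(q_3+1/z))), the innermost 1/z coming from the final \overline{S}.

```python
from fractions import Fraction

def nested(qs, z):
    v = 1 / Fraction(z)              # the final Sbar applied to z (the tail is the identity here)
    for i, q in enumerate(reversed(qs)):
        v = q + v
        if i < len(qs) - 1:
            v = 1 / v                # each earlier Sbar is another reciprocal
    return v

for z in [Fraction(1), Fraction(2), Fraction(5, 3)]:
    print(nested([0, 2, 2, 3], z), (7 * z + 2) / (17 * z + 5))   # the two columns agree
```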

What I’ve said is that every 2×2 matrix with coefficients in \mathbb{Z} and determinant 1 can be written in terms of T and \overline{S}. However, you might want to use S=\left(\begin{smallmatrix}0&-1\\ 1&0\end{smallmatrix}\right)=-1/z instead of \overline{S}. Going through essentially the same process as before, with some minus signs sprinkled in appropriately, one can determine that T and S can be used, instead of T and \overline{S}, to generate any of the matrices we are considering. What is the advantage of S over \overline{S}? One way to state the advantage, for our story, is that applying S to points in the upper half plane leaves them in the upper half plane (and similarly for the lower half), whereas \overline{S} flips points in the upper half plane to the lower half plane. We should think of this as an advantage, because all of our Ford circles lie in the upper half plane. If you go back to yesterday’s discussion, you’ll note that I used S in the factorization.

So, anyway, T and S are the only LFTs you need. You can quickly check that S^2=1 and also (not quite as quickly) that (TS)^3=1. If you write U=TS, then I guess you get to say that the modular group is generated by S and U, and can establish an isomorphism between the modular group and the free product of the cyclic group of order 2 with the cyclic group of order 3. That’s something.
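A quick matrix-level check of those relations (both products come out to minus the identity, which counts as the identity once we agree that a matrix and its negative are the same transformation):

```python
def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

S = [[0, -1], [1, 0]]
T = [[1, 1], [0, 1]]
U = mul(T, S)

print(mul(S, S))              # [[-1, 0], [0, -1]]
print(mul(U, mul(U, U)))      # [[-1, 0], [0, -1]]
```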

