The Only LFTs You’ll Ever Need

November 8, 2009 by sumidiot

Let us consider linear fractional transformations \frac{az+b}{cz+d} where the coefficients are all integers and ad-bc=1. Since multiplying all of the coefficients by a constant doesn’t change the transformation, we may multiply everything by -1 whenever we want (for instance, to make a chosen coefficient positive). Recall, also, that writing the transformation as a matrix, \left(\begin{smallmatrix}a&b\\ c&d\end{smallmatrix}\right) is not an entirely unreasonable thing to do, as composites of transformations are correctly determined by matrix multiplication. I’ll be talking about sequence of these matrices, so let my original matrix be \left(\begin{smallmatrix}a_0&b_0\\ c_0&d_0\end{smallmatrix}\right)

If c_0=0, then the requirement a_0d_0-b_0c_0=1 means that a_0=d_0=\pm 1, and we may choose a_0=d_0=1. Then our transformation is simply \left(\begin{smallmatrix}1 & b_0 \\ 0 & 1\end{smallmatrix}\right). Since b_0 is an integer, we could write this as \left(\begin{smallmatrix}1 & 1\\ 0 & 1\end{smallmatrix}\right)^{b_0}.

If c_0\neq 0, then there is a q_0\in \mathbb{Z} so that 0\leq a_0-q_0c_0<c_0. I’ll leave you to check the following:

L_0 = \begin{pmatrix}1 & q_0\\ 0 & 1\end{pmatrix}\begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}\begin{pmatrix}c_0 & d_0 \\ a_0-q_0c_0 & b_0-q_0d_0\end{pmatrix}

Writing this right-most matrix as L_1=\left(\begin{smallmatrix}a_1 & b_1\\ c_1 & d_1\end{smallmatrix}\right), we are looking at another matrix with integer coefficients whose determinant is 1. So we may repeat the process, finding integers q_k and LFTs L_k with L_k=\left(\begin{smallmatrix}1&q_k\\ 0&1\end{smallmatrix}\right)\left(\begin{smallmatrix}0&1\\ 1&0\end{smallmatrix}\right)L_{k+1}. The process must stop eventually, because c_0>c_1>c_2>\cdots are all non-negative integers. Eventually, one of them will be 0, and we already dealt with transformations like that in the previous paragraph.

Let T=\left(\begin{smallmatrix}1&1\\ 0&1\end{smallmatrix}\right), and note that T^b=\left(\begin{smallmatrix}1&b\\ 0&1\end{smallmatrix}\right), which we’ve used before. Also, let \overline{S}=\left(\begin{smallmatrix}0&1\\1&0\end{smallmatrix}\right) (you might note that \overline{S}^2 is the identity matrix). Then our relation above is L_k=T^{q_k}\overline{S}L_{k+1}. That is,

\begin{array}{rcl}\begin{pmatrix}a&b\\ c&d\end{pmatrix}=L_0 &=& T^{q_0}\overline{S}L_1 \\ &=&T^{q_0}\overline{S}T^{q_1}\overline{S}L_2\\ &=&\cdots=T^{q_0}\overline{S}T^{q_1}\overline{S}\cdots T^{q_k}.\end{array}

The collection of matrices that we have been working with, 2×2 matrices with integer coefficients whose determinant is 1, tend to get called the modular group and we have just shown that T and \overline{S} can be multiplied together to give you any element of the group. More officially, they “generate” the group. You might fuss and say that technically these matrices are really the special linear group SL_2(\mathbb{Z}), and note that the modular group is really the quotient of this group where you say two matrices are the same if one is -1 times the other.

One interesting thing about this process is that it is really just the process you go through to find the greatest common divisor of a_0 and c_0. The standard algorithm (Euclid’s) is to note that, if c_0\neq 0, there is an integer q_0 such that a_0=q_0c_0+c_1 where 0\leq c_1<c_0. The next step is to repeat the process, writing c_0=q_1c_1+c_2, with 0\leq c_2<c_1. Iterating, you find q_k so that c_{k-1}=q_kc_k+c_{k+1}, and eventually a c_n is 0, and the process stops.

You might be a little worried (justifiably) that this process only seems to look at a_0 and c_0, leaving b_0 and d_0 out of the picture. If you look at what we did with the matrices, our process took \left(\begin{smallmatrix}a_0&b_0\\ c_0&d_0\end{smallmatrix}\right) and wrote it as T^{q_0}\overline{S}\left(\begin{smallmatrix}a_1 & b_1\\ c_1 & d_1\end{smallmatrix}\right), and the b_1 and d_1 depended on b_0 and d_0. In the final step of the process, where the c-coefficient is 0, we ended up with a d-coefficient of 1, but the b-coefficient will be a function of the values b_0 and d_0, so those values do show up and get used, eventually. Of course, since a_0d_0-b_0c_0=1, knowing one of b_0 or d_0 is enough to determine the other (assuming you know a_0 and c_0). You should be thinking that the b_i and d_i are the extra variables that run around in Euclid’s alrgorithm when you do the “extended” version.

We’ve been talking about all of this in terms of matrices, but remember that the matrices represent linear fractional transformations. The matrix T is the transformation z+1, and the power T^b is then z+b. These are just horizontal translations. The matrix \overline{S} is the transformation 1/z. With our relations among the L_i above, we see that we could write

\begin{array}{rcl}\dfrac{a_0z+b_0}{c_0z+d_0} &=& T^{q_0}\overline{S}L_1 = q_0 + \frac{1}{L_1} \\ &=& q_0+\frac{1}{T^{q_1}\overline{S}L_2}=q_0+\cfrac{1}{q_1+\cfrac{1}{L_2}} \\ &=& q_0+\cfrac{1}{q_1+\cfrac{1}{\ddots+\cfrac{1}{q_k+z}}}\end{array}

Hurray for continued fractions! Of course, I feel like I set something up backwards. Plugging in z=0 will give me a continued fraction for b_0/d_0, but I was saying that the q_k come from thinking about the greatest common divisor of a_0 and c_0. Ah well.

What I’ve said is that every 2×2 matrix with coefficients in \mathbb{Z} and determinant 1 can be written in terms of T and \overline{S}. However, you might want to use S=\left(\begin{smallmatrix}0&-1\\ 1&0\end{smallmatrix}\right)=-1/z instead of \overline{S}. Going through essentially the same process as before, with some minus signs sprinkled in appropriately, one can determine that T and S can be used, instead of T and \overline{S}, to generate any of the matrices we are considering. What is the advantage of S over \overline{S}? One way to state the advantage, for our story, is that applying S to points in the upper half plane leaves them in the upper half plane (and similaly for the lower half), whereas \overline{S} flips points in upper half plane to the lower half plane. We should think of this an advantage, because all of our Ford circles lie in the upper half plane. If you go back to yesterday’s discussion, you’ll note that I used S in the factorization.

So, anyway, T and S are the only LFTs you need. You can quickly check that S^2=1 and also (not quite as quickly) that (TS)^3=1. If you write U=TS, then I guess you get to say that the modular group is generated by S and U, and can establish an isomorphism between the modular group and the free product of the cyclic group of order 2 with the cyclic group of order 3. That’s something.

LFTs and Ford Circles

November 7, 2009 by sumidiot

Given 4 complex numbers, a,b,c,d, we may consider the linear fractional transformation (LFT)

\dfrac{az+b}{cz+d}.

Well, 4 numbers are enough to make a matrix, \left(\begin{smallmatrix} a & b\\c & d\end{smallmatrix}\right). Is there any better reason to relate the linear fractional transformation with this matrix?

Suppose you have two matrices, \left(\begin{smallmatrix}a&b \\ c&d\end{smallmatrix}\right) and \left(\begin{smallmatrix}a'&b'\\c'&d'\end{smallmatrix}\right). Then the product is as follows:

\begin{pmatrix}a&b\\c&d\end{pmatrix}\begin{pmatrix}a'&b'\\c'&d'\end{pmatrix}=\begin{pmatrix}aa'+bc' & ab'+bd' \\ ca'+dc' & cb'+dd'\end{pmatrix}.

If you take the composite of the two linear fractional transformations, i.e.,

\dfrac{a\cdot \frac{a'z+b'}{c'z+d'}+b}{c\cdot \frac{a'z+b'}{c'z+d'}+d}

and then play around simplifying that expression for a few minutes, you obtain the LFT

\dfrac{(aa'+bc')z+(ab'+bd')}{(ca'+dc')z+(cb'+dd')},

which is precisely the LFT corresponding to the product matrix above. So, if nothing else, writing LFTs as matrices this way won’t lead us astray when thinking about composites.

This idea is not without its confusion, for me anyway. Generally when you think about a 2×2 matrix of complex values, you are thinking about that matrix as a linear map \mathbb{C}^2\to\mathbb{C}^2, which is not what we are doing above. Instead, I guess we are saying that the “monoid” (group without inverses) of 2×2 matrices, M_2(\mathbb{C}), acts (in the technical sense) on \mathbb{C} as linear fractional transformations. My guess is that there are even better ways to say what is going on.

I think it is also important to keep in mind that two different matrices may correspond to the same LFT. For example, \left(\begin{smallmatrix}1&2\\ 0&2\end{smallmatrix}\right) represents the same LFT as \left(\begin{smallmatrix} 1/2 & 1\\ 0 & 1\end{smallmatrix}\right). More generally, if \lambda is any complex value (nonzero), then \left(\begin{smallmatrix}a&b\\c&d\end{smallmatrix}\right) represents the same LFT as \left(\begin{smallmatrix}\lambda a&\lambda b\\ \lambda c & \lambda d\end{smallmatrix}\right). I guess one can think of M_2(\mathbb{C}) as a \mathbb{C}-vector space (isomorphic to \mathbb{C}^4), and then think of its projective space (the quotient where two “vectors” (matrices here) are the same when they differ by a scalar (complex) multiple), which I’ll denote P(M_2(\mathbb{C})). Then I think I’m saying that the action of M_2(\mathbb{C}) on \mathbb{C} actually is an action of the quotient, P(M_2(\mathbb{C})). I’m not sure if this is a useful viewpoint (or, indeed, correct).

Yesterday, when I was talking about how to picture what an LFT does to \mathbb{C}, I wrote down a factorization of the LFT as a composite. Our new notation gives us another way to write that factorization (recall \alpha=-(ad-bc)/c^2, and that we had assumed c\neq 0):

\begin{pmatrix}a&b\\c&d\end{pmatrix}=\begin{pmatrix}\alpha & a/c\\ 0&1\end{pmatrix}\begin{pmatrix}0&1\\ 1&0\end{pmatrix}\begin{pmatrix}1&d/c\\ 0&1\end{pmatrix}.

As is frequently useful, we will assume that ad-bc\neq 0 (indeed, this factorization seems to require it – I think I’m missing something somewhere, anybody see it?). Notice that ad-bc is the determinant of the matrix representing our LFT. We may then multiply all entries in our matrix (without changing the LFT, as discussed above) by 1/(ad-bc), and obtain a matrix with determinant 1. Let’s do that, making \alpha=-1/c^2.

Yesterday, when I was working on the factorization above, I only had something like \epsilon idea where I was going. I think today I’ve got about twice that, so I want to re-write the factorization. Let me write it as

\begin{pmatrix}1 & a/c\\ 0 & 1\end{pmatrix}\begin{pmatrix}1&0 \\ 0& c^2\end{pmatrix}\begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix}\begin{pmatrix}1 & d/c\\ 0 & 1\end{pmatrix}.

So what’s the connection with Ford circles? Recall that for a reduced fraction h/k, the associated Ford circle is the circle centered at (h/k,1/(2k^2)) with radius 1/(2k^2). Following Rademacher (and, presumably, others), let us say that the “fraction” 1/0 also gets a Ford “circle”, the line y=1 in the plane. This isn’t such a nasty thing to do, as it has the tangency properties I talked about when talking about Ford circles. Anyway, let us think about applying our transformation \left(\begin{smallmatrix}a&b\\c&d\end{smallmatrix}\right), as the composite given above, and see what happens to this line y=1. We’ll assume that a,b,c, and d are all integers.

The first step, \left(\begin{smallmatrix}1&d/c\\ 0 &1\end{smallmatrix}\right) is the linear translation z+d/c. Since d/c is real (since c,d are integers), this translation is a horizontal shift, which leaves y=1 unchanged.

Next up, \left(\begin{smallmatrix}0&-1\\1&0\end{smallmatrix}\right), which is -1/z. Thinking of a point on the line y=1, you can quickly determine that its polar coordinates are (\csc \theta,\theta). The transformation -1/z is the composite of: (1) inversion with respect to the unit circle (the point becomes (\sin \theta,\theta)), (2) reflection across the horizontal axis (giving (\sin \theta,-\theta)), and finally (3) multiplication by -1 (giving (-\sin \theta,-\theta), since these are polar coordinates). This final point is (\sin \theta,\pi-\theta)=(\sin \pi-\theta,\pi-\theta). As \theta varies through (0,\pi), \pi-\theta also varies through this interval, and so we get the graph of the polar curve r=\sin \theta. If you know your polar curves, you know what this looks like…

So, the first two transformations take the line y=1 to the circle with center (0,1/2) and radius 1/2. The next in our composite is multiplication by 1/c^2, which is just a scaling (since c\in \mathbb{R}). This scaling takes our circle to the circle with center (0,1/(2c^2)) and radius 1/(2c^2). Finally, the last transformation is another horizontal translation, leaving our circle centered at (a/c,1/(2c^2)). We recognize this as the Ford circle for the fraction a/c (as long as that fraction is reduced).

Wasn’t that fun? If you want to think about it some more, you might convince yourself that any point above the line y=1 will get moved to a point inside the Farey circle resulting from this process.

Anyway, enough out of me. Hopefully tomorrow I’ll have slightly more of an idea what I’m talking about. Don’t count on it though.

Linear Fractional Transformations

November 6, 2009 by sumidiot

a.k.a. Möbius Transformations, are a type of function. I’ll talk about them as functions from the complex plane to itself. Such functions are given by a formula

\dfrac{az+b}{cz+d}

where a,b,c,d are complex values. If c and d are both 0, this isn’t much of a function, so we’ll assume at least one isn’t 0.

I’d like to talk about what these functions do, how to have some hope of picturing them as transformations \mathbb{C}\to\mathbb{C}. To do this, let’s consider some easy cases first.

If c=0 (and so by assumption d\neq 0), then we may write the function \frac{a}{d}z+\frac{b}{d}, or simply a'z+b' for some complex values a',b'. This is now a linear (some might say affine) transformation of the complex plane. Think about it as the composite z\mapsto a'z\mapsto a'z+b, where the first map multiplies by a', and the second adds b'. Multiplying by a complex value a' is the same as scaling by the real value |a'| (the “norm” of a', distance from a' to the origin) and then rotating by the “argument” of a'. If you think about a' as a point (r,\theta) in polar coordinates, then the argument of a' is \theta (or so), and so multiplication by a' is multiplication by the real value r (which is just a stretching (or shrinking) of the complex plane away from (toward) the origin if r>1 (resp. 0\leq r<1)) and then rotation by the angle \theta. The second transformation in the composite, “add b'“, just shifts the entire plane (as a “rigid transformation”) in the direction of b'.

So the case when c=0 is just a linear transformation, which aren’t too difficult to picture. Another important case is 1/z, so the coefficients are a=0,b=1,c=1,d=0. To talk about what this does, let’s first talk about “inversion” with respect to a fixed circle.

Let C be a circle with radius r, in the plane, and z any point in the plane. Let O denote the center of the circle and d the distance from O to z. The inversion of z, with respect to C, is the point on the line through O and z (in the direction of z from O) whose distance from O is d'=r^2/d. This means that points near the center of C are sent far away, and vice versa. Points on C are unchanged. Technically I guess we should say that this function isn’t defined at O, but people like to say it is taken to “the point at infinity” and, conversely, that inversion takes \infty to O. These things can be made precise.

You might notice that doing inversion twice in a row gets you right back where you started. It also turns out that If C' is another circle in the plane, not passing through O, then the inversion of all of its points is another circle. If C' passes through O, then the inversion works out to be a line. Since doing inversion twice is the identity, inversion takes lines to circles through O. If you’re thinking about the comments about \infty above, this makes sense because every line “goes to \infty“, and so the inversion of a line will go through the inversion of \infty, which I said should be O.

All of this talk about inversion was to describe the function 1/z. This function is the composite of inversion with respect to the unit circle centered at the origin followed by a reflection across the horizontal axis (real line). Don’t believe me? The equation d'=r^2/d defining the relationship between distances when doing the inversion can be re-written as dd'=r^2. If we’re doing inversion with respect to a unit circle, then dd'=1. This means that when we multiply z with its inversion with respect to the unit circle, call it z', the result will be a point with norm 1 (i.e., a point on the unit circle). Next up, multiplying z by z' produces a point whose angle from the positive real axis (which I called the argument before, the \theta from polar coordinates) is the sum of the angles for z and z'. Since we did the reflection across the horizontal axis, the argument for z' is precisely the negative of the argument for z, meaning their sum (the argument of their product) is 0. So zz' is a point on the unit circle making an angle of 0 with the positive real line, i.e., zz'=1. That makes z'=1/z, as promised.

Let’s get back to the general setup, with the function

\dfrac{az+b}{cz+d}

and let’s assume c\neq 0 (since we already handled the case c=0, it’s just a linear transformation). For some notational convenience, let me let \alpha=-(ad-bc)/c^2. Consider the following composite:

\begin{array}{rcl} z & \xrightarrow{w\mapsto w+\frac{d}{c}} & z+\dfrac{d}{c} \\ {} & \xrightarrow{w\mapsto \frac{1}{w}} & \dfrac{c}{cz+d} \\ {} & \xrightarrow{w\mapsto \alpha w+\frac{a}{c}} & \dfrac{\alpha c}{cz+d}+\dfrac{a}{c} \end{array}

If you check all of these steps, and then play around simplifying the final expression, then you obtain the original formula above. So we can think of any linear fractional transformation as a composite of some linear functions and an inversion, and we know how to picture all of those steps.

That’s maybe enough for today. It’s certainly enough for me for today. Before I go, I’ll leave you with a video that might be helpful, and is pretty either way.

Ford Circles

November 5, 2009 by sumidiot

Now that we’ve got some foundations laid for Farey sequences and rational approximations, let’s put pictures to some of those claims, via Ford circles.

For every rational h/k, in reduced terms, draw a circle in the plane with center (h/k,1/(2k^2)) and radius 1/(2k^2). I’m not going to try to come up with a better picture for you than the one used on the Wikipedia page for Ford circles. (I tried embedding the picture below, but that didn’t work. So I’ll leave it to you to click the link.)

What’s so great about this construction? Well, you may recognize those radii as the error term in our discussion of rational approximations. The statement that for every irrational \omega there are infinitely many reduced fractions h/k such that |\omega-h/k|<1/(2k^2) is the same as saying that the vertical line x=\omega intersects infinitely many of the Ford circles. The improved bound 1/(\sqrt{5}k^2) can also be found by (very carefully) analyzing vertical lines passing through Ford circles (more specifically, the gaps between them), but I think for now I’m going to let that statement be for now. If you want, the proof via Ford circles is in [Rademacher], and a proof via continued fractions is in [Hardy and Wright]. The result is known as Hurwitz’s theorem.

What else can be said about Ford circles? Well, yesterday I was talking about neighbors in Farey sequences. It turns out that h/k and H/K are neighbors in a Farey sequence if and only if the Ford circles at h/k and at H/K are tangent to each other.

To see why this is true, consider the circles at h/k and H/K, with h/k<H/K so Hk-hK>0). Since we know the location of their centers, we can find an expression for the distance between those centers. The two circles will intersect iff the distance between the centers is no bigger than the sum of the radii of the two circles. Re-arranging terms (and working with squares to avoid square roots), you might look at d^2-(r_k+r_K)^2, where d is the distance between the centers, and r_k and r_K denote the (hopefully obvious) radii. The two circles will be tangent if this is 0, and won’t intersect if the difference is positive. More care needs to be used when talking about what happens if the difference is negative, but we’ll see momentarily that this doesn’t happen, so let’s forget it. So, it turns out that d^2-(r_k+r_K)^2 is ((Hk-hK)^2-1)/(k^2K^2). Since all the letters there are positive integers, we see that d^2-(r_k+r_K)^2\geq 0, with equality iff Hk-hK=1. This says, based on our work yesterday, precisely that two Ford circles intersect iff the fractions they correspond to are neighbors in a Farey sequence, and when this happens, they intersect at a single point (they are tangent to eachother).

This justifies the picture from above, then. The circles never overlap, and some of them are tangent.

It might be worth it to go back and justify why any vertical line at an irrational value slices through infinitely many of the Ford circles. To that end, let \omega\in (0,1) be irrational. The two circles at 0/1 and 1/1 both have radius 1/2, and so certainly x=\omega hits at least one Ford circle, kicking off an induction argument.

Suppose that x=\omega hits the Ford circle h/k. For the sake of argument, let us suppose that \omega>h/k, so the vertical line lies to the right of the center of the circle. Not much changes if the opposite inequality is true, so let’s just stick with this one. I aim to show that x=\omega hits a circle with a smaller radius, i.e. goes through a circle corresponding to a fraction with a bigger denominator than k. I claim that it hits one of the circles tangent to the circle at h/k. By the above, we know that these are the circles corresponding to neighbors of h/k in some Farey sequence. Since I’m thinking about circles to the right of h/k, these will be successors in the Farey sequence. Yesterday we said that all of these successors are (m,1)-weighted mediants of h/k with H/K where H/K is the successor of h/k in F_k (the first Farey sequence that h/k shows up in).

Consider, then, the circles corresponding to the (1,1), (2,1), (3,1),… weighted mediants of h/k with H/K. Let me call the circles C_1,C_2,\ldots, using C for the circle at h/k. It isn’t too much work to check the necessary inequalities to show that the center of C_1 is further right than the right-most point on C. I’ve already said, even if not directly, that for each n, C_n is tangent to C_{n+1}. So, if you want, you may start at the top of C_1, and move left along the tops of all the circles in turn, jumping from one to the next at points of tangency. How far can you go doing this? Well, if you take the limit as m\to\infty of the (m,1)-weighted mediant of h/k with H/K, you get h/k (l’Hospital!). So, the tops of the circles form an unbroken path from the top of C_1 to as close to h/k, on the right, as we care to get. The vertical line x=\omega, which passes through the right side of C, must surely pass through this path, and therefore at least one of the C_n.

That’s probably just about enough with Ford circles for now. I’m not sure what I’ll be writing about tomorrow. Hopefully something else fun. I would like to note that, as I was writing this post, I was excited to find a connection to continued fractions. I knew a connection should be there (and I’ll probably find plenty more later), and found it while asking “how can we tell which of the C_n does x=\omega pass through?”. I remembered from reading about continued fractions, [a_0;a_1,a_2,\ldots], that the “convergents”, [a_0;a_1,\ldots,a_n] are rationals p_n/q_n that give “best rational approximations”. Those should be the circles that x=\omega passes through, or so. And so if you pass through p_n/q_n, the next circle you should get to is p_{n+1}/q_{n+1}. Well, it turns out, there is a nice relationship between successive convergents. Namely, p_{n+1}=a_{n+1}p_n+p_{n-1} and q_{n+1}=a_{n+1}q_n+q_{n-1}. Look at that! It’s the (a_{n+1},1)-weighted mediant of p_n/q_n with p_{n-1}/q_{n-1}! Fantastic. Could this story get any better? I’m glad I’ve decided to find out this month.

Neighbors in Farey Sequences

November 5, 2009 by sumidiot

I was reading about Ford circles today, and was planning on writing about them. But then I noticed that one of the things I wanted to say was based on some things I didn’t say about Farey sequences when I talked about them a few days ago. So I thought I’d go back and fill in that gap, and save Ford circles for tomorrow.

So what did I skip? Well, I mentioned that to get from one Farey sequence to the next, you throw in mediants whose denominators aren’t too big. Is there a way, given a fraction h/n in F_n to figure out what terms in F_{n-1} it was the mediant of? Well, we could, of course, list the terms in F_{n-1}, in order, and then pick the two that surround the value h/n. I’m after another way.

Since h/n is supposed to be reduced, h and n are relatively prime, and so there are integers s and t such that hs-nt=1 (this usually (in my mind) gets written with a “+”, not a “-”, but just change the sign of your t). Once you’ve found such a pair (s,t), all of the other pairs that satisfy the same equation are of the form (s+mn,t+mh), where m is any integer. Among the s+mn, there is only one in the interval [0,n). Abusing notation a bit, or assuming you picked well to start, let’s call this value s, and its partner t. I claim that t/s is the term directly before h/n in F_n.

Before talking about t/s, I should have checked that s\neq 0. If this is not the case, then hs-nt=1 means n=1, in which case we are in F_1 which is just 0/1 and 1/1. Not much interesting going on there. So let’s go ahead and assume s>0.

Now, the linear equation hs-nt=1 gets re-written h/n-t/s=1/(ns) to show that h/n>t/s (so the fractions are at least in the correct order). Messing about with inequalities some more, you can check that 0\leq t<s, and since s and t are relatively prime (they satisfy the linear equation), the fraction t/s is actually in F_n. Let a/b be the predecessor of h/n in F_n (we want to show a=t,b=s). Since it is a neighbor, we showed last time that then hb-na=1. But now we’re saying that all solutions of the equation hx-ny=1 have a fixed form. That is, b=s+mn and a=t+mh for some integer m. But if m\neq 0 then b is not in [0,n), and so a/b is not in F_n. So m=0 and a=t,b=s, and we’re done.

Man, ok, so, after all that, we now know how to find the term before h/n in F_n. I’ll call it t/s, sticking with our notation above. What about the term after h/n? You could play mostly the same games as above, I suppose. Or, if the successor is T/S, then we know that h=t+T and n=s+S, since we made h/n as a mediant. So T=h-t and S=n-s, easy enough. [Update 20091109: You have to be careful with this. The above method for finding the successor to h/n in F_n only works because you are in F_n. If you look for successors in h/k in F_n, where k<n, this method, using the predecessor, is wrong. For example, 1/4 is the predecessor of 1/3 in F_5, but the fraction (1-1)/(3-4) suggested by the previous method is clearly NOT the sucessor of 1/3 in F_5]

So we know the neighbors of h/n in F_n. What is the next Farey sequence when h/n gets a new neighbor? Let’s stick to the next time it gets a new neighbor “on the left” (less than itself). We know that when that happens, it had to be because we made the mediant with the old neighbor, our friend t/s. So the next neighbor is (t+h)/(s+n), which doesn’t show up until F_{s+n}. The newest neighbor after that will be the mediant with this new neighbor, i.e., (t+2h)/(s+2n), in F_{s+2n}. Continuing on, we see that all of the neighbors are of the form (t+mh)/(s+mn), for some positive integer m. I called these (1,m)-weighted mediants (of t/s and h/n) yesterday.

The story goes pretty much the same on the right side. I called the neighbor on the right, in F_n, T/S. The next neighbors will be (h+T)/(n+S), (2h+T)/(2n+S), etc. These are the (m,1)-weighted mediants of h/n and T/S.

We can put both of these into the same statement by noticing that the (1,-m) mediant of t/s with h/n is the same as the (m-1,1) mediant of h/n with T/S. In other words, all of the neighbors of h/n, in all of the Farey sequences, can be written as (1,m) weighted mediants of t/s with h/n. They will have numerators of the form a=t+mh and denominators of the form b=s+mn. Guess what? These are exactly all of the solutions to hx-ny=1. Go back a few paragraphs, it’s the same thing I said there.

In summary, the neighbors to h/n, in any Farey sequence, are in one-to-one correspondence with solutions to hx-ny=1.

If I knew what I were doing, I could probably have gotten to this statement a bit more quickly. But I don’t remember the last time I knew what I was doing, so there you have it.

Rational Approximations

November 3, 2009 by sumidiot

Suppose x is an irrational in the interval (0,1), and fix a positive integer n. Let’s see what I can say about rational approximations to x whose denominator is no bigger than n.

First of all, x lies in one of the intervals [j/n,(j+1)/n], where 0\leq j<n. Since x is irrational, it won’t be an endpoint of such an interval, nor will it be the midpoint. Which is to say, it is closer to one end than the other. More explicitly, we can say that |x-j/n|<1/(2n) for some 0\leq j\leq n.

Instead of requiring the denominator of a rational approximation to actually be n, what if we just ask that it be no bigger than n? Then we’re asking how well rationals in the n-th Farey sequence, F_n, approximate x.

Instead of thinking about |x-p/q|, let’s multiply everything by q, and so look at |qx-p| (we’ll divide by that q again soon enough). So now we’re not asking about rational approximations, as such, but asking about pairs of integers, p and q with 0\leq p\leq q\leq n, so that qx is “pretty close” to the integer p.

Let’s re-phrase once more. Consider all of the multiples x,2x,3x,\ldots,nx of x. All of these are irrationals, but could now be bigger than 1 (our original x was less than 1). I don’t want to think about things bigger than 1, so let me trim all of these multiples qx down to just the “fractional part”. Which is to say, I’ll now think about all of the values qx-\lfloor qx\rfloor (the “floor” notation picks out the “greatest integer less than”), which are all irrationals in the interval (0,1). Now for each q I have an integer \lfloor qx\rfloor, and let me denote this integer by p_q. You might note that p_q is always less than n.

So now I am thinking about the n irrational values qx-p_q in the interval (0,1). I’ll think about them in comparison to the rational points 0,1/n,2/n,\ldots,1, like I did before. Following Rademacher (as most of my content has been and will be), we distinguish two cases.

  1. Suppose some qx-p_q falls in the interval (0,1/n). Then |qx-p_q|< 1/n, and so |x-p_q/q|<1/(qn)\leq 1/q^2 (since q\leq n).
  2. If (0,1/n) is free of our points, then by the pidgeon-hole principle, some interval (j/n,(j+1)/n) contains two values, say ax-p_a and bx-p_b (arrange notation so that a<b, forcing also p_a<p_b). Then |(bx-p_b)-(ax-p_a)|< 1/n, which we re-arrange to say that |(b-a)x-(p_b-p_a)|< 1/n. Letting q=b-a and p=p_b-p_a, we have |qx-p|< 1/n, which we again re-write, as in case (1), to say that |x-p/q|< 1/q^2.

We have, therefore, improved the “order” of our approximation. To begin with, we said that rational approximations could be found with |x-j/n|< 1/(2n). I think of this as a “order 1″ approximation, since the denominator of the error is a linear function of the denominator of the rational approximation. We have now improved this so that |x-p/q|< 1/q^2, which I would therefore call an “order 2″ approximation. If you are a little more careful with your inequalities, you can improve the bound to |x-p/q|< 1/(2q^2) (an error that I think I’ll probably end up mentioning again tomorrow), basically for the same reason there is a coefficient of 2 in the order 1 approximation, I guess.

I should probably be a little (a lot) more careful in what I am saying with the above. I chose an irrational x and an integer n, and said that an “order 2″ rational approximation could be found. The stronger claim is that, in fact, infinitely many order 2 rational approximations can be found, if you let n increase. Suppose your first order 2 approximation gives you |x-p/q|< 1/q^2, when you have bounded the denominator by n. Find a new N so that 1/N< 1/q^2. Then go through the process above, using N as the new bound on the denominator. When you find |x-p'/q'|< 1/(q'N)\leq 1/(q')^2 with that process, then you’ve found a new rational approximation (since it is closer to x then p/q was) that is still order 2 (since that’s what the process does for you).

So, can order 2 be improved on? Can you find order 3, or higher? It turns out, “no”, in general (and the golden ratio is an example). You can improve the coefficient of 2 in the order 2 approximation to \sqrt{5}, but that’s apparently the best you can do, in general. There’s more than can be said along these lines. The Roth-Liouville irrationality measure might be fun to talk about (it earned Roth a Fields medal, so it must be ok), as would continued fractions. I’m not sure how much these are related to my stated goal of understanding the link between Farey sequences and the Riemann hypothesis, so for now, perhaps I’ll leave them alone.

For some reason, I thought it might be fun to make a visualization about rational approximations using F_5. Here is a picture of the rationals in F_5, as a subset of [0,1]:

\begin{picture}(200,10) \put(0,0){\line(1,0){200}}\multiput(0,0)(40,0){6}{\circle*{3}}\multiput(50,0)(50,0){3}{\circle*{3}}\multiput(67,0)(67,0){2}{\circle*{3}}\end{picture}

Here’s a graph of the function taking x to “the rational in F_5 closest to x“:

\begin{picture}(200,200) \put(0,0){\line(1,0){200}}\put(0,0){\line(0,1){200}}\multiput(0,0)(40,0){6}{\circle*{3}}\multiput(50,0)(50,0){3}{\circle*{3}}\multiput(67,0)(67,0){2}{\circle*{3}}\put(20,40){\line(1,0){25}}\put(45,50){\line(1,0){13}}\put(58,67){\line(1,0){16}}\put(74,80){\line(1,0){16}}\put(90,100){\line(1,0){20}}\put(110,120){\line(1,0){16}}\put(126,133){\line(1,0){16}}\put(142,150){\line(1,0){13}}\put(155,160){\line(1,0){25}}\put(180,200){\line(1,0){20}}\end{picture}

Farey Sequences

November 3, 2009 by sumidiot

Ok, so if I’m going to talk about Farey sequences and the Riemann hypothesis for MaBloWriMo, then Farey sequences are probably a good place to start.

Let us say that the n-th Farey sequence is the sequence of fractions h/k where 0\leq h\leq k\leq n and h and k are relatively prime (so that h/k is in “lowest terms”). Let’s denote this sequence by F_n. The first few are:

F_1: 0/1\quad 1/1

F_2: 0/1 \quad 1/2\quad 1/1

F_3: 0/1\quad 1/3\quad 1/2\quad 2/3\quad 1/1

F_4: 0/1\quad 1/4\quad 1/3\quad 1/2\quad 2/3\quad 3/4\quad 1/1.

There’s sort of a nice way to get from one Farey sequence to the next: add fractions incorrectly.

The mediant of the fractions h_1/k_1 and h_2/k_2 is defined to be the fraction (h_1+h_2)/(k_1+k_2). It turns out that the mediant of two fractions always lies between them, which you can check by taking differences and noticing they have the correct sign. Now, to make the sequence F_{n+1}, knowing the sequence F_n, you simply include all the existing terms, and any mediants of successive terms that have a denominator no bigger than n+1. So, for example, consider F_4 above. We consider the fractions obtained by adding to this sequence the mediants:

0/1\quad 1/5\quad 1/4\quad 2/7\quad 1/3\quad 2/5\quad 1/2\quad 3/5\quad 2/3\quad 5/7\quad 4/5\quad 1/1

and then we throw out any fraction whose denominator is bigger than 5, giving us

F_5: 0/1\quad 1/5\quad 1/4 \quad 1/3\quad 2/5\quad 1/2\quad 3/5\quad 2/3 \quad 4/5\quad 1/1.

There’s a little something to worry about in this process. Well, probably a few little somethings. The one I’ve got in mind is: are the mediants always reduced? However, you might also ask: are we sure we get all of the fractions this way?

To begin to answer these, let me change the subject. Look at any of the Farey sequences listed above, and pick any two successive terms in that sequence. Call them h_1/k_1 and h_2/k_2, and let’s say that we’ve picked out indices so that h_1/k_1 < h_2/k_2. Now compute h_2/k_2 - h_1/k_1. Done? You got 1/(k_1k_2), didn’t you? This happens no matter what Farey sequence you are in, and no matter which successive terms you take. You always find that, with the notation above, h_2k_1-h_1k_2=1. This is fairly interesting.

We already knew that, for example, h_1 and k_1 were relatively prime. And for a pair h_1,k_1 of relatively prime integers, there always exist integers s,t such that h_1s+k_1t=1. Conversely, if you can find such a pair of integers, then h_1 and k_1 are relatively prime. What’s perhaps a little surprising is that, in this case, the s and t happen to show up in the next term of the Farey sequence: s=-k_2 and t=h_2.

Armed with the observation that the difference between successive terms has a 1 in the numerator, in even just the first Farey sequence, F_1, we can proceed by induction on n to show that if h_1/k_1 < h_2/k_2 are successive terms in F_n then: (1) h_2k_1-h_1k_2=1, and also (2) the mediant of these fractions is reduced. Therefore, if the mediant has a denominator no bigger than n+1, it will get included in F_{n+1}.

Does this process of expanding one Farey sequence to the next by throwing in mediants actually give us all of the terms of the next Farey sequence? That is, if m/n is a fraction in reduced form, is it the mediant of two successive fractions from F_{n-1}?

Let me answer that by considering what I’ll call weighted mediants (a name I just made up, it may or may not show up in your favorite reference). Let me say that the (a,b)-weighted mediant of fractions h_1/k_1 < h_2/k_2 is the fraction (ah_1+bh_2)/(ak_1+bk_2). I’ll assume a,b,h_1,h_2,k_1,k_2>0. So, for example, the normal mediant, defined above, is just the (1,1)-weighted mediant. I claim that: (1) every weighted mediant lies between the two original fractions, (2) every fraction between two successive terms in a Farey sequence can be obtained as a weighted mediant.

Claim (1) is easy, just take appropriate differences and note that they have the appropriate sign, as before. For claim (2), suppose h_1/k_1<p/q<h_2/k_2, where h_1/k_1,h_2/k_2 are successive terms in a Farey sequence, so that h_2k_1-h_1k_2=1. Our goal is to find a,b so that p=ah_1+bh_2 and q=ak_1+bk_2. The assumption h_2k_1-h_1k_2=1 means that we can, indeed, find integers satisfying this condition, namely: a=-k_2p+h_2q and b=k_1p-h_1q.

So, back to the question about obtaining every fraction in F_n by taking mediants from F_{n-1}. We need to show that if m/n<1 and m is relatively prime to n, then m/n is the mediant of two successive fractions in F_{n-1}. Well, m/n certainly lies between two of them, say h_1/k_1 and h_2/k_2 in keeping with our notation, and we have just shown that it may therefore be obtained as some weighted mediant (we want it to be the (1,1)-weighted mediant). Since the denominator of the (a,b)-weighted mediant is ak_1+bk_2, the smallest denominator is obtained only with the (1,1)-weighted (i.e., normal) mediant. This denominator must be at least n, because otherwise the fraction would show up in some Farey sequence before F_n. As m/n is obtainable as a weighted mediant, it must be the (1,1)-weighted mediant, which is what we wanted to show. Let’s call it a day.

MaBloWriMo

November 3, 2009 by sumidiot

I’m at least a day behind, but I’m thinking I might try this “Math Blog Writing Month” suggested by Charles Siegel over at Rigorous Trivialities.

A few months ago I was reading Hardy and Wright’s “An Introduction to the Theory of Numbers” with some friends. While that reading seems to have ended (we made it further than originally expected, to be honest), there was a topic I read about that I decided I wanted to learn more about. When H&W covered Farey sequences, I checked out the Wikipedia page and was fairly delighted to find that there is a statement about Farey sequences that is equivalent to the Riemann hypothesis (RH).

I know nearly nothing about the RH. You take a function defined by an infinite series, extend to the complex plane, notice that the zeroes have to lie in a particular region, and then the claim is that you can narrow down that region even more. Great. Why do I care about the zeroes of this function? Oh, right, because they are related to the distribution of the primes. And how is that again? I have no idea. Probably I should be embarrassed about this, being a math graduate student and not knowing the math behind the most famous open problem.

Well, now seems like as good a time as any to learn more. Farey sequences look fun, and close to some pretty pictures (Ford circles and the ? function). And so my plan is to learn enough about all of that, and the RH, to see if I can understand the relationship between the two. Not so that I can try to prove RH, mind you (I seriously have no delusions about this). It just seems like a fun thing to learn about. Plus maybe it’ll give me an excuse to give a talk in my department’s “graduate seminar”, which I always think is a fun thing to do.

I’ve got a few sources that I plan on looking at (and, indeed, have already been looking at). I found Rademacher’s “Higher Mathematics from an Elementary Point of View” to be quite delightful, and it contains information about the Farey and Ford part of the story. Of course, H&W might also help out here. I’ve got some papers tucked away somewhere I may dig up as well, and when I do, I’ll mention them. And, on the Riemann side of the story, I think I’ll look at Edwards’ “Riemann’s Zeta Function”, which has a section on the connection with Farey sequences.

I figure I’ll try to read at least a little bit every day, and share what I come across here. Perhaps in a month I’ll have some vague understanding (goal: less vague than it is now) about what’s going with the topic. Or perhaps I’ll crap out after a week (or less). But for now, let’s say I’m trying. Starting off a day late (and with an essentially non-mathematical post) means I probably should try to find a day or two and write more than one post. We’ll see. Also, I don’t expect I’ll get close to the 1000 word mark very often. We’ll see.

Trying to be Clever

October 12, 2009 by sumidiot

This week I assigned my calculus class a problem* which involved the integral

\displaystyle\int \theta\sin\theta\cos\theta-\theta^2\cos^2\theta\ d\theta.

If you just dive in and do each term separately, this integral takes 3 applications of integration by parts (IBP). However, if you use your double angle identities, and do IBP on the second term first, then some terms gather up, and you only need one further application of IBP.

I think it is somewhat interesting, or fun, that the very similar integral

\displaystyle\int \theta\sin\theta\cos\theta+\theta^2\cos^2\theta\ d\theta

can be done with a single IBP. Perhaps it isn’t soo surprising, after you spend enough time trying to integrate these functions in various ways.

Now, in all honesty, the problem I assigned actually needed the definite integral

\displaystyle \int_0^{\pi} \theta\sin\theta\cos\theta-\theta^2\cos^2\theta\ d\theta.

I had thought maybe that since the integral is definite, you can use some other cleverness and not need IBP as many times. My idea was to shift the interval to [-\pi/2,\pi/2] using the substitution \alpha=\theta-\pi/2. Then we’ve got a symmetric interval, and odd functions will automatically drop out. Shifting sine or cosine by \pi/2 is not too inconvenient, because it just changes the function to the other (with sign changes in appropriate places).

After shifting, the odd functions drop out, and the remaining functions are even, so you can use symmetry yet again to do the integral over [0,\pi/2]. At some point you also change some \sin^2 \alpha’s into \frac{1}{2}-\frac{1}{2}\cos 2\alpha’s. When you think about \cos 2\alpha on the interval [0,\pi/2], it is odd-symmetric around the mid-point, \pi/4, so those terms drop out of the integral.

In the whole process so far, we have not used IBP at all. In fact, we’ve not yet done a single anti-derivative. Unfortunately, you’re stuck with a term \alpha^2\cos 2\alpha, and I can’t see how to get rid of it, so I break down and use IBP. Then you’ve still got an \alpha \sin 2\alpha, so IBP once more.

So I’ve done a lot of work to end up still doing IBP twice. I don’t know, probably that was all much more trouble than it was worth. But it was fun. Lots of nice applications of symmetry and some trig identities. I’d be delighted to hear any other solutions to this integral, so please share if you’ve got one.

* The problem I assigned: There is a cow tied to a circular silo with a piece of rope whose length is half the circumpherence of the silo. What is the area available to the cow for grazing? This is from Stewart’s Calculus textbook, in the section on parametric curves and areas.

Surface Area of Revolution

September 27, 2009 by sumidiot

Suppose y=f(x) is some function you’d like to rotate around the x-axis (over some interval I), and that you’d like to calculate the surface area of the resulting surface. Dig up your favorite calculus reference, and you’ll find something like the formula:

\int_I 2\pi f(x)\sqrt{1+(f'(x))^2}\ dx.

If you read through the proof, you’ll see that this formula comes from breaking up the surface into lots of little end bits of cones. The way I explained those shapes to my class this semester was: take an ice cream cone and dip the “top” (where the ice cream goes) into chocolate. The bit that’s covered in chocolate is then the sort of shape we’re considering, on each little interval. These little bits give us a reasonable approximations on each interval, and we take a limit and a sum and it becomes the above integral.

Recently, another of the calc instructors asked: Why don’t we approximate each little bit with just cylinders? Well, the first several answers he got (there were a few of us talking about this) were “you get the wrong answer”, essentially. “The cylinder and cone have different surface areas”. Of course they do, but in the limit, why does it matter? Don’t those cylinders get really close to approximating the length of the curve?

Let’s start getting some formulas going. Consider the line y=mx on the interval [a,a+h]. For convenience, take m,a>0. Also consider the horizontal line y=ma on the same interval. Rotating the line y=mx gives us the cone, and rotating the line y=ma gives us the cylinder. Notice that we chose the horizontal line to be the value of the diagonal line at the point x=a.

While we’re thinking about surface area, let’s also think about volumes, to see what is different between the two situations. The formulas for the cylinder are easy, the surface area of the cylinder we made in the previous paragraph is just 2\pi mah, and the volume is \pi (ma)^2h. The formulas for the bit of the cone are slightly more complicated, but doable. If you’ve still got your calc reference out and find these formulas, you should be able to get the surface area to be \pi m\sqrt{m^2+1}(2ah+h^2) and volume as \frac{1}{3}\pi m^2(3a^2h+3ah^2+h^3).

Now what we want to do is compare how the formulas for surface area (and volume) compare as h\to 0. The way to see how they compare is to take their ratio. In the case of volume, if we put the volume of the cone in the numerator, we can simplify the formula to

\dfrac{3a^2+3ah+h^3}{3a^2}

Taking the limit as h\to 0 we get 1. That means that the formulas are essentially equivalent in terms of using them to find volume of a solid of revolution. However, in the case of surface area, the quotient boils down to

\dfrac{\sqrt{m^2+1}(2a+h)}{2a}

which only goes to 1 in the limit if m=0 (i.e., the cone is really a cylinder).

Now, if a student asks about this in calc class, how do we justify this process of taking quotients and looking at the limit? Has anybody ever had a student ask them this question?