Monad Style Comparison – TicTacToe

November 17, 2019

Following the “Book of Monads“, Chapter 13 steps us through various options for building “Custom Monads”, working through an example for the game of TicTacToe. I was excited to play with it a bit and try to understand it better, so I started a repo and threw together a couple versions of implementations. In this post I’ll walk through some of the code. The snippets below are basically copied directly from the repo, based on the state of things at this commit, so you can get full imports and things (and more comments!) if you’re inspired.

Importantly: while I present some things below that work, I’m still very much learning all of this. If you see code below (or in the repo) that could still be simplified or otherwise usefully re-organized, be it in the overall patterns or my use of cats (or any of the other libraries), or you think there’s a theme or pattern that I’m mis-understanding, please feel encouraged to respond on this post or, even better, put up a PR!

Overall setup

Generally the idea is to have some helper types for the board and players, and then define a custom monad that has two methods:

  • info, which allows you to inspect one position in the board and see if a player owns it, and if so, which player
  • take, which assigns a position on the board to a player. Intriguingly the position is an argument to this method, but the player is not.

A few different things can happen if you try to take a position, namely the game could be over (before or after you play), or the spot could be taken, or you could successfully take the position and it’d be the next player’s turn. We’ll capture that with a Return data type below.

While I was playing with things, I ended up altering the above slightly, and adding to it. In particular, I call the intended take method forceTake, to indicate that it is not responsible for checking if the spot is already taken, or if the game is done even, just to assign that spot to the player. This is then paired with a genTake, which relies on info and forceTake, which does all of that checking. The reason for this is that I didn’t want each implementation’s take to have to worry about all of that logic, which would be the same across implementations.

Additionally, one of the questions I had throughout this exercise was around the state of which player’s turn it was. I *think* that the setup above actually only captures or is intended to capture and manage the board state, leaving it up to a secondary layer to worry about the player and/or game state. However, in my implementation, I’ve lumped them all together, so end up adding two more methods (some implementations don’t add the switchPlayer, though there’s no particularly good reason for it, and hopefully the inconsistency there isn’t too distracting):

  • turn to tell us which player’s turn it is
  • switchPlayer to change which player’s turn it is

In my repo, the exploration that I did began with the book’s two monad styles (Final and Initial/Free). I also added more of an “object oriented”-looking implementation, and then sort of an analogous typeclass-based one, and finally, just for fun, one using a “record of functions” approach. I’ll walk through some of these in this post, in more or less detail, but each is also represented by it’s own file in the repo, and there’s hopefully enough useful comments you can poke around there too.

For the various implementations, I generally ended up writing the following library methods, built out of the primitives above:

  • genTake, described above, as sort of a “safe” take, wrapping the implementation’s forceTake
  • runRandom to generate random game play and return the final board state and winning player (if there was one)
  • gameEnded which relies on winner, to tell if the game has ended, and if so, which player won

Ok, enough, let’s look at some code.

“Common” code

The code in this section is used throughout the various monadic implementations, setting up the types that they all deal with.

First up, we need to define the shape of the game board, a 3×3 grid. I’ll use the following, where rows and columns are indexed independently with “F”irst, “S”econd, and “T”hird components:

  sealed trait BoardIndex
  object BoardIndex {
    case object F extends BoardIndex
    case object S extends BoardIndex
    case object T extends BoardIndex

  case class Position(row: BoardIndex, col: BoardIndex)

Another implementation might have just made 9 positions directly, with names like “UL” for upper-left. But here we are.

When I first started seeing code like that, with the sealed trait and then an object with the same name with the various extensions in it, I wondered why you’d bother putting the extensions in the object. I think the benefit is that it keeps the namespace somewhat cleaner, to only have BoardIndex, instead of that and all of its children. If you want to import the children you’re welcome to, but they don’t, by default, clobber the namespace.

We also have the notion of the two players (X and O), and set up some more types to represent the overall state of a game, and a helper for the default initial state of a game.

  sealed trait Player
  object Player {
    case object X extends Player
    case object O extends Player

  type Board = Map[Position, Player]
  case class GameState(player: Player, board: Board)

  val StartingGame = GameState(Player.X, Map.empty)

Finally, some additional utilities around positions to just conveniently access all the available positions, and define what a winning set of positions means. The cats-provided mapN creates all pairs of indices and then pushes them all through the Position constructor – these lines are a little silly here, because there’s only 9 positions to write out, but part of this exercise was an excuse to play with the cats library, so I had to try it (see the “List” portion of this section in “Scala with Cats”)

  val allPositions = {
    import cats.implicits._
    val indices = List(BoardIndex.F, BoardIndex.S, BoardIndex.T)
    (indices, indices).mapN(Position.apply)
  def randomPosition(exceptions: Set[Position]): Position =
    scala.util.Random.shuffle((allPositions.toSet -- exceptions).toList).head

  val winningCombos = {
    import BoardIndex._
         (Position(F, F), Position(F, S), Position(F, T)),
         (Position(S, F), Position(S, S), Position(S, T)),
         (Position(T, F), Position(T, S), Position(T, T)),
         (Position(F, F), Position(S, F), Position(T, F)),
         (Position(F, S), Position(S, S), Position(T, S)),
         (Position(F, T), Position(S, T), Position(T, T)),
         (Position(F, F), Position(S, S), Position(T, T)),
         (Position(T, F), Position(S, S), Position(F, T))

I know the Random bit there has side-effects which should be represented, but I decided not to worry about that for this implementation, at least in part because I figured there should be some library I was supposed to use that had it all set up (vs writing just my own ~one method), and I haven’t found said library yet. Also, the “exceptions” there are a list of positions not to return, which are just sort of handy for things like not trying to take a spot you know is taken.

Finally, we set up a custom type to work as the Result of our genTake method that we’re going to implement, described in the first section above. This type lets you know what happen when you try to take a position on the board, which can either be that the game is over, the position was already occupied, or the position is now taken by the player and it’s the next player’s turn:

  sealed trait Result
  object Result {
    final case class AlreadyTaken(by: Player) extends Result
    final case object NextTurn extends Result
    final case class GameEnded(winner: Option[Player]) extends Result

A fun extension somewhere in here would aim to set up types so that you couldn’t even call take (or forceTake or genTake) on a game that was already completed. But I haven’t set that up.

“OO” style

While in the repo I wrote the “Final” and “Initial” (and “Free”) bits before an “object oriented” version, in this writeup let’s start with the OO style, to see things that are maybe more familiar. Since I started with those other implementations, my implementations for OO-style were copied over and then made to work, to try minimize differences, so might not look like what you might write without that inspiration, but are still reasonable.

So, to begin, we set up our “interface” for a TicTacToe class:

  trait TicTacToe {
    def info(p: Position): Option[Player]
    def forceTake(p: Position): Unit
    def turn(): Player

Returning Unit from forceTake isn’t very functional, and it’s one of the entertaining differences to look at in the various implementations we’ll see in a bit.

It’s relatively easily to implement this class, if we allow ourselves mutable variables:

  class EmbeddedVarTicTacToe(var gs: GameState = StartingGame) extends TicTacToe {

    override def info(pos: Position): Option[Player] =

    override def forceTake(pos: Position): Unit =
      gs = gs.copy(player = Player.other(gs.player), board = gs.board + (pos -> gs.player))

    override def turn(): Player =


Alternatively, we could have forceTake return a new TicTacToe, and then we wouldn’t need an inner var, we could use a val. The OOTypeclass implementation (next section) basically does this, and eliminates the need for the wrapper class, and just use GameState directly. As we’ll see, in the Final style, that’s bundled up in a monad wrapping the return type, so returning Unit there is still functional.

So, let’s have a look at an implementation of how to check if there’s a winner. We have the “common” winningCombos, so basically just need to loop through each, and see if, for a given tuple of positions, if each position is occupied, and occupied by the same player. I think the inner playerWins method can probably still be shortened with some cats built-in, but I haven’t found it yet. Anyway, here’s some code:

  def winner(ttt: TicTacToe): Option[Player] = {
    def playerWins(op1: Option[Player], op2: Option[Player], op3: Option[Player]): Option[Player] =
      for {
        p1 <- op1
        p2 <- op2
        p3 <- op3 if p1 == p2 && p2 == p3
      } yield {

    def comboWinner(pos1: Position, pos2: Position, pos3: Position): Option[Player] =

      .map(t => comboWinner(t._1, t._2, t._3)) // List[Option[Player]]

We need to be able to calculate winners as part of telling if a game is over, which we want to do when somebody tries to take a position. So here’s some logic to check if a game is done and, if it is, return the winner (or None if it was a draw). Roughly, we check first if there’s a winner, using the method above, and if not, we check for a draw (meaning all positions are occupied):

  def gameEnded(ttt: TicTacToe): Option[Result.GameEnded] = {
    def isDraw: Boolean =
        .traverse(pos => // List[Option[Player]] traversed to Option[List[Player]]

    def drawResult: Option[Result.GameEnded] =
      if (isDraw) {
      } else {

    def result(op: Option[Player], or: Option[Result.GameEnded]): Option[Result.GameEnded] = => Result.GameEnded(Some(p))).orElse(or)

    result(winner(ttt), drawResult)

Finally, with those methods, we’re able to define our genTake, which is responsible for checking if a game is over, or a spot is taken, before allowing a player to take a position, and then returning a Result (from the “Common” bit above):

  def genTake(ttt: TicTacToe)(pos: Position): Result = {

    def forceTakeAndCheck: Result = {

    def takeSinceNotDone: Result =
         .fold(forceTakeAndCheck)(p => Result.AlreadyTaken(p))


“OO Typeclass”

As a quick sub-section of the OO implementation, I tried setting up a typeclass, instead of using “normal” inheritance, just to see. As mentioned above, it inspires the forceTake to not return Unit. Besides that, once you add some “bedazzling“, the methods above work as is (the type signature changes to put the typeclass constraint in, but that’s it). Here’s the setup for that code:

  trait TicTacToe[T] {
    def info(t: T, p: Position): Option[Player]
    def forceTake(t: T, p: Position): T // this is sort of a major difference to the OO version
    def turn(t: T): Player

  object TicTacToe {

    implicit case object GameStateIsTicTacToe extends TicTacToe[GameState] {

      override def info(gs: GameState, p: Position): Option[Player] =

      override def forceTake(gs: GameState, p: Position): GameState =
        gs.copy(player = Player.other(gs.player), board = gs.board + (p -> gs.player))

      override def turn(gs: GameState): Player =


    implicit class TicTacToeOps[T: TicTacToe](t: T) {

      def info(p: Position): Option[Player] =
        implicitly[TicTacToe[T]].info(t, p)

      def forceTake(p: Position): T =
        implicitly[TicTacToe[T]].forceTake(t, p)

      def turn(): Player =



winner and gameEnded, like I said, don’t change, and the changes to genTake are primarily because of the not-returning-Unit bit of forceTake:

  def genTake[T: TicTacToe](ttt: T)(pos: Position): (T, Result) = {

    def forceTakeAndCheck: (T, Result) = {
      val nt = ttt.forceTake(pos)
      (nt, gameEnded(nt).getOrElse(Result.NextTurn))

    def takeSinceNotDone: (T, Result) =
         .fold(forceTakeAndCheck)(p => (ttt, Result.AlreadyTaken(p)))

    gameEnded(ttt).map((ttt, _)).getOrElse(takeSinceNotDone)

“Record of Functions” approach

Eschewing inheritance all together, we can also choose an implementation with a class whose vals are what we’d normally represent with an object’s methods in an OO style. So instead of extending the class and overloading the method, you just instantiate the class with the methods defined. Here’s what that looks like:

  case class TicTacToe(
    info: Position => Option[Player],
    forceTake: Position => TicTacToe,
    turn: () => Player
  def embeddedVarTicTacToe(gs: GameState = StartingGame): TicTacToe =
      p => embeddedVarTicTacToe(gs.copy(player = Player.other(gs.player), board = gs.board + (p -> gs.player))),
      gs.player _

The implementations of the methods are all exactly the same as the typeclass bit above (most of which were the same as the original OO code), so I won’t show them. I’d just like to note that this seemed to be a fun and easy pattern.

“Final”, our first monad-style

This is actually the first style discussed in the book. It took me a bit of working through to get any comfort with it, but it was all fun and worthwhile. In this style, we have basically the same starting trait as in the OO version above (though have the explicit switchUser – again, forgive this minor distraction), just we wrap each result type with an unknown F wrapper. We don’t require anything about F, just that it take one type parameter, so that we can use it as a wrapper:

  trait TicTacToe[F[_]] {
    def info(p: Position): F[Option[Player]]
    def forceTake(p: Position): F[Unit]
    def turn(): F[Player]
    def switchPlayer(): F[Unit]

For convenience, I also define (and subsequently import) some syntax methods:

  object TicTacToeSyntax {
    def info[F[_]](p: Position)(implicit ev: TicTacToe[F]): F[Option[Player]] =

    def forceTake[F[_]](p: Position)(implicit ev: TicTacToe[F]): F[Unit] =

    def turn[F[_]]()(implicit ev: TicTacToe[F]): F[Player] =

    def switchPlayer[F[_]]()(implicit ev: TicTacToe[F]): F[Unit] =


Let’s have a look at our first “library” method based off the primitives – trying to determine if a TicTacToe has a winner:

  def winner[F[_] : TicTacToe : Applicative]: F[Option[Player]] = {
    def playerWins(op1: Option[Player], op2: Option[Player], op3: Option[Player]): Option[Player] =
      for {
        p1 <- op1
        p2 <- op2
        p3 <- op3 if p1 == p2 && p2 == p3
      } yield {

    def comboWinner(pos1: Position, pos2: Position, pos3: Position): F[Option[Player]] =
      (info(pos1), info(pos2), info(pos3)).mapN(playerWins)

      .traverse(t => comboWinner(t._1, t._2, t._3))

The structure is set up to look as much like the OO implementation above as I could. Indeed, playerWins is the exact same method. The comboWinner helper is different only because it relies on the mapN – since we list the Applicative constraint on F, this option basically takes a (F[Z], F[Z], F[Z]) and flips it to a F[(Z, Z, Z)], and then since Applicative implies Functor it can .map the playerWins method in. And then the final statement uses an extra traverse and map that aren’t required in the OO version. The fun thing for me in all this was that even though you don’t, in theory, know much about this wrapper F, with these reasonable assumptions (e.g. Applicative) you can still write almost the same thing as in the OO style, you just end up relying on some of the machinery (again, Applicative assumptions).

I’ll note that we expect that we should be able to write the method above using only Applicative and not Monad, because what we’re doing is inspecting the board a bunch of times, but we never use the results of one of those inspections as the input for a subsequent inspection. In fact, I’ll note that I originally used the Monad assumption, and then realized I shouldn’t have to, and you can see the corresponding changes in this diff.

So, that was fun. And basically the story continues for the gameEnded and genTake methods – you can get away with almost the same implementation as the OO code above, you just sometimes have to use some of the Applicative (/Traversable) or Monad-ic machinery (for-comprehensions). Here’s gameEnded which, again, only needs Applicative, and again differs from the OO in a few traverse and mapN calls:

  def gameEnded[F[_] : TicTacToe : Applicative]: F[Option[Result.GameEnded]] = {
    def isDraw: F[Boolean] =
        .traverse(pos => info(pos))
        .map(_.traverse(x => x).isDefined)

    def drawResult: F[Option[Result.GameEnded]] = =>
          if (d) {
          } else {

    def result(op: Option[Player], or: Option[Result.GameEnded]): Option[Result.GameEnded] = => Result.GameEnded(Some(p))).orElse(or)

    (winner, drawResult).mapN(result)

And here’s genTake, where we do need Monad because we’re sequencing operations and using prior results:

  def genTake[F[_] : TicTacToe : Monad](pos: Position): F[Result] = {
    def forceTakeAndCheck(pos: Position): F[Result] =
      for {
        _ <- forceTake(pos)
        ge <-
        _ <- switchPlayer()
      } yield {

    def takeSinceNotDone(pos: Position): F[Result] =
      for {
        op <- info(pos) // op is an Option[Player]
        res <- op.fold(forceTakeAndCheck(pos))(p => (Result.AlreadyTaken(p) : Result).pure[F])
      } yield {

    for {
      ge <- gameEnded
      res <- ge.fold(takeSinceNotDone(pos))(r => (r: Result).pure[F])
    } yield {

One of the funny differences between this and the OO style is that in the OO style, in scala, we’re able to sequence operations by just putting them on different lines and making a val, which in some sense gets bundled up in the for -comprehension’s separate lines. This is some of the intuition for how monad / bind are like the semicolon (that scala doesn’t require, but you could put between lines if you were inspired). The usual caveat that all (mental) models are wrong but some are useful applies.

So, all of the above is still basically setting up a TicTacToe library, but there’s no implementation yet! Our first implementation will use the State monad provided by cats. This is nice, because it means we don’t have to implement our own Monad, as that’s already done for us in cats. For convenience, I create the SGS type alias, fixing the first parameter (the inner state type) of a State to be GameState. This abbreviation will show up again later. In this implementation, you can see all the same GameState machinations we had in the OO version, just all “deferred” because we do them in the context of the State wrapper.

  type SGS[X] = State[GameState, X]

  implicit case object SGSIsTicTacToe extends TicTacToe[SGS] {
    override def info(p: Position): State[GameState, Option[Player]] =
      for {
        game <- State.get[GameState]
      } yield {

    override def forceTake(pos: Position): State[GameState, Unit] =
      for {
        game <- State.get[GameState]
        nb = game.board + (pos -> game.player)
        _ <- State.set(game.copy(board = nb))
      } yield { () }

    override def turn(): State[GameState, Player] =
      for {
        game <- State.get[GameState]
      } yield {

    override def switchPlayer(): State[GameState, Unit] =
      for {
        game <- State.get[GameState]
        _ <- State.set(game.copy(player = Player.other(game.player)))
      } yield { () }

This was my first time actually using the State monad, so it took some getting used to, both the syntax (I’m finding I like the for-comprehensions above, but originally started with some State.apply versions which were given functions) and how to use it later. I just had to remember that State is really a function, so the methods above are building up a composed function, and eventually at some point you have to .run it on a starting state (and then extract the .value depending on how you’re using it).

I didn’t show it for the OO version above, but I also wrote a runRandom method for each implementation, which would generate random gameplay and return who won. Here’s what that looks like:

  def runRandom[F[_] : TicTacToe : Monad](exceptions: Set[Position] = Set.empty): F[Option[Player]] = {
    val rpos = randomPosition(exceptions)
    def cont(r: Result): F[Option[Player]] =
      r match {
        case Result.GameEnded(op)   => op.pure[F]
        case _                      => runRandom(exceptions + rpos)

Now, just as an example, here’s how you could actually use that with the SGS type above (if that SGSIsTicTacToe is in scope):


Property-based tests with an F-wrapper

I was thinking about property-based tests while working on this, so took a little diversion to write some. There’s a few for the “Common” code above, and the “OO” implementation, but I was even more interested to see how things went with the Final-style monadic version, because there’s this F-wrapper in the way all the time. The complete implementation (with links to things to read, and in a project with a build.sbt where things seem to work, which wasn’t trivial), is here, but I’ll talk through some of it in this section too.

To begin, what property or properties do we think we should test? The main one I was interested in trying for was that genTake “does what it’s supposed to.” That means that if the game was already done, you don’t do anything and report that the game was done. And if it wasn’t done, but you asked to take a position that was already taken, you get an AlreadyTaken Result back, and the state hasn’t changed. And, finally, if you do take the position, then the state does change, and wins are checked appropriately.

Translating all of that text to code, basically we’re going to: inspect the state before we call genTake, inspect the state afterwards, and then work through the various scenarios of what could have happened. The following is a version of that which works (though it is a bit involved, and I should move the case class to the containing scope):

  def infoThenGenTakeIsBehaved(pos: Position): IsEq[F[Boolean]] = true.pure[F] <-> {
    case class TestState(maybeEnded: Option[TTTResult],
                         originalPosPlayer: Option[Player],
                         currentPlayer: Player,
                         takeRes: TTTResult,
                         afterTurnPosPlayer: Option[Player],
                         afterTurnCurPlayer: Player)
    def gameIsAlreadyOver(ts: TestState) =
    def alreadyTakenDoneCorrectly(ts: TestState) =
      ts.originalPosPlayer.isDefined && ts.takeRes == TTTResult.AlreadyTaken(ts.originalPosPlayer.get) && ts.currentPlayer == ts.afterTurnCurPlayer && ts.originalPosPlayer == ts.afterTurnPosPlayer
    def playerGetsPosition(ts: TestState) =
      ts.originalPosPlayer.isEmpty && ts.afterTurnPosPlayer == Some(ts.currentPlayer)
    def gameEndsInDraw(ts: TestState) =
      ts.takeRes == TTTResult.GameEnded(None)
    def playerGetsWin(ts: TestState) =
      ts.takeRes == TTTResult.GameEnded(Some(ts.currentPlayer))
    def nextTurnAndStateUpdates(ts: TestState) =
      ts.takeRes == TTTResult.NextTurn && ts.afterTurnCurPlayer == Player.other(ts.currentPlayer)
    for {
      maybeEnded <- gameEnded
      originalPosPlayer <- info(pos)
      currentPlayer <- turn
      takeRes <- genTake(pos)
      afterTurnPosPlayer <- info(pos)
      afterTurnCurPlayer <- turn
      testState = TestState(maybeEnded, originalPosPlayer, currentPlayer, takeRes, afterTurnPosPlayer, afterTurnCurPlayer)
    } yield {
      gameIsAlreadyOver(testState) ||
      alreadyTakenDoneCorrectly(testState) ||
      (playerGetsPosition(testState) &&
        (gameEndsInDraw(testState) || playerGetsWin(testState) || nextTurnAndStateUpdates(testState)))

I decided to define the various helper defs inside to make it a little more readable, so that the final yield basically matches the text description above. The funny little hidden bit in there is the true.pure[F] <-> { hiding at the top. The reason for it is that the for-comprehension yields an F[Boolean], and we have to give scalacheck a Prop, and so the property we’re giving it is “the F-wrapped boolean is always an F-wrapped true”. This relies on somebody else passing us a way to compare F-wrapped booleans, which we provide for our SGS type (with Final implementation above) with the following implicit val:

  implicit val eqSQS: Eq[SGS[Boolean]] = new Eq[SGS[Boolean]] {
    def eqv(a: SGS[Boolean], b: SGS[Boolean]): Boolean =
      List.fill(200)(genGameState.sample).flatten.forall(s => {
        val ans = a.runA(s).value == b.runA(s).value
        if (!ans) {
          println(s"Failed on:\n$s")

Recalling that SGS is our type alias for State[GameState, _], basically, we show here that two States are the same if they evaluate to the same value for any (sample of) same inputs. Effectively, we’ve pushed a generator into our comparison method. I don’t feel like this is necessarily the “right” way to set things up with this testing framework, but I haven’t seen a different way yet. Note the println which helps show what input state it failed on, which I’d rather not have to include explicitly, but am note sure how to avoid, but still get some insight into which cases fail.

Initial and Free styles

The second part of the chapter in “Book of Monads” pivoted toward the Initial and (various) Free styles. Here we look at Initial, which got the least attention from me as I read more and realized Free did everything Initial aimed for, just better. But let’s see how that evolves. Initial is interesting in that it’s the only version we’ll see where we actually have to provide our own evidence that the type is a Monad.

Initial Style

So, to begin, the idea with the Initial (and Free) style is to encode the methods of your monad as data types. Basically each method becomes it’s own type (case classes here). Each input parameter becomes a constructor argument, and the result argument corresponds to another constructor argument, except we wrap it in a “continuation”. That is, instead of returning a Result, we let somebody tell us what they’d do to pick up work after we gave them a Result. It probably helps to look at the code:

  sealed abstract class TicTacToe[A]
  case class Info[A](p: Position, k: Option[Player] => TicTacToe[A]) extends TicTacToe[A]
  case class Take[A](p: Position, k: Result => TicTacToe[A]) extends TicTacToe[A]
  case class Done[A](a: A) extends TicTacToe[A]

Note the additional Done type. If we only had a way for people to keep telling us what they’d do once we produced an answer, but no way for anybody to say we were done, we’d never stop. So the Done gives you that way to stop, because it has no continuation.

Now, that TicTacToe takes a type parameter, which means we can ask if it is a Functor, Applicative, or Monad. Indeed, it is, and writing out the evidence of it being a Functor isn’t too bad (you .map the result of the continuations), nor is Monad (you .flatMap the continuations, which sorta feels like cheating because you’re trying to define flatMap at the time, but recursion is a funny thing). However, Applicative took me a long time to stare at, to write out all the types of things I had in each case and try to jam them all together to return a thing I needed. After I got through a few examples, the remaining cases clicked a bit, and the pattern emerged. I’ll spare you all the code here, but you can check it out in the repo. Also, I never worked out the tail recursive flatMap implementation – so it goes.

Like I said, there seems to be little point writing your own Initial style monad, because Free bundles it all up for you, so let’s look at that.

Free monads! Come get your free monads!

The idea here is to set up case classes for your methods, like in the Initial style above, though they actually look a little more reasonable, hiding the continuation:

  sealed abstract class TicTacToeA[A]
  case class Info(p: Position) extends TicTacToeA[Option[Player]]
  case class Take(p: Position) extends TicTacToeA[Unit]
  case object Turn extends TicTacToeA[Player]

Here, the result types of the “methods” show up on the right, which sorta matches where we (in scala anyway) tend to look for result types, so that’s nice. If you change extends to :, and case class to def, you get basically the Final-style trait we set up. There’s a bit of funniness with the A at the end of the name, TicTacToeA. That’s because this thing we’re defining is an “algebra” (for some meaning). We then use the cats-provided Free machinery to make a monad out of that:

  type TicTacToe[A] = Free[TicTacToeA, A]

We also define some convenience methods that bring us back to “normal” looking defs, of the same shape as the methods in the original Final-style trait:

  def info(p: Position): TicTacToe[Option[Player]] =
    liftF[TicTacToeA, Option[Player]](Info(p))

  def take(p: Position): TicTacToe[Unit] =
    liftF[TicTacToeA, Unit](Take(p))

  def turn(): TicTacToe[Player] =
    liftF[TicTacToeA, Player](Turn)

After that, the implementations of the methods are the same as in Final style, and the only thing that changes is the method signature, where we don’t need the generic type parameter for the method, because we know the wrapper type is the TicTacToe type we just made. For example, here’s the signatures for winner (very similar to the difference between the typeclass and OO styles):

// Final style
def winner[F[_] : TicTacToe : Applicative]: F[Option[Player]]
// Free style
def winner: TicTacToe[Option[Player]]

It’s not really surprising that the implementations wouldn’t have to change. After all, the info, take, and turn methods for the Free implementation above are basically our evidence that this Free TicTacToe monad is in the Final-style TicTacToe typeclass. Again, cats does all the hard work of showing that, since it’s a Free, it’s a monad, so we’re off the hook for having to do that ourselves.

If I may be permitted to blabber a bit, I’ll note that when I think about “Free” things, I think about there being a (smallish) set of generators (like Info, Take, and Turn), and the free construction corresponds to lists of those generators. That is, I can do [Take, Take], or [Take, Turn, Info, Info, Take], or whatever. The Free monad’s job is to remember that whole sequence. Additionally, free things are supposed to be adjoint (which I won’t define) to “forgetful” things. In this case, we could “forget” the monad structure, and only remember the functor structure. The things we will have forgotten are the pure (captured by the Done from initial style, and hidden in the Free machinery) and the flatMap (which, again, Free‘s giving us by chaining together the generators). Certainly others have written this all up better than I am. Category Theory for Programmers, from Bartosz Milewski’s blog, is great. A post related to some of the current topic is this one on F-algebras. Returning, though, to the code at hand…

Now that we’ve set up our tic tac toe library in the Free style, we still need an implementation. This chain of generators, any combination of calls to the info, take, and turn methods, has to actually be used somewhere. Let’s convert it all into some State manipulations, following our Final style implementation:

  def freeState: TicTacToeA ~> SGS =
    new (TicTacToeA ~> SGS) {
      def apply[A](fa: TicTacToeA[A]): SGS[A] =
        fa match {
          case Info(p) => {
            // A must be Option[Player]
            for {
              game <- State.get[GameState]
            } yield {
          case Take(pos) => {
           for {
              game <- State.get[GameState]
              nb = game.board + (pos -> game.player)
              ng = GameState(Player.other(game.player), nb)
              _ <- State.set(ng)
            } yield { () }
          case Turn => {
            for {
              game <- State.get[GameState]
            } yield {

That code converts each of the generator types into the corresponding state manipulation, exactly the same chunks of State-based code as we had in the Free style. What the above allows us to do is to convert a TicTacToe[A] (coming out of the Free[TicTacToeA, _]) into an SGS[A] (again, SGS[A]=State[GameState, A]).

Just for grins, here’s what it looks like to “run” it with random play:


The runRandom produces a (Free) TicTacToe, which we then run through the conversion to SGS with .foldMap(freeState), at which point we can .run it with a StartingGame, and then finally extract the .value.

So… everything’s the same?

After all that, we have ~5 different copies of basically the same code, with some slightly different type signatures. So why bother? I only have vague ideas, because I’m still learning a lot here, but I’ll try to write it out anyway:

Going from OO to Final style, the thing we pick up is the ability to more carefully track other “effects” we might want to expose from an implementation. In the OO style, and in scala in particular, we aren’t required to note that things might throw exceptions or do IO. Going to Final style doesn’t make it a requirement either, but it does make it feasible. We’ll look at this property a little more in the last section of this post, below.

Going from Final to Free is entertaining, bundling up your methods as data objects you can pass around and manipulate. One of the things I haven’t done, but believe should be do-able somehow, is this sort of “compilation” step which might take a Free-style TicTacToe and compress it down so that all calls to info and turn that happen next to each-other boil down to a single call of either (well, collapsing infos that ask about the same Position). The idea there is that calling info on the same position 10 times in a row is really supposed to be exactly the same as doing it once (I guess besides other side effects, like logging or something). Having the whole sequence of operations available to analyze, I think, is supposed to make this sort of compression possible. And then you’d map the smaller thing over to State or whatever other monad as desired. Talking about compressing and compiling a Free TicTacToe reminds me of a recent John de Goes tweet:

I didn’t play with the other next extensions from “Book of Monads”. There’s a freestyle library which makes it easier to combine various free monads (and probably does lots of other useful things). There’s also the Eff monad which, as best I understood it in my relatively quick glance so far, has relatively similar goals. But I learned a lot implementing these other variants, so probably some day I should return to these. And I still haven’t played with zio at all yet…

Other Final wrappers – doobie

Let’s return to Final style briefly. We should be able to provide other implementations of the TicTacToe trait based on other “effects”. So far, we’ve only implemented one based on State transitions. But databases are good too, so let’s try a thing with doobie. Actually, let’s make two. doobie has a notion of a ConnectionIO which, in my relatively limited experience so far, I roughly translate to “an ability to do a query and return some things, if you give me a Transactor“. The Transactor type itself is parametrized on it’s effect, but if we just assume cats-effect‘s IO, then when we give a ConnectionIO a Transactor, we get back an IO, which is still not the thing itself (the result we’re trying to get out of the database), but a basically complete description of how to get the thing itself. Eventually we have to unsafeRunSync it to extract the thing, which we’re only encouraged to do “at the end of the world”, as far out in the edges of our code as we can.

Because I’m lazy, I cheated a bit and didn’t set up a particularly rigorous database for this exercise. Here’s a utility I wrote to do the ConnectionIO to IO conversion, as well as a method to initialize some silly database tables:

object Doo {
  def run[A, T <: Transactor[IO]](transactor: Resource[IO, T])(cio: ConnectionIO[A]): IO[A] =
    transactor.use { xa => cio.transact(xa) }

  def initializeDB[T <: Transactor[IO]](transactor: Resource[IO, T]): Unit = {
    run(transactor)(sql"""DROP TABLE turn IF EXISTS"""
    run(transactor)(sql"""DROP TABLE bps IF EXISTS"""
    run(transactor)(sql"""CREATE TABLE turn (ps VARCHAR)"""
    run(transactor)(sql"""CREATE TABLE bps (rs VARCHAR, cs VARCHAR, ps VARCHAR)"""
    run(transactor)(sql"""INSERT INTO turn (ps) VALUES ('X')"""

Then, some case classes and helper methods to do some of the conversions and unpacking (note that these rely on some methods in the “Common” code this post started with, which I didn’t show – they’re all in the repo):

  case class Turn(ps: String)
  def player(t: Turn): Player =
  case class BoardPositionState(rs: String, cs: String, ps: String)
  def position(bps: BoardPositionState): Position =
    Position(s"${}${bps.cs}").get // sinner
  def player(bps: BoardPositionState): Player =

Given that setup, we can describe some database queries that would reasonably correspond to the methods of the Final style monad:

  object Queries {

    // get the current player owning a position
    def info(p: Position): ConnectionIO[Option[Player]] =
      sql"select rs, cs, ps from bps where rs = ${p.row.toString} and cs = ${p.col.toString}"
        .query[BoardPositionState]  // Query0[BoardPositionState]
        .option                     // ConnectionIO[Option[BoardPositionState]]
        .map(         // ConnectionIO[Option[Player]]

    def take(pos: Position): ConnectionIO[Unit] =
      for {
        curPlayer <- turn
        _ <- sql"delete from bps where rs = ${pos.row.toString} and cs = ${pos.col.toString}"
        _ <- sql"insert into bps (rs, cs, ps) values (${pos.row.toString}, ${pos.col.toString}, ${curPlayer.toString})"
      } yield { () }

    def turn(): ConnectionIO[Player] =
      sql"select ps from turn"
        .query[Turn]  // Query0[Turn]
        .unique       // ConnectionIO[Turn]
        .map(player)  // ConnectionIO[Player]

    def switchPlayer(): ConnectionIO[Unit] =
      for {
        curPlayer <- turn
        nextPlayer = Player.other(curPlayer)
        _ <- sql"update turn set ps = ${nextPlayer.toString}"
      } yield { () }


Indeed, the conversion from there to a Final style implementation of TicTacToe is direct:

implicit case object DoobieConnectionIOTicTacToe extends TicTacToe[ConnectionIO] {
  override def info(p: Position): ConnectionIO[Option[Player]] =

  override def forceTake(pos: Position): ConnectionIO[Unit] =
  override def turn(): ConnectionIO[Player] =

  override def switchPlayer(): ConnectionIO[Unit] =

Now, another reasonable thing we could do, if we had a Transactor sitting around, would be to implement a TicTacToe[IO], roughly similarly to the TicTacToe[ConnectionIO] we just defined:

val t: Transactor = ???
implicit case object DoobieIOTicTacToe extends TicTacToe[IO] {
  override def info(p: Position): IO[Option[Player]] =

  override def forceTake(pos: Position): IO[Unit] =
  override def turn(): IO[Player] =

  override def switchPlayer(): IO[Unit] =

Permit me a moment to cheat a little bit and assume all the implicits can be made to work (even if they have to be explicit), and we can demonstrate how to use these two implementations:

{ // ConnectionIO version
  val transactor: Transactor = ???
{ // IO version

There’s no real surprise there. The IO version already has the and a Transactor provided elsewhere, so we don’t need them down here. But I think the ConnectionIO version is better, because it has delayed who needs to have the Transactor a bit more.

What is surprising (or was to me, anyway, when I started playing with all this), is that the IO version is actually noticeably slow, and the ConnectionIO version is not. Even just in this tiny board, with an in-memory h2 database (which, admittedly, probably wasn’t set up incredibly well), it takes literal seconds for the runRandom[IO] above (on my machine). The runRandom[ConnectionIO] takes no real perceptible time. My understanding of the difference, and again demonstrating the value of ConnectionIO and deferring the decision of the Transactor, is that in the ConnectionIO one, we’re bundling every single manipulation up into one transaction, where in the IO version we might get a few bundled up next to each other, but we still end up making lots of transactions.

It’s really entertaining (to me, with what I know right now), to still try to wrap my head around what’s going on here. We’re generating random moves every step of game play, which affect exactly which database manipulations we make and how many and such, but through the organization above, we’re still able to bundle it all up into one chunk of code that gets executed with one database transaction. I think. Good times.

LDA from scratch

June 13, 2012

The company where I work has a semi-regular, informal book club (for nerds – which we are). Recently, my supervisor has been giving talks on some papers which describe extensions of the Latent Dirichlet Allocation (LDA) topic modelling method. While I felt like I had a reasonable-enough high-level understanding, I didn’t feel like I knew the specifics well enough. If I had to do something where I’d have to make an Author-Recipient Topic Model [pdf], for example, I’d be lost (unless I found an existing R package, or so). I didn’t feel like I knew enough about the background probability, or about how to make a computer do anything given a model description and a bunch of data. So I started investigating things at this level, and presented what I had. There’s still plenty of gaps and uncertainty in my understanding, but I thought it’d be good to get notes together before I got too distracted by other things… So, the main goal with LDA is that you’re given a collection of documents (i.e., a corpus), and you’d like to determine a collection of topics that are represented in the documents (and in which proportion for each document), as well as the words which comprise each topic. You don’t know the topics ahead of time, and aim to just determine them based on the words you see in the document. There’s other approaches for accomplishing this, besides LDA; the most common seems to be Latent Semantic Analysis (LSA), which is very much like Principal Component Analysis (PCA), relying on the Singular Value Decomposition (SVD) of the so-called “term-document matrix”. But that’s all the topic of another day. Today, we’re sticking with LDA, which is a more probability-based approach. In this post, I’ll build up to LDA, starting with much similar models and questions. It’s been a while since I had a probability class, so I started with basics…


Suppose you’ve got a weighted coin, which has a probability \phi of coming up heads. You flip the coin N times, and obtain n heads. What can you say about \phi? Well, the probability of obtaining n heads is given by:


One way we can try to say something about \phi is to ask which value of \phi maximizes this expression. Calculus teachers, rejoice! Since the binomial coefficient in the front is constant with respect to \phi, we can drop it, and simply try to maximize \phi^n(1-\phi^n)^{N-n}. Many students will, naturally, dive right in and differentiate this with product and chain rules. Students who have seen the tricks before (or are especially perceptive) will realize that the expression is maximized when its log is maximized, since log is monotonically increasing. So we aim to maximize n\log(\phi)+(N-n)\log(1-\phi), for 0<\phi<1. This derivative is easy to calculate, and has a single 0, at n/N, which can be verified to be a local/absolute maximum.

So, that’s nice. And a lot of work to determine what everybody probably already knows. If you’ve got a biased coin that you flip 100 times, and obtain 75 heads and 25 tails, your guess probably should be that the sides are weighted 3 to 1 (i.e., that \phi=.75). Knowing that it’s all random to an extent, though, we aren’t certain of our answer. What we’re really like to know is the probability that \phi is 0.75, versus, say, 0.72 (which could also certainly have generated the observed data).

Well, this is what Bayes’ Theorem is for (or, at least, this is an application of it). We just calculated P(n|\phi), and want to, instead, calculate P(\phi|n). Bayes tells us we can do this via

P(\phi|n) = \dfrac{P(n|\phi)P(\phi)}{P(n)}.

The denominator is the same as \int_\phi P(n|\phi)P(\phi)\ d\phi (I’ll write x‘s below to either reduce or compound notational confusion, depending on your viewpoint). Substituting our expression for P(n|\phi), above, and cancelling the resulting binomial coefficients, we obtain

P(\phi|n) = \dfrac{\phi^n(1-\phi)^{N-n}P(\phi)}{\int_0^1 x^n(1-x)^{N-n}P(x)\ dx}.

But there’s still unknown expressions in there! We still need to know P(\phi), in order to find P(\phi|n). That’s sorta frustrating. We could pick any probability distribution for P(\phi), how to we decide which to pick? Well, an easy choice would be the uniform distribution, because then we can just put 1 in the expression above in a few places, and obtain

P(\phi|n) = \dfrac{\phi^n(1-\phi)^{N-n}}{\int_0^1 x^n(1-x)^{N-n}\ dx}.

The denominator there has no \phis in it, and so is “just” a normalization constant (more on this in a moment). That is, P(\phi|n) is basically given by \phi^n(1-\phi)^{N-n}, once you divide by the integral of that expression over the domain (the interval from 0 to 1) so that you’ve got a probability distribution. The distribution we’ve obtained is called the Beta distribution, with parameters n+1 and N-n+1 (the “plus 1s” might become more sensible in a moment). With our 75 out of 100 example from before, this distribution is as follows:

The Beta(x;76_26) distribution

The mode (not the mean) for this distribution is exactly what we calculated before, 75/100.

Now, something somewhat interesting just happened, without our trying much. We didn’t really have many good guesses for what the distribution of P(\phi), so we just picked the uniform distribution. It turns out that the uniform distribution is a special case of the Beta distribution, namely B(x;1,1). What’s noteworthy, if you’re into that sort of thing, is that we started with a “prior” guess for the distribution of \phi of a certain form, B(x;1,1), and then we updated our beliefs about \phi based on the data (the number of heads and tails flipped), and ended up with a “posterior” distribution for \phi of the same “form”, B(x;1+n,1+N-n). This is sort of a happy accident, if you’re required to do this sort of Bayesian analysis by hand, apparently. It makes it easy to update again, if we flip the coin some more times. Suppose we flip several more times and see H heads and T tails. Well, we’d use our latest beliefs about \phi, namely that it came from B(x;1+n,1+N-n), and then update that information as before. All that happened before was that we added the number of heads to the first parameter of the Beta distribution, and the number of tails to the second parameter, so we’d do that again, obtaining B(x;1+n+H,1+N-n+T). In fact, there’s no reason we’d have to cluster our flips into groups – with each flip we could update our Beta distribution parameters by simply adding 1 to the first parameter for a head, and to the second parameter for a tail. Oh, and the more formal description for what’s going on here, besides “happy accident”, is that the Beta distribution is the conjugate prior for the Binomial distribution.

This also suggests that we didn’t have to start with the uniform distribution, B(x;1,1), to have this happy accident occur. We could start with any B(x;\alpha,\beta), and then after observing H heads and T tails, we’d update our beliefs and say that \phi came from B(x;\alpha+H,\beta+T). So maybe there’s a better choice for those initial parameters, \alpha and \beta? In a very real sense, which we’ve already seen with this updating procedure, these parameters are our prior belief about the coin. If we’ve never flipped the coin at all, we’d sorta want to say that \alpha and \beta are 0, but the parameters of the Beta distribution are required to be positive. So we’d say we don’t really know, and choose a “weekly informative” prior, say 0.001 for both \alpha and \beta. The point of choosing such a small prior is that it lets the data have a strong governance on the shape of the distribution we get once we update our beliefs having seen some data. This is because B(x;\alpha+n,\beta+N-n) is very similar to B(x;n,N-n) if \alpha and \beta are small (relative to n an N-n).

If you’d like to play around with actually running some code (and who doesn’t?), below is some R code I threw together. It simulates a sequence of flips of a weighted coin, and plots the progression of updates to our belief about \phi as we run through the flips.

coinWeights <- c(.65,.35)

prior <- c(1,1)

numFlips <- 10

flips <- sample(1:2, numFlips, prob=coinWeights, replace=TRUE)

betaDist <- function(alphaBeta) {
  function(x) {
    dbeta(x, alphaBeta[1], alphaBeta[2])

colors <- rainbow(numFlips)

plot(betaDist(prior), xlim=c(0,1), ylim=c(0,5), xlab="", ylab="")
for (n in 1:numFlips) {
  prior[flips[n]] <- prior[flips[n]] + 1
  func <- betaDist(prior)
  curve(func, from=0, to=1, add=TRUE, col=colors[n])

legend("topleft", legend=c("prior",paste("flip",c("H","T")[flips])), col=c("#000000",colors), lwd=1)

I ran it once, obtaining the following plot: So that’s fun. You might play with that code to get a sense of how choosing bad priors will make it take a while (many coin flips) before the posterior distribution matches the actual coin. Before moving on to more complicated models, I’d like to mention that calling the denominator we obtained when we applied Bayes’ Theorem, above, “just” a normalizing constant, above, is a dis-service. The denominator is a function of the parameters \alpha and \beta, as is called the Beta function. Right off the bat, Calculus teachers can again rejoice at the integral determining B(.5,.5):

B(.5,.5)=\int_0^1 \dfrac{dx}{\sqrt{x}\sqrt{1-x}}=\cdots=\pi.

That integral’s doable by hand, and, unfortunately, an example of an exam question Calculus teachers probably would wet their pants over. But, better than simple tricks where goofy looking integrals “happen” to be related to circles (where did \pi come from, there, anyway?), we also have

B(x,y) = \dfrac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)},


\Gamma(z) = \int_0^{\infty} e^{-t}t^{z-1}\ dt

is the analytic extension of (z-1)! to the complex plane (our binomial coefficients from before seem to have been wrapped up into the normalizing constant, B(\alpha,\beta)). While I’m sure there’s no shortage of places you could read more about either the Beta or Gamma functions, I must say that I was enthralled with Artin’s book on the Gamma function. Delightful. Ok, moving on.


Let’s take a moment to slightly alter our notation, making it easy to extend to more dimensions (number of sides on the coin/die). Let’s write \phi_1 and \phi_2 for the probability of obtaining a head or tail on a single flip, respectively. Similarly, let’s write n_1 and n_2 for the number of observed heads and tails, respectively, with N=n_1+n_2 still the total number of flips. Just to double-check if you’re following, our original probability of obtaining this data from the parameter can now be written

P(n|\phi) = \dfrac{N!}{n_1!n_2!}\phi_1^{n_1}\phi_2^{n_2}.

It’s probably now relatively easy to see how to generalize this. Suppose there are V possible outcomes of our “coin” (/die/…), and that the probability of rolling side i is given by \phi_i (letting \phi=(\phi_1,\ldots,\phi_2)). Suppose we roll our die N times, obtaining side i a total of n_i times (and letting n=(n_1,\ldots,n_N). Given just the sequence of rolls (and the number of sides), what can we say about \phi? The multi-dimensional generalization of the Beta distribution is the Dirichlet distribution. It takes parameters \beta=(\beta_1,\ldots,\beta_V) and is defined over the points x=(x_1,\ldots,x_V) having 0<x_i and \sum_i x_i=1 (i.e., x lies on the hyper-triangle (simplex) spanned by the V canonical unit vectors in V-dimensional space). Its density function is

D(x,\beta) = \dfrac{1}{B(\beta)}\prod_i x_i^{\beta_i-1}.

As before, we can treat B(\beta) as “just” a normalization constant. One useful thing about the Dirichlet distribution is that it’s defined over a particular subset of V-dimensional space – namely, the subset where the coordinates sum to 1 and are all positive. So if we draw a sample \phi from D(x;\beta), then \phi=(\phi_1,\ldots,\phi_V) can be thought of a discrete probability distribution defined on a set of size V. More in line with the die example, a sample from D(x;\beta) gives us weights for the sides of our die. The Beta distribution had the nice updating property described above – if we thought the parameters where \alpha and \beta and then we observed n_H heads and n_T tails, our new belief about the weight of the coin is described by parameters with values \alpha+n_H and \beta+n_T. The Dirichlet distribution updates just as nicely: If we previous thought the parameters were \beta (a vector of length V), and then we observed counts n (n_i the number of times we rolled side i), then our new parameters are simply \beta+n. One of my goals, coming in to this learning project, was to actually figure out how to use OpenBUGS to do Bayesian analysis. Instead of doing the algebra by hand above to determine the posterior distributions (update our beliefs after we see some data), we’re supposed to be able to code up some model and provide it, with the data, to OpenBUGS, and have it return something like the answer. More specifically, BUGS, the Bayesian inference Using Gibbs Sampler, runs a Markov Chain Monte Carlo algorithm (presumably Gibbs sampling, based on the acronym), and returns to us draws from the posterior distribution. So even if we don’t know the analytic expression for the posterior, we can let OpenBUGS generate for us as many samples from the posterior as we’d like, and we can then address questions about the posterior distribution (e..g, what is the mean?) by way of the samples. I’d love, of course, to dig in to the mechanics of all this, but in the process of preparing my talk I found that another guy at work was already a few steps ahead of me down that road. So I’m going to let him tell me about it, which I very much look forward to. For now, my goal will simply be to figure out how to get OpenBUGS to answer the same question we’ve already done analytically in the case of a coin (V=2): Given the sequence of rolls, what can it tell me about \phi? To start, let’s return to our apparently simple die-rolling procedure. What we have is a “generative model” for producing a data set. Specifically, we run through the following steps:

  1. Choose a distribution of weights for sides of our die, by picking \phi from D(\beta).
  2. For each of N iterations,
    1. Roll the die, recording the side, w_i, a sample from the categorical distribution Cat(\phi).

Having written what’s going on in this manner, we’re basically right at a description of the model in the BUGS language. What we’ve just written would get translated as:

model {
  phi[1:V] ~ ddirich(beta[])
  for (n in 1:N) {
    w[n] ~ dcat(phi[])

(Your mileage may vary with ddirich versus ddirch.) However, this model isn’t quite complete enough. We haven’t said anything about \beta in this model, but it certainly isn’t data we’ll be feeding in to the model (that’d be w, the sequence of rolls). This is where I fall back on “wave the wand” – there’s still a fair amount of magic, for me, built into the priors when working with Bayesian inference. I’ll need to learn more about this, but for now, I’ll tell BUGS to use weakly informative priors (like we did with the coin). The complete model, then, is described to BUGS as:

model {
  phi[1:V] ~ ddirich(beta[])
  for (n in 1:N) {
    w[n] ~ dcat(phi[])
  for (v in 1:V) {
    beta[v] <- 0.001

But how does BUGS know what “V” and “N” and “w[n]” are? Well, we have to feed it the data. For this, I’ll let the R package rbugs do the hard work. I’ll work with relatively familiar R objects (which still surprise and frustrate me regularly), and let rbugs translate to and from OpenBUGS for me. The model above needs to be saved in a file, say ‘die.txt’. We can then use the following R code to play with this model and generate some pictures for us:

library(rbugs) # talk to OpenBUGS
library(coda) # MCMC utilities (mostly unused below)
library(gtools) # gives us ddirichlet

# set up parameters
# args <- strsplit("100 c(.7,.3) 1 5000 1000 .001", " ")[[1]]
args <- commandArgs(trailingOnly = TRUE)

# die parameters
n.rolls <- as.integer(args[1]) # 100
pre.phi <- eval(parse(text=args[2])) # c(.7,.3)
n.sides <- length(pre.phi) # 2 (calculated)

# mcmc parameters
n.chains <- as.integer(args[3]) # 1
n.iter <- as.integer(args[4]) # 5000
n.burnin <- as.integer(args[5]) # 1000
default.beta <- as.numeric(args[6]) # .001
# generate data
rolls <- sample(1:n.sides, n.rolls, prob = pre.phi, replace=TRUE)

# precalculate right answers
post.beta <- default.beta + tabulate(rolls)
post.phi.marginal <- cbind(post.beta, sum(post.beta)-post.beta) # phi[s] is Beta(beta[s]-1, sum(beta)-2)
post.phi.mean <- post.beta / sum(post.beta)
post.phi.mode <- (post.beta-1) / (sum(post.beta)-2)
post.phi.max <- apply(cbind(post.phi.mode, post.phi.marginal), 1, function(r) { dbeta(r[1], r[2], r[3]) })
# set up data to pass to bugs <- list(
 V = n.sides,
 N = n.rolls,
 w = rolls
# generate initial values. rbugs doesn't seem to handle empty initial values well
inits <- function() {
 list(beta = rep(default.beta, n.sides))
# parameters to be saved, to compare with our algebraic calculations above
parameters <- c("phi")

# give us some samples! (make sure you have a directory 'tmp' under the pwd)
die.sim <- rbugs(, inits, parameters, "die.txt",
 n.chains=n.chains, n.iter=n.iter, n.burnin=n.burnin,
 bugsWorkingDir="tmp", bugs="/usr/local/bin/OpenBUGS", cleanBugsWorkingDir=TRUE)
# coda provides some utilities for if you know enough about mcmc, which i don't
die.mcmc <- rbugs2coda(die.sim)
# die.mcmc is a list, with length = n.chains
# die.mcmc[[n]] has dimension c((n.iter-n.burnin), # variables)

base.filename = paste(args, collapse="_")

# save some stuff, in case you feel like poking at it
save(die.mcmc, args, rolls, file=paste(base.filename, 'RData', sep='.'))

# set up some drawing utilities
colors <- rainbow(n.sides)
xlim <- c(-.05, max(c(pre.phi, max(post.phi.mean)))+.1)
ylim <- c(-.05, 1.05)

draw.mean.line <- function(x, y, col, lty) {
 lines(c(x,x), ylim, col=col, lty=lty)
 text(x, y, col=col, labels=sprintf("%.03f", x))

draw.samples.density <- function(chains, varname, col, lty) {
 samples <- chains[[1]][,varname]
 den <- density(samples)
 den$y <- den$y / max(den$y)
 lines(den, col=col, lty=lty)
 draw.mean.line(mean(samples), -.03, col, lty)

# draw the sample distributions, as well as the given distrbution
png(paste(base.filename, 'png', sep='.'), width=1000, height=800)
plot(NULL, xlim=xlim, ylim=ylim, xlab=NA, ylab=NA, main=base.filename)

for (idx in 1:n.sides) {
 # actual
 draw.samples.density(die.mcmc, paste("phi[",idx,"]", sep=""), colors[idx], 1)

# show the original chosen values for phi
 draw.mean.line(pre.phi[idx], 1.03, colors[idx], 4)

# plot the algebraic answer, scaled to have max 1
 curve(dbeta(x, post.phi.marginal[idx,1], post.phi.marginal[idx,2]) / post.phi.max[idx], lty=2, add=TRUE, from=0, to=xlim[2])
 text(post.phi.mean[idx], .5+.1*(n.sides/2)-.1*(idx), labels=sprintf("%.03f",post.phi.mean[idx]))
legend("topleft", col=colors, lty=1, legend=paste("[",1:n.sides,"]",sep=""), title="phi")

For grins, here’s the output of one of my runs: Sample run, learning the weights of the sides of a weighted die Can you work out how many rolls there were for each of the sides? Having a script like this makes it pretty easy to investigate some questions:

  1. What happens if I have more rolls (N)? Fewer?
  2. If I pick a bad prior (e.g., a \beta with large values), what happens to the output? How can it be compensated for? To what extent?
  3. What about those iterations and burnin parameters? And just how does BUGS give us these samples anyway? Another day…

Basically there’s a bunch of knobs to twiddle, and all sorts of fun can be had. But let’s see if we can actually get closer to LDA, shall we?

Hierarchical Dice

Let’s add a layer of indirection. Suppose we’ve got a collection of weighted dice (K of them), each with the same number of sides (still V), but each possibly weighted differently. Let’s also suppose we’ve got a separate weighted die with K faces. Let’s call the die with K sides the T-die, and the other dice will be the W-dice. Now instead of just rolling a single die repeatedly, we’ll repeatedly (still N times) do the following:

  1. Roll the T-die to decide which of the W-dice to roll.
  2. Roll the appropriate W-die, and record the result.

Given the sequence of rolls (where a ‘1’ on one W-die is not distinguished from a ‘1’ on another W-die), along with K,V,N, can we determine the weights of the T-die and W-dice? Naturally, we should expect that this will go better when N is relatively large. Perhaps someday I’ll come back and fill in the algebra here, but for now let’s just see what we need to do in order to get OpenBUGS to help us out again. First off, we don’t know the weights of the T-die, and we want it to help us learn those. Let’s say they are \theta_1,\ldots,\theta_K, or more concisely the vector \theta. As before, this is a discrete probability distribution, and we’ve seen that a sample from a Dirichlet gives us such a beast. So it seems sensible to suppose that \theta is chosen from some Dir(x;\alpha), for some hyper-parameter \alpha (a K-dimensional vector). As before, we’ll feed in weakly informative priors. Next up, we have the weights of the sides on each of the W-dice. Before, we only had one such die, and the weights of its sides were \phi. So let’s stick with \phi for these weights, but now it must become a matrix – specifically, it has to be a K\times V matrix. Each row in this matrix is, again, a probability distribution, so we’ll assume \phi_{i,\bullet} is chosen from some Dir(x;\beta), where \beta is a single V-dimensional vector. All together, then, the model looks like:

model {
  theta[1:K] ~ ddirich(alpha[])
  for (k in 1:K) {
    phi[k,1:V] ~ ddirich(beta[])
  for (n in 1:N) {
    z[n] ~ dcat(theta[])
    w[n] ~ dcat(phi[z[n],])
  for (k in 1:K) {
    alpha[k] <- .1
  for (v in 1:V) {
   beta[v] <- .001

There’s enough parameters around that I haven’t bothered trying to make a nice clear picture of the outputs of this. The changes necessary to simply run the thing, beginning with the R code above, are pretty minimal – take in a couple extra parameters, generate samples via the two-step process, and figure out what you’d like to do with the outputs, and you’re set. Of course, I haven’t worked out the algebraically correct answers here… But we’re now knocking on the door of


Instead of only having a single T-die, suppose you’ve got several (M, say). After your N rolls using your first T-die, and their tabulation to n, from before, you decide to repeat the process with the second T-die (roll it N times, each time rolling the appropriate W-die and recording the answer), and then go through it all again with the third T-die, and so on. Each time you use the same set of W-dice… Ok, the dice viewpoint has gotten stretched far too thin. Who would have that many weighted dice, and be interested in all those rolls anyway? Let’s translate over to document collections. Suppose you’ve got a collection of M documents (a “corpus”, in the parlance of our time), each with N words (for notational convenience, there’s no reason they all really need to have the same length). Suppose, moreover, that each document is actually a blend of any of K topics, and that each document pertains to each topic in different proportions (weights of your T-dice). Finally, suppose each topic is represented by any of V words (finally, V for “vocabulary”!), and that the different topics have different weights for the V words (your W-dice all had potentially different weights for their sides, but all had the same number of sides). Given only the word sequences for the documents, the job of LDA is to uncover the topic distribution for each document (the weights of the sides of the associated T-die) as well as the word frequency distributions corresponding to each topic (the weights of the various W-dice). But really, this is only a step or two harder than uncovering the weights of a biased coin, which we began with. There’s sort of an assumption floating around in here that the order of words in documents doesn’t much matter. The brief look I had the original LDA paper [pdf] has me thinking that this has nice theoretical consequences concerning the form of the model for generating documents. Basically, under our assumption that documents are generated by repeatedly picking a topic (rolling a T-die) and then picking a word based on the word weights for that topic (rolling the appropriate W-die), the parameters have to be as described as in the following BUGS model:

model {
  for (k in 1:K) {
    phi[k,1:V] ~ ddirich(beta[]) # weights of W-die k
  for (d in 1:M) {
    theta[d,1:K] ~ ddirich(alpha[]) # weights of T-die d
    for (n in 1:N) {
      z[d,n] ~ dcat(theta[d,])    # the n-th T-die roll for document d (choose a topic)
      w[d,n] ~ dcat(phi[z[d,n],]) # the n-th W-die roll for document d (choose a word from topic z[d,n])

I took the priors out, above; mostly I still feel like they’re a gap in my understanding. But whatever, good enough. It’s natural at this point to wonder how you’d take the word order into account anyway; I haven’t looked into it much, but there’s at least one paper [pdf] on the subject. One might also wonder how to to determine the number of topics if that constant (K) isn’t given to you with the word count tabulations. Again, this has been addressed [pdf], though I haven’t dug into it a whole lot. And let’s not forget, let’s not forget, that I still have basically no idea how or why Gibbs sampling (and/or other MCMC methods – Metropolis-Hastings, I’m looking at you) works. Finally, I was honestly a little surprised to find that the OpenBUGS model above, and the R script I coded up to use it, took something like 8 hours on sample data I generated to throw at it (maybe 1000 words, 3 documents, 2 topics, or so). I then looked at the lda package in R, manipulated my sample data so I could feed it to that, and got answers out in about a second. Clearly there’s something else going on there. My investigations so far have me thinking the difference is probably described in this paper, but it, along with everything else I’ve just mentioned, is still in the “todo” pile. So, more questions than answers, though I feel like I answered a few questions for myself in all of this. It’s probably some sort of life lesson that learning one thing just opens up the doors for not knowing a whole bunch more, or something. Hopefully I’ll keep chipping away on these or related topics (or, at least, something interesting). And maybe I’ll just find the energy to write about it, to help point out all the things I don’t understand. Anyway, thanks for coming with me on this. Sorry it took me so long, and left so many questions. In addition to the links above, I found this paper helpful/interesting.

Update 2015071: Corrected variable of integration in formula for $\Gamma(z)$, thanks to comment by Joel Nothman.

Printing in foreach’s dopar

November 5, 2011

I’ve recently been doing some work in R, and using the foreach package for some easy parallelization. I’d like to be able to spit out logging messages throughout my code, but the print lines always disappear when they are in the code executed by the %dopar% operator. So I’ve been working on trying to come up with a nice way to still capture/print those logging messages without having to change much of my existing code. I think my ultimate goal would be to write another binary operator which I can use as a drop-in replacement for %dopar%, which uses dopar under the covers, and which gets my print lines to the console. I haven’t quite gotten to that point yet, but I might be close-ish, and, either way, what I’ve got seems like it could either be useful for others, or something others could suggest how to fix.

My Setup

Following this post on closures in R, I set up most of my code as a bunch of ‘objects’ (environments). For example, I construct a logger object by calling the following function:

get.logger <- function() {
  logger <- new.env()
  logger$message <- function(str) {
    print(paste(date(), str))
  environment(logger$message) <- logger

In sort of my ‘main’ code, I then do something like logger <- get.logger(), and then pass this logger around to my other objects which do more heavy lifting. For example, I might have a function:

guy.using.logger <- function(lgr) {
  guy <- new.env()
  guy$.logger <- lgr
  guy$do.stuff <- function() {
            .combine=function(left, right) { c(left, right) }) %dopar% {
      .logger$message(paste('Working on piece', i))

  environment(guy$do.stuff) <- guy


And then I’d have something like the following in my main code (in addition to setting up the parallel backend and such):

logger <- get.logger()
gul <- guy.using.logger(logger)

That final line return a vector with 10 elements, the squares of the integers 1 through 10, as expected. However, the thing I’m trying to overcome is that none of the expected print lines show up on the console.

Misguided Quest

I admit that this may be somewhat of a misguided quest. Changing the %dopar% to simply %do%, all of the print lines show up. And maybe I shouldn’t be doing objects the way I have them. I like to think an argument could be made for still trying to make my logging work within %dopar%. I’m not going to try to make an argument though. If you think it’s a useless mission, go find something else to read – there’s plenty of other interesting things online. Or stick around anyway, perhaps you’ll find something interesting here, or be able to help understand whatever it is I’m missing to make it work the way I want.

Buffering Messages

The basic idea I’ve been working with is to replace the logger with one that doesn’t actually print lines, but stores them up so they can be retrieved and printed later. So, for starters, I made a buffering logger:

get.buffering.logger <- function() {
  logger <- new.env()
  logger$.log <- c()
  logger$message <- function(str) {
    .log <<- c(.log, paste(date(), str))
  logger$get.log <- function() {
    rv <- .log
    .log <<- c()
  environment(logger$message) <- logger
  environment(logger$get.log) <- logger

Now, if I just make logger <- get.buffering.logger() and pass that to guy.using.logger, I can’t successfully ask for the messages after a call to do.stuff(). My guess is that the issue is the lack of thread-safety with vector/c(). This isn’t too upsetting, because I don’t really want to be buffering my messages all that long anyway. If I return the logger’s buffered messages as part of the results at the end of the expression evaluated in %dopar%, then in the .combine of the foreach, I could print those messages there. More explicitly, I could do:

complicated.guy.using.logger <- function(lgr) {
  guy <- new.env()
  guy$.logger <- lgr
  guy$do.stuff <- function() {
    combiner <- function(left, right) {
      cat(right$msgs, sep='\n')
      c(left, right$ans)
            .combine=combiner) %dopar% {
      .logger$message(paste('Working on piece', i))
      list(ans=i^2, msgs=.logger$get.log())

  environment(guy$do.stuff) <- guy


logger <- get.buffering.logger()
cgul <- complicated.guy.using.logger(logger)

And I get all my print lines, and the same return value from the %dopar% call (the vector of the first ten squares) as before. Admittedly, I’m a little surprised that this seems to work, now that I think about it. The shared logger is buffering messages for several threads, which seems sort of troublesome. I guess they’re kept separate enough by the threads, or I’ve just gotten lucky the twice I’ve run the code above.

Jazzing Things Up

If I only had one or two foreach calls in my code, I’d probably just fix them as above and be content enough (if the last thread-safety concerns calmed down). However, there’s clearly room for some automation here. What has to happen?

  1. The expression evaluated by %dopar% needs to return whatever it did before, as well as the accumulated messages during the execution.
  2. The combine function needs to return what it did before, and print out the logging messages.

The easy part, from my perspective, is the second part. Let’s assume we can solve number 1, as we did in the previous section. That is, the new block will return a list, with an ‘ans’ element which has whatever was returned previously, and a ‘msgs’ element. Recall that foreach() actually returns an object of class foreach. Using summary() to get a sense of that object, it seems easy enough to write a function which takes a foreach object and returns a new foreach object which is basically the same as the original, but has a modified .combine function. Let’s try

jazz.fe <- function(fe) {
  r.fe <- fe
  r.fe$combineInfo$fun <- function(left, right) {
    cat(right$msgs, sep='\n')
    fe$combineInfo$fun(left, right$ans)

We need a similar function which modifies the expression to be evaluated (the block of code on the right of %dopar%), so that the return values are manipulated as described above. I messed around a little bit with passing unevaluated expressions around, and came up with the following:

jazz.ex <- function(ex) {
  parse(text=c('r<-', deparse(ex), 'list(msgs=.logger$get.log(), ans=r)'))

And then these bits can be used as in the following guy:

jazzed.guy.using.logger <- function(lgr) {
  guy <- new.env()
  guy$.logger <- lgr
  guy$do.stuff <- function() {
            .combine=function(left, right) { c(left, right) })) %dopar% eval(jazz.ex(quote({
      .logger$message(paste('Working on piece', i))

  environment(guy$do.stuff) <- guy


This code should be compared with the very first guy.using.logger. The only difference between the two is that we wrapped the foreach in a function call, and also wrapped the expression in… a few calls. The ultimate goal of a drop-in %dopar% replacement is tantalizingly close. If all I need to do is call some function on the foreach object, and some other function on the expression, and then I can run %dopar%, that’s easy:

'%jdp%' <- function(fe, ex) {
  jazz.fe(fe) %dopar% eval(jazz.ex(ex))

And then I could just do

failing.ultimate.guy.using.logger <- function(lgr) {
  guy <- new.env()
  guy$.logger <- lgr
  guy$do.stuff <- function() {
            .combine=function(left, right) { c(left, right) }) %jdp% quote({
      .logger$message(paste('Working on piece', i))

  environment(guy$do.stuff) <- guy


logger <- get.buffering.logger()
fgul <- failing.ultimate.guy.using.logger(logger)

No dice:

> fgul$do.stuff()
Error in e$fun(obj, substitute(ex), parent.frame(), e$data) : 
  unable to find variable ".logger"

Rats. I haven’t yet found the right combination of quote, eval, substitute, … to make this work, and I’ve tried several. I’ve been reading about environments and frames and lexical scoping and closures, and still haven’t gotten it right. If I put the definition of %jdp% in the do.stuff function, it actually works. But that’s ugly. Nothing else I’ve come up with works and involves as few changes as wrapping the two arguments to the %dopar% operator, as in the jazzed.guy.using.logger above.

So, if anybody’s got any suggestions, just let me know. In the mean time, I’ll either poke around in the %dopar% source for inspiration, or move on to other things.

An R Mumble

September 18, 2011

This weekend I attended beCamp, “Charlottesville’s version of the BarCamp unconference phenomenon”. Basically a tech meetup with talks, talking outside of talks, food, and drinks. All around, a good time. This was my first beCamp, but I enjoyed it, and will probably try to go to others in the future.

The way the camp goes, you show up Friday night and people stand up and say things they could talk about, if there’s interest. And then people put marks about if they’d attend or whatever, and a schedule of talks for Saturday comes together. When I signed up for the camp a week beforehand, my intention was to spend some free time during the week preparing a talk about R. Didn’t happen, but I stood up Friday and said I could maybe say a few words about it anyway. I got a few ‘Learn’ checkmarks, indicating a little interest. Lucky for all involved, though, I didn’t get officially scheduled to talk. Of course, I didn’t know that until I showed up Saturday, having spent about 2 hours that morning trying to throw something together, just in case. I can’t say I was too disappointed with not having to talk, though it could have been fun. At lunch, there was about an hour of “lightning talks”, just 5 minute talks. While I was sitting there, I figured that would actually be a good amount for me. Just as the line of talkers for that was starting to wind down, and I was thinking about joining it, a handful more queued up. Those talks used all the remaining time, so I was, again, off the hook.

But, hey, I’ve got this venue, I can talk about stuff whenever I want, right? So here’s some notes about R I was just about prepared to mumble about Saturday.

According to the webpage, “R is a free software environment for statistical computing and graphics.” It’s something like an open source clone of the S software for stats processing. The language has some sort of interesting aspects to it, and also some things that really get me sometimes. R has some good built-in “call for help” utilities, making it sort of easy to pick up and do fairly interesting things fairly quickly. Perhaps one of the best things is the huge collection of freely available libraries. I’ll try to talk about some or all of these things, showing by example and not being too formal (me, informal?), but hopefully still fairly close to correct (or, at least, usably incorrect). Folks wishing for more can certainly jump to the official introduction. John D. Cook has some good notes about picking up R if you know some programming (I’m assuming you do), and I also found “The R Inferno” [pdf] informative.

Right, so lets look at some code. I always feel, with a language like R, you don’t start off with “Hello World”, but with calculations. Because it’s just so novel to have a calculator:

> 2+2
[1] 4
> exp(5)
[1] 148.4132
> (1i)^(1i)
[1] 0.2078796+0i
> exp(pi*(1i))
[1] -1+0i

Those lines with the arrow in front are the input lines, where you type, and the other lines are the output. Anyway, pretty exciting, no?

I should mention that you can reproduce this, or other examples, by running R from the command line, or using R-specific editors (or emacs add-ons). I’ve been using RStudio recently, and, while it’s got its issues, it’s got some nice features too. Worth a check-out.

Ok, so R does the normal hello world too, and has if-statements (imagine!) and for loops (how are you not using it already!?). The ‘:’ operator is pretty helpful for making ranges, as the example shows:

> print("hello world")
[1] "hello world"
> if (TRUE) { print("yep") } else { print("nope") }
[1] "yep"
> for (n in 1:10) { print(log(n)) }
[1] 0
[1] 0.6931472
[1] 1.098612
[1] 1.386294
[1] 1.609438
[1] 1.791759
[1] 1.94591
[1] 2.079442
[1] 2.197225
[1] 2.302585

Here’s an example of writing our own function. Pretty straightforward… note that ‘<-‘ is the assignment operator (‘=’ would also work here, but I think ‘<-‘ is more R-ey). There’s another assignment operator, ‘<<-‘, which, I think, is a little bit of a bad habit to use. It’s along the lines of making the thing on the left a global variable, but I’d rather not get too much into that sort of thing (i.e., things I don’t understand at all, instead of things I can at least pretend a little about). I seem to recall reading somewhere the R using lexical scoping rules. If you know what that means, good for you. I think it applies here, because if we didn’t assign to fact, then the call to fact at the end of the function would fail. Oh, and note you can return things in R with return(), but more typically results get returned by just having them be the last line of the function (in my (limited) experience).

> fact <- function(n) {
+     if (n <= 1) {
+        return(1)
+     }
+     n*fact(n-1)
+ }
> fact(3)
[1] 6

There’s a few ways to iterate over a bunch of things and apply a function to each object in turn (i.e., “map”). A simple way is with sapply. In the following example, we use sapply to get the factorial of the values 0 to 4, then plot the results with lines connecting the values. We add a title to the plot, and re-plot the points. Note that we pass 0:4 in to specify the x-coordinates of the values; R indexes from 1 by default (I don’t hold R personally responsible for this, since they’re trying to be S-compatible, but still). Anyway, example (run it yourself for the picture):

> first.facts <- sapply(0:4, fact)
> plot(0:4, first.facts, xlab="n", ylab="fact(n)", type="l")
> title("Factorial")
> points(0:4, first.facts)

Plotting can get just about as complicated and involved as you’d like. So now’s probably a good place to introduce R’s help system. If you want to find out more about a command, just use ‘?’:

> ?plot

There’s another help operator, ‘??’, I’ll use later. (I think I actually saw there’s a third, ‘???’, but I haven’t used it). Another cool thing about R is that you can look at the source for functions. Just type the function without the parentheses:

> title
function (main = NULL, sub = NULL, xlab = NULL, ylab = NULL,
    line = NA, outer = FALSE, ...)
    main <- as.graphicsAnnot(main)
    sub <- as.graphicsAnnot(sub)
    xlab <- as.graphicsAnnot(xlab)
    ylab <- as.graphicsAnnot(ylab)
    .Internal(title(main, sub, xlab, ylab, line, outer, ...))
<environment: namespace:graphics>

Ok, only so informative, since it passes work off with that .Internal call, but the principle is there.

I want to do one more function example, because it shows sort of a fun thing you can do. Most functions allow you to define default values for arguments. R lets you define the value in terms of passed in values. When you call the function, you can pass in values by naming the arguments, so you don’t have to worry about order. And, when you do, you can actually cheat and use initial substrings of the argument names. An example is in order:

> <- function(feet) {
+   feet/3
+ }
> <- function(yards) {
+   yards*3
+ }
[1] 1
[1] 24
> convert <- function(, {
+ 	print(paste(feet, "feet is", yards, "yards"))
+ }
> convert(3)
[1] "3 feet is 1 yards"
> convert(yards=8)
[1] "24 feet is 8 yards"
> convert(y=5)
[1] "15 feet is 5 yards"

(that example is based on something I read in a blog post, but I can’t find the link… it was talking about the things I said in the last paragraph, and did an example of converting polar to cartesian coordinates, if memory serves…) (Update 20111004 – found it).

Anyway, I said R was for stats… we should do some of that.

> # random sample of size 100 from normal with mean 7, s.d. 3
> nums <- rnorm(100, 7, 3)
> # calculate sample standard deviation
> sd(nums)
[1] 2.655835
> # plot a histogram
> hist(nums)
> # have a quick look at nums
> summary(nums)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-0.6268  4.8390  6.8200  6.5500  8.4870 11.8500

One of the really fun things about R is “vectorized” operations (terminology taken from the R Inferno, mentioned above… could be standard-ish though, I dunno). In particular, the usual arithmetic operations are applied componentwise to vectors (and typing ‘2’ is a vector of length one, for example). Shorter vectors are recycled. There’s lots of ways to make vectors, ‘:’ was used above, c() is easy, as is seq(). Anyway, here’s an example:

> (1:10)*c(2,3)
 [1]  2  6  6 12 10 18 14 24 18 30
> c(1*2, 2*3, 3*2, 4*3, 5*2, 6*3, 7*2, 8*3, 9*2, 10*3)
 [1]  2  6  6 12 10 18 14 24 18 30

Fitting lines to data sounds like a statsy thing to do, and R is for stats, so let’s do that. Let’s take the nums we made above and use them as offsets from the line y=5x. Then we’ll fit a line to that data, and it’s slope should be pretty close to 5. Note that ‘.’ isn’t special in R like it is in many other languages, so it’s typically used to separate words in variable names (although, it can be used in formulas, see later). Sort of the equivalent to other languages’ ‘.’ is ‘$’, exemplified below.

> sloped.nums <- 5*(1:100) + nums
> sloped.line <- lsfit(1:100, sloped.nums)
> print(sloped.line$coefficients)
Intercept         X
 6.256159  5.005810

I’ll leave it for the curious to sort out why the intercept is 6ish. Of course, if you’re running this at home, you’ll get different numbers, owing to the random sampling. Anyway, we should probably plot the thing and make sure it seems ok:

> plot(1:100, sloped.nums)
> abline(sloped.line, col="RED")

Looks fine to me. Of course, it’d probably be cool to try something with actual data. If you’ve got a csv sitting around, R is quite good at reading them in. If you don’t have a csv sitting around, I found some census data, and it seems R will open csvs across the wire (which is kinda hot).

> census <- read.csv("")

So… what’d we just get? Well, we could read about it, or just mess with it. We’ve already seen summary(), so we can try that. Below, I’ve only shown the top of the output.

> summary(census)
     SUMLEV      REGION    DIVISION      STATE               NAME
 Min.   :10.00   0: 1   5      : 9   Min.   : 0.00   Alabama   : 1
 1st Qu.:40.00   1:10   8      : 8   1st Qu.:12.00   Alaska    : 1
 Median :40.00   2:13   4      : 7   Median :27.00   Arizona   : 1
 Mean   :38.07   3:18   1      : 6   Mean   :27.18   Arkansas  : 1
 3rd Qu.:40.00   4:14   0      : 5   3rd Qu.:41.00   California: 1
 Max.   :40.00   X: 1   3      : 5   Max.   :72.00   Colorado  : 1
                        (Other):17                   (Other)   :51
 Min.   :   493782   Min.   :   493783   Min.   :   493958
 1st Qu.:  1808344   1st Qu.:  1808344   1st Qu.:  1806962
 Median :  4301261   Median :  4302015   Median :  4328070
 Mean   : 14878497   Mean   : 14878639   Mean   : 14918075
 3rd Qu.:  8186453   3rd Qu.:  8186781   3rd Qu.:  8230161
 Max.   :281421906   Max.   :281424602   Max.   :282171957

Each of these headers is a name we can use to index into the census object. It’s technically a “data frame”, one of the types in R. That means it’s basically a (not-necessarily-numeric) matrix (as you might expect from a csv table), each column has the same number of entries, and within a column, all of the entries are the same type (no mixing strings with numbers). The names are the column names, and you can use the ‘$’ operator to get at any particular column.

> head(census$NAME)
[1] United States Northeast     Midwest       South         West
[6] Alabama
57 Levels: Alabama Alaska Arizona Arkansas California Colorado ... Wyoming

The last line of output indicates that census$NAME is a “factor”, one of the types in R. Basically a list of values all taken from a set. I don’t want to say too much about it. While I’m at it, though, we might as well talk about indexing into R objects. It’s one of the cool things about R. Let’s grab that NAME column, and convert it to strings:

> # $NAME and [['NAME']] do the same thing
> states <- as.character(census[['NAME']])
> states[c(5, 18, 37)]
[1] "West"       "Idaho"      "New Mexico"
> states[-(c(1:10, 15:50))] # negative indices = remove those ones
 [1] "Colorado"                 "Connecticut"
 [3] "Delaware"                 "District of Columbia"
 [5] "Vermont"                  "Virginia"
 [7] "Washington"               "West Virginia"
 [9] "Wisconsin"                "Wyoming"
[11] "Puerto Rico Commonwealth"
> states[c(11:14,51:57)]
 [1] "Colorado"                 "Connecticut"
 [3] "Delaware"                 "District of Columbia"
 [5] "Vermont"                  "Virginia"
 [7] "Washington"               "West Virginia"
 [9] "Wisconsin"                "Wyoming"
[11] "Puerto Rico Commonwealth"
> which(substr(states, 1, 1) == "P") # substr is, apparently, "vectorized"
[1] 44 57
> states[which(substr(states, 1, 1) == "P")]
[1] "Pennsylvania"             "Puerto Rico Commonwealth"

Like I said, census is basically a table. You can pull out rectangular bits of it easily, as the example below shows. And it’s easy enough to generalize that a little, and start pulling out whatever bits you want. If you leave one of the coordinates in the ‘[]’ empty, it means “everything”. So census[,] is the same as census (for some definition of “same as”).

> census[1:6,1:5] # rows 1 to 6, columns 1:5
1     10      0        0     0 United States
2     20      1        0     0     Northeast
3     20      2        0     0       Midwest
4     20      3        0     0         South
5     20      4        0     0          West
6     40      3        6     1       Alabama

Right, we’re supposed to be doing statsy things. We should be actually playing with the data… Let’s pick out some easy bit. Let’s play with the “POPESTIMATE2009”, “BIRTHS2009”, and “DEATHS2009” values for just the states.

> state.rows <- c(6:13, 15:56)
> <- census[state.rows, c("POPESTIMATE2009", "BIRTHS2009", "DEATHS2009")]
> summary(
 POPESTIMATE2009      BIRTHS2009       DEATHS2009
 Min.   :  544270   Min.   :  6407   Min.   :  3140
 1st Qu.: 1802408   1st Qu.: 25748   1st Qu.: 14667
 Median : 4403094   Median : 58800   Median : 36536
 Mean   : 6128138   Mean   : 85094   Mean   : 49617
 3rd Qu.: 6647091   3rd Qu.:100143   3rd Qu.: 58657
 Max.   :36961664   Max.   :558912   Max.   :241733
> dim(
[1] 50  3
> colnames(
[1] "POPESTIMATE2009" "BIRTHS2009"      "DEATHS2009"
> rownames(
 [1] "6"  "7"  "8"  "9"  "10" "11" "12" "13" "15" "16" "17" "18" "19" "20" "21"
[16] "22" "23" "24" "25" "26" "27" "28" "29" "30" "31" "32" "33" "34" "35" "36"
[31] "37" "38" "39" "40" "41" "42" "43" "44" "45" "46" "47" "48" "49" "50" "51"
[46] "52" "53" "54" "55" "56"
> rownames( <- as.character(census$NAME[state.rows])
> head(
           POPESTIMATE2009 BIRTHS2009 DEATHS2009
Alabama            4708708      62265      47157
Alaska              698473      11307       3140
Arizona            6595778     103956      49657
Arkansas           2889450      40539      28003
California        36961664     558912     241733
Colorado           5024748      72537      31679

To get an idea how the variables relate to each other, you can plot each of them against each of the others quickly, with pairs():

> pairs(

We fit a line to data earlier, and can expand on that here. Let’s model the population estimate given the births and deaths. I’m not trying to display some dazzling insight into the data, just show how easy it is in R to do this sort of thing.

> l <- lm(POPESTIMATE2009 ~ .,
> print(l)

lm(formula = POPESTIMATE2009 ~ ., data =

(Intercept)   BIRTHS2009   DEATHS2009
  -73663.83        42.52        52.07

> summary(l)

lm(formula = POPESTIMATE2009 ~ ., data =

     Min       1Q   Median       3Q      Max
-1074015  -205561   -12694   171717   997020

              Estimate Std. Error t value Pr(>|t|)
(Intercept) -73663.831  70178.173   -1.05    0.299
BIRTHS2009      42.518      1.540   27.62   DEATHS2009      52.075      3.099   16.80   ---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 342400 on 47 degrees of freedom
Multiple R-squared: 0.9976,	Adjusted R-squared: 0.9975
F-statistic:  9652 on 2 and 47 DF,  p-value: < 2.2e-16

That looks good and statsy.

If you don’t have your own data, by the way, R comes with some, and many libraries bring in their own, for examples. Just poking around a little (here’s to tab completion), I found, for example, that R comes with a data frame with state-by-state arrest counts by crime (USArrests</code). I mentioned earlier that '?' is good for getting help on a command. If you want to find a command to use, use '??'. This frequently returns lists of packages that you can poke at. One of the great things about R is that many of the libraries you can install come with 'vignettes', providing a little tutorial on some of the tools in the package. R makes it very easy to install packages (install.packages()).

I’m sort of running out of steam for the evening, so think I may wrap this up. I had sort of envisioned talking about a couple of fun packages. Guess I’ll save that for another post (this has gotten long anyway), and try to do some cool examples, with prettier pictures.

The Prime Number Theorem

July 9, 2011

So one of the few math books I’m still fairly interested in poking at sooner rather than later, since graduating and moving on to a computer programming job, is “The Prime Number Theorem”,  by Jameson. It claims to be accessible at the undergraduate level, so I’m sort of optimistic I can follow it. When I’m feeling ambitious, I figure I’ll read the book and do the exercises, and even write posts about what I learn as I go. And cook more. And eat healthier. And…

Right, so I started reading, and got to the first set of exercises. I know I’ve done the second and third before. The second (show there are arbitrarily large gaps between primes) I actually remember doing from my freshman year as an undergraduate, so didn’t put much into it. The third (show p_n\leq 2^{2^{n-1}} and that this implies \pi(x)\geq (\log\log x)/(\log 2)) I remembered doing while I was reading Hardy & Wright more recently, and decided to see if I could get the inequalities right again. It took a few minutes, but I think I got there.

The first exercise, though, took me an embarrassing amount of time. Good start, huh? The problem reads:

Let ≥ 1 and let = {30n+r : 0 ≤ r ≤ 29}. For which values of r is 30n+r not a multiple of 2, 3, or 5? By considering the possible positions of multiples of 7, show that E contains at most seven primes (seven cases, no short cuts!)

Ok, so the first part of that question I got without much difficulty (actually, I got wrong without much difficulty). It’s just all the other primes bigger than 5 but less than 30: 7, 11, 13, 17, 19, 23, 29 (see my error yet?). The second part got me though (since I got the first part wrong). There’s only 7 numbers in that list I just picked out, so what more is there to show? Any 30n+r that’s prime must have an r from that list, because everybody else is divisible by 2, 3, or 5, right?

Nope. Stupid 1. The case r=1 isn’t accounted for. So that means that primes could actually show up in any of 8 spots. So that’s where the hint about multiples of 7 comes in. The first multiple of 7 in the 30 consecutive integers comprising E corresponds to r = 0, 1, …, 6, and then for any of those choices, you can easily write down what the other multiples of 7 are. For example, if the first multiple of 7 in E is with r = 3, then the other multiples are at 10, 17, and 24. Notice that, in this case, the prime 17 would have to get taken out of the 8 possible spots for primes (since it is now known to be a multiple of 7), and we’d be down to 7 primes. The same thing happens in the other 6 cases about the multiples of 7, and we obtain the result.

So that slightly rocky start, those 5 pages of reading and 3 exercises, took just under a week, start to finish (not that I was actively working on it that every day). And, in all honestly, I only sorted out that first one when I sat down to start writing, here, that I couldn’t figure it out. But whatever. There’s my first post since graduation. Perhaps I’ll keep learning some math after all (in addition to the math/stats sorts of things I’m learning for work). Here’s to fewer errors in section 2!

A Math Prezi

November 21, 2010

Recently I had to give a math talk about my work. Previous talks I’ve given I just did on the chalkboard, but this being my last math talk for a while, I thought I might finally try Beamer (nice quickstart). And then I realized I should try Prezi. I thought I’d share my exploration with you, in case you try to decide about something similar. If you want to just cut to the chase and see the final Beamer work (pdf or source tarball), or the final Prezi, go for it. If for some strange reason you actually care about the math, you’re welcome to read the paper (pdf or source tarball) my presentation was based on. My final recommendation: stick with Beamer (until Prezi starts handling TeX).

I started off making a Prezi just to see how it worked, how easy it was to use and such. It’s pretty simple to use, which is nice. Of course, as one comes to expect, it doesn’t handle LaTeX. Ok, so, no worries, I’ll just do some TeX to make pictures I need, and insert pictures. Prezi will actually let you put in pdfs. So what you could do is make a beamer presentation, with a very basic template and no titles, and then use “pdftk” to “burst” the pdf, making a single pdf for each frame, and then use ImageMagick‘s “convert” to change the file type, do some cropping and trimming and things (if desired), and then load all those little pictures where you want them. You probably won’t want to use the actual pictures as path steps in your prezi, because it’ll zoom all the way in, but that’s ok, because you can just wrap a hidden frame where you want it.

So I then made another prezi, putting text where I wanted, thinking I’d then go in and make lots of little pictures, and put them where I wanted. This prezi did lots more zooming and twisting, so it was sorta fun, but I did worry a little if it might turn some people off (or make them motion sick :)). And then I realized I didn’t really want to make all of those little pictures, that the fonts wouldn’t match, and that I’d probably still have resolution issues (I couldn’t find an easy way to make, say, svgs, from tex).

Ok, so, maybe I could cheat a little bit. You can get away with a lot if you just use italics for math. I thought maybe I could use this for most of the math, and then rely on fewer pictures to have to insert. Sadly, prezi won’t do italics (or underline, or bold!). That was fairly surprising to me. No LaTeX I basically expect, but no italics? Well, ok, maybe I can cheat another way. Surely there’s Unicode characters for most of what I want, I could just type those in. But no, prezi (I’d happily blame flash here, I don’t know what wasn’t really working) wouldn’t do that either. I’d type unicode characters, and nothing would show up. I’d copy a unicode character typed elsewhere, and try to paste it in, and nothing. Sigh.

I pretty much gave up at that point, and made a beamer presentation. But the prezi urge just wouldn’t die. I decided that if I took whole frames from my beamer presentation, and added those to my prezi, I would (a) have consistent fonts, (b) wouldn’t have lots of tiny pictures to upload, and (c) probably wouldn’t do as much twisting and spinning and zooming, and would maybe, then, end up with a better presentation. One could pdftk burst and convert like I mentioned before, but I think I was having some issues getting good resolution that way (looking back, I question this, so you may want to play around). So I decided I could take screenshots of every frame, when it was in full-screen mode, and use those as my pictures. ImageMagick’s “import” takes screenshots, and with the ‘window -root’ option, it grabs full-screen. I don’t know how to force xpdf to turn the page from outside the program, so I set up a quick little bash script that would beep, sleep 2 seconds, and then import a screenshot. Switch workspace to my full-screened xpdf (put ‘xpdf*FullScreenMatteColor: white’ in .Xdefaults and do ‘xrdb -merge .Xdefaults’ before running xpdf), and just press the spacebar after every beep. Badabing. 2-3 minutes later, and I’ve got a 1280×800 image of every frame from my presentation. Upload to prezi, twist, zoom, and you’re done.

Except, no. Prezi has the dis-fortune of having to work on any screen resolution. I don’t know what they’re magic zoomer does to decide how to zoom, but things don’t go great if you try to present my prezi fullscreen at a different resolution. And, unfortunately, I ended up in a room with a computer whose screen was at a different resolution, and that I wasn’t allowed to change. So I fell back on my beamer talk. :-/ People said it was good anyway.

According to the support forums and associated prezi, I maybe should have been able to figure this out. Perhaps converting to pngs was my downfall. I really thought I tried keeping things as pdfs. I’ve been wrong before. Oh well, it’s over now. And I did learn other fun stuff with all this fiddling.

While I was doing all this, I finally figured out how to use pstricks to do text in a circle (or along other paths). I think I’ve tried before, but never quite figured out what was going on with \PSforPDF, even if I was able to put text on a path. But this time I got it, thanks to this presentation [pdf] (which I probably looked at before, too). If you’re working on project.tex, put all the pstricks stuff in \PSforPDF blocks, run latex, dvips, and ps2pdf, eventually outputting project-pics.pdf. Then when you run pdflatex project.tex, since you’re doing pdflatex instead of latex, \PSforPDF probably expands to some sort of \includegraphics[project-pics], and imports the *-pics.pdf (making that -pics assumption about the filename) you just made. Good stuff. LaTeX will probably be one of the things I miss the most about getting out of mathematics in academia.

Palindromes in Python

October 23, 2010

So you may have noticed my lack of math posts recently. Things change, you know. I might still have a few left in me, at some point. Either way, this blog may start having more programming. I’m putting this one here instead of at Euler’s Circus, only because it’s not a Project Euler problem. Getting on toward time to consolidate my blogging… maybe when I’m done with grad school (soon!).

Anyway, right. So here’s a programming post. I’m not sure how I came across the Programming Praxis blog, but one of their recent posts caught my eye: find the longest palindrome in a string. Given a string, what is the longest palindrome contained in that string? I thought about it (actually thought about it, instead of just thinking I would like to think about it) for a few minutes this evening, and came up with a little something that pleased me. Python‘ll do that.

So the idea is to build an array whose n-th item is an array of all of the indices in the string where a palindrome of length n begins. I guess that means I think of this array as 1-based, instead of 0-based, but whatever. The reason I like this idea is that if you’ve found all of the palindromes of length less than n, you can easily pick out the indices where palindromes of length n start as those indices, i, where (1) the string has the same character at i and i+n-1, and (2) i+1 is a palindrome of length n-2. The other reason I like this idea is that it’s really easy in python:

# assume text is a string of only lowercase letters, nothing else
text = "astringiwouldreallyliketofindlongpalindromesin"
tlen = len(text) # convenience, it's a few characters shorter to type
# keep track of where palindromes of all lengths are
# longs[n][m] = 1 if there is an (n+1)-digit pali beginning at index m
# and longs[n][m] is undefined otherwise
# prep longs with the easy cases, 1- and 2-digit palindromes
longs = [ [n for n in xrange(0,tlen)],
          [n for n in xrange(0,tlen-1) if text[n] == text[n+1]] ]

# iterate
while len(longs[-1]) > 0 or len(longs[-2]) > 0:
    curlen = len(longs)
    longs.append([n for n in xrange(0, tlen-curlen)
                  if text[n] == text[n+curlen]
                  and n+1 in longs[-2]])

winners = longs[-3] # [-1] and [-2] are empty, after all
winlen = len(longs)-2
print "\n".join([text[n:n+winlen] for n in winners])

So, there’s that. All of the interesting work gets done in that one line in the loop. You gotta be a little bit careful, watching for off-by-ones, but this seems to work. I thought about trying some other algorithms to compare efficiency. But I like this one, even if I’d eventually find it isn’t the best in terms of efficiency. I had another little bit in there that I eventually realized was unnecessary, but I still thought was fun:

# build a dictionary, character: array, called locs, with
# locs[c] = [locations of occurences of 'c' in text]
locs = dict([(c,[]) for c in string.lowercase])
map(lambda nc:locs[nc[1]].append(nc[0]), enumerate(text))

There may be a built-in python-y way to do this, but I like this anyway. I guess I’m just a sucker for map. And list comprehension.

A homotopy limit description of the embedding functor

February 25, 2010

(Approximate notes for a talk I gave this afternoon.)


So according to the title, I should be telling you about {\text {Emb}(M,N)}, as a functor of manifolds {M} and {N}. That’s perhaps a bit ambitious. I’ll only be thinking about {M=\coprod _{m} D^{n}}, a disjoint union of closed disks, and I’ll actually fix {M}. And instead of letting {N} range over any manifolds, it’ll just be {V}, ranging over real vector spaces.

By taking derivatives at centers, we obtain a homotopy equivalence from {\text {Emb}(M,V)} to something I’ll maybe denote {\text {AffEmb}_{0}(\coprod _{m} \mathbb {R}^{n},V)}. This is componentwise affine (linear followed by translation) maps {\coprod _{m} \mathbb {R}^{n}\to V} whose restriction to {\epsilon }-balls around the 0s is an embedding. I may use {\underline{m}=\{1,\ldots ,m\}}, and write {\underline{m}\times \mathbb {R}^{n}}. And I’ll actually send everything to spectra, instead of topological spaces, via {\Sigma ^{\infty }}.

So really I’ll be talking about

\displaystyle  \Sigma ^{\infty }\text {AffEmb}_{0}(\underline{m}\times \mathbb {R}^{n},V)

as a functor of {V}. I’ll be lazy and write it {\Sigma ^{\infty }\text {Emb}(M,V)}, having fixed an {m} and {n} to give {M=\underline{m}\times D^{n}}.

Useful Cases

The first useful case is when {M=\underline{m}} (i.e., {n=0}). Then embeddings are just configuration spaces, {F(m,V)}. I’ve talked before about a homotopy limit model in this case, but let me remind you about it.

The category I have in mind is something I’ll denote {\mathcal{P} (m)}. Objects will be non-trivial partitions of {\underline{m}}, and I’ll probably denote them {\Lambda }, perhaps writing {\Lambda \vdash \underline{m}}. Non-trivial means that some equivalence class is larger than a singleton. I’ll write \Lambda\leq \Lambda' if {\Lambda } is finer than {\Lambda'}, meaning that whenever {x\equiv y\pmod{\Lambda }}, then {x\equiv y\pmod{\Lambda '}}.

The functor I want is something I’ll denote {\text {nlc}(-,V)} and call “non-locally constant” maps. So {\text {nlc}(\Lambda ,V)} is the set (space) of maps {f:\underline{m}\to V} such that there is an {x\equiv y\pmod{\Lambda }} where {f(x)\neq f(y)}. Equivalently, maps which don’t factor through {\underline{m}/\Lambda }.

Depending on which order you make your poset {\mathcal{P} (m)} go, {\text {nlc}(-,V)} is contravariant, and you can show

\displaystyle  \Sigma ^{\infty }F(m,V)\simeq \text {holim}_{\mathcal{P} (m)}\Sigma ^{\infty }\text {nlc}(-,V).

The second useful case is when {M=D^{n}} (i.e., {m=1}). Then the space of embeddings is homotopy equivalent to the space of injective linear maps. You can obtain a homotopy limit model in this case that looks strikingly similar to the previous case. Namely, you set up a category of “linear” partitions (equivalent to modding out by a linear subspace), and take the holim of the non-locally constant maps functor, as before.

I like to think of both cases as being holims over categories of kernels, and the non-locally constant maps of some kernel are maps that are known to fail to be not-injective for some particular reason. Embeddings fail to be not-injective for every reason.

But there’s another model I want to use in what follows. My category will be denoted {\mathcal{L} (n)}, and objects in the category will be vector spaces {E} with non-zero linear maps {f:E\to \mathbb {R}^{n}}. Morphisms from {f:E\to \mathcal{R}^{n}} to f':E'\to \mathbb{R}^n will be surjective linear maps \alpha:E\to E' with f=f'\circ \alpha. You might think of the objects as an abstract partition ({E}) with a map to {\mathbb {R}^{n}}, which then determines a partition of {\mathbb {R}^{n}}, by taking the image.

The functor out of this category is something I’ll still denote {\text {nlc}(-,V)}. On an object {f:E\to \mathbb {R}^{n}} it gives all non-constant affine maps {E\to V}. Arone has shown

\displaystyle  \Sigma ^{\infty }\text {Inj}(\mathbb {R}^{n},V)\simeq \text {holim}_{\mathcal{L} (n)}\Sigma ^{\infty }\text {nlc}(-,V).

Product Structure

The space of embeddings we are considering splits, as

\displaystyle  \text {Emb}(M,V)\simeq F(m,V)\times \text {Inj}(\mathbb {R}^{n},V)^{m}.

We know a homotopy limit model of each piece of the splitting, and might hope to combine them into a homotopy limit model for the product. This can, in fact, be done, using the following:

Lemma: If {\Sigma ^{\infty }X_{i}\simeq \text {holim}_{\mathcal{C}_{i}}\Sigma ^{\infty }F_{i}}, {i=1,\ldots ,k}, then

\displaystyle  \Sigma ^{\infty }\prod _{i} X_{i} \simeq \text {holim}_{*_{i}\mathcal{C}_{i}}\Sigma ^{\infty }*_{i} F_{i}.

Here {*} denotes the join. For categories, {C*D} is {(C_{+}\times D_{+})_{-}}, obtained by adding a new initial object to each of {C} and {D}, taking the product, and removing the initial object of the result.

Proof (Sketch): Consider the case {k=2}. The idea is to line up the squares:


Both of which are homotopy pullbacks. The equivalence of the lower-right corners follows because join is similar enough to smash, which plays nicely with {\text {holim}}.

So, anyway, applying this lemma and perhaps cleaning things up with some homotopy equivalences, we obtain an equivalence

\displaystyle  \Sigma ^{\infty }\text {Emb}(M,V)\simeq \text {holim}_{\mathcal{P} (m)*\mathcal{L} (n)^{*m}}\Sigma ^{\infty }\text {nlc}(-,V).

Objects in the category consist of a partition {\Lambda \vdash \underline{m}} along with, for {i\in \underline{m}}, linear {f_{i}:E_{i}\to \mathbb {R}^{n}}. To tidy up a little bit, I’ll denote this category {\mathcal{J}=\mathcal{J}(M)}, for join. The functor takes an object as above and returns the set of componentwise affine maps {\coprod _{i} E_{i}\to V} such that either (a) the map is non-constant on some component, (b) when restricted to the image of {0:\underline{m}\hookrightarrow \coprod E_{i}}, the map is non-locally constant with respect to {\Lambda }.

There you have it, a homotopy limit description for the embedding functor.

But not a particularly nice one. If we had an embedding {M\hookrightarrow M'}, then we’d have map {\text {Emb}(M,V)\leftarrow \text {Emb}(M',V)}. It’d be really swell if this map was modelled by a map {\mathcal{J}(M)\to \mathcal{J}(M')} of the categories we are taking homotopy limits over. But that’s not going to happen. What can go wrong? Non-trivial partitions of {M}, when sensibly composed with the map to {M'}, may become trivial, and thus not part of the category. This is, essentially, because several components of {M} might map to a single component of {M'}. If {M} has two components, and {M'} one, say, where do you send the object consisting of the non-trivial {\Lambda \vdash \underline{2}} paired with some 0 vector spaces?

A More Natural Model

We sort of need to expand the category we take the homotopy limit over, and make it a more natural construction. We actually have an indication on how to do this from the discussion, above, in the case of linear injective maps from a single component. Perhaps we can find a proper notion of “abstract partition”, pair such a beast with a map to {M}, sensibly define non-locally constant, and get what we want. Let’s see how it goes…

An affine space is, loosely, a vector space that forgot where its 0 was. There is, up to isomorphism, one of any given dimension, just like for vector spaces; I’ll denote the one of dimension {d} by {A^{d}}, say. That should be enough of a description for now.

Let me define a Complete Affine Partition (CAP), {\rho }, to be a partition of a disjoint union of affine spaces, such that equivalence classes contain components. That is, everybody that’s in the same component is in the same equivalence class. Given a {\rho }, I’ll denote by {A(\rho )} the underlying component-wise affine space. The data that determines a {\rho } is: a finite set {s} (the set of components), a partition of {s}, and a dimension function, {d:s\to \mathbb {N}_{0}} (non-negative integers). With this information, {A(\rho )} is {\coprod _{i\in s}A^{d(i)}}.

By a refinement {\alpha } from {\rho } to {\rho'}, denoted {\alpha :\rho \to \rho'}, I will mean an affine map {\alpha :A(\rho )\to A(\rho')} so that the “affine closure” of the partition {\alpha (\rho )} is coarser than {\rho'}. I don’t want to spend too much time talking about the affine closure operation, on partitions of a component-wise affine space. If {\rho } and {\rho'} have a single component, a refinement is just a surjective affine map (recall before we had surjective linear maps in {\mathcal{L} (n)}). If {\rho } and {\rho'} have dimension function 0, so basically {\rho =\Lambda } and {\rho'=\Lambda'} (partition of possibly distinct finite sets), a refinement just means {\alpha (\Lambda ) \geq \Lambda'}.

We’re now ready to define a category, which I’ll denote {\mathcal{C}=\mathcal{C}(M)}. The objects will be pairs of: a CAP, {\rho }, along with a non-locally constant affine {f:A(\rho )\to M} (subsequently denoted {f:\rho \to M}). A morphism from {f:\rho \to M} to {f':\rho'\to M} will be a refinement {\alpha :\rho \to \rho'} such that {f=f'\circ \alpha }. This should look familiar to the {\mathcal{L} (n)} construction.

The functor I’ll consider still deserves to be called {\text {nlc}(-,V)}, and it takes {f:\rho \to M} to the set of non-locally constant affine maps {A(\rho )\to V}. We’d really like to be able to say

\displaystyle  \Sigma ^{\infty }\text {Emb}(M,V)\simeq \text {holim}_{\mathcal{C}(M)}\Sigma ^{\infty }\text {nlc}(-,V).

It seems sensible to try to do so by showing that

\text {holim}_{\mathcal{C}(M)}\Sigma ^{\infty }\text {nlc}(-,V)\simeq \text {holim}_{\mathcal{J}(M)}\Sigma ^{\infty }\text {nlc}(-,V),

since {\mathcal{J}(M)\subseteq \mathcal{C}(M)}, and we know the homotopy limit over {\mathcal{J}(M)} has the right homotopy type. This is our goal.

Semi-direct Product Structure

I’ll use the semi-direct product notation for the Grothendieck construction, as follows. Recall that for a category {D}, and a functor {F:D\to E}, the Grothendieck construction is a category, which I’ll denote {D\ltimes F}, whose objects are pairs {(d,x)} where {x\in F(d)}. Morphisms {(d,x)} to {(d',x')} are morphisms {h:d\to d'} such that {F(h)(x)=x'}. Of course, my functors are all contravariant as defined, so you have to mess about getting your arrows right. Best done in private.

I claim that {\mathcal{C}} can be written as a Grothendieck construction. Actually, it can in a few ways. The obvious way is to set {\mathcal{U}} to be the category of CAPs {\rho }, paired with refinements. The functor you need is then {\text {nlc}(-,M)}. You find that {\mathcal{C}=\mathcal{U}\ltimes \text {nlc}(-,M)}.

But there’s another way to slice it. Let {\mathcal{U}_{0}} be the category of CAPs {\rho }, along with functions {f_{0}:\rho \to \underline{m}}. Now the functor you need is not all non-locally constant maps to {M}, but only those that are lifts of {f_{0}}. You might denote this set {\text {nlc}_{f_{0}}(\rho ,M)}. I’m tired of all the notation, so let me let {\mu } denote this non-locally constant lifts functor. We have, then {\mathcal{C}=\mathcal{U}_{0}\ltimes \mu }.

While I’m simplifying notation, let me also write {\nu } for {\Sigma ^{\infty }\text {nlc}(-,V)}. Notice that it is actually a functor from {\mathcal{U}}, and thus from {\mathcal{U}_{0}}.

Let’s return to the category {\mathcal{J}(M)} again. It has the same structure. In fact, we just need to pick out of {\mathcal{U}_{0}} the subcategory of CAPs whose set of components is {\underline{m}}, and where {f_{0}} is the identity on {\underline{m}}. Calling this subcategory {\mathcal{R}}, we have {\mathcal{J}=\mathcal{R}\ltimes \mu }.

Summarizing all the notation, our goal is to show that

\displaystyle  \text {holim}_{\mathcal{U}_{0}\ltimes \mu }\nu \simeq \text {holim}_{\mathcal{R}\ltimes \mu }\nu .

The first thing I’d like to do is use twisted arrow categories to re-write things, so perhaps I should tell you about these categories first. If {D} is a category, the twisted arrow category, {{}^{a}D} has objects the morphisms of {D}. Morphisms from {d_{1}\to d_{2}} to {d_{1}'\to d_{2}'} are commuting squares

If {F} and {G} are contravariant functors from {D}, one can check that {(d_{1}\to d_{2})\mapsto \text {mor}(F(d_{2}),G(d_{1}))} is a covariant functor from {{}^{a}D}. I’ll denote it {G^{F}}. One can show that

\displaystyle  \text {holim}_{D\ltimes F}G\simeq \text {holim}_{{}^{a}D}G^{F}.

Using this, we’re hoping to show

\displaystyle  \text {holim}_{{}^{a}\mathcal{U}_{0}}\nu ^{\mu }\simeq \text {holim}_{{}^{a}\mathcal{R}}\nu ^{\mu }.

Proof Outline

We’ve got {{}^{a}\mathcal{U}_{0}\supseteq {}^{a}\mathcal{R}}. Between them lies a category I’ll denote {\mathcal{U}_{0}\to \mathcal{R}}, consisting of arrows {u\to r} with {u\in \mathcal{U}_{0}}, {r\in \mathcal{R}}. Morphisms are “twisted” commuting squares, as they should be, as part of the twisted arrow category. One can reduce the holim over {{}^{a}\mathcal{U}_{0}} to one over {\mathcal{U}_{0}\to \mathcal{R}}, and from there to one over {{}^{a}\mathcal{R}}.

To reduce from {\mathcal{U}_{0}\to \mathcal{R}} to {{}^{a}\mathcal{R}}, one can show that for all {u\to r\in \mathcal{U}_{0}\to \mathcal{R}}, the over-category {{}^{a}\mathcal{R}\downarrow (u\to r)} is contractible. In fact, this result seems to rely very little on our particular {\mathcal{R}} and {\mathcal{U}_{0}}, and doesn’t depend on the functors, {\mu }, {\nu }, or {\nu ^{\mu }}.

For the reduction from {{}^{a}\mathcal{U}_{0}} to {\mathcal{U}_{0}\to \mathcal{R}}, one shows that for all {u_{1}\to u_{2}\in {}^{a}\mathcal{U}_{0}}, we have

\displaystyle  \text {mor}(\mu (u_{2}),\nu (u_{1}))\stackrel{\sim }{\rightarrow } \text {holim}_{\substack {(u_1\to u_2)\to (u\to r)\\ \in (u_1\to u_2)\downarrow (\mathcal{U}_0\to \mathcal{R})}}\text {mor}(\mu (r),\nu (u)).

Essentially this shows that {\nu ^{\mu }}, as a functor from {{}^{a}U_{0}}, is equivalent to the right Kan extension of it’s restriction to a functor from {\mathcal{U}_{0}\to \mathcal{R}}. And the homotopy limit of a right Kan extension is equivalent to the homotopy limit of the restricted functor.

It is in this second reduction, {{}^{a}\mathcal{U}_{0}} to {\mathcal{U}_{0}\to \mathcal{R}}, that we rely on information about our categories and functors ({\mu }, in particular). Pick your object {u_{1}\to u_{2}\in {}^{a}\mathcal{U}_{0}}. You can quickly reduce the crazy over-category above to just {u_{2}\downarrow \mathcal{R}}. Now remember {u_{2}} is a CAP with a function to {\underline{m}}. I’ll denote it {f_{0}:u_{2}\to \underline{m}}. If this function is locally constant (all objects within an equivalence class get sent to the same point), then you sort of replace {u_{2}} with an object obtained by taking affine and direct sums of it’s components. The result is an object of {\mathcal{R}}, but from the perspective of {\mu }, the two objects give equivalent spaces of lifts. Alternatively, if {f_{0}:u_{2}\to \underline{m}} is non-locally constant, then every lift {u_{2}\to M} is non-locally constant, and so {\mu (u_{2})\simeq *}.

This all works out to be useful in the whole proof. But I’ll maybe save all that for another day.

Approximating Functions of Spaces

February 25, 2010

The branch of mathematics known as topology is concerned with the study of shapes. Whereas shapes in geometry are fairly rigid objects, shapes in topology are much more flexible; topologists refer to them as “spaces.” If one space can be flexed and twisted and not-too-drastically mangled into another space, topology deems them to be the same. It becomes much more difficult, then, to tell if two spaces are different. A primary goal in topology is to find ways to distinguish spaces.

Another fundamental question in topology is concerned with the ways to put one space into another space – to understand the functions between spaces. Each space is a collection of points. A function from space X to space Y is a way to assign points in X to points in Y. If X is a collection of students, and Y a collection of tables, then a function from X to Y is a way to assign each student to a table. In topology, we don’t allow just any function from X to Y. While the spaces are flexible, we have to be careful not to separate points from X that are close to each other. Using the students and tables example, we might think about two students holding hands as being close. These students could be placed at the same table, or perhaps neighboring tables, but cannot be separated across the room. A function that doesn’t separate points too much is called “continuous,” and these are the types of functions topologists consider; topologists tend to call them “maps.”

It turns out that these two primary questions of topology are actually related. If one wants to determine how similar shapes X and Y are, one might begin by introducing a third space, Z, and asking about the maps from Z to X and from Z to Y. If the collection of maps are the same in both cases, one expects that X and Y are similar, at least somewhat. More information can be obtained by replacing Z by another space W, and repeating the process. Typically the spaces Z and W are fairly well-understood spaces, like circles and spheres.

Spaces, and the maps between them, can be quite complicated in general. By restricting to various types of spaces, or types of maps, one is able to make significant progress. One important class of spaces consists of what are called “manifolds.” Intuitively, a manifold is a space which, when viewed from quite close, looks flat (like a line, or a plane), and has no corners. If you were a tiny ant, walking along on a mathematician’s idealized sphere, for example, you might get the impression that you were walking on a giant sheet of paper. Indeed, a similar viewpoint of our own world was common in the not too distant past.

Circles and spheres, and lines and planes themselves, make good examples of manifolds to keep in mind. In fact, lines and planes, and the higher dimensional “Euclidean” spaces, are the fundamental building blocks for manifolds. The defining property of a manifold is that when you get close enough, you are looking at a Euclidean space. Manifolds are essentially spaces obtained by gluing together Euclidean spaces. An interesting example, known as the Möbius strip, can be modeled by taking a strip of paper, introducing a half-twist, and taping the ends together. A tiny ant crawling along on the resulting object would have a hard time noticing that it isn’t just crawling along a strip of paper.

If one’s attention is restricted to studying manifolds, instead of more general spaces, it makes sense to also restrict the types of maps under consideration. General continuous maps need not respect the information about manifolds that makes manifolds a nice class of spaces (they are reasonably “smooth”). We replace, then, all continuous maps with a more restricted class of maps which preserve the structure of manifolds. A particularly nice such class consists of those maps known as “embeddings.” An embedding will not introduce corners in manifolds, and also will not send two points to the same point (embeddings would place only one student at each table, in the earlier example).

When studying manifolds, then, a topologist may be concerned with the collection of embeddings between two manifolds. If the manifolds are called M and N, then we might denote the embeddings of M into N by E(M,N). This is then a function itself – a function of two variables, M and N. If we fix one of the variables, say we only think about M being a circle, we still have a function of one variable, and have made our study somewhat easier.

Leaving M fixed, how do the values E(M,N) change as N changes? Said another way, if we modify N slightly, what is the effect on E(M,N)? If it is difficult to find E(M,N), how can it be approximated? How can the function itself be approximated?

These questions are strikingly similar to questions asked in calculus. Given a function that takes numbers in and spits numbers out (y=e^x, for example) what happens to the output values (y) if the input value (x) is changed slightly? If we know about the value at a particular point (e^0=1), what can be said about values nearby (e^{1/2}, say)? The answers to these questions lie with the derivative, and its “higher” analogues (the derivative of the derivative, and so on). If one knows about the derivatives of a function at a point, one can create “polynomial” approximations to the function, near that point.

It turns out that something quite similar happens when studying the embedding function (and other functions like it). Some sense can be made of derivatives, polynomials, and best approximations, all in the context of functions of spaces (instead of functions of numbers).

I have been studying the embedding function, and its polynomial approximations, when M is fixed. I let M be a collection of disjoint Euclidean spaces of any dimension; so I might take M to be 3 lines and 2 planes, all separate from each other. I also restrict my attention to E(M,N) only when N itself is a Euclidean space. Since any manifold is built out of Euclidean spaces, the cases I consider are important building blocks to understanding more general embedding functions.

Previous work has already covered some of the cases I consider. If M is a finite collection of points, the collection of embeddings is called a “configuration space.” Loosely, this case covers the idea that embedding may not bring two points together, and is somewhat of a “global” situation. Another case is when M only has one piece, say a single line. Here, one is exploring more the notion that embeddings may not introduce corners, a “local” situation. In both of these cases, the best polynomial approximations for the embedding functions have been identified. Moreover, useful descriptions of the approximations have been obtained.

In the more general situation I consider, I have been interacting with both aspects of embeddings. Since my spaces, M, may have many pieces, I am involved in global aspects of embeddings. Since my M may have pieces of any dimension, I am involved in local aspects of embeddings. Unifying the description of the approximations in these two cases has been my task.


A somewhat different, perhaps more elementary version of this is also available.

The Steinhaus Conjecture

December 23, 2009

Or, perhaps more appropriately, “A” Steinhaus conjecture, he/she (I’m guessing Hugo, so he. Perhaps I’ll look into it) seems to have made a couple. This conjecture (theorem) also goes by the name “The Three Gap Theorem”, or “The Three Distance Theorem”. Which is all a little annoying, I think. It makes looking for references 3 times as hard, I reckon. But it’s a pretty cool result, and I’m glad Dave Richeson brought it to my attention via his blog post on Three cool facts about rotations of the circle.

To write down the theorem, I’ll first introduce the notation \{x\} for the “decimal part” of a real number, defined as x-\lfloor x\rfloor, \lfloor x\rfloor being the largest integer no bigger than x. Since I’ll be thinking about positive x, it is the value of x if you ignore digits to the left of the decimal point. This seems to be fairly common notation. Anyway…

The theorem goes something like this:

Theorem: Suppose that 0<\alpha<1 is irrational. Let N be a positive integer bigger than 1, and consider the N points \{m\alpha\} for 0\leq m <N. These points partition the interval [0,1] into N subintervals. If the distances of these subintervals are calculated, there will be either 2 or 3 distinct distances.

The circle comes in by thinking of the interval [0,1] as a circle with circumference 1. To help visualize it, Dr. Richeson made a pretty sweet GeoGebra applet.

I think this is a pretty initially surprising theorem. My initial shock has worn off just slightly, now that I’ve played with pictures and dug through a proof, but it’s still a wonderful result. I mean, irrational values are supposed to do weird things, right? Their multiples should bounce all over the place in the unit interval. And yet, they partition the circle into just 2 or 3 differently-sized gaps? Crazy talk. Also, the theorem as stated above isn’t as strong as it could be… you can say a bit more about the distances. I think I’ll talk more about it in another post.

I started reading about this theorem, after Dr. Richeson’s post, in the paper by Tony van Ravenstein. As I was reading the proof I got hung up on some details, and found that consulting the paper by Micaela Mayero got me over those difficulties. The paper by Mayero is essentially a formal proof written for the Coq formal proof system, so it sort of makes sense that details will be pretty fully explained in it. Either way though, it’s really not a long or particularly difficult proof (you mostly play with some inequalities).

I may return, in a future post, to talking about the proof, and I’ll certainly come back and tell you as I read more about further consequences and generalizations, and whatever else I find in some other papers I’m planning on looking at. But for now, let me mention a result in van Ravenstein’s paper. He proves that in going from the picture with N points to the picture of N+1 points, the N+1-th point will break up the oldest of the intervals with the largest length. The “age” of an interval is pretty intuitive. If a particular interval, say between multiples \{p\alpha\} and \{q\alpha\} comes in when there are N_0 points, and those two points are still neighbors when there are N_1 points, then the age of that interval, at stage N_1, is N_1-N_0 (plus 1, if you want, it doesn’t matter).

To help picture what’s going on, I made an interactive Sage notebook. If you have an account on, or have Sage installed on your own computer and want to just copy my code over, you can look at my code and play with the notebook. I had hoped that publishing my notebook there would let you play with the interactive bits without you needing an account, but no dice. Sorry.

To give some sense of my notebook, and the theorem, I’ve got some pictures for you.

First, let’s take \alpha=0.3826 or so (basically 1 minus the golden ratio, nice and irrational). I’ve set up my notebook so that points travel from 0, at the top of the circle, clockwise, because that’s how it was done in the papers I was reading, and I thought it’d be less confusing. So here’s the starting picture, when there’s just the points 0 and \alpha:

Along the outside of the circle, at each dot, I list which multiple it is. The “newest” dot is magenta, instead of red (probably not great color choices… mess with my code and make your own :)). In the center of the circle I list the lengths of the intervals, in decreasing order. Along each interval, I also write the age of that interval, and color-code the text to the list of distances. I’ve decided to always have the largest length be red-orangeish, the smallest length blue-ish, and the middle length (if there is one) green-ish.

In the picture above, the interval on the left is clearly the oldest of the longest length intervals, so the theorem is that when we add another point, this interval will get broken up by that point. Which is clearly true in this case.

Here’s another picture, using the same \alpha, but slightly more points, showing that three gaps occur:

And, finally, 20 points:

Here’s a picture using a starting \alpha a little bigger than 0.6, showing 20 points:

I like how the points seem to develop in clusters (also evidenced by Dr. Richeson’s app).

I guess that’s probably enough for now. Like I said, I’m hoping to have plenty more to say about things related to all of this soon…

Postscript: I want to make a few shout-outs. I thought putting them at the end of this post might interrupt any sort of flow of the article (if there is any) a little less.

mixedmath pointed out in the comments that public sagenb notebooks are currently (20130623) down. The code looks terrible in the comments, so I figured I’d just add it here:

defalpha = 0.38197 # golden ratio, ish
maxN = 20 # maximum number of points to put in the circle
tolerance = 10^(-7) # to decide when two floats are equal

# some colors, lower index corresponds to bigger distance
segcolor = [(0.86,0.28,0.06),(0.52,0.80,0.06),(0.19,0.11,0.60)]

# the unit circle
basepic = circle((0,0),1,rgbcolor=(0,0,0))

# floating part of a number
flpart = lambda v: v-int(v)

# a point v units along the circumpherence (of length 1 unit) at radius R
coords = lambda v,R: (R*sin(2*pi*v), R*cos(2*pi*v))

# draw dots on the circle, distinguish the "newest" by color
olddot = lambda v:circle(coords(v,1),0.02,rgbcolor=(1,0,0),fill=True)
newdot = lambda v:circle(coords(v,1),0.02,rgbcolor=(1,0,1),fill=True)

# storage
picturestore = {}

def addtopicturestore(alphaval):
    """ Make the picture for alphaval and all (up to maxN) numbers of points """
    picture = [basepic for m in range(0,maxN+1)] # to go into storage
    multiple = [flpart(m*alphaval) for m in range(0,maxN+1)] # the points
    # we care most about which distances are longest/shortest, and how
    # long each interval has been a certain distances
    # we'll build up these next few arrays as we increment the number of points
    # disttosucc[m] = actual distance to next point
    # agethisdist[m] = how long the interval after the point has been this distance
    # distsize[m] = 0,1,2 if the interval after point m is a big,med,or small interval
    disttosucc = [1] + [-1 for m in range(0,maxN)]
    agethisdist = [1] + [-1 for m in range(0,maxN)]
    distsize = [0] + [-1 for m in range(0,maxN)]
    # now, build up to having all of the points
    # currently, we suppose we only know the 0 point
    for N in xrange(2,maxN+1):
        newpt = multiple[N-1]
        # the new point breaks the oldest interval among those with biggest length
        # so find that interval
        oldestbigdist = distsize.index(0)
        for idx in xrange(oldestbigdist + 1, N-1):
            if distsize[idx] == 0 and agethisdist[idx] > agethisdist[oldestbigdist]:
                oldestbigdist = idx
        # newpt is the successor of oldestbigdist
        # update the only distances that change when adding this point
        splitdist = disttosucc[oldestbigdist]
        disttosucc[oldestbigdist] = newpt - multiple[oldestbigdist]
        disttosucc[N-1] = splitdist - disttosucc[oldestbigdist]
        # reset the age counts for these two new distances
        agethisdist[oldestbigdist] = 1
        agethisdist[N-1] = 1
        # now we recompute which distances are biggest/smallest
        # first, what are the 2 or 3 distances?
        distances = [disttosucc[oldestbigdist], 0, disttosucc[N-1]]
        if disttosucc[oldestbigdist] < disttosucc[N-1]:
            distances[0] = disttosucc[N-1]
            distances[2] = disttosucc[oldestbigdist]
        for idx in xrange(0,N):
            # we're using the fact that there are only 3 distances,
            # and that we already know two of them
            if disttosucc[idx] - distances[0] > tolerance:
                distances[1] = distances[0]
                distances[0] = disttosucc[idx]
            elif distances[0] - disttosucc[idx] > tolerance:
                if distances[2] - disttosucc[idx] > tolerance:
                    distances[1] = distances[2]
                    distances[2] = disttosucc[idx]
                elif disttosucc[idx] - distances[2] > tolerance:
                    distances[1] = disttosucc[idx]
            # while we're at it, update age of un-changed intervals
            if not idx == oldestbigdist and not idx == N-1:
                agethisdist[idx] += 1
        # now that we know the 2-3 distances, we can tell which points have which dist.
        for idx in xrange(0,N):
            smidx = 0
            while abs(distances[smidx]-disttosucc[idx]) > tolerance:
                smidx += 1
            distsize[idx] = smidx
        # finally, build the picture
        dots = [olddot(multiple[m]) for m in xrange(0,N-1)] + [newdot(multiple[N-1])]
        labels = [text(str(m),coords(multiple[m],1.1),rgbcolor=(0,0,1))
                  for m in xrange(0,N)]
        agelabels = [text(str(agethisdist[m]),
                      for m in xrange(0,N)]
        distancelegend = text(str(distances[0]),(0,.1),rgbcolor=segcolor[0])
        distancelegend += text(str(distances[2]),(0,-.1),rgbcolor=segcolor[2])
        if distances[1]:
            distancelegend += text(str(distances[1]), (0,0), rgbcolor=segcolor[1])
        picture[N] += sum(dots)+sum(labels)+sum(agelabels)+distancelegend
    # outside the loop, all pictures have been computed, just store them
    picturestore[alphaval] = picture

# set up the interactive bits
def _( alpha=slider(0.0001,0.9999,0.0001,default=defalpha,label='Distance'),
       N=slider(2,maxN,1,default=2,label='Number of Points') ):
    if alpha not in picturestore:
    show(picturestore[alpha][N], axes=False, aspect_ratio = 1)