Archive for November, 2011

Printing in foreach’s dopar

November 5, 2011

I’ve recently been doing some work in R, and using the foreach package for some easy parallelization. I’d like to be able to spit out logging messages throughout my code, but the print lines always disappear when they are in the code executed by the %dopar% operator. So I’ve been working on trying to come up with a nice way to still capture/print those logging messages without having to change much of my existing code. I think my ultimate goal would be to write another binary operator which I can use as a drop-in replacement for %dopar%, which uses dopar under the covers, and which gets my print lines to the console. I haven’t quite gotten to that point yet, but I might be close-ish, and, either way, what I’ve got seems like it could either be useful for others, or something others could suggest how to fix.

My Setup

Following this post on closures in R, I set up most of my code as a bunch of ‘objects’ (environments). For example, I construct a logger object by calling the following function:

get.logger <- function() {
  logger <- new.env()
  logger$message <- function(str) {
    print(paste(date(), str))
  }
  environment(logger$message) <- logger
  logger
}

In sort of my ‘main’ code, I then do something like logger <- get.logger(), and then pass this logger around to my other objects which do more heavy lifting. For example, I might have a function:

guy.using.logger <- function(lgr) {
  guy <- new.env()
  guy$.logger <- lgr
  guy$do.stuff <- function() {
    foreach(i=1:10,
            .inorder=TRUE,
            .export='.logger',
            .init=c(),
            .combine=function(left, right) { c(left, right) }) %dopar% {
      .logger$message(paste('Working on piece', i))
      i^2
    }
  }

  environment(guy$do.stuff) <- guy

  guy
}

And then I’d have something like the following in my main code (in addition to setting up the parallel backend and such):

logger <- get.logger()
gul <- guy.using.logger(logger)
gul$do.stuff()

That final line return a vector with 10 elements, the squares of the integers 1 through 10, as expected. However, the thing I’m trying to overcome is that none of the expected print lines show up on the console.

Misguided Quest

I admit that this may be somewhat of a misguided quest. Changing the %dopar% to simply %do%, all of the print lines show up. And maybe I shouldn’t be doing objects the way I have them. I like to think an argument could be made for still trying to make my logging work within %dopar%. I’m not going to try to make an argument though. If you think it’s a useless mission, go find something else to read – there’s plenty of other interesting things online. Or stick around anyway, perhaps you’ll find something interesting here, or be able to help understand whatever it is I’m missing to make it work the way I want.

Buffering Messages

The basic idea I’ve been working with is to replace the logger with one that doesn’t actually print lines, but stores them up so they can be retrieved and printed later. So, for starters, I made a buffering logger:

get.buffering.logger <- function() {
  logger <- new.env()
  logger$.log <- c()
  logger$message <- function(str) {
    .log <<- c(.log, paste(date(), str))
  }
  logger$get.log <- function() {
    rv <- .log
    .log <<- c()
    rv
  }
  environment(logger$message) <- logger
  environment(logger$get.log) <- logger
  logger
}

Now, if I just make logger <- get.buffering.logger() and pass that to guy.using.logger, I can’t successfully ask for the messages after a call to do.stuff(). My guess is that the issue is the lack of thread-safety with vector/c(). This isn’t too upsetting, because I don’t really want to be buffering my messages all that long anyway. If I return the logger’s buffered messages as part of the results at the end of the expression evaluated in %dopar%, then in the .combine of the foreach, I could print those messages there. More explicitly, I could do:

complicated.guy.using.logger <- function(lgr) {
  guy <- new.env()
  guy$.logger <- lgr
  guy$do.stuff <- function() {
    combiner <- function(left, right) {
      cat(right$msgs, sep='\n')
      c(left, right$ans)
    }
    foreach(i=1:10,
            .inorder=TRUE,
            .export='.logger',
            .init=c(),
            .combine=combiner) %dopar% {
      .logger$message(paste('Working on piece', i))
      list(ans=i^2, msgs=.logger$get.log())
    }
  }

  environment(guy$do.stuff) <- guy

  guy
}

logger <- get.buffering.logger()
cgul <- complicated.guy.using.logger(logger)
cgul$do.stuff()

And I get all my print lines, and the same return value from the %dopar% call (the vector of the first ten squares) as before. Admittedly, I’m a little surprised that this seems to work, now that I think about it. The shared logger is buffering messages for several threads, which seems sort of troublesome. I guess they’re kept separate enough by the threads, or I’ve just gotten lucky the twice I’ve run the code above.

Jazzing Things Up

If I only had one or two foreach calls in my code, I’d probably just fix them as above and be content enough (if the last thread-safety concerns calmed down). However, there’s clearly room for some automation here. What has to happen?

  1. The expression evaluated by %dopar% needs to return whatever it did before, as well as the accumulated messages during the execution.
  2. The combine function needs to return what it did before, and print out the logging messages.

The easy part, from my perspective, is the second part. Let’s assume we can solve number 1, as we did in the previous section. That is, the new block will return a list, with an ‘ans’ element which has whatever was returned previously, and a ‘msgs’ element. Recall that foreach() actually returns an object of class foreach. Using summary() to get a sense of that object, it seems easy enough to write a function which takes a foreach object and returns a new foreach object which is basically the same as the original, but has a modified .combine function. Let’s try

jazz.fe <- function(fe) {
  r.fe <- fe
  r.fe$combineInfo$fun <- function(left, right) {
    cat(right$msgs, sep='\n')
    fe$combineInfo$fun(left, right$ans)
  }
  r.fe
}

We need a similar function which modifies the expression to be evaluated (the block of code on the right of %dopar%), so that the return values are manipulated as described above. I messed around a little bit with passing unevaluated expressions around, and came up with the following:

jazz.ex <- function(ex) {
  parse(text=c('r<-', deparse(ex), 'list(msgs=.logger$get.log(), ans=r)'))
}

And then these bits can be used as in the following guy:

jazzed.guy.using.logger <- function(lgr) {
  guy <- new.env()
  guy$.logger <- lgr
  guy$do.stuff <- function() {
    jazz.fe(foreach(i=1:10,
            .inorder=TRUE,
            .export='.logger',
            .init=c(),
            .combine=function(left, right) { c(left, right) })) %dopar% eval(jazz.ex(quote({
      .logger$message(paste('Working on piece', i))
      i^2
    })))
  }

  environment(guy$do.stuff) <- guy

  guy
}

This code should be compared with the very first guy.using.logger. The only difference between the two is that we wrapped the foreach in a function call, and also wrapped the expression in… a few calls. The ultimate goal of a drop-in %dopar% replacement is tantalizingly close. If all I need to do is call some function on the foreach object, and some other function on the expression, and then I can run %dopar%, that’s easy:

'%jdp%' <- function(fe, ex) {
  jazz.fe(fe) %dopar% eval(jazz.ex(ex))
}

And then I could just do

failing.ultimate.guy.using.logger <- function(lgr) {
  guy <- new.env()
  guy$.logger <- lgr
  guy$do.stuff <- function() {
    foreach(i=1:10,
            .inorder=TRUE,
            .export='.logger',
            .init=c(),
            .combine=function(left, right) { c(left, right) }) %jdp% quote({
      .logger$message(paste('Working on piece', i))
      i^2
    })
  }

  environment(guy$do.stuff) <- guy

  guy
}

logger <- get.buffering.logger()
fgul <- failing.ultimate.guy.using.logger(logger)
fgul$do.stuff()

No dice:

> fgul$do.stuff()
Error in e$fun(obj, substitute(ex), parent.frame(), e$data) : 
  unable to find variable ".logger"

Rats. I haven’t yet found the right combination of quote, eval, substitute, … to make this work, and I’ve tried several. I’ve been reading about environments and frames and lexical scoping and closures, and still haven’t gotten it right. If I put the definition of %jdp% in the do.stuff function, it actually works. But that’s ugly. Nothing else I’ve come up with works and involves as few changes as wrapping the two arguments to the %dopar% operator, as in the jazzed.guy.using.logger above.

So, if anybody’s got any suggestions, just let me know. In the mean time, I’ll either poke around in the %dopar% source for inspiration, or move on to other things.