Episode #135: Rake Multitask

Episode Script

This miniseries on Rake is winding its way to a close. I hope that over the course of the last several videos you've come to appreciate the power of Rake as I have. But perhaps you're still skeptical about the benefit of using Rake over plain-old Ruby or shell scripts. If so, I think today might just change that impression. I want to show you an amazingly powerful capability that Rake gives us more or less for free.

Let's say we're putting an ebook together. We have a directory full of several hundred code listings which we've stripped out of the text in preparation to be turned into syntax-highlighted HTML using the pygmentize utility.

Here's a little Rakefile to take care of this task. It makes the highlight task the default, then defines a list of listings, a list of “highlights”, which are the HTML end products, and a task called highlight to produce all of them. Finally, it defines a rule to produce a .html file from a listing file by running pygmentize on it.

require "rake/clean"

task :default => :highlight

LISTINGS   = FileList["listings/*"]
HIGHLIGHTS = LISTINGS.ext(".html")
CLEAN.include(HIGHLIGHTS)

task :highlight => HIGHLIGHTS

rule ".html" => ->(f){ FileList[f.ext(".*")].first } do |t|
  sh "pygmentize -o #{t.name} #{t.source}"
end
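To make the file-name plumbing in that Rakefile concrete, here is a small sketch that can be run outside of any build. It assumes a throwaway temporary directory with two made-up listings (alpha.rb and beta.py are hypothetical names, not files from the book), and the source_for helper is just my name for the lambda used in the rule.

```ruby
require "rake"
require "tmpdir"
require "fileutils"

# Mirrors the lambda in the rule: swap the target's extension for a
# wildcard and take the first file on disk that matches.
def source_for(target)
  Rake::FileList[target.ext(".*")].first
end

# Builds a scratch "listings" directory and reports the derived names.
def demo_listing_names
  Dir.mktmpdir do |dir|
    Dir.chdir(dir) do
      FileUtils.mkdir_p("listings")
      FileUtils.touch(["listings/alpha.rb", "listings/beta.py"])

      listings   = Rake::FileList["listings/*"]
      highlights = listings.ext(".html")
      sources    = highlights.map { |html| source_for(html) }

      { listings:   listings.to_a,
        highlights: highlights.to_a,
        sources:    sources.to_a }
    end
  end
end

RESULT = demo_listing_names
puts RESULT[:highlights].inspect
puts RESULT[:sources].inspect
```

Each listing maps to an .html target, and each target maps back to whichever source file shares its base name, regardless of that file's extension.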

Highlighting source code with pygmentize takes time. When we have a lot of source files, it takes a lot of time. If we run rake under the time command, it tells us that the process takes about 48 seconds.

$ time rake
...
pygmentize -o listings/fd673484d50a66ea67fcd20e0c55f038a729e4d7.html listings/fd673484d50a66ea67fcd20e0c55f038a729e4d7.rb
pygmentize -o listings/ff6e24090e794c4db847b10ca993c872ca804101.html listings/ff6e24090e794c4db847b10ca993c872ca804101.rb

real    0m47.961s
user    0m41.912s
sys     0m4.852s

Currently these highlighted files are being built one at a time. But this is 2013, and I have a computer with two physical cores and, through hyperthreading, four virtual cores. Why can't we build more than one file at a time?

As it turns out, we can. And all we have to do is change one line of the rakefile, from task to multitask.

multitask :highlight => HIGHLIGHTS

This tells Rake that it can process the prerequisites of the :highlight task in parallel. Note that we make this change to the task which depends on the tasks we want to run in parallel, not to the parallelizable tasks themselves.
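If you'd like to see the effect without installing pygmentize, here is a minimal sketch that swaps the real work for a sleep. The job names and the 0.3 second delay are invented for illustration, and I'm defining tasks through Rake::Task.define_task and Rake::MultiTask.define_task rather than the task and multitask DSL methods so that the script runs as plain Ruby.

```ruby
require "rake"

# Hypothetical stand-ins for slow jobs such as pygmentize runs.
SLOW_JOBS = [:job_a, :job_b, :job_c]
FINISHED  = Queue.new # thread-safe record of completed jobs

SLOW_JOBS.each do |name|
  Rake::Task.define_task(name) do
    sleep 0.3
    FINISHED << name
  end
end

# The multitask is the task that *depends on* the slow jobs.
Rake::MultiTask.define_task(:all_parallel => SLOW_JOBS)

# Invokes a task and returns the elapsed wall-clock time in seconds.
def timed_invoke(task_name)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  Rake::Task[task_name].invoke
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

PARALLEL_SECONDS = timed_invoke(:all_parallel)
puts format("three 0.3s jobs took %.2fs", PARALLEL_SECONDS)
```

Run one after another, the three jobs would need about 0.9 seconds. Because the prerequisites of a multitask run concurrently, the total should land much closer to the duration of a single job.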

We run rake again. We see some rather messy output, as Rake runs the prerequisites in parallel threads, each of which launches its own pygmentize subprocess, and they all talk to the same STDOUT.

A little over 25 seconds later, the build is done. With this one change, we've cut the processing time nearly in half!

$ time rake
...

real    0m25.701s
user    1m13.492s
sys     0m8.272s

If we want to fine-tune how many tasks are run in parallel, we can use the -j option to Rake to tell it the maximum number of tasks to run at once. I'll specify 4, one for each virtual core.

Interestingly, this actually takes a little bit longer. I'm not sure why.

$ time rake -j 4

real    0m26.752s
user    1m10.300s
sys     0m7.208s

Earlier I said that all it takes is a one-line change to the code to parallelize execution, but that was a bit of a fib. In truth, we can tell Rake to run tasks in parallel with no changes to the code whatsoever. Let's change the multitask back to a task. Then we'll run rake with the -m option. This tells Rake to treat every task as if it is a multitask.

Again, we see distorted output. And when the dust settles, we once again see a total time of a little over 25 seconds.

$ time rake -m

real    0m25.606s
user    1m13.060s
sys     0m7.856s
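For the curious, the same behavior can be approximated from plain Ruby. This sketch assumes that the -m flag does nothing more than set the always_multitask option on the Rake application, which is my reading of Rake's source rather than a documented API, so treat it as an illustration only. The chore names and the sleep are made up.

```ruby
require "rake"

# Assumption: this is the same switch that `rake -m` flips internally.
Rake.application.options.always_multitask = true

CHORES = [:chore_a, :chore_b, :chore_c]
DONE   = Queue.new # thread-safe record of completed chores

CHORES.each do |name|
  Rake::Task.define_task(name) do
    sleep 0.3
    DONE << name
  end
end

# A plain task, not a multitask.
Rake::Task.define_task(:chores => CHORES)

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
Rake::Task[:chores].invoke
ELAPSED = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
puts format("plain task with always_multitask: %.2fs", ELAPSED)
```

Even though :chores is declared as an ordinary task, its prerequisites should overlap rather than run back to back.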

Rake's parallelization is smart, too: if other tasks were dependent on the :highlight task, it would still wait until all the pygmentize processes finished before moving on to the next phase.
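Here is a rough sketch of that ordering guarantee. The :render and :package tasks and the three page tasks are hypothetical. Each page sleeps for a different length of time so the parallel jobs finish at staggered moments, and every task records its name in a thread-safe queue as it completes.

```ruby
require "rake"

EVENTS = Queue.new # thread-safe log of task completions
PAGES  = [:page_one, :page_two, :page_three]

PAGES.each_with_index do |name, index|
  Rake::Task.define_task(name) do
    sleep 0.1 * (index + 1) # staggered finish times
    EVENTS << name
  end
end

# The pages are built in parallel...
Rake::MultiTask.define_task(:render => PAGES) do
  EVENTS << :render
end

# ...but the next phase depends on the multitask as a whole.
Rake::Task.define_task(:package => :render) do
  EVENTS << :package
end

Rake::Task[:package].invoke

# Drain the queue into an array in completion order.
ORDER = Array.new(EVENTS.size) { EVENTS.pop }
puts ORDER.inspect
```

However the three pages happen to interleave, :render and :package should only appear after all of them, in that order.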

So what do we gain from automating our builds with Rake? Not just an easy way to declare complex dependencies and rules for accomplishing tasks. Not just a set of convenience methods for file operations. Not just a handy command-line front-end. In addition to all that, we get parallelization of repetitive tasks for free. And that's what I call happy hacking!