Episode #133: Rake File Operations

Episode Script

Here's the Rakefile we've been working on for the last few episodes. It finds Markdown source files in a “sources” subdirectory of a project, and produces a parallel hierarchy of HTML files in an “outputs” subdirectory.

SOURCE_FILES = Rake::FileList.new("sources/**/*.md", "sources/**/*.markdown") do |fl|
  fl.exclude("**/~*")
  fl.exclude(/^sources\/scratch\//)
  fl.exclude do |f|
    `git ls-files #{f}`.empty?
  end
end

task :default => :html
task :html => SOURCE_FILES.pathmap("%{^sources/,outputs/}X.html")

rule ".html" => ->(f){source_for_html(f)} do |t|
  sh "pandoc -o #{t.name} #{t.source}"
end

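def source_for_html(html_file)
  # Map "outputs/foo.html" back to "sources/foo" and find the matching source
  SOURCE_FILES.detect{|f|
    f.ext('') == html_file.pathmap("%{^outputs/,sources/}X")
  }
end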
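The pathmap pattern in the :html task translates source paths into output paths, and a source_for_html lookup has to run that translation in reverse. Here's a small sketch of both directions that runs as a plain Ruby script (requiring rake adds String#pathmap and String#ext). The file names are hypothetical, and since they contain no glob characters, FileList keeps them as literal entries, so they don't need to exist on disk. The find_source helper is likewise a hypothetical stand-in that takes the list as an argument instead of reading SOURCE_FILES.

```ruby
require "rake" # adds String#pathmap and String#ext

# Hypothetical source paths; with no glob characters, FileList keeps them as-is.
sources = Rake::FileList["sources/ch1.md", "sources/backmatter/appendix.md", "sources/ch4.markdown"]

# Forward mapping, as in the :html task's prerequisites:
# swap the leading "sources/" for "outputs/" and replace the extension with .html.
outputs = sources.pathmap("%{^sources/,outputs/}X.html")
puts outputs

# Reverse mapping (hypothetical helper): compare extensionless paths
# after swapping the leading "outputs/" back to "sources/".
def find_source(html_file, sources)
  sources.detect { |f| f.ext("") == html_file.pathmap("%{^outputs/,sources/}X") }
end

puts find_source("outputs/ch4.html", sources)                 # sources/ch4.markdown
puts find_source("outputs/backmatter/appendix.html", sources) # sources/backmatter/appendix.md
```

Comparing extensionless paths is what lets one lookup serve both the .md and .markdown sources.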

Now that we are recreating the input file hierarchy in an “outputs” directory, we need to ensure that the destination directory exists before generating any HTML files. An easy way to do this in Rake is with a directory task. A directory task is like a file task, except that its target is a directory. And unlike a file task, it doesn't need any code telling Rake how to create that target. Simply by declaring the task, we give Rake implicit instructions to create the directory, along with any missing parent directories, if it doesn't already exist.
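To see a directory task in isolation, here's a minimal sketch that runs as a plain Ruby script: after require "rake", the Rakefile DSL methods such as directory are available at the top level, and Rake::Task[...].invoke runs a task roughly the way the rake command would. The nested path is only an example, and everything happens inside a throwaway temporary directory.

```ruby
require "rake"
require "tmpdir"

created = false

Dir.mktmpdir do |tmp|
  Dir.chdir(tmp) do
    # A directory task needs no action block; Rake supplies the creation logic.
    directory "outputs/backmatter"

    # Invoking it creates the directory, including the missing "outputs" parent.
    Rake::Task["outputs/backmatter"].invoke
    created = File.directory?("outputs/backmatter")
  end
end

puts created # true
```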

We declare the directory task, then add it to the list of dependencies for the “.html” rule.

directory "outputs"

rule ".html" => [->(f){source_for_html(f)}, "outputs"] do |t|
  sh "pandoc -o #{t.name} #{t.source}"
end

Now when we run rake, we can see that it creates the directory before it starts generating HTML files. Unfortunately, it runs into a problem when it tries to build appendix.html. Since that source file lives in a subdirectory of the sources directory, its HTML output belongs in the corresponding subdirectory of the outputs directory, outputs/backmatter. But that subdirectory doesn't exist yet, and our directory task only covers the top-level outputs directory.

$ rake
mkdir -p outputs
pandoc -o outputs/backmatter/appendix.html sources/backmatter/appendix.md
pandoc: outputs/backmatter/appendix.html: openFile: does not exist (No such file or directory)
rake aborted!
Command failed with status (1): [pandoc -o outputs/backmatter/appendix.html...]
/home/avdi/Dropbox/rubytapas/133-rake-file-operations/project/Rakefile:16:in `block in <top (required)>'
Tasks: TOP => default => html => outputs/backmatter/appendix.html
(See full trace by running task with --trace)

To ensure this or any other intermediate directory exists before producing an HTML file, we could execute a mkdir -p shell command, using #pathmap to pass just the directory portion of the target filename.

sh "mkdir -p #{t.name.pathmap('%d')}"
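A quick way to see what %d yields for the targets in our project (the paths are taken from the failed run above) is this small sketch:

```ruby
require "rake" # provides String#pathmap

targets = ["outputs/backmatter/appendix.html", "outputs/ch1.html"]

# %d keeps only the directory portion of each path.
dirs = targets.map { |t| t.pathmap("%d") }
p dirs # ["outputs/backmatter", "outputs"]
```

For top-level chapters the result is just outputs, so the command is redundant but harmless there.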

But Rake gives us a shortcut for this. Instead of running a shell command, we can call a mkdir_p helper method right in the rule's action. Here's the full Rakefile with that change:

SOURCE_FILES = Rake::FileList.new("sources/**/*.md", "sources/**/*.markdown") do |fl|
  fl.exclude("**/~*")
  fl.exclude(/^sources\/scratch\//)
  fl.exclude do |f|
    `git ls-files #{f}`.empty?
  end
end

task :default => :html
task :html => SOURCE_FILES.pathmap("%{^sources/,outputs/}X.html")

directory "outputs"

rule ".html" => [->(f){source_for_html(f)}, "outputs"] do |t|
  mkdir_p t.name.pathmap("%d")
  sh "pandoc -o #{t.name} #{t.source}"
end

def source_for_html(html_file)
  SOURCE_FILES.detect{|f| 
    f.ext('') == html_file.pathmap("%{^outputs/,sources/}X")
  }
end
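If you want to try the mkdir_p approach without pandoc or git installed, here's a sketch that reproduces the same rule inside a temporary directory. It rests on a few assumptions: the pandoc call is replaced by a hypothetical Ruby File.write stand-in, the git-based exclusion is dropped, the source files are generated on the fly, and find_source is a local lambda playing the role of source_for_html.

```ruby
require "rake"
require "tmpdir"

built = []

Dir.mktmpdir do |tmp|
  Dir.chdir(tmp) do
    # Fake project layout, including a nested source file.
    mkdir_p "sources/backmatter"
    File.write("sources/ch1.md", "# Chapter 1\n")
    File.write("sources/backmatter/appendix.md", "# Appendix\n")

    sources = Rake::FileList["sources/**/*.md"]

    # Same reverse mapping as source_for_html, closed over the local list.
    find_source = ->(html) {
      sources.detect { |f| f.ext("") == html.pathmap("%{^outputs/,sources/}X") }
    }

    directory "outputs"

    rule ".html" => [find_source, "outputs"] do |t|
      mkdir_p t.name.pathmap("%d")
      # Hypothetical stand-in for: sh "pandoc -o #{t.name} #{t.source}"
      File.write(t.name, "<p>built from #{t.source}</p>\n")
    end

    task :html => sources.pathmap("%{^sources/,outputs/}X.html")
    Rake::Task[:html].invoke

    built = Rake::FileList["outputs/**/*.html"].to_a.sort
  end
end

puts built
```

The nested outputs/backmatter directory is created by mkdir_p in the rule's action, not by the directory task.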

Now when we run rake, it ensures the target directory exists before each Markdown-to-HTML transformation. mkdir_p logs its command every time, but that's harmless when the directory is already there.

$ rake
mkdir -p outputs/backmatter
pandoc -o outputs/backmatter/appendix.html sources/backmatter/appendix.md
mkdir -p outputs
pandoc -o outputs/ch1.html sources/ch1.md
mkdir -p outputs
pandoc -o outputs/ch2.html sources/ch2.md
mkdir -p outputs
pandoc -o outputs/ch3.html sources/ch3.md
mkdir -p outputs
pandoc -o outputs/ch4.html sources/ch4.markdown

Often when writing build scripts, it's convenient to have a quick way to blow away all of the generated files. Let's add a task to handle this. Once again, instead of shelling out, we'll use a Rake helper method, rm_rf. It mirrors the shell rm -rf command, which recursively deletes files and directories without any warnings or confirmation.

task :clean do
  rm_rf "outputs"
end
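Here's a minimal sketch of the clean task at work, run as a plain Ruby script in a temporary directory. The leftover output tree is simulated with mkdir_p and touch, which are Rake file helpers as well.

```ruby
require "rake"
require "tmpdir"

gone = false

Dir.mktmpdir do |tmp|
  Dir.chdir(tmp) do
    # Simulate leftovers from a previous build.
    mkdir_p "outputs/backmatter"
    touch "outputs/backmatter/appendix.html"

    task :clean do
      rm_rf "outputs"
    end

    Rake::Task[:clean].invoke
    gone = !File.exist?("outputs")
  end
end

puts gone # true
```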

Rake has a long list of these file operation helper methods, most of them named after their UNIX shell equivalents. They are handy for several reasons. For one thing, since they are native Ruby methods, we can pass files and file lists to them directly without any kind of string interpolation.
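As a rough illustration, here's a sketch that copies a file list into a backup directory with cp. The file and directory names are hypothetical. One of them deliberately contains a space, which a naively interpolated sh "cp ..." command line would split into two arguments, but cp handles it fine because each path reaches it as a separate Ruby string.

```ruby
require "rake"
require "tmpdir"

copied = []

Dir.mktmpdir do |tmp|
  Dir.chdir(tmp) do
    mkdir_p "sources"
    # touch accepts an array of paths, so no shell quoting is needed.
    touch ["sources/ch1.md", "sources/my notes.md"]

    mkdir_p "backup"
    # cp takes the FileList itself as its source argument.
    cp Rake::FileList["sources/*.md"], "backup"

    copied = Dir.children("backup").sort
  end
end

p copied # ["ch1.md", "my notes.md"]
```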

They also respect Rake's “quiet” flag. If we run rake with the -q flag, it does whatever work is needed, but without logging the commands it runs.

$ rake -q
$
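As far as we understand it, -q works by switching off Rake's verbose flag. If you'd rather silence just one part of a Rakefile, the verbose helper can do the same for the duration of a block and then restore the previous setting. A sketch, with placeholder paths:

```ruby
require "rake"
require "tmpdir"

flag_inside = nil
made = false

Dir.mktmpdir do |tmp|
  Dir.chdir(tmp) do
    # Within this block, file helpers run without echoing their commands.
    verbose(false) do
      flag_inside = Rake::FileUtilsExt.verbose_flag
      mkdir_p "outputs/backmatter"
    end
    made = File.directory?("outputs/backmatter")
  end
end

p flag_inside # false
p made        # true
```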

Almost all of these helpers are inherited straight from the Ruby FileUtils standard library. So if you want to see a list of all that's available, just check out the FileUtils documentation.
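One way to confirm the relationship, and to use the same operations outside of Rake, is a sketch like this. Rake's helper module mixes in FileUtils, and the standard library's FileUtils methods take an explicit verbose: option rather than reading Rake's flag. The paths are placeholders inside a temporary directory.

```ruby
require "rake"
require "fileutils"
require "tmpdir"

# Rake's file helpers live in a module that mixes in FileUtils.
p Rake::FileUtilsExt.include?(FileUtils) # true

exists_before = exists_after = nil

Dir.mktmpdir do |tmp|
  out = File.join(tmp, "outputs", "backmatter")
  # Plain FileUtils, no Rake involved; verbose: true echoes each command.
  FileUtils.mkdir_p(out, verbose: true)
  exists_before = File.directory?(out)
  FileUtils.rm_rf(File.join(tmp, "outputs"), verbose: true)
  exists_after = File.exist?(out)
end
```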

That's all for today. Happy hacking!