Episode #132: Rake Pathmap


Episode Script

Today, as we continue our deep dive into the Rake tool, I want to show you one of Rake’s most powerful features: the #pathmap method.

Here is the FileList we came up with a few episodes ago. It selects the Markdown files inside a project directory that are to be built into HTML files.

require "rake"
Dir.chdir "project"
SOURCE_FILES = Rake::FileList.new("**/*.md", "**/*.markdown") do |fl|
  fl.exclude("~*")
  fl.exclude(/^scratch\//)
  fl.exclude do |f|
    `git ls-files #{f}`.empty?
  end
end

SOURCE_FILES
# => ["ch1.md", "ch3.md", "ch2.md", "subdir/appendix.md", "ch4.markdown"]

We’ve already seen that we can use the #ext method to convert this from a list of input files to a list of target files by converting the Markdown extensions to .html.

require './source_files'
SOURCE_FILES.ext('html')
# => ["ch1.html", "ch3.html", "ch2.html", "subdir/appendix.html", "ch4.html"]
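For comparison, here is a rough plain-Ruby approximation of what #ext does to a single filename. It uses only File.extname from the standard library. The helper name swap_ext is hypothetical, and this is a sketch for intuition rather than Rake's actual implementation.

```ruby
# Hypothetical helper that approximates Rake's String#ext, for illustration only.
def swap_ext(path, new_ext)
  # Remove the current extension (File.extname returns "" when there is none)...
  base = path.chomp(File.extname(path))
  # ...then append the new one, adding the leading dot if the caller omitted it.
  new_ext = ".#{new_ext}" unless new_ext.empty? || new_ext.start_with?(".")
  base + new_ext
end

files = ["ch1.md", "subdir/appendix.md", "ch4.markdown"]
p files.map { |f| swap_ext(f, "html") }
# => ["ch1.html", "subdir/appendix.html", "ch4.html"]
```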

When working with software builds, software packaging, or system administration tasks, we often want to take a list of files and generate a second, modified list from it. Converting from one set of file extensions to another is just one specific example of this kind of problem. When we want to do something other than substitute file extensions, we need to turn to Rake’s power tool for munging file names: #pathmap.

#pathmap takes as its argument a specification, which is a string containing codes corresponding to different parts of the original filenames. Let’s try some of these codes out.

First of all, %p gives us the whole original path, which isn’t that exciting.

%f gives us the filename without any directory portion. Notice that subdir/appendix.md has become simply appendix.md using this code.

%n renders the file base name without either extension or directory portions.

%d gives us the directory but no file name.

%x yields only the file extensions.

And %X gives everything but the extension.

require './source_files'
SOURCE_FILES.pathmap("%p")
# => ["ch1.md", "ch3.md", "ch2.md", "subdir/appendix.md", "ch4.markdown"]
SOURCE_FILES.pathmap("%f") 
# => ["ch1.md", "ch3.md", "ch2.md", "appendix.md", "ch4.markdown"]
SOURCE_FILES.pathmap("%n")
# => ["ch1", "ch3", "ch2", "appendix", "ch4"]
SOURCE_FILES.pathmap("%d")
# => [".", ".", ".", "subdir", "."]
SOURCE_FILES.pathmap("%x")
# => [".md", ".md", ".md", ".md", ".markdown"]
SOURCE_FILES.pathmap("%X")
# => ["ch1", "ch3", "ch2", "subdir/appendix", "ch4"]
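If it helps to see what each code is doing, most of them line up closely with methods on Ruby's core File class. The sketch below computes rough equivalents for one path using only core Ruby; it is an approximation for intuition, and Rake's own #pathmap handles more cases than this.

```ruby
# Plain-Ruby approximations of the #pathmap codes, applied to one path.
path = "subdir/appendix.md"

codes = {
  "%p" => path,                                    # the whole path
  "%f" => File.basename(path),                     # file name, no directory
  "%n" => File.basename(path, File.extname(path)), # base name, no directory or extension
  "%d" => File.dirname(path),                      # directory only
  "%x" => File.extname(path),                      # extension only
  "%X" => path.chomp(File.extname(path)),          # everything but the extension
}

codes.each { |code, value| puts "#{code} => #{value}" }
# %p => subdir/appendix.md
# %f => appendix.md
# %n => appendix
# %d => subdir
# %x => .md
# %X => subdir/appendix
```

Note that File.dirname("ch1.md") returns ".", which is why %d produced "." for the top-level files above.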

The string we pass to #pathmap can contain more than just placeholder codes. We can include arbitrary text as well. Let’s say we wanted to start a subsidiary Ruby process with a specially configured library load path. We have a list of directories we’d like to include in that load path in a FileList.

In order to convert our list of directories into command-line arguments that tell Ruby to add each directory to its load path, we can use #pathmap to prepend -I to each directory name. (-I is the Ruby command-line flag that specifies an extra directory for the load path.)

When we interpolate this list into our command string, we get a -I argument for each directory in the list.

require "rake"
load_paths = FileList["mylibs", "yourlibs", "sharedlibs"]
ruby_args  = load_paths.pathmap("-I%p")
command    = "ruby #{ruby_args} myscript.rb"
# => "ruby -Imylibs -Iyourlibs -Isharedlibs myscript.rb"

This also demonstrates another feature of FileList: unlike an array, when converted to a string it formats itself as a space-separated list of elements.

load_paths.to_s                 # => "mylibs yourlibs sharedlibs"
load_paths.to_a.to_s            # => "[\"mylibs\", \"yourlibs\", \"sharedlibs\"]"
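For contrast, here is the same -I transformation done with a plain Array, without Rake. Because an Array does not format itself as a space-separated list, we have to call #join ourselves. The last line shows an alternative some people prefer: keeping the arguments as separate array elements, which could then be passed to Kernel#system one by one to sidestep shell word-splitting. This is a sketch; it only builds the strings and does not actually launch Ruby.

```ruby
# The same -I transformation using a plain Array (no Rake needed).
load_paths = ["mylibs", "yourlibs", "sharedlibs"]
ruby_args  = load_paths.map { |dir| "-I#{dir}" }

# A plain Array must be joined explicitly to get the space-separated
# form that a FileList produces automatically when interpolated.
command = "ruby #{ruby_args.join(' ')} myscript.rb"
puts command
# prints: ruby -Imylibs -Iyourlibs -Isharedlibs myscript.rb

# Alternatively, keep each argument as its own element, suitable for
# passing to Kernel#system as separate arguments (not run here).
argv = ["ruby", *ruby_args, "myscript.rb"]
p argv
# => ["ruby", "-Imylibs", "-Iyourlibs", "-Isharedlibs", "myscript.rb"]
```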

Another thing we can do with #pathmap is text replacement. Let’s go back to our list of Markdown files. Let’s assume, however, that we’ve moved all the source Markdown files to a project subdirectory called sources. Let’s also say that rather than putting the generated HTML files next to the source files, we want to put them in a separate output directory which mirrors the structure of the source directory tree.

To get a list of HTML files to be built, we start our #pathmap pattern with a % sign as usual, but instead of a letter code we insert an opening brace. Inside the braces, we specify a simple regular expression that looks for the sources/ directory at the beginning of the string. Then we put a comma, followed by a replacement string: the name of the outputs/ directory. Then we add a closing brace. Next, we need to tell #pathmap which part of the path it should perform this replacement on. We use capital X, which, as you recall from earlier, is the code for “everything but the file extension”. Finally, we add .html as the new file extension.

require "rake"
Dir.chdir "project2"
SOURCE_FILES = Rake::FileList.new("sources/**/*.md", "sources/**/*.markdown")

SOURCE_FILES 
# => ["sources/ch1.md", "sources/ch3.md", "sources/ch2.md", "sources/subdir/appendix.md", "sources/ch4.markdown"]

OUTPUT_FILES = SOURCE_FILES.pathmap("%{^sources/,outputs/}X.html")
OUTPUT_FILES
# => ["outputs/ch1.html", "outputs/ch3.html", "outputs/ch2.html", "outputs/subdir/appendix.html", "outputs/ch4.html"]

When we evaluate this code we can see that the result is a list of to-be-created HTML files in an outputs subdirectory. Notice that the appendix file is in a subdirectory of outputs, mirroring its placement in the source file tree.
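To spell out what the "%{^sources/,outputs/}X.html" specification computes, here is a rough plain-Ruby equivalent for a single path: take the path minus its extension (the %X part), substitute the first match of the regular expression, then append the literal ".html". The helper name to_output_path is hypothetical and only meant to illustrate the steps.

```ruby
# Hypothetical helper approximating "%{^sources/,outputs/}X.html" for one path.
def to_output_path(src)
  without_ext = src.chomp(File.extname(src))            # the %X part
  without_ext.sub(%r{^sources/}, "outputs/") + ".html"  # the {pattern,replacement}, then ".html"
end

sources = ["sources/ch1.md", "sources/subdir/appendix.md", "sources/ch4.markdown"]
p sources.map { |s| to_output_path(s) }
# => ["outputs/ch1.html", "outputs/subdir/appendix.html", "outputs/ch4.html"]
```

As with the real #pathmap result, the appendix keeps its subdir/ component.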

Actually, let’s take a closer look at that file. When we try to build this file, we’re going to run into a problem: even if we’ve already created the outputs directory, we probably won’t have created subdir. If we told pandoc to generate the file, it would complain that it can’t open the output file because the directory doesn’t exist.

We know we can create the directory with a mkdir -p command. But we need to give mkdir just the directory, not the filename. For this, we use another #pathmap call with %d as the specification, telling it to only return the directory part of the filename.

Notice that we’ve called #pathmap on a string here, rather than on a FileList. Like the #ext method we saw in the previous episode, Rake adds #pathmap to the String class, so that we can use file lists and individual filename strings interchangeably.

require "rake"
Dir.chdir "project2"
SOURCE_FILES = Rake::FileList.new("sources/**/*.md", "sources/**/*.markdown")
OUTPUT_FILES = SOURCE_FILES.pathmap("%{^sources/,outputs/}X.html")

f = OUTPUT_FILES[3]
f                               # => "outputs/subdir/appendix.html"
cmd = "mkdir -p #{f.pathmap('%d')}"
cmd                             # => "mkdir -p outputs/subdir"
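If we would rather not shell out, Ruby's standard FileUtils.mkdir_p does the same job as mkdir -p, and Rake makes a mkdir_p helper available inside Rakefiles as well. The sketch below runs in a throwaway temporary directory so it doesn't touch a real project; the File.write call is a hypothetical stand-in for running pandoc.

```ruby
require "fileutils"
require "tmpdir"

# Create an output file's parent directory (like `mkdir -p`) before writing it.
created = Dir.mktmpdir do |root|
  Dir.chdir(root) do
    f = "outputs/subdir/appendix.html"
    FileUtils.mkdir_p(File.dirname(f))   # File.dirname(f) matches f.pathmap("%d") here
    File.write(f, "<h1>Appendix</h1>\n") # hypothetical stand-in for running pandoc
    File.exist?(f)                       # becomes the block's (and mktmpdir's) return value
  end
end

puts created # prints: true
```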

Believe it or not, we’ve only seen a little of #pathmap’s full power in this episode. If you want to see more of what it can do, check out the Rake documentation. You’ll find out how to do multiple replacements at once, how to use a block to perform arbitrary calculations for replacement text, and more.
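As a small taste, here is a sketch of three of those extras, assuming a reasonably recent Rake. Multiple pattern,replacement pairs are separated by semicolons and applied in order; a replacement of * hands the matched text to a block; and a number in front of d selects part of the directory, counting from the front when positive and from the back when negative. The paths are made up for illustration, and renaming subdir to html is arbitrary, just to show a second substitution.

```ruby
require "rake" # Rake adds #pathmap to String

path = "sources/subdir/appendix.md"

# Two replacements, applied in order to the %X part of the path.
multi = path.pathmap("%{^sources,outputs;subdir,html}X.html")
puts multi   # prints: outputs/html/appendix.html

# A "*" replacement passes the matched text (here, the %x extension) to the block.
lowered = "outputs/ch1.HTML".pathmap("%X%{.*,*}x") { |ext| ext.downcase }
puts lowered # prints: outputs/ch1.html

# Partial directories: the first directory component, then the last one.
first_dir = "outputs/subdir/appendix.html".pathmap("%1d")
last_dir  = "outputs/subdir/appendix.html".pathmap("%-1d")
puts first_dir # prints: outputs
puts last_dir  # prints: subdir
```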

But even with what we’ve covered today, you should know enough to make effective use of #pathmap in a wide variety of Rake scenarios. Happy hacking!