Understanding Ruby Thread-Local Variables

You've had it hammered into your head over and over as a developer: global data is bad. Global, mutable data is even worse. And when threads enter into the mix, global mutable data is a disaster waiting to happen.

But global data isn't always obvious. Sometimes, it takes the form of library-wide or class-wide configuration. Configuration that should never change at runtime… until the day that you find that it does need to change.

When that happens, updating your code to be thread-safe can be a nightmare. Fortunately, in some cases there's a way around all that work. But to use it, you need to understand how to use thread-local variables in Ruby.

(This post is adapted from RubyTapas Episode 161)

 

Suppose we have a library that implements an extremely basic disk-based persistent object store. We'll call it “ShmactiveRecord”.

We use ShmactiveRecord by inheriting business model classes from it.

Those classes are then able to store records to disk, read records back in, find records by their ID, and remove records.

All of these CRUD operations depend for their implementation on a “repository” object. In a proper ORM library, this repository might be an abstraction over a database connection. In our case, the repository is simply an instance of PStore. If you're not familiar with PStore, it's a simple file-based key-value storage library that comes with Ruby. We'll talk about it more in another episode.

Every ShmactiveRecord class or instance delegates back to the base class for its repository. By default, this global repository is set to a PStore that writes to a default database file.

But the repository can also be specified explicitly with a setter method.
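The library's code isn't reproduced here, so what follows is only a minimal sketch of what such a base class might look like. The Customer model, the find/save/delete method names, and the temporary file path are assumptions made for illustration; only PStore and the idea of a class-wide repository with a setter come from the description above.

```ruby
require "pstore"
require "tmpdir"

class ShmactiveRecord
  # Assumed default location; the real library's file name is not shown here.
  DEFAULT_PATH = File.join(Dir.mktmpdir, "records.pstore")

  class << self
    # Subclasses keep no setting of their own: they ask the base class.
    def repository
      return superclass.repository unless equal?(ShmactiveRecord)
      @repository ||= PStore.new(DEFAULT_PATH)
    end

    # Explicitly point the whole library at a different store.
    def repository=(store)
      if equal?(ShmactiveRecord)
        @repository = store
      else
        superclass.repository = store
      end
    end

    # Look a record up by class name and ID in a read-only transaction.
    def find(id)
      store = repository
      store.transaction(true) { store[[name, id]] }
    end
  end

  def save
    store = self.class.repository
    store.transaction { store[[self.class.name, id]] = self }
  end

  def delete
    store = self.class.repository
    store.transaction { store.delete([self.class.name, id]) }
  end
end

# Hypothetical business model class.
class Customer < ShmactiveRecord
  attr_reader :id, :name

  def initialize(id, name)
    @id = id
    @name = name
  end
end

alice = Customer.new(1, "Alice")
alice.save
found = Customer.find(1)      # read the record back from disk
alice.delete
missing = Customer.find(1)    # nil once the record has been removed
```

The important detail is that there is exactly one repository setting for the whole process, no matter which subclass asks for it.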

A simple application

Let's say that we are, however improbably, using this barebones persistence framework to store business data in a web application.

Let's say further that there is rather a lot of data on the production server, and we want to skim off a smaller subset of the records into a sample database, which we can then use for tests and development.

In order to do this, we write a script. It selects the sample records, delegating to a helper method on the record class for the details of how to grab a sample. Then it changes the global repository over to the sample database, and tells all the sampled records to save themselves. With the repository updated, the records are written to the new sample file rather than the original database.
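The script itself might be sketched roughly like this. To keep the example short, plain Hashes stand in for the PStore files, and the Customer class and its sample_records helper are hypothetical names chosen for illustration.

```ruby
# Stand-in for the library: each Hash plays the role of one database file.
class ShmactiveRecord
  class << self
    attr_writer :repository        # the global, process-wide setting

    def repository
      @repository ||= {}
    end
  end

  # All stored records of the class this is called on.
  def self.all
    ShmactiveRecord.repository.values.grep(self)
  end

  def save
    ShmactiveRecord.repository[[self.class.name, id]] = self
  end
end

class Customer < ShmactiveRecord
  attr_reader :id, :name

  def initialize(id, name)
    @id = id
    @name = name
  end

  # Hypothetical helper: how to pick a sample is the model's business.
  def self.sample_records(count)
    all.first(count)
  end
end

# Pretend production data.
production_db = ShmactiveRecord.repository
1.upto(10) { |i| Customer.new(i, "Customer #{i}").save }

# The dump script proper.
samples = Customer.sample_records(3)
sample_db = {}
ShmactiveRecord.repository = sample_db   # flip the global setting
samples.each(&:save)                     # records now land in the sample store
```

Note that the script leaves the global setting pointing at the sample store when it finishes, which is harmless only as long as the process exits right afterwards.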

We put this script in a Rake task and use it whenever we want a fresh sample database.

Then one day, the folks in QA ask if we could make it possible for them to regenerate the sample database just by hitting a button in the app's admin area. Eager to please, we take our script and put it into a new code path in the app itself.

Just to make sure it doesn't change anything permanently, we update the code to reset the repository to the main database after it is finished.

This change is nothing short of disastrous. Let's take a look at why.

Threads and global settings

Many modern web servers are multithreaded, including ours. That means within a single application process, multiple requests are being serviced at once, each within its own thread, each making its own database requests.

When our sample-dumping code is triggered by one request and switches the repository on the base class, it switches that attribute for all threads running on the server. That means that for a period of time, all the unrelated requests being made by users of the application suddenly start reading from and writing to the sample database instead of the main database file.
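The problem can be reproduced in a few lines. In this sketch the symbols :main_db and :sample_db are stand-ins for real repository objects, and two queues force the interleaving that a busy server would produce by chance.

```ruby
class ShmactiveRecord
  class << self
    attr_accessor :repository    # one setting shared by every thread
  end
end
ShmactiveRecord.repository = :main_db

switched = Queue.new   # signals that the dumper has flipped the setting
looked   = Queue.new   # signals that the other request has read it

# The request that regenerates the sample database.
dumper = Thread.new do
  original = ShmactiveRecord.repository
  begin
    ShmactiveRecord.repository = :sample_db
    switched << true
    looked.pop                   # the dump is "in progress" while we wait
  ensure
    ShmactiveRecord.repository = original
  end
end

# An unrelated user request running at the same time.
seen_by_user_request = Thread.new do
  switched.pop
  seen = ShmactiveRecord.repository
  looked << true
  seen
end.value

dumper.join
after_dump = ShmactiveRecord.repository
```

Here seen_by_user_request ends up as :sample_db: the unrelated request saw the wrong repository even though the dumper dutifully restored the original value afterwards.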

[Figure: Repository switched for all threads]

Which means that whenever someone hits the “dump sample” button, some users don't see the data they were looking for; some user data gets lost; and we wind up with some unexpected extra records in our sample database.

We back this change out, apologize to our users, and reconsider our options.

A new approach

Fortunately, we control the ShmactiveRecord library, so we can make changes to it. First we think about making it possible to pass in an optional parameter to ShmactiveRecord calls.

But this turns out to be a huge headache. Just to make one small part of our script work, we have to modify several methods in order to make it possible to pass this extra information down to where it is needed. We realize that in order for this approach to be generalized, we'd need to modify every single ShmactiveRecord method, most methods on ShmactiveRecord-derived classes, and probably a number of other helper methods as well. If clean design is only having to change the code in one place to add a feature, this is about as dirty as it gets.
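To make the headache concrete, here is a rough sketch of where the parameter-passing approach leads. The repository: keyword, the Customer class, and its rename helper are hypothetical, and Hashes again stand in for database files.

```ruby
class ShmactiveRecord
  MAIN_DB = {}   # Hash stand-in for the main database file

  def self.find(id, repository: MAIN_DB)
    repository[[name, id]]
  end

  def save(repository: MAIN_DB)
    repository[[self.class.name, id]] = self
  end
end

class Customer < ShmactiveRecord
  attr_reader :id, :name

  def initialize(id, name)
    @id = id
    @name = name
  end

  # Hypothetical helper: it never cared about repositories before, but now
  # it has to accept the option and forward it to every call it makes.
  def self.rename(id, new_name, repository: MAIN_DB)
    return nil unless find(id, repository: repository)
    new(id, new_name).save(repository: repository)
  end
end

sample_db = {}
Customer.new(1, "Alice").save
Customer.new(2, "Bob").save(repository: sample_db)
Customer.rename(2, "Robert", repository: sample_db)
```

Every method between the caller and the storage layer grows the same extra argument, and forgetting to forward it in just one place silently sends data to the main database.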

What we really need is a kind of “sideband”: a channel of information that enables us to pass options down through arbitrary layers of method calls without actually changing the methods in between.

Thread-local variables

Ruby gives us such a sideband in the form of thread-local variables.

A thread-local variable is what it sounds like: a variable that is localized to a single thread. We can set a thread-local variable by treating a thread as if it were a Hash, using square brackets to set and retrieve values with symbol keys. There are some other hash-like methods, such as key? to find out if a particular key is set, and keys to get a list of all the variables set on this thread.
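A short sketch of that hash-like interface, using the current thread and an arbitrary :greeting key chosen for illustration:

```ruby
thread = Thread.current

thread[:greeting] = "hello"            # set a thread-local variable
value   = thread[:greeting]            # read it back
present = thread.key?(:greeting)       # is this key set on the thread?
absent  = thread.key?(:no_such_key)    # false for a key that was never set
names   = thread.keys                  # every key currently set on the thread

puts value
puts names.include?(:greeting)
```

The list returned by keys may contain entries set by other code running in the same thread, so it is safer to check for a specific key than to assume the list's exact contents.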

For some reason there is no delete method for thread-locals. Instead, rather bizarrely, we assign the value nil to a variable in order to get rid of it.
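A minimal sketch of removing a thread-local by assigning nil, with :scratch as an arbitrary example key:

```ruby
Thread.current[:scratch] = 42
set_before = Thread.current.key?(:scratch)   # the key exists

Thread.current[:scratch] = nil               # assigning nil removes the key
set_after = Thread.current.key?(:scratch)    # the key is gone again
```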

By the way, since Ruby 1.9.3, these aren't technically thread-local variables; they are fiber-local. We'll talk about what that means in another episode. For our purposes today, they behave indistinguishably from thread-local variables.

We can show that these variables are thread-local by starting a second thread.

But first, we'll set a thread-local on the main thread.

Then, inside the new thread, we try to print the value of that thread-local, and then set it to a different value.

Then when this thread has finished, we check the values of the variables in both the main thread and the child thread.

As we can see, the child thread shares none of the main thread's variables.
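The experiment described above might look something like this. The :color key and its values are made up for the example.

```ruby
# Set a thread-local on the main thread first.
Thread.current[:color] = "red"

seen_in_child = :not_run_yet
child = Thread.new do
  seen_in_child = Thread.current[:color]   # the child starts with nothing set
  Thread.current[:color] = "blue"          # this only affects the child
end
child.join

main_value  = Thread.current[:color]   # the main thread's value
child_value = child[:color]            # a finished thread's variables are still readable

puts seen_in_child.inspect
puts main_value
puts child_value
```

The child sees nil where the main thread sees "red", and the child's later assignment of "blue" never leaks back to the main thread.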

Using thread-locals for library settings

Let's apply this knowledge to ShmactiveRecord. We change the repository getter and setter methods to use a thread-local variable instead of a class instance variable.

Note that we prefix the variable name with the name of the library. Since thread-local variables are visible to all code running in the current thread, we need to take precautions to make it unlikely that the variable names we pick will conflict with names chosen by other libraries that also use thread-locals.
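A minimal sketch of what the revised getter and setter might look like follows. The key name :shmactive_record_repository and the temporary file paths are assumptions; the actual names used by the library are not shown here.

```ruby
require "pstore"
require "tmpdir"

class ShmactiveRecord
  # Assumed default location, as in a simple PStore-backed setup.
  DEFAULT_PATH = File.join(Dir.mktmpdir, "records.pstore")

  # Prefixed with the library name to avoid clashing with other libraries.
  REPOSITORY_KEY = :shmactive_record_repository

  # Each thread lazily gets its own default repository object.
  def self.repository
    Thread.current[REPOSITORY_KEY] ||= PStore.new(DEFAULT_PATH)
  end

  # Only the calling thread's setting is changed.
  def self.repository=(store)
    Thread.current[REPOSITORY_KEY] = store
  end
end

main_repo   = ShmactiveRecord.repository
sample_repo = PStore.new(File.join(Dir.mktmpdir, "sample.pstore"))

worker_repo = Thread.new do
  ShmactiveRecord.repository = sample_repo
  ShmactiveRecord.repository
end.value

still_main = ShmactiveRecord.repository
```

Because the methods are inherited and read from the current thread, subclasses no longer need to delegate to the base class to find the setting.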

That's the only change we have to make. Our code for taking samples will now only alter the repository for the current thread. All other threads, with their own repository objects in their own thread-local variables, will remain blissfully unaware of the change.
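We can check this with a small experiment. In this sketch the symbols :main_db and :sample_db stand in for real repository objects, the key name is an assumption, and two queues force the dump and an unrelated request to overlap.

```ruby
class ShmactiveRecord
  REPOSITORY_KEY = :shmactive_record_repository   # assumed key name

  def self.repository
    Thread.current[REPOSITORY_KEY] ||= :main_db
  end

  def self.repository=(store)
    Thread.current[REPOSITORY_KEY] = store
  end
end

switched = Queue.new   # signals that the dumper has changed its setting
looked   = Queue.new   # signals that the other request has read its own

# The request that regenerates the sample database.
dumper = Thread.new do
  ShmactiveRecord.repository = :sample_db
  switched << true
  looked.pop                    # the dump is "in progress" while we wait
  ShmactiveRecord.repository
end

# An unrelated user request running at the same time.
seen_by_user_request = Thread.new do
  switched.pop
  seen = ShmactiveRecord.repository
  looked << true
  seen
end.value

seen_by_dumper = dumper.value
seen_by_main   = ShmactiveRecord.repository
```

Only the dumper thread ever sees :sample_db; the concurrent request and the main thread both keep working against :main_db, and no reset step is needed when the dump finishes.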

[Figure: Only one thread is affected]

Conclusion

Thread-local variables are tools to be used sparingly. From the perspective of a single thread they have all the problems of any other kind of global mutable state. But occasionally we need a way to alter the environment of all the code running in a single thread, without affecting any other threads. For those cases, thread-local variables give us exactly what we need.

Happy hacking!