Understanding Ruby Thread-Local Variables

You’ve had it hammered into your head over and over as a developer: global data is bad. Global, mutable data is even worse. And when threads enter into the mix, global mutable data is a disaster waiting to happen.

But global data isn’t always obvious. Sometimes, it takes the form of library-wide or class-wide configuration. Configuration that should never change at runtime… until the day that you find that it does need to change.

When that happens, updating your code to be thread-safe can be a nightmare. Fortunately, in some cases there’s a way around all that work. But to use it, you need to understand how to use thread-local variables in Ruby.

(This post is adapted from RubyTapas Episode 161)

Suppose we have a library that implements an extremely basic disk-based persistent object store. We’ll call it “ShmactiveRecord”.

require "pstore"

class ShmactiveRecord
  def self.repository
    @repository ||= PStore.new("database.pstore")
  end

  def self.repository=(new_repo)
    @repository = new_repo
  end

  def self.all
    transaction(read_only: true) do |store|
      Array(store[name])
    end
  end

  def self.find(id)
    all.detect{|r| r.id == id.to_i}
  end

  attr_reader :id

  def ==(other)
    id == other.id
  end

  def save
    @id ||= self.class.all.map(&:id).max.to_i + 1
    transaction do |store|
      store[self.class.name] ||= []
      store[self.class.name].delete(self)
      store[self.class.name] << self
    end
  end

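  def delete
    # Go through the shared transaction helper so that deletes use the same
    # base-class repository as every other operation.
    transaction do |store|
      Array(store[self.class.name]).delete(self)
    end
  end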

  private

  def self.transaction(read_only: false, &transaction_body)
    ShmactiveRecord.repository.transaction(read_only, &transaction_body)
  end
  def transaction(**options, &transaction_body)
    self.class.transaction(**options, &transaction_body)
  end
end

We use ShmactiveRecord by inheriting business model classes from it.

class Purchase < ShmactiveRecord
  attr_reader :product, :time, :total

  def initialize(product, time, total)
    @product = product
    @time    = time
    @total   = total
  end
end

Those classes are then able to store records to disk, read records back in, find records by their ID, and remove records.

require "./purchase"

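# Start from a clean slate, but don't fail if the file isn't there yet.
File.delete("database.pstore") if File.exist?("database.pstore")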

p = Purchase.new("Confident Ruby", Time.now, 55)
p.save
Purchase.all
# => [#<Purchase:0x000000010b6a70
#      @id=1,
#      @product="Confident Ruby",
#      @time=2013-11-20 18:41:04 -0500,
#      @total=55>]
Purchase.find(1)
# => #<Purchase:0x00000000ccb5a0
#     @id=1,
#     @product="Confident Ruby",
#     @time=2013-11-20 18:41:04 -0500,
#     @total=55>

All of these CRUD operations depend for their implementation on a “repository” object. In a proper ORM library, this repository might be an abstraction over a database connection. In our case, the repository is simply an instance of PStore. If you’re not familiar with PStore, it’s a simple file-based key-value storage library that comes with Ruby. We’ll talk about it more in another episode.

Every ShmactiveRecord class or instance delegates back to the ShmactiveRecord base class for its repository. By default, this global repository is set to a PStore that writes a file called database.pstore.

def self.repository
  @repository ||= PStore.new("database.pstore")
end

But the repository can also be specified explicitly with a setter method.

def self.repository=(new_repo)
  @repository = new_repo
end

A simple application

Let’s say that we are, however improbably, using this barebones persistence framework to store business data in a web application.

class Purchase < ShmactiveRecord
  attr_reader :product, :time, :total

  def initialize(product, time, total)
    @product = product
    @time    = time
    @total   = total
  end

  def self.take_sample(n)
    all.sample(n)
  end
end

Let’s say further that there is rather a lot of data on the production server, and we want to skim off a smaller subset of the records into a sample database, which we can then use for tests and development.

In order to do this, we write a script. It selects the sample records, delegating to a helper method on the record class for the details of how to grab a sample. Then it changes the global repository over to the sample database, and tells all the sampled records to save themselves. With the repository updated, the records are written back to the new sample file rather than the original database.

# dump_sample.rb
def dump_sample  
  sample = Purchase.take_sample(500)
  ShmactiveRecord.repository = PStore.new("sample.pstore")
  sample.each(&:save)
end

We put this script in a Rake task and use it whenever we want a fresh sample database.

Then one day, the folks in QA ask if we could make it possible for them to regenerate the sample database just by hitting a button in the app’s admin area. Eager to please, we take our script and put it into a new code path in the app itself.

# dump_sample.rb
def dump_sample  
  sample = Purchase.take_sample(500)
  ShmactiveRecord.repository = PStore.new("sample.pstore")
  sample.each(&:save)

end

Just to make sure it doesn’t change anything permanently, we update the code to reset the repository to be database.pstore after it is finished.

# dump_sample.rb
def dump_sample  
  sample = Purchase.take_sample(500)
  ShmactiveRecord.repository = PStore.new("sample.pstore")
  sample.each(&:save)
  ShmactiveRecord.repository = PStore.new("database.pstore")
end

This change is nothing short of disastrous. Let’s take a look at why.

Threads and global settings

Many modern web servers are multithreaded, including ours. That means within a single application process, multiple requests are being serviced at once, each within its own thread, each making its own database requests.

When our sample-dumping code is triggered by one request and switches the repository on the ShmactiveRecord base class, it switches that attribute for every thread running on the server. For a period of time, all the unrelated requests being made by users of the application suddenly start reading from and writing to the sample.pstore database instead of the main database file.

(Figure: Repository switched for all threads)

The upshot: whenever someone hits the “dump sample” button, some users don’t see the data they were looking for, some user data gets lost, and we wind up with some unexpected extra records in our sample database.
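To make the failure concrete, here is a minimal sketch of the same race. It assumes a hypothetical GlobalConfig class standing in for ShmactiveRecord, with plain strings in place of PStore objects, and it uses queues to force the unlucky timing that would only happen occasionally on a real server.

```ruby
require "thread"

# Hypothetical stand-in for ShmactiveRecord: a single class-level setting
# that every thread in the process shares.
class GlobalConfig
  class << self
    attr_writer :repository

    def repository
      @repository ||= "database.pstore"
    end
  end
end

switched = Queue.new
restore  = Queue.new

# Plays the role of the "dump sample" request.
dumper = Thread.new do
  GlobalConfig.repository = "sample.pstore"
  switched << true   # signal that the repository has been switched
  restore.pop        # wait until the other thread has taken a look
  GlobalConfig.repository = "database.pstore"
end

# Plays the role of an unrelated user request running at the same time.
seen_by_other_request = Thread.new do
  switched.pop
  value = GlobalConfig.repository
  restore << true
  value
end.value

dumper.join

puts seen_by_other_request
# >> sample.pstore
```

The second thread never asked for the sample database, but it sees it anyway, because the setting lives on the class rather than on the thread.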

We back this change out, apologize to our users, and reconsider our options.

A new approach

Fortunately, we control the ShmactiveRecord library, so we can make changes to it. First we think about making it possible to pass in an optional repository parameter to ShmactiveRecord calls.

But this turns out to be a huge headache. Just to make one small part of our script work, we have to modify several methods in order to make it possible to pass this extra information down to where it is needed. We realize that in order for this approach to be generalized, we’d need to modify every single ShmactiveRecord method, most methods on ShmactiveRecord-derived classes, and probably a number of other helper methods as well. If clean design is only having to change the code in one place to add a feature, this is about as dirty as it gets.

require "pstore"

class ShmactiveRecord
  # ...
  def save(repository: ShmactiveRecord.repository)
    @id ||= self.class.all.map(&:id).max.to_i + 1
    transaction(repository: repository) do |store|
      store[self.class.name] ||= []
      store[self.class.name].delete(self)
      store[self.class.name] << self
    end
  end

  def self.transaction(
      read_only: false, 
      repository: ShmactiveRecord.repository, 
      &transaction_body)
    repository.transaction(read_only, &transaction_body)
  end

  def transaction(**options, &transaction_body)
    self.class.transaction(**options, &transaction_body)
  end
  # ...
end

# dump_sample.rb
def dump_sample  
  sample = Purchase.take_sample(500)
  sample_repository = PStore.new("sample.pstore")
  sample.each do |purchase|
    purchase.save(repository: sample_repository)
  end
end

What we really need is a kind of “sideband”: a channel of information that lets us pass options down through arbitrary layers of method calls without actually changing the methods in between.

Thread-local variables

Ruby gives us such a sideband in the form of thread-local variables.

A thread-local variable is what it sounds like: a variable that is localized to a single thread. We can set a thread-local variable by treating a thread as if it were a Hash, using square brackets to set and retrieve values with symbol keys. There are some other hash-like methods, such as #key? to find out whether a particular key is set, and #keys to get a list of all the variables set on this thread.

thread = Thread.current
thread[:foo] = 42
thread[:foo]                    # => 42
thread.key?(:foo)               # => true
thread.keys                     # => [:foo]

For some reason there is no #delete for thread-locals. Instead, rather bizarrely, we assign the nil value to a variable in order to get rid of it.

thread[:bar] = 23
thread.keys                     # => [:foo, :bar]
thread[:bar] = nil
thread.keys                     # => [:foo]

By the way, since Ruby 1.9.3, these aren’t technically thread-local variables; they are fiber-local. We’ll talk about what that means in another episode. For our purposes today, they behave indistinguishably from thread-local variables.
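For the curious, here is a small sketch of the difference. It relies on Thread#thread_variable_set and Thread#thread_variable_get, which Ruby 2.0 and later provide for variables that are shared by every fiber in a thread. The :setting key is just an illustrative name.

```ruby
result = Thread.new do
  # Set the same key twice: once as a fiber-local, once as a true thread-local.
  Thread.current[:setting] = "fiber-local"
  Thread.current.thread_variable_set(:setting, "thread-local")

  # A new fiber running in the same thread starts with no fiber-locals,
  # but it can still see the thread's true thread-local variables.
  Fiber.new do
    [Thread.current[:setting], Thread.current.thread_variable_get(:setting)]
  end.resume
end.value

p result
# >> [nil, "thread-local"]
```

As long as our code doesn’t hop between fibers, the square-bracket form used in the rest of this post behaves the way we expect.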

We can show that these variables are thread-local by starting a second thread.

But first, we’ll set a thread-local on the main thread.

Thread.current[:bar] = "main value"

Then, inside the new thread, we try to print the value of that thread-local, and then set it to a different value.

child = Thread.new do
  puts "Starting value in child: #{Thread.current[:bar].inspect}"
  Thread.current[:bar] = "child value"
end
child.join

Then when this thread has finished, we check the values of the variables in both the main thread and the child thread.

puts "Value in main: #{Thread.current[:bar]}"
puts "Value in child: #{child[:bar]}"
# >> Starting value in child: nil
# >> Value in main: main value
# >> Value in child: child value

As we can see, the child thread shares none of the main thread’s variables.

Using thread-locals for library settings

Let’s apply this knowledge to ShmactiveRecord. We change the repository getter and setter methods to set a thread-local variable instead of a class instance variable.

def self.repository
  Thread.current[:shmactive_record_repository] ||= PStore.new("database.pstore")
end

def self.repository=(new_repo)
  Thread.current[:shmactive_record_repository] = new_repo
end

Note that we prefix the variable name with the name of the library. Since thread-local variables are visible to all code running in the current thread, we need to take precautions to make it unlikely that the variable names we pick will conflict with names chosen by other libraries that also use thread-locals.

That’s the only change we have to make. Our code for taking samples will now only alter the repository for the current thread. All other threads, with their own repository objects in their own thread-local variables, will remain blissfully unaware of the change.
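One caveat is worth a sketch. Servers that keep a pool of threads will reuse a thread for later requests, so a thread-local that is switched and never switched back stays in effect on that thread. A block-style helper that restores the old value in an ensure clause is one way to guard against that. Everything below is hypothetical: ShmactiveRecordSketch is a stand-in class, with_repository is not part of the library shown above, and strings take the place of PStore objects so the example runs without touching disk.

```ruby
# Stand-in for ShmactiveRecord with the thread-local getter and setter.
class ShmactiveRecordSketch
  def self.repository
    Thread.current[:shmactive_record_repository] ||= "database.pstore"
  end

  def self.repository=(new_repo)
    Thread.current[:shmactive_record_repository] = new_repo
  end

  # Hypothetical helper: switch the repository for the duration of a block,
  # then put back whatever was there before, even if the block raises.
  def self.with_repository(temporary_repo)
    previous = Thread.current[:shmactive_record_repository]
    self.repository = temporary_repo
    yield
  ensure
    # Assigning nil (when nothing was set before) simply clears the variable,
    # so the getter falls back to its default again.
    Thread.current[:shmactive_record_repository] = previous
  end
end

inside = nil
other  = nil

ShmactiveRecordSketch.with_repository("sample.pstore") do
  inside = ShmactiveRecordSketch.repository
  # A different thread still gets its own default repository.
  other = Thread.new { ShmactiveRecordSketch.repository }.value
end

after = ShmactiveRecordSketch.repository

p [inside, other, after]
# >> ["sample.pstore", "database.pstore", "database.pstore"]
```

With a helper like this, the sample-dumping code could wrap its saves in a block instead of assigning the repository directly.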

(Figure: Only one thread is affected)

Conclusion

Thread-local variables are tools to be used sparingly. From the perspective of a single thread, they have all the problems of any other kind of global mutable state. But occasionally we need a way to alter the environment of all the code running in a single thread, without affecting any other threads. For those cases, thread-local variables give us exactly what we need.

Happy hacking!
