Repeatedly hitting an HTTP API for new data can slow down your programs. And doing it too frequently can get you banned from making further requests.
A solution is to cache the data you receive. But how do you add caching in a well-tested way? How do you keep the caching layer from disrupting existing tests? And what are best practices for caching API response data?
In this episode, you’ll learn answers to all these questions.
Episode Script
Let’s say we have a class that wraps a remote weather API. When we call the #report method with a location query, it makes a request to the service, parses the response, fills in a Weather::Report object with the returned values, and returns the object.
```ruby
require 'open-uri'
require 'json'

class Weather
  Report = Struct.new(:temperature)

  def report(query)
    key = ENV['WUNDERGROUND_KEY']
    url = "http://api.wunderground.com/api/#{key}/conditions/q/#{query}.json"
    # URI.open rather than Kernel#open: since Ruby 3.0, plain open no longer fetches URLs
    body = URI.open(url).read
    data = JSON.parse(body)
    Report.new(data['current_observation']['temp_f'])
  end
end

Weather.new.report(17361)
# => #<struct Weather::Report temperature=34.4>
```
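The parsing half of #report doesn't depend on the network, so it can be tried out against a canned response body. The sketch below pulls that step into a hypothetical report_from_body helper (not part of the episode's class) and uses a sample body trimmed to the single field the class reads; the real service returns many more fields.

```ruby
require 'json'

# Same shape as Weather::Report in the episode.
Report = Struct.new(:temperature)

# Hypothetical helper: the parse-and-build step of Weather#report,
# separated out so it can run without any HTTP request.
def report_from_body(body)
  data = JSON.parse(body)
  Report.new(data['current_observation']['temp_f'])
end

# Assumed sample body, reduced to the one field the class uses.
sample = '{ "current_observation": { "temp_f": 34.4 } }'
report = report_from_body(sample)
puts report.temperature # prints: 34.4
```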
In a production application, hitting an external service every time the program needs data is often a bad idea. Weather reports don’t change on a second-by-second basis. Making a request every time the #report method is called could slow the program down, and if requests are made often enough they might exceed service-imposed limits, causing future requests to fail.
For all these reasons, we’d like to cache the weather reports we get back from this service. Let’s add a cache to this class, using tests to guide the way. First, we’ll write tests for how the class should interact with a cache collaborator. To do this we’ll pass in a test version of the cache. But what sort of test double should this be? What interface should it support?
```ruby
require 'rspec/autorun'

describe Weather do
  describe '#report' do
    it 'uses a cached value when available' do
      weather = Weather.new(cache: ???)
    end
  end
end
```
What if it isn’t a test double at all? What if we just passed in a Hash as the cache? Let’s see where this takes us. We’ll pass in a hash with one key, a ZIP code, that maps to a Weather::Report object containing a temperature that the real service is not likely to report.
```ruby
require 'rspec/autorun'

describe Weather do
  describe '#report' do
    it 'uses a cached value when available' do
      weather = Weather.new(cache: {'17361' => Weather::Report.new(-60.0) })
      weather.report('17361').temperature.should eq(-60.0)
    end
  end
end
```
We make this pass with a few modifications to the Weather class. We give it the ability to accept a hash of options on initialization, and have it look for a :cache option, storing it in @cache. If it doesn’t find one, it uses an empty Hash. We then update the #report method to check the cache for an entry for the current query, falling back to the data returned from the service only when there is no entry.
Note that we aren’t saying that the cache must always be a Hash. All this test asserts is that the code can use any object that behaves like a Hash as a cache. For the moment, that just means it has to respond to #fetch.
```ruby
class Weather
  Report = Struct.new(:temperature)

  def initialize(options={})
    @cache = options.fetch(:cache){ {} }
  end

  def report(query)
    key = ENV['WUNDERGROUND_KEY']
    url = "http://api.wunderground.com/api/#{key}/conditions/q/#{query}.json"
    body = URI.open(url).read
    data = JSON.parse(body)
    @cache.fetch(query) {
      Report.new(data['current_observation']['temp_f'])
    }
  end
end
```
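Since the class only relies on #fetch at this point, it helps to recall exactly what Hash#fetch does with a block: a hit returns the stored value and skips the block, while a miss returns the block's result without storing it. The NullCache class below is hypothetical, included only to show that any object with a compatible #fetch satisfies the interface the class uses so far.

```ruby
# Hash#fetch with a block: stored value on a hit, block result on a miss.
cache = { '17361' => 'cached body' }

hit  = cache.fetch('17361') { 'fresh body' }
miss = cache.fetch('90210') { 'fresh body' }

puts hit                 # prints: cached body
puts miss                # prints: fresh body
puts cache.key?('90210') # prints: false (fetch does not store the block's result)

# Hypothetical duck-typed cache: never stores anything, so every lookup
# falls through to the block. It only needs #fetch at this stage.
class NullCache
  def fetch(_key)
    yield
  end
end

puts NullCache.new.fetch('17361') { 'fresh body' } # prints: fresh body
```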
Next, we test that the #report method refrains from making an HTTP request if it finds an entry in the cache. We do this by requiring the WebMock gem, which fakes out web connections. By default it disables all real web connections as soon as it is required, so we don’t actually have to add any new test code. Running the test now fails, because the code tries to hit the weather service even though there is a cached report.
```ruby
require 'rspec/autorun'
require 'webmock/rspec'

describe Weather do
  describe '#report' do
    it 'uses a cached value when available' do
      weather = Weather.new(cache: {'17361' => Weather::Report.new(-60.0) })
      weather.report('17361').temperature.should eq(-60.0)
    end
  end
end

# >> F
# >>
# >> Failures:
# >>
# >> 1) Weather#report uses a cached value when available
# >> Failure/Error: Unable to find matching line from backtrace
# >> WebMock::NetConnectNotAllowedError:
# >> Real HTTP connections are disabled. Unregistered request: GET http://api.wunderground.com/api/d6aaea598a0e4508/conditions/q/17361.json with headers {'Accept'=>'*/*', 'User-Agent'=>'Ruby'}
# >>
# >> You can stub this request with the following snippet:
# >>
# >> stub_request(:get, "http://api.wunderground.com/api/d6aaea598a0e4508/conditions/q/17361.json").
# >> with(:headers => {'Accept'=>'*/*', 'User-Agent'=>'Ruby'}).
# >> to_return(:status => 200, :body => "", :headers => {})
# >>
# >> ============================================================
# >> # -:14:in `report'
# >> # -:29:in `block (3 levels) in <main>'
# >>
# >> Finished in 0.00167 seconds
# >> 1 example, 1 failure
# >>
# >> Failed examples:
# >>
# >> rspec -:27 # Weather#report uses a cached value when available
```
We fix this by moving the entire body of the method inside the default block passed to the cache’s #fetch, so the request is only made on a cache miss.
```ruby
def report(query)
  @cache.fetch(query) {
    key = ENV['WUNDERGROUND_KEY']
    url = "http://api.wunderground.com/api/#{key}/conditions/q/#{query}.json"
    body = URI.open(url).read
    data = JSON.parse(body)
    Report.new(data['current_observation']['temp_f'])
  }
end
```
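This works because the block given to #fetch is evaluated lazily: it runs only when the key is missing. The small sketch below makes that visible by replacing the HTTP request with a hypothetical fake_request lambda that counts how often it is called.

```ruby
# Counter standing in for the HTTP request (fake_request is hypothetical).
calls = 0
fake_request = lambda do |query|
  calls += 1
  "body for #{query}"
end

cache = { '17361' => 'cached body' }

cache.fetch('17361') { fake_request.call('17361') } # hit: block skipped
puts calls # prints: 0

cache.fetch('90210') { fake_request.call('90210') } # miss: block runs
puts calls # prints: 1
```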
Now we have code that can use a pre-populated cache, but won’t populate the cache itself. Before we move on to cache population, though, let’s take a look at our design.
Right now we are caching a Report object. There are several potential problems with this:
- Right now we’re using in-memory hashes, but we’ll eventually be serializing cached data in some kind of persistent key-value store. Consider what would happen if we made a change to the Report class, perhaps renaming the temperature field to temp_f to indicate that it’s in Fahrenheit. Unless we were careful to flush all caches when rolling out the new code, we’d risk causing crashes when new code tried to load and use old-style Report objects found in the cache.
- Even if we simply added a new attribute to the Report class, for instance wind_speed, we’d still have to expire our caches and rebuild them, or risk getting nil values for the added field.
- Finally, storing an object means we have to be careful never to make a change to the Report object that renders it non-serializable, for instance by storing a lambda in it.
In my experience it’s better to cache raw response bodies—or sometimes even entire responses, including headers—than to cache domain objects. It prevents object version conflicts, since the domain objects are recreated every time. Storing the entire response means that if we start using more of the response, the data will already be available in the cache. And storing the response raw ensures that the data stored is a simple, serialization-friendly String.
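The serialization point can be checked directly. The sketch below uses Ruby's Marshal as a stand-in for whatever format a persistent store would use (an assumption; real cache stores may serialize differently). A raw String body round-trips unchanged and the Report is rebuilt from it under the current class definition, whereas a Report holding a lambda cannot be dumped at all.

```ruby
require 'json'

Report = Struct.new(:temperature)

body = '{ "current_observation": { "temp_f": -60.0 } }'

# Marshal stands in for the persistent store's serializer (assumption).
restored = Marshal.load(Marshal.dump(body))
puts restored == body # prints: true

# The domain object is recreated from the raw body on every read.
report = Report.new(JSON.parse(restored)['current_observation']['temp_f'])
puts report.temperature # prints: -60.0

# A Report containing a lambda is not serializable: Marshal raises TypeError.
lambda_report_dumped =
  begin
    Marshal.dump(Report.new(-> { -60.0 }))
    true
  rescue TypeError
    false
  end
puts lambda_report_dumped # prints: false
```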
Let’s change the code to store the raw response body instead of a Report object. We’ll also need to update the test to provide raw JSON in the pre-populated cache.
```ruby
class Weather
  # ...
  def report(query)
    key = ENV['WUNDERGROUND_KEY']
    url = "http://api.wunderground.com/api/#{key}/conditions/q/#{query}.json"
    body = @cache.fetch(query) {
      URI.open(url).read
    }
    data = JSON.parse(body)
    Report.new(data['current_observation']['temp_f'])
  end
end

# ...

describe Weather do
  describe '#report' do
    it 'uses a cached value when available' do
      json = '{ "current_observation": { "temp_f": -60.0 } }'
      weather = Weather.new(cache: {'17361' => json })
      weather.report('17361').temperature.should eq(-60.0)
    end
  end
end
```
Now let’s add an example that shows the code populating the cache when no match is found. This test starts by setting up a fake web response using WebMock. When the code under test tries to make a request for a weather report, WebMock will intercept it and return our snippet of test JSON data as the response body.
We then set up a cache, a Hash that starts out empty. We instantiate a Weather object, passing in the cache, and then request a weather report. After the method returns, we check the contents of the cache to verify that it now contains our fake JSON data, keyed under the given query.
Making this test pass simply requires changing the code so that it updates the cache after making a request.
```ruby
class Weather
  # ...
  def report(query)
    key = ENV['WUNDERGROUND_KEY']
    url = "http://api.wunderground.com/api/#{key}/conditions/q/#{query}.json"
    body = @cache.fetch(query) {
      # On a miss, store the raw body; the assignment also returns it
      @cache[query] = URI.open(url).read
    }
    data = JSON.parse(body)
    Report.new(data['current_observation']['temp_f'])
  end
end

# ...

describe Weather do
  describe '#report' do
    # ...
    it 'populates the cache with new values' do
      json = '{ "current_observation": { "temp_f": -60.0 } }'
      expected_url =
        %r(http://api.wunderground.com/api/.*/conditions/q/17361.json)
      stub_request(:get, expected_url).to_return(body: json)
      cache = {}
      weather = Weather.new(cache: cache)
      weather.report('17361')
      cache['17361'].should eq(json)
    end
  end
end
```
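To see the whole miss-then-hit cycle without the network or WebMock, here is a sketch of the final #report with the HTTP call swapped for an injected fetcher. The :fetcher option is hypothetical and not part of the episode's class; it is simply a lambda that receives the query and returns a raw JSON body.

```ruby
require 'json'

class Weather
  Report = Struct.new(:temperature)

  def initialize(options = {})
    @cache   = options.fetch(:cache) { {} }
    # Hypothetical option standing in for the HTTP request; required here.
    @fetcher = options.fetch(:fetcher)
  end

  def report(query)
    # On a miss, fetch the raw body and store it under the query.
    body = @cache.fetch(query) {
      @cache[query] = @fetcher.call(query)
    }
    data = JSON.parse(body)
    Report.new(data['current_observation']['temp_f'])
  end
end

requests = []
fetcher = lambda do |query|
  requests << query
  '{ "current_observation": { "temp_f": -60.0 } }'
end

cache   = {}
weather = Weather.new(cache: cache, fetcher: fetcher)

weather.report('17361') # miss: calls the fetcher and populates the cache
weather.report('17361') # hit: served from the cache

p requests   # prints: ["17361"]
p cache.keys # prints: ["17361"]
```

Injecting the request this way is one alternative to stubbing at the HTTP layer; the episode's WebMock approach has the advantage of leaving the class's constructor unchanged.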
We now have support for basic caching, using a Ruby Hash as the model for the cache interface. In the next episode we’ll look at how to plug in arbitrary key-value stores as the cache implementation, as well as how to expire cache entries. Until then, happy hacking!