Testing linear elastic caching

To ensure our theory holds up in the real world, we conducted extensive experiments using two primary sources:

  1. Production workloads: We integrated the system into Spanner.
  2. Public traces: We tested against a variety of publicly available cache traces from industry benchmarks to ensure the results weren’t specific to Google’s infrastructure.

Production workloads

We developed a practical algorithm that, on each page request, assigns a time-to-live (TTL) to the cached page based on the page’s access patterns and costs. Because Spanner handles billions of requests per second, this TTL prediction model has to be extremely lightweight. We opted for a shallow decision tree that can be translated into a few lines of C++ code. The resulting code is also easily interpretable and provides valuable insights into the workload characteristics. To predict the optimal TTL for each page, the model considers features such as the size of the data, the cost of a cache miss (when data isn’t in the cache and the system needs to retrieve it from some other, slower system like a disk), and the type of database operation being performed.
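To give a feel for how small such a model can be, here is a minimal sketch of a shallow decision tree written as nested conditionals. The production model is described as C++; Python is used here only for brevity. The feature names, thresholds, operation labels, and TTL values are all hypothetical and are not taken from the Spanner model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRequest:
    size_bytes: int   # size of the requested page
    miss_cost: float  # relative cost of refetching the page on a miss
    operation: str    # hypothetical labels, e.g. "read" or "scan"


def predict_ttl_seconds(req: PageRequest) -> float:
    """Shallow decision tree; every threshold and TTL below is made up."""
    if req.operation == "scan":
        # Assumption: pages touched by scans are unlikely to be reused soon.
        return 1.0
    if req.miss_cost >= 10.0:
        # Assumption: expensive-to-refetch pages are worth holding longer,
        # but large ones cost more memory, so they get a shorter TTL.
        return 600.0 if req.size_bytes <= 64 * 1024 else 120.0
    return 30.0
```

Because the whole model is a handful of comparisons, each branch can be read as a plain statement about the workload, which is the interpretability benefit described above.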

We integrated the elastic caching policy into Spanner’s production servers over several months. Compared to a standard fixed-size cache, the results were substantial:

  • Memory usage: Reduced by 15.5%.
  • Cache misses: Increased by only 5.5%.
  • Total cost of ownership (TCO): Reduced by approximately 5%.

Crucially, because the algorithm is “cost-aware,” the small increase in cache misses was concentrated on data that is cheap to fetch from storage, meaning the impact on actual I/O costs was a negligible 0.5%.

Public traces

We also evaluated our elastic caching approach using several publicly available cache traces. As the fixed-cache-size baseline, we used an optimized implementation of the greedy dual size frequency (GDSF) eviction algorithm, a generalization of the well-known LRU policy that allows for pages of different sizes.
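For readers unfamiliar with GDSF, the following is a minimal, unoptimized sketch of the textbook policy, not the implementation used in the experiments. Each cached page gets a priority of clock + frequency × cost / size, the lowest-priority page is evicted first, and the clock advances to the evicted page's priority so that older entries age out. The class name and the linear scan for the victim are choices made for this illustration.

```python
class GDSFCache:
    """Textbook GDSF with a linear-scan eviction (illustration only)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.clock = 0.0   # inflation value, raised on every eviction
        self.entries = {}  # key -> [priority, frequency, size, cost]

    def access(self, key, size, cost=1.0):
        """Record an access; return True on a hit, False on a miss."""
        hit = key in self.entries
        if hit:
            entry = self.entries[key]
            entry[1] += 1
        else:
            if size > self.capacity:
                return False  # page can never fit, so do not cache it
            while self.used + size > self.capacity:
                victim = min(self.entries, key=lambda k: self.entries[k][0])
                self.clock = self.entries[victim][0]
                self.used -= self.entries[victim][2]
                del self.entries[victim]
            entry = [0.0, 1, size, cost]
            self.entries[key] = entry
            self.used += size
        # Priority: clock + frequency * cost / size.
        entry[0] = self.clock + entry[1] * entry[3] / entry[2]
        return hit
```

A real implementation would typically keep entries in a priority queue rather than scanning for the minimum on every eviction.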

We considered four variants of elastic caching, depending on which ski rental algorithm we used (breakeven or randomized) and whether or not we used a machine-learned model. Since the available public traces don’t have application-level features available for training, we didn’t implement decision trees for prediction. Instead, we developed a simple learning strategy that splits each trace in half and uses the first half for training. For each individual page in the training trace, we computed the TTL that minimizes the page’s cost over the training trace.
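The per-page learning step can be sketched as follows, under a simplified linear cost model that is an assumption of this example: holding a page costs `hold_rate` per second, each miss costs `miss_cost`, and the TTL timer restarts on every access. Under that model the cost only rises as the TTL grows between two observed inter-access gaps, so it is enough to try a TTL of zero and each observed gap. The function names, the uniform costs across pages, and the split by request count are all illustrative choices.

```python
from collections import defaultdict


def trace_cost(gaps, ttl, hold_rate, miss_cost):
    """Cost of serving one page's inter-access gaps with a fixed TTL."""
    cost = 0.0
    for gap in gaps:
        if gap <= ttl:
            cost += hold_rate * gap              # held until the next hit
        else:
            cost += hold_rate * ttl + miss_cost  # expired, then a miss
    return cost


def best_ttl(gaps, hold_rate, miss_cost):
    """Try TTL = 0 and every observed gap; return the cheapest."""
    candidates = [0.0] + sorted(set(gaps))
    return min(candidates,
               key=lambda t: trace_cost(gaps, t, hold_rate, miss_cost))


def learn_ttls(trace, hold_rate, miss_cost):
    """trace: list of (timestamp, page_id) pairs sorted by timestamp."""
    train = trace[: len(trace) // 2]  # first half is the training trace
    last_seen = {}
    gaps = defaultdict(list)
    for ts, page in train:
        if page in last_seen:
            gaps[page].append(ts - last_seen[page])
        last_seen[page] = ts
    return {page: best_ttl(gaps.get(page, []), hold_rate, miss_cost)
            for page in last_seen}
```

In this sketch a page seen only once during training gets a TTL of zero, since no reuse was observed that would justify holding it.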

Since the behavior of a cache depends on what’s initially in it, a common practice, known as “warming up”, is to use some prefix of the cache trace to populate the cache without measuring performance on it. We warmed up all caches with one day’s worth of requests from the second half of the trace and used the rest for testing and measurements. During the test trace, if we encountered a page that was seen during training, we set its TTL to the best precomputed TTL for that page. Otherwise, we set the TTL using either the breakeven or the randomized policy.
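The TTL choice at test time can be sketched as below. The breakeven rule holds a page until its accumulated holding cost equals the cost of one miss. For the randomized rule, this example uses the classic randomized ski rental distribution, sampled by inverse transform; the exact distribution used in the experiments is not stated here, so treat that part as an assumption. The function names and the `hold_rate` and `miss_cost` parameters are likewise illustrative.

```python
import math
import random


def breakeven_ttl(hold_rate, miss_cost):
    """Hold the page until holding it has cost as much as one miss."""
    return miss_cost / hold_rate


def randomized_ttl(hold_rate, miss_cost, rng=random):
    """Sample a TTL in [0, breakeven) with density proportional to
    exp(x / breakeven), the classic randomized ski rental strategy."""
    u = rng.random()
    scale = math.log(1.0 + u * (math.e - 1.0))  # inverse CDF on [0, 1)
    return breakeven_ttl(hold_rate, miss_cost) * scale


def choose_ttl(page, learned_ttls, hold_rate, miss_cost,
               randomized=False, rng=random):
    """Use the precomputed TTL if the page was seen in training,
    otherwise fall back to a ski rental policy."""
    if page in learned_ttls:
        return learned_ttls[page]
    if randomized:
        return randomized_ttl(hold_rate, miss_cost, rng)
    return breakeven_ttl(hold_rate, miss_cost)
```

Setting `learned_ttls` to an empty dictionary gives the two variants without a learned model, so the four variants differ only in the `randomized` flag and in whether the dictionary is populated.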


