Using Datadog APM to improve the performance of Homebrew

Author: Andrew Robert McBurney (@armcburney)

Published: April 24, 2018

As a Software Engineering Intern on the open source team at Datadog, I contribute to various open source projects, using Datadog tools to track down bugs and performance-related bottlenecks.

One of the tools I use is Application Performance Monitoring (APM). APM lets you instrument your code with a client tracing library and view the breakdown of method performance across your application. It’s effective for finding performance bottlenecks and for comparing the performance of mutually exclusive implementations when making a design decision.

Homebrew

The first open source project I contributed to at Datadog was Homebrew: a popular package management system for macOS. As a macOS user of many years, I’ve used Homebrew on numerous occasions, and I wanted to give back to the project that has saved me countless hours of installation and configuration headaches.

Homebrew is used extensively by the engineering teams at Datadog, as it’s part of the standard developer toolkit. Because of this, improving the performance of Homebrew saves engineers time across the entire organization.

brew linkage is a development-specific Homebrew command used to check the library links of an installed formula (Homebrew’s term for a package).

There was an open GitHub issue with a help-wanted label seeking to make brew linkage faster. A performant brew linkage command could be run automatically to detect when a brew reinstall is needed because the user is using an optional dependency.

From the GitHub issue, the requirements for the solution were:

Ideally on an SSD and modernish hardware it should be able to scan ~50 packages (with some large ones like boost) in under 5 seconds.

Investigation of performance bottleneck

I instrumented Homebrew’s code using Datadog’s ddtrace gem for Ruby.

Gem usage

The gem lets users wrap blocks of code in traces and track how long each block takes to execute:

require "ddtrace"

def some_function
  # The time spent to complete this 'block' is tracked by the APM
  Datadog.tracer.trace("some_function", service: "my_app") do
    sleep 2
    Datadog.tracer.trace("some_expensive_operation") { sleep 3 }
    sleep 1
  end
end

From here, users may navigate to the APM user interface to view traces in a flame graph:

Datadog flame graph for Ruby function 'some_function'.

Usage in Homebrew

To pinpoint the bottleneck, I instrumented Homebrew from the ‘root’ point of execution (in Library/Homebrew/dev-cmd/linkage.rb) to capture the command’s total execution time along with all the methods subsequently called by the linkage function.
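The instrumentation was shaped roughly like the sketch below. The span names, the ARGV.kegs iteration, and the LinkageChecker call are simplified stand-ins for Homebrew’s code at the time, not the actual patch:

require "ddtrace"

# A root span wraps the entire command; a nested span wraps the per-keg
# work so each formula's check shows up in the flame graph.
module Homebrew
  def self.linkage
    Datadog.tracer.trace("brew.linkage", service: "homebrew") do
      ARGV.kegs.each do |keg|
        Datadog.tracer.trace("linkage_checker.check") do
          LinkageChecker.new(keg) # runs the dylib checks for this keg
        end
      end
    end
  end
end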

Analyzing the trace data in a flame graph, I was able to pinpoint the source of the performance issues to the check_dylibs function in the LinkageChecker class.

Datadog flame graph showing the breakdown of time spent in each method instrumented during Homebrew’s execution.

Optimization solutions

I first investigated whether multi-threading could speed up the processing in the LinkageChecker::check_dylibs function, using Ruby’s Thread primitives to see if concurrency would deliver the desired performance. However, MRI’s global interpreter lock (GIL) allows only one thread to execute Ruby code at a time, so the threaded solution was not viable and did not meet the required performance target.
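Conceptually, the threaded attempt looked something like the sketch below, where classify is a hypothetical stand-in for the real per-dylib work:

# Hypothetical sketch of the threaded approach. Because MRI's GIL lets
# only one thread execute Ruby code at a time, this CPU-bound loop sees
# little to no speedup from the added threads.
def check_dylibs_concurrently(dylibs)
  results = Queue.new # thread-safe queue for collecting results
  threads = dylibs.map do |dylib|
    Thread.new { results << classify(dylib) }
  end
  threads.each(&:join)
  Array.new(results.size) { results.pop }
end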

I then looked into alternatives and decided to implement a caching mechanism using SQLite3, a lightweight on-disk database suited to persistent storage.

Initial solution

SQLite3 caching implementation

The most expensive operation for the brew linkage command was the processing in the LinkageChecker::check_dylibs function. The function appends path locations to arrays corresponding to the type of library or dependency (e.g., system_dylibs, broken_dylibs, undeclared_deps).
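Conceptually, that categorization looks something like the following sketch; the keg.dylibs accessor and the predicate helpers are hypothetical names, not Homebrew’s actual methods:

# Illustrative sketch: walk each dynamic library a keg links against and
# append its path to the array matching its category.
def check_dylibs(keg)
  keg.dylibs.each do |dylib|
    if system_dylib?(dylib)
      @system_dylibs << dylib
    elsif broken_link?(dylib)
      @broken_dylibs << dylib
    else
      @brewed_dylibs[owner_of(dylib)] << dylib
    end
  end
end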

I modeled a single-table schema for caching relevant linkage information required by the LinkageChecker. The data fetched could be post-processed to meet one of the two formats expected by the printer function. The schema was defined as follows:

CREATE TABLE IF NOT EXISTS linkage (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT NOT NULL,
  path TEXT NOT NULL,
  type TEXT NOT NULL CHECK (type IN (
    'system_dylibs', 'variable_dylibs', 'broken_dylibs', 'undeclared_deps',
    'unnecessary_deps', 'brewed_dylibs', 'reverse_links')),
  label TEXT CHECK (label IS NULL OR (type IN ('brewed_dylibs', 'reverse_links'))),
  UNIQUE(name, path, type, label) ON CONFLICT IGNORE
);

Internally, the application could insert, mutate, or fetch data from the cache using SQL:

require "sqlite3"

db = SQLite3::Database.new("#{HOMEBREW_CACHE}/linkage.db")

# Storing path data in the cache for a particular key and linkage type
db.execute(
  <<~SQL
    INSERT INTO linkage (name, path, type)
      VALUES ('key', 'path_name', 'linkage_type');
  SQL
)

# Fetching path data from the cache for a particular key and linkage type
db.execute(
  <<~SQL
    SELECT path FROM linkage
      WHERE type = 'linkage_type'
      AND name = 'key';
  SQL
).flatten

Performance enhancements

I saw significant performance gains with the caching in place. Below are two flame graphs showing the breakdown of execution time for the command run against 106 packages (including larger ones such as boost). With the cache in use, the command ran in under 5 seconds, as the requirements specified.

Flame graphs

Without the optimization

Without the caching, the command took 11.5 seconds to process 106 packages; checking the dynamic libraries of boost alone took 1.01 seconds.

Datadog flame graph showcasing SQLite3 implementation with no caching.

With the optimization

With the caching, the command took 182 milliseconds to process 106 packages; checking the dynamic libraries of boost took just 1.38 milliseconds.

Datadog flame graph showcasing SQLite3 implementation with caching.

Second solution

pstore caching implementation

After the initial implementation, I submitted a Pull Request to Homebrew, asking for a code review from the maintainers. They recommended I try a different implementation using Ruby’s pstore, a file-based persistence mechanism built on Ruby’s Hash. The principal advantage of pstore over SQLite3 is that it doesn’t require a third-party gem dependency.

I created a branch off of my SQLite3 implementation to compare the performance of using pstore over SQLite3 as the caching mechanism. The abstractions from the first implementation made it easy to swap out the SQLite3-coupled database code for the new pstore implementation.

My goal was to keep the public interface of the LinkageStore class as close as possible to the SQLite3 version so the two implementations could be compared fairly. However, I did not need to define a schema for the cache, since pstore persists Ruby objects to the data store file with Marshal, a library that serializes Ruby objects into a byte stream.
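As a quick illustration of that serialization cost, here is the Marshal round trip that pstore performs under the hood on every transaction (the sample data is made up):

# Serialize a Ruby object to a byte stream and load it back.
data  = { "system_dylibs" => ["/usr/lib/libSystem.B.dylib"] }
bytes = Marshal.dump(data)    # Ruby object -> byte stream
Marshal.load(bytes) == data   # => true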

Below is an example of fetching and mutating data for the pstore implementation:

require "pstore"

db = PStore.new("#{HOMEBREW_CACHE}/linkage.pstore")

# Storing path data in the cache for a particular key and linkage type
db.transaction do
  db["key"] = { "linkage_type" => "path_name" }
end

# Fetching path data from the cache for a particular key and linkage type
# (passing true opens a read-only transaction)
db.transaction(true) do
  db["key"]["linkage_type"]
end

Comparison of SQLite3 and pstore

The pstore implementation reads more nicely in Ruby: its transactions use Hash-style get/set calls, whereas the SQLite3 implementation requires writing SQL.

However, the biggest downside of the pstore implementation is the performance of fetching and persisting data; pstore caching was even slower than the original implementation without any caching. Marshaling the data to disk, then reading it back and loading it into Ruby objects, are expensive operations, in part because pstore re-serializes the entire store on every write transaction.

With caching, the command took 42.2 seconds to complete for just two packages! A single fetch_path_values! method call (illustrated in the graph below) took 3.51 seconds. In other words, the pstore implementation spent more time fetching information for two packages than the SQLite3 implementation spent fetching information for 106 packages.

Datadog flame graph showcasing pstore implementation with caching.

I recommended to the maintainers that we use SQLite3 over pstore for the caching mechanism, as I believed the performance benefits were worth the tradeoff of installing another gem.

Third solution

DBM caching implementation

After investigating the performance differences between the pstore and SQLite3 implementations, the maintainers recommended looking into an implementation backed by Berkeley DB, another on-disk persistent storage database commonly used for caching. Ruby provides a module for interfacing with a Berkeley DB called DBM, which, like pstore, doesn’t require a third-party gem dependency.

Inserting and mutating data in the cache was similar to the pstore implementation, but fetching required some extra parsing, since DBM persists all Ruby hashes and arrays as strings:

require "dbm"
require "json"

db = DBM.open("#{HOMEBREW_CACHE}/#{name}", 0666, DBM::WRCREAT)

# Storing path data in the cache for a particular key and linkage type;
# DBM persists the value in its string form, e.g. '{"linkage_type"=>"path_name"}'
db["key"] = { "linkage_type" => "path_name" }

# Fetching path data from the cache: rewrite the stored hash string as
# JSON, then parse it back into a Ruby hash
JSON.parse(db["key"].gsub("=>", ":"))["linkage_type"]

Comparison of SQLite3 and DBM

origin/master

Without the optimization

The baseline without caching (i.e., the time brew linkage takes to complete on origin/master) was 8.04 seconds for 99 packages. Checking the dynamic libraries of boost and printing the output took 704 milliseconds.

Datadog flame graph showcasing origin/master implementation.

SQLite3

Without the optimization

Without the caching, the command took 10.3 seconds to complete for 99 packages. Checking the dynamic libraries of boost and printing the output took 777 milliseconds.

Datadog flame graph showcasing SQLite3 implementation performance with no caching.

With the optimization

With the caching, the command took 180 milliseconds to complete for 99 packages. Checking the dynamic libraries of boost and printing the output took 1.77 milliseconds.

Datadog flame graph showcasing SQLite3 implementation performance with caching.

DBM

Without the optimization

Without the caching, the command took 8.14 seconds to complete for 99 packages. Checking the dynamic libraries of boost and printing the output took 686 milliseconds.

Datadog flame graph showcasing DBM implementation performance with no caching.

With the optimization

With the caching, the command took 208 milliseconds to complete for 99 packages. Checking the dynamic libraries of boost and printing the output took 2.15 milliseconds.

Datadog flame graph showcasing DBM implementation performance with caching.

Summary

I’ve summarized the performance comparison between the SQLite3 implementation and the DBM implementation below:

Implementation    No Optimization    Cache Optimization
origin/master     8.04 seconds       N/A
SQLite3           10.3 seconds       180 milliseconds
DBM               8.14 seconds       208 milliseconds

Observations

  • It took longer to build the cache (i.e., the No Optimization column) for both the SQLite3 and DBM implementations than the command currently takes on origin/master
  • The DBM implementation built the cache faster than the SQLite3 implementation
  • It’s much faster to run the brew linkage command when the values are cached (i.e., the Cache Optimization column for DBM and SQLite3)
  • The SQLite3 implementation is faster at accessing values from the cache than the DBM implementation (but not by much)
  • Both implementations met the requirements specification by processing more than 50 packages in under 5 seconds

Recommendations

I recommended proceeding with the DBM implementation due to the following:

  • The DBM solution doesn’t require a vendored gem like the SQLite3 implementation
  • The SQLite3 gem requires building native extensions from C with rake/extensioncompiler
  • Building the cache with DBM is slightly faster than with SQLite3, and while the SQLite3 solution is faster at fetching data from the cache, the difference is small
  • The code is cleaner to read, and the cache structure/schema is easier to understand
  • Fetching and mutating data uses Ruby’s Hash get/set conventions rather than SQL

Conclusion

When most people think of Datadog, they probably think of cloud infrastructure monitoring, metrics, or alerting. Application Performance Monitoring opens up a variety of other use cases, including performance benchmarking, bug squashing, and finding bottlenecks in your applications.

Making data-driven decisions is an essential part of being an engineer. The performance data collected with APM helped guide my engineering judgment when implementing features and making design decisions.

APM also provides a means to show both technical and non-technical stakeholders how an application performs. The graphs in the user interface make it easy to see which parts of your application take the most time, allowing engineers to optimize their code and compare the performance of their changes.


Update: Andrew returned to the University of Waterloo following his internship, but continued to work on caching in Homebrew. We’re thrilled to announce that his Pull Request was merged and should be included in the upcoming release of Homebrew (1.6.5).


Datadog 💜 open source

Datadog is a proud user of and contributor to many open source projects. As part of our continued commitment to open source, we’re always looking for ways we can give back to the community.

If you have an open source project or initiative that you feel would benefit from the application performance metrics discussed in this article, we’d like to help. You can apply here for free access to Datadog.