How to use ApacheBench for web server performance testing

Paul Gottschling

When developing web services and tuning the infrastructure that runs them, you’ll want to make sure that they handle requests quickly enough, and at a high enough volume, to meet your requirements. ApacheBench (ab) is a benchmarking tool that measures the performance of a web server by inundating it with HTTP requests and recording metrics for latency and success. ApacheBench can help you determine how much traffic your HTTP server can sustain before performance degrades, and set baselines for typical response times.

While ApacheBench was designed to benchmark the Apache web server, it is a fully fledged HTTP client that benchmarks actual connections, and you can use it to test the performance of any backend that processes HTTP requests. In this post, we will explain how ApacheBench works, and walk you through the ApacheBench metrics that can help you tune your web servers and optimize your applications, including:

Metrics for request latency, which range from percentile breakdowns of all requests to durations of specific phases within TCP connections.
Metrics for request and connection status, including HTTP status codes and request failures.

apachebench-logs-spike — Web server access logs in Datadog showing requests from ApacheBench.

How ApacheBench works

Installing ApacheBench

ApacheBench is a command line tool that is bundled in the apache2-utils package. To install ab, run the following commands on Debian/Ubuntu platforms:

1
apt-get update
2
apt-get install -y apache2-utils

Configuring ApacheBench

ApacheBench allows you to configure the number of requests to send, a timeout limit, and request headers. ab will send the requests, wait for a response (up to a user-specified timeout), and output statistics as a report.

You can run an ApacheBench command with the following format:

1
ab <OPTIONS> <WEB_SERVER_ADDRESS>/<PATH>

The only required argument is the address of the web server, followed by a / (URLs without the trailing / will cause ab to return an error) and an optional URL path. If you don’t specify any options, ApacheBench will send a single request. ab’s options allow you to adjust the volume of requests, as well as (for specialized cases) their headers and request bodies. Some commonly used options include:

-n: The number of requests to send
-t: A duration in seconds after which ab will stop sending requests
-c: The number of concurrent requests to make

If you’re using both the -t and the -n flags, note that-t should always go first—otherwise ApacheBench will override the value of -n and assign the default -n value of 50000, or 50,000 requests. Additionally, ab runs on a single thread—the -c value tells ab how many file descriptors to allocate at a time for TCP connections, not how many HTTP requests to send simultaneously. The -c flag does allow ab to complete its tests in less time, and simulates a higher number of concurrent connections. The two timeseries graphs below, for example, show the number of concurrent connections as well as requests per second for the command:

1
ab -n 100000 -c 1000 <SERVER_ADDRESS>

Constraints

When using ApacheBench, you’ll want to understand its constraints, which stem from the fact that ab measures actual HTTP requests between two servers that may or may not run on the same host. First, you’ll want to make sure that the host conducting an ab test has enough CPU resources to make all TCP connections you’ve specified, as well as enough memory available to store the state of each connection (ab allocates memory based on the number of requests).

apachebench-live-proc-mem — Memory utilization by one invocation of the ab command, as seen in Datadog's Live Processes view.

Other constraints include the fact that ApacheBench’s send-everything-at-once approach may not reflect the way your application handles requests over time; and the distance between the server and ab client (e.g., the same host or different availability zones) will affect latency scores. As a result, ab requests may not reflect the sorts of latencies you can expect to see in your usual production load. You’ll get the most out of ApacheBench when using the relative results of your tests—not “How fast are responses?” but “Are responses getting faster?”—to see how effective your efforts are to optimize your application code, server configuration, load balancing architecture, and so on.

How to interpret ApacheBench

After finishing its tests, ApacheBench produces a report that resembles the code snippet below. Later in this section, we will explain the different metrics provided in this report, such as:

How long requests took to complete, including latency for different parts of the request-response cycle
How successful the requests were, including how many requests received non-2xx responses, and how many requests failed

1
Server Software:        AmazonS3
2
Server Hostname:       <SOME_HOST>
3
Server Port:            443
4
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES128-GCM-SHA256,2048,128
5
TLS Server Name:        <SOME_HOST>
6

7
Document Path:          /
8
Document Length:        45563 bytes
9

10
Concurrency Level:      2
11
Time taken for tests:   3.955 seconds
12
Complete requests:      100
13
Failed requests:        0
14
Total transferred:      4625489 bytes
15
HTML transferred:       4556300 bytes
16
Requests per second:    25.29 [#/sec] (mean)
17
Time per request:       79.094 [ms] (mean)
18
Time per request:       39.547 [ms] (mean, across all concurrent requests)
19
Transfer rate:          1142.21 [Kbytes/sec] received
20

21
Connection Times (ms)
22
              min  mean[+/-sd] median   max
23
Connect:       40   53   8.1     51      99
24
Processing:    12   24   9.4     23      98
25
Waiting:        5   14  10.6     12      95
26
Total:         57   77  15.0     75     197
27

28
Percentage of the requests served within a certain time (ms)
29
  50%     75
30
  66%     77
31
  75%     80
32
  80%     81
33
  90%     85
34
  95%     92
35
  98%    116
36
  99%    197
37
 100%    197 (longest request)

Request latency metrics

Top-level request latency metrics

Time taken for tests

Time taken for tests measures the duration between when ApacheBench first connects to the server and when it receives the final response or is interrupted with Ctrl-C. This metric is useful if you’re conducting repeated ab tests while optimizing your application or tuning your web server, and want a single score to consult for signs of improvement.

Time per request

ApacheBench provides two variations on this metric, and both depend on the number of responses that ab has finished processing (done), as well as the value of the metric Time taken for tests (timetaken). Both multiply their results by 1,000 to get a number in milliseconds.

The first Time per request metric doesn’t take the concurrency value into account:

1
timetaken * 1000 / done

The second version of Time per request accounts for the number of concurrent connections the user has configured ab to make, using the -c option (concurrency):

1
concurrency * timetaken * 1000 / done

If you set a -c value greater than 1, the second Time per request metric should (in theory) provide a more accurate assessment of how long each request takes.

apachebench-live-proc-cpu — ab as seen within Datadog's Live Processes view, demonstrating that ab uses a single CPU thread despite a high concurrency value.

You should treat both Time taken for tests and Time per request as rough indicators of web server performance under specific levels of load. Tracking these values over time can help you assess your optimization efforts, and unusually high or low values can point to CPU saturation on the web server, code changes, or other events that you’ll want to investigate in more detail.

Metrics for individual requests

For each connection it makes, ab stores five timestamps:

start: when ab began making a connection with the web server
connect: when ab finished making the connection and began writing the request
endwrite: when ab finished writing the request
beginread: when ab began reading from the response
done: when ab closed the connection

ab uses these timestamps to create a data object for each connection, containing the following properties:

starttime: the start property
waittime: How long ab spent waiting before it began reading from the response, after it finished writing the request (i.e., beginread - endwrite)
ctime: How long ab spent making a connection with the web server (i.e., connect - start)
time: How long ab spent handling the connection, from the beginning of its attempt to contact the server to when the connection closed (i.e., done - start)

From there, ab uses the data object to calculate the remaining latency-related metrics: aggregated connection times, percentiles, and, if you use the -g flag, data for individual requests.

Aggregated connection times

After inspecting ApacheBench’s top-level latency metrics, you can track the duration of different stages of the request-response cycle. ab aggregates properties from its per-request data objects to calculate the min, mean, sd (standard deviation), median, and max of each stage, all in milliseconds:

Connect: How long it takes ab to establish a TCP connection with the target server before writing the request to the connection (ctime)
Processing: How long the connection was open after being created (time - ctime)
Waiting: How long ab waits after sending the request before beginning to read a response from the connection (waittime)
Total: The time elapsed from the moment ab attempts to make the connection to the time the connection closes (time)

The output will resemble the following:

1
Connection Times (ms)
2
              min  mean[+/-sd] median   max
3
Connect:       42   71  24.3     63     198
4
Processing:    13   32  11.5     33      80
5
Waiting:        5   19  10.7     15      40
6
Total:         65  103  30.9     95     233

These metrics give you a more nuanced understanding than Time taken for tests and Time per request, allowing you to see which part of the request-response cycle was responsible for the overall latency. Further, these metrics are based on the data object for each request, rather than the start and end times of ab’s tests.

In the above example, we can see that Connect was on average (i.e., in both mean and median) the longest part of the cycle, though also the one with the highest standard deviation. Since the Connect metric depends on client latency as well as server latency, we could investigate each of these, determining which side of the connection is responsible for the variation. Then we’d consider focusing our optimization efforts on how our backend handles new TCP connections.

Percentiles

The final ApacheBench report also includes a breakdown of request latency percentiles, giving you a more detailed view of request latency distribution than the standard deviations within the Connection Times table. The percentile breakdown resembles the following. In the report, “certain time” refers to the value of the time property (defined above).

1
Percentage of the requests served within a certain time (ms)
2
  50%     95
3
  66%    102
4
  75%    107
5
  80%    115
6
  90%    138
7
  95%    163
8
  98%    229
9
  99%    233
10
 100%    233 (longest request)

Unlike the Connection Times table, these metrics are not broken down by stage of the request-response cycle. As a result, ab’s latency metrics are best viewed together to get a comprehensive view of web server performance:

Use the top-level latency metrics to get a quick read of the server’s overall performance, especially in comparison to other ab tests applied in similar conditions.
From there, use the higher-granularity aggregated connection times to find out which part(s) of the request-response cycle to investigate further.
Finally, percentiles tell you how much variation there is between the fastest and slowest requests, so you can address any long-tail latencies.

Latency of individual requests

ApacheBench can also display data about each connection in tab-separated values (TSV) format, allowing you to calculate values that are not available within the standard ab report, such as wait time percentiles. This data comes from the same data objects that ab uses to calculate Connection Times and percentiles. These per-request values are:

starttime: The time the connection began, printed in the ctime() format (from the starttime property of the connection)
seconds: The same value as starttime, but given as the number of seconds since the Unix Epoch
ctime: The time it takes to connect to the server before making this request, in milliseconds
dtime: How long an established connection was open, in milliseconds (i.e, time - ctime)
ttime: The total time spent on this connection (i.e., the time property) in milliseconds
wait: The time spent waiting between sending the request and reading from the response (i.e., the waittime property), in milliseconds)

To access per-request data in TSV format, use the -g flag in your ab command, specifying the path to the output file. For example, let’s say you run this command:

1
ab -n 100 -g /path/to/plot.tsv <WEB_SERVER_ADDRESS>

The first five rows of the data in the plot.tsv file will resemble the following.

1
starttime  seconds  ctime  dtime  ttime  wait
2
Tue Jun 18 15:56:10 2019  1560887770  46  14  60  6
3
Tue Jun 18 15:56:07 2019  1560887767  48  15  63  7
4
Tue Jun 18 15:56:11 2019  1560887771  51  13  63  6
5
Tue Jun 18 15:56:12 2019  1560887772  47  17  64  12
6
Tue Jun 18 15:56:06 2019  1560887766  46  20  66  9

The -g flag gives you flexibility in how you analyze request data. You could, for example, plot the dtime of each request in a timeseries graph (i.e., by using gnuplot, spreadsheet software, or a graphics library), indicating whether increased load on the server affected the performance of requests after a certain point.

Success metrics

Since ApacheBench makes real requests to an HTTP server, it’s important to understand the sorts of responses your web servers are returning. ab allows you to track both non-2xx response codes and errors that arise when making requests.

Non-2xx responses

One way to understand the responses from your web servers is to count those that return an error or a failure. This is helpful if your goal is to benchmark deliberately unsuccessful requests (e.g., the time it takes to load a 404 page after a redesign), or requests that are not guaranteed to return a successful response (e.g., in production environments).

It’s important to know that ab will classify 3xx, 4xx, and 5xx response codes as Non-2xx responses, and report this metric in a format similar to this:

1
Non-2xx responses:      10

You can see which status codes make up the “non-2xx” responses by adding the -v <INTEGER> option when running ab, where a value of 2 will show the headers and body of each response in STDOUT. You can also send HTTP logs from your server to a dedicated monitoring platform like Datadog to aggregate responses by status code.

apachebench-logs-view — ApacheBench logs aggregated by status code in Datadog.

Complete versus failed requests

Connections between ApacheBench and your web server can fail just like any TCP connection, and ab counts failures within four different categories:

Length: After the first response, a subsequent response displays varying content length (in bytes)
Connect: ab’s attempt to connect to a server results in an error
Receive: ab encounters an error when reading data from the connection
Exceptions: ab encounters an error while setting up file descriptors for making TCP connections

The following comes from running an ab command with the -n option set to 1000 and the URL set to www.google.com/.

1
Complete requests:      1000
2
Failed requests:        992
3
   (Connect: 0, Receive: 0, Length: 992, Exceptions: 0)

The high number of Length-related errors is likely due to Google’s dynamic content. In a well-known bug (since marked as “a bug which will never be fixed”),ab will treat requests as failures if the content length of their responses is variable. This is because ab stores the length of the first response it receives, and compares subsequent lengths to that value. This means that you should disregard length-related failures when you’re using ab to test dynamic sites.

In general, the Failed request metrics monitor activity at the transport layer, i.e., with your TCP connections, rather than at the application layer. If you see nonzero results for Failed requests, you’ll want to investigate your web server for possible issues with handling TCP connections.

ApacheBench metrics for better benchmarking

In this post, we’ve shown you how to use ApacheBench to measure the performance of your HTTP servers, explored some limitations of ApacheBench, and explained the metrics that ApacheBench can provide. You can also use Datadog to get full visibility into your HTTP servers, including Apache, NGINX, and Gunicorn, along with 1,000 other technologies. Datadog can provide complete visibility into HTTP response latency by analyzing server logs, tracing requests as they travel across distributed services, collecting network metrics from your hosts, and simulating HTTP requests from clients around the world—check out our free trial.

How to use ApacheBench for web server performance testing

How ApacheBench works

Installing ApacheBench

Configuring ApacheBench

Constraints

How to interpret ApacheBench

Request latency metrics

Top-level request latency metrics

Time taken for tests

Time per request

Metrics for individual requests

Aggregated connection times

Percentiles

Latency of individual requests

Success metrics

Non-2xx responses

Complete versus failed requests

ApacheBench metrics for better benchmarking

Related Articles

Analyzing Tomcat logs and metrics with Datadog

Key metrics for monitoring Tomcat

Collecting metrics with Tomcat monitoring tools

How to monitor Apache web server with Datadog

Start monitoring your metrics in minutes

Get Started with Datadog

Related Articles

Analyzing Tomcat logs and metrics with&nbsp;Datadog

Key metrics for monitoring&nbsp;Tomcat

Collecting metrics with Tomcat monitoring&nbsp;tools

How to monitor Apache web server with Datadog

Related jobs at Datadog

We're always looking for talented people to collaborate with

Start monitoring your metrics in minutes

Analyzing Tomcat logs and metrics with Datadog

Key metrics for monitoring Tomcat

Collecting metrics with Tomcat monitoring tools