Staying Out of Deep Water: Performance Testing Using HTTPD-Test's Flood

by Martin Brown

So you've set up your server and users are accessing your Web site; the last thing you want is performance problems with the site. If only you could have tested for them before going live. With the Flood component of the Apache HTTP Project's HTTPD-Test (so named because it floods an HTTP server with requests to test its response times), you can. Martin Brown explains how to install and configure Flood, and offers some real-world scenarios.

Once you've set up your server and users are accessing your Web site, the last thing you want to hear about is performance problems with the site. You can test the system manually, but manual testing has its limitations.

One major downside of manual testing (aside from the time investment) is that it doesn't reveal where the real problem with the site lies. Is it a configuration problem with the server, a problem with some dynamic elements, or a more fundamental network performance issue?

The Apache HTTP Project includes a sub-project called HTTPD-Test. As the name suggests, it's a test suite for Apache and HTTP in general. The suite contains a number of different elements, and this article will focus on the one known as Flood. Flood is so named because it is used to flood an HTTP server with requests to test its response times.

Flood uses an XML document with the necessary settings -- URLs and optional POST data -- to send requests to a given server or range of servers. Flood then measures the time it takes to:

  • Open the socket to the server
  • Write the request
  • Read the response
  • Close the socket

With these four criteria being measured, administrators can identify whether the problem is with the Apache configuration (or any other HTTP server), the sheer load and performance of the hardware, or a network bottleneck.
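To make the four measurements concrete, here is a minimal Python sketch that times the same phases for a single plain-HTTP GET request. It is not Flood's code; the function names `build_request` and `time_request` are made up for this illustration, and it assumes an unencrypted HTTP/1.0 exchange where the server closes the connection after responding.

```python
import socket
import time
from urllib.parse import urlsplit


def build_request(url):
    """Return (host, port, request_bytes) for a simple HTTP/1.0 GET."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    request = (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {parts.netloc}\r\n"
        "Connection: close\r\n\r\n"
    ).encode("ascii")
    return parts.hostname, parts.port or 80, request


def time_request(url, timeout=10.0):
    """Return the seconds spent in each phase of one request."""
    host, port, request = build_request(url)

    t0 = time.perf_counter()
    sock = socket.create_connection((host, port), timeout=timeout)  # open socket
    try:
        t1 = time.perf_counter()
        sock.sendall(request)                                       # write request
        t2 = time.perf_counter()
        received = 0
        while True:                                                 # read response
            chunk = sock.recv(65536)
            if not chunk:
                break
            received += len(chunk)
        t3 = time.perf_counter()
    finally:
        sock.close()                                                # close socket
    t4 = time.perf_counter()

    return {
        "connect": t1 - t0,
        "write": t2 - t1,
        "read": t3 - t2,
        "close": t4 - t3,
        "bytes": received,
    }
```

Calling `time_request()` with the URL of a test server returns one dictionary of timings. A slow connect phase points toward the network or the server's accept queue, while a slow read phase points toward the server generating the response.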

Installing Flood

You can download the packages -- httpd-test and apr/apr-util -- required for building from the CVS server at Apache. You'll need to log in first though (the password is 'anoncvs'):

$ cvs -d :pserver:anoncvs@cvs.apache.org:/home/cvspublic login
$ cvs -d :pserver:anoncvs@cvs.apache.org:/home/cvspublic co httpd-test/flood
$ cd httpd-test/flood
$ cvs -d :pserver:anoncvs@cvs.apache.org:/home/cvspublic co apr
$ cvs -d :pserver:anoncvs@cvs.apache.org:/home/cvspublic co apr-util

Once you have the source, you must build and compile the application using:

$ ./buildconf
$ ./configure
$ make all

You're now ready to go!

Configuring Flood

Flood is configured through an XML file that is used to define the parameters for testing the Web site. When testing, Flood uses a profile, which defines how a given list of URLs is accessed. Requests are generated by one or more farmers that are, in turn, members of one or more farms. You can see this more clearly in the illustration below.

As illustrated in the graphic, we have one farm, which specifies two sets of five farmers. Farmer Joe uses ProfileA and a list of five URLs, and farmer Bob uses ProfileB with a list of three URLs. The farmers request the URLs directly from the Web server. Flood uses threads to create the farmers, and then collates the information the farmers collect into a single data file for later processing.

The XML file contains definitions for four main elements: the URL lists, profiles, farmers and farms.

The URL list is just that, a list of URLs to be accessed. URLs can be straight requests or specific types (GET, HEAD, and POST types are supported, as is the capability to supply data accordingly for dynamically driven sites).

Profiles define which URL list to use, how its URLs should be accessed, what type of socket to use, and how the information should be reported.

Farmers are responsible for the actual request process. The only configurable elements are the profile to use and the number of times to process the profile. Profiles are executed sequentially by each farmer but can be repeated, so you would end up accessing, for example, urla, urlb, urla, urlb, and so on.
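The farmer model can be sketched in a few lines of Python. This is an illustration of the idea only, not Flood's implementation: `run_farm` and its parameters are hypothetical names, and `fetch` stands in for whatever function performs and measures a single request.

```python
import threading


def run_farm(urls, repeat, farmers, fetch):
    """Start `farmers` threads; each walks `urls` in order, `repeat` times.

    `fetch` is any callable that takes a URL and returns a measurement.
    Returns a list of (farmer_id, url, measurement) tuples collected
    from all farmers.
    """
    results = []
    lock = threading.Lock()

    def farmer(farmer_id):
        for _ in range(repeat):          # repeat the whole profile
            for url in urls:             # URLs are visited sequentially
                measurement = fetch(url)
                with lock:               # collate into one shared list
                    results.append((farmer_id, url, measurement))

    threads = [
        threading.Thread(target=farmer, args=(i,)) for i in range(farmers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With `urls=["urla", "urlb"]` and `repeat=2`, each farmer requests urla, urlb, urla, urlb in that order, while the entries from different farmers interleave in the shared result list.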

Farms specify the number of farmers to create and when. By increasing the number of farmers created by a farm, the number of simultaneous requests is increased. Additional settings enable you to create a number of initial farmers, and then increase that number at regular intervals. For example, you could initially create two farmers, then add a new farmer every five seconds up to a maximum of 20. Depending on your URL list and server performance, this could result in a slow rise to 20 simultaneous accesses for a given period, and then a slow fall back to zero. Alternatively, it could give the effect of a regular number of users accessing a set number of pages for a longer duration, with peaks of five or six simultaneous requests.
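The ramp-up in that example can be worked out with a short Python sketch. The function and parameter names here are hypothetical and follow the prose description (an initial batch, then one new farmer per interval); they are not Flood's own setting names.

```python
def farmer_start_times(initial, interval, maximum):
    """Return the start time in seconds of each farmer.

    `initial` farmers start at time zero, then one more farmer starts
    every `interval` seconds until `maximum` farmers exist.
    """
    times = [0.0] * min(initial, maximum)
    t = 0.0
    while len(times) < maximum:
        t += interval
        times.append(t)
    return times


schedule = farmer_start_times(initial=2, interval=5, maximum=20)
print(len(schedule), "farmers, last one starts at", schedule[-1], "seconds")
# prints: 20 farmers, last one starts at 90.0 seconds
```

So in the example above, the twentieth farmer does not start until a minute and a half into the test. Whether all 20 ever run at once depends on whether the first farmers are still working through their URL lists at that point.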

Note: The current version of Flood supports only one Farm, and it must be called 'Bingo'. You can, however, specify multiple farmer definitions within the single farm, which achieves the same basic effect.

By tuning the farm, farmer, and URL list parameters you can control the number of requests, simultaneous requests, overall duration (as a function of the URL list, repeat count and number of farmers) and how the requests are spread over the duration of the test. This allows you to very specifically test for different situations.

The three basic (and rough) rules of configuration with Flood to remember are:

  • The URL list defines what your farmers visit.
  • The repeat count for a farmer defines the number of users accessing your site.
  • The farmer count for a farm defines the number of simultaneous users.
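As a rough back-of-the-envelope check, these rules can be turned into a small Python estimate. This is a sketch under simplifying assumptions: all farmers start together, every request takes about the same time, and the names `estimate_load` and `avg_request_sec` are invented for this example.

```python
def estimate_load(url_count, repeat_count, farmer_count, avg_request_sec):
    """Estimate the size of a test run from its three main settings."""
    per_farmer = url_count * repeat_count        # requests each farmer makes
    return {
        "total_requests": per_farmer * farmer_count,
        "peak_concurrency": farmer_count,        # one request per farmer at a time
        "approx_duration_sec": per_farmer * avg_request_sec,
    }


print(estimate_load(url_count=5, repeat_count=3, farmer_count=10,
                    avg_request_sec=0.5))
# prints: {'total_requests': 150, 'peak_concurrency': 10, 'approx_duration_sec': 7.5}
```

Doubling the farmer count doubles the total requests and the peak concurrency but leaves the estimated duration unchanged, while doubling the repeat count doubles both the total and the duration.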

A sample configuration for Flood is in the examples folder that came with the distribution; round-robin.xml is probably the easiest one to start with. This article, however, will not discuss the specifics of editing the XML, or even processing the data file generated.

Instead, we will examine how to tune the parameters to test different types of Web sites. To help understand the implications of the next section, here's a quick look at the results of the analyze-relative script from the examples directory. In this case it shows the results of a test on an internal server:

Slowest pages on average (worst 5):
   Average times (sec)
connect write   read    close   hits   URL
0.0022  0.0034  0.0268  0.0280  100    http://test.mcslp.pri/java.html
0.0020  0.0028  0.0183  0.0190  700    http://www.mcslp.pri/
0.0019  0.0033  0.0109  0.0120  100    http://test.mcslp.pri/random.html
0.0022  0.0031  0.0089  0.0107  100    http://test.mcslp.pri/testr.html
0.0019  0.0029  0.0087  0.0096  100    http://test.mcslp.pri/index.html
Requests: 1200 Time: 0.14 Req/Sec: 9454.08

From these results you can see the average connect, write (request), read (response), and close times for each page. You also get a basic idea of the number of requests handled per second by the server.
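If you want to work with a summary like this programmatically, a short Python sketch such as the following can parse the rows. Two assumptions are made here that the report itself does not state: that each data row has exactly six whitespace-separated fields, and that the four time columns are cumulative offsets from the start of the request (the steadily increasing values suggest this). The function names are hypothetical.

```python
SAMPLE = """\
connect write   read    close   hits   URL
0.0022  0.0034  0.0268  0.0280  100    http://test.mcslp.pri/java.html
0.0020  0.0028  0.0183  0.0190  700    http://www.mcslp.pri/
Requests: 1200 Time: 0.14 Req/Sec: 9454.08
"""


def parse_rows(text):
    """Return one dict per data row, skipping headers and summary lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue
        try:
            connect, write, read, close = (float(f) for f in fields[:4])
            hits = int(fields[4])
        except ValueError:
            continue                      # not a data row
        rows.append({"connect": connect, "write": write, "read": read,
                     "close": close, "hits": hits, "url": fields[5]})
    return rows


def phase_durations(row):
    """Time spent in each phase, ASSUMING the columns are cumulative."""
    return {
        "connect": row["connect"],
        "write": row["write"] - row["connect"],
        "read": row["read"] - row["write"],
        "close": row["close"] - row["read"],
    }


for row in parse_rows(SAMPLE):
    phases = phase_durations(row)
    slowest = max(phases, key=phases.get)
    print(row["url"], "spends most of its time in:", slowest)
```

Under that cumulative reading, both sample pages spend most of their time in the read phase, which would point at the server generating the response rather than at connection setup.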

Testing 'News' Style Web Sites

The majority of news-style Web sites -- New York Times, Slashdot, ServerWatch, and even many blog sites -- have a main index page, which everybody accesses and which contains links to the main stories, plus one or more 'story' pages that a user opens when a story seems interesting enough to read in full.

In general, this results in a fairly steady stream of people hitting the main page and a variable number hitting specific other pages. If a site publishes an RSS/RDF feed it will also see a fair number of accesses directly to story pages without first viewing the home page. Nearly all of these types of site use dynamic elements, and using Flood is also a good way to test the dynamic performance of your site, especially if you can compare that to a raw HTML-based response.

You can simulate the news style requests through Flood by using the following settings:

               Farmer Set A     Farmer Set B           Farmer Set C
URL List       Homepage only    Homepage + 3 stories   3 story pages
Repeat Count   1                3                      3
Start Count    5                5                      5
Start Delay    1                5                      5
Notes          Home page only   Homepage + stories     Stories only (RSS)

Testing Shopping Sites

Shopping sites, online product catalogs, and other more interactive sites have a different usage profile. Although some people will hit your home page, some will come in directly to another page within your site. Most users will also spend more time browsing around the site -- they'll look at the pages for a number of products, perhaps do some searches, and even click through to other related or similar products.

Thus, you should test with a higher number of URLs in the list, larger repeat counts (to simulate a larger number of users), and lower simultaneous access than with a news site:

               Farmer Set
URL List       10-15 pages
Repeat Count   5
Start Count    5
Start Delay    5

Testing the "Slashdot" Effect

Occasionally, a situation will arise where you must check how your system copes with thousands of users trying to access the site at the same time. Many sites have already experienced the "slashdot" effect, whereby a mention of a particular page on the Slashdot Web site (www.slashdot.org) results in a large number of simultaneous requests.

Typically these requests are for only one page, and we can test for that with Flood by creating hundreds, or thousands, of farmers simultaneously accessing the server for just that one page. To simulate the rapid sequential access by a number of readers over a period of time, set a high repeat count and use the delay system to ramp up the requests to their high point.

               Farmer Set
URL List       1 page
Repeat Count   50
Start Count    100
Start Delay    1

Tips for Testing

For any of these tests to work properly, you must keep a few things in mind:

  • First and foremost, don't run Flood from the same machine as your HTTP server. If you do this you're only testing a single machine's capability to open network sockets and communicate with itself. Test from another machine on the network.
  • Keep in mind the technical limits of the machine you test from. Your machine is capable of only so many threads and network sockets; trying to create too many farmers may result in inefficient and misleading testing.
  • Flood is a client-side solution, so nothing is stopping you from executing the tests simultaneously on multiple machines. In fact, we recommend that, as it's the only way to reliably flood test a server to the point of saturation before reaching the client's own limits.
  • Flood is reliant on the client host for performance. If the client test machine is busy, Flood will have just as many hurdles as any other application in terms of processing requests. In dedicated Web farm environments I've set up a single machine whose sole responsibility is testing. If you are using a machine normally devoted to other tasks, shut them down before starting Flood.

Future articles will examine ways to summarize the report information and how to test more complex sites and servers.

This article was originally published on Tuesday Jun 3rd 2003