The Perl You Need to Know: Benchmarking Perl

Tuesday Jan 23rd 2001 by Aaron Weiss

This installment of The Perl You Need To Know covers the Benchmark module -- the handy Perl stopwatch with which we can time, optimize, and slim down code.

Originally appearing in the Web Developer's Virtual Library.

One of the most popular events of the summer Olympics is the men's 100 meter sprint, an event that has appeared at every Olympics since the original at Athens. The competitors are often fiery tempered, and the victors declared among the fastest men on Earth. The millions who watch the 100 meter are captivated, ultimately, by some 10 seconds worth of action. Actually, 9.79 seconds, considering Maurice Greene's current world record. But what would Mr. Greene say if we told him that a Perl subroutine can split and sort a sentence into an alphabetically ordered list of letters 98,253 times in that same 9.79 seconds? What do you have to say for yourself now, Mr. Fast Pants? Well, he'd most likely pummel our soft and fleshy behinds, and those of anyone who knows this much about Perl. But it's still true. And we know it's true because of the Benchmark module - our handy Perl stopwatch with which we can time, optimize, and slim down code. I think Mr. Greene would appreciate that. After the pummeling.
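For the curious, that kind of measurement is easy to reproduce with the very module this article covers. A minimal sketch (the sentence and the iteration count here are my own, so the rate you see will certainly differ):

```perl
use Benchmark;

# An assumed example sentence; any string will do.
$sentence = "what do you have to say for yourself now mr fast pants";

# Split the sentence into individual letters and sort them
# alphabetically, 100,000 times, and report the rate.
timethis(100000, '@sorted = sort split //, $sentence');
```

Divide the reported rate into 9.79 seconds and you have your own machine's answer to Mr. Greene.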

Quality is Job Number One

It's been said that there are a thousand ways to accomplish any single goal in Perl. Indeed, Perl is a very flexible language, liberating our quirky brains to find solutions peculiar to our individual personalities, although a thousand may be on the optimistic side. But let's not be fooled into complacency by the tolerance and open-minded nature of the cult of Perl -- not all solutions are created equal, especially when it comes to execution time.

Interpreted languages like Perl are often looked down upon by speed demons, who prefer the endorphin highs of compiled languages -- or, for the extremists, days spent inhaling assembly language and machine code. Despite their sneers, it's quite easy to conjure up several Perl subroutines, all of which solve the same problem, and do so in wildly varying execution times. This may not matter for the script that is only run once, or on special occasions -- but with Perl backing many Web servers, it's entirely possible for Perl scripts to be executed millions of times a day. Like picking pennies from the floor, it all adds up over time.

Perl's Benchmark module is an extremely useful tool for measuring the speed of your scripts, whether in their entirety, or down to the level of particular subroutines, or single lines of code. While some may want to comb over a script to optimize the speed of every expression, benchmarking is also incredibly helpful in finding significant bottlenecks in a script - segments of code that eat up the majority of processing time. Often, focusing on optimizing one or two main bottlenecks can improve an entire script's execution time dramatically.

Out of the Starting Block

The Benchmark module provides a number of tools for timing and comparing the execution time of code. Depending how you use these tools, you can time single statements, entire subroutines, or the entire script. Often, you'll want to time all of these depending on the script and the course of your investigation. For starters, let's just say that you want to time a simple statement. Perhaps you have some variations in mind, and are wondering which would be fastest.

Imagine a string, perhaps a filename, and you want to extract the filename extension - defined here as anything that occurs to the right of the decimal; e.g. the string may be "filename.txt" and we want the "txt" portion.

The first solution that comes to mind uses a regular expression pattern match, such as:

        $file = "filename.txt";
        $file =~ /\.(.*)$/;

In the above example, the desired result will appear in the special variable $1, since it is the first (and only) match possible from this regular expression. Then your mind starts to wander ... what if you lopped off the righthand portion using a split rather than a regular expression? What if you used a substr rather than a split? What if you get pizza tonight and swear that you'll cook something from scratch tomorrow?

Enter the Benchmark. Using the Benchmark module's timethis function, we can measure the execution time of one of these solutions.

        use Benchmark;
        timethis(100, '$file =~ /\.(.*)$/');

Same code as before, but this time supplied to the timethis function. The first parameter supplied tells Benchmark how many times to run this code. The more times it runs the code, the more accurate a measurement is revealed. In fact, if we run the line above, Benchmark will warn sternly: "too few iterations for a reliable count." Indeed, we should supply a higher number of run-throughs. Nice and high, like 500000 -- that's right, five hundred thousand!

        timethis(500000, '$file =~ /\.(.*)$/');
Once done, Benchmark reports:
        timethis 500000: 1 wallclock secs ( 1.19 usr + 0.01
         sys = 1.20 CPU) @ 415973.38/s (n=500000)
The results? Over 500,000 iterations, this snippet of code runs at a rate of 415,973.38 times per second. Of course, this number may vary if you try this test on your own machine, depending on its resources, the version of Perl being used, and so on. These particular examples were generated on a 900MHz Athlon Thunderbird running ActivePerl 5.6.0 under Windows 2000. Rather than get hung up on the execution time taken on its own, what we're really interested in is how this code fares relative to the alternative solutions.

Another way to grab the filename extension is using split on the decimal, and grabbing the second item of the resulting array:

        $ext = @{[split /\./, $file]}->[1];
Let's time it!
        use Benchmark;
        timethis(500000, '$ext = @{[split /\./, $file]}->[1]');
The timesheet says:
        timethis 500000: 3 wallclock secs ( 2.59 usr + 0.00
         sys = 2.59 CPU) @ 192752.51/s (n=500000)
A fascinating result - between the tortoise and the hare, the split function was clearly the tortoise. Over 500,000 iterations, this solution only clocked in at 192,752.51 times per second: less than half the speed of the regular expression!

One candidate remains on our brain, the idea of using the substr function to grab the text in question. Such a solution might look like:
        $ext = substr($file, index($file, qq/./) + 1);
In this approach, we use index to find the location of the decimal in the string, and then grab the text starting one place to the right of the decimal, through to the end of the string. Again, to the Benchmark:
        use Benchmark;
        timethis(500000, '$ext = substr($file, index($file, qq/./) + 1)');
        timethis 500000: 0 wallclock secs ( 0.65 usr + 0.00
         sys = 0.65 CPU) @ 768049.16/s (n=500000)
Whoa, nelly! Indeed, the substr solution whooped the regular expression at nearly twice the speed, and almost four times the speed of the split function. In all three of our cases, the code itself was quite simple. But exactly how Perl implements these different functions internally - regular expressions, split, and substr - varies. While we shouldn't draw across-the-board conclusions from this test, we can draw the conclusion that for this particular case, the substr function is much faster than the alternatives as measured across 500,000 iterations. Also, ordering pizza tonight is probably a good idea, no matter how long it takes.

Apples to Apples, Oranges to Oranges

In learning to use Benchmark's timethis function, we wound up comparing the speed of several solutions. In fact, the Benchmark module provides a ready-made comparison function that can make measurements of several solutions even simpler. This time, let's imagine a case where we want to code a subroutine that will accept a string value, and return a string value consisting of the character codes for each character in the original string, separated by spaces.

For example, the string "A" would, when passed through this subroutine, be returned merely as "65". The string "AB" would be returned as "65 66", and so on.

Once again, thanks to the versatility of Perl, we can let our imagination run wild - but not too wild. Just wild enough, for now, to generate three plausible subroutines to solve this case.

The first, wordOrd_a, splits the incoming string into a list of letters, iterates through that list, pushing the character code for each letter (via the ord function) onto a new list. Finally, it returns a scalar by joining the elements of the new list.


         sub wordOrd_a {
          my ($str) = @_;
          my @ordStr = ();
          foreach my $letter (split //, $str) {
           push(@ordStr, ord($letter));
          }
          return join(" ", @ordStr);
         }
The second candidate, wordOrd_b, is nearly identical to the first candidate, but substitutes the Perl map function for the foreach loop that we relied on in the first attempt.
         sub wordOrd_b {
          my ($str) = @_;
          my @ordStr = ();
          @ordStr = map { ord $_ } split //, $str;
          return join(" ", @ordStr);
         }

The third candidate is super-compact, doing away with the split entirely and instead using Perl's unpack function in a technique gleaned gleefully from the ord synopsis on page 114 of Perl in a Nutshell (O'Reilly & Associates, 1998). We also dropped the interim variable and instead refer to the first item in the incoming parameter list directly.

         sub wordOrd_c {
          return join(" ", unpack('C*', $_[0]));
         }
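Before the race, it's worth a quick sanity check that all three candidates actually agree. A small standalone test (repeating the three subroutines so it runs on its own) might read:

```perl
sub wordOrd_a {
 my ($str) = @_;
 my @ordStr = ();
 foreach my $letter (split //, $str) {
  push(@ordStr, ord($letter));
 }
 return join(" ", @ordStr);
}

sub wordOrd_b {
 my ($str) = @_;
 my @ordStr = map { ord $_ } split //, $str;
 return join(" ", @ordStr);
}

sub wordOrd_c {
 return join(" ", unpack('C*', $_[0]));
}

# "A" is character code 65, "B" is 66.
print wordOrd_a("AB"), "\n";   # prints "65 66"
print wordOrd_b("AB"), "\n";   # prints "65 66"
print wordOrd_c("AB"), "\n";   # prints "65 66"
```

A benchmark comparing subroutines that return different answers would, after all, be measuring the wrong thing.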
Now, for the race. With all three subroutines revved up and raring to go, we can call upon Benchmark's cmpthese function to test and compare the execution times of the three candidates, resulting in a revealing summary table.
         use Benchmark qw(cmpthese);

         $count = 500000;
         $str = "green apples and yellow oranges, oh my";
         cmpthese($count, {
             wordOrd_a => q/my $r = &wordOrd_a($str)/,
             wordOrd_b => q/my $r = &wordOrd_b($str)/,
             wordOrd_c => q/my $r = &wordOrd_c($str)/,
         });
The computer begins to whir, steam, and belch, and in an exhausted heap dutifully reports its tally:
         Benchmark: timing 500000 iterations of wordOrd_a,
                 wordOrd_b, wordOrd_c...

          wordOrd_a:  4 wallclock secs ( 3.55 usr +
                 0.00 sys =  3.55 CPU) @ 141003.95/s
          wordOrd_b:  2 wallclock secs ( 2.54 usr +
                 0.00 sys =  2.54 CPU) @ 196540.88/s
          wordOrd_c:  2 wallclock secs ( 1.64 usr +
                 0.00 sys =  1.64 CPU) @ 304506.70/s
                       Rate wordOrd_a wordOrd_b wordOrd_c
         wordOrd_a 141004/s        --      -28%      -54%
         wordOrd_b 196541/s       39%        --      -35%
         wordOrd_c 304507/s      116%       55%        --
The winner, not surprisingly, is the code that appeared most efficient: version c, the unpack solution. Indeed, from the numbers we can see that wordOrd_c ran 304,506.70 times per second, quite a bit faster than the closest competitor in wordOrd_b. The nifty table provided by cmpthese displays the results in startling percentages: wordOrd_c was 116% faster than wordOrd_a and 55% faster than wordOrd_b.

Once again, we see that there can be significant differences between approaches to the same problem in Perl - hopefully compelling enough numbers to convince the Web developer to take benchmarking seriously. Visitors to a Web site never like to wait, no matter how much magical hoo-hah the back-end script is performing. Given the causes of delay that are difficult to control, such as network traffic and latency, and the flexibility of Perl, it makes that much more sense to optimize the slowest portions of your own code.

Run-time Timing

While the benchmarking examples we've seen so far are compelling, they've also been measured under laboratory conditions - in this case, over many iterations, outside of the context of a larger script. As well, if you already have a large script and want to start benchmarking, it may not be feasible to start coding alternative subroutines all over the place.

Sometimes, a simple stopwatch approach is enough to draw your focus to problem areas (no, not those problem areas). Using Benchmark, we can simulate a stopwatch where we simply want to trigger timing at a certain point in a script, and then stop timing, and spit out the results, without stopping the flow of the script.

So, imagine a back-end script that outputs some results to the browser. Maybe we simply want to time the execution of the entire script.

        #This is an amazing back-end script

        use Benchmark;
        use Time::HiRes;
        use LotsOfOtherStuff;

        my $t0 = new Benchmark;

        ...all of the control code for the script...

        my $t1 = new Benchmark;
        print "Script Benchmark: " . timestr(timediff($t1, $t0));
Notice the use of Time::HiRes, which can help with reporting benchmark times for fast scripts, where small fractions of a second should be measurable. Like a stopwatch, we simply take note of the time at the start of the script, and again at the end, and report the difference. Assuming this is a CGI script the results will probably be printed to the Web page (also assuming proper headers, etc.), but you might alternatively wish to open a filehandle to a log file and send the print function to the log file.
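As a sketch of the log-file variation (the benchmark.log filename here is only an assumption), the two ends of the stopwatch might look like:

```perl
use Benchmark;

my $t0 = new Benchmark;
# ...the script's real work would go here...
my $t1 = new Benchmark;

# Append the measurement to a log file rather than the Web page;
# "benchmark.log" is a hypothetical filename.
open(LOG, ">>benchmark.log") or die "Can't open log: $!";
print LOG "Script Benchmark: " . timestr(timediff($t1, $t0)) . "\n";
close(LOG);
```

Because the script appends rather than overwrites, repeated invocations accumulate in the log, which makes it easy to eyeball consistency across runs.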

Timing the entire script execution like this can be enlightening, but should also be interpreted with caution. For one, remember that this timer measures only the execution time of the script, not the time it takes for its output to reach your browser. Network performance is not being measured here. Web server performance is also not generally being measured, although in some cases -- such as Apache with mod_perl -- these numbers partially reflect the Web server's performance at executing Perl code.

To get more use out of the stopwatch technique, consider adding several stopwatches to your script, wrapped around different portions of code. This way, you get a behind-the-scenes peek at which portions of code are taking how long to execute. A particularly effective technique is to wrap each subroutine inside a stopwatch. Simply:

        sub mySubroutine {
         my $t0 = new Benchmark;
         ...original subroutine code...
         my $t1 = new Benchmark;
         print "mySubroutine benchmark: " .
          timestr(timediff($t1, $t0));
        }

Due to the scoping of my variables, you can re-use $t0 and $t1 inside each subroutine without conflict. Remember to modify the label for each benchmark so that you can recognize which subroutine the measurement applies to. When the script runs, for instance in a Web environment, the resulting Web page may then contain a rundown for each subroutine and its execution time. Because this technique does not perform many iterations, you should invoke the script a number of times to get a sense of consistency from the measurements.
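Rather than pasting the stopwatch into every subroutine by hand, one possible refinement (the timed wrapper is my own invention, not part of the Benchmark module) is a small helper that runs any code reference under a labeled stopwatch:

```perl
use Benchmark;

# Hypothetical helper: time a coderef and print a labeled report,
# passing the coderef's return value back through.
sub timed {
    my ($label, $code) = @_;
    my $t0 = new Benchmark;
    my @result = $code->();
    my $t1 = new Benchmark;
    print "$label benchmark: " . timestr(timediff($t1, $t0)) . "\n";
    return wantarray ? @result : $result[0];
}

# An example subroutine to wrap.
sub mySubroutine {
    my $sum = 0;
    $sum += $_ for (1 .. 100000);
    return $sum;
}

my $total = timed("mySubroutine", \&mySubroutine);
print "$total\n";   # prints "5000050000"
```

The label string keeps each report recognizable, and the wrapper preserves the wrapped subroutine's return value, so existing callers don't need to change.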

Ultimately, the goal of adding stopwatches to your script is to find any abnormally slow subroutines or sections of code. This quick diagnosis can point you to trouble spots quickly. In the long run, focusing on the unusually slow sections of code will yield greater efficiency than attempting to benchmark alternatives for every routine in the script.

Troubleshooting in Context

Finding a slow segment of code is one thing - identifying the hold-up, and a solution, can be quite another. As we saw in earlier examples, there are many times that the Perl language itself can be the source of slow performance. We've seen that there can be a variety of ways to address a problem, using Perl, and some are much faster than others. As a general rule of thumb, Perl will perform faster when you rely on built-in functions; for example, we saw that the map function performed much faster than building a foreach loop of our own. The lesson here is "don't re-invent the wheel" - let Perl do the work, whenever possible.

Equally important to identifying slow Perl syntax, though, are considerations beyond the language itself. For example, if a particular subroutine is accessing data, consider where the data is stored. Data retrieved from disk will likely be slower than data retrieved from memory. Beyond that, the retrieval time of data on disk can vary considerably, depending on whether the data is stored in a simple, slow database or a fast, indexed database. In any case, the delay in retrieving data will inflate the execution time of such a subroutine, without being the fault of the Perl code itself, but of the larger architecture.
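The disk-versus-memory point can be seen with cmpthese itself. In this contrived sketch (the data.txt filename and the cache layout are my own), the same value is fetched once from a file on disk and once from a hash in memory:

```perl
use Benchmark qw(cmpthese);

# Hypothetical setup: write a value to disk, and also cache it in memory.
open(OUT, ">data.txt") or die "Can't write: $!";
print OUT "filename.txt\n";
close(OUT);
%cache = ( extension_source => "filename.txt" );

cmpthese(5000, {
    from_disk => sub {
        open(IN, "data.txt") or die "Can't read: $!";
        my $val = <IN>;
        close(IN);
    },
    from_memory => sub {
        my $val = $cache{extension_source};
    },
});
```

On any machine the in-memory lookup should win by a wide margin, even though the Perl on both sides is equally simple -- the bottleneck lives in the architecture, not the syntax.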

Similarly, when a Perl subroutine retrieves data from a database, does it establish a connection to the database on each invocation, or can it rely on an existing connection already established (known as "persistent connections")? Again, this bottleneck can inflate the execution time of such a segment of code.

Portions of code that make network requests are going to be subject to all of the delays inherent in networks. Once again, reducing these bottlenecks may not involve changing the Perl code at all, but changing the architecture of the system, whether that includes other pieces of software or the hardware itself. Ultimately, the point is that measuring execution time of code is important, but interpreting it in context is even more important: the code itself may, or may not, be to blame for a bottleneck.
