Apache keeps extensive track of your server usage via log files. In this article, Rich Bowen discusses log files and how you can get more useful information from them.
Apache comes with built-in mechanisms for logging activity on
your server. In this series of articles, I'll talk about
the standard way that Apache writes log files, and some of
the tricks for getting more useful information and statistics
out of your server.
This week we'll talk about the information that appears in
your transfer log, and what it all means.
If you have done a default installation of Apache, when you
run your server, two log files will get written. These files
are access_log (access.log on Windows) and
error_log (error.log on Windows). These files can be
found (again, if you did a default installation) in
/usr/local/apache/logs. On Windows, the logs will be in the
logs subdirectory of wherever you installed Apache.
Various package managers put the log files in other
places, and you'll have to poke around to find them,
or check the configuration file for the configured location.
access_log is, as the name suggests, the log of all
accesses to your server. Typical entries in this file look like:
188.8.131.52 - - [19/Aug/2000:14:47:37 -0400] "GET / HTTP/1.0" 200 654
This line contains seven pieces of information. Actually, two of them
are blank in this example, but there is space for seven pieces of information.
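To make the seven fields concrete, here is a minimal Python sketch that splits one line of this log into them. It assumes every line is exactly in the layout shown above; the regular expression and the names parse_clf_line and LogEntry are my own illustration, not something Apache provides.

```python
import re
from typing import NamedTuple, Optional

# One named group per field of a common log format line.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) '            # 1. remote host (IP address by default)
    r'(?P<ident>\S+) '           # 2. ident identity, almost always "-"
    r'(?P<user>\S+) '            # 3. authenticated user name, or "-"
    r'\[(?P<time>[^\]]+)\] '     # 4. time of the request, in brackets
    r'"(?P<request>[^"]*)" '     # 5. METHOD RESOURCE PROTOCOL
    r'(?P<status>\d{3}) '        # 6. status code
    r'(?P<size>\d+|-)$'          # 7. bytes sent, "-" when none
)


class LogEntry(NamedTuple):
    host: str
    ident: Optional[str]
    user: Optional[str]
    time: str
    request: str
    status: int
    size: int


def parse_clf_line(line: str) -> LogEntry:
    """Split one access_log line into its seven fields."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"not a common log format line: {line!r}")
    f = match.groupdict()
    return LogEntry(
        host=f["host"],
        ident=None if f["ident"] == "-" else f["ident"],   # "-" means blank
        user=None if f["user"] == "-" else f["user"],
        time=f["time"],
        request=f["request"],
        status=int(f["status"]),
        size=0 if f["size"] == "-" else int(f["size"]),
    )


if __name__ == "__main__":
    line = '188.8.131.52 - - [19/Aug/2000:14:47:37 -0400] "GET / HTTP/1.0" 200 654'
    print(parse_clf_line(line))
```

Blank fields come back as None, so later code can tell "no information" apart from an actual value.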
The first piece of information is the address of the remote host,
that is, who is looking at your web site. In the example above,
the address at the start of the line is,
incidentally, the IP address of the machine called
si3001.inktomi.com. (I figured that out by looking up the
address in DNS.) Inktomi is
a company that makes web searching software. (I looked at their
web site.) Since this same IP address requested the file
robots.txt just a few seconds earlier, I suspect that this
is a web searching spider that was indexing my web site. I'll
talk about spiders in another column. So, just based on that
first piece of information, and a glance back in the log file,
I've already found out quite a bit about my visitors.
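The robots.txt clue can be automated. The hypothetical helper below, probable_spiders, is only a rough heuristic: it assumes well-behaved spiders ask for /robots.txt (not every one does) and simply collects the hosts that requested that file.

```python
import re

# Host (first field) and requested resource from a common log format line.
REQUEST_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "\S+ (\S+)')


def probable_spiders(lines):
    """Return the set of hosts that asked for /robots.txt.

    Rough heuristic: well-behaved spiders fetch /robots.txt before
    crawling, while ordinary browsers almost never request it.
    """
    hosts = set()
    for line in lines:
        match = REQUEST_RE.match(line)
        if match and match.group(2) == "/robots.txt":
            hosts.add(match.group(1))
    return hosts
```

Because it accepts any iterable of lines, you can pass it an open file object for /usr/local/apache/logs/access_log directly.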
By default, this address is just the IP address of the remote
host. You can tell Apache to look up all the host names, and
put those host names in the log instead of the IP address. This is
probably not a good idea, since it greatly slows down the logging process,
and so slows down your entire server. And there are various tools
that will go through your log after the fact and resolve all the IP
addresses to host names, so there's no real advantage to doing this while the server is running.
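As a rough sketch of what such after-the-fact tools do, the Python below replaces the first field of each line with a host name, caching results so each address is looked up only once. The function names are hypothetical, and a failed lookup simply leaves the IP address in place.

```python
import socket
from functools import lru_cache


@lru_cache(maxsize=None)
def resolve(ip: str) -> str:
    """Reverse-resolve ip once; fall back to the address itself on failure."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return ip


def resolve_log_line(line: str) -> str:
    """Replace the first field (the remote host) with its host name."""
    host, sep, rest = line.partition(" ")
    return resolve(host) + sep + rest
```

The cache matters because the same visitor usually shows up on many consecutive lines, and each DNS lookup can take a noticeable amount of time.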
But, if you want to, you can tell Apache to do these lookups with
a directive in the configuration file, which takes the values on,
off, or double. Setting it to double, rather than on, will cause
the logging process to do a forward lookup on the name that it finds,
to verify that it points back to the IP address that you started with.
The value is set to off by default.
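The extra check done by the double setting can be sketched like this. It illustrates the idea and is not Apache's own code: the function name is hypothetical, the default resolvers use Python's socket module (IPv4 only, through gethostbyname_ex), and they can be swapped out, which also makes the logic easy to test.

```python
import socket
from typing import Callable, Iterable, Optional


def _reverse(ip: str) -> str:
    return socket.gethostbyaddr(ip)[0]          # IP address -> host name


def _forward(name: str) -> Iterable[str]:
    return socket.gethostbyname_ex(name)[2]     # host name -> IPv4 addresses


def double_reverse_lookup(
    ip: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], Iterable[str]] = _forward,
) -> Optional[str]:
    """Return the host name for ip only if that name points back to ip."""
    try:
        name = reverse(ip)
        addresses = forward(name)
    except OSError:
        return None           # either lookup failed: keep logging the IP
    return name if ip in addresses else None
```

Returning None when the forward lookup does not match means a host cannot claim an arbitrary name just by controlling its own reverse DNS.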
The second slot, alas, is blank, and almost always will be. That's what that "-"
is: a place-holder for the second piece of information. That is the
location where you're supposed to get the identity
of the visitor.
That could be not just their login name, but their email address, or some other
unique identifier. This information is supposed to be returned by
an ident server running on the visitor's machine, or directly by the browser. And in the old days, back when
Netscape 0.9 was the dominant browser, you would usually have email
addresses in this spot. However, it did not take long for unsavory
marketing types to think that it would be a good idea to collect those
email addresses and send them unsolicited email (also known as spam).
So, before very long, this feature was removed from just about every
browser on the market. You will almost never find information in this field.
The third piece of information is also blank. The information
that would appear there is the username with which the visitor
authenticated. This will appear, of course, only when you have
required authentication for a particular resource. So for the majority
of entries in your log file, for most sites, this will be blank.
Next we have the time when the request was made. This information
is enclosed in square brackets, and is in what is called "common
log format", or "standard English format." So the request in the above
example was made at 14:47:37 on Saturday, August 19, 2000. The -0400 at
the end of the field means that the server is in the time zone
4 hours behind UTC. This tells you
two things. One, that I tend to leave my column until the last
minute, and two, that I appear to have the wrong time-zone set
on my server. I'll have to make a note to take care of that.
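Since the timestamp has a fixed layout, Python's datetime.strptime can parse it directly, offset included. A small sketch (the function name is hypothetical), assuming the default C or an English locale so that month abbreviations such as Aug are recognized:

```python
from datetime import datetime, timezone


def parse_log_time(stamp: str) -> datetime:
    """Parse a stamp such as '19/Aug/2000:14:47:37 -0400' (without brackets)."""
    # %b expects English month abbreviations (true in the default C locale);
    # %z turns "-0400" into a UTC offset of minus four hours.
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")


if __name__ == "__main__":
    when = parse_log_time("19/Aug/2000:14:47:37 -0400")
    print(when.strftime("%A, %B %d, %Y %H:%M:%S"))   # server's local time
    print(when.astimezone(timezone.utc))              # same moment in UTC
```

For the example line, the UTC conversion prints 2000-08-19 18:47:37+00:00, which makes it easy to compare logs from servers in different time zones.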
The next piece of information is probably the most useful piece
of information in the record. It tells what request was actually made
of the server. This is typically in the format
METHOD RESOURCE PROTOCOL.
In the example above, the METHOD is
GET. The other most common methods are POST and
HEAD. There are a number of other valid methods, but those
three are what you will see most of the time.
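Splitting the request into METHOD, RESOURCE and PROTOCOL is just a matter of breaking it on whitespace. Here is a minimal sketch, with hypothetical names, that also counts methods over several requests; it rejects anything that does not have exactly three parts (very old HTTP/0.9 requests, for example, have no protocol field).

```python
from collections import Counter
from typing import NamedTuple


class Request(NamedTuple):
    method: str
    resource: str
    protocol: str


def split_request(request_line: str) -> Request:
    """Split a request such as 'GET / HTTP/1.0' into its three parts."""
    parts = request_line.split()
    if len(parts) != 3:
        raise ValueError(f"unexpected request line: {request_line!r}")
    return Request(*parts)


def count_methods(request_lines) -> Counter:
    """Count how often each METHOD appears."""
    return Counter(split_request(r).method for r in request_lines)
```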
RESOURCE is the actual
document, or URL, that was requested from the server. In this example,
the client requested "/", which is the root, or front page,
of the server. In most configurations, this corresponds to the file
index.html in the
DocumentRoot directory, but it could
be something else, depending on your server configuration.
PROTOCOL is usually going to be
HTTP, followed by a version
number. The version number will be either
1.0 or 1.1, with most
of the records being
1.0. As you probably know from other articles,
HTTP is the protocol that makes the web work. HTTP/1.0 was the earlier
version of this protocol, and 1.1 is the more recent version. However,
most web clients still speak version 1.0.
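To see what mix of versions your own visitors use, you can tally the PROTOCOL part of each request. A sketch with a hypothetical function name, assuming you already have the request strings (the text between the double quotes in each log line):

```python
from collections import Counter


def protocol_versions(request_lines) -> Counter:
    """Tally HTTP versions from requests such as 'GET / HTTP/1.0'."""
    versions = Counter()
    for request in request_lines:
        protocol = request.rsplit(" ", 1)[-1]          # e.g. 'HTTP/1.0'
        name, _, version = protocol.partition("/")
        if name == "HTTP" and version:                 # skip requests with no protocol
            versions[version] += 1
    return versions
```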
The sixth piece of information is a status code. This tells you whether
the request was successful, or encountered some problem. Most of the time,
this will be 200, which means that the transfer was successful, and everything
went well. Hopefully. I'm not going to give the whole list of the status
codes, and what they mean. You need to look in the documentation for that.
But, in general, a status code that starts with 2 was successful. Starting
with a 3 means that the request was redirected somewhere else for some
reason. Starting with a 4 means that the client (the user or their browser)
did something wrong, and starting with a 5 means that the server did something wrong.
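The first-digit rule of thumb above translates directly into code. A small sketch; the function name and the labels are my own wording:

```python
def status_class(status: int) -> str:
    """Describe a status code by its first digit."""
    meanings = {
        2: "successful",
        3: "redirected somewhere else",
        4: "client did something wrong",
        5: "server did something wrong",
    }
    return meanings.get(status // 100, "other")
```

Grouping by class rather than by exact code is usually enough to spot trouble, such as a sudden rise in 4xx or 5xx responses.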
The seventh and final piece of information is the total number of bytes
that were transferred to the client. This can tell you if a transfer
was interrupted (if the number is different from the size of the file).
Adding them up will tell you how much data your server transferred in a
day, or week, or whatever.
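Here is a sketch of the adding-up, grouped by day. It assumes common log format lines and treats a - in the last field as zero, which is what Apache writes there when no bytes were sent; the function name is hypothetical.

```python
import re
from collections import defaultdict

# Captures the day part of the timestamp and the final (bytes) field.
LINE_RE = re.compile(r'\[(\d{2}/\w{3}/\d{4}):[^\]]*\].* (\d+|-)$')


def bytes_per_day(lines) -> dict:
    """Total bytes sent per day, keyed by 'dd/Mon/yyyy'."""
    totals = defaultdict(int)
    for line in lines:
        match = LINE_RE.search(line.strip())
        if match is None:
            continue                       # skip lines in another format
        day, size = match.groups()
        totals[day] += 0 if size == "-" else int(size)
    return dict(totals)
```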
The location of access_log is actually a configuration option.
If you look in your configuration file,
httpd.conf, you should see
a line that looks like the following:
CustomLog /usr/local/apache/logs/access_log common
Note: If you're running an older version of Apache, this line might
look a little different. It might be the
TransferLog directive instead
of the CustomLog directive. If that is the case, I really recommend
that you upgrade if at all possible.
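If you are not sure where your access log ends up, a short script can pull the relevant lines out of httpd.conf. This is a simplified, hypothetical helper: it handles only CustomLog and TransferLog lines written on a single line, and it ignores included files and virtual host context.

```python
import shlex

LOG_DIRECTIVES = ("customlog", "transferlog")   # directive names are case-insensitive


def find_log_directives(config_text: str):
    """Return (directive, arguments) for each active CustomLog/TransferLog line."""
    found = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                                  # blank line or comment
        directive = line.split(None, 1)[0]
        if directive.lower() in LOG_DIRECTIVES:
            words = shlex.split(line)                 # honours quoted paths
            found.append((words[0], words[1:]))
    return found
```

Only matching lines go through shlex, so unrelated directives containing stray quotes cannot trip it up. On a default installation you would feed it the contents of httpd.conf, assuming it sits in the conf directory next to the logs directory.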
The CustomLog directive specifies where a particular log file should
be stored, and what format that log should be in. Next week we'll talk
about custom log formats. The log format described above is the common
log format, which has been in use as the standard since the beginning of
web servers. That's why it still contains the ident information field,
even though almost no clients actually pass that information to the server.
The path specified there is the location of the log file. Note that this
location should be secured against random users writing to it. The log
file is opened while the server is still running as root, before it switches
to the less privileged user that answers requests, so a log location that
other users can write to is potentially a security problem.
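One way to check a log location is to look at its permission bits. Here is a POSIX-only sketch (the function names are hypothetical) that reports whether the log file, or the directory containing it, can be written by group or other users:

```python
import os
import stat


def writable_by_others(path: str) -> bool:
    """True if the group or other users may write to path."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))


def insecure_log_paths(log_file: str) -> list:
    """Return the log file and/or its directory if others can write to them."""
    log_file = os.path.abspath(log_file)
    candidates = [log_file, os.path.dirname(log_file)]
    return [p for p in candidates if os.path.exists(p) and writable_by_others(p)]
```

The directory is checked as well because someone who can write to it could replace the log file with something else.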
In my next few articles, I'll be talking about the following subjects: custom
log formats, logging to a process rather than to a file, the error log,
getting useful statistics out of your log files, and whatever else you
fine readers suggest to me.
Thanks for reading. Please send me a note at ApacheToday@rcbowen.com if
you have any suggestions or comments.