Apache Guide: Logging with Apache--Understanding Your access_log

Monday Aug 21st 2000 by Rich Bowen

Apache keeps extensive track of your server usage via logfiles. In this article, Rich Bowen discusses logfiles and how you can get more useful information from them.

Apache comes with built-in mechanisms for logging activity on your server. In this series of articles, I'll talk about the standard way that Apache writes log files, and some of the tricks for getting more useful information and statistics out of your server.

This week we'll talk about the information that appears in your transfer log, and what it all means.

The standard log files

If you did a default installation of Apache, the server writes two log files when it runs: access_log (access.log on Windows) and error_log (error.log on Windows). With a default installation, these files are in /usr/local/apache/logs. On Windows, the logs are in the logs subdirectory of wherever you installed Apache. Packaged distributions of Apache often put the log files elsewhere, so you may have to poke around to find them, or check the configuration file for the configured location.

access_log

access_log is, as the name suggests, the log of all accesses to your server. Typical entries in this file look like:

        216.35.116.91 - - [19/Aug/2000:14:47:37 -0400] "GET / HTTP/1.0" 200 654

This line contains 7 pieces of information. Actually, two of them are blank in this example, but there is space for 7 pieces of information.

The first piece of information is the address of the remote host. That is, who is looking at your web site. In the example above, the host visiting my web site is 216.35.116.91, which is, incidentally, the IP address of the machine called si3001.inktomi.com. (I figured that out by looking up the address in DNS, with the nslookup utility.) inktomi.com is a company that makes web searching software. (I looked at their web site.) Since this same IP address requested the file robots.txt just a few seconds earlier, I suspect that this is a web searching spider that was indexing my web site. I'll talk about spiders in another column. So, just based on that first piece of information, and a glance back in the log file, I've already found out quite a bit of information about my visitors.

By default, this address is just the IP address of the remote host. You can tell Apache to look up all the host names, and put those host names in the log instead of the IP address. This is probably not a good idea, since it greatly slows down the logging process, and so slows down your entire server. And there are various tools that will go through your log after the fact, and resolve all the IP addresses to host names, so there's no real advantage to doing this anyway.
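As a rough illustration of resolving addresses after the fact, here is a minimal Python sketch. It is not one of the tools mentioned above. The function name resolve_hosts and its lookup parameter are made up for this example, and it assumes the standard socket.gethostbyaddr call is an acceptable way to do the reverse lookup.

```python
import socket

def resolve_hosts(ip_addresses, lookup=socket.gethostbyaddr):
    """Map each distinct IP address to a host name (hypothetical helper).

    Addresses with no reverse DNS entry map to themselves.
    """
    names = {}
    for ip in ip_addresses:
        if ip in names:
            continue  # look each address up only once
        try:
            # gethostbyaddr returns (hostname, aliases, addresses)
            names[ip] = lookup(ip)[0]
        except OSError:
            names[ip] = ip  # lookup failed; keep the bare address
    return names

# Usage (does real DNS lookups, so it is left as a comment):
# print(resolve_hosts(["216.35.116.91"]))
```

Because each address is looked up only once, and the work happens outside the server, the cost of the DNS queries never touches request handling.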

But, if you want to, you can tell Apache to do these lookups with the directive:


        HostNameLookups on

Setting HostNameLookups to double, rather than on, will cause the logging process to do a reverse lookup on the name that it finds, to verify that it points back to the IP address that you started with. The value is set to off by default.

The second slot, alas, is blank, and almost always will be. That's what that ''-'' is: a place-holder for the second piece of information. That is the location where you're supposed to get the identity of the visitor. That's not just their login name, but their email address, or other unique identifier. This information is supposed to be returned by identd, or directly by the browser. And in the old days, back when Netscape 0.9 was the dominant browser, you would usually have email addresses in this spot. However, it did not take long for unsavory marketing types to think that it would be a good idea to collect those email addresses and send them unsolicited email (also known as spam). So, before very long, this feature was removed from just about every browser on the market. You will almost never find information in this field.

The third piece of information is also blank. The information that would appear there is the username with which the visitor authenticated. This will appear, of course, only when you have required authentication for a particular resource. So for the majority of entries in your log file, for most sites, this will be blank.

Next we have the time when the request was made. This information is enclosed in square brackets, and is in what is called ''common log format'', or ''standard English format.'' So the request in the above example was made at 14:47:37 on Saturday, August 19. The -0400 on the end of the field means that the server is in the time zone 4 hours behind UTC. This tells you two things. One, that I tend to leave my column until the last minute, and two, that I appear to have the wrong time-zone set on my server. I'll have to make a note to take care of that ...
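If you want to work with that timestamp in a program, something like the following Python sketch should do. The function name parse_log_time is invented for this example, and the format string assumes an English (or C) locale, since the month appears as an abbreviated English name.

```python
from datetime import datetime, timezone

def parse_log_time(stamp):
    """Parse the text between the square brackets of a log entry."""
    # Day/Month-abbreviation/Year:Hour:Minute:Second, then the UTC offset
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")

when = parse_log_time("19/Aug/2000:14:47:37 -0400")
print(when.weekday())  # 5, meaning Saturday (Monday is 0)
print(when.astimezone(timezone.utc).isoformat())  # 2000-08-19T18:47:37+00:00
```

Converting to UTC, as in the last line, is handy when you compare entries from servers in different time zones.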

The next piece of information is probably the most useful piece of information in the record. It tells what request was actually made of the server. This is typically in the format METHOD RESOURCE PROTOCOL.

In the example above, the METHOD is GET. The other most common methods will be POST and HEAD. There are a number of other valid methods, but those three are what you will see most of the time.

The RESOURCE is the actual document, or URL, that was requested from the server. In this example, the client requested ''/'', which is the root, or front page, of the server. In most configurations, this corresponds to the file index.html in the DocumentRoot directory, but could be something else, depending on your server configuration.

The PROTOCOL is usually going to be HTTP, followed by a version number. The version number will be either 1.0 or 1.1, with most of the records being 1.0. As you probably know from other articles, HTTP is the protocol that makes the web work. HTTP/1.0 is the earlier version of this protocol, and 1.1 is the more recent version. However, most web clients still speak version 1.0.
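Pulling the three parts of the request apart is mostly a matter of splitting on whitespace. The sketch below uses a made-up function name, split_request, and assumes that anything which does not have exactly three parts should be treated as malformed rather than guessed at.

```python
def split_request(request_line):
    """Split 'METHOD RESOURCE PROTOCOL' into its three parts.

    Returns None when the line does not have exactly three parts.
    """
    parts = request_line.split()
    if len(parts) != 3:
        return None
    method, resource, protocol = parts
    return method, resource, protocol

print(split_request("GET / HTTP/1.0"))  # ('GET', '/', 'HTTP/1.0')
```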

The sixth piece of information is a status code. This tells you whether the request was successful, or encountered some problem. Most of the time, this is 200, which means that the transfer was successful, and everything went well. Hopefully. I'm not going to give the whole list of the status codes, and what they mean. You need to look in the documentation for that. But, in general, a status code that starts with 2 was successful. Starting with a 3 means that the request was redirected somewhere else for some reason. Starting with a 4 means that the user did something wrong, and starting with a 5 means that the server did something wrong.
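The first-digit rule is easy to express in code. This Python sketch is only an illustration. The function name status_class and the label strings are chosen for this example and are not anything Apache defines.

```python
def status_class(status):
    """Classify a status code by its first digit."""
    classes = {
        2: "success",
        3: "redirect",
        4: "client error",
        5: "server error",
    }
    # Integer division by 100 leaves just the leading digit of a 3-digit code
    return classes.get(int(status) // 100, "other")

print(status_class(200))    # success
print(status_class("404"))  # client error
```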

The seventh and final piece of information is the number of bytes that were transferred to the client, not counting the response headers. This can tell you if a transfer was interrupted (if the number is different from the size of the file). Adding these numbers up will tell you how much data your server transferred in a day, or week, or whatever.
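Putting the seven fields together, here is a minimal Python sketch that splits an entry into its parts and adds up the byte counts. The regular expression, the field names, and the function names parse_line and total_bytes are all assumptions made for this example. The sketch also assumes that a ''-'' in the last field (which Apache writes when no bytes were sent) should count as zero.

```python
import re

# One group per field: host, ident, user, [time], "request", status, bytes
CLF = re.compile(r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "(.*)" (\d{3}) (\d+|-)$')
FIELDS = ("host", "ident", "user", "time", "request", "status", "bytes")

def parse_line(line):
    """Return a dict of the seven fields, or None if the line does not match."""
    match = CLF.match(line.strip())
    return dict(zip(FIELDS, match.groups())) if match else None

def total_bytes(lines):
    """Add up the byte counts, skipping unparsable lines and '-' entries."""
    total = 0
    for line in lines:
        entry = parse_line(line)
        if entry and entry["bytes"] != "-":
            total += int(entry["bytes"])
    return total

sample = '216.35.116.91 - - [19/Aug/2000:14:47:37 -0400] "GET / HTTP/1.0" 200 654'
print(parse_line(sample)["request"])  # GET / HTTP/1.0
print(total_bytes([sample, sample]))  # 1308
```

To total a real log, you could pass an open file object to total_bytes, since iterating over a file yields its lines.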

Setting the location of your access_log

Where the access_log is located is actually a configuration option. If you look in your configuration file, httpd.conf, you should see a line that looks like the following:


        CustomLog /usr/local/apache/logs/access_log common

Note: If you're running an older version of Apache, this line might look a little different. It might be the TransferLog directive instead of the CustomLog directive. If that is the case, I really recommend that you upgrade if at all possible.

The CustomLog directive specifies where a particular log file should be stored, and what format that log should be in. Next week we'll talk about custom log formats. The log format described above is the common log format, which has been in use as the standard since the beginning of web servers. That's why it still contains the ident information field, even though almost no clients actually pass that information to the server.

The path specified there is the location of the log file. Note that the directory holding it should be secured against random users writing to it. The log file is opened by the user that starts the server (normally root), before Apache switches to the user named in the User directive, so a log directory that other users can write to is potentially a security problem.
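One rough way to check whether other users can write to the log location is to look at its permission bits. The Python sketch below applies to Unix-style permissions only. The function names are invented for this example, and the path in the usage comment is simply the default location mentioned earlier.

```python
import os
import stat

def mode_allows_other_writers(mode):
    """True if the group or world write bit is set in a file mode."""
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))

def log_dir_is_exposed(path):
    """True if the directory at path is writable by group or others."""
    return mode_allows_other_writers(os.stat(path).st_mode)

# Usage (the path is an assumption; substitute your own log directory):
# print(log_dir_is_exposed("/usr/local/apache/logs"))
```

This only inspects the directory itself. A thorough check would also look at the parent directories and at the log files.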

Upcoming articles

In my next few articles, I'll be talking about the following subjects: custom log formats; logging to a process, rather than to a file; the error log; getting useful statistics out of your log files; and whatever else you fine readers suggest to me.

Thanks for reading. Please send me a note at ApacheToday@rcbowen.com if you have any suggestions or comments.

