Session Tracking With Apache

Wednesday May 11th 2005 by Martin Brown

Using HTTP logs to track the users who visit your site isn't all it could be. We look at mod_usertrack, an Apache module that uses cookies to identify a user's visit and records access to the site, as well as discuss solutions for dynamic sites.

Using HTTP logs to track the users who visit your site isn't always as useful as you think it's going to be. Metrics such as the total number of page hits and, within that, page hits over time or from a specific IP address are easy to extract, but they don't always tell you how people are viewing your site or answer the specific questions the marketing department may pose.

This article looks at how to track progress through a site using an Apache module and provides answers to some of the more complex marketing-led questions that may be posed.

Tracking Progress

Sometimes when monitoring site usage you want to ask more detailed questions than "how many users" or "how many hits." You may, for example, want to determine if users are visiting your new product pages because they saw an advertisement on the home page or because they've been sent a link directly to it. Also, with most companies using firewalls, Network Address Translation (NAT), and proxy servers, identifying individual users can be impossible. Actually "following" a user as he works through the site is difficult.

Table of Contents
Tracking Progress
Adding mod_usertrack to Your Apache Installation
Enabling Tracking
Choosing an Expiry Time
Configuring the Log
Analyzing the Results
Dynamic Solutions
The Performance Trade Off

If you look at a typical log, it's easy to determine general information, such as page hits, but it's difficult to determine what an individual user might have been doing. Although you can configure the logs to include an IP address, this is neither as unique nor as easy to use as it sounds. Many users access the Internet through a NAT or proxy solution, which means hundreds, even thousands of users are hidden behind a single IP address.

Even if you can identify a single user as using a single IP address, tracking what she does — beyond simply viewing different pages — is more difficult, especially if the site uses a combination of static and dynamic elements or is entirely dynamic.

One solution to this is mod_usertrack, which uses cookies to identify a user's visit and then records access to the site identified by this unique ID. When the user first visits the site, she is sent a cookie with a unique ID, which is kept (until a predetermined timeout is reached) so you can track an individual's usage on your site. It's not a complete solution, but it enables the identification of individual users, even if they appear to come from the same IP address.

Like other modules, mod_usertrack is configured through a series of directives. The first step, however, is confirming that mod_usertrack is in your Apache installation.

Adding mod_usertrack to Your Apache Installation

The mod_usertrack module ships with the Apache source but is not compiled into a default build, so to use it you must first add the module to your installation. If you are using a dynamic version of Apache, rerun configure for your Apache installation and add the --enable-usertrack=shared option to the configure command line. Then, rebuild Apache and copy the module to the modules directory within the installed Apache directory. Finally, edit your httpd.conf file to load the new module:

LoadModule usertrack_module modules/mod_usertrack.so

For a static installation, rebuild your Apache executable (again using configure, but this time with the --enable-usertrack option). Because the module is built into the new executable, you don't need to specifically load the module through the configuration file.

With both solutions, be sure to restart Apache, which is best done with apachectl:

$ apachectl restart

Once everything is up and running, you should be able to go ahead with the next section of the configuration. To double check that the new module is correctly installed in the Apache installation, use the -L option (which lists available directives) to httpd and look for the cookie-tracking directive:

./bin/httpd -L|grep CookieTracking
CookieTracking (mod_usertrack.c)

Enabling Tracking

Tracking can be turned on globally or on a per-server or per-directory basis by adding the appropriate directives to the configuration file. The main directive is CookieTracking, which accepts one parameter, either "on" or "off." The default is off.

The mod_usertrack module uses cookies, and for security, cookies are configured to be active only within a limited domain — this limitation means cookies are provided only to servers within that domain. For example, if you configure a cookie to exist within mcslp.com, your browser will send the cookie only to hosts within that domain. If you are using a single Web server, you'll probably want to specify only that host's name. For a site that employs multiple hosts, perhaps hiding behind the same host address, you must explicitly state the cookie's domain (and remember to specify a leading period to indicate hosts within the domain, rather than a single host). The CookieDomain directive is used to define the domain in which the cookie should be activated; the default is to use just the current host.
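To make the effect of the leading period concrete, here is a minimal Python sketch of how a browser might decide whether a host falls inside a cookie's domain. It is a simplified approximation of browser behavior, not code from Apache, and the function name cookie_domain_matches is our own invention for illustration.

```python
def cookie_domain_matches(host: str, cookie_domain: str) -> bool:
    """Approximate check: would a cookie scoped to cookie_domain be sent to host?

    Simplified for illustration; real browsers apply further rules
    (public suffixes, IP addresses, and so on).
    """
    host = host.lower().rstrip(".")
    domain = cookie_domain.lower().lstrip(".")
    # Match the domain itself, or any host that ends with ".<domain>".
    return host == domain or host.endswith("." + domain)


print(cookie_domain_matches("polarbear.mcslp.com", ".mcslp.com"))
print(cookie_domain_matches("www.example.com", ".mcslp.com"))
```

The suffix test includes the separating period, so a host such as evilmcslp.com is not treated as part of mcslp.com.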

The CookieName directive configures the name of the cookie that will be stored. Because multiple cookies can be active for a given domain, a range of names is necessary to enable identification. In general, the name used should be as descriptive as possible while sticking to characters that are valid in a cookie name, so avoid spaces (e.g., MCSLP_Tracker).

The final configuration for a directory will look something like:

<Directory "/export/http/webs/com.mcslp">
    CookieTracking on
    CookieName MCSLP_Intranet
    CookieDomain .mcslp.com
</Directory>

Choosing an Expiry Time

The final directive you may want to configure is the expiry time for the cookie. The expiry time helps determine the difference between "sessions" of the same user. As long as the user keeps using the Web site, the session will be updated, but when he leaves the site, the cookie will eventually expire and a new one (and corresponding session ID) will be generated on his next visit to the site.

You must therefore configure the expiry time to enable a user who visits the site to take breaks, if necessary, while he answers the phone or attends a short meeting. Choose a timeout period suitable to identify the difference between visits to your site. For example, for a news or shopping site, a timeout of an hour would probably suffice, but an intranet site intended to be used all day might benefit from a 12-hour timeout.

The default option is for the cookie to remain active until the current browser session is terminated (i.e., when the user quits the application). Alternatively, you can set a specific time either in seconds or by using an explicit declaration in years, months, weeks, days, hours, and minutes (e.g., "2 weeks 7 days 9 hours"). If you use this format, be sure the directive parameter is enclosed in double quotes.

The CookieExpires directive configures this option. To set a timeout in seconds, specify the number of seconds as an integer:

CookieExpires 3600

The above sets an expiry time of an hour.
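If you want to compare the two forms, the arithmetic is easy to check. The following Python sketch converts a period written in the explicit format into seconds. The helper name expiry_to_seconds is hypothetical, and the sketch deliberately handles only weeks, days, hours, minutes, and seconds, because the length Apache assigns to months and years is an implementation detail we do not assume here.

```python
# Fixed-length units only; months and years are intentionally left out.
UNIT_SECONDS = {
    "second": 1,
    "minute": 60,
    "hour": 60 * 60,
    "day": 24 * 60 * 60,
    "week": 7 * 24 * 60 * 60,
}


def expiry_to_seconds(period: str) -> int:
    """Convert a string such as '2 weeks 7 days 9 hours' into seconds."""
    tokens = period.split()
    if not tokens or len(tokens) % 2 != 0:
        raise ValueError("expected pairs of '<number> <unit>'")
    total = 0
    # Walk the tokens as (count, unit) pairs.
    for count, unit in zip(tokens[0::2], tokens[1::2]):
        unit = unit.lower().rstrip("s")  # accept 'hour' and 'hours'
        if unit not in UNIT_SECONDS:
            raise ValueError(f"unsupported unit: {unit}")
        total += int(count) * UNIT_SECONDS[unit]
    return total


print(expiry_to_seconds("1 hour"))                  # 3600
print(expiry_to_seconds("2 weeks 7 days 9 hours"))  # 1846800
```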

Configuring the Log

The final configuration step is deciding where the information will be logged. Our preference is to record this information as part of the standard access log. The %{Cookie}n log field contains the cookie information (hostname and cookie ID), which can be added to the standard common log format definition:

LogFormat "%h %l %u %t \"%r\" %>s %b %{Cookie}n" common

We recommend putting the cookie information at the start of each log line, rather than the end:

LogFormat "%{Cookie}n %h %l %u %t \"%r\" %>s %b" common

Alternatively, you can configure a customized log for cookie data by putting a CustomLog directive in the site definition:

CustomLog logs/clickstream "%{cookie}n \"%r\" %t"

Analyzing the Results

Once the user track information is configured, try using the site and viewing the log. The log below is a standard access log with the cookie detail added.

polarbear.mcslp.com.1106159183940914 polarbear.mcslp.com - - [19/Jan/2005:18:26:24 +0000] "GET /mcslp.css HTTP/1.1" 304 -
polarbear.mcslp.com.1106159183940914 polarbear.mcslp.com - - [19/Jan/2005:18:26:23 +0000] "GET / HTTP/1.1" 200 7970
polarbear.mcslp.com.1106159183940914 polarbear.mcslp.com - - [19/Jan/2005:18:26:24 +0000] "GET /weather/images/symbol_10.gif HTTP/1.1" 200 2019
polarbear.mcslp.com.1106159183940914 polarbear.mcslp.com - - [19/Jan/2005:18:26:24 +0000] "GET /weather/images/symbol_24.gif HTTP/1.1" 304 -
polarbear.mcslp.com.1106159183940914 polarbear.mcslp.com - - [19/Jan/2005:18:26:26 +0000] "GET /contacts/ HTTP/1.1" 200 14671

It's easy to pull the necessary information out of the log to identify which pages this individual has been viewing, because we now have a suitable tag (the unique cookie ID number) with which to start summarizing and reporting information. The basics of log reporting have been covered in previous articles. However, at its simplest, extract the cookie ID (as shown here), and then use it as another key for summarizing log data.
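As a rough illustration of using the cookie ID as a key, the Python sketch below groups requested paths by cookie ID. It assumes the log format with the cookie field first, as in the sample above; the regular expression and the function name requests_by_cookie are our own and would need adjusting for any other format.

```python
import re
from collections import defaultdict

# Cookie ID, host, ident, user, [timestamp], "request", status, size.
LINE_RE = re.compile(
    r'^(?P<cookie>\S+) (?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def requests_by_cookie(lines):
    """Map each cookie ID to the list of paths requested under it, in log order."""
    sessions = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        parts = match.group("request").split()
        # A request line is normally "METHOD path PROTOCOL".
        path = parts[1] if len(parts) > 1 else match.group("request")
        sessions[match.group("cookie")].append(path)
    return dict(sessions)


sample = [
    'polarbear.mcslp.com.1106159183940914 polarbear.mcslp.com - - '
    '[19/Jan/2005:18:26:23 +0000] "GET / HTTP/1.1" 200 7970',
    'polarbear.mcslp.com.1106159183940914 polarbear.mcslp.com - - '
    '[19/Jan/2005:18:26:26 +0000] "GET /contacts/ HTTP/1.1" 200 14671',
]

for cookie_id, paths in requests_by_cookie(sample).items():
    print(cookie_id, paths)
```

Requests logged before a cookie has been issued appear with "-" in the cookie field, so in practice you would probably want to filter that key out of any per-visitor summary.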

Dynamic Solutions

If you are developing a dynamic site (i.e., using PHP, Perl, or Java), then the mod_usertrack module will not be very useful. Much of the site's functionality may be embedded in the dynamic components, and the limited output mod_usertrack provides does not capture it, because the form elements and components are not stored in the logs.

Also, it's highly likely that your dynamic solution is already using some kind of user session tracking tool that can help provide the sought-after information. Reusing it avoids setting another cookie: the fewer cookies supplied to a client, the less likely users are to become wary of the information being stored about them, and the more likely they are to use the site.

How information is logged and tracked is entirely up to you, but for the most part it depends on the type of site created and how the information is monitored. For basic monitoring, we recommend a plain log made up of short sentences like these (with the session ID appended to each line):

User searched for books with 'Apache' in the title
User viewed the book 'Pro Apache'
User added the book 'Pro Apache' to their basket
User viewed basket

Although simplistic, this approach has the benefit of being very readable. Its drawback is that it does not permit easy summary processing. For example, when using this method it is difficult to see how many people searched for books with 'Apache' without doing some more extensive text processing. When that level of detail is needed, write the information to a database in a rigid format, like so:

Tracking More Data

SessionID  Operation   Parameter  ID
98374      Search      Apache
98374      View                   3475
98374      Buy                    3475
98374      ViewBasket

As the contents of this table illustrate, it would be possible to obtain the number of times the page for a particular book was viewed, the number of times that book was purchased, and how frequently various search terms were used. This is the kind of data that lets a site such as Amazon.com provide a user with 'recommended' titles.
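To show how such a table might be queried, here is a small Python sketch using the standard sqlite3 module with an in-memory database. The table name tracking, the column names, and the two helper functions are hypothetical choices made for this example; any database and schema with the same four fields would serve.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tracking ("
    "session_id INTEGER, operation TEXT, parameter TEXT, item_id INTEGER)"
)

# The rows from the table above; None marks an empty column.
rows = [
    (98374, "Search", "Apache", None),
    (98374, "View", None, 3475),
    (98374, "Buy", None, 3475),
    (98374, "ViewBasket", None, None),
]
conn.executemany("INSERT INTO tracking VALUES (?, ?, ?, ?)", rows)


def count_operation(conn, operation, item_id):
    """Count how often an operation (e.g. 'View' or 'Buy') hit a given item."""
    cur = conn.execute(
        "SELECT COUNT(*) FROM tracking WHERE operation = ? AND item_id = ?",
        (operation, item_id),
    )
    return cur.fetchone()[0]


def search_term_counts(conn):
    """Return (term, count) pairs for searches, most frequent first."""
    cur = conn.execute(
        "SELECT parameter, COUNT(*) FROM tracking "
        "WHERE operation = 'Search' GROUP BY parameter ORDER BY COUNT(*) DESC"
    )
    return cur.fetchall()


print(count_operation(conn, "View", 3475))
print(count_operation(conn, "Buy", 3475))
print(search_term_counts(conn))
```

Because each row carries the session ID, the same table can also be grouped by session to reconstruct what an individual visitor did.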

The Performance Trade Off

Regardless of whether you use mod_usertrack or a system embedded into your dynamically driven site, there will be some performance trade-offs. The cost comes at a number of levels: just the act of exchanging cookie data and looking up cookie information between browser requests takes time, not to mention the overhead of writing the information to a log file or database.

However you look at it, this will present an issue (if only a small one). Deciding whether to go ahead depends entirely on how much value there is in the information and how much of an effect on your site you are willing to accept in exchange for it.

Generally, for dynamic sites, the extra information gained — especially if that information is used to help guide the user to other products or items he may be interested in — is significant. Most dynamic sites are already reading and writing to databases or files, so one more will not hurt.
