Apache Session Management Within Dynamic Sites

Thursday Mar 2nd 2006 by Martin Brown

Apache, combined with Perl and PHP, is one option for managing sessions involving user identification on dynamic Web sites.

As Web sites get more complicated and more dynamic, developers want to give users a more cohesive environment. This cohesion can provide all sorts of functionality, from a simple method of tracking a shopping basket to providing full-blown customization of stories, templates, and information shown to users as they use the Web site. The key to this system is the session — a unique identifier that enables developers to identify users, either for relatively short periods (e.g., in shopping baskets) or longer (full customization).

In a previous article, "Session Tracking with Apache," we described how to use cookies and the sessions system within Apache to track user access for the purposes of monitoring site usage in the logs and recording which pages were viewed. We can adapt the same basic principles (primarily cookies) through programmable components, such as Perl and PHP, to provide customized Web sites.

This article will look at how Apache can help with session management and how that information can be used with Perl and PHP scripts.

Defining a Session

A session is typically defined as a single visit to a Web site where one might conduct one or more transactions, but that is not persistent. For example, a session might be used to track a user's progress through a store and record his shopping basket. At the end of the process, he buys the goods, and that is typically the end of the 'session'.

Two facts can be expanded from this. First, the session is typically unique to a particular visit. If the same user visited the store again, he would get a new session ID and therefore a new shopping basket. Second, because we cannot tell whether a user has actually stopped using our Web site, we must use a timeout value to identify when the user is no longer visiting the site. The value of the timeout is important in that it must be long enough that users can continue with their session, even if they are interrupted, but short enough that a new session is triggered when they visit later in the day or week.
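The timeout rule can be sketched as a comparison between the time of the last request and the current time. This is a minimal illustration only: the function name session_expired and the 30-minute limit are assumptions chosen for the example, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when the gap since the last request is
# longer than the allowed idle period (all values in seconds), else 0.
sub session_expired {
    my ($last_seen, $timeout, $now) = @_;
    $now = time unless defined $now;
    return ($now - $last_seen) > $timeout ? 1 : 0;
}

my $timeout = 30 * 60;    # assumed idle limit for a shopping visit

# A request 10 minutes after the last one continues the session;
# a request 2 hours later starts a new one.
print session_expired(1000, $timeout, 1000 + 600)  ? "new\n" : "continue\n";
print session_expired(1000, $timeout, 1000 + 7200) ? "new\n" : "continue\n";
```

A longer-lived, identification-style session would use the same check with a timeout measured in weeks or months.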

However, we can also use the principles of a session to enable a user to visit a site and get a customized or personalized environment without requiring users to log in or set their preferences again. In this case, the persistence of the session becomes a way of identifying the user, and the timeout value for the session may be a longer period (such as weeks or months), to enable the user to re-visit the site without having to log in again and re-create her session.

Whether you use the principles of the session to track single visits (often suitable for simple e-commerce sites) or longer-term identification of users (ideal on larger well-used e-commerce and community sites that are visited regularly), the fundamentals of a session system are actually quite straightforward.

Session Fundamentals

There are two elements to a session. The first is the requirement that the browser supply a unique (but consistent) session identity when it accesses objects from the Web server. It must be unique to identify the user or his session, and consistent so that individual object requests use the same ID. The second is that the session is used to store information about the user (e.g., her shopping basket or site preferences). The former requires interaction from the browser (which we can control). The latter requires programming and storage on the Web server to control the content returned to the browser.

For communicating a unique session ID, two mechanisms are available: cookies and URL rewriting. This article will concentrate on the former, as it is the more practical of the two, but URL rewriting has advantages of its own. PHP includes built-in support for cookies and for retaining information across object accesses. Within Perl and Python, standard and third-party modules make using sessions and retaining session data much easier.

Using Cookies

Cookies write the session ID information into the user's browser cookie database; this cookie is supplied automatically to the site when the user accesses a page. Cookies are a practical way of sharing information between page views without requiring complicated scripting to embed session information into the URL. The browser automatically supplies cookies, regardless of whether you are using dynamic or static components. As a result, they can be used with combination sites without having to worry about how to exchange the information.

A cookie is generally used to store a single piece of information, for example the session ID or a shopping basket item. Each cookie is specific to a site, optionally to a path within the site, and each has a specific name. You can therefore create multiple cookies to store multiple pieces of information.

Some users are wary of enabling cookies, mostly because of some early bad press and problems with the implementations that allowed cookie information to be read by different sites. Today, cookie security is much tighter.

Cookies are secure because of a number of parameters that should be defined when creating the cookie. Three parameters control the cookie's duration, the site or domain on which it is valid, and the path where it is valid. Together, these settings determine how and where the information in a cookie is made available.

Cookie Lifetimes

Each cookie is given a lifetime; when the period expires, the browser deletes the cookie. The lifetime can be set as a specific date and time, as a duration (a number of seconds from the time the cookie is set), or left unset so that the cookie exists only during the current browser session (the cookie is deleted once the browser exits).
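The three kinds of lifetime map onto the Set-Cookie response header roughly as follows. This is a hand-rolled sketch using only core Perl: the helper names http_date and set_cookie and the hash keys at and seconds are assumptions made for illustration. Expires and Max-Age are the standard cookie attributes for an absolute time and a duration.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Format an epoch time as an HTTP date string in GMT.
sub http_date {
    my ($epoch) = @_;
    return strftime('%a, %d %b %Y %H:%M:%S GMT', gmtime($epoch));
}

# Hypothetical helper that builds a Set-Cookie header value.
# $lifetime is a hash ref: { at => epoch }, { seconds => n }, or empty.
sub set_cookie {
    my ($name, $value, $lifetime) = @_;
    $lifetime ||= {};
    my $cookie = "$name=$value";
    if (defined $lifetime->{at}) {
        # Specific date and time
        $cookie .= '; Expires=' . http_date($lifetime->{at});
    }
    elsif (defined $lifetime->{seconds}) {
        # Duration, counted from when the cookie is set
        $cookie .= '; Max-Age=' . int($lifetime->{seconds});
    }
    # With neither attribute the cookie lasts until the browser exits.
    return $cookie;
}

print set_cookie('sessionID', '262177', { at => 1141300800 }), "\n";
print set_cookie('sessionID', '262177', { seconds => 3600 }), "\n";
print set_cookie('sessionID', '262177', {}), "\n";
```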

As described earlier, getting the timeout value correct is critical. Make it too brief, and the cookie will be removed before the user has finished her business. When the cookie is used for a shopping basket or other 'visit'-based purpose, setting too large a value can mean that the information is retained for too long and starts to interfere with later visits.

Of course, you can specifically delete an existing cookie by re-supplying the cookie with a very short lifetime or with an expiry time that has already passed.
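A deletion header can be sketched like this. The helper name delete_cookie is hypothetical. The one detail worth noting is that the path (and domain, if one was given) must match the values used when the cookie was created, because browsers treat a cookie with a different path as a different cookie.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a Set-Cookie value that asks the browser to
# discard the named cookie by giving it an expiry time in the past.
sub delete_cookie {
    my ($name, $path) = @_;
    return "$name=; Path=$path; Expires=Thu, 01 Jan 1970 00:00:00 GMT; Max-Age=0";
}

print 'Set-Cookie: ', delete_cookie('sessionID', '/cgi-bin/'), "\n";
```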

Domain or Site Limitations

The browser will send a cookie only to a Web site that matches the URL of the requested object. When you create a cookie, you must specify the hostname or domain in which the cookie is valid. If you do not specify it, the browser will set the value for you. That information is then used to validate when the cookie is used.

For example, if a cookie is configured to be valid within the mcslp.com domain, then only Web servers within the mcslp.com domain will be supplied with the cookie. This simple mechanism prevents another Web server from accessing the data in your cookie.

To set a domain, prefix the domain with a period — for example, for a cookie to be supplied to any Web server within the mcslp.com domain, you would specify the domain as .mcslp.com. To set it only for a specific Web server you would use the full name, for example www.mcslp.com. Using a specific name in this example would prevent cookie data from being shared between www.mcslp.com and maps.mcslp.com. Browsers will not allow a cookie to be set within a top-level domain — you cannot set a cookie for .com, for example.
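The domain rule can be approximated in a few lines. This is a simplified sketch of the matching described above, not the full logic a browser applies (it does not, for instance, know which names are top-level domains). The function name domain_matches is an assumption.

```perl
use strict;
use warnings;

# Simplified check: a leading period means "this domain and any host
# beneath it"; otherwise the host name must match exactly.
sub domain_matches {
    my ($host, $cookie_domain) = @_;
    ($host, $cookie_domain) = (lc $host, lc $cookie_domain);
    if ($cookie_domain =~ /^\.(.+)$/) {
        my $base = $1;
        return ($host eq $base || $host =~ /\.\Q$base\E$/) ? 1 : 0;
    }
    return $host eq $cookie_domain ? 1 : 0;
}

# .mcslp.com is shared across hosts; www.mcslp.com is not.
for my $domain ('.mcslp.com', 'www.mcslp.com') {
    for my $host ('www.mcslp.com', 'maps.mcslp.com') {
        printf "%-14s -> %-14s %s\n", $domain, $host,
            domain_matches($host, $domain) ? 'sent' : 'not sent';
    }
}
```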

Path Limitations

A path specification further limits when a browser sends a cookie to a Web site. Most cookies set a path of '/', which means the cookie will be supplied to all areas of a Web site, but you can be more restrictive about the path where the cookie is valid. For example, using '/cgi-bin/' as a cookie path specification limits the browser to supplying the cookie only to objects within this directory. When accessing a file from '/images', the browser would not supply the cookie to the Web server.
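Path matching is a prefix test on the requested path. The sketch below is a simplified version of what a browser does. The function name path_matches is hypothetical.

```perl
use strict;
use warnings;

# Returns 1 if a cookie with the given path would be sent for the request.
sub path_matches {
    my ($request_path, $cookie_path) = @_;
    return 1 if $request_path eq $cookie_path;
    # Compare against a prefix ending in '/' so that a cookie path of
    # '/cgi' does not accidentally match '/cgi-bin/'.
    my $prefix = $cookie_path =~ m{/$} ? $cookie_path : "$cookie_path/";
    return index($request_path, $prefix) == 0 ? 1 : 0;
}

print path_matches('/cgi-bin/basket.cgi', '/cgi-bin/') ? "sent\n" : "not sent\n";
print path_matches('/images/logo.png',    '/cgi-bin/') ? "sent\n" : "not sent\n";
print path_matches('/images/logo.png',    '/')         ? "sent\n" : "not sent\n";
```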

Cookies are Web server driven, but client browser bound. That is, the only way to create a cookie is for the Web server to supply the cookie specification and contents, and the only way for the Web server to receive the cookie is for the browser to send it. The server does not request the cookie; it is automatically sent with each request.

Modifying the URL

The URL rewriting method does not use cookies. Instead, you provide the session ID as part of the URL request; for example, the page http://www.mcslp.com/index.cgi might be re-written as http://www.mcslp.com/index.cgi?sess_id=297472.
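As a rough sketch, rewriting a link and recovering the ID on the next request might look like this. The helper names are hypothetical, the parameter name sess_id follows the example URL above, and the code ignores complications such as URL fragments or IDs that need escaping.

```perl
use strict;
use warnings;

# Append the session ID to a URL, allowing for an existing query string.
sub add_session_id {
    my ($url, $id) = @_;
    my $sep = $url =~ /\?/ ? '&' : '?';
    return $url . $sep . "sess_id=$id";
}

# Pull the session ID back out of a query string; undef if it is absent.
sub get_session_id {
    my ($query_string) = @_;
    return undef unless defined $query_string;
    return $query_string =~ /(?:^|[&;])sess_id=([A-Za-z0-9]+)/ ? $1 : undef;
}

print add_session_id('http://www.mcslp.com/index.cgi', '297472'), "\n";
print add_session_id('http://www.mcslp.com/index.cgi?page=2', '297472'), "\n";
print get_session_id('page=2&sess_id=297472'), "\n";
```

Every link the site emits has to pass through something like add_session_id, which is why the approach only suits fully dynamic sites.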

The URL method is more complex, and it relies on having a completely dynamically generated Web site because we must always supply, and process, the session information for each page or object accessed. This complicates development but obviously has the benefit of not requiring users to set cookies on their browsers.

The URL method is comparatively less secure. Because the session ID is exchanged in plain text as part of the URL request, the information can be seen. Viewing a user's basket can then be a case of modifying the URL with the alternative session ID. Without further checking or verification, anybody could use anybody else's session. Sessions used in this fashion will also require manual management to ensure the session expires.

There is no easy way to resolve the security issue. You can embed verification information (such as the host IP address) into the session data held on the server, which is then verified on each request, but since an IP address is not unique to a machine it offers little real protection. You can also require secondary verification (perhaps requesting a password) when viewing sensitive information. There are, however, obvious limitations.

Creating a Unique ID

A critical step in making use of sessions is to create a session ID unique enough to singularly identify the session. There are many ways of doing this. In a dynamic environment, the most obvious is to use the unique sequence ID generated by a database (which is particularly useful if you are storing the session in the database anyway). Other obvious sources are less reliable when used on their own:

  • Current Time — even if you use an accuracy level of seconds or hundredths of a second, it is still possible for two users to connect at the same time and create the same ID.
  • Host IP Address or Name — IP addresses and names are sometimes shared, especially if the user is connecting through an ISP or firewall.
  • Random Numbers — a single random number is not as random as you would expect. Just as tossing a coin three times will produce at least two results that are the same, even large random numbers can occur multiple times on a server, especially a busy one.

There are many solutions to these problems, and most of them rely on combining information from each of the sources above to produce a unique string. For example, the following code (in Perl):

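# Break the current local time into its components
my ($sec, $min, $hour, $mday, $month, $year) = localtime();
my $uniqueid = sprintf("%02d%04d%02d-%02d%02d%04d-%d%d%d",
                $sec,rand(9999),$hour,$month,$min,rand(9999),rand(9999),$mday,$year);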

relies on a combination of three random numbers and date/time components. Thus, the chance of duplicating the string is significantly reduced, although it is still theoretically possible to generate the same ID based on this process.

For more extensive solutions, we can use generated hashes (based on the same principles, but using a wider range of source data) or even longer, hexadecimal, or alphabetical IDs.
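A hash-based ID might be generated along these lines. This is a minimal sketch: the function name new_session_id and the choice of inputs are assumptions, and Digest::SHA and Time::HiRes are core Perl modules. Because rand() is not a cryptographic generator, treat the result as hard to collide rather than impossible to guess.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);
use Time::HiRes qw(gettimeofday);

my $counter = 0;    # separates IDs created within the same microsecond

# Hypothetical generator: hashes a wider range of source data into a
# fixed-length, 64-character hexadecimal ID.
sub new_session_id {
    my ($sec, $usec) = gettimeofday();
    my $remote = $ENV{REMOTE_ADDR} || 'unknown';
    return sha256_hex(join ':', $sec, $usec, $$, $counter++, $remote, rand());
}

print new_session_id(), "\n";
print new_session_id(), "\n";
```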

Session Control Within Perl

With Perl, we have limitless choices about how to implement a session system, but there are some standard modules you can use that make the process significantly easier. The primary one is the CGI module, which provides a simple interface for reading and writing cookie data as part of the CGI processing. You may already be using CGI in your scripts, so adapting them to include cookies is quite easy.

For storing information that relates to a particular session ID we can use the Apache::Session module that interfaces either to a database or file to read and store information against our session ID.

Using Cookies

Using cookies within Perl is simplified by using the CGI module because it provides a function to create a cookie and write the cookie data out as part of the HTTP header, something you would do with the CGI module anyway.

To create a cookie with the CGI module, use the cookie() function to create the cookie string. The format of the function is as follows:

 $cookie = $query->cookie(-name=>'sessionID',
                          -value=>'262177',
                          -expires=>'+1h',
                          -path=>'/cgi-bin/',
                          -domain=>'.mcslp.com');

The parameters should be self-explanatory; set the name of the cookie, its value and the expiry, domain, and path information where the cookie will be valid. The resulting value is actually a string, and it must be supplied to the browser as part of the HTTP headers:

 print $query->header(-cookie=>$cookie);
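On later requests the browser returns the cookie in the Cookie request header, which a CGI script sees as the HTTP_COOKIE environment variable. The CGI module reads this for you when cookie() is called with just a name. The sketch below shows roughly what that involves using core Perl only. The function name parse_cookies is hypothetical, and it skips details such as decoding escaped values.

```perl
use strict;
use warnings;

# Parse a Cookie header such as "sessionID=262177; theme=dark"
# into a hash of name => value pairs.
sub parse_cookies {
    my ($header) = @_;
    my %cookies;
    for my $pair (split /;\s*/, defined $header ? $header : '') {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name && defined $value;
        $cookies{$name} = $value;
    }
    return %cookies;
}

my %cookies    = parse_cookies($ENV{HTTP_COOKIE});
my $session_id = $cookies{sessionID};    # undef on a first visit
print defined $session_id ? "Session $session_id\n" : "No session yet\n";
```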

Now we have the session ID. Next up, a way of associating data to the ID to support the dynamic elements of a Web site.

Using Apache::Session

Although cookies can be used to store all sorts of information, it is generally not a good idea to use them to store all the data. In particular, shopping basket information and user identification data (e.g., address, e-mail, and especially password and credit card information) should never be stored within a cookie.

This is because cookie data exchanged in HTTP headers can be snooped. Most browsers also include cookie viewing systems, which may further expose data to prying eyes. Exposing this information when there is no need to do so is obviously a bad idea.

Instead, you should store the session ID in a cookie and then associate the session ID with a store of information about the user that you want to keep. This information can be stored in a file or a database. You should always be able to find the user information because it has been identified with the session ID.

Storing the information can be complex, but another module can simplify the procedure: Apache::Session. With Apache::Session a hash is 'tied' to a set of data associated with a specific session ID. The actual storage behind the scenes can be a file or a database, with Apache::Session handling the reading and writing of the information for you.

At its simplest, Apache::Session works like this code fragment:

tie(%session_data, 'Apache::Session::File', 
    $sessionid, {Directory => '/tmp/'});
$session_data{basketitem} = 'Computer';
$session_data{CC_data} = '1234 5678 9012 3456';

The Apache::Session module creates a new session ID if the variable holding the session ID is undefined. If the session ID is defined, the module assumes it is a previously created session ID.

Data stored this way is persistent across connections, provided the same session ID is supplied by the browser each time. The data associated with an ID is recorded in a file on the server. The example in the next section combines the modules to provide session IDs and storage.
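To make the idea of a file-backed session store concrete, here is a hand-rolled sketch using the core Storable module. It is not how Apache::Session is implemented, and the helper names session_file, load_session, and save_session are hypothetical. It only illustrates the principle of one file of data per session ID.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Spec;
use File::Temp qw(tempdir);

# Map a session ID to a file name, refusing IDs that could escape the directory.
sub session_file {
    my ($dir, $id) = @_;
    die "bad session ID\n" unless defined $id && $id =~ /^[A-Za-z0-9]+$/;
    return File::Spec->catfile($dir, "sess_$id");
}

# Return the stored hash for this session, or an empty one for a new session.
sub load_session {
    my ($dir, $id) = @_;
    my $file = session_file($dir, $id);
    return -e $file ? retrieve($file) : {};
}

# Write the session data back to its file.
sub save_session {
    my ($dir, $id, $data) = @_;
    store($data, session_file($dir, $id));
    return;
}

my $dir  = tempdir(CLEANUP => 1);    # stand-in for a real session directory
my $data = load_session($dir, '262177');
$data->{basketitem} = 'Computer';
save_session($dir, '262177', $data);
print load_session($dir, '262177')->{basketitem}, "\n";
```

The tied-hash interface hides exactly these load and save steps behind ordinary hash reads and writes.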

Combining Apache::Session and CGI

Below is a small CGI application that determines whether a cookie has already been defined. The CGI script should work in three stages:

  1. If the cookie is not defined, a new session ID is created, a new cookie for the session is sent to the browser, and we announce what we are doing.
  2. If the session ID exists, but there is no message written in the session data, we save a message.
  3. If the session ID exists and a message exists in our session data, then we print out the message.

#!/usr/bin/perl
use CGI;
use Apache::Session::File;
my %session_data;
my $query = new CGI;
my $session = $query->cookie('SESSIONID') || '';
# Accept the cookie only if it consists entirely of letters and digits
if ($session =~ m/^[a-zA-Z0-9]+$/)
{
    print($query->header(),
          $query->start_html('A Cookie Example'),
          $query->h1('A Cookie Example'));
    print $query->p("You have a cookie set ($session)");
    eval {
        tie(%session_data,
            'Apache::Session::File',
            $session,
            { Directory => '/tmp/sessions'});
    };
    if ($@)
    {
        die "Couldn't tie: $@";
    }
    if (exists($session_data{message}))
    {
        print($query->p('Have a message for you:'),
              $query->p($session_data{message}));
    }
    else
    {
        print "Recording a message for you";
        $session_data{message} = 'Aint nobody here but us chickens';
    }
    # Untie the hash we actually tied so the data is written back
    untie %session_data;
    print $query->end_html();
}
else
{
    my $session = undef;
    eval {
        tie(%session_data,
            'Apache::Session::File',
            $session,
            { Directory => '/tmp/sessions'});
    };
    if ($@)
    {
        die "Couldn't tie: $@";
    }
    my $cookie = $query->cookie(-name=>'SESSIONID',
                                -value=> $session_data{_session_id},
                                -expires=>'+24h',
                                -path=>'/',
                                );
    print($query->header(-cookie=>$cookie),
          $query->start_html('A Cookie Example'),
          $query->h1('A Cookie Example'),
          $query->p('Setting a new cookie'),
          $query->end_html());
    # Untie so the new session is written to the session directory
    untie %session_data;
}

You can see here that the structure is comparatively straightforward — we supply the cookie and read the cookie data if it exists. Adding data to our session is just a case of assigning the information we want to store to a key within the hash. You should easily be able to adapt the above script for your own applications.

Summary

This article covered the fundamentals of the session process: the definition of a session, the semantics and theory of how to use cookies to create and manage a session, and a detailed look at a Perl script that handles both the session and session data. It is the combination of using cookies to create a session and using session data to hold information, such as purchases or preferences, that provides the personalization in many Web sites, including stores like Amazon or customized environments like My Yahoo.
