Looking at Apache 2.0 Alpha 4

Friday Jun 30th 2000 by Ryan Bloom

Development continues to roll along on Apache 2.0. In his latest column, Ryan Bloom details what's new in the recently released Apache 2.0 Alpha 4.

Since my last article, the Apache Software Foundation has released the fourth alpha version of Apache 2.0. In this article, I will review some of the features new to the 2.0 series and explain why they were added and how they will help site administrators.

Piped and Reliable Piped Logs
Piped logs are a feature that Apache has had for some time, but they have just been added back into 2.0. Because they are a useful feature and are brand new to the 2.0 series, I will discuss them here.

Logs are very important to every Apache installation. They tell the administrator who is accessing the site and whether something has gone wrong with the server. One obvious use for logs is to determine whether somebody has tried to break into the server. Logs are not something to be taken lightly; however, they also have some drawbacks. The first problem with logs is that they can grow very large. Every time a person accesses a page on a site, a message is written to a log. A basic Apache installation does not do anything with logs other than write to them, which means the logs are going to get very large unless something is done about them. Piped and reliable piped logs provide a way to handle this problem.

The second issue with logs is that they can be slow. If an Apache configuration is set up to log the hostname of every machine that requests a page from a site, logging is likely to be very slow. This is because Apache, like all network programs, uses IP addresses instead of hostnames for all network communication. Apache relies on the local machine's hostname resolver to convert IP addresses into hostnames. This can be a slow process because each conversion may require a query to the Domain Name System. The whole time that a thread or process is trying to convert an IP address to a hostname, that thread or process is not doing its primary job, serving web pages. On a heavily loaded site, this can become a very large performance bottleneck. Piped and reliable piped logs can also keep a server from being affected by this problem.
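To make the cost concrete, here is a minimal Python sketch of the lookup work that can be moved out of the request path. It is not code from Apache or logresolve. The names `make_resolver` and `resolve_log_line` are hypothetical, and the sketch assumes the client address is the first space-delimited field of each log line. Only `socket.gethostbyaddr` is a real standard-library call.

```python
import socket


def make_resolver(lookup=socket.gethostbyaddr):
    """Return a function that maps an IP address to a hostname, with a cache.

    `lookup` is injectable so the sketch can be exercised without a network.
    """
    cache = {}

    def resolve(ip_address):
        if ip_address not in cache:
            try:
                # gethostbyaddr returns (hostname, aliases, addresses).
                cache[ip_address] = lookup(ip_address)[0]
            except OSError:
                # Lookup failed: keep the raw address rather than lose the line.
                cache[ip_address] = ip_address
        return cache[ip_address]

    return resolve


def resolve_log_line(line, resolve):
    """Replace the leading address of a log line with its hostname."""
    address, separator, rest = line.partition(" ")
    return resolve(address) + separator + rest
```

The cache matters because a busy site sees the same addresses repeatedly, so each address pays for the slow lookup at most once.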

Now that two real-world issues that piped logs can solve have been identified, we can talk about what they are and how they work. Reliable piped logs and piped logs move the responsibility of writing the log to the file away from the Apache server to some other external process. When Apache starts, if the configuration file specifies that the logs are to be piped, Apache creates a new process and sets up a pipe between that process and the Apache parent process. When the child processes are created, they inherit that pipe and use it to send log messages to the logging process. This happens for each piped log, which means if piped logs are specified for the error, transfer, and access logs, the server will create three separate processes, one for each log. Apache relies on each log message being small enough to be written to the pipe in a single atomic write, so messages from different child processes are not interleaved and the logging process does not need to synchronize its reads. This allows a logging process to read one line from its standard input (the pipe), perform some operation on that string, and write it out to the log file. The log process then reads the next message from the pipe.
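The read, process, write loop described above can be sketched in a few lines of Python. This is an illustration under assumptions, not a program shipped with Apache: the function name `process_log_stream`, the `transform` hook, and the convention of passing the log file path as the first command-line argument are all hypothetical.

```python
import sys


def process_log_stream(source, sink, transform=lambda line: line):
    """Read one log line at a time, transform it, and append it to sink.

    Returns the number of lines handled. The loop ends when the pipe is
    closed, which happens when the server shuts down.
    """
    count = 0
    for line in source:
        sink.write(transform(line))
        sink.flush()  # keep the file current in case this process dies
        count += 1
    return count


# Hypothetical invocation: the log file path is the first argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "a") as log_file:
        process_log_stream(sys.stdin, log_file)
```

Because the program only ever reads whole lines from its standard input, it needs no locking of its own, which is why such programs can stay very small.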

How does this help with the two problems mentioned above? It allows people to write small programs that solve these problems easily and efficiently. In every Apache distribution there is a small program called rotatelogs. This program reads log messages from the pipe for a specified amount of time and then closes the real log file and renames it. Afterwards, it opens a new log file and begins the process over again. This keeps logs from getting too large, and allows the administrator to easily archive all logs in one convenient place. There is another program called logresolve that performs the conversion from IP addresses to hostnames.
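Time-based rotation can be sketched as follows. This is not the rotatelogs source. As a simplifying assumption, the sketch puts the start time of each interval in the file name instead of renaming the file afterwards, and the names `rotated_name` and `RotatingLog` are hypothetical.

```python
def rotated_name(base_path, timestamp, interval_seconds):
    """Name the file that covers the interval containing `timestamp`."""
    start = int(timestamp) - int(timestamp) % interval_seconds
    return f"{base_path}.{start}"


class RotatingLog:
    """Append lines to one file per time interval."""

    def __init__(self, base_path, interval_seconds):
        self.base_path = base_path
        self.interval_seconds = interval_seconds
        self.current_name = None
        self.handle = None

    def write(self, line, timestamp):
        name = rotated_name(self.base_path, timestamp, self.interval_seconds)
        if name != self.current_name:
            # The interval changed: close the old file and start a new one.
            self.close()
            self.handle = open(name, "a")
            self.current_name = name
        self.handle.write(line)
        self.handle.flush()

    def close(self):
        if self.handle is not None:
            self.handle.close()
        self.handle = None
        self.current_name = None
```

In a real logging program the timestamp would come from the system clock at the moment each line is read. Passing it in explicitly keeps the sketch easy to exercise.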

Now we know what piped logs, both reliable and not, can do for sites running Apache. But what is the difference between the two? Reliable piped logs try to ensure that the log process is always running. It is unusual for a piped log process to die unexpectedly because it is usually a very small program that performs only one function. However, if a log process does die, an Apache installation that takes advantage of reliable piped logs will restart it. Unfortunately, reliable piped logs are not available on all platforms supported by Apache. If Apache does support reliable piped logs on a platform, it will be compiled in by default. To determine if a platform is supported, run Apache with the command line argument -V. This will output all of the options that have been compiled into Apache. Search for the line "-D HAVE_RELIABLE_PIPED_LOG".

The final question is how to configure Apache to use either piped or reliable piped logs. In the configuration file, find the log that should be piped through an external program, then replace the name of the log file with a command such as:


      "| log_program program_arguments"

The "|" tells Apache this will be a piped log and the commands that follow tell Apache what program to use and how to start it. There is one security problem with piped logs that administrators should be aware of: the log program will be run as the user that started the web server. For most servers this is the root user. For this reason, great care must be taken when writing a logging program to ensure that there are no buffer overflows or other weaknesses that can be exploited.

A New Way to Run CGI Scripts & Programs
CGI programs allow sites to run external programs to produce the results for a request. CGI programs are used on most sites on the web and are a very common way to produce dynamic data. Apache has always provided support for CGI programs through the mod_cgi module. When a CGI request is received, the child process that accepts that request creates a new process and runs the CGI program. The data output from the CGI program is then sent back to the client as the response to the original request. This works fine with Apache 1.3, but it has serious performance implications with Apache 2.0.

On some Unix systems, when a threaded process forks to create a child process, all of the threads are created in the child process and then all but one are killed. This is obviously not very good for performance. When Apache 2.0 is configured to use threaded child processes, this problem is encountered as soon as CGI programs are run. To solve this performance problem, Apache 2.0 provides two CGI modules. The first is mod_cgi, which should be used either on non-Unix platforms or on Unix with the prefork MPM [1]. The second is mod_cgid, which should be used for all threaded MPMs on Unix.

Mod_cgid avoids the performance problem by creating a new CGI daemon process. Before the parent Apache process starts any of the child processes, the mod_cgid module creates a new process, which will become the CGI daemon. This process creates a Unix domain socket to communicate with the Apache child processes. When a child process gets a CGI request, it will send the request to the CGI daemon. The CGI daemon will then create a new process to run the CGI program. This process will be set up to communicate directly with the child process that originally received the request. As the CGI process outputs the response, it will be sent to the child process, which forwards it along to the client. Because the CGI daemon process is a single-threaded process, Apache can avoid the performance problems that the original mod_cgi causes.
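The flow above can be sketched in Python. This is a simplified illustration, not the mod_cgid implementation: the wire format (one JSON-encoded argument list per line) is an assumption, the function names `cgi_daemon_serve_one` and `request_cgi` are hypothetical, and the sketch serves a single request. It does keep the key idea that the program's output goes straight to the socket held by the requesting process.

```python
import json
import socket
import subprocess


def read_request_line(conn):
    """Read bytes up to and including the first newline."""
    data = b""
    while not data.endswith(b"\n"):
        chunk = conn.recv(1)
        if not chunk:
            break
        data += chunk
    return data


def cgi_daemon_serve_one(listener):
    """Accept one request on a Unix domain socket and run the named program.

    The program's stdout is the connection itself, so its output reaches
    the requester directly and is not copied by the daemon.
    """
    conn, _ = listener.accept()
    with conn:
        argv = json.loads(read_request_line(conn))
        process = subprocess.Popen(argv, stdout=conn.fileno())
        process.wait()


def request_cgi(socket_path, argv):
    """Play the role of an Apache child: send a request, collect the output."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(socket_path)
        client.sendall(json.dumps(argv).encode() + b"\n")
        chunks = []
        while True:
            chunk = client.recv(4096)
            if not chunk:  # daemon and program have both closed the socket
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

The daemon stays single-threaded, so the fork that starts each program never has to duplicate a set of threads.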

There are issues with using mod_cgid that are not present with mod_cgi. It is possible for the CGI daemon process to die unexpectedly, although it is unlikely because the daemon is a very small process that does very little. On platforms that support reliable piped logs, Apache uses the same technology to restart the CGI daemon if anything happens to it. However, on other platforms it is not possible to restart the CGI daemon process from within Apache.

Bug Fixes
Of course, any alpha release of Apache 2.0 is going to include many bug fixes. I am not going to take much time discussing any of these in great detail, but I do want to run through some of the more important bug fixes.

Better error reporting: In previous alphas if Apache failed, very often it wasn't clear what had caused the problem. This problem has been solved and the error reporting is in much better shape now. If Apache fails for some reason, errors reported in the log should be meaningful.

CGI error reporting: If a CGI reports errors to stderr, those errors will now be written to the error_log. This is a necessity for debugging CGI programs.

Portable build environment: One of Apache's best features is that it works on almost every platform. This has not been true for Apache 2.0 until now. The build system was very finicky about which platforms it worked on. This has been fixed with the fourth alpha. (If you have waited to try 2.0 because it didn't support your platform I suggest trying it again.)

Config.nice is created: Apache 1.3 used the APACI configuration scheme to generate the build environment for various platforms. One of the best features of APACI is that it created config.status, which had the exact command used to configure the server. Apache 2.0 has switched to autoconf. As a result, config.status was missing in earlier alphas. With the latest alpha, Apache 2.0 generates config.nice, which replaces config.status.

Apache works on OS/390: While this isn't of interest to most people, it does prove that Apache is an incredibly portable program. Imagine being able to run the same program on a Windows 95 machine and on an OS/390 mainframe!

How To Help
With the release of this fourth alpha, Apache 2.0 is closer than ever to a beta release. What can you do to help? Even if you are not a programmer, there are things that you can do to help improve Apache 2.0. Apache has a group of developers who are very committed to this release, but we can't possibly test everything in the server. Download the latest alpha and try it out. If you find a bug in the program, please let the developers know. There have been issues with reporting Apache 2.0 bugs recently, but those should all be handled now. If you have a bug to report, please visit the Apache web site (http://www.apache.org/httpd) and read the page about submitting bug reports. All of the instructions for submission are there. The 2.0 developers take bugs very seriously when they are reported. If a bug is reported, it is likely that the developers can fix it, but we must know about it first. We work very hard to ensure that known bugs do not last long. If you think you have nothing to offer the Apache developers, think again. Your experience is vitally important to the success of Apache.

[1] For a discussion of the different MPMs, please see my previous article, "An Introduction to Apache 2.0."
