An Introduction to Apache 2.0

Sunday May 28th 2000 by Ryan Bloom

Apache 2.0 has already been through three alpha releases. In this article, Ryan Bloom of the Apache Group previews Apache 2.0 and explains why it will make life easier for every Webmaster on the Internet.

Most people who follow Apache development with any regularity know that the Apache Group has recently been focusing on Apache 2.0. There have been many changes to the Apache code since Version 1.3. Some of these changes make an administrator's job easier and some make it harder; however, the changes are all designed to make Apache the most flexible and portable Web server available. This column will try to explain some of the new concepts that Apache 2.0 introduces and how it differs from 1.3.

Multi-Processing Modules
The first major change in Apache 2.0 is the introduction of Multi-Processing Modules (MPMs). To really understand the need for MPMs, it is important to look at how Apache 1.3 works. Apache 1.3 is a pre-forking server: when Apache is started, the original process forks a specified number of copies of itself, and those copies handle the requests. As more requests come in, more copies are forked. The original process does nothing other than monitor the new processes to make sure there are enough of them. This model works well on Unix variants and most mainframes, but it doesn't work as well on Windows, which has no fork() call. The original Windows port rewrote the section of code that created the child processes: on Windows, that section created just one child process, which used multiple threads to serve the requests. This split between Unix and Windows was implemented with #ifdefs, making the code very hard to maintain.
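The parent's monitoring job can be pictured as a small decision made once per maintenance cycle. The sketch below is a simplified Python model, not Apache's C code: its parameters borrow the meaning of the 1.3 directives MinSpareServers, MaxSpareServers, and MaxClients, while the function name is hypothetical, and it ignores the rate limiting the real parent applies when forking and retiring children.

```python
def prefork_maintenance(idle: int, total: int, min_spare: int,
                        max_spare: int, max_clients: int) -> int:
    """Return how many children to fork (positive) or retire (negative).

    Hypothetical, simplified model of a pre-forking parent process;
    parameters echo MinSpareServers, MaxSpareServers, and MaxClients.
    """
    if idle < min_spare:
        # Too few idle children: fork more, but never exceed max_clients.
        return max(0, min(min_spare - idle, max_clients - total))
    if idle > max_spare:
        # Too many idle children: retire the surplus.
        return -(idle - max_spare)
    return 0  # Within bounds: the parent just keeps watching.


# A burst of requests leaves only 2 of 10 children idle.
print(prefork_maintenance(2, 10, 5, 10, 150))   # 3
# Traffic drops and 14 of 20 children sit idle.
print(prefork_maintenance(14, 20, 5, 10, 150))  # -4
```

Keeping the decision separate from the actual fork() calls is what let the original process stay so simple: it never serves a request itself.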

When work started on Apache 2.0, the Apache Group had many goals. One was to support every platform supported by 1.3; another was to add new platforms. As work began, the developers realized that these goals were impossible to meet if all of the code had to be shared between platforms. An abstraction layer was necessary if the project was going to be manageable, and from this realization MPMs were born. The basic job of an MPM is to start the server processes and map incoming requests onto an execution primitive. Whether that execution primitive is a thread or a process is left up to the MPM developer, who should choose whichever primitive is best supported by the platform the MPM is designed for.
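The division of labor between the server core and an MPM can be sketched as an interface. In this Python model the names MPM, serve, ThreadedMPM, SerialMPM, and run_server are all hypothetical and are not Apache's API; a process-based MPM would implement the same serve method with child processes instead of threads. The point is that run_server, standing in for the core, never learns which primitive is in use.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

Handler = Callable[[str], str]


class MPM(ABC):
    """Hypothetical interface: the MPM decides what executes each request."""

    @abstractmethod
    def serve(self, handler: Handler, requests: Iterable[str]) -> List[str]:
        ...


class ThreadedMPM(MPM):
    """Maps each request onto a thread from a fixed-size pool."""

    def __init__(self, threads: int) -> None:
        self.threads = threads

    def serve(self, handler: Handler, requests: Iterable[str]) -> List[str]:
        with ThreadPoolExecutor(max_workers=self.threads) as pool:
            # pool.map returns results in the order the requests arrived.
            return list(pool.map(handler, requests))


class SerialMPM(MPM):
    """Degenerate MPM that runs every request on the calling thread."""

    def serve(self, handler: Handler, requests: Iterable[str]) -> List[str]:
        return [handler(r) for r in requests]


def run_server(mpm: MPM, requests: Iterable[str]) -> List[str]:
    # The core only talks to the MPM interface, never to threads or processes.
    return mpm.serve(lambda uri: f"200 OK {uri}", requests)


print(run_server(ThreadedMPM(threads=4), ["/", "/index.html"]))
print(run_server(SerialMPM(), ["/", "/index.html"]))
# both print ['200 OK /', '200 OK /index.html']
```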

This abstraction had an unexpected benefit as well: it allows Apache to be configured for a specific site depending on the Webmaster's requirements. When work began on the UNIX MPMs, the developers couldn't agree on the correct design. Three MPMs have been developed so far: prefork, mpmt_pthread, and dexter. The prefork MPM does exactly what Apache 1.3 does: it acts as a pre-forking server. The name mpmt_pthread stands for multi-threaded multi-process pthread server. When this MPM was first written, it required pthreads as its threading library. It now uses the Apache Portable Run-Time to abstract out the threading library, but the name has unfortunately remained. This is the initial version of a hybrid thread/process server for UNIX. Like prefork, it pre-forks a specified number of processes, but each of these processes then creates a specified number of threads, which serve the requests. In this MPM, the number of threads per process is static. Finally, the dexter MPM was created. It also creates a specified number of child processes, each of which then creates a small pool of threads. As requests come in, that pool of threads grows and shrinks to accommodate them.

Each of these MPMs has strengths and weaknesses. For example, the prefork MPM will be more robust than either of the two hybrid MPMs on the same platform. The reason is simple: when a child process terminates unexpectedly, the connections it was serving are lost, and how many that is depends on the MPM. With the prefork MPM, each process serves a single connection, so only one connection is lost. With the mpmt_pthread MPM, no more than 1/n of the connections are lost, where n is the number of child processes. With the dexter MPM, the number of lost connections depends on the OS the server is running on. However, this robustness comes at a price: scalability. The prefork MPM is the least scalable MPM, followed by mpmt_pthread and then dexter. Which MPM to use depends on what the site requires. If a site must use a lot of untrusted third-party modules, it should most likely use the prefork MPM, because an unstable module will affect the site the least. However, if a site only serves static web pages, doesn't require any extra modules, and needs to handle thousands of hits per second, dexter is probably the correct choice.
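The worst-case losses for the prefork and mpmt_pthread MPMs can be checked with a small model. It assumes a server sized for a fixed number of simultaneous connections, all of them busy, with mpmt_pthread's threads split evenly across its child processes. Dexter is left out because its losses depend on the operating system. The function name and parameters are illustrative only.

```python
def connections_lost(mpm: str, capacity: int, child_processes: int) -> int:
    """Hypothetical model: connections dropped when one busy child dies."""
    if mpm == "prefork":
        # One connection per process, so a crash takes down exactly one.
        return 1
    if mpm == "mpmt_pthread":
        # A static number of threads per child: capacity / n connections each.
        if capacity % child_processes:
            raise ValueError("capacity must divide evenly among children")
        return capacity // child_processes
    raise ValueError(f"no model for {mpm!r}; dexter's losses are OS-dependent")


for mpm, children in [("prefork", 256), ("mpmt_pthread", 4)]:
    lost = connections_lost(mpm, capacity=256, child_processes=children)
    print(f"{mpm}: {lost} of 256 connections lost")
# prefork: 1 of 256 connections lost
# mpmt_pthread: 64 of 256 connections lost
```

With four children, mpmt_pthread loses 64 connections, exactly 1/n of the total, while prefork loses only the one connection its crashed child was serving. Prefork pays for that with one whole process per connection.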

Modifications to Modules
Since this article started with a new type of module, the obvious next topic is the kind of module that Apache users everywhere are familiar with: the standard Apache module. These are the modules that people use to add capabilities such as authorization checking to a server. A few modifications have been made to standard modules that module writers need to be aware of.

One of the most frequent complaints new module authors have about Apache 1.3 is that the initializer hook is called twice, because the server reads its configuration twice at startup. This issue has been resolved in Apache 2.0 by removing the initializer hook completely. In its place, Apache 2.0 provides two hooks: pre_config and post_config. In the third alpha of 2.0, only MPMs can take advantage of the pre_config hook. That issue is currently being resolved, and all modules should have access to the pre_config hook with the release of Apache 2.0 alpha 4.

The configuration method for Apache has also changed with Apache 2.0. This affects how some directives are defined and how some modules must use the new pre_config hook.

The biggest change between 1.3 and 2.0 when it comes to configuration is that in 1.3, Apache configures the server one line at a time as it is read from the configuration file. In Apache 2.0, each line is read from the configuration file and is stored in a configuration tree. The tree is then traversed and each directive is executed.

If the module you are writing has a directive that modifies how the server interprets the configuration, that directive should be declared with the EXEC_ON_READ flag set in its req_override mask. This tells Apache 2.0 to execute the directive's function when the directive is read, instead of when the tree is walked. Very few directives should use this flag, but if a directive needs to read raw text from the configuration file or process a whole block of configuration text, this flag provides a way to do that.
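The two-phase scheme, together with the effect of EXEC_ON_READ, can be modeled in Python. Everything here is hypothetical: EXEC_ON_READ is a Python constant standing in for the real flag, and SetVar is an invented directive that defines a variable later lines can use. Because SetVar runs as soon as it is read, the ${ROOT} references after it are already expanded when those lines enter the tree; every other directive waits until the finished tree is walked.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

EXEC_ON_READ = 0x1  # hypothetical flag bit for this model only
Config = Dict[str, str]


@dataclass
class Directive:
    func: Callable[[List[str], Config], None]
    flags: int = 0


@dataclass
class Node:
    name: str
    args: List[str]
    children: List["Node"] = field(default_factory=list)


def read_config(lines: List[str], table: Dict[str, Directive], defines: Config) -> Node:
    """Phase 1: store each line in a tree; EXEC_ON_READ directives run at once."""
    root = Node("root", [])
    stack = [root]
    for raw in lines:
        line = raw.strip()
        for key, value in defines.items():      # read-time substitution
            line = line.replace("${%s}" % key, value)
        if not line or line.startswith("#"):
            continue
        if line.startswith("</"):              # end of a <Section> block
            stack.pop()
            continue
        name, *args = line.strip("<>").split()
        if table[name].flags & EXEC_ON_READ:
            table[name].func(args, defines)     # changes how later lines read
            continue
        node = Node(name, args)
        stack[-1].children.append(node)
        if line.startswith("<"):               # start of a nested <Section>
            stack.append(node)
    return root


def walk(node: Node, table: Dict[str, Directive], config: Config) -> None:
    """Phase 2: traverse the finished tree, executing each directive."""
    for child in node.children:
        table[child.name].func(child.args, config)
        walk(child, table, config)


def setter(key: str) -> Directive:
    return Directive(lambda args, cfg: cfg.update({key: " ".join(args)}))


table = {
    "SetVar": Directive(lambda a, d: d.update({a[0]: a[1]}), EXEC_ON_READ),
    "DocumentRoot": setter("docroot"),
    "Directory": setter("section"),
    "Options": setter("options"),
}
lines = ["SetVar ROOT /www", "DocumentRoot ${ROOT}/htdocs",
         "<Directory ${ROOT}/htdocs>", "Options Indexes", "</Directory>"]
config: Config = {}
walk(read_config(lines, table, {}), table, config)
print(config)
# {'docroot': '/www/htdocs', 'section': '/www/htdocs', 'options': 'Indexes'}
```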

Once the configuration is read into the tree structure, the pre_config hook is called. This gives modules a way to modify the tree before it is traversed. Once all of the modules have run their pre_config hooks, the core walks the tree and finishes configuring the server.
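The ordering of these phases can be sketched as follows. This is a hypothetical Python model, not Apache's API: configure stands in for the core, and add_default_port is an invented module hook that inserts a Listen directive into the tree when the file lacks one. For brevity, the sketch only handles top-level directives.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    name: str
    args: List[str]
    children: List["Node"] = field(default_factory=list)


TreeHook = Callable[[Node], None]
ConfigHook = Callable[[Dict[str, str]], None]


def configure(tree: Node, pre_config: List[TreeHook],
              post_config: List[ConfigHook]) -> Dict[str, str]:
    # 1. Every module's pre_config hook may rewrite the tree.
    for hook in pre_config:
        hook(tree)
    # 2. Only then does the core walk the (top level of the) tree.
    config: Dict[str, str] = {}
    for node in tree.children:
        config[node.name] = " ".join(node.args)
    # 3. post_config hooks see the finished configuration.
    for hook in post_config:
        hook(config)
    return config


def add_default_port(tree: Node) -> None:
    """Hypothetical module hook: supply Listen 80 if the file did not."""
    if not any(n.name == "Listen" for n in tree.children):
        tree.children.insert(0, Node("Listen", ["80"]))


seen = []
tree = Node("root", [], [Node("ServerName", ["www.example.com"])])
result = configure(tree, [add_default_port], [lambda cfg: seen.append(sorted(cfg))])
print(result)  # {'Listen': '80', 'ServerName': 'www.example.com'}
```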

The last big change to modules is how a module registers a function to be called for a specific hook. In 1.3, functions are registered by adding them to a table at the bottom of the module. The problem is that every time a new hook is added to the server, modules written for the previous version must be modified before they will work, although admittedly this didn't happen often. This scheme also wastes space in the server, because every module declares the entire module structure even though most of it is empty. In Apache 2.0, the module structure has shrunk significantly: most of it has been replaced by a single function through which the module registers functions for the remaining hooks.
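The registration-function idea can be modeled in a few lines of Python. The names Hooks, Module, example_fixups, and example_register_hooks are hypothetical; the real module structure and registration calls are C and are not reproduced here. The module names only the hooks it uses, so it carries no empty slots, and a hook the core adds later simply has no functions registered for it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


class Hooks:
    """Hypothetical core-side registry: hook name -> registered functions."""

    def __init__(self) -> None:
        self._table: Dict[str, List[Callable]] = {}

    def register(self, hook: str, func: Callable) -> None:
        self._table.setdefault(hook, []).append(func)

    def run(self, hook: str, *args) -> list:
        # A hook nobody registered for just has no functions to call.
        return [func(*args) for func in self._table.get(hook, [])]


@dataclass
class Module:
    name: str
    register_hooks: Callable[[Hooks], None]  # the one function the core calls


def example_fixups(uri: str) -> str:
    return uri.rstrip("/") or "/"


def example_register_hooks(hooks: Hooks) -> None:
    # Only the hooks this module cares about are mentioned here.
    hooks.register("fixups", example_fixups)


hooks = Hooks()
for module in [Module("example_module", example_register_hooks)]:
    module.register_hooks(hooks)

print(hooks.run("fixups", "/docs/"))       # ['/docs']
print(hooks.run("some_future_hook", "x"))  # []
```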

The ability to register for each hook independently has an important benefit besides saving some memory. In Apache 1.3, all of a module's functions are tied together: if mod_include executes before mod_cgi, then for every hook, mod_include's function executes before mod_cgi's. This restriction has been removed in 2.0. When registering a function in 2.0, a module can specify when it should run relative to other modules: first, in the middle, or last. Any number of modules can register for each of those positions, and within a position the modules run in no defined order. This means that if two modules both register a function for the check_auth hook and both want their functions to run first, it is impossible to determine which of the two will actually run first. All that is known is that both will run before any function registered to run in the middle or last. If a module's function must run before or after another specific module's function for the same hook, the module can also specify that when registering for the hook.
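The ordering rules can be modeled as a topological sort: explicit "run after" constraints are honored first, and otherwise functions run by position. In this hypothetical Python sketch, FIRST, MIDDLE, LAST, Hook, and the predecessors parameter are illustrative names, and the module registrations are invented. One deliberate difference: ties within a position fall back to registration order here, whereas Apache makes no such promise, so a module should never rely on it.

```python
from typing import Callable, Dict, Iterable, List, Tuple

FIRST, MIDDLE, LAST = 0, 10, 20  # hypothetical positions; lower runs earlier


class Hook:
    def __init__(self, name: str) -> None:
        self.name = name
        # module -> (position, registration order, function, predecessors)
        self._regs: Dict[str, Tuple[int, int, Callable, Tuple[str, ...]]] = {}

    def register(self, module: str, func: Callable, position: int = MIDDLE,
                 predecessors: Iterable[str] = ()) -> None:
        self._regs[module] = (position, len(self._regs), func, tuple(predecessors))

    def order(self) -> List[str]:
        """Honor predecessor constraints; otherwise sort by position."""
        pending = dict(self._regs)
        result: List[str] = []
        while pending:
            # A module is runnable once none of its predecessors is pending.
            ready = [m for m, (_, _, _, preds) in pending.items()
                     if not any(p in pending for p in preds)]
            if not ready:
                raise ValueError(f"cyclic constraints on hook {self.name}")
            nxt = min(ready, key=lambda m: pending[m][:2])
            result.append(nxt)
            del pending[nxt]
        return result

    def run(self, *args) -> list:
        return [self._regs[m][2](*args) for m in self.order()]


check_auth = Hook("check_auth")
check_auth.register("mod_auth", lambda r: "auth")  # MIDDLE by default
check_auth.register("mod_auth_db", lambda r: "auth_db", FIRST)
check_auth.register("mod_auth_anon", lambda r: "anon", FIRST,
                    predecessors=["mod_auth_db"])
check_auth.register("mod_access", lambda r: "access", LAST)
print(check_auth.order())
# ['mod_auth_db', 'mod_auth_anon', 'mod_auth', 'mod_access']
```

Both mod_auth_db and mod_auth_anon ask to run first; the predecessor constraint is what settles which of the two actually goes first.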

Apache Portable Run-Time
Perhaps the biggest change in Apache 2.0 is the Apache Portable Run-Time (APR). In previous versions of Apache, portability was handled internally, which kept Apache developers from doing what they really wanted to do: work on the most popular web server. With Apache 2.0, portability is handled by APR. APR is currently tied to Apache, but it should be spun off into its own project soon after the official 2.0 release. The goal of APR is to provide a single C interface to platform-specific functions, so that code can be written once and then compiled and run everywhere.

This approach has a couple of advantages over handling portability internally. First, the Apache code becomes much more manageable and maintainable. Second, and more importantly, APR uses native calls whenever possible. As a result, when Apache is running on a Windows system it looks like a native Windows program, and when it is running on a Unix system it looks like a Unix program. On Windows, this has given Apache a major performance boost.

APR has been designed with C programmers in mind. Whenever possible, APR functions mimic POSIX functions, so that programmers can quickly port existing programs to APR. This has already been done with some of Apache's support programs. For example, ApacheBench, which had never run on Windows, has been ported with minimal effort to all APR-supported platforms, including Windows, OS/2, and BeOS.

Conclusion

Apache 2.0 has already had three alpha releases, and every day more work is done to make it more reliable, more secure, and faster. This column will continue to be a place to find the latest news about Apache 2.0, as well as tips for migrating your existing setup to 2.0 as soon as possible. The next version of 2.0 will be released when it is ready. Until then, keep checking back here; by the time 2.0 is released, you'll be ready for it.
