Apache 2.0 has already been through three alpha releases. In this column, Ryan Bloom of the Apache Group previews Apache 2.0 and explains why it will make life easier for every Webmaster on the Internet.

People who follow Apache development with any regularity know that
the Apache Group has recently been focusing on Apache 2.0. There
have been many changes to the Apache code since Version 1.3. Some of
these changes make an administrator's job easier and some make it
harder; however, the changes are all designed to make Apache the most
flexible and portable Web server available. This
column will try to explain some of the new concepts that Apache 2.0
introduces and how it differs from 1.3.

The first major change in Apache 2.0 is the introduction of
Multi-Processing Modules (MPMs). To really understand the need for
MPMs, it is important to look
at how Apache 1.3 works. Apache 1.3 is a pre-forking server, meaning
that when Apache is started the original process forks a specified
number of copies of itself, which actually handle the requests. As
more requests come in, more copies are forked. The original process
doesn't actually do anything other than monitor the new processes to
make sure there are enough of them. This model works well on Unix
variants and most mainframes but it doesn't work as well on Windows.
The original support for Windows actually re-wrote the section of
code that created the child processes. On Windows this section
created just one child process, which then had multiple threads to
serve the requests. This separation between Unix and Windows was
done with #ifdefs, making the code very hard to maintain.
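
The pre-forking model described above can be sketched in a few lines of POSIX C. This is only an illustration, not Apache's code: the names `prefork_children` and `serve_requests` are hypothetical, and a real server would keep its children alive in an accept loop and fork replacements as they exit.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical placeholder for a child's request loop. A real child
 * would accept connections here; this sketch returns at once. */
static void serve_requests(void)
{
}

/* Fork `count` children, then act as the monitoring parent and wait
 * for each one to exit. Returns the number of children that exited
 * cleanly, or -1 if a fork() call failed. */
int prefork_children(int count)
{
    int i, started = 0, clean = 0, failed = 0;

    for (i = 0; i < count; i++) {
        pid_t pid = fork();
        if (pid < 0) {           /* could not create another child */
            failed = 1;
            break;
        }
        if (pid == 0) {          /* child: handle requests, then exit */
            serve_requests();
            _exit(0);
        }
        started++;               /* parent: keep count, never serve */
    }

    /* The parent only monitors: reap every child that was started. */
    for (i = 0; i < started; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status)
            && WEXITSTATUS(status) == 0)
            clean++;
    }
    return failed ? -1 : clean;
}
```

Calling `prefork_children(3)` starts three children and returns once all of them have exited. The parent never serves a request itself, which mirrors the division of labor in 1.3.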

When work started on Apache 2.0, the Apache Group had many goals.
One of these goals was to include support for every platform that
was supported by 1.3; another was to add new platforms. As work
began, the developers realized that these goals were impossible if
all of the code was shared between platforms. An abstraction
layer was necessary if the project was going to be manageable. From
this realization, MPMs were born. The basic job of an MPM is to
start the server processes and map incoming requests onto an
execution primitive. Whether
that execution primitive should be a thread or a process is left up
to the MPM developer. This decision should be made based on which
primitive is supported best by the platform the MPM is designed for.

This abstraction had an unexpected benefit as well: it allows Apache
to be configured for a specific site depending on the Webmaster's
requirements. When the work began on Unix MPMs, the developers
couldn't agree on the correct design. Three MPMs have been developed
so far: prefork, mpmt_pthread, and dexter. The prefork MPM does
exactly what Apache 1.3 does: it acts as a pre-forking server.
Mpmt_pthread stands for multi-threaded multi-process pthread server.
When this MPM was first written, it required that pthreads be used as
the threading library. However, it now takes advantage of the Apache
Portable Run-Time to abstract out the threading library. The name
has unfortunately remained. This MPM is the initial version of a
hybrid thread/process server for Unix. It also preforks a specified
number of processes. Each of these processes then creates a
specified number of threads, which serve the requests. In this MPM,
the number of threads per process is static. Finally, the dexter MPM
was created. This MPM creates a specified number of child processes,
each of which then creates a small pool of threads. As more requests
come in, the size of that pool of threads grows and shrinks to
accommodate the requests.

Each of these MPMs has strengths and weaknesses. For example, the
prefork MPM will be more robust than either of the two hybrid MPMs on
the same platform. The reason for this is simple: if a child process
terminates unexpectedly, connections will be lost. How many
connections are lost depends on which MPM is used. If the prefork MPM
is used, one connection will be lost. If the mpmt_pthread MPM is
used, no more than 1/n of the connections, where n is the number of
child processes used, will be lost. If the dexter MPM is used, the
number of lost connections will depend on the OS the server is being
run on. However, this robustness comes at a price: scalability. The
prefork MPM is the least scalable MPM, followed by mpmt_pthread, and
then dexter. Which MPM is used will depend on what the site requires.
If a given site must use a lot of untrusted third-party modules, then
that site should most likely be using the prefork MPM, because if a
module is unstable, it will affect the site the least. However, if
all a site is going to do is serve static Web pages without any extra
modules, but it will need to serve thousands of hits per second, then
dexter is probably the best choice.
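
The robustness trade-off between prefork and mpmt_pthread can be put into numbers. The sketch below is an illustration only: the struct and function names are hypothetical, and it assumes a fixed number of threads per child, each serving one connection when the child dies. It therefore does not model dexter, whose thread pools change size.

```c
/* Layout of a running server: `children` processes, each running
 * `threads_per_child` threads (1 for a prefork-style server). */
struct server_layout {
    int children;
    int threads_per_child;
};

/* Most connections the whole server can serve at one time. */
int max_connections(struct server_layout s)
{
    return s.children * s.threads_per_child;
}

/* Worst case when one child dies: every one of its threads was busy,
 * so the loss is max_connections(s) / s.children. */
int lost_on_child_crash(struct server_layout s)
{
    return s.threads_per_child;
}
```

With 8 children of 25 threads each, the server holds 200 connections and a crashing child drops at most 25 of them (1/8 of the total). A prefork layout with 200 single-threaded children holds the same 200 connections but drops only one.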

Modifications to Modules

Since this article started with a new type of module, the obvious next
topic is a module that Apache users everywhere are familiar with, the
standard Apache module. These are the modules that people use
to add capabilities such as authorization checking to a server.
There have been a few modifications made to standard modules which
module writers need to be aware of.

One of the most frequent complaints by new module authors about Apache
1.3 is that the initializer hook is called twice. This issue has
been resolved in Apache 2.0, as the initializer hook has been removed
completely. Instead of using one hook, Apache 2.0 has
provided two hooks: pre_config and post_config. In the third alpha
of 2.0, only MPMs could take advantage of pre_config hooks. That
issue is currently being resolved and all modules should have access
to the pre_config hook with the release of Apache 2.0 alpha 4.

The configuration method for Apache has also changed with Apache 2.0.
This affects how some directives are defined and how some modules
must use the new pre_config hook.

The biggest change between 1.3 and 2.0 when it comes to configuration is
that in 1.3, Apache configures the server one line at a time as it is
read from the configuration file. In Apache 2.0, each line is read
from the configuration file and is stored in a configuration tree.
The tree is then traversed and each directive is executed.
If the module you are writing has a directive that modifies how the
server interprets the configuration, it should be declared with the
EXEC_ON_READ flag set in the req_override mask. This tells Apache 2.0
to execute the appropriate function when this directive is read
instead of when walking the tree. Very few directives should use this
flag, but if the directive needs to read in raw text from the
configuration file or process a whole block of configuration text,
this flag provides a way to do so.

After the configuration is read into the tree structure, the pre_config hook is
called. This provides a way for modules to modify the tree before it
is traversed. Once all of the modules have run their pre_config
hooks, the core walks the tree and finishes configuring the server.
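
The read-then-walk sequence can be illustrated with a small self-contained sketch. It does not use Apache's real data structures or API: `directive`, `EXEC_ON_READ_FLAG`, `read_directive`, and `walk_tree` are hypothetical names, and the tree is flattened to a linked list for brevity. It also assumes that a directive executed at read time is not stored for the later walk.

```c
#include <stddef.h>

#define EXEC_ON_READ_FLAG 1   /* hypothetical stand-in for EXEC_ON_READ */
#define MAX_LOG 8

struct directive {
    const char *name;
    int flags;
    struct directive *next;
};

/* Names of executed directives, in execution order. */
static const char *exec_log[MAX_LOG];
static int exec_count;

static void execute(const struct directive *d)
{
    if (exec_count < MAX_LOG)
        exec_log[exec_count++] = d->name;
}

/* Read phase: flagged directives run immediately; all others are
 * appended to the list headed by *head and wait for the walk. */
void read_directive(struct directive **head, struct directive *d)
{
    struct directive **tail = head;

    if (d->flags & EXEC_ON_READ_FLAG) {
        execute(d);
        return;
    }
    d->next = NULL;
    while (*tail != NULL)
        tail = &(*tail)->next;
    *tail = d;
}

/* Walk phase: execute every stored directive in file order. */
void walk_tree(const struct directive *head)
{
    for (; head != NULL; head = head->next)
        execute(head);
}
```

In this sketch, anything that wants to adjust the stored directives before they take effect would do so between the last `read_directive` call and `walk_tree`, which is the point at which the text says the pre_config hook is called.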

Another big change to modules is how a module registers a function to be
called for a specific hook. In 1.3, functions are registered by
adding them to a table at the bottom of the module. This
causes the problem that every time a new hook is added to the server,
modules written for the previous version need to be modified before
they will work. Admittedly, this didn't happen often. This
scheme also wastes space in the server because every module has the
entire module structure declared even though most of the structure
was empty. In Apache 2.0, the size of the module structure has
shrunk significantly. This is done by replacing most of the
structure with one function which allows modules to register
functions for the rest of the hooks.
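
The difference between the two registration schemes can be sketched as follows. The types and names here (`module_13`, `module_20`, `hook_check_auth`, and the slot names) are hypothetical simplifications, not the actual Apache structures. The point is only that a 2.0-style module carries one registration function instead of a slot for every hook.

```c
#include <stddef.h>

typedef int (*hook_fn)(void);

/* 1.3 style: one slot per hook in every module, mostly left NULL. */
struct module_13 {
    hook_fn translate;
    hook_fn check_auth;
    hook_fn fixups;
    hook_fn logger;
};

/* 2.0 style: a single callback that registers only what is needed. */
struct module_20 {
    void (*register_hooks)(void);
};

#define MAX_HOOKS 16
static hook_fn check_auth_hooks[MAX_HOOKS];
static int check_auth_count;

/* Hypothetical registration call offered by the server core. */
void hook_check_auth(hook_fn fn)
{
    if (check_auth_count < MAX_HOOKS)
        check_auth_hooks[check_auth_count++] = fn;
}

/* A module that cares about exactly one hook. */
static int my_check_auth(void)
{
    return 0;
}

static void my_register_hooks(void)
{
    hook_check_auth(my_check_auth);
}

struct module_20 my_module = { my_register_hooks };

/* Core side: ask a loaded module to register its hooks. */
void load_module(const struct module_20 *m)
{
    if (m->register_hooks != NULL)
        m->register_hooks();
}
```

Adding a new hook to the core in this scheme means adding a new registration call such as `hook_check_auth`. The module structure itself does not change, so modules that ignore the new hook need no source changes.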

The ability to register individual hooks independently of the rest of the
hooks has an important benefit besides saving some memory. In Apache
1.3, all of a module's functions are tied together. This means that
if mod_include executes before mod_cgi, then for each hook
mod_include's function will execute before mod_cgi's. This
restriction has been removed in 2.0. When registering a function in
2.0, it is possible to specify when it should run relative to other
modules: first, last, or in the middle. Any number of modules can
register for those positions, and within each group the modules are
executed in an arbitrary order. This
means that if two modules both register a function for the check_auth
hook and they both want their functions to run first, it is
impossible to determine which of these two will actually run first.
All that is known about those two functions is that they will both
run before any function registered to run either last or in the
middle. If the module's function must run either before or
after another module's function for the same hook, it is also
possible to specify that when registering for the hook.
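
The ordering rule can be illustrated with a self-contained sketch. The names `ORDER_FIRST`, `ORDER_MIDDLE`, `ORDER_LAST`, `register_hook`, and `run_hooks` are hypothetical and are not Apache's API. The sketch covers only the three groups, not the ability to name a specific module to run before or after.

```c
#include <stdlib.h>

enum hook_order { ORDER_FIRST = 0, ORDER_MIDDLE = 1, ORDER_LAST = 2 };

struct hook_entry {
    void (*fn)(void);
    enum hook_order order;
};

#define MAX_ENTRIES 16
static struct hook_entry entries[MAX_ENTRIES];
static int entry_count;

/* Register fn to run in the given group. Returns 0 on success. */
int register_hook(void (*fn)(void), enum hook_order order)
{
    if (entry_count >= MAX_ENTRIES)
        return -1;
    entries[entry_count].fn = fn;
    entries[entry_count].order = order;
    entry_count++;
    return 0;
}

static int by_order(const void *a, const void *b)
{
    const struct hook_entry *x = a, *y = b;
    return (int)x->order - (int)y->order;
}

/* Sort by group, then call every function. qsort() is not stable, so
 * the order inside one group is unspecified, as in the text. */
void run_hooks(void)
{
    int i;
    qsort(entries, (size_t)entry_count, sizeof entries[0], by_order);
    for (i = 0; i < entry_count; i++)
        entries[i].fn();
}

/* Example hooks that record the order in which they ran. */
static char trace[MAX_ENTRIES + 1];
static int trace_len;

static void record(char c)
{
    if (trace_len < MAX_ENTRIES)
        trace[trace_len++] = c;
}

static void hook_a(void) { record('a'); }
static void hook_b(void) { record('b'); }
static void hook_c(void) { record('c'); }
static void hook_d(void) { record('d'); }

/* Usage: register out of order, run, and return the trace. */
const char *demo(void)
{
    register_hook(hook_c, ORDER_LAST);
    register_hook(hook_a, ORDER_FIRST);
    register_hook(hook_b, ORDER_MIDDLE);
    register_hook(hook_d, ORDER_FIRST);
    run_hooks();
    return trace;
}
```

In the trace returned by `demo()`, `a` and `d` occupy the first two positions in either order, followed by `b` and then `c`. Nothing in the sketch lets `hook_a` insist on running before `hook_d`, which is the case the text says requires naming the other module at registration time.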

Apache Portable Run-Time

Perhaps the biggest change in Apache 2.0 is the Apache Portable Run-Time
(APR). In previous versions of Apache, portability was
handled internally, which kept Apache developers from doing what
they really wanted to do: building the most popular Web server. With Apache
2.0, portability is handled by APR. APR is a project that is
currently tied to Apache, but should be spun off into its own project
soon after the official 2.0 release. The goal of APR is to provide a
single C interface to platform-specific functions so that code can
be written once and compiled and run everywhere.

This approach has a couple of advantages over handling portability
internally. First of all, the Apache code becomes much more
manageable and maintainable. Secondly, and more importantly, APR
uses native calls whenever possible. As a result, when Apache is
running on a Windows system, it looks like a native Windows program,
but when Apache is running on a Unix system, it looks like a Unix
program. On Windows, this has been a major benefit for Apache.

APR has been designed with C programmers in mind. Whenever possible, APR functions have been made to mimic POSIX
functions in order to make it easy for programmers to quickly port
current programs to APR. This has already been done with some
of Apache's support programs. For example, ApacheBench, which had
never run on Windows, has been ported with minimal effort to all
APR-supported platforms including Windows, OS/2, and BeOS.
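
The idea of one C interface over native calls can be shown with a tiny wrapper. This is not APR code, and `port_getpid` is a hypothetical name. The sketch assumes a Windows or POSIX target and simply hides the platform choice behind one function.

```c
#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

/* Return the current process ID using the platform's native call. */
long port_getpid(void)
{
#ifdef _WIN32
    return (long)GetCurrentProcessId();
#else
    return (long)getpid();
#endif
}
```

Callers see only `port_getpid()`. The `#ifdef` lives in one place instead of being scattered through the server, which was the maintenance problem with the 1.3 process-creation code.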

Apache 2.0 has already had three alpha releases, and every day more work is
done to make it more reliable, more secure, and faster. This
column will continue to be a place to find the latest news about
Apache 2.0 as well as some tips for migrating
your existing setup to 2.0 as soon as possible. The next
version of 2.0 will be released when it is ready. Until then, keep
checking back here. By the time 2.0 is released,
you'll be ready for it.