New in the Apache space: the Perchild MPM, which specifies User and Group IDs for clusters of child processes. Ryan Bloom explains how this will make your life simpler.

One of the biggest problems with administering a major server housing
multiple sites is restricting access to each site to only those people
responsible for maintaining it. The reason for this is that all of the
Apache child processes run with the same User and Group ID.
Therefore, all of the files need to be readable, writable, and
executable by the user and group that the server is running as. This
becomes a much bigger issue when you add CGI and PHP scripts to the
site. If those scripts must access private information, then that
information must be stored with relatively insecure user and group
permissions.

Apache 1.3 addresses this problem with suexec, which introduces other
problems, and PHP and mod_cgi cannot take advantage of it. Apache 2.0
has introduced a new MPM to solve this problem in a more elegant way
that all scripts can take advantage of.

This MPM is called Perchild, and it is based on the Dexter MPM. This
means that a set number of child processes are created and each
process has a dynamic number of threads. In this MPM it is possible
to specify User and Group IDs for clusters of child processes. Then,
each virtual host is assigned to run in a specific cluster of child
processes. If no cluster of child processes is specified, then the
virtual host is run with the default User and Group IDs.

There were many designs considered for this MPM, but in the end only
one made sense. The first consideration was which MPM to base it on.
The options were prefork, mpmt_pthread, and Dexter. Prefork and
mpmt_pthread had one major drawback: they create new child processes,
which are completely separated from each other, whenever the server
gets busy. This means that the parent process would need to
determine what User and Group IDs the new process should have when it
is created. While this seems easy at first glance, it requires load
balancing techniques that begin to get very complicated. If the
prefork or mpmt_pthread MPMs are desired, it makes more sense to put
a load balancer or proxy in front of the web servers, and run
multiple instances of Apache on different ports. To the client, this
would look very similar to the Perchild MPM.

After eliminating prefork and mpmt_pthread, the only option left was
Dexter. Now, the question was how to associate virtual hosts with
child processes. Do we base the number of child processes on the
number of virtual hosts, or do we allow the web admin to specify how
the setup should look? Assuming that the more flexible we make the
Perchild MPM, the more likely it is to be used, we allow the web
admin to determine how their site looks. This is done through the
combination of two directives:
NumChildren UserID GroupID

The first directive allows the administrator to assign a number of
child processes to use the same User and Group IDs. This is to provide
for some level of robustness. Because Perchild creates new threads in
the same child process to handle new requests, it is not the most
robust server, although it is very scalable. If one of the threads
segfaults, then that entire process will die, taking with it all of
the requests currently being served by that child process. By
specifying more than one child per user/group pair, we allow the
server to balance the number of requests between multiple child
processes. The second directive, AssignUserId, is specified inside a
VirtualHost stanza, and assigns that VirtualHost to a specific User
and Group ID. The server is smart enough to combine all of the
VirtualHosts with the same User and Group IDs into the same child
processes.

How Does it Work?

The obvious question now is how this works internally. The Perchild
MPM has a special global table which it uses to start children and
allow those children to change to the correct User IDs. It
also uses the per-server configuration to pass requests between child
processes. When the MPM encounters the first of the two directives, it
begins to fill out the global child table. Each child process gets
one place in the table, which stores the User and Group ID that the
child should run as. The table also stores a socket descriptor, but
it isn't filled out until later.
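
The child table described above can be pictured with a small C sketch. This is not Perchild's actual source: the names child_info, child_table, MAX_CHILDREN, and reserve_children are hypothetical, and the fixed table size is an assumption made only to keep the example short.

```c
#include <assert.h>
#include <sys/types.h>

#define MAX_CHILDREN 16   /* hypothetical fixed table size */

/* One slot per child process (hypothetical layout). */
typedef struct {
    uid_t uid;   /* User ID the child should run as, (uid_t)-1 if unset */
    gid_t gid;   /* Group ID the child should run as, (gid_t)-1 if unset */
    int   sd;    /* socket descriptor, left at -1 until a later step */
} child_info;

static child_info child_table[MAX_CHILDREN];
static int next_free_slot = 0;

/* Mark every slot as "no IDs assigned, no socket yet". */
static void init_child_table(void)
{
    int i;
    for (i = 0; i < MAX_CHILDREN; i++) {
        child_table[i].uid = (uid_t)-1;
        child_table[i].gid = (gid_t)-1;
        child_table[i].sd  = -1;
    }
    next_free_slot = 0;
}

/* Reserve up to num_children slots for one user/group pair.
 * Returns how many slots were actually reserved. */
static int reserve_children(int num_children, uid_t uid, gid_t gid)
{
    int reserved = 0;
    assert(num_children >= 0);
    while (reserved < num_children && next_free_slot < MAX_CHILDREN) {
        child_table[next_free_slot].uid = uid;
        child_table[next_free_slot].gid = gid;
        next_free_slot++;
        reserved++;
    }
    return reserved;
}
```

Calling reserve_children once per occurrence of the first directive would leave the socket field untouched, which matches the description above that the descriptor is filled in later.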

While parsing the configuration for each VirtualHost, if the server
encounters an AssignUserId directive, it fills out a perchild
per-server configuration structure, which contains the two socket
descriptors. In order to do this, the server creates a pair of
anonymous Unix Domain Sockets which are used to pass the request
between processes. After the sockets are created, the server
searches the child table to find the child processes that have the
same User and Group IDs. Once found, one of the socket descriptors
is attached to all of the processes with that User/Group combination.
Both socket descriptors are attached to the specific VirtualHost
that is being configured. This step is repeated for all
VirtualHosts. Once all VirtualHosts have been configured, the server
ensures that each host has been assigned a socket. If not, the
server creates a set of default sockets and stores those in any
server that doesn't already have a socket.
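
As a rough illustration of this step, the sketch below creates a connected pair of anonymous Unix Domain Sockets with socketpair() and attaches one end to every matching child slot. The names child_slot, vhost_sockets, sd_in, sd_out, and assign_vhost are hypothetical, and the choice of which end goes to the children is an assumption; the article only says that one descriptor is attached to the children and both to the VirtualHost.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical child table slot. */
typedef struct { uid_t uid; gid_t gid; int sd; } child_slot;

/* Hypothetical per-VirtualHost record holding both ends of the pair. */
typedef struct {
    int sd_in;   /* end the owning children poll on */
    int sd_out;  /* end another child writes to when handing off */
} vhost_sockets;

/* Create a socket pair for one VirtualHost and attach the receiving
 * end to every child that runs with the requested IDs.
 * Returns the number of matching children, or -1 on error. */
static int assign_vhost(child_slot *table, int n, uid_t uid, gid_t gid,
                        vhost_sockets *vh)
{
    int pair[2];
    int i, matched = 0;

    assert(table != NULL && vh != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, pair) < 0) {
        return -1;
    }
    vh->sd_out = pair[0];
    vh->sd_in  = pair[1];

    for (i = 0; i < n; i++) {
        if (table[i].uid == uid && table[i].gid == gid) {
            table[i].sd = pair[1];
            matched++;
        }
    }
    return matched;
}
```

Because the pair is created in the parent before any child is forked, every child inherits both descriptors, which is what lets any child hand a request to the right group later.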

The next step is to create the child processes. When each process is
started, it checks the global child table, and switches to the
appropriate User and Group IDs. If no User and Group ID are specified
for this child process, then the User and Group specified in the main
server are used. Each child also adds the socket in the socket table
to the list of sockets it will poll on. From here, child startup
proceeds as normal, with each child process polling on all of the
ports opened in the parent process. This leaves the server looking
like Figure 1.
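
The identity switch at child startup might look something like the following sketch. The names child_ids, resolve_ids, and switch_identity are hypothetical, and using (uid_t)-1 and (gid_t)-1 to mean "not specified" is an assumption. The one detail that is standard practice rather than guesswork is the order: the group is changed before the user, because a process that has already given up root can no longer change its group.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

#define UNSET_UID ((uid_t)-1)   /* assumed marker for "no User given" */
#define UNSET_GID ((gid_t)-1)   /* assumed marker for "no Group given" */

typedef struct { uid_t uid; gid_t gid; } child_ids;

/* Fall back to the main server's User and Group when the child's
 * table slot does not specify them. */
static child_ids resolve_ids(child_ids slot, child_ids server_default)
{
    child_ids out = slot;
    if (out.uid == UNSET_UID) out.uid = server_default.uid;
    if (out.gid == UNSET_GID) out.gid = server_default.gid;
    return out;
}

/* Switch the calling process to the given IDs.
 * Returns 0 on success, -1 if either call fails.
 * A real server would normally also reset supplementary groups
 * before this point; that is omitted here. */
static int switch_identity(child_ids ids)
{
    assert(ids.uid != UNSET_UID && ids.gid != UNSET_GID);
    if (setgid(ids.gid) != 0) return -1;   /* group first */
    if (setuid(ids.uid) != 0) return -1;   /* then user */
    return 0;
}
```

A child would call resolve_ids with its own table slot and the main server's settings, then pass the result to switch_identity before it starts polling.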

When a request comes in, the Perchild MPM is the first module called
in the post_read_request phase. During this phase, the Perchild MPM
checks whether the request is for the current child process. If so,
processing continues as normal. If not, the child process uses the
VirtualHost that is attached to the request to find the correct Unix
Domain Socket to use. The child process begins by finding the socket
that is currently being used to communicate with the client in the
connection structure. Once this socket is found, it is passed to the
correct child process through the Unix Domain Socket (S1 or S2 in the
diagram). Finally, the part of the request that has already been
read from the client is sent to the new child over the Unix Domain
Socket. The original child process then closes its connection to the
client, and longjmps out of the post_read_request phase to the end of
processing a request. This thread then goes back to listening for
another new request.
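
On Unix systems, the usual way to hand an open descriptor to another process is to send it as SCM_RIGHTS ancillary data with sendmsg() and pick it up with recvmsg(). The article does not say which calls Perchild itself uses, so the sketch below is only an illustration of that general technique. The function names send_fd and recv_fd are hypothetical, and for brevity the already-read request bytes travel in the same message as the descriptor, whereas the text above describes them as being sent afterwards.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send descriptor fd, plus len bytes of already-read request data,
 * over the Unix Domain Socket chan. Returns 0 on success, -1 on error. */
static int send_fd(int chan, int fd, const void *buf, size_t len)
{
    struct msghdr msg;
    struct iovec iov;
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } ctl;
    struct cmsghdr *cmsg;

    assert(len > 0);   /* at least one data byte must accompany the fd */
    memset(&msg, 0, sizeof msg);
    memset(&ctl, 0, sizeof ctl);
    iov.iov_base = (void *)buf;
    iov.iov_len = len;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(chan, &msg, 0) < 0 ? -1 : 0;
}

/* Receive a descriptor and any accompanying data from chan.
 * Returns the new descriptor, or -1 on error; *nread gets the byte count. */
static int recv_fd(int chan, void *buf, size_t len, ssize_t *nread)
{
    struct msghdr msg;
    struct iovec iov;
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } ctl;
    struct cmsghdr *cmsg;
    ssize_t n;
    int fd;

    memset(&msg, 0, sizeof msg);
    memset(&ctl, 0, sizeof ctl);
    iov.iov_base = buf;
    iov.iov_len = len;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    n = recvmsg(chan, &msg, 0);
    if (n <= 0) return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    *nread = n;
    return fd;
}
```

The descriptor number seen by the receiver is generally different from the sender's, since the kernel installs a new entry in the receiving process's descriptor table. After a successful send, the sender can close its own copy, as the original child does with its client connection.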

The request processing then moves to the correct child process. Once
a socket is passed over the Unix Domain Socket, the new child process
is woken up out of poll with data on its end of the Unix Domain
Socket. Each child has a table of sockets to use for this occasion,
with one socket in the table for each thread in the process. Usually,
the sockets are set to -1, but when the passed socket descriptor is
detected, we set this thread's spot in the table to -2. Later, the
fact that the socket is -2 is used to determine that we must receive
the socket descriptor from the Unix Domain Socket. The received
socket is then placed in the thread's position in the socket table.
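
The -1 and -2 markers in the per-thread socket table can be sketched as a tiny state machine. The values -1 and -2 come from the description above; the names SLOT_EMPTY, SLOT_PENDING, thread_socket, and the helper functions, as well as the table size, are hypothetical.

```c
#include <assert.h>

#define THREADS_PER_CHILD 8    /* hypothetical thread count */
#define SLOT_EMPTY   (-1)      /* no passed socket for this thread */
#define SLOT_PENDING (-2)      /* a passed socket is waiting to be received */

static int thread_socket[THREADS_PER_CHILD];

/* Put every thread's slot into the normal "nothing passed" state. */
static void reset_slots(void)
{
    int i;
    for (i = 0; i < THREADS_PER_CHILD; i++) {
        thread_socket[i] = SLOT_EMPTY;
    }
}

/* Called when poll reports data on the Unix Domain Socket. */
static void mark_pending(int thread)
{
    assert(thread >= 0 && thread < THREADS_PER_CHILD);
    thread_socket[thread] = SLOT_PENDING;
}

/* True if this thread still has to receive the passed descriptor. */
static int must_receive(int thread)
{
    return thread_socket[thread] == SLOT_PENDING;
}

/* Record the descriptor once it has been received. */
static void store_received(int thread, int fd)
{
    assert(fd >= 0);
    thread_socket[thread] = fd;
}
```

Using negative values as markers works because valid descriptors are never negative, so a slot can hold either a state or a real socket without a second field.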

Request processing then continues as normal, reading from the Unix
Domain Socket, until the post_read_request phase. At this point, we
know that the request has come from another child process in our
server and we know that this request is meant for this child process's
User and Group ID. The only thing left to do is replace the Unix
Domain Socket that is currently in the connection structure with the
socket that was passed from the first child process. This child then
continues serving the request.

Perchild will never be the fastest MPM, because it relies on passing
socket descriptors between processes, which is inherently slow.
It would be much faster to give the server multiple IP addresses, and
have different Apache installations listen to port 80 on different
IPs. However, that can get very difficult to administer.

This MPM was finished the day before the fifth alpha was released, so
it is not well tested at all. Over the next few weeks and months, this
MPM will become more stable and more portable. Currently, this MPM
has only been tested on Linux, but with minor modifications, it
should work on almost all Unices. There has been talk of
modifying the Windows MPM to allow the threads to change their
identities for each request, but that has not happened yet.