Reverse Proxying With Apache 2.0

Thursday Dec 18th 2003 by Martin Brown

We flip the proxy model on its head and discuss when and how to implement a reverse proxy server using Apache 2.0.

The previous Apache-focused tutorial published on ServerWatch discussed the benefits of a proxy server for the network, and how it can speed up access, reduce bandwidth requirements, and perform basic information filtering tasks. This type of proxy is a forward proxy -- it forwards requests from a network to the Internet.

However, if the proxy model is flipped on its head, a different type of proxy server is created -- a reverse proxy. In this instance, instead of requests from a client being forwarded (and optionally cached) through the proxy to the Internet, requests are forwarded (and cached) to one or more Web servers, as illustrated in Figure 1.





Figure 1

Interesting, you're thinking. But what is the benefit of this?

Reverse proxies are useful for much the same reasons as forward proxies: they offer similar performance and security benefits. The other, less obvious, advantage is that a reverse proxy provides a unified interface to a group of Web servers.

Reverse Proxy Gateway Operation

One of the problems with supporting a modern Web site is that as the site grows, the level and quantity of information requested and returned also increases. A number of solutions have been developed to resolve this issue. The most obvious is to just build a bigger, more powerful server by adding more CPUs, RAM, disk space, and network interfaces. Ultimately, however, a physical or practical limit is reached that makes it impossible to expand any further.

Other solutions involve simple, or complex, load balancing techniques, clustering tools, or manual (and generally complex) methods of splitting up the site into different areas, and manually redirecting users to different machines to handle the requests and load.

With a reverse proxy, a single machine is inserted to act as a gateway to the real servers in the network. Now, instead of multiple machines directly handling the requests from clients, a single machine is responsible for accepting and redirecting the requests to the real servers. This means that a single domain continues to appear as a single machine, while still having the flexibility of multiple machines working behind the scenes to honor the actual requests.

The unified interface is, in essence, the same as using a forward proxy for Internet access. However, instead of being a single interface to the Internet, it becomes a single interface into the Web server network.

Caching of Static Data

Another problem with most Web sites, even those based on static content, is that the information must be read off the disk each time it is supplied to a client. With a bit of work within Apache, we can use mod_cache (and the mod_mem_cache module) to keep some documents in memory.

A reverse proxy can provide that in-memory cache on a single machine and serve client requests on behalf of a number of different real servers, because the proxy holds only the responses it has cached rather than each server's full content.
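
As a rough sketch, assuming an Apache 2.0 build that includes mod_cache and mod_mem_cache (both flagged experimental in 2.0), an in-memory cache for everything served through the proxy might be configured along these lines. The module paths and all of the size values are illustrative assumptions, not tuned recommendations:

LoadModule cache_module modules/mod_cache.so
LoadModule mem_cache_module modules/mod_mem_cache.so

# Cache every URL below / in memory
CacheEnable mem /
# Total memory available to the cache, in KB (assumed value)
MCacheSize 4096
# Maximum number of objects held in the cache (assumed value)
MCacheMaxObjectCount 100
# Smallest and largest object, in bytes, eligible for caching (assumed values)
MCacheMinObjectSize 1
MCacheMaxObjectSize 2048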

Caching Dynamic Data

As with a forward proxy, we can cache content from the individual Web servers, enabling the reverse proxy to appear as a single machine. Because the information is cached, it can be returned significantly quicker than from a typical static or dynamic solution. In a Web server design where individual pages are generated from a large number of dynamic components this can be a significant benefit.

Consider a typical page, made up of 10 different dynamic elements. If 100 clients attempt to access that page simultaneously, then 1,000 requests (10 elements for each of 100 clients) must be processed. For some sites, these 'dynamic' elements are nothing more than data extracted from a database. The underlying page data doesn't change much, but on a dynamic site we must still process that same database request each time the page is accessed.

If a reverse proxy is placed between the clients and the Web servers, we can cache the basic content, reducing the load on the database that provides the information and on the server that must execute the application to load that data and convert it into a page.

The reverse proxy could cache the entire content of the dynamic elements in memory, or on disk, and return them to clients much quicker than the dynamic process. You could also set the cache to be updated (through the cache expiry system) to provide the latest versions of stories and data for the site.
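
A sketch of a disk-based cache with expiry limits, assuming mod_cache and mod_disk_cache are available in the build. The cache directory, the choice of /marketing as the cached path, and the expiry values are assumptions made for illustration. Whether a given dynamic response is actually stored still depends on the headers the back-end application sends:

LoadModule cache_module modules/mod_cache.so
LoadModule disk_cache_module modules/mod_disk_cache.so

# Cache responses for URLs below /marketing on disk
CacheEnable disk /marketing
# Directory that holds the cached files (assumed path)
CacheRoot /var/cache/apache
# Lifetime, in seconds, for responses that carry no expiry information
CacheDefaultExpire 3600
# Upper limit, in seconds, on how long any response is kept
CacheMaxExpire 86400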

Security

Because all requests appear to come in through a single server, rather than going to one of the many back-end servers that actually support the site, reverse proxying enables us to provide a single point of authentication. Users log in to the proxy server -- as the gateway to the Web site -- and need never log in again, even though they may be accessing other machines to obtain information. This can be done with either Apache's own authentication systems (using cookie-based or database-based authentication) or SSL-based communication. With the reverse proxy handling SSL, you need manage only one certificate on one machine.
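
A minimal sketch of the single-certificate arrangement, assuming mod_ssl is loaded and the proxy listens on port 443. The certificate and key paths are placeholders, and the ProxyPass line (explained in the configuration section below) is included only to show that the link to the back-end server can remain plain HTTP:

<VirtualHost *:443>
    ServerName www.mcslp.com
    # The only certificate clients ever see lives on the proxy
    SSLEngine on
    SSLCertificateFile /etc/apache2/ssl/www.mcslp.com.crt
    SSLCertificateKeyFile /etc/apache2/ssl/www.mcslp.com.key
    # Requests are passed to the back-end server over HTTP
    ProxyRequests Off
    ProxyPass /marketing http://marketing.mcslp.com
</VirtualHost>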

The same basic principles also apply when restricting access. If you were supporting an intranet and wanted to support connectivity from specific hosts, domains, or through a VPN connection, then you could open up the connectivity through a reverse proxy without having to open up the main servers.
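
One way this might look with the Apache 2.0 access directives is sketched below. The /intranet path, the intranet.mcslp.com back end, and the address range are all hypothetical and stand in for whatever hosts or networks you want to admit:

<Location /intranet>
    # Refuse everyone by default, then admit the listed networks
    Order deny,allow
    Deny from all
    Allow from 192.168.1.0/24
    Allow from .mcslp.com
</Location>
ProxyPass /intranet http://intranet.mcslp.com

The back-end server itself never has to be reachable from outside; only the proxy decides who gets through.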

If you have a firewall, then you can use the proxy server on the public side, or within your DMZ using secured (VPN) or restricted communication links between the reverse proxy and the real servers behind the firewall, as shown in Figure 2.

Figure 2

Basic Reverse Proxy Configuration

From a client perspective, a reverse proxy looks just like a standard Web server. It doesn't require any special configuration to operate (and if it did, it wouldn't be anywhere near as useful).

The only real requirement is to ensure the forward proxy is switched off, which is done using the ProxyRequests directive:

ProxyRequests Off
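
The proxy directives are only available when the proxy modules are loaded. A sketch for a typical dynamically loaded Apache 2.0 build follows; the modules/ path is an assumption about the installation layout, and a statically compiled server would not need the LoadModule lines at all:

# Core proxy support plus the HTTP scheme handler
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so

# Disable forward proxying; reverse proxying still works
ProxyRequests Off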

We do, however, need to tell the reverse proxy where to fetch (and optionally cache) content for the requests it receives. The system maps specific directories within the hostname assigned to the proxy server to an alternative host. For example, Figure 3 shows three back-end servers, and a front-end reverse proxy identified as www.mcslp.com.

Figure 3

When a user requests www.mcslp.com/marketing, the admin actually wants the content on marketing.mcslp.com to be returned instead. For this he must edit the Apache httpd.conf file on the reverse proxy, or the machine being used as a front end to the Web site, and then set the ProxyPass directive for the requested directory to point to the URL of the real data. For example:

ProxyPass /marketing http://marketing.mcslp.com

The above line causes the proxy server to supply the data from marketing.mcslp.com whenever an object within /marketing is requested. For example, the content of the URL www.mcslp.com/marketing/index.html would actually come from marketing.mcslp.com/index.html.

ProxyPass generates an internal proxy request for the remote directory and then returns the information, just as a forward proxy does with a proxy request from a client. This is not redirection -- the information is loaded to the proxy server from the real host and sent back to the client from the reverse proxy as if the data were from the proxy server.

You can also configure the same effect from within a Location directive by simply omitting the directory (because Apache gets the directory context from the Location directive):

ProxyPass http://marketing.mcslp.com
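
Shown in context, and assuming the /marketing path used in the earlier example, the enclosing Location block would look something like this:

<Location /marketing>
    # The local path comes from the enclosing Location block
    ProxyPass http://marketing.mcslp.com
</Location>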

The redirection for all three directories requires something like:

ProxyPass /marketing http://marketing.mcslp.com
ProxyPass /accounts http://finance.mcslp.com
ProxyPass /sales http://sales.mcslp.com

The second argument is a URL, so it can also point to a subdirectory on a remote machine. For example, the directive

ProxyPass /contact http://sales.mcslp.com/contact

would pass requests for www.mcslp.com/contact to the same directory on the sales Web server.

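You can also stop subdirectories of a directory being passed through by using an exclamation mark (!) as the destination URL. The exclusion must appear before the general ProxyPass line, because the directives are checked in the order they are configured. For example, to reverse proxy /marketing, but not /marketing/contact, you would use: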

ProxyPass /marketing/contact !
ProxyPass /marketing http://marketing.mcslp.com

Proper Reverse Proxy Configuration

The only problem with the ProxyPass directive is that it's not "clean" reverse proxying. Although the directive will correctly pass data through to the remote host, the HTTP headers (some of which contain the true location of the data) will remain unchanged. So, for example, when accessing www.mcslp.com/marketing/index.html, the client browser will be able to identify the true source of the data as marketing.mcslp.com/index.html just by looking at the HTTP headers returned.

This can cause problems with redirects and other references carried in the headers, which would ultimately point to the true server, not the proxy server we're trying to hide behind. Solving this problem requires an additional directive, ProxyPassReverse. This forces the proxy module to rewrite the HTTP header fields Location, Content-Location, and URI with the address of the proxy server, not the true server.

A true reverse proxy configuration requires two lines:

ProxyPass /marketing http://marketing.mcslp.com
ProxyPassReverse /marketing http://marketing.mcslp.com

The first line triggers the proxy request for the real data; the second handles the rewriting.
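
Putting the pieces together, a sketch of the complete front-end configuration for the three back-end servers used in the earlier examples might read as follows. The comment at the end walks through one header rewrite, assuming the client reached the proxy as www.mcslp.com and that the back end issued the redirect shown, which is a made-up example:

ProxyRequests Off

ProxyPass        /marketing http://marketing.mcslp.com
ProxyPassReverse /marketing http://marketing.mcslp.com

ProxyPass        /accounts http://finance.mcslp.com
ProxyPassReverse /accounts http://finance.mcslp.com

ProxyPass        /sales http://sales.mcslp.com
ProxyPassReverse /sales http://sales.mcslp.com

# Header sent by the back end (hypothetical redirect):
#   Location: http://marketing.mcslp.com/contact/
# Header the client receives after ProxyPassReverse:
#   Location: http://www.mcslp.com/marketing/contact/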

Note that at no point does Apache rewrite the content of the information it is sending back, so absolute links embedded in a page that name a back-end host will still reach the client unchanged. Luckily, if you are replacing a single existing server with multiple servers behind a reverse proxy, you shouldn't have to make changes on the site, as the references you are already using will continue to be valid in the new setup.
