In the classic sense, a proxy server is a server that sits between you and the Internet. If a Web browser is configured to do so, all requests will be made through the proxy, which in turn will apply filtering rules. The proxy will then request the site the user was trying to reach on his or her behalf, or more accurately: "by proxy," as the name implies.
A "transparent proxy" is a proxy server configured to intercept requests without the client machine knowing about it. The drawback is that it cannot proxy SSL traffic without breaking the encryption, but on the bright side, users' browsers need no configuration for plain HTTP. Transparent proxies are often combined with a caching proxy, which serves images and other large files from its cache rather than using Internet bandwidth to fetch them every time.
A reverse proxy, the main topic today, is one that sits between your Web server and the world. When an HTTP connection comes in, the reverse proxy decides what to do with it and then makes a request to the appropriate back-end Web server. Reverse proxies matter because they are frequently tasked with several roles at once.
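To make that request flow concrete, here is a minimal sketch of a reverse proxy built only on Python's standard library. It assumes a single plain-HTTP back end (the address 127.0.0.1:8081 is a hypothetical placeholder), handles only GET, and buffers each response in memory; a production proxy does far more.

```python
import http.client
import http.server

# Hop-by-hop headers describe one connection, not the end-to-end response,
# so they are not relayed; Content-Length, Date and Server are regenerated here.
SKIP_HEADERS = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
    "content-length", "date", "server",
}


class ReverseProxyHandler(http.server.BaseHTTPRequestHandler):
    """Relay each GET request to one back-end Web server."""

    backend = ("127.0.0.1", 8081)  # hypothetical: a single plain-HTTP back end

    def do_GET(self):
        host, port = self.backend
        conn = http.client.HTTPConnection(host, port, timeout=10)
        try:
            # Keep the client's Host header so name-based virtual hosts on
            # the back end still see which site was requested.
            conn.request("GET", self.path,
                         headers={"Host": self.headers.get("Host", host)})
            resp = conn.getresponse()
            status, headers, body = resp.status, resp.getheaders(), resp.read()
        except (OSError, http.client.HTTPException):
            # Back end unreachable or sent garbage: report a gateway error.
            self.send_error(502, "Bad Gateway")
            return
        finally:
            conn.close()

        self.send_response(status)  # also emits fresh Server and Date headers
        for name, value in headers:
            if name.lower() not in SKIP_HEADERS:
                self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it, one could run `http.server.ThreadingHTTPServer(("", 8080), ReverseProxyHandler).serve_forever()` and point a browser at port 8080. The proxy, not the client, decides which back end is contacted; that decision point is where the roles below (SSL termination, load balancing, filtering, caching) plug in.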
What It Does
A reverse proxy can be an SSL terminator. This means SSL certificates (and their keys) are installed on the proxy server, as well as the corresponding IP addresses for those sites. SSL is therefore terminated at the proxy, and the requests to the back end happen (generally) in plain text. This is usually acceptable, but if your internal network is not trusted, the proxy can re-encrypt the traffic to the back end or send it through an encrypted tunnel such as a VPN.
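As a rough illustration of SSL termination, the sketch below uses Python's ssl module: the certificate and key are loaded on the proxy, the client's bytes are decrypted there, and they are forwarded to the back end over an ordinary TCP socket. The file paths and back-end address are hypothetical, and the relay handles one read of a single request, which shows the idea but is not enough to serve real traffic.

```python
import socket
import ssl
from typing import Tuple


def make_termination_context(certfile: str, keyfile: str) -> ssl.SSLContext:
    """Build a server-side TLS context; the certificate and key live on the proxy."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return ctx


def relay_one_request(listener: socket.socket, ctx: ssl.SSLContext,
                      backend: Tuple[str, int]) -> None:
    """Accept one TLS client and pass its decrypted request to the back end."""
    client, _addr = listener.accept()
    # The TLS handshake happens here, on the proxy; the back end never sees it.
    with ctx.wrap_socket(client, server_side=True) as tls_client:
        request = tls_client.recv(65536)  # plaintext after decryption
        # Plain TCP to the back end: this hop is unencrypted.
        with socket.create_connection(backend) as plain:
            plain.sendall(request)
            plain.shutdown(socket.SHUT_WR)  # signal end of request
            while True:
                chunk = plain.recv(65536)
                if not chunk:
                    break
                tls_client.sendall(chunk)  # re-encrypted toward the client
```

A call might look like `relay_one_request(socket.create_server(("", 443)), make_termination_context("proxy-cert.pem", "proxy-key.pem"), ("10.0.0.10", 80))`, again with hypothetical names. Only the proxy holds the private key, which is also why the back ends can be plain HTTP servers.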
This is as good a time as any to bring up "virtual hosts" and SSL. A name-based virtual host relies on knowing the site name the client used to connect, which arrives in the HTTP Host header. When an HTTP request is made, a Web server that supports virtual hosts serves different content based on the site requested. Essentially, this means you can point hundreds of domain names at the same IP address. SSL, however, is negotiated with a specific IP address, and the certificate the server presents must match the name of the site the user is trying to reach. Because SSL negotiation happens before any HTTP data is passed, the server has only one choice of certificate per IP address. If, after the SSL connection is negotiated, it turns out that the request was actually for a different site, the Web browser warns the user about the mismatch. If it didn't work this way, SSL would be pointless. Ergo, unless client and server both support the Server Name Indication (SNI) extension to TLS, which lets the browser send the site name during the handshake, there is no such thing as a name-based virtual host with SSL.
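The name-based lookup itself is simple, as this hypothetical Python helper shows; the site table is an assumption for illustration. The catch described above is when it can run: over HTTPS the Host header is only readable after the handshake, by which point a certificate has already been chosen.

```python
from typing import Dict, Optional, Tuple

Backend = Tuple[str, int]

# Hypothetical site table: many names, possibly all behind one public IP address.
VIRTUAL_HOSTS: Dict[str, Backend] = {
    "www.example.com": ("10.0.0.10", 80),
    "shop.example.com": ("10.0.0.11", 80),
}


def pick_site(host_header: Optional[str],
              sites: Dict[str, Backend] = VIRTUAL_HOSTS) -> Optional[Backend]:
    """Choose a back end from the HTTP Host header alone."""
    if not host_header:
        return None
    # Drop any ":port" suffix, a trailing dot, and normalise case, since DNS
    # names are case-insensitive. (IPv6 literals such as "[::1]:8080" are not handled.)
    name = host_header.strip().lower().split(":", 1)[0].rstrip(".")
    return sites.get(name)
```

For example, `pick_site("WWW.Example.com:8080")` resolves to the same back end as `pick_site("www.example.com")`, while an unknown name returns None and the proxy can answer with an error.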
A reverse proxy can also be a load balancer. In basic terms, load balancing works in one of two ways: it either distributes requests round-robin across a group of servers at the IP layer, or it uses a proxy to do even more intelligent things. A group of servers can serve a site by using DNS round-robin: the hostname is given multiple DNS records, so each connection picks one server out of the group. Of course, this is a pain to manage with SSL sites, since every server in the group needs the site's certificate. A router can also load balance requests in a similar fashion, which requires that state be kept so subsequent requests from the same client reach the same server. Most devices that do this are simply acting as a proxy, though. Using a proxy to load balance makes great sense, especially considering the other features it can provide.
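Below is a small sketch of round-robin selection with optional "sticky" assignment, which is the state-keeping the paragraph mentions: once a client (identified by a hypothetical key such as its IP address or a session cookie) is assigned a server, later requests go to the same one. The back-end addresses in any usage are placeholders.

```python
import itertools
from typing import Dict, Hashable, List, Optional, Tuple

Backend = Tuple[str, int]


class RoundRobinBalancer:
    """Hand out back ends in turn; optionally pin each client to its first pick."""

    def __init__(self, backends: List[Backend], sticky: bool = False):
        if not backends:
            raise ValueError("need at least one back end")
        self._cycle = itertools.cycle(backends)  # endless a, b, c, a, b, ...
        self._sticky = sticky
        self._assigned: Dict[Hashable, Backend] = {}  # client key -> back end

    def choose(self, client_key: Optional[Hashable] = None) -> Backend:
        if self._sticky and client_key is not None:
            # First request from this client: take the next server in turn
            # and remember it, so the session keeps hitting the same machine.
            if client_key not in self._assigned:
                self._assigned[client_key] = next(self._cycle)
            return self._assigned[client_key]
        return next(self._cycle)
```

The class is not thread-safe, and the sticky table grows without bound; a real balancer would guard it with a lock, expire old entries, and skip servers that fail health checks.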
A reverse proxy can also act as a sort of application-layer firewall for your Web servers, in two respects: incoming requests are subject to the rules and policies defined in the proxy server's configuration, and the Web servers themselves can be locked off from the world so that attackers cannot reach them directly. This does not, on its own, stop application-level exploits such as cross-site scripting, which arrive through the proxy inside otherwise legitimate requests.
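A proxy-side policy check might look like the following sketch. The allowed methods and path prefixes are hypothetical examples, not any product's defaults. The path is percent-decoded and normalised before matching, so that requests such as /static/../admin are judged by where they actually lead.

```python
import posixpath
from dataclasses import dataclass
from typing import FrozenSet, Tuple
from urllib.parse import unquote, urlsplit


@dataclass(frozen=True)
class Policy:
    # Hypothetical rule set; a real proxy reads this from its configuration.
    allowed_methods: FrozenSet[str] = frozenset({"GET", "HEAD", "POST"})
    allowed_prefixes: Tuple[str, ...] = ("/app/", "/static/")
    max_target_length: int = 2048


def is_allowed(method: str, target: str, policy: Policy = Policy()) -> bool:
    """Decide whether a request may be forwarded to the back end."""
    if method.upper() not in policy.allowed_methods:
        return False
    if len(target) > policy.max_target_length:
        return False
    # Strip the query string, decode %-escapes and collapse "..", so that
    # "/static/%2e%2e/admin" is checked as "/admin", not as a /static/ path.
    path = posixpath.normpath(unquote(urlsplit(target).path) or "/")
    return any(
        path == prefix.rstrip("/") or path.startswith(prefix)
        for prefix in policy.allowed_prefixes
    )
```

Anything that fails the check would get an error from the proxy and never reach the back end.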
A reverse proxy is often tasked with acting as a content filter, too. This is closely related to the firewall role but finer-grained. Most proxy server vendors implement a mechanism to block certain keywords or content types, which adds another layer of defense against code exploits reaching your real servers.
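A naive version of keyword and content-type blocking, sketched in Python with hypothetical block lists, is shown below. Plain substring matching like this is easy to evade (for example with URL encoding), which is why it works as an extra layer rather than a primary defense.

```python
from typing import Collection, Optional

# Hypothetical block lists (lowercase); real products use their own rule formats.
BLOCKED_KEYWORDS = ("<script", "union select")
BLOCKED_CONTENT_TYPES = ("application/x-msdownload", "application/x-sh")


def should_block(body: bytes, content_type: Optional[str],
                 keywords: Collection[str] = BLOCKED_KEYWORDS,
                 content_types: Collection[str] = BLOCKED_CONTENT_TYPES) -> bool:
    """Return True if a request body or its declared type matches a block rule."""
    if content_type:
        # Compare only the media type, ignoring parameters such as "; charset=utf-8".
        media_type = content_type.split(";", 1)[0].strip().lower()
        if media_type in content_types:
            return True
    # Case-insensitive substring match; undecodable bytes are replaced, not rejected.
    text = body.decode("utf-8", errors="replace").lower()
    return any(word in text for word in keywords)
```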
Pretty much everything a forward proxy can do can also be done by a reverse proxy. A caching proxy, like Squid, can be used in conjunction with the reverse proxy in a variety of configurations. If the reverse proxy doesn't support caching, many sites route access to the back-end servers through a caching proxy, so images and other static content don't have to be retrieved from the real servers on every request. Many reverse proxies can also farm out specific tasks, such as serving images, to a completely separate server. Setups like these are often referred to as "Web accelerators."
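The caching idea can be sketched as an in-memory store in front of a hypothetical fetch function that contacts the back end: cacheable static responses are kept for a fixed time-to-live, and everything else is fetched every time. Real caches such as Squid also honor Cache-Control and other HTTP caching headers, which this sketch ignores.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical back-end call: fetch(url) -> (content_type, body).
Fetch = Callable[[str], Tuple[str, bytes]]

# Assumed list of "static" media types worth caching.
CACHEABLE_PREFIXES = ("image/", "text/css", "application/javascript")


class StaticCache:
    """Serve static responses from memory for `ttl` seconds before refetching."""

    def __init__(self, fetch: Fetch, ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._store: Dict[str, Tuple[float, str, bytes]] = {}  # url -> (expiry, type, body)

    def get(self, url: str) -> Tuple[str, bytes]:
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            return entry[1], entry[2]  # cache hit: back end not contacted
        content_type, body = self._fetch(url)  # miss or expired: ask the back end
        if content_type.startswith(CACHEABLE_PREFIXES):
            self._store[url] = (now + self._ttl, content_type, body)
        return content_type, body
```

Passing the clock in as a parameter keeps expiry testable; the cache has no size limit, which a real deployment would need.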
What Does It
There are many proxy server products that will operate as a reverse proxy, but we'll just focus on a few free and open source ones. Apache 2.2 now comes with mod_proxy_balancer. Apache has supported reverse proxying for a long time with mod_proxy, but with the balancer module, Apache can now be used to build much more complex and resilient setups. Configuration isn't simple, of course, and Apache itself is very resource-intensive and memory-hungry.
Pound is a reverse proxy and load balancer that also terminates SSL connections, and it is very pleasant to configure. Its advantage over Apache is that it is lightweight and carefully written. Many Pound users report impressive throughput figures and, of course, mention that it has been reliable the entire time.
This article was originally published on Enterprise Networking Planet.