PaulSD.com
Apache Notes
(Created: 09/01/2010)


I've administered a number of enterprise Apache servers over the years. The following are some of my notes, which fill in some gaps in the Apache documentation. Note that while most of this is applicable to any Apache server, file paths and other minor details may be Ubuntu/Debian-specific.


Modules

For security, I always start by disabling all Apache modules, then selectively adding back in only the specific modules that I need.

The minimum files you'll need in /etc/apache2/mods-enabled/ are:
authz_default.load authz_host.load mime.load mime.conf setenvif.load setenvif.conf

In RHEL, the minimum modules you'll need to leave uncommented in httpd.conf are:
LoadModule authz_host_module modules/mod_authz_host.so
LoadModule authz_default_module modules/mod_authz_default.so
LoadModule log_config_module modules/mod_log_config.so
LoadModule logio_module modules/mod_logio.so
LoadModule setenvif_module modules/mod_setenvif.so
LoadModule mime_module modules/mod_mime.so

Some modules you will likely want to add back in:
On-the-fly compression: mod_deflate (deflate.load and deflate.conf)
Directory Index Files: mod_dir (dir.load and dir.conf)
Directory File Listings: mod_autoindex (autoindex.load and autoindex.conf, requires alias.load and alias.conf)
HTTPS support: mod_ssl (ssl.load and ssl.conf)
RewriteRule support: mod_rewrite (rewrite.load)
Proxy support: mod_proxy_http (proxy.load, proxy.conf, and proxy_http.load)
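
On RHEL, where modules are enabled by uncommenting LoadModule lines instead of linking files, the optional modules above correspond roughly to the lines below. This is a sketch that assumes the stock Apache 2.2 module names and the same modules/ path used in the RHEL list above; mod_ssl is normally packaged separately there and loaded from conf.d/ssl.conf, so that line may belong in that file instead.

```apache
# Optional modules, added back one at a time as needed
LoadModule deflate_module modules/mod_deflate.so
LoadModule dir_module modules/mod_dir.so
# mod_autoindex needs mod_alias for its icon paths
LoadModule alias_module modules/mod_alias.so
LoadModule autoindex_module modules/mod_autoindex.so
LoadModule ssl_module modules/mod_ssl.so
LoadModule rewrite_module modules/mod_rewrite.so
# mod_proxy_http depends on mod_proxy
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
```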


Misc Configuration Details

I use the following LogFormat statement for my access logs:
LogFormat "%t %A:%{local}p %a \"%{X-Forwarded-For}i\" \"%{SSL_CLIENT_S_DN}x\" \"%{SSL_CLIENT_M_SERIAL}x\" \"%u\" %k %{Host}i \"%{X-Forwarded-Host}i\" \"%r\" %>s \"%{Location}o\" %Dus(%Ts) %IB(in) %OB(out) %bB(body out) \"%{User-Agent}i\" \"%{Referer}i\"" custom_combined

'DocumentRoot' may be specified relative to 'ServerRoot', but <Directory> statements require the full path and cannot be relative to 'ServerRoot'. However, environment variables (exported in /etc/apache2/envvars) may be used in <Directory> statements. For example, if envvars exports a variable named ServerRoot: <Directory "${ServerRoot}/htdocs">

For Name-Based Virtual Hosts, the first defined VirtualHost is used when no other VirtualHosts match the name provided by the client.
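
A minimal sketch of that ordering rule, assuming Apache 2.2 syntax (which still needs NameVirtualHost). The host names and DocumentRoot paths are placeholders, not part of any real setup.

```apache
NameVirtualHost *:80

# Defined first, so it also answers requests whose Host header
# matches none of the VirtualHosts below
<VirtualHost *:80>
    ServerName default.example.com
    DocumentRoot /var/www/default
</VirtualHost>

<VirtualHost *:80>
    ServerName www.example.com
    ServerAlias example.com
    DocumentRoot /var/www/example
</VirtualHost>
```

Putting a deliberately bland catch-all VirtualHost first keeps unknown host names from landing on a real site by accident.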

'CustomLog' may be used repeatedly to create multiple access log files. However, adding any CustomLog directives to a VirtualHost causes all server-level CustomLog directives to be ignored for that VirtualHost.
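
As a sketch of that override behavior, assuming the custom_combined format defined above and hypothetical log file names:

```apache
# Server level: used only by VirtualHosts that define no CustomLog of their own
CustomLog /var/log/apache2/access.log custom_combined

<VirtualHost *:80>
    ServerName www.example.com
    # These two replace the server-level CustomLog for this VirtualHost;
    # requests for www.example.com no longer appear in access.log
    CustomLog /var/log/apache2/example_access.log custom_combined
    CustomLog /var/log/apache2/example_referer.log "%{Referer}i -> %U"
</VirtualHost>
```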


mod_rewrite Quirks

To limit confusion and configuration mistakes, mod_alias and mod_rewrite should generally not be used together. However, if they are, all mod_rewrite directives are processed before the mod_alias directives, and the mod_alias directives are only processed if no RewriteRules match, or if the matching RewriteRule uses the 'PT' flag.
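
If the two modules do have to be combined, the 'PT' case might look like the sketch below. The /pics/ prefix is a made-up example, and the icon directory is the Debian default, so adjust both to your layout.

```apache
Alias /icons/ /usr/share/apache2/icons/

RewriteEngine On
# PT passes the rewritten URL back to URL mapping, so the Alias above applies.
# Without PT, the result would be treated as a file path and the Alias skipped.
RewriteRule ^/pics/(.+)$ /icons/$1 [PT]
```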

"RewriteOptions inherit" adds the parent context's rules AFTER the child context's rules. To add rules before the child context's rules, use an 'Include' file. For example, if you have a standard set of RewriteRule filters that you wish to apply to all VirtualHosts before the VirtualHost-specific rules, you should create a file like /etc/apache2/vhost-include with your standard rules, then put 'Include vhost-include' in each VirtualHost above the VirtualHost-specific rules.

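The Include approach described above could be laid out roughly as follows. The rule contents and host name are only illustrations; the relative Include path assumes ServerRoot is /etc/apache2, as on Debian/Ubuntu.

```apache
# --- /etc/apache2/vhost-include: rules shared by every VirtualHost ---
RewriteEngine On
# Example filter: refuse TRACE and TRACK requests
RewriteCond %{REQUEST_METHOD} ^(TRACE|TRACK)$
RewriteRule ^ - [F]

# --- In each VirtualHost definition ---
<VirtualHost *:80>
    ServerName www.example.com
    # Shared rules are read first, in file order
    Include vhost-include
    # VirtualHost-specific rules follow
    RewriteRule ^/old/(.*)$ /new/$1 [R=301,L]
</VirtualHost>
```
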
RewriteRule URL encoding behavior:
% must be escaped as \% in the substitution string.

Before being matched against RewriteRules, the request URL has its query string removed and is then URL-decoded. This does not affect server variables like %{REQUEST_URI}.

When redirecting, the target URL (after substitution, including re-appended query string) is partially URL-encoded (+, /, etc are not encoded, but other special characters are). The 'NE' flag disables this partial URL-encoding. This is not performed when proxying (regardless of the 'NE' flag).

The 'B' flag causes back-references ($1, $2, etc) to be fully URL-encoded (including +, /, etc) before substitution. The 'B' flag does not affect server variable substitutions (like %{REQUEST_URI}). Be careful of interactions between this and the partial URL-encoding performed on redirects after substitution.

URLs containing an encoded / (%2F) are refused with a 404 error before they even reach the RewriteRules. Set "AllowEncodedSlashes On" to allow them.
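
The flags above might be used as in the following sketch. The URL paths are hypothetical, modeled on the examples in the mod_rewrite flag documentation.

```apache
RewriteEngine On

# B: re-encode the back-reference before it is placed in the query string,
# so a decoded "&" or "+" from the path cannot change the query
RewriteRule ^/search/(.*)$ /search.php?term=$1 [B,PT]

# NE: keep the "#" literal in the redirect target instead of encoding it as %23
RewriteRule ^/anchor/(.+)$ /bigpage.html#$1 [NE,R]

# Accept URLs containing %2F instead of refusing them with a 404
AllowEncodedSlashes On
```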


Reverse Proxy Configuration

A Reverse Proxy can be defined using:
ProxyPass /front http://back/path
or:
RewriteRule ^/front/(.*)$ http://back/path/$1 [P,L]

A worker connection pool is automatically created for the destination of each ProxyPass directive, and is configured using any specified parameters. A pool is NOT automatically created if a RewriteRule is used (by default, each request results in a new connection to the back-end). The target URL of each request (after any rewriting) is used to select the worker (longest substring match wins). So, to manually create connection pools for use by RewriteRule directives:
ProxyPass / !
ProxyPass / http://back1/ retry=0
ProxyPass / http://back2/ retry=0
A pool is also automatically created if a ProxyPassMatch directive is used, but its destination URL is copied verbatim (including $1, $2, etc), so the worker will never substring match the target URL of a request, making the worker useless, and making ProxyPassMatch behave the same as a RewriteRule.

Due to the substring-based URL matching for worker connection pools, if you want to specify a custom 'connectiontimeout' or 'timeout' parameter for connections to the back-end based on a regex (for example, matching particular file types), you will need to use a host alias for the back-end server:
RewriteRule \.html$ http://back-alias%{REQUEST_URI} [P,L]
RewriteRule ^ http://back%{REQUEST_URI} [P,L]
ProxyPass / !
ProxyPass / http://back-alias/ retry=0 timeout=60
ProxyPass / http://back/ retry=0 timeout=120
(Where back-alias is another name for the same back-end server, perhaps defined in /etc/hosts)

Connections in the connection pool are typically closed after 15 idle seconds by the back-end server. Adjust 'KeepAliveTimeout' and 'MaxKeepAliveRequests' on the back-end server to keep the connections open longer. Note that the 'ProxyTimeout' directive sets both the default connection establishment and default idle connection timeouts. Also note that the ProxyPass 'max' and 'min' parameters are per MPM process, not global.

ProxyPassReverse* directives and mod_proxy_html do not have access to the HTTP_HOST server variable, but can use variables set by mod_rewrite. So, to store the requested hostname for use by other examples below:
RewriteRule ^ http://back%{REQUEST_URI} [E=HOSTNAME:%{HTTP_HOST},P,L]

To pass the Host header through to the back-end:
ProxyPreserveHost On

To allow the 'interpolate' flag to be used on ProxyPassReverse* directives (which enables environment variable substitution):
ProxyPassInterpolateEnv On

To fix links returned by the back-end in 'Location', 'Content-Location', and 'URI' Headers:
# If ProxyPreserveHost Off:
ProxyPassReverse /front http://back/path
# If ProxyPreserveHost On:
ProxyPassReverse /front http://${HOSTNAME}/path interpolate

To fix cookies returned by the back-end:
# If ProxyPreserveHost Off:
ProxyPassReverseCookieDomain back ${HOSTNAME} interpolate
# No trailing /
ProxyPassReverseCookiePath back front

To fix links in HTML returned by the back-end:
Install the libapache2-mod-proxy-html package and run:
ln -s /etc/apache2/mods-available/proxy_html.* /etc/apache2/mods-enabled/
# Enable mod_proxy_html and auto-extract gzip'd pages
SetOutputFilter INFLATE;proxy-html;DEFLATE
# Produce Transitional XHTML
# HTML is always re-parsed, by default into HTML4.01 (DOCTYPE tag is stripped), which will mess up formatting if content is XHTML
# Leave off 'Legacy' to produce Strict XHTML (strips any invalid elements/attributes)
# Can also specify custom DOCTYPE tag using 'ProxyHTMLDocType "" XML'
ProxyHTMLDocType XHTML Legacy
# Fix Links in CSS/JavaScript
ProxyHTMLExtended On
# Rewrite the specified attribute in the specified tag (for non-standard tags/attributes)
ProxyHTMLLinks tag attr
# Enable Environment Variable Substitution
ProxyHTMLInterp On
# Fix Absolute Links (if ProxyPreserveHost Off)
ProxyHTMLURLMap http://back/path/ http://${HOSTNAME|front}/path/ VL
# Fix Absolute Links (if ProxyPreserveHost On)
ProxyHTMLURLMap http://${HOSTNAME}/back/ http://${HOSTNAME}/front/ vVL
# Fix Relative Links (Note that ^ cannot be used when fixing JavaScript Links)
ProxyHTMLURLMap /back/ /front/


HTTPS Server Certificates

In addition to the standard certificate validation procedures, web browsers match the 'Common Name' and/or 'Subject Alternative Name' fields in the server's certificate against the host name in the URL being accessed, to validate that the server providing the certificate is the authoritative server for the web site the browser wants to access.

In the simple case, 'Common Name' should simply be set to the DNS host name of the Server.

If Name-Based Virtual Hosts are used with HTTPS, things get a bit complicated. The SSL certificate exchange happens before any HTTP messages are transferred (including the Host header, which is used for Name-Based Virtual Hosts). A new (Firefox 2.0, IE7) TLS extension called Server Name Indication (SNI) allows the host name to be sent during the TLS handshake, before the certificate exchange (see http://wiki.apache.org/httpd/NameBasedSSLVHostsWithSNI and http://en.wikipedia.org/wiki/Server_Name_Indication). However, unless both the client and server support SNI, the server cannot determine the requested host name until after sending its SSL certificate, so only a single SSL certificate may be used for all Virtual Hosts using a single IP.
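
Where SNI is available, a name-based HTTPS setup might look like the sketch below. It assumes Apache 2.2.12 or later built against an OpenSSL with SNI support; the host names and certificate paths are placeholders.

```apache
NameVirtualHost *:443

# First SSL VirtualHost: by default, clients that do not send SNI
# are served this certificate
<VirtualHost *:443>
    ServerName www.example.com
    SSLEngine on
    SSLCertificateFile /etc/ssl/certs/www.example.com.pem
    SSLCertificateKeyFile /etc/ssl/private/www.example.com.key
</VirtualHost>

<VirtualHost *:443>
    ServerName www.example.org
    SSLEngine on
    SSLCertificateFile /etc/ssl/certs/www.example.org.pem
    SSLCertificateKeyFile /etc/ssl/private/www.example.org.key
</VirtualHost>
```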

To list multiple host names in a certificate, use IPs as host names, or use wildcards:
(See RFC 2818 and http://wiki.cacert.org/wiki/VhostTaskForce)

Only a single value may be listed in the 'Common Name' field.

Multiple values may be listed in the 'Subject Alternative Name' field, syntax:
DNS:domain1.com, DNS:domain2.com, IP:1.1.1.1

Some Browsers (eg. Java) will not accept an IP in the 'Common Name' field, so always use 'Subject Alternative Name' to list IPs.

Most Browsers (eg. Firefox and IE) will ignore the 'Common Name' if 'Subject Alternative Name' is present, so make sure the 'Subject Alternative Name' includes the value listed in 'Common Name'.

Some Browsers will display the 'Common Name' but not the 'Subject Alternative Name', so put the most recognizable name in 'Common Name'.

Per RFC 2818, the '*' wildcard character may be used in either 'Common Name' or 'Subject Alternative Name' to match a single domain name component (ie. '*.domain.com' should match 'www.domain.com' but not 'domain.com' or 'www.sub.domain.com').

Some Browsers (eg. Java) allow '*.*.domain.com', but most Browsers do not (eg. Firefox >=3.0.13, Chrome, and IE). (See Mozilla Bug #159483 and MS KB #258858 ; Google uses a *.*.appspot.com cert even though it doesn't work).

Some Browsers (eg. Firefox, but not IE) will match all domains and sub-domains if '*' (with no other characters) is listed.

Server certificates (even those used for testing) should always be signed by a CA (even if it is a local CA) instead of being Self-Signed. Firefox will print a warning if the certificate is Self-Signed, even if it is trusted. Apache will complain if the Certificate's 'Basic Constraints' extension 'CA' flag is set to 'true', so either set it to 'false' or don't add it to the cert at all.


FIPS Compliance

The US Federal Government requires that all of its servers follow the NIST FIPS 140-2 guidelines. These guidelines disallow the use of SSLv2 and SSLv3 (TLSv1 or later is required), and specify a small set of ciphers that may be used within TLS:
DHE-DSS-AES256-SHA DHE-RSA-AES256-SHA AES256-SHA
DHE-DSS-AES128-SHA DHE-RSA-AES128-SHA AES128-SHA
EDH-DSS-DES-CBC3-SHA EDH-RSA-DES-CBC3-SHA DES-CBC3-SHA
(In order of descending preference ; See NIST Special Publication 800-52)

This is normally configured using:
SSLProtocol +TLSv1
SSLCipherSuite !ADH:!MD5:HIGH
However, that basically causes Apache to simply reject any non-conforming clients, which leads to a very poor user experience. (If rejected, Firefox will display a "no common encryption algorithms" error, while IE will simply display a generic "Connection Failed" error that gives the user no hints as to why it failed.)

For this reason, I prefer to instead configure Apache to accept any connection, then use mod_rewrite to reject any non-conforming clients, allowing me to present a descriptive error message to those clients, including instructions on how to correct the problem. To accomplish this, I edit /etc/apache2/mods-enabled/ssl.conf and replace the default 'SSLProtocol' and 'SSLCipherSuite' lines with:
SSLProtocol all
SSLCipherSuite HIGH:MEDIUM:LOW:-ADH:-MD5:ALL:COMPLEMENTOFALL
SSLHonorCipherOrder on
Then I add the following to the top of each SSL VirtualHost:
RewriteCond %{SSL:SSL_PROTOCOL} !=TLSv1
RewriteRule ^ /var/www/error/ssl_protocol.html [E=SSL_ERROR:1,L]
RewriteCond %{SSL:SSL_CIPHER} !^(DHE-DSS-AES256-SHA|DHE-RSA-AES256-SHA|AES256-SHA)$
RewriteCond %{SSL:SSL_CIPHER} !^(DHE-DSS-AES128-SHA|DHE-RSA-AES128-SHA|AES128-SHA)$
RewriteCond %{SSL:SSL_CIPHER} !^(EDH-DSS-DES-CBC3-SHA|EDH-RSA-DES-CBC3-SHA|DES-CBC3-SHA)$
RewriteRule ^ /var/www/error/ssl_cipher.html [E=SSL_ERROR:1,L]
LogFormat "%t %A:%{local}p %a \"%{X-Forwarded-For}i\" %{SSL_PROTOCOL}x %{SSL_CIPHER}x \"%{User-Agent}i\" \"%r\"" ssl
And add the following to either ssl.conf or each SSL VirtualHost (depending on whether I'm using server-level or VirtualHost-level CustomLog statements):
CustomLog /var/log/apache2/ssl_error.log ssl env=SSL_ERROR
And, of course, create ssl_protocol.html and ssl_cipher.html files containing the descriptive error messages.

Note that the above SSLCipherSuite is specifically crafted to use FIPS ciphers if available, but fall back to any other cipher (with a reasonable order of preference) if the FIPS ciphers are not supported. You can use "openssl ciphers -v 'HIGH:MEDIUM:LOW:-ADH:-MD5:ALL:COMPLEMENTOFALL'" to verify the cipher order. The Apache SSLCipherSuite documentation incorrectly states that the '+' prefix adds ciphers to the list, when in reality it only pulls existing ciphers to the end of the list (see "man ciphers").

If you do not use "SSLProtocol all", make sure you prefix each protocol specification with '+'. Otherwise, only the last protocol specification will be used. For example "SSLProtocol TLSv1 SSLv3" will only enable SSLv3 ; you should instead use "SSLProtocol +TLSv1 +SSLv3".


Browser / Apache Parallel Connection Behavior

By default, Firefox opens up to 6 simultaneous connections to each server, while IE7 and earlier open 2 (I think IE8 and later open 6).

Unfortunately, there is no standard way to tell a browser or user to back off (come back later or reduce the number of simultaneous connections) if the server is overloaded.
The "503 Service Unavailable" HTTP error is intended to provide this, and even allows a "Retry-After" header. Supposedly Google and other web spiders respect this, but neither Firefox nor IE will retry any request that resulted in a 503. The only reliable way to get a browser to temporarily back off is to drop SYN packets ; both Firefox and IE will retry after a short timeout, though Firefox is not very smart about it (it opens 6 connections for the first 6 requests, then reuses idle connections for additional requests, but if one of the 6 connections is being dropped, it will not close any idle connections until after it has given up retrying the bad connection).

In addition, neither Firefox nor IE will gracefully handle a 503 or any other error on sub-requests (for example, if an image request results in a 503, only the broken image icon will be displayed, and there is no way to manipulate that response to get a message to the user). So, if you need to display a message to users about an overload condition, you must differentiate between HTML requests and sub-requests.

Firefox will immediately retry "408 Request Timeout" HTTP errors, and can be forced to immediately retry using a 302/307 redirect to the same URL (the "Retry-After" header is not respected). However, IE will not retry a request that resulted in a 408, and will not follow a 302/307 redirect to the same URL (unless, for example, a bogus query string is added).

Apache doesn't actually accept a connection until the first data packet is received.
The TCP handshake for an HTTP Request looks like: SYN (client->server), SYN/ACK (server->client), ACK (client->server), HTTP Request Data (client->server). The ACK and first Data packet may or may not be combined into one packet. The Linux Kernel accepts SYN packets, replies with SYN/ACK, and normally passes the connection to the process's accept() call when the ACK is received. Apache uses the TCP_DEFER_ACCEPT socket option (see the 'AcceptFilter' directive and "man 7 tcp"), which basically causes the kernel to drop the bare ACK, and instead pass the connection to accept() when the first Data packet is received.

The Kernel accepts up to /proc/sys/net/ipv4/tcp_max_syn_backlog (default 1024) incompletely established connections, then uses syncookies for additional SYN packets (so it can forget the connection parameters and derive them again from the cookie data returned in the first ACK). Incomplete connections that aren't using syncookies will normally retry the SYN/ACK /proc/sys/net/ipv4/tcp_synack_retries times (default 3) before dropping the connection. When using TCP_DEFER_ACCEPT, the SYN/ACK is instead retried for the specified timeout before dropping the connection (Apache specifies a compiled-in timeout of 30 seconds).

Apache will accept() up to 'MaxClients' simultaneous client connections. The Kernel will queue up to the smaller of /proc/sys/net/core/somaxconn (default 128) and 'ListenBackLog' (default 511) additional established connections. Above this limit, the kernel will drop additional SYN packets (default, if /proc/sys/net/ipv4/tcp_abort_on_overflow == 0) or reject them with TCP RST (if tcp_abort_on_overflow == 1). There does not appear to be any way to return an HTTP error page when Apache is approaching its connection limit.
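
For reference, the two Apache-side limits mentioned above are set as follows. The values shown are only illustrative; on Debian, MaxClients normally sits inside the MPM-specific block of apache2.conf.

```apache
# Connections Apache will accept() and serve at the same time
MaxClients 150

# Requested length of the kernel's queue of established connections;
# the kernel silently caps this at /proc/sys/net/core/somaxconn
ListenBackLog 511
```

Raising ListenBackLog alone has no effect unless somaxconn is raised to match.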

Apache has no built-in mechanism for limiting simultaneous connections per client. iptables can track simultaneous connections per client IP and drop or reject SYN packets above a limit. The limitipconn2 module can return an HTTP error if a client has too many connections (but only after the request has been received - no limit is enforced on incomplete requests). Remember that organizations with outbound proxies may have a large number of clients behind a single IP.

To limit simultaneous connections per client using iptables:
iptables --new-chain connlimit-httpd
iptables --append INPUT --protocol TCP --destination-port 80 --syn --jump connlimit-httpd
iptables --append INPUT --protocol TCP --destination-port 80 --jump ACCEPT
iptables --append connlimit-httpd --match connlimit ! --connlimit-above <Max Concurrent Connections> --jump ACCEPT
iptables --append connlimit-httpd --match hashlimit --hashlimit-name connlimit-httpd --hashlimit-mode srcip --hashlimit-burst 1 --hashlimit-upto 6/m --jump LOG --log-prefix "iptables: cl-httpd: "
iptables --append connlimit-httpd --jump DROP # or '--jump REJECT -p TCP --reject-with tcp-reset'

After being accepted by Apache, incomplete requests normally time out only if the standard 'Timeout' idle timer or 'KeepAliveTimeout' idle timer expires. mod_reqtimeout can add timeouts and minimum rate limits on request header and request body transfers to prevent a client from holding open a nearly-idle connection indefinitely.
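
A possible mod_reqtimeout configuration is sketched below. The numbers follow the example in the module's documentation and are a starting point, not tuned values.

```apache
<IfModule reqtimeout_module>
    # Headers: allow 20 seconds, extended by 1 second per 500 bytes received,
    # up to a hard limit of 40 seconds.
    # Body: allow 20 seconds, extended as long as the client keeps sending
    # at least 500 bytes per second.
    RequestReadTimeout header=20-40,MinRate=500 body=20,MinRate=500
</IfModule>
```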


Handling Server Overload

First, iptables should be configured to drop SYN packets above a reasonable per-client simultaneous connection limit, and mod_reqtimeout should be configured appropriately. In general, I believe it is a bad idea to return an HTTP error if a single client is using too many connections, as this is no more effective at stopping malicious users than simply dropping SYN packets, and it may cause problems for a flood of legitimate clients behind an organization's outbound proxy.

If numerous clients are causing the server to hit its simultaneous connection limit:
For a temporary burst of connections, it is probably best to use the default queue/drop behavior to attempt to process the requests without returning errors. For a sustained flood of connections, it would probably be better to return a descriptive HTTP error, since shedding load and informing the user provides a much better user experience than queuing work and responding extremely slowly (perhaps only returning an error for HTML requests and not sub-requests, preferably a small error requiring no additional images or CSS).

While a single Apache server cannot do this itself, this can be implemented using a proxy in front of the application server, where the proxy is configured with a high connection limit (well above the app server's limit), a low 'ProxyTimeout' value (below the proxy's 'Timeout' value), and a custom 504 'ErrorDocument'.

This allows the application server to handle a temporary burst of connections by queuing/dropping, but effectively configures the proxy to measure the application server's response time and return clients an HTTP error if the application server is truly overloaded.

You will likely also want to set the ProxyPass 'retry' parameter to 0 to prevent a single timeout from blocking all subsequent requests with a 503 during the retry timeout. To specify different ProxyTimeout values for different kinds of requests (HTML vs Images, for example), see the Reverse Proxy notes above.
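
Put together, the front-end proxy described above might be configured roughly like this. The back-end name "app", the timeout values, and the error page path are all assumptions for illustration, and the error page is assumed to live under the proxy's own DocumentRoot.

```apache
# Front-end proxy: connection limit well above the application server's
MaxClients 1000

Timeout 60
# Give up on the application server well before the proxy's own Timeout
ProxyTimeout 10

# Serve the error page locally instead of proxying the request for it
ProxyPass /error/ !
ErrorDocument 504 /error/overloaded.html

# retry=0: a single timeout does not take the back-end out of service
ProxyPass / http://app/ retry=0
ProxyPassReverse / http://app/
```

Keeping overloaded.html small and free of external images or CSS avoids generating further requests while the application server is struggling.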

If a low ProxyTimeout cannot be used for some reason, a custom 503 'ErrorDocument' on the proxy and tcp_abort_on_overflow=1 on the application server can instead be used to return a descriptive error when the application server has reached its queued connection limit. However, note that this is less than ideal, since the response time may be poor before reaching the connection limit, or may be acceptable even above the limit, depending on numerous factors.