Make your cheap Apache server Un-Slashdottable, Un-Digg-Effect-able

Cheap hosting plans or low-power servers, especially those running media-rich sites or slow, heavy code (like CMSes such as Drupal or Wordpress) can fall victim to sudden bursts of heavy traffic from popular social media sites such as Digg or Slashdot.


Drupal offers a great module called "Coral Defender" that saves your server without your lifting a finger, by automatically redirecting traffic from pre-defined high-traffic sites to the Coral Content Distribution Network, an automated web caching system. Coral Defender is pre-configured for Slashdot, Blogspot, BoingBoing, del.icio.us, Digg, Engadget, kuro5hin, Netscape, reddit, SomethingAwful, Technorati and twitter. Wordpress offers the very similar Digg Defender, which is pre-configured for Digg, Slashdot, Fark, SomethingAwful, kuro5hin, Engadget, BoingBoing and del.icio.us.


But not everyone runs Drupal or Wordpress. Underneath, these tools work by using regular expressions to match the Referer header in the HTTP request, and issuing a redirect if necessary. A redirect reply is a very small and efficient response from the server. Your server can crank out hundreds of 302 response codes from a RewriteRule and still use less CPU and bandwidth than spitting out ONE single pageview of your main site.
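
As a rough sketch of that mechanism, here is the smallest ruleset I can think of: one referrer condition and one redirect. It assumes mod_rewrite is loaded, uses Digg as the only example referrer, and borrows the nyud.net:8080 hostname form discussed later in this article. The [R,L] flags are my choice for the illustration; [R] with no status code sends a 302.

```apache
# Minimal sketch: send visitors arriving from one referrer to the Coral cache.
RewriteEngine On

# Match the Referer header sent by the browser ([NC] = case-insensitive).
RewriteCond %{HTTP_REFERER} digg\.com [NC]

# Reply with a small 302 redirect instead of rendering the page.
# The optional leading slash is matched outside the capture group,
# so $1 holds the path without it.
RewriteRule ^/?(.*)$ http://%{HTTP_HOST}.nyud.net:8080/$1 [R,L]
```

A visitor who types the URL directly, or arrives from any other site, fails the RewriteCond and is served by your server as usual.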


Martin Fitzpatrick at MuTube has already written a great article on the underlying mechanism, explaining the details of Coral, mod_rewrite and referrers. Go read the article, because it's all the meaty info you really ought to learn about. Martin's sample script is rigged for Digg, BlogSpot, reddit and slashdot.


My contribution here is to take Martin's script, which can be used on any Apache server with mod_rewrite (even one running Drupal or Wordpress!), generalize it, extend it with every high-traffic source site I could think of, and make it an easy download. EVERY Apache server out there that isn't already prepared for heavy traffic through a load balancer or the like should install this script. It can only help you if you install it BEFORE someone links to a popular page on your server. So, either do it now, or hope you never get popular.


The nice thing about the Coral cache is that it caches the Javascript code used by ad networks like AdSense, and that code gets run each time the cached page gets viewed, so you get an ad impression (and potential revenue-generating click) even from the cached copy! The Digg Defender WordPress module and the Coral Defender Drupal module rewrite in-page local URLs to Coral as well, which saves you additional redirects.


Curious about what sites to include? Try using Google Sets, and feeding it a few known high-traffic sites. Creepy how smart Google is.


Bob Maple points out that the RewriteRule should match on ^/(.*)$ instead of Martin's ^(.*)$. This is because the requests coming into the server should always start with /. With Martin's pattern the leading / is captured into $1, so when you assemble a new URL with http://%{HTTP_HOST}.nyud.net:8080/$1, it will look like http://myhost.nyud.net:8080//foo.html. While most servers will tolerate the // between the port number and the path/file, it's improper and ugly, and easy enough to solve in the regex match.
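
To make the difference concrete, here are the two patterns side by side for a request to /foo.html. The [R,L] flags are my assumption for the illustration, not necessarily what Martin or Bob used. One caveat that is mine rather than Bob's: in .htaccess (per-directory) context mod_rewrite strips the leading slash before matching, so ^/(.*)$ will not match there. The third variant, with an optional slash, is a common way to cover both contexts.

```apache
# Martin's pattern: in server or virtual-host context $1 is "/foo.html",
# so the target becomes http://myhost.nyud.net:8080//foo.html
#RewriteRule ^(.*)$ http://%{HTTP_HOST}.nyud.net:8080/$1 [R,L]

# Bob Maple's pattern: the slash sits outside the capture group,
# so $1 is "foo.html" and the target has a single slash.
# Works in server and virtual-host context only.
#RewriteRule ^/(.*)$ http://%{HTTP_HOST}.nyud.net:8080/$1 [R,L]

# Variant with an optional slash (my assumption, not from Bob's note):
# matches "/foo.html" in server config and "foo.html" in .htaccess.
RewriteRule ^/?(.*)$ http://%{HTTP_HOST}.nyud.net:8080/$1 [R,L]
```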


Another important aspect is that the LAST RewriteCond must omit the [OR] option.

#add rules ABOVE this one, do NOT change or remove this one,
#as it needs to omit the [OR] option for the ruleset to work
RewriteCond %{HTTP_REFERER} example\.com [NC]

OR combines the current condition with the NEXT condition using OR logic rather than the default AND. This allows you to chain a long list of referrer sources together and attach it to a short set of other conditions above. If you look carefully at Martin's example, he leaves off the OR (and the NC as well) on his last condition for Slashdot, but he doesn't really draw attention to it or explain why.

RewriteCond %{HTTP_REFERER} slashdot\.org

Leaving off NC is a mistake, since the match then becomes case-sensitive. Failing to omit the OR is worse: it will make your rules misfire and redirect EVERYONE, because the last condition is trying to OR with a following RewriteCond that doesn't exist, which effectively acts as an open wildcard. In my case, I've added a final condition referencing the fictional example.com, which you'll never get referrers from. This way you don't have to remember to omit OR from your last real condition. You can just copy/paste/modify existing lines, as long as you keep them above the comment telling you not to mess with the final line.
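
Put together, the ruleset looks roughly like the sketch below. This is not the contents of the downloadable file, just an illustration with three referrers. The two guard conditions at the top are an assumption on my part: they are how I recall Coral's own published recipe keeping its proxies from being redirected back to themselves, so verify them before relying on them. Apache evaluates this as guard AND guard AND (digg OR slashdot OR reddit OR example.com).

```apache
RewriteEngine On

# Guard conditions (ASSUMPTION, recalled from Coral's recipe, not from the text above):
# skip requests made by Coral's own proxies so the redirect cannot loop.
RewriteCond %{HTTP_USER_AGENT} !^CoralWebPrx
RewriteCond %{QUERY_STRING} !(^|&)coral-no-serve$

# Referrer chain: every real entry carries [OR] so any one of them can match.
RewriteCond %{HTTP_REFERER} digg\.com [NC,OR]
RewriteCond %{HTTP_REFERER} slashdot\.org [NC,OR]
RewriteCond %{HTTP_REFERER} reddit\.com [NC,OR]
#add rules ABOVE this one, do NOT change or remove this one,
#as it needs to omit the [OR] option for the ruleset to work
RewriteCond %{HTTP_REFERER} example\.com [NC]

# Only reached when the guards pass AND at least one referrer matched.
RewriteRule ^/?(.*)$ http://%{HTTP_HOST}.nyud.net:8080/$1 [R,L]
```

If the example.com line also carried [OR], a request with no matching referrer would fall off the end of the chain and the RewriteRule would fire anyway, which is the redirect-EVERYONE failure described above.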


Installation:
First, make sure you have mod_rewrite installed properly.
Then, drop the contents of the attached RewriteRules.txt (listed at the bottom of this article) into your server config. Part of the beauty of mod_rewrite is that RewriteRule can be employed in the context of either server config (global to everywhere on the server), virtual host (applies globally to an entire site), directory (just what it says) or htaccess. The first three contexts are normally set up in the server config file(s), which many virtual hosting customers won't have available to them. However, htaccess is simply a file inside your site directory, and is usually available even to virtual server hosting customers.


So, if you run a number of sites on your own server, consider installing this globally in your server config. If you don't run your own server, go for the htaccess route.
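
Here is a sketch of where the rules might sit in a server config. The hostname and paths are placeholders, not anything from my own setup. Two things worth knowing: by default, rewrite rules in the main server context are not inherited by virtual hosts (each virtual host needs its own rules, or RewriteEngine On plus RewriteOptions Inherit), and .htaccess rules only take effect if the server config grants AllowOverride FileInfo for that directory.

```apache
# "www.example.org" and /var/www/example are hypothetical placeholders.
<VirtualHost *:80>
    ServerName www.example.org
    DocumentRoot /var/www/example

    # IfModule keeps Apache from refusing to start if mod_rewrite is absent,
    # but it also means the rules are silently skipped in that case.
    <IfModule mod_rewrite.c>
        RewriteEngine On
        # ... paste the RewriteCond / RewriteRule lines here ...
    </IfModule>

    # Needed only if the rules go into /var/www/example/.htaccess instead.
    <Directory /var/www/example>
        AllowOverride FileInfo
    </Directory>
</VirtualHost>
```

For the htaccess route, the same RewriteEngine, RewriteCond and RewriteRule lines go straight into the .htaccess file, without the VirtualHost or Directory wrappers.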


Notes:
The machine running Arcticus.com (and some other small sites) is a rackmount dual Xeon HT 2.4GHz with 2GB RAM, running Ubuntu 7.04/feisty and Apache 2.2.3, colo'ed at Red Rocks Data Center, hiding in a bunker at a former satellite uplink facility. Kudos to Tom and Tim at RRDC for providing such great service.


Todo:
As others have noted, it would be nice if this feature could automatically turn on only when the server was under load. I don't know of a way to accomplish this with pure mod_rewrite, since the server variables available in the match string are limited and offer no insight into the server load. One way to do it would be to have some other hook add these rules to .htaccess (or the server config) whenever load exceeded a threshold, but that's beyond the scope of this article.
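
One possible shape for that hook, offered as an untested sketch: mod_rewrite's -f test can check whether a file exists, so the rules can stay installed permanently and be switched on by a flag file. The path below is hypothetical, and whatever creates and removes the file (a cron job watching the load average, say) is outside Apache and not shown.

```apache
RewriteEngine On

# Hypothetical flag file: an external job creates it while load is above
# a threshold and deletes it afterwards. If the file is absent, this
# condition fails and no redirect happens.
RewriteCond /var/run/coral-redirect.on -f

RewriteCond %{HTTP_REFERER} digg\.com [NC,OR]
RewriteCond %{HTTP_REFERER} slashdot\.org [NC,OR]
#add rules ABOVE this one, do NOT change or remove this one,
#as it needs to omit the [OR] option for the ruleset to work
RewriteCond %{HTTP_REFERER} example\.com [NC]

RewriteRule ^/?(.*)$ http://%{HTTP_HOST}.nyud.net:8080/$1 [R,L]
```

This avoids rewriting .htaccess or reloading the server config on the fly, at the cost of one file-existence check per request. The Apache user needs permission to see the flag file's path.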

Attachment          Size
RewriteRules.txt    2.13 KB