Sunday, September 23, 2007

Turbo-charge your PHP website with query caching

Many PHP websites also rely on some kind of back-end storage, most common I would guess is MySQL. But no matter what database server you utilize, you will sooner or later come to a point in time where your traffic exceeds your servers capabilities. First time this happens you will probably start optimizing your queries, perhaps caching some results in the session, adding a few indices to your tables and so on. This works for some time, and even might work over and over again, but working like this, killing fires as they flame up, can be quite stressful (and annoying!)

So, how to avoid this then? I have developed a simple enough method that cache entire query results using the query as the cache key that works out great for me most of the time. Since the queris can be quite long, I really use the MD5 sum of the query, but that's just one strategy. I use a range of different cache backends for storing my cache data. Sometimes I can use a local fast filesystem (like tmpfs), so I use a file-based cached, but in some cases when I need extreme performance, I might "couple" shared-memory and memcached storage. Caching some data in the session might still be a good idea though.

Getting it right from the start is one thing, if you're able to do that (your site is not crwaling on its knees quite yet :). It will definately help you a lot. If you're already stuck with thousands and thousands lines of code, maybe this will help you get started..?

You need to look closely at every piece of data you pull from the database and consider if, how, and for how long you would be able to cache this "object". You will also need to determine when it needs to be expired (re-read) from the database.

Let's take a look at a simple example from a typical site, where querying the number of unread messages for a (logged in) user is a common operation (maybe even every page load).


// Example code:
function getNumUnreadMessages($user_id)
{
$sql = "SELECT COUNT(*) FROM messages WHERE recevier_user_id = $user_id AND status='unread'";
$results = db_fetch_value($sql);

return $results;
}


A quick thought tells us it's not really efficient to check this value every page load. The first approach might be to add a variable holding the timestamp of the last check in the session and check against this each time the function is called, but this will quickly become a little bit cluttered and tricky to keep track of when you have alot of queries.

Providing a simple wrapper function to the above database call is the first step towards implementing cached queries. Some code says more than a thousand words:


function cache_db_fetch_value($sql, $cache_key = false, $cache_time=600)
{
if(!$cache_key)
$cache_key = md5($sql);
$res = Cache::get($cache_key);
if($res === FALSE) {
$res = db_fetch_value($sql);
if($res !== FALSE) {
Cache::set($cache_key, $res, $cache_time);
}
}

return $res;
}

function getNumUnreadMessages($user_id)
{
$sql = "SELECT COUNT(*) FROM messages WHERE recevier_user_id = $user_id AND status='unread'";
$results = cache_db_fetch_value($sql);

return $results;
}



I guess you get the picture? We wrap all calls to our original db_fetch_value() in a method that handles the caching "for us", automatically.


However, this will not immediately notify the user if he gets a message, to do that we need to tweak a little bit more. We need to specify the cache-key in a way that makes it identifiable via the user_id when caching the number of unread messages. Then, when a message is sent to a user, we simply need to Cache::remove() it, and the next time the user checks his unread messages, he will find that he has one!




function getNumUnreadMessages($user_id)
{
$sql = "SELECT COUNT(*) FROM messages WHERE recevier_user_id = $user_id AND status='unread'";
$results = cache_db_fetch_value($sql, "unread_messages_{$user_id}");

return $results;
}

function sendMessage($sender_id, $receiver_id, $message)
{
// INSERT INTO messages ...
Cache::remove("unread_messages_{$receiver_id}");
// ...
}



Let's go on with trying out a simple Cache implementation that you can start trying out with your own code.


This Cache class provides three basic self-explanatory methods, set,get and remove. The first function argument is always the cache_key. This implementation is very simple and stores cached objects on file in a temporary directory. I am sure you can build something more suitable for your environment.



public class Cache
{
static CACHE_DIR = "/tmp/cache";

function makeFileName($cache_key)
{
return Cache::CACHE_DIR . DIRECTORY_SEPARATOR . md5($cache_key);
}

function set($cache_key, $obj, $cache_time=600)
{
$filename = $this->makeFileName($cache_key);
$cache_obj = Array(time()+$cache_time, $obj);
if(file_put_contents($filename, serialize($cache_obj), LOCK_EX)) {
return TRUE;
}
return FALSE;
}

function get($cacke_key)
{
$filename = $this->makeFileName($cache_key);
if(is_file($cache_key)) {
list($expire, $obj) = unserialize(file_get_contents($filename));
if($expire < time()) {
@unlink($filename);
return FALSE;
}
return $obj;
}
return FALSE;
}

function remove($cache_key)
{
$filename = $this->makeFileName($cache_key);
@unlink($filename);
}
}



Of course you don't have to use this for only caching queries, it can be useful to cache many other things, like RSS streams, config files, static files etc etc.


Happy Caching for now!