Magento for Big Kids: Use a Consistent Cache Key Prefix

This is the first in what I hope to be a multi-part series on using Magento.  I've been working with it since June to run one of our core sites at Half Off Depot. During these six months, we have:

  • Built an incredible team
  • Had our application servers swap to disk and otherwise go belly up, on numerous occasions
  • Moved from FastCGI back to mod_php
  • Tuned Apache a bit to squeeze more out of our application servers
  • Moved from a single front end webserver to multiple
  • Introduced MySQL master-slave replication, when the slave actually in use -- it's not just a hot backup
  • Crafted a respectable Magento application deployment process using Capistrano
  • Cranked out some basic Magento extensions
  • Run countless sales, built mobile apps, launched new web applications and a private API

I don't claim to be a Magento expert after only six months (far from it), but I / we have learned a lot during this time.

In this first post, I'm going to unveil a handy tip:

Always use a consistent Magento cache key prefix.

This can be done in your app/etc/local.xml file like so:

<cache>
    ...
    <prefix>foo_</prefix>
    ...
</cache>

Magento performs poorly out of the box.  The crux of this is that its database is based on an Entity-Attribute-Value (EAV) data model.  Using APC is basically a requirement if you do any reasonable amount of traffic, too.  Big red flags, right? Anyways...

It also uses an aggressive caching model for things like:

  • Front end layout
  • Blocks of HTML used to render pages
  • Magento configuration data (because many core application config options are stored in the database)
  • Translation data
  • Full page cache
  • EAV types and attributes

Because we run on multiple front end web servers, we have to have a centralized cache.  We use Memcached for this. For example, if someone alters a product in our admin tool, which purges the cache item, the refreshed cache item needs to be available to all front end web servers at the same time. An inconsistency between our front end servers is not acceptable. Hence, the centralized cache.

When using Memcached, Magento uses Zend_Cache_Backend_TwoLevels under the hood because it has to maintain a mapping of cache type to the cache keys that make up the data for that type of cache. For example, you should be able to purge the layout or block cache without purging the entire set of cached data.  There's no sort of partial cache purge out of the box possible with Memcached.  You either mark all items invalidated, or delete what you want to, one key at a time. In our case, the mapping of cache type to its keys is stored in our MySQL database. That's not ideal, but...it is what it is for now.

But, here's the catch: Magento / Zend_Cache prefixes all cache keys.  If you don't define a cache key, Magento uses one that is formed with:

  1. Absolute path to your app/etc/ directory
  2. MD5 of this path string
  3. First three characters of the MD5 hash value

This makes all the sense in the world, right?  Maybe it does for some, but it doesn't for us.  We run one web site driven by this single Magento instance.

But, we also have a deployment process centered around Capistrano.  What does this boil down to?

  1. Code releases are kept at /foo/bar/site/releases/[YYYYMMDDHHIISS] of when the code was deployed
  2. /foo/bar/site/current is a symlink to /foo/bar/site/releases/[YYYYMMDDHHIISS] of the most recent release

So, as we deploy code, the absolute path to the app/etc/ directory on disk changes!

Thus, when we deploy a new release, we get extremely high cache churn -- the absolute path to the app/etc/ directory has changed because we move that current symlink!  We basically were rewriting all of our cache with every release, clobbering our Memcached instances and database.

We had to take the site down for a minute or two and refresh the cache in a controlled environment without many concurrent requests taking place. Once the cache had been warmed by a single user, all worked well once again.

Why not, say, throw an exception and barf all over the developer? Or provide a reasonable, consistent default value? Don't change it as filesystem changes occur. That's not intuitive at all. If you're running multiple sites and need a separate cache prefix, you're probably advanced enough that you've considered the implications of your approach already anyway.

So, what's the lesson here?  Define your own Magento cache prefix to ensure that it never changes.

Hope you find this helpful! Here are a few more topics that will be coming soon:

  • Deploying Magento to multiple web servers with Capistrano
  • Centralizing your /media directory
  • Don't touch Core!
  • Abstraction is for wimps: query the database yourself