Who's Online with PHP and Memcached

Whenever you Google around for things like "Who's Online php", you'll find that a lot of the solutions are centered around using a database. However, is this really necessary? For a site with, say, 50,000 concurrent users each making one page request every eight seconds, that's 6,250 requests per second, which is a lot of database traffic if you're recording the user's activity on every request. The goal here: get Who's Online functionality off of the database. We'll explore a possible solution with Memcached that I've personally implemented, and thus far, it's been working great.

The first thing to consider: how real-time does something like "Who's Online" need to be? Is it acceptable for it to be accurate to, say, users who have been online within the last two minutes? Next, do we really need to know about every action a user takes on the site, or can we just mark them as online every so often? Not recording activity on each page request significantly reduces the amount of recording going on: for a site with 50,000 concurrent users, recording each user at most once every two minutes works out to roughly 420 writes per second at most.

Next, we have to keep in mind that, by default, a Memcached value can be at most one megabyte in size. If we're going to have hundreds of thousands of users online, it's possible to exceed the 1MB limit. Let's ignore this for now; we'll address it later. Let's assume that your site is relatively small and won't have more than a few hundred or a few thousand users online at any given time.

For this type of scenario, you can store a single array in Memcached. A decent structure looks like this:

    '12345' => [unix timestamp],
    '12346' => [unix timestamp],
    '[user id]' => [unix timestamp],

Your user ID value can be whatever you'd like, as long as it's unique. For example, your most common IDs will (should!) be numeric, but a unique username or GUID-based ID will work fine.
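As a PHP array, that structure might look something like the sketch below. The IDs, the offsets, and the variable name `$onlineUsers` are made up for illustration.

```php
<?php
// Hypothetical example data: user ID => Unix timestamp of last recorded activity.
$now = time();

$onlineUsers = [
    '12345'  => $now - 30, // seen 30 seconds ago
    '12346'  => $now - 90, // seen 90 seconds ago
    'a1b2c3' => $now,      // non-numeric IDs work as keys too
];
```

One PHP detail worth knowing: numeric string keys such as '12345' are cast to integers, so `$onlineUsers['12345']` and `$onlineUsers[12345]` refer to the same entry.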

Next, you store the timestamp of the user's last activity. A key point here is how accurate this data needs to be. Is it sufficient to know who was online in, say, the last two minutes? If so, let's define "being online":

Online: an authenticated user who viewed a page on the website within a given period of time.

In our case, let's say that the period of time is two minutes. You can code your application as follows:

If user is logged in

  1. Was user's online state recorded within the last 2 minutes (a timestamp for this can be recorded in a session or cookie value)?
    • YES: do nothing
    • NO:
      1. Retrieve array of online user data
      2. Update timestamp of user ID's last activity
      3. Save array of online user data back to Memcached
      4. Store online recording time (the current timestamp) to user session or cookie
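The steps above could be sketched in PHP roughly as follows. This is not the exact code from my site: the function name, the `online_users` key, and the `online_recorded_at` session field are all hypothetical, and `$cache` is assumed to be a connected `Memcached` instance (or anything with compatible `get()` and `set()` methods).

```php
<?php
// Sketch only. $cache is assumed to be a connected \Memcached instance.
// Returns true if the user's activity was recorded, false if it was skipped.
function recordOnline($cache, array &$session, $userId, int $now, int $interval = 120): bool
{
    // Recorded within the last $interval seconds? Do nothing.
    $last = $session['online_recorded_at'] ?? 0;
    if ($now - $last < $interval) {
        return false;
    }

    // 1. Retrieve the array (get() returns false when the key is missing).
    $online = $cache->get('online_users');
    if (!is_array($online)) {
        $online = [];
    }

    // 2. Update this user's last-activity timestamp.
    $online[$userId] = $now;

    // 3. Save the array back to Memcached.
    $cache->set('online_users', $online);

    // 4. Remember when we recorded, so later requests can skip the work.
    $session['online_recorded_at'] = $now;

    return true;
}
```

For a logged-in user, each page request would then call something like `recordOnline($memcached, $_SESSION, $userId, time());`. Most requests return early without touching Memcached at all, which is the whole point.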

Now, you need a backend process (or some sort of process) to clean up this array of online user data. For example, users that have not performed an activity in the past five minutes should be removed from this array. If a user has not been back within a given period of time, they should no longer be considered as online. This process could run out of cron, say, once a minute or every few minutes.

The process for this backend script would look like this:

  1. Retrieve array of online user IDs
  2. Iterate over all user IDs, checking their last activity timestamp
    1. If user's last activity is more than X seconds old (say, 5 minutes), remove them from the array
    2. If user's last activity is within the past X seconds, they can remain in the array
  3. Store array of user IDs back to Memcached
  4. For convenience, you may also consider storing the number of users in the array in a separate Memcached value; this makes displaying a Who's Online counter nice and cheap
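A minimal sketch of that cleanup script, assuming PHP 7.4 or newer (for the arrow function) and the same hypothetical `online_users` key. The `online_users_count` key and both function names are also made up for this example.

```php
<?php
// Sketch only: drop entries whose last activity is more than $maxAge seconds old.
function pruneOnlineUsers(array $online, int $now, int $maxAge = 300): array
{
    // array_filter() preserves keys, so user IDs stay attached to their timestamps.
    return array_filter(
        $online,
        fn ($lastSeen) => ($now - $lastSeen) <= $maxAge
    );
}

// $cache is assumed to be a connected \Memcached instance.
// Returns the number of users still considered online.
function cleanUpOnlineUsers($cache, int $now, int $maxAge = 300): int
{
    // 1. Retrieve the array of online users.
    $online = $cache->get('online_users');
    if (!is_array($online)) {
        $online = [];
    }

    // 2. Remove anyone who has been idle for too long.
    $online = pruneOnlineUsers($online, $now, $maxAge);

    // 3. Store the pruned array back.
    $cache->set('online_users', $online);

    // 4. Store the count separately so a counter never has to fetch the array.
    $cache->set('online_users_count', count($online));

    return count($online);
}
```

The cron job would just connect to Memcached and call `cleanUpOnlineUsers($memcached, time());`.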

I've implemented this exact process on a website of a decent size, and it's been working great for a few months now. In our case, we've seen peaks of up to 140 users online at a time.

Maybe there are some holes in it, though? I'm not an expert on all of the inner workings of Memcached, so maybe there's some sort of race condition here.
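On the race condition question: the retrieve, update, save sequence is not atomic, so two requests that update the array at the same moment can overwrite each other's change. If that matters for your site, Memcached's check-and-set (CAS) is one way to guard against it. The sketch below assumes version 3 or later of the PHP memcached extension, where `get()` accepts `Memcached::GET_EXTENDED`; the function name and retry count are hypothetical, and this is not what I'm running in production.

```php
<?php
// Sketch only. $cache is assumed to be a connected \Memcached instance
// (php-memcached 3.x). Retries when another request changed the array
// between our read and our write.
function touchOnlineUser($cache, $userId, int $now, int $maxTries = 5): bool
{
    for ($i = 0; $i < $maxTries; $i++) {
        // GET_EXTENDED returns ['value' => ..., 'cas' => ...] instead of the bare value.
        $found = $cache->get('online_users', null, \Memcached::GET_EXTENDED);

        if ($found === false) {
            // Key missing: add() only succeeds if nobody else created it first.
            if ($cache->add('online_users', [$userId => $now])) {
                return true;
            }
            continue; // someone beat us to it; re-read and try again
        }

        $online = $found['value'];
        $online[$userId] = $now;

        // cas() only writes if the value is unchanged since we read it.
        if ($cache->cas($found['cas'], 'online_users', $online)) {
            return true;
        }
    }

    return false; // gave up; the user will be recorded on a later request
}
```

For a Who's Online list, a lost update only means a user shows up a little late, so whether the extra round trips are worth it is a judgment call.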

This is, however, a simple way to implement Who's Online functionality without taxing a database. Remember, just because a Memcached value can be 1MB doesn't mean that it should be. If you find that your array is large enough that retrieving it from Memcached takes a while, consider splitting it up over a few cache keys to keep the retrievals cheap. Pulling 1MB down the wire every so often ain't cheap!
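One possible way to split the array, sketched under some assumptions: hash the user ID to pick one of a fixed number of keys, so each request only reads and writes a small slice. The shard count of 8, the key prefix, and the function names are arbitrary choices for this example.

```php
<?php
// Sketch only: map a user ID to one of $shards cache keys.
// crc32() is non-negative on 64-bit PHP builds, which this assumes.
function onlineShardKey($userId, int $shards = 8): string
{
    return 'online_users_' . (crc32((string) $userId) % $shards);
}

// All shard keys, for the cleanup script or a full Who's Online listing.
function onlineShardKeys(int $shards = 8): array
{
    return array_map(fn ($n) => 'online_users_' . $n, range(0, $shards - 1));
}

// $cache is assumed to be a connected \Memcached instance.
function fetchAllOnline($cache, int $shards = 8): array
{
    // getMulti() returns only the keys that were found (or false on failure).
    $found = $cache->getMulti(onlineShardKeys($shards));

    $all = [];
    foreach ($found ?: [] as $chunk) {
        if (is_array($chunk)) {
            // "+" keeps numeric user IDs as keys; array_merge() would renumber them.
            $all += $chunk;
        }
    }

    return $all;
}
```

The per-request recording then reads and writes `onlineShardKey($userId)` instead of the single key, and the cron job prunes each shard in turn.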

UPDATE: See a follow-up post with example code here!