Example: Who's Online with PHP and Memcached

I figured it best to give an example to back up my last post entitled "Who's Online with PHP and Memcached." First, let's look at the WhosOnline class itself. This class is meant to be a Singleton, so you have to access it with WhosOnline::getInstance().

Also, DISCLAIMER: I wrote this code in about 20-30 minutes. There may be little odds and ends-type problems with it, but please post comments if you've got feedback!

/**
 * Class for accessing Who's Online data via Memcached.
 *
 * @author Brian DeShong
 */
class WhosOnline
{
    const RECORDING_DELAY_SECONDS = 120;
    private static $_instances = array();
    private $_mc;

    /**
     * Protected constructor to force use as a singleton.
     *
     * @param Memcache $mc Memcache object.
     */
    protected function __construct(Memcache $mc)
    {
        $this->_mc = $mc;
    }

    /**
     * Classic Singleton getInstance() method.  Allows for multiple
     * WhosOnline instances, though.  For example, maybe you want to use one
     * Memcached pool for users online in your forums, and another for users
     * online in your online dating application.  Coupling a different
     * Memcache object with a different $uniqueId allows this.
     *
     * @param Memcache $mc Memcache object.
     * @param string $uniqueId Unique ID of the object; optional.
     * @return WhosOnline
     */
    public static function getInstance(Memcache $mc, $uniqueId = 'default')
    {
        if (!isset(self::$_instances[$uniqueId])) {
            self::$_instances[$uniqueId] = new self($mc);
        }

        return self::$_instances[$uniqueId];
    }

    /**
     * Determines if current user's online status needs to be recorded or
     * updated.
     *
     * @return bool
     * @todo This method shouldn't reach out to $_SESSION.
     */
    public function needToRecordOnline()
    {
        // Record if we never have, or if the last recording is older than
        // the delay window.
        return
            !isset($_SESSION['lastOnlineRecorded']) ||
            $_SESSION['lastOnlineRecorded'] <
                time() - self::RECORDING_DELAY_SECONDS;
    }

    /**
     * Records given user ID as being online and records last activity
     * timestamp.
     *
     * @param int $userId User ID.
     * @return bool
     */
    public function recordOnline($userId)
    {
        if (!$this->setUserOnline($userId)) {
            return false;
        }

        $_SESSION['lastOnlineRecorded'] = time();
        return true;
    }

    /**
     * Gets array of all users online.  Array is keyed by user ID with activity
     * timestamp as the value.
     *
     * @return array
     */
    public function getUsersOnline()
    {
        $usersOnline = $this->_mc->get('usersOnline');

        return ($usersOnline !== false ? $usersOnline : array());
    }

    /**
     * Sets an array of user IDs with their activity timestamps.
     *
     * @param array $usersOnline Array of user IDs online.
     * @return bool
     */
    public function setUsersOnline(array $usersOnline)
    {
        return
            $this->_mc->set('usersOnline', $usersOnline) &&
            $this->_mc->set('numUsersOnline', count($usersOnline));
    }

    /**
     * Sets given user ID as being online.
     *
     * @param int $userId User ID.
     * @return bool
     */
    protected function setUserOnline($userId)
    {
        $usersOnline = $this->getUsersOnline();
        $usersOnline[$userId] = time();
        return $this->setUsersOnline($usersOnline);
    }
}

Note the primary methods:

  • WhosOnline::getInstance()
  • WhosOnline::needToRecordOnline()
  • WhosOnline::recordOnline()
  • WhosOnline::getUsersOnline()
  • WhosOnline::setUsersOnline()

The main reason we leave setUsersOnline() public is so that a back end script can call it to clean up the entire array of user IDs online.

Next, our example file using this class:

wol_test.php

<?php
require_once './wol.php';

// Start up the session and assign a user ID.  Typically you would do this at
// authentication time.
session_start();

if (!isset($_SESSION['user_id'])) {
    $_SESSION['user_id'] = uniqid();
}

// Connect to Memcached and grab the Who's Online object.
$mc = new Memcache();
$mc->connect('localhost', 11211);
$who = WhosOnline::getInstance($mc);

// If user needs to be recorded as online, do so.
if ($who->needToRecordOnline()) {
    $who->recordOnline($_SESSION['user_id']);
}

// Grab users online to display; typically you would never do this on the
// front end, though.
$usersOnline = $who->getUsersOnline();
?>
Your session data:
<pre>
<?php echo print_r($_SESSION, true); ?>
</pre>

Users online: <?php echo count($usersOnline); ?>
<pre>
<?php echo print_r($usersOnline, true); ?>
</pre>

I placed the example wol_test.php file in my DocumentRoot and ran it through ApacheBench a few times, like so:

ab -c 10 -n 1000 http://localhost/wol_test.php

This causes the wol_test.php page to be requested 1,000 times at a level of 10 concurrent requests. ab doesn't send a session cookie back, so every request starts a new session and counts as a new user. I did this a few times and ended up with over 3,000 users in my array of users online. Based on a manual get from Memcached like so:

get usersOnline
VALUE usersOnline 1 126727

...we see that with over 3,000 users online, it only takes up 126,727 bytes in Memcached. Remember, the PECL extension for Memcache serializes any non-scalar values before storing them, so you have a cost associated with the serializing and unserializing of the array. Doing the math here, at roughly 34 bytes per user, a 1MB serialized array will hold 30,838 users online. You'll be able to squeeze more out of it if you have integer user IDs; I'm using uniqid() here just for example purposes.
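
To check the size math without touching Memcached, here's a rough sketch that builds a fake users-online array and measures its serialized length. The helper name, the 'u' plus 12 digits IDs (same 13-character length as uniqid()), the fixed timestamp, and the six-digit integer IDs are all assumptions made for illustration. With 3,727 entries the string-ID case comes out at exactly the 126,727 bytes shown above, which suggests that's how many users were in the array.

```php
<?php
// Hypothetical helper, not part of the WhosOnline class: builds a fake
// users-online array and returns its serialized size in bytes.
function serializedSizeOfUsersOnline($numUsers, $useIntegerIds)
{
    $usersOnline = array();
    $timestamp = 1190480000; // fixed 10-digit Unix timestamp for repeatability

    for ($i = 0; $i < $numUsers; $i++) {
        // 'u' plus 12 digits mimics the 13-character length of uniqid();
        // the integer branch starts at 100000 to model six-digit user IDs.
        $userId = $useIntegerIds ? 100000 + $i : sprintf('u%012d', $i);
        $usersOnline[$userId] = $timestamp;
    }

    return strlen(serialize($usersOnline));
}

$stringIdBytes  = serializedSizeOfUsersOnline(3727, false);
$integerIdBytes = serializedSizeOfUsersOnline(3727, true);

echo "string IDs:  $stringIdBytes bytes\n";
echo "integer IDs: $integerIdBytes bytes\n";
```

Each string-ID entry serializes to 34 bytes (21 for the key, 13 for the timestamp), while a six-digit integer ID brings that down to 22 bytes per entry.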

But is this a good idea? Retrieving 1MB, or even 127k, from Memcached every so often isn't cheap. Remember, you are:

  1. Retrieving string with serialized array of users online from Memcached
  2. Unserializing the string
  3. Adding user or updating their activity timestamp
  4. Serializing the array again
  5. Storing string back to Memcached

...this isn't cheap. This is probably going to be more sluggish than you're willing to accept, and I doubt it'd scale well as you crept up into thousands of users online. I'm here with over 4,000 users in my array, and it performs well, but it's also on a page with nothing else -- once you tack on database queries and all sorts of other junk to render a page, you may be looking at a page that renders in over 0.5 seconds.
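
The five steps above can be timed in isolation with a sketch like the following. It's only an approximation: recordOnlineRoundTrip() is a hypothetical function, the "stored value" is a plain PHP string rather than something fetched from Memcached, and network time is left out entirely, so it measures only the unserialize/modify/serialize part of the cost.

```php
<?php
// Hypothetical stand-in for one recordOnline() call: takes the serialized
// string as it would come back from Memcached and returns the string that
// would be stored again.  No Memcached connection is involved.
function recordOnlineRoundTrip($storedValue, $userId, $now)
{
    // Steps 1-2: "retrieve" the string and unserialize it.
    $usersOnline = ($storedValue === false)
        ? array()
        : unserialize($storedValue);

    // Step 3: add the user or update their activity timestamp.
    $usersOnline[$userId] = $now;

    // Steps 4-5: serialize again and hand back the string to "store".
    return serialize($usersOnline);
}

// Build a stored value holding 4,000 made-up users, then time one round trip.
$usersOnline = array();
for ($i = 0; $i < 4000; $i++) {
    $usersOnline[sprintf('u%012d', $i)] = 1190480000;
}
$storedValue = serialize($usersOnline);

$start = microtime(true);
$storedValue = recordOnlineRoundTrip($storedValue, 'newuser000001', time());
$elapsedMs = (microtime(true) - $start) * 1000;

printf("round trip over %d users: %.3f ms\n",
    count(unserialize($storedValue)), $elapsedMs);
```

The number it prints depends entirely on your hardware and PHP version, so treat it as a way to compare array sizes against each other rather than as a benchmark.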

In a situation like this, you could consider splitting Who's Online data up into multiple values in Memcached. Basically, you can write your application code to use, say, 10 "buckets" of users online. You would randomly select one of the 10 buckets to add/modify the user. The key in a situation like this is to have a back end process that merges all of the arrays together, iterates over them removing stale users, and evenly distributes them back into Memcached.
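
As a rough sketch of the bucket idea, and not a finished implementation, the merge/prune/redistribute pass might look like this. Every name here is hypothetical (the usersOnline_N key scheme, pickBucketKey(), mergeBuckets(), removeStaleUsers(), redistribute()), and the buckets are plain PHP arrays standing in for the values you'd get and set in Memcached.

```php
<?php
// Front end: pick one of the buckets at random for this write.
function pickBucketKey($numBuckets)
{
    return 'usersOnline_' . mt_rand(0, $numBuckets - 1);
}

// Back end, step 1: merge every bucket into one array.  Because writes are
// random, a user may appear in several buckets; keep the newest timestamp.
function mergeBuckets(array $buckets)
{
    $merged = array();
    foreach ($buckets as $bucket) {
        foreach ($bucket as $userId => $timestamp) {
            if (!isset($merged[$userId]) || $timestamp > $merged[$userId]) {
                $merged[$userId] = $timestamp;
            }
        }
    }
    return $merged;
}

// Back end, step 2: drop users not seen within $maxAgeSeconds.
function removeStaleUsers(array $usersOnline, $now, $maxAgeSeconds)
{
    foreach ($usersOnline as $userId => $timestamp) {
        if ($timestamp < $now - $maxAgeSeconds) {
            unset($usersOnline[$userId]);
        }
    }
    return $usersOnline;
}

// Back end, step 3: deal the remaining users back out round-robin.
function redistribute(array $usersOnline, $numBuckets)
{
    $buckets = array();
    for ($i = 0; $i < $numBuckets; $i++) {
        $buckets['usersOnline_' . $i] = array();
    }
    $i = 0;
    foreach ($usersOnline as $userId => $timestamp) {
        $buckets['usersOnline_' . ($i % $numBuckets)][$userId] = $timestamp;
        $i++;
    }
    return $buckets;
}

// Usage with two small made-up buckets; "now" is 1000 and the cutoff is 300.
$buckets = array(
    'usersOnline_0' => array('alice' => 1000, 'bob' => 400),
    'usersOnline_1' => array('alice' => 900, 'carol' => 950),
);
$fresh   = removeStaleUsers(mergeBuckets($buckets), 1000, 300);
$buckets = redistribute($fresh, 2);
```

In the usage example, bob is dropped as stale and alice's duplicate entry collapses to her newest timestamp, leaving one user in each bucket. A real version would also have to decide what happens to writes that land between the merge and the write-back, which this sketch ignores.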

I've started coding an example of this, but don't really have the will to finish it right now. :) Maybe later.

Lastly, let's look at the back end batch process that keeps the array of users online tidy; typically this script would run as a cronjob:

whos_online_cleanup.php

require_once './wol.php';

$now = time();
$mc = new Memcache();
$mc->connect('localhost', 11211);
$who = WhosOnline::getInstance($mc);

$usersOnline = $who->getUsersOnline();

if (empty($usersOnline)) {
    print "no users online; exiting\n\n";
    exit();
}

print "num users online: " . count($usersOnline) . "\n\n";
print "processing users...\n";

$numUsersRemoved = 0;

// Anyone not seen in the last 300 seconds (5 minutes) is considered offline.
foreach ($usersOnline as $userId => $timestamp) {
    if ($timestamp < $now - 300) {
        print "removing $userId; last seen " .
            ($now - $timestamp) . " seconds ago\n";
        unset($usersOnline[$userId]);
        $numUsersRemoved++;
    }
}

print "num users removed: $numUsersRemoved\n";
print "current num users online: " . count($usersOnline) . "\n";
print "saving users online...";
print ($who->setUsersOnline($usersOnline) ? 'done!' : '** FAILED **');
exit();

Here's some example output from it:

brian@henery [/web/pages]$ php ./whos_online_cleanup.php
num users online: 56

processing users...
removing 46f549b786d68; last seen 496 seconds ago
removing 46f549b786e7c; last seen 480 seconds ago
removing 46f549b786ef2; last seen 476 seconds ago
removing 46f549b7871b5; last seen 445 seconds ago
removing 46f549b787213; last seen 480 seconds ago
removing 46f549b78a105; last seen 482 seconds ago
removing 46f549b789071; last seen 389 seconds ago
removing 46f549b7927eb; last seen 467 seconds ago
removing 46f549b79372a; last seen 437 seconds ago
removing 46f549b798931; last seen 487 seconds ago
removing 46f549b79b50b; last seen 423 seconds ago
removing 46f549b79bc0a; last seen 381 seconds ago
removing 46f549b79d000; last seen 379 seconds ago
removing 46f549b79dfa6; last seen 398 seconds ago
removing 46f549b79fb99; last seen 472 seconds ago
removing 46f549b7a3975; last seen 502 seconds ago
removing 46f549b7a8278; last seen 373 seconds ago
removing 46f549b7ab905; last seen 407 seconds ago
removing 46f549b7d635e; last seen 389 seconds ago
removing 46f549b7d63b0; last seen 500 seconds ago
removing 46f549b7d63d8; last seen 453 seconds ago
removing 46f549b7da797; last seen 309 seconds ago
removing 46f549b7dbb9a; last seen 491 seconds ago
removing 46f549b7dce8a; last seen 396 seconds ago
removing 46f549b7dddb0; last seen 361 seconds ago
removing 46f549b7e3da6; last seen 353 seconds ago
num users removed: 26
current num users online: 30
saving users online...done!

So, you can just cron this like so:

* * * * * /usr/local/bin/php /some/path/to/whos_online_cleanup.php > /dev/null 2>&1

...and feel free to redirect STDOUT to a log file if you'd like.

Some quick stats: with over 1,000 users in the array, running the cleanup script takes under 0.2 seconds:

brian@henery [/web/pages]$ time php ./whos_online_cleanup.php
num users online: 1221

...[snip]...

num users removed: 470
current num users online: 751
saving users online...done!
real    0m0.195s
user    0m0.040s
sys     0m0.040s

...so it's pretty speedy. It's worth noting that all of this is being done on my Mac Mini Core Solo with 2 GB RAM running PHP 5.2.4 and Apache 2.2.x on OS X 10.4.10. Oh, and with just over 5,000 users in the array, the script runs in 0.58 seconds.

So...pretty straightforward, right? What do you think? Surely there's room for improvement...