Monthly Archives: September 2007

Zend PHP Conference 2007

Ah, the smell of Fall is in the air…oh, no…wait…that’s the stench of ZendCon approaching. :)

Next week, I’ll be attending and speaking at the Zend PHP Conference in Burlingame, CA, right near San Francisco. I’ll be giving two talks…well, 1.5. First, I’ll be debuting my talk “The Grown-Up Company’s Guide to Development.” This particular talk focuses on increasing the quality of work by way of coding standards, code reviews, using robust development tools, and adopting a framework/toolkit. I’ll be presenting it at Atlanta PHP on October 4th, too.

Next up, my colleague Ben Ramsey and I will be co-presenting “Mobilizing & Sharing: How the Zend Framework Builds Community for Nokia MOSH.” This particular talk is basically a case study on how Schematic used Zend Framework in building Nokia’s new social networking and mobile content sharing site, Nokia MOSH. I’ll be covering some basic architectural information on MOSH during the talk as well.

So if you’re attending and happen to read my blog, be sure to say “hello!” I’ll be around from Sunday evening through Thursday afternoon.

Example: Who’s Online with PHP and Memcached

I figured it best to give an example to back up my last post entitled “Who’s Online with PHP and Memcached.”

First, let’s look at the WhosOnline class itself. This class is meant to be a Singleton, so you have to access it with WhosOnline::getInstance().

Also, DISCLAIMER: I wrote this code in about 20-30 minutes. There may be little odds and ends-type problems with it, but please post comments if you’ve got feedback!

<?php
/**
 * Class for accessing Who's Online data via Memcached.
 *
 * @author Brian DeShong
 */
class WhosOnline
{
    const RECORDING_DELAY_SECONDS = 120;
    private static $_instances = array();
    private $_mc;

    /**
     * Protected constructor to force use as a singleton.
     *
     * @param Memcache $mc Memcache object.
     */
    protected function __construct(Memcache $mc)
    {
        $this->_mc = $mc;
    }

    /**
     * Classic Singleton getInstance() method.  Allows for multiple
     * WhosOnline instances, though.  For example, maybe you want to use one
     * Memcached pool for users online in your forums, and another for users
     * online in your online dating application.  Coupling a different
     * Memcache object with a different $uniqueId allows this.
     *
     * @param Memcache $mc Memcache object.
     * @param string $uniqueId Unique ID of the object; optional.
     * @return WhosOnline
     */
    public static function getInstance(Memcache $mc, $uniqueId = 'default')
    {
        if (!isset(self::$_instances[$uniqueId])) {
            self::$_instances[$uniqueId] = new self($mc);
        }

        return self::$_instances[$uniqueId];
    }

    /**
     * Determines if current user's online status needs to be recorded or
     * updated.
     *
     * @return bool
     * @todo This method shouldn't reach out to $_SESSION.
     */
    public function needToRecordOnline()
    {
        return
            !isset($_SESSION['lastOnlineRecorded']) ||
            $_SESSION['lastOnlineRecorded'] <
                time() - self::RECORDING_DELAY_SECONDS;
    }

    /**
     * Records given user ID as being online and records last activity
     * timestamp.
     *
     * @param int $userId User ID.
     * @return bool
     */
    public function recordOnline($userId)
    {
        if (!$this->setUserOnline($userId)) {
            return false;
        }

        $_SESSION['lastOnlineRecorded'] = time();
        return true;
    }

    /**
     * Gets array of all users online.  Array is keyed by user ID with activity
     * timestamp as the value.
     *
     * @return array
     */
    public function getUsersOnline()
    {
        $usersOnline = $this->_mc->get('usersOnline');

        return ($usersOnline !== false ? $usersOnline : array());
    }

    /**
     * Sets an array of user IDs with their activity timestamps.
     *
     * @param array $usersOnline Array of user IDs online.
     * @return bool
     */
    public function setUsersOnline(array $usersOnline)
    {
        return
            $this->_mc->set('usersOnline', $usersOnline) &&
            $this->_mc->set('numUsersOnline', count($usersOnline));
    }

    /**
     * Sets given user ID as being online.
     *
     * @param int $userId User ID.
     * @return bool
     */
    protected function setUserOnline($userId)
    {
        $usersOnline = $this->getUsersOnline();
        $usersOnline[$userId] = time();
        return $this->setUsersOnline($usersOnline);
    }
}

Note the primary methods:

  • WhosOnline::getInstance()
  • WhosOnline::needToRecordOnline()
  • WhosOnline::recordOnline()
  • WhosOnline::getUsersOnline()
  • WhosOnline::setUsersOnline()

The main reason we leave setUsersOnline() public is so that it can be accessed via a back end script to clean up the entire array of user IDs online.

Next, our example file using this class:

wol_test.php

<?php
require_once './wol.php';

// Start up the session and assign a user ID.  Typically you would do this at
// authentication time.
session_start();

if (!isset($_SESSION['user_id'])) {
    $_SESSION['user_id'] = uniqid();
}

// Connect to Memcached and grab the Who's Online object.
$mc = new Memcache();
$mc->connect('localhost', 11211);
$who = WhosOnline::getInstance($mc);

// If user needs to be recorded as online, do so.
if ($who->needToRecordOnline()) {
    $who->recordOnline($_SESSION['user_id']);
}

// Grab users online to display; typically you would never do this on the
// front end, though.
$usersOnline = $who->getUsersOnline();
?>
Your session data:
<pre>
<?php echo print_r($_SESSION, true); ?>
</pre>

Users online: <?php echo count($usersOnline); ?>
<pre>
<?php echo print_r($usersOnline, true); ?>
</pre>

I placed the example wol_test.php file in my DocumentRoot and ran it through ApacheBench a few times, like so:

ab -c 10 -n 1000 http://localhost/wol_test.php

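This causes the wol_test.php page to be requested 1,000 times at a level of 10 concurrent requests. ApacheBench doesn’t send the session cookie back, so every request starts a new session and shows up as a new user. I did this a few times and ended up with over 3,000 users in my array of users online. Based on a manual get from Memcached like so: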

get usersOnline
VALUE usersOnline 1 126727

…we see that with over 3,000 users online, it only takes up 126,727 bytes in Memcached. Remember, the PECL extension for Memcache serializes any non-scalar values before storing them, so you have a cost associated with the serializing and unserializing of the array. Extrapolating from those numbers, a 1MB serialized array will hold roughly 30,838 users online. You’ll be able to squeeze more out of it if you have integer user IDs; I’m using uniqid() here just for example purposes.
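To sanity-check that math, here is a minimal sketch that measures how many bytes one entry adds to the serialized array. It assumes every key is a 13-character uniqid() string and every value is a 10-digit Unix timestamp; the helper name serializedEntryBytes() and the sample IDs are made up for illustration. It also ignores whatever per-item overhead Memcached itself adds, so treat the results as ballpark figures.

```php
<?php
// Bytes one entry adds to the serialized array, measured by serializing a
// one-element array and subtracting the empty-array wrapper ("a:0:{}").
// Hypothetical helper, not part of the WhosOnline class.
function serializedEntryBytes($userId, $timestamp)
{
    $one = serialize(array($userId => $timestamp));
    $empty = serialize(array());
    return strlen($one) - strlen($empty);
}

$timestamp = 1190480000; // assumed: a 10-digit Unix timestamp from September 2007

$stringKeyBytes = serializedEntryBytes('46f549b786d68', $timestamp); // uniqid()-style key
$intKeyBytes    = serializedEntryBytes(12345, $timestamp);           // integer user ID

echo "string key:  $stringKeyBytes bytes per user\n";
echo "integer key: $intKeyBytes bytes per user\n";
echo "users per 1MB (string keys):  " . floor(1048576 / $stringKeyBytes) . "\n";
echo "users per 1MB (integer keys): " . floor(1048576 / $intKeyBytes) . "\n";
```

Under those assumptions a string-keyed entry costs 34 bytes and an entry keyed by a five-digit integer costs 21, which is where the savings from integer user IDs come from.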

But is this a good idea? Retrieving 1MB, or even 127KB, from Memcached every so often isn’t cheap. Remember, you are:

  1. Retrieving string with serialized array of users online from Memcached
  2. Unserializing the string
  3. Adding user or updating their activity timestamp
  4. Serializing the array again
  5. Storing string back to Memcached

…this isn’t cheap. This is probably going to be more sluggish than you’re willing to accept, and I doubt it’d scale well as you crept up into thousands of users online. I’m here with over 4,000 users in my array, and it performs well, but it’s also on a page with nothing else. Once you tack on database queries and all sorts of other junk to render a page, you may be looking at a page that renders in over 0.5 seconds.

In a situation like this, you could consider splitting Who’s Online data up into multiple values in Memcached. Basically, you can write your application code to use, say, 10 “buckets” of users online. You would randomly select one of the 10 buckets to add/modify the user. The key in a situation like this is to have a back end process that merges all of the arrays together, iterates over them removing stale users, and evenly distributes them back into Memcached.

I’ve started coding an example of this, but don’t really have the will to finish it right now. :) Maybe later.
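In the meantime, a rough sketch of the bucket idea might look like the following. This is not the unfinished code mentioned above, just one way it could go. ArrayCache is a hypothetical in-memory stand-in for the Memcache object so the sketch runs without a server, and the key names (usersOnline_0 through usersOnline_9) and both function names are made up.

```php
<?php
// Hypothetical in-memory stand-in for the Memcache object; get() returns
// false on a miss, like Memcache::get().
class ArrayCache
{
    private $_data = array();

    public function get($key)
    {
        return isset($this->_data[$key]) ? $this->_data[$key] : false;
    }

    public function set($key, $value)
    {
        $this->_data[$key] = $value;
        return true;
    }
}

// Front end: record a user in one randomly chosen bucket.
function recordOnlineInBucket($cache, $userId, $numBuckets = 10)
{
    $key = 'usersOnline_' . mt_rand(0, $numBuckets - 1);
    $bucket = $cache->get($key);
    if ($bucket === false) {
        $bucket = array();
    }
    $bucket[$userId] = time();
    return $cache->set($key, $bucket);
}

// Back end: merge all buckets, drop stale users, spread the rest evenly.
// Returns the number of users still online.
function rebalanceBuckets($cache, $numBuckets = 10, $maxAgeSeconds = 300)
{
    $now = time();
    $merged = array();

    for ($i = 0; $i < $numBuckets; $i++) {
        $bucket = $cache->get('usersOnline_' . $i);
        if ($bucket === false) {
            continue;
        }
        foreach ($bucket as $userId => $timestamp) {
            // A user may sit in several buckets; keep the newest timestamp.
            if (!isset($merged[$userId]) || $merged[$userId] < $timestamp) {
                $merged[$userId] = $timestamp;
            }
        }
    }

    $buckets = array_fill(0, $numBuckets, array());
    $numOnline = 0;
    foreach ($merged as $userId => $timestamp) {
        if ($timestamp < $now - $maxAgeSeconds) {
            continue; // stale user, leave them out
        }
        $buckets[$numOnline % $numBuckets][$userId] = $timestamp;
        $numOnline++;
    }

    foreach ($buckets as $i => $bucket) {
        $cache->set('usersOnline_' . $i, $bucket);
    }
    $cache->set('numUsersOnline', $numOnline);

    return $numOnline;
}

$cache = new ArrayCache();
recordOnlineInBucket($cache, 42);
echo rebalanceBuckets($cache) . " user(s) online\n";
```

Each front end write now only touches roughly a tenth of the data. The catch is that any write landing between the merge and the write-back in rebalanceBuckets() would be overwritten, so this is a sketch of the shape, not something battle-tested.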

Lastly, let’s look at the back end batch process that keeps the array of users online tidy; typically this script would run as a cronjob:

whos_online_cleanup.php

<?php
require_once './wol.php';

$now = time();
$mc = new Memcache();
$mc->connect('localhost', 11211);
$who = WhosOnline::getInstance($mc);

$usersOnline = $who->getUsersOnline();

if (empty($usersOnline)) {
    print "no users online; exiting\n\n";
    exit();
}

print "num users online: " . count($usersOnline) . "\n\n";
print "processing users...\n";

$numUsersRemoved = 0;

foreach ($usersOnline as $userId => $timestamp) {
    if ($timestamp < $now - 300) {
        print "removing $userId; last seen " .
            ($now - $timestamp) . " seconds ago\n";
        unset($usersOnline[$userId]);
        $numUsersRemoved++;
    }
}

print "num users removed: $numUsersRemoved\n";
print "current num users online: " . count($usersOnline) . "\n";
print "saving users online...";
print ($who->setUsersOnline($usersOnline) ? 'done!' : '** FAILED **');
exit();

Here’s some example output from it:

brian@henery [/web/pages]$ php ./whos_online_cleanup.php
num users online: 56

processing users...
removing 46f549b786d68; last seen 496 seconds ago
removing 46f549b786e7c; last seen 480 seconds ago
removing 46f549b786ef2; last seen 476 seconds ago
removing 46f549b7871b5; last seen 445 seconds ago
removing 46f549b787213; last seen 480 seconds ago
removing 46f549b78a105; last seen 482 seconds ago
removing 46f549b789071; last seen 389 seconds ago
removing 46f549b7927eb; last seen 467 seconds ago
removing 46f549b79372a; last seen 437 seconds ago
removing 46f549b798931; last seen 487 seconds ago
removing 46f549b79b50b; last seen 423 seconds ago
removing 46f549b79bc0a; last seen 381 seconds ago
removing 46f549b79d000; last seen 379 seconds ago
removing 46f549b79dfa6; last seen 398 seconds ago
removing 46f549b79fb99; last seen 472 seconds ago
removing 46f549b7a3975; last seen 502 seconds ago
removing 46f549b7a8278; last seen 373 seconds ago
removing 46f549b7ab905; last seen 407 seconds ago
removing 46f549b7d635e; last seen 389 seconds ago
removing 46f549b7d63b0; last seen 500 seconds ago
removing 46f549b7d63d8; last seen 453 seconds ago
removing 46f549b7da797; last seen 309 seconds ago
removing 46f549b7dbb9a; last seen 491 seconds ago
removing 46f549b7dce8a; last seen 396 seconds ago
removing 46f549b7dddb0; last seen 361 seconds ago
removing 46f549b7e3da6; last seen 353 seconds ago
num users removed: 26
current num users online: 30
saving users online...done!

So, you can just cron this like so:

* * * * * /usr/local/bin/php /some/path/to/whos_online_cleanup.php > /dev/null 2>&1

…and feel free to redirect STDOUT to a log file if you’d like.

Some quick stats. With over 1,000 users in the array, running the cleanup script takes under 0.2 seconds:

brian@henery [/web/pages]$ time php ./whos_online_cleanup.php
num users online: 1221

...[snip]...

num users removed: 470
current num users online: 751
saving users online...done!
real    0m0.195s
user    0m0.040s
sys     0m0.040s

…so it’s pretty speedy. It’s worth noting that all of this is being done on my Mac Mini Core Solo with 2 GB RAM running PHP 5.2.4 and Apache 2.2.x on OS X 10.4.10. Oh, and with just over 5,000 users in the array, the script runs in 0.58 seconds.

So…pretty straightforward, right? What do you think? Surely there’s room for improvement…

Who’s Online with PHP and Memcached

Whenever you Google around for things like “Who’s Online php”, you’ll find that a lot of the solutions are centered around using a database. However, is this really necessary? For a site with, say, 50,000 concurrent users making, say, one page request every eight seconds, this could be a lot of database traffic if you’re recording the user’s activity on every request.

One goal here: get Who’s Online functionality off of the database. We’ll explore a possible solution with Memcached that I’ve personally implemented, and thus far, it’s been working great.

The first thing to consider: how real-time does something like “Who’s Online” need to be? Is having it be accurate to, say, users that have been online within the last two minutes acceptable? Next, do we really need to know when a user made any action on the site, or can we consider them online every so often? Not recording activity on each page request significantly reduces the amount of recording going on.
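To put rough numbers on that reduction, here is a quick back-of-the-envelope calculation using the figures from above. It assumes every user keeps browsing at a steady pace, so the throttled figure is an approximate upper bound; the variable names are just for illustration.

```php
<?php
$concurrentUsers = 50000;       // from the example above
$secondsBetweenRequests = 8;    // one page request every eight seconds
$recordingDelaySeconds = 120;   // record each user at most once every two minutes

// Recording on every request: one write per page view.
$writesPerSecondEveryRequest = $concurrentUsers / $secondsBetweenRequests;

// Recording at most once per delay window: one write per user per window.
$writesPerSecondThrottled = $concurrentUsers / $recordingDelaySeconds;

printf("every request: %d writes/sec\n", $writesPerSecondEveryRequest);
printf("throttled:     %d writes/sec\n", round($writesPerSecondThrottled));
printf("reduction:     %dx\n", $recordingDelaySeconds / $secondsBetweenRequests);
```

That works out to 6,250 writes per second versus roughly 417, a 15x drop, just from being willing to be two minutes out of date.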

Next, we have to keep in mind that Memcached values can be up to one megabyte in size. If we’re going to have hundreds of thousands of users online, it’s possible to exceed the 1MB limit. Let’s ignore this for now; we’ll address it later. Let’s assume that your site is relatively small and won’t have more than a few hundred or thousand users online at any given time.

For this type of scenario, you can store a single array in Memcached. A decent structure is like so:

array(
    '12345' => [unix timestamp],
    '12346' => [unix timestamp],
    '[user id]' => [unix timestamp],
    ...,
    ...);

Your user ID value can be whatever you’d like, as long as it’s unique. For example, your most common IDs will (should!) be numeric, but a unique username or GUID-based ID will work fine.

Next, you store the timestamp of the user’s last activity. A key point here is…how accurate does this data need to be? Is it sufficient to know who was online in, say, the last two minutes? If so, let’s define “being online:”

Online: an authenticated user who viewed a page on the website within a given period of time.

In our case, let’s say that the period of time is two minutes. You can code your application as follows:

If user is logged in

  1. Was user’s online state recorded within the last 2 minutes (a timestamp for this can be recorded in a session or cookie value)?
    • YES: do nothing
    • NO:
      1. Retrieve array of online user data
      2. Update timestamp of user ID’s last activity
      3. Save array of online user data back to Memcached
      4. Store online recording time (the current timestamp) to user session or cookie

Now, you need a backend process (or some sort of process) to clean up this array of online user data. For example, users that have not performed an activity in the past five minutes should be removed from this array. If a user has not been back within a given period of time, they should no longer be considered as online. This process could run out of cron, say, once a minute or every few minutes.

Your process for this back end script would be like so:

  1. Retrieve array of online user IDs
  2. Iterate over all user IDs, checking their last activity timestamp
    1. If user’s last activity is more than X seconds old (say, 5 minutes), remove them from the array
    2. If user’s last activity is within the past X seconds, they can remain in the array
  3. Store array of user IDs back to Memcached
  4. For convenience, you may also consider storing the number of users in the array in a separate Memcached value; this makes displaying a Who’s Online counter nice and cheap

I’ve implemented this exact process on a website of a decent size, and it’s been working great for a few months now. In our case, we’ve seen peaks of up to 140 users online at a time.

Maybe there are some holes in it, though? I’m not an expert on all of the inner-workings of Memcached, so maybe there’s some sort of a race condition here.
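One candidate is easy to spot: retrieving the array, changing it, and saving it back is a read-modify-write with no locking, so two requests that interleave can lose an update. Here is a minimal simulation of that interleaving. A plain PHP array stands in for Memcached (an assumption for illustration), and the user names are made up; the only thing that matters is the order of the reads and writes.

```php
<?php
// A plain array stands in for Memcached here; only the ordering matters.
$cache = array('usersOnline' => array());

// Request A and request B both read the array before either writes it back.
$seenByA = $cache['usersOnline'];
$seenByB = $cache['usersOnline'];

// Each adds its own user to its private copy.
$seenByA['alice'] = time();
$seenByB['bob'] = time();

// A writes first, then B overwrites the whole value with its copy.
$cache['usersOnline'] = $seenByA;
$cache['usersOnline'] = $seenByB;

// Only bob is left; alice's update was lost.
var_dump(array_keys($cache['usersOnline']));
```

In practice the damage looks limited: a user who gets dropped this way would be missing from the list until their next recording, about two minutes later, and the same applies in reverse when the cleanup script overwrites a fresh update. For a Who’s Online list that is probably tolerable, but it is worth knowing about.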

This is, however, a simple way to implement Who’s Online functionality without taxing a database. Remember, just because a Memcached value can be 1MB doesn’t mean that it should be. If you find that the size of your array is large enough that the retrieval from Memcached takes a while, consider splitting it up over a few cache keys to keep the retrievals cheap. Pulling 1MB down the wire every so often ain’t cheap!

UPDATE: See a follow-up post with example code here!

php|works in Atlanta this week!

For the like…two of you that read my blog, this year’s php|works conference is here in Atlanta, and it’s this week!

I won’t be speaking, but I’ll be attending Thursday and Friday in support of my co-workers Ben Ramsey and Maggie Nelson, both of whom are speaking at the conference. I’ll also be attending the various social events and after-hours things, so I hope to see some of you there!

In ZendCon news, I’m wrapping up the details of my presentation, “The Grown-Up Company’s Guide to Development.” I’ve got 70 slides, so I’ve got to ditch 20 or 30 in order to not bore you all to tears. ;) In any case, see you two readers around!

Some credit from the world’s top gamer!

Noticed this tonight:

Click here for article

My old high school friend and neighbor, Johnathan “Fatal1ty” Wendel, mentions how my brothers and I got him into FPS gaming back in the day (you can just search the page for “Brian”). Ah, we used to play Doom 2 deathmatch over the Novell IPX network I had set up in the basement. That was so long ago!

Those were the days. Wendel, if you’re reading this, keep it up!