Top Ten List + CoderFaire Atlanta 2013

Back in March, I gave a new talk at Atlanta PHP: "Top Ten List: PHP and Web Application Performance". This talk is a culmination of my ~14 years of experience primarily as a web application developer, but also as a systems administrator / DevOps-type.  After working with PHP and web applications for so many years, I have amassed quite a few tricks for squeezing maximum performance out of web applications, PHP or otherwise. I'll be presenting it again at CoderFaire Atlanta on April 20, 2013.  CoderFaire is organized by a fantastic crew of Cal Evans, Kathy Evans, Chris Spruck, Kevin Roberts, and Jacques Woodcock, so it's going to be a great event. I've never attended a CoderFaire event before, but I've only heard positive things. Because it's not limited to a single technology platform, you're sure to meet a wide array of technical minds from all different backgrounds. I'm sure we'll all walk away with some fresh, new ideas from this diverse crowd.

At only $50 per ticket, you're not going to find a better deal on a technical conference this year. Register now!

As a little teaser, here are the 10 guests, each introducing one of the topics. Come on out for the juicy details. Be prepared to go home, sit down, and optimize some aspects of your web application, though! See you there.

10. Elizabeth Naramore, GitHub: Tweak your realpath cache settings


9. Scott Rocher, Tonx Coffee:  Whenever possible, use offline processing

8. Matthew Turland, Synacor: Write efficient SQL queries

7. Scott Lively, 3SI Security Systems: Don't execute queries in loops

6. Jed Lau & Maggie Nelson, Findery: Know what your application is doing

5. Robert Swarthout, ShootProof: Use gzip compression on responses

4. Ian Myers, Findery: Do not use .htaccess files

3. Ken Macke, RockIP Networks: Cache all the data that you can

2. Davey Shafik, EngineYard: Use a content delivery network

1. Ben Ramsey, Moontoast: Use APC and set apc.stat = 0

On to CrowdTwist

TL;DR: I'm making a move to CrowdTwist, a New York City-based startup providing social and loyalty services for some of the world's biggest brands. I worked with their co-founder and CTO, Mike Montero, from 2001-2005 at Community Connect Inc., where I got my start as a developer in PHP and open source tools. I'm thrilled to be joining him and the CrowdTwist team on what's sure to be an incredible adventure.

Back in 2001, I joined Community Connect Inc. (now Interactive One) as a Senior Network Support Specialist. I was an internal sysadmin, spending my time managing Linux-based file servers, development servers and things of that nature. I had been living in New York just a few short months.

CCI operated what were, at the time, some of the most highly-trafficked social networking sites on the Internet. This was before MySpace and Facebook, of course. BlackPlanet.com was one of the most highly-trafficked PHP-based sites on the Internet. We were doing an insane amount of traffic. Our applications had to perform well and scale. We had no choice.

Even as a sysadmin-type, I was surrounded by some incredibly talented developers, who were all working with PHP on Linux and Apache using Oracle. They were caching, using CDNs, and doing things that were still relatively new on the Web. I caught the development bug. I started writing code in my spare time, taking on little projects on the side. How could you not get totally infected in an intense, exciting environment such as this?

In early 2002, I received a call out of the blue from my boss (also our CTO), Mike Montero. On that call, he asked me if I'd be interested in moving to the CCI development team. There was only one way to answer: "YES!"

Thus, in early 2002, just about 10 years ago, I became a developer. The most lowly of the low -- "Associate Software Developer." Over the next three and a half years, I worked my way up to Technical Lead, learning a ridiculous amount from my colleagues. I had amassed this strong set of experience in systems administration and development. I was really growing my skills, and I loved every second of it. I worked many late nights and weekends...and it was an absolute thrill.

Our work was literally being seen by millions of users every day. We were building quality products, all on a home-grown internal framework of sorts. We were doing code reviews. We were writing unit tests. This was how software was built. I never knew any lifestyle but this -- it was my first development gig! This was just the way things were done. This period of time really shaped my personal stance on how to build quality software that was both performant and scalable. I consider myself so very fortunate to have started with this level of experience. It's what gave me such a strong base of experience as a software developer.

In mid-2005, I moved on from CCI and spent almost five years in an interactive agency, Schematic (now Possible Worldwide). During this time, I gained exposure to new, different technologies like Zend Framework and Memcached. This was my first foray into leading major technical projects for clients, but still rolling up my sleeves, diving into architecture and code. I was using my skills from CCI with PHP, sysadmin duties, and databases, and applying them to client work time and time again. I was working in a world that was very different from what I had known at CCI, but bringing so much of that experience forward with me. In mid-2007, we moved to Atlanta, where I stayed with Schematic.

After almost five years at Schematic, I moved on to Yahoo! for a little over a year. This allowed me to get back to my development roots, focusing solely on code and architecture. I had a great time.

In mid-2011, I made a move to Half Off Depot to build an internal development team and grow the technical side of the company as Lead Software Architect. Here, I've been using all of my skills: systems administration, PHP, MySQL administration, managerial duties, recruiting, and working with other departments, such as marketing and design.

Over the past eight months, I've made a huge impact at Half Off Depot in terms of stabilizing the application and its Production environment. I've branched out into using Git and GitHub, Capistrano and Amazon Web Services. I've also had the opportunity to continue sharpening my Objective-C and iOS development skills. Overall, Half Off Depot has challenged me, and I've enjoyed it. I've reaffirmed to myself that I've got a breadth and depth of skills, and that I'm still pretty sharp with all of them. It's also reminded me how much I enjoy a startup environment.

But about a month ago, Mike Montero came calling again -- this time, with an opportunity for me to join CrowdTwist, a New York City-based startup where he's a co-founder and CTO. CrowdTwist is an emerging, unique player in the loyalty space. Think "platform as a service." APIs, user-facing sites, large amounts of data. And an incredible team that's tapping into this data to provide real value for their clients.

When someone you trust and respect comes calling and seeks you out, you listen and explore. And that's exactly what I did. And let me tell you, the CrowdTwist team is INCREDIBLE. I could not be more excited for this career change, both for the opportunity to work with Mike once again, but also to work with all of the brilliant team members and their clients.

I'm in a unique position where I had almost five years of CCI-level experience, coupled with seven years of experience since then. Now I'm going back to work with Mike and the CrowdTwist team, where I'll be able to bring my strong foundation from CCI, along with all that I've learned in the years after CCI. My career has come full circle with respect to the last decade.

I typically like to make a job change, then stay there for at least four years as I did with CCI and Schematic. However, this opportunity with CrowdTwist is so rare that I had to take it. To be with this caliber of talent in such a promising space where they're truly a pioneer? You just don't say "no" to that. Or if you do, you regret it in a few years when they've been wildly successful.

So, on March 12th, I'm joining CrowdTwist full-time. I'll be working remotely from Atlanta, but traveling up to New York City from time to time. I'll be focusing on a mix of back end development and architecture, systems administration, and helping the team continue building quality software.

To my Half Off Depot colleagues, it's been incredible! We've done some great things together. I wish you all the best of luck. Also, this has easily been one of the best teams of technologists I've ever worked with. Thanks, guys.

To my future CrowdTwist colleagues, thanks for welcoming me! I'm so thrilled at the opportunity to join you. This is going to be an incredible ride. I'm ready to rock.

See you soon, CrowdTwist! And if you've read this far, thanks. :)

“Rickroll To Go…” ZendCon Session audio posted!

My ZendCon 2008 talk, “Rickroll To Go With WURFL, PHP, and Other Open Source Tools”, was just released at Zend DevZone as ZendCon Sessions episode #23! If you're just now finding my blog from there, welcome! And thanks to Eli White, Community Relations Manager for Zend, for selecting it for posting.

You can get all of the relevant info using the links below:

  • Slides and videos of the presentation materials
  • ZendCon Sessions page with audio
  • MP3 audio of the presentation
  • iTunes DevZone podcast

Enjoy, and thanks for listening! Find me on Twitter or email me if you'd like to discuss the materials.

MySQL replication and the sync_binlog option

Recently I've been focusing on MySQL replication for a project at work. On this particular project, I'm acting in a Solutions Architect role and have been since about September of 2008. Because of my background in systems administration, I tend to get myself into situations where I become the Schematic-side sys admin on projects. This involves things like deployment processes, getting development, staging, and production environments set up, and now, setting up MySQL replication. This is probably because a) I'm bad at delegating these things to others, b) I'm kinda' good at it, and c) let's be honest, I'm a control freak, so I like knowing the servers hosting my apps are set up in a meticulous manner.

In short, we're running MySQL 5.0.45 on RedHat Enterprise Linux 5 (I know, I know...RedHat and MySQL 5.0...boo...but it's okay). We're required to replicate our production database to a secondary machine for backup purposes. This way, if our production server dies, we can manually failover to the slave (once we enable writes to it, of course), then swap the two back once the production server is back up.

All in all, this site is rather low-traffic at around 20,000 dynamic page views per day. Factoring in US-based users over an 11-hour period, that's about 1,800 requests per hour, or right around 0.5 page views per second. We've got a single server in production that's acting as our webserver and database server; it's got a RAID array in it that was all set up by the group hosting the application (so I don't know tons about it, but it's new, quality hardware).

I'm using my master database for reads and writes in Production. Again, the slave is really only required for live backup-type purposes.

Now, I'm no expert on MySQL replication, but I've learned a lot these past few weeks. So I'm going to share one big caveat here. Please correct me as you see fit!

MySQL's got a sync_binlog configuration option. You typically set it in my.cnf, and its value is an integer from 0-n. This value determines how many binary log writes need to occur before its contents are flushed out of the buffer and onto disk. With it set to zero, your operating system just determines when the buffer is flushed to disk.
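
For reference, here's roughly what those lines look like in the [mysqld] section of my.cnf (the log-bin path is just an example); sync_binlog is a dynamic variable, so it can also be flipped at runtime with SET GLOBAL:

[mysqld]
log-bin     = /var/lib/mysql/binlog/mysql-bin   # example path; a dedicated disk helps, as noted below
sync_binlog = 1                                 # sync the binary log to disk after every write (safest)
# sync_binlog = 0                               # let the OS decide when to flush (faster, riskier)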

I have a database migration process that copies table structures and their data from PostgreSQL into MySQL, then basically migrates that data into the appropriate tables in the new MySQL instance. It involves the transformation of a lot of data. It's a sizable, complete data set for an 8+ year-old system that's not the prettiest, best-normalized data model in the world.

Per recommendations in High Performance MySQL, I had my sync_binlog value set to 1 in Production.

When I was performing a test migration to Production recently, the process took about 3 hours. Wow. Thanks, MySQL! It normally takes about 90 minutes in Staging, if that.

In digging around Google and MySQL.com, I found that a non-zero value for sync_binlog causes more disk seeks to flush the binary log to disk. The benefit of having it set to 1 is so that every transaction can be written to the binary log, which is then flushed to disk upon commit. Then, if your server happens to die, the last completed transaction will always be present in the binary log on disk, so you never have to worry about, say, missing a transaction replay on your slaves. However, this results in a lot more disk activity on your master.

I set sync_binlog to 0 and re-ran my migration. It ran in 90 minutes -- half the time, a 50% reduction in runtime! Now, if you do the math, this makes sense: it's one less disk seek and write per transaction. Hooray for numbers, right?

I'm willing to gamble the integrity of data on my slave for the 50% performance increase. (remind me of this post in 6 months when I'm kicking myself over this for some reason, okay?)

With no binary logging enabled (i.e. in our dev environment), this process takes about 20 minutes. This makes sense -- far fewer disk writes during the process.

Another way to work around this would be to keep your binary logs on a physically separate disk. However, I don't have that luxury at this point, so that's not an option for me. If I had my druthers, this is how I'd handle the problem, but...no dice for now.

Anyways, my main point: if you're willing to gamble that the last few transactions might not make it to your slave in the event of a crash, perhaps you can set sync_binlog to 0. If you've got a separate disk to devote to your binary log, by all means, set it to 1! There are other concerns here related to battery-backed disk caches, which you can read a bit more about in Jeremy Cole's post on MySQL replication. You can also see some handy benchmarks that compare MySQL with and without binary logging.

Finally, I'll admit this is a bit of a knee-jerk reaction post. I've done a bunch of research on this, but it's not all quite fleshed out in my mind yet. I get the whole cause and effect in theory, but I haven't dug into MySQL source or other materials to really understand what's going on behind the scenes.

MySQL replication is a tricky thing. It's great when it works, but understand that there are overhead tradeoffs in using it! I'm sure I'll learn more in the weeks and months following our launch, so I look forward to sharing more of my successes and/or pains on this. Comments, feedback, and flames such as "OMG, you're so wrong Brian!" and "Brian is a n00b!" are welcome.

Slides: ZendCon 2008, “Rickroll To Go…”

It's the first day of ZendCon 2008! I'm giving my new talk, "Rickroll To Go With WURFL, PHP, and Other Open Source Tools" today at 4:00 PM PST. The slides are below in a variety of formats:

  • PDF (no transitions)
  • PDF (one transition per page)
  • Quicktime movie

If you're at ZendCon and reading this, be sure to drop on by at 4:00 PM -- it's sure to be a ball. Enjoy!

Speaking at Atlanta PHP, 3/6/2008!

For you Atlantans that may read my blog, I'll be speaking at this week's Atlanta PHP meetup. Specifically, I'll be presenting on "Robust Batch Processing with PHP," which I'm also slated to present at php|tek 2008 in May.

Also, special thanks to all Atlanta PHP attendees for allowing me to use them as my guinea pigs before taking new talks out to conferences. You regulars know that I've done this with other talks in the past. You're a very gracious, brave bunch, so thanks, everyone! :)

See you all there!

Robust Batch Processing with PHP (part 1/2)

I submitted a proposal for php|tek 2008 entitled "Robust Batch Processing with PHP." Granted, the schedule has not been posted yet, so I don't know if my talk has even been accepted, but I wanted to formulate some thoughts around the topic for a long-overdue blog post. So, first things first: what is batch processing?

Let's look at some Wikipedia definitions on the topic:

"Batch processing is execution of a series of programs ("jobs") on a computer without human interaction."

...and...

"Batch jobs are set up so they can be run to completion without human interaction, so all input data is preselected through scripts or commandline parameters. This is in contrast to interactive programs which prompt the user for such input."

So, no human interaction, which means they're generally running out of a scheduler, such as crond.

In this post, I'm going to talk about batch processing in relation to web-based applications. Given this, what are some examples of common batch processing used in web-based applications? Here are a few:

  • Sending of emails
  • Video transcoding
  • Generating image thumbnails
  • Communication with third-party services
  • Processing post-authorization of credit/debit card and online check transactions

...just to name a few. Typically, tasks such as these would be done with PHP scripts run from the command line, either interactively or from a scheduler such as cron or at. In some cases, you may even go so far as to set aside a dedicated machine (or machines) to perform these operations (this is my preferred method).
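
As a quick illustration, a few crontab entries might drive jobs like these (the paths and schedules here are made up):

# Post-authorize captured card transactions every six hours
0 */6 * * *  /usr/local/bin/php /data/batch/post_auth.php
# Send queued emails every five minutes
*/5 * * * *  /usr/local/bin/php /data/batch/send_emails.php
# Transcode newly uploaded videos every minute
* * * * *    /usr/local/bin/php /data/batch/transcode_videos.php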

"Why would I want to do any of these things in batch?" you ask? Here are some reasons:

  • To keep front end webservers doing what they do best: serving requests!
  • To allow for graceful handling of failure by retrying the operation against a third-party vendor. For example, if your credit card processing vendor is down for maintenance but you still want to post-authorize credit card payments, you want to wait a little bit and try again. This is most easily done in a batch process. The alternative would be to, say, charge the user during their actual HTTP request and retry over and over until the post-authorization request completes. This is a lousy user experience.
  • Sending emails from front end webservers is just silly and wasteful. Why make SMTP connections from these webservers? Send emails in batch on the back end so you can handle failure cases, hard and soft bounces of messages and so on.

Batch processing is tricky because it's non-interactive. Jobs such as the examples above will run at least once a day, and more often than not, they'll run every few minutes or hours. Maybe you only post-authorize credit cards every six hours, but you would definitely transcode videos or send emails throughout the day in order to keep your site "living."

What are some of the challenges with batch processing?

  • Developers need to be made aware of problems
  • Processing needs to be retried if any sort of failure occurred
  • Detailed logs of job executions must be kept so developers can investigate failures and successes; you should leave a full audit trail so anyone can track down the lifecycle of processing
  • These batch jobs should be easy for developers to develop. Imagine duplicating logging code across all of your batch processes -- you don't want to repeat yourself!

The point here is that if any of your processes is failing, your developers should be made aware of it immediately, or at least shortly after the failure. How do we handle these requirements?

Define error levels

First, what are the different types of errors that we have? Well, in my experience, they're similar to the Syslog priority levels. These are made available in PHP for use with trigger_error() using some pre-defined constants.

Out of these pre-defined constants, you have these main levels:

  • debug
  • info
  • notice
  • warning
  • error
  • fatal

What do we do with errors of these levels? Let's say that developers should only be emailed for anything warning or above. Anything else should just be written to the log.
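
To make that concrete, here's a hypothetical sketch of how a batch class might implement that policy. This is not the framework discussed later in this post -- just an illustration of the level threshold idea:

// Hypothetical sketch; class, property, and method names are invented.
class Batch
{
    const DEBUG   = 0;
    const INFO    = 1;
    const NOTICE  = 2;
    const WARNING = 3;
    const ERROR   = 4;
    const FATAL   = 5;

    private $_minEmailLevel = self::WARNING;  // email developers at this level or above
    private $_emailBuffer   = array();

    public function info($message)    { $this->_log(self::INFO, $message); }
    public function notice($message)  { $this->_log(self::NOTICE, $message); }
    public function warning($message) { $this->_log(self::WARNING, $message); }
    public function error($message)   { $this->_log(self::ERROR, $message); }

    protected function _log($level, $message)
    {
        // Every message is written to the log file...
        $this->_writeToLogFile($level, $message);

        // ...but only messages at or above the threshold are queued for the
        // email that goes out when the job finishes.
        if ($level >= $this->_minEmailLevel) {
            $this->_emailBuffer[] = $message;
        }
    }

    protected function _writeToLogFile($level, $message)
    {
        // Left out of this sketch.
    }
}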

Making developers aware of failures

When you think of the best way to notify developers of problems during processing, what comes to mind first? ... ... ... What was that? Email? Yes, email. So, those warning-level error messages we just spoke about...all of those should be emailed to the developer at the completion of the process.

Now, it's not the only option, but it may be the most obvious. If your transaction to post-authorize a credit card fails, your developers should be made aware of it right away so someone can contact the vendor, or, say, identify firewall issues in your environment. Similarly, if your video transcoding server(s) is/are down, videos can't be transcoded -- someone needs to be made aware of that! Let's email them.

Let's take this rough example code:

$config = array('foo' => 'bar', 'baz' => 'bop'); // Config options
$batch = Batch::getInstance($config);
$vendor = Some_Billing_Processor::factory();
$accounts = Foo::getAccountsForPostAuth();

foreach ($accounts as $account) {
    $accountId = $account->getId();
    $amountToBill = Foo::getPreAuthorizationAmount($account);

    try {
        if ($vendor->postAuthorize($account)) {
            $batch->info("[$accountId] billed $amountToBill");
        } else {
            $batch->warning("[$accountId] failed to bill $amountToBill");
            // Record failure so processing can be retried
        }
    } catch (Vendor_Exception $e) {
        $batch->error(
            "[$accountId] caught exception during vendor communication");
        // Record failure so processing can be retried
    }
}

In this case, we've raised a warning for the failed post-auth, and we raised an error if an exception was caught (i.e. inability to connect to the vendor's service). The info() call won't result in an email to the developer, but that message will still be logged. Bottom line...if we can't take the customer's money, a developer needs to address the situation soon.

In the cases of the warning and error, these log entries will be emailed to the error email recipient(s) upon completion of the script.

Another alternative here would be to never email warnings when they occur, but write a separate script that parses logs for warnings, rolls them up into one message body, and emails the developers every few hours. This keeps the email traffic down, and ultimately keeps your developers from thinking that their back end scripts "cry wolf" by being too chatty.
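
A bare-bones version of that rollup script might look something like this -- the paths, the address, and the exact log format are assumptions; it just relies on log lines being tagged with their level, as in the samples further down:

// Hypothetical rollup script, run from cron every few hours.
$logDir = '/var/log/batch/' . date('Y/m/d');   // wherever today's logs live
$digest = '';

foreach (glob($logDir . '/*.log') as $logFile) {
    foreach (file($logFile) as $line) {
        if (strpos($line, '[warning]') !== false) {
            $digest .= basename($logFile) . ': ' . $line;
        }
    }
}

if ($digest != '') {
    mail('developers@example.com', 'Batch warning digest', $digest);
}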

What about those logs you speak of?

Everyone's been in a situation where they have a single directory full of log files, named something like foo.log, foo.log.1, or even foo.log.20071114...or any number of other conventions. Even worse is a single log file for a process that just grows and grows. Log rotation is an easy fix for these scenarios.

Personally, I feel that dumping everything into one directory is bad practice. I tend to prefer date-based directory names for storing log files. In my opinion, planning for this from the start of your project is far better than having to react in a knee-jerk fashion later on once you've filled a directory or hit some sort of maximum file size limit on your filesystem. Consider this directory and the files in it:

/var/log/cc_auth
    pre_auth.log.20071114
    post_auth.log.20071114
    pre_auth.log.20071115
    post_auth.log.20071115
    pre_auth.log.20071116
    post_auth.log.20071116
    pre_auth.log.20071117
    post_auth.log.20071117
    pre_auth.log.20071118
    post_auth.log.20071118

Messy, right? In this example, you end up with a few downsides:

  • A lot of files in each directory
  • Potential to hit Unix max files per directory limit (on ext2 and some older/other filesystems)
  • Date-based filenames are cumbersome to type (or even auto-complete in your Unix shell)

Personally, I prefer a structure using date-based directories like so:

/var/log/cc_auth/2007/11/14
    pre_auth.log
    post_auth.log
/var/log/cc_auth/2007/11/15
    pre_auth.log
    post_auth.log
/var/log/cc_auth/2007/11/16
    pre_auth.log
    post_auth.log
/var/log/cc_auth/2007/11/17
    pre_auth.log
    post_auth.log
/var/log/cc_auth/2007/11/18
    pre_auth.log
    post_auth.log

In this situation, you've got a clean structure laid out in directories on disk. Now, you could make an argument that you use more inodes, but that's a weak argument. Point here being...nice and pretty, right?
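
Building that path at runtime takes only a couple of lines of PHP; a quick sketch, reusing the cc_auth example above:

// e.g. /var/log/cc_auth/2007/11/14/pre_auth.log
$logDir = '/var/log/cc_auth/' . date('Y/m/d');

if (!is_dir($logDir)) {
    mkdir($logDir, 0755, true);   // recursive mkdir builds year/month/day as needed
}

$logFile = $logDir . '/pre_auth.log';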

What are some other useful things to log during batch job execution?

The more useful data you can log, the better (within reason, of course). Here are some handy examples:

  • PID
  • Start time of job
  • End time of job
  • Elapsed time
  • Number of notices, warnings, errors, etc.

To illustrate, here's a log entry from a script built on a batch job class that I wrote at work:

(5527) ------------------------------------
(5527)   Hostname: articuno (batch)
(5527)     Script: /data/baz/deploy/batch/Foo/Bar/some_script.php
(5527)   Log File: /data/baz/log/Foo/Bar/2007/11/13/some_script.log
(5527)      Start: 2007-11-13 02:39:02 GMT
(5527) ------------------------------------
(5527) [2007-11-13 02:39:02] [info] locked 1 items for copyright scanning
(5527) [2007-11-13 02:39:02] [info] [3EC7539BF9F0C72EE040050AEE042902] performing copyright scan; entity type id = 3; name = High sound TR TONE.mp3; scanning file = /foo/bar.mp3; mime type = audio/mpeg
(5527) [2007-11-13 02:39:18] [info] [3EC7539BF9F0C72EE040050AEE042902] entity is not copyrighted
(5527) [2007-11-13 02:39:18] [info] [3EC7539BF9F0C72EE040050AEE042902] removing entity from pending state
(5527) [2007-11-13 02:39:18] [info] [3EC7539BF9F0C72EE040050AEE042902] copied files in temporary storage to public storage
(5527) [2007-11-13 02:39:18] [info] [3EC7539BF9F0C72EE040050AEE042902] deleted all files in temporary storage
(5527) [2007-11-13 02:39:18] [info] [3EC7539BF9F0C72EE040050AEE042902] set permissions on entity's public storage directory
(5527) [2007-11-13 02:39:18] [info] [3EC7539BF9F0C72EE040050AEE042902] set copyright scan outcome
(5527) [2007-11-13 02:39:18] [info] [3EC7539BF9F0C72EE040050AEE042902] removed entity from upload queue
(5527) [2007-11-13 02:39:18] [info] [3EC7539BF9F0C72EE040050AEE042902] queued cdn purge of entity's urls
(5527) [2007-11-13 02:39:18] [info] released lock for process articuno (batch):5527
(5527) [2007-11-13 02:39:18] [info] found 0 copyrighted entities
(5527) ------------------------------------
(5527)       End: 2007-11-13 02:39:18 GMT
(5527)   Elapsed: 16.472630s
(5527) ------------------------------------

At first glance, there are a bunch of things that are really clear from this log entry:

  • The process ID is 5527
  • The entire execution took about 16.5 seconds
  • We see what is being processed, and an entry for every action taken along with its success (or failure)

Now, this is a pretty useful log entry, but it's from a successful run. Let's take a look at a failure case, shall we?

(906) ------------------------------------
(906)   Hostname: sentret (video01)
(906)     Script: /data/baz/deploy/batch/Foo/Bar/transcode_videos.php
(906)   Log File: /data/baz/log/Foo/Bar/2007/11/07/transcode_videos.log
(906)      Start: 2007-11-07 15:58:02 GMT
(906) ------------------------------------
(906) [2007-11-07 15:58:02] [info] locked 2 items of type 4
(906) [2007-11-07 15:58:02] [info] [3D3A7D11E19B8906E040050AEE04323B] Starting flv transcode for 3gp and thumbnails
(906) [2007-11-07 15:59:04] [info] [3D3A7D11E19B8906E040050AEE04323B] Finished
(906) [2007-11-07 15:59:04] [info] [3D3A7D11E19B8906E040050AEE04323B] Starting preview flv transcode for entity page
(906) [2007-11-07 15:59:31] [notice] [3D3A7D11E19B8906E040050AEE04323B] error transcoding video to flash; skipping video; path = /foo/Croud.flv; message = Encoding process encountered an error
(906) [2007-11-07 15:59:31] [info] [3E48BC06162F4A8CE040050AEE042BCC] Starting flv transcode for 3gp and thumbnails
(906) [2007-11-07 15:59:56] [notice] [3E48BC06162F4A8CE040050AEE042BCC] error transcoding video to flash; skipping video; path = /foo/DBA_27108.gif; message = Could not create the flix handle - flixd unreachable (not running); flix result = -9
(906) [2007-11-07 15:59:56] [error] [3E48BC06162F4A8CE040050AEE042BCC] error connecting to the flix engine; skipping video; path = /foo/DBA_27108.gif; message = Could not create the flix handle - flixd unreachable (not running); flix result = -9
(906) [2007-11-07 15:59:57] [info] released lock for process sentret (video01):906
(906) ------------------------------------
(906)       End: 2007-11-07 15:59:57 GMT
(906)   Elapsed: 114.864283s
(906) ------------------------------------

In this case, we see that an error occurred. The developers would receive an email reading:


(906) ------------------------------------
(906)   Hostname: sentret (video01)
(906)     Script: /data/baz/deploy/batch/Foo/Bar/transcode_videos.php
(906)   Log File: /data/baz/log/Foo/Bar/2007/11/07/transcode_videos.log
(906)      Start: 2007-11-07 15:58:02 GMT
(906) ------------------------------------
(906) [2007-11-07 15:59:56] [error] [3E48BC06162F4A8CE040050AEE042BCC] error connecting to the flix engine; skipping video; path = /foo/DBA_27108.gif; message = Could not create the flix handle - flixd unreachable (not running); flix result = -9
(906) ------------------------------------
(906)       End: 2007-11-07 15:59:57 GMT
(906)   Elapsed: 114.864283s
(906) ------------------------------------

We maintain a setting for "minimum email log level," which defaults to warnings. That's what allows us to email anything at warning level or higher to the developers who can address the situation. Alternatively, we could set that level to email developers on anything at notice level or above. It's all configurable in the batch framework.

Similarly, we define a default exception handler and an error handler to trap uncaught exceptions and errors from PHP. Having an exception handler, for example, allows us to catch any exception that escapes the script, log it as uncaught, and email the developers to let them know of the problem. Likewise, PHP notices or warnings are logged and emailed if applicable, too.
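
A stripped-down sketch of that wiring might look like this -- the function names are invented, $batch is the batch object from the earlier example, and the real framework wraps all of this up inside its own (non-public) class:

function handleUncaughtException(Exception $e)
{
    global $batch;
    $batch->error('uncaught exception: ' . $e->getMessage());
}

function handlePhpError($errno, $errstr, $errfile, $errline)
{
    global $batch;
    $batch->warning("PHP error [$errno] $errstr in $errfile on line $errline");
    return true;   // keep PHP's internal error handler from firing as well
}

set_exception_handler('handleUncaughtException');
set_error_handler('handlePhpError');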

We've definitely achieved our goal of making developers aware of problems!

So, this all looks great, Brian, but how can I get my hands on it?

Well, at this time, I'm not at liberty to release any of this code. Perhaps it's worth submitting a Zend Framework proposal to keep it in Userland, or even a PEAR2 module.

Even still, let's assume that we'll want the following (a rough sketch of the first two items appears just after this list):

  • Parsing of command line options (short and long)
  • Lock file support
  • Email recipient(s) on errors (you could even, say, send SMS messages!)
  • Flexible logging in date-based directories or files, or any arbitrary structure
  • Ability to define levels at which emails are generated
  • Easy way to use batch functionality in any batch script
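
For the first two items, PHP already gives you most of what you need. Here's a bare-bones sketch -- the option names, lock file path, and messages are invented for illustration, and note that getopt()'s long-option support only arrived in later PHP releases (PEAR's Console_Getopt is the usual alternative):

// Short and long option parsing (hypothetical options).
$options = getopt('hv', array('help', 'verbose', 'dry-run'));

// A simple lock file keeps overlapping cron runs from stepping on each other.
$lockFile = '/var/run/batch/post_auth.lock';   // invented path
$fp = fopen($lockFile, 'a');

if (!$fp || !flock($fp, LOCK_EX | LOCK_NB)) {
    fwrite(STDERR, "another instance appears to be running; exiting\n");
    exit(1);
}

// ... do the real batch work here ...

flock($fp, LOCK_UN);
fclose($fp);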

On the database side of things, let's consider these requirements:

  • Ability to delay retry of processing for a specified amount of time
  • Ability to retry up to X times, then cease retries

I've had this post brewing for a long time now, so I'm going to deem this one "part one of two" and address some of the points above in a second post on the topic. The database portion alone is pretty lengthy. I also haven't heard back on php|tek acceptance at this point, but if I get accepted, I'll definitely be bringing some more cohesion to this topic.

If you have any questions or comments, just ask! I'm also going to send a PEAR2 proposal post-Thanksgiving, so heads up!

Example: Who's Online with PHP and Memcached

I figured it best to give an example to back up my last post entitled "Who's Online with PHP and Memcached." First, let's look at the WhosOnline class itself. This class is meant to be a Singleton, so you have to access it with WhosOnline::getInstance().

Also, DISCLAIMER: I wrote this code in about 20-30 minutes. There may be little odds and ends-type problems with it, but please post comments if you've got feedback!

/**
 * Class for accessing Who's Online data via Memcached.
 *
 * @author Brian DeShong
 */
class WhosOnline
{
    const RECORDING_DELAY_SECONDS = 120;
    private static $_instances = array();
    private $_mc;

    /**
     * Protected constructor to force use as a singleton.
     *
     * @param Memcache $mc Memcache object.
     */
    protected function __construct(Memcache $mc)
    {
        $this->_mc = $mc;
    }

    /**
     * Classic Singleton getInstance() method.  Allows for multiple
     * WhosOnline instances, though.  For example, maybe you want to use one
     * Memcached pool for users online in your forums, and another for users
     * online in your online dating application.  Coupling a different
     * Memcache object with a different $uniqueId allows this.
     *
     * @param Memcache $mc Memcache object.
     * @param string $uniqueId Unique ID of the object; optional.
     * @return WhosOnline
     */
    public static function getInstance(Memcache $mc, $uniqueId = 'default')
    {
        if (!isset(self::$_instances[$uniqueId])) {
            self::$_instances[$uniqueId] = new self($mc);
        }

        return self::$_instances[$uniqueId];
    }

    /**
     * Determines if current user's online status needs to be recorded or
     * updated.
     *
     * @return bool
     * @todo This method shouldn't reach out to $_SESSION.
     */
    public function needToRecordOnline()
    {
        return
            !isset($_SESSION['lastOnlineRecorded']) ||
            (isset($_SESSION['lastOnlineRecorded']) &&
             $_SESSION['lastOnlineRecorded'] <
                 time() - self::RECORDING_DELAY_SECONDS);
    }

    /**
     * Records given user ID as being online and records last activity
     * timestamp.
     *
     * @param int $userId User ID.
     * @return bool
     */
    public function recordOnline($userId)
    {
        if (!$this->setUserOnline($userId)) {
            return false;
        }

        $_SESSION['lastOnlineRecorded'] = time();
        return true;
    }
    /**
     * Gets array of all users online.  Array is keyed by user ID with activity
     * timestamp as the value.
     *
     * @return array
     */
    public function getUsersOnline()
    {
        $usersOnline = $this->_mc->get('usersOnline');

        return ($usersOnline !== false ? $usersOnline : array());
    }

    /**
     * Sets an array of user IDs with their activity timestamps.
     *
     * @param array $usersOnline Array of user IDs online.
     * @return bool
     */
    public function setUsersOnline(array $usersOnline)
    {
        return
            $this->_mc->set('usersOnline', $usersOnline) &&
            $this->_mc->set('numUsersOnline', count($usersOnline));
    }

    /**
     * Sets given user ID as being online.
     *
     * @param int $userId User ID.
     * @return bool
     */
    protected function setUserOnline($userId)
    {
        $usersOnline = $this->getUsersOnline();
        $usersOnline[$userId] = time();
        return $this->setUsersOnline($usersOnline);
    }
}

Note the primary methods:

  • WhosOnline::getInstance()
  • WhosOnline::needToRecordOnline()
  • WhosOnline::recordOnline()
  • WhosOnline::getUsersOnline()
  • WhosOnline::setUsersOnline()

The main reason we leave setUsersOnline() public is so that it can be called from a back end script to clean up the entire array of user IDs online.

Next, our example file using this class:

wol_test.php

<?php

// Pull in the WhosOnline class (assumed here to live in wol.php, the same file
// the cleanup script below requires).
require_once './wol.php';

// Start up the session and assign a user ID.  Typically you would do this at
// authentication time.
session_start();

if (!isset($_SESSION['user_id'])) {
    $_SESSION['user_id'] = uniqid();
}

// Connect to Memcached and grab the Who's Online object.
$mc = new Memcache();
$mc->connect('localhost', 11211);
$who = WhosOnline::getInstance($mc);

// If user needs to be recorded as online, do so.
if ($who->needToRecordOnline()) {
    $who->recordOnline($_SESSION['user_id']);
}

// Grab users online to display; typically you would never do this on the
// front end, though.
$usersOnline = $who->getUsersOnline();
?>
Your session data:
<pre>
<?php echo print_r($_SESSION, true); ?>
</pre>

Users online: <?php echo count($usersOnline); ?>
<pre>
<?php echo print_r($usersOnline, true); ?>
</pre>

I placed the example wol_test.php file in my DocumentRoot and ran it through ApacheBench a few times, like so:

ab -c 10 -n 1000 http://localhost/wol_test.php

This causes the wol_test.php page to be requested 1,000 times at a level of 10 concurrent requests. I did this a few times and ended up with over 3,000 users in my array of users online. Based on a manual get from Memcached like so:

get usersOnline
VALUE usersOnline 1 126727

...we see that with over 3,000 users online, it only takes up 126,727 bytes in Memcached. Remember, the PECL extension for Memcache serializes any non-scalar values before storing them, so you have a cost associated with the serializing and unserializing of the array. Doing the math here -- roughly 34 bytes per entry -- a 1MB serialized array will hold 30,838 users online (1,048,576 / 34). You'll be able to squeeze more out of it if you have integer user IDs; I'm using uniqid() here just for example purposes.

But is this a good idea? Retrieving 1MB, or even 127KB, from Memcached every so often isn't cheap. Remember, for every update you are:

  1. Retrieving string with serialized array of users online from Memcached
  2. Unserializing the string
  3. Adding user or updating their activity timestamp
  4. Serializing the array again
  5. Storing string back to Memcached

...this isn't cheap. This is probably going to be more sluggish than you're willing to accept, and I doubt it'd scale well as you creep up into the thousands of users online. I'm here with over 4,000 users in my array, and it performs well, but it's also on a page with nothing else -- once you tack on database queries and all sorts of other junk to render a page, you may be looking at a page that renders in over .5 seconds.

In a situation like this, you could consider splitting Who's Online data up into multiple values in Memcached. Basically, you can write your application code to use, say, 10 "buckets" of users online. You would randomly select one of the 10 buckets to add/modify the user. The key is then to have a back end process that merges all of the buckets together, removes stale users, and evenly redistributes the rest back into Memcached.

I've started coding an example of this, but don't really have the will to finish it right now. :) Maybe later.
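
In the meantime, here's a bare-bones sketch just to make the bucketing concrete. This isn't a drop-in change to the class above -- the key names, bucket count, and function names are all invented for illustration:

define('WOL_NUM_BUCKETS', 10);

// Writing: pick a random bucket and record the user's activity there.
function recordUserOnlineInBucket(Memcache $mc, $userId)
{
    $key    = 'usersOnline:' . mt_rand(0, WOL_NUM_BUCKETS - 1);
    $bucket = $mc->get($key);

    if (!is_array($bucket)) {
        $bucket = array();
    }

    $bucket[$userId] = time();
    return $mc->set($key, $bucket);
}

// Reading (or the back end cleanup job): merge all of the buckets.
function getAllUsersOnline(Memcache $mc)
{
    $all = array();

    for ($i = 0; $i < WOL_NUM_BUCKETS; $i++) {
        $bucket = $mc->get('usersOnline:' . $i);

        if (!is_array($bucket)) {
            continue;
        }

        // Keep the most recent timestamp if a user landed in more than one bucket.
        foreach ($bucket as $userId => $timestamp) {
            if (!isset($all[$userId]) || $timestamp > $all[$userId]) {
                $all[$userId] = $timestamp;
            }
        }
    }

    return $all;
}

Each write now touches a value roughly one-tenth the size, at the cost of a slightly more expensive merge whenever you need the full picture.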

Lastly, let's look at the back end batch process that keeps the array of users online tidy; typically this script would run as a cronjob:

whos_online_cleanup.php

require_once './wol.php';

$now = time();
$mc = new Memcache();
$mc->connect('localhost', 11211);
$who = WhosOnline::getInstance($mc);

$usersOnline = $who->getUsersOnline();

if (empty($usersOnline)) {
    print "no users online; exiting\n\n";
    exit();
}

print "num users online: " . count($usersOnline) . "\n\n";
print "processing users...\n";

$numUsersRemoved = 0;

foreach ($usersOnline as $userId => $timestamp) {
    if ($timestamp < $now - 300) {
        print "removing $userId; last seen " .
            ($now - $timestamp) . " seconds ago\n";
        unset($usersOnline[$userId]);
        $numUsersRemoved++;
    }
}

print "num users removed: $numUsersRemoved\n";
print "current num users online: " . count($usersOnline) . "\n";
print "saving users online...";
print ($who->setUsersOnline($usersOnline) ? 'done!' : '** FAILED **');
exit();

Here's some example output from it:

brian@henery [/web/pages]$ php ./whos_online_cleanup.php
num users online: 56

processing users...
removing 46f549b786d68; last seen 496 seconds ago
removing 46f549b786e7c; last seen 480 seconds ago
removing 46f549b786ef2; last seen 476 seconds ago
removing 46f549b7871b5; last seen 445 seconds ago
removing 46f549b787213; last seen 480 seconds ago
removing 46f549b78a105; last seen 482 seconds ago
removing 46f549b789071; last seen 389 seconds ago
removing 46f549b7927eb; last seen 467 seconds ago
removing 46f549b79372a; last seen 437 seconds ago
removing 46f549b798931; last seen 487 seconds ago
removing 46f549b79b50b; last seen 423 seconds ago
removing 46f549b79bc0a; last seen 381 seconds ago
removing 46f549b79d000; last seen 379 seconds ago
removing 46f549b79dfa6; last seen 398 seconds ago
removing 46f549b79fb99; last seen 472 seconds ago
removing 46f549b7a3975; last seen 502 seconds ago
removing 46f549b7a8278; last seen 373 seconds ago
removing 46f549b7ab905; last seen 407 seconds ago
removing 46f549b7d635e; last seen 389 seconds ago
removing 46f549b7d63b0; last seen 500 seconds ago
removing 46f549b7d63d8; last seen 453 seconds ago
removing 46f549b7da797; last seen 309 seconds ago
removing 46f549b7dbb9a; last seen 491 seconds ago
removing 46f549b7dce8a; last seen 396 seconds ago
removing 46f549b7dddb0; last seen 361 seconds ago
removing 46f549b7e3da6; last seen 353 seconds ago
num users removed: 26
current num users online: 30
saving users online...done!

So, you can just cron this like so:

* * * * * /usr/local/bin/php /some/path/to/whos_online_cleanup.php > /dev/null 2>&1

...and feel free to redirect STDOUT to a log file if you'd like.

Some quick stats. With over 1,000 users in the array, running the cleanup script takes under .2 seconds:

brian@henery [/web/pages]$ time php ./whos_online_cleanup.php
num users online: 1221

...[snip]...

num users removed: 470
current num users online: 751
saving users online...done!
real    0m0.195s
user    0m0.040s
sys     0m0.040s

...so it's pretty speedy. It's worth noting that all of this is being done on my Mac Mini Core Solo with 2 GB RAM running PHP 5.2.4 and Apache 2.2.x on OS X 10.4.10. Oh, and with just over 5,000 users in the array, the script runs in .58 seconds.

So...pretty straightforward, right? What do you think? Surely there's room for improvement...