Category: Tech

MySQL replication and the sync_binlog option

Recently I’ve been focusing on MySQL replication for a project at work. On this particular project, I’m acting in a Solutions Architect role and have been since about September of 2008.

Because of my background in systems administration, I tend to get myself into situations where I become the Schematic-side sys admin on projects. This involves things like deployment processes, getting development, staging, and production environments set up, and now, setting up MySQL replication. This is probably because a) I’m bad at delegating these things to others, b) I’m kinda’ good at it, and c) let’s be honest, I’m a control freak, so I like knowing the servers hosting my apps are set up in a meticulous manner.

In short, we’re running MySQL 5.0.45 on RedHat Enterprise Linux 5 (I know, I know…RedHat and MySQL 5.0…boo…but it’s okay). We’re required to replicate our production database to a secondary machine for backup purposes. This way, if our production server dies, we can manually fail over to the slave (once we enable writes to it, of course), then swap the two back once the production server is back up.

All in all, this site is rather low-traffic at around 20,000 dynamic page views per day. Assuming mostly US-based users spread over an 11-hour window, that’s about 1,800 requests per hour, or right around 0.5 page views per second. We’ve got a single server in production that’s acting as our webserver and database server; it’s got a RAID array in it that was all set up by the group hosting the application (so I don’t know tons about it, but it’s new, quality hardware).
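For anyone who wants to check the back-of-the-envelope numbers, here is a minimal Python sketch of the arithmetic. The function name `request_rate` is made up for illustration, and it assumes traffic is spread evenly across the 11-hour window, which real traffic never quite is.

```python
def request_rate(page_views_per_day, active_hours):
    """Return (requests per hour, requests per second), assuming
    traffic is spread evenly over the active window."""
    per_hour = page_views_per_day / active_hours
    per_second = per_hour / 3600
    return per_hour, per_second


if __name__ == "__main__":
    per_hour, per_second = request_rate(20_000, 11)
    print(f"{per_hour:.0f} requests/hour, {per_second:.2f} requests/second")
```

That works out to roughly 1,818 requests per hour and 0.51 per second, in line with the rounded figures above.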

I’m using my master database for reads and writes in Production. Again, the slave is really only required for live backup-type purposes.

Now, I’m no expert on MySQL replication, but I’ve learned a lot these past few weeks. So I’m going to share one big caveat here. Please correct me as you see fit!

MySQL’s got a sync_binlog configuration option. You typically set it in my.cnf, and its value is an integer of 0 or greater. This value determines how many binary log writes need to occur before the log’s contents are flushed out of the buffer and onto disk. With it set to zero, your operating system just determines when the buffer is flushed to disk.

I have a database migration process that copies table structures and their data from PostgreSQL into MySQL, then basically migrates that data into the appropriate tables in the new MySQL instance. It involves the transformation of a lot of data. It’s a sizable, complete data set for an 8+ year old system that doesn’t have the prettiest, best-normalized data model in the world.

Per recommendations in High Performance MySQL, I had my sync_binlog value set to 1 in Production.

When I was performing a test migration to Production recently, the process took about 3 hours. Wow. Thanks, MySQL! It normally takes about 90 minutes in Staging, if that.

In digging around Google and MySQL.com, I found that a non-zero value for sync_binlog causes more disk seeks to flush the binary log to disk. The benefit of setting it to 1 is that every transaction is written to the binary log, which is then flushed to disk upon commit. Then, if your server happens to die, the last completed transaction will always be present in the binary log on disk, so you never have to worry about, say, missing a transaction replay on your slaves. However, this results in a lot more disk activity on your master.
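To get a feel for the tradeoff, here is a toy Python sketch. It is not how MySQL is implemented; it just appends records to a temporary file and calls fsync after every N writes, the way I understand sync_binlog=N to behave. The names `write_log`, `compare`, and `sync_every` are mine, purely for illustration.

```python
import os
import tempfile
import time


def write_log(path, n_events, sync_every):
    """Append n_events records to path, fsyncing after every
    `sync_every` writes. sync_every=0 never calls fsync and leaves
    flushing to the OS. Returns (fsync calls, elapsed seconds)."""
    fsyncs = 0
    start = time.perf_counter()
    with open(path, "ab") as log:
        for i in range(1, n_events + 1):
            log.write(b"event %d\n" % i)
            if sync_every and i % sync_every == 0:
                log.flush()             # push Python's buffer to the OS
                os.fsync(log.fileno())  # ask the OS to push it to disk
                fsyncs += 1
    return fsyncs, time.perf_counter() - start


def compare(n_events=200, settings=(0, 1, 10)):
    """Run write_log once per setting, each against a fresh temp file."""
    results = {}
    for sync_every in settings:
        fd, path = tempfile.mkstemp()
        os.close(fd)
        try:
            results[sync_every] = write_log(path, n_events, sync_every)
        finally:
            os.remove(path)
    return results


if __name__ == "__main__":
    for sync_every, (fsyncs, seconds) in compare().items():
        print(f"sync_every={sync_every}: {fsyncs} fsyncs, {seconds:.4f}s")
```

The timings will vary wildly by disk and filesystem, so I wouldn’t read much into the absolute numbers. The fsync counts are the point: one per write at a setting of 1, none at all at 0.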

I set sync_binlog to 0 and re-ran my migration. It ran in 90 minutes, which is a 50% reduction in run time! Now, if you do the math, this makes sense: it’s one less disk seek and write per transaction. Hooray for numbers, right?

I’m willing to gamble the integrity of data on my slave for the 50% reduction in run time. (Remind me of this post in 6 months when I’m kicking myself over this for some reason, okay?)

With no binary logging enabled (i.e. in our dev environment), this process takes about 20 minutes. This makes sense: far fewer disk writes during the process.
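Putting the three run times side by side, here is a quick Python sketch of the comparison. The dictionary keys and function names are just labels I picked, and keep in mind the 20-minute figure comes from a different environment, so the ratios are rough at best.

```python
# Run times in minutes, as reported in this post.
RUNS = {
    "no binary log": 20,   # dev environment
    "sync_binlog=0": 90,   # production, second attempt
    "sync_binlog=1": 180,  # production, first attempt
}


def time_saved(before, after):
    """Fraction of run time saved going from `before` to `after`."""
    return (before - after) / before


def relative_to(runs, baseline):
    """Each run time as a multiple of the baseline run."""
    base = runs[baseline]
    return {name: minutes / base for name, minutes in runs.items()}


if __name__ == "__main__":
    saved = time_saved(RUNS["sync_binlog=1"], RUNS["sync_binlog=0"])
    print(f"time saved by dropping to sync_binlog=0: {saved:.0%}")
    for name, factor in relative_to(RUNS, "no binary log").items():
        print(f"{name}: {factor:.1f}x the no-binlog run")
```

Halving the run time is the same thing as doubling throughput, which is worth remembering when you see “50%” thrown around.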

Another way to work around this would be to keep your binary logs on a physically separate disk. However, I don’t have that luxury at this point, so that’s not an option for me. If I had my druthers, this is how I’d handle the problem, but…no dice for now.

Anyways, my main point: if you are willing to gamble that not every single transaction will make it to your slave in the event of a crash, perhaps you can set sync_binlog to 0. If you’ve got a separate disk to devote to your binary log, by all means, set it to 1! Other concerns around this relate to battery-backed disk cache, which you can read a bit more about in Jeremy Cole’s post on MySQL replication. You can also see some handy benchmarks that compare MySQL with and without binary logging.

Finally, I’ll admit this is a bit of a knee-jerk reaction post. I’ve done a bunch of research on this, but it’s not all quite fleshed out in my mind yet. I get the whole cause and effect in theory, but I haven’t dug into MySQL source or other materials to really understand what’s going on behind the scenes.

MySQL replication is a tricky thing. It’s great when it works, but understand that there are overhead tradeoffs in using it! I’m sure I’ll learn more in the weeks and months following our launch, so I look forward to sharing more of my successes and/or pains on this. Comments, feedback, and flames such as “OMG, you’re so wrong Brian!” and “Brian is a n00b!” are welcome.

Factory Method pattern; ATLPHP tonight!

I’ll be giving a “mini-talk” on the Factory Method pattern at Atlanta PHP this evening.

Who doesn’t love the Factory Method pattern, right? Good stuff. A link to a PDF of the slides is below. I struggled to come up with a cool example, but one related to cars was the best that I could do. You’ll find some PHP-specific example code in the slides, too.
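If you just want the gist without opening the PDF, here is a bare-bones sketch of the pattern. It is not the code from the slides (those are in PHP); it’s a Python illustration, and every class name in it is made up for this example.

```python
from abc import ABC, abstractmethod


class Car(ABC):
    """The product that factories create."""

    @abstractmethod
    def describe(self) -> str:
        ...


class Sedan(Car):
    def describe(self) -> str:
        return "four-door sedan"


class Truck(Car):
    def describe(self) -> str:
        return "pickup truck"


class CarFactory(ABC):
    """The creator: declares the factory method, leaves the choice
    of concrete Car to its subclasses."""

    @abstractmethod
    def create_car(self) -> Car:
        ...

    def deliver(self) -> str:
        # Works against the Car interface only; never names a concrete class.
        car = self.create_car()
        return f"Delivering a {car.describe()}"


class SedanFactory(CarFactory):
    def create_car(self) -> Car:
        return Sedan()


class TruckFactory(CarFactory):
    def create_car(self) -> Car:
        return Truck()


if __name__ == "__main__":
    for factory in (SedanFactory(), TruckFactory()):
        print(factory.deliver())
```

The point is that `deliver` never needs to know which kind of car it is handing over; each factory subclass makes that call in `create_car`.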

Grab the slides (PDF)

For further reading on the Factory Method pattern and other classic design patterns, you can always grab a copy of Design Patterns: Elements of Reusable Object-Oriented Software. Enjoy!

PHP/technical New Year’s resolutions – 2009

Reflecting on my 2008 New Year’s resolutions, I didn’t accomplish all of them. The only one I really even began to tackle was contributing to Zend Framework.

I participated in PHP TestFest, and my tests ended up making their way into CVS, so that was a nice surprise. I did manage to speak at both php|tek 2008 and ZendCon 2008, so that was good…though not really a resolution of mine.

Regardless, it’s been a good year for accomplishing some of my technical- and PHP-related goals. But what about 2009?

Well, since it’s public knowledge now, my wife and I are expecting our first child in June 2009 (woohoo!), so I’m going to be taking a bit of time off of the conference circuit. I specifically didn’t propose anything for php|tek 2009, because it’s due to take place just a few weeks before the kid’s due date, so I don’t want to be away from home at such an important time. Maybe I’ll supplement that with more talks at Atlanta PHP?

So, since I’ll be stepping back from conferences a bit, what will I be focusing on? Here are some of my PHP/technical resolutions for 2009:

  • Continue contributing to Zend Framework: With my first proposal now in the state of “Pending Recommendation,” I’d like to start drafting a proposal for a Zend_Cache_Profiler of sorts, à la Zend_Db_Profiler. I’ll be looking to write up and submit that proposal within Q1 2009, I think.
  • Contribute to php|architect: The 2009 Editorial Calendar for php|architect has been released. There are at least two topics in there that I’d love to write on. Specifically, I want to attempt to adapt my “Rickroll To Go With PHP, WURFL, and Other Open Source Tools” presentation into an article format. That should prove to be an interesting, entertaining challenge.
  • Catch up on my list of technical books I want to read: My Amazon Wish List is filled with all kinds of books that I want to read, so I’m really hoping to get through a handful of them this year.
  • Finish my iPhone game: It’s a super top secret idea, of course, but the gameplay is largely done and works well. I’ve got to work on scoring, how leveling works, preferences, and finally, graphic design. So…I’ve got a long way to go on that. It’s been a great exercise in learning UIKit/Objective-C!
  • Write an OS X Memcached GUI monitoring/profiling client?: I’ve wanted to build a little OS X desktop app for monitoring the performance of Memcached servers for a while now. Think cool graphs of gets, puts, evictions, bytes used, etc. If you managed a pool of many Memcached servers, it’d come in really handy for giving you a snapshot of performance and potential areas of improvement. This would be another great exercise in learning more about Cocoa/Objective-C, too. Desktop software development just feels a bit more legitimate sometimes, ya’ know? Or maybe that’s just me.

So, I think that’s it for now. We’ll see how I do this year.

Happy holidays to everyone! See you in 2009.

DeShong.net now at Slicehost

After much input from many of my colleagues, I have decided to move all of my DeShong.net sites and services to Slicehost.

I’ve been with DreamHost for almost three years now, but decided it was time for a change. Among the factors in this are:

  • I want to move my email hosting to Gmail for Domains — it seems far superior to DreamHost nowadays
  • DreamHost seems to be somewhat oversold; performance on my shared hosting package is pretty poor
  • DreamHost recently moved me to a new server, which is fine, but my existing PHP CGI setup broke (I had set that up back when they were still running PHP 4 as a module, because I wanted more custom configuration control)
  • I never had enough control over my LAMP setup, which has always bugged me. I wanna’ run APC, Memcached, etc.!
  • I’ve hosted all of my personal photos at Flickr for quite some time now, so I no longer need an old, crufty Gallery2 installation.

So, Slicehost gives me fewer physical resources (on their smallest slice, which is 256 MB RAM and 10 GB storage), but they’re dedicated resources, so the performance is better. No oversold, shared hosting boxes for me, thanks!

Also, I get root access to the virtual machine. I tend to shy away from virtual machines, but they are all the rage these days. I figure I should give it a whirl, eh?

Anyways, I’ve carefully migrated all of my DNS, my Subversion repository, my WordPress blog, and my wiki to Slicehost, along with some other odds and ends. I have also chosen not to take some of my friends’ domains with me over to Slicehost due to the limited storage, so…sorry, guys!

So, if you notice anything funky, let me know. Otherwise, all existing URLs should work, so things should be completely transparent for the most part. I’ll get my Gmail email cutover done next week, but my MX records will still point to DreamHost for another week or so.

Wish me luck!

Conference wrap-up: Schematic Tech Summit 2008

As I write this, I’m in flight back to Atlanta from the first annual Schematic Technology Summit, which was held in San Jose, Costa Rica at the La Condesa hotel and resort.

What an incredible event. Every day, I get to work with all sorts of smart, passionate technical minds from many disciplines. The Schematic Technology Department is made up of about 90 people across our New York, Los Angeles, Atlanta, Austin, Minneapolis, London, and San Jose, Costa Rica offices. The past four days have seen all of those 90 people in the same place, which is quite an accomplishment!

An equally impressive accomplishment is that we had over 70 presentations from 45 speakers across seven different disciplines. Internally, we’re made up of PHP, Java, .Net, Silverlight, Flash, and HTML/JavaScript developers. We also have a team of Solutions Architects across the globe, as well as Quality Assurance Analysts. Also represented was the Schematic Technology Management team, of which I’m a part, in addition to being the Platform Chair of our Open Source Platforms Group (the PHP team).

We’ve been planning this event since about May of this year. We opened up a call for proposals to all technology Platform Groups, sifted through all of the submissions, chose from them, and worked out a schedule. All of these talks were spread out over six different rooms, ranging from small to large.

As OSPG Platform Chair, I was responsible for all or part of a total of seven different presentations. I presented:

  • Opening keynote (OSPG portion)
  • Zend Framework: A Look Back (and Forward)
  • State of phplib (our internal PHP code library, modeled after Zend Framework)
  • Abracadabra!: Mastering Unix Shell Scripting
  • Shrinking Your Static Stuff
  • Load Testing Introduction
  • Open discussion (discussed various OSPG and PHP topics)

I actually won an award for “most prolific” speaker since I had the largest number of presentations on the schedule. My fellow OSPG teammates also presented:

  • Joseph Jorgensen: Flash Remoting with AMFPHP
  • Pablo Viquez: PHAR: PHP’s Self-Contained Archives (and a short Spanish lesson!)
  • Maggie Nelson, David Mora: Be The Database! (theme song included)
  • Karolina Hidalgo: MVC (.Net and PHP comparison)
  • Ben Ramsey: You Look Like You Could Use Some REST! REST and the Resource-Oriented Architecture Explained, and Web Application Security 101
  • Megan McNulty, Jim Connell: “I Can Haz App Enjun?” (intro to Google App Engine)

Special thanks to all of them for their hard work in proposing talks and ultimately preparing them! For the rest of you that we couldn’t accept, we’ll get you on the books for 2009!

I also attended many other talks (when I wasn’t speaking myself!), such as:

  • Robert Reinhardt: Personal Brand Building
  • Michelle Kempner, Schematic SA Group: Diagramming Pictionary
  • Schematic SA Group: SA Rapid Fire Show and Tell

My Atlanta cohorts, Ryan Taylor, Corey Schuman and Brandon Dement spoke on “PixelBender Unleashed,” “WPF,” and “Automation” respectively. Good stuff, guys! ATL represent!

In our spare time, the group could be found in the hotel casino, drinking in one of the bars, or playing ping pong (New York won the first office ping pong tournament!). We also took a group outing to La Paz Waterfall Gardens on Saturday afternoon. We survived the bus ride up, down, and through mountains to reach our destination, which is filled with some amazing waterfalls, monkeys, butterflies, hummingbirds, and other awesome rainforest stuff. We all hiked through the trails and paths winding through the rainforest, checking out a handful of amazing waterfalls. I don’t know about the rest of you Schemers, but that trip reminded me just how out of shape I am. Mowing the yard isn’t enough for exercise!

Also, a huge thanks to our event planners Kimberly Brown and Viria Azofeifa, as well as Yvette Pasqua, Jason Buzzeo, Larry Davidson, and Chris Bray for driving this thing! You all rocked it.

Anyways, just a little wrap-up from me! I’m excited to see how things might come together to hold the second annual Schematicon in 2009! Don’t forget to relive the action on Twitter (#schematicon08 and #squarecon08) and Flickr (schematicon08).
