Category: Scalability

“Rickroll To Go…” ZendCon Session audio posted!

My ZendCon 2008 talk, “Rickroll To Go With WURFL, PHP, and Other Open Source Tools”, was just released at Zend DevZone as ZendCon Sessions episode #23!

If you’re just now finding my blog from there, welcome! And thanks to Eli White, Community Relations Manager for Zend, for selecting it for posting.

You can get all of the relevant info using the links below:

Slides and videos of the presentation materials
ZendCon Sessions page with audio
MP3 audio of the presentation
iTunes DevZone podcast

Enjoy, and thanks for listening! Find me on Twitter or email me if you’d like to discuss the materials.

MySQL replication and the sync_binlog option

Recently I’ve been focusing on MySQL replication for a project at work. On this particular project, I’m acting in a Solutions Architect role and have been since about September of 2008.

Because of my background in systems administration, I tend to get myself into situations where I become the Schematic-side sys admin on projects. This involves things like deployment processes, getting development, staging, and production environments setup, and now, setting up MySQL replication. This is probably because a) I’m probably bad at delegating these things to others, b) I’m kinda’ good at it, and c) let’s be honest, I’m a control freak, so I like knowing the servers hosting my apps are setup in a meticulous manner.

In short, we’re running MySQL 5.0.45 on RedHat Enterprise Linux 5 (I know, I know…RedHat and MySQL 5.0…boo…but it’s okay). We’re required to replicate our production database to a secondary machine for backup purposes. This way, if our production server dies, we can manually failover to the slave (once we enable writes to it, of course), then swap the two back once the production server is back up.

All in all, this site is rather low-traffic at around 20,000 dynamic page views per day. Factoring in US-based users in an 11 hour time period, that’s about 1,800 request per hour, or right around 0.5 page views per second. We’ve got a single server in production that’s acting as our webserver and database server; it’s got a RAID array in it that was all setup by the group hosting the application (so I don’t know tons about it, but it’s new, quality hardware).

I’m using my master database for reads and writes in Production. Again, the slave is really only required for live backup-type purposes.

Now, I’m no expert on MySQL replication, but I’ve learned a lot these past few weeks. So I’m going to share one big caveat here. Please correct me as you see fit!

MySQL’s got a sync_binlog configuration option. You typically set it in my.cnf, and its value is an integer from 0-n. This value determines how many binary log writes need to occur before its contents are flushed out of the buffer and onto disk. With it set to zero, your operating system just determines when the buffer is flushed to disk.

I have a database migration process that copies table structures and their data from PostgreSQL into MySQL, then basically migrates that data into the appropriate tables in the new MySQL instance. It involves the transformation of a lot of data. It’s a sizable, complete data set for 8+ year old system that’s not the prettiest, best normalized data model in the world.

Per recommendations in High Performance MySQL, I had my sync_binlog value set to 1 in Production.

When I was performing a test migration to Production recently, the process took about 3 hours. Wow. Thanks, MySQL! It normally takes about 90 minutes in Staging, if that.

In digging around Google and MySQL.com, I found that a non-zero value for sync_binlog causes more disk seeks to flush the binary log to disk. The benefit of having it set to 1 is so that every transaction can be written to the binary log, which is then flushed to disk upon commit. Then, if your server happens to die, the last completed transaction will always be present in the binary log on disk, so you never have to worry about, say, missing a transaction replay on your slaves. However, this results in a lot more disk activity on your master.

I set sync_binlog to 0 and re-ran my migration. It ran in 90 minutes — that’s a 50% performance gain! Now, if you do the math, this makes sense. It’s one less disk seek and write per-transaction, so this result totally makes sense. Hooray for numbers, right?

I’m willing to gamble the integrity of data on my slave for the 50% performance increase. (remind me of this post in 6 months when I’m kicking myself over this for some reason, okay?)

With no binary logging enabled (i.e. in our dev environment), this process takes about 20 minutes. This makes sense — far less disk writes during the process.

Another way to workaround this would be to keep your binary logs on a physically separate disk. However, I don’t have that luxury at this point, so that’s not an option for me. If I had my druthers, this is how I’d handle the problem, but…no dice for now.

Anyways, my main point: if you are willing to gamble with every single transaction being replicated to your slave in the event of a crash, perhaps you can set sync_binlog to 0. If you’ve got a separate disk to devote to your binary log, by all means, set it to 1! There are other concerns around this are related to battery-backed disk cache, which you can read a bit more about in Jeremy Cole’s post on MySQL replication. You can also see some handy benchmarks that compare MySQL with and without binary logging.

Finally, I’ll admit this is a bit of a knee-jerk reaction post. I’ve done a bunch of research on this, but it’s not all quite fleshed out in my mind yet. I get the whole cause and effect in theory, but I haven’t dug into MySQL source or other materials to really understand what’s going on behind the scenes.

MySQL replication is a tricky thing. It’s great when it works, but understand that there are overhead tradeoffs in using it! I’m sure I’ll learn more in the weeks and months following our launch, so I look forward to sharing more of my successes and/or pains on this. Comments, feedback, and flames such as “OMG, you’re so wrong Brian!” and “Brian is a n00b!” are welcome.

Slides: ZendCon 2008, “Rickroll To Go…”

It’s the first day of ZendCon 2008! I’m giving my new talk, “Rickroll To Go With WURFL, PHP, and Other Open Source Tools” today at 4:00 PM PST.

The slides are below in a variety of formats:

PDF (no transitions)
PDF (one transition per page)
Quicktime movie

If you’re at ZendCon and reading this, be sure to drop on by at 4:00 PM — it’s sure to be a ball. Enjoy!

Slides from php|tek 2008

For those of you interested, I’ve made the PDFs of my slides from php|tek 2008 available below:

If you have any questions or comments on the materials, feel free to email me (brian at deshong dot net).

Speaking at Atlanta PHP, 3/6/2008!

For you Atlantans that may read my blog, I’ll be speaking at this week’s Atlanta PHP meetup. Specifically, I’ll be presenting on “Robust Batch Processing with PHP,” which I’m also slated to present at php|tek 2008 in May.

Also, special thanks to all Atlanta PHP attendees for allowing me to use them as my guinea pigs before taking new talks out to conferences. You regulars know that I’ve done this with other talks in the past. You’re a very gracious, brave bunch, so thanks, everyone! :)

See you all there!

WordPress Themes