High availability for MySQL on Amazon EC2 - Part 1

Like many, I have been seduced by the power and flexibility of Amazon EC2. Being able to launch new instances at will, depending on the load, is almost too good to be true. Amazon EC2 also has some drawbacks: availability is not guaranteed, and discovery protocols relying on Ethernet broadcast or multicast cannot be used. That means it is not easy to build a truly highly available solution for MySQL on EC2. If a MySQL instance fails, here are some of the challenges we face:

Detect the failure
Kill the faulty instance
Free the shared resources (ex: EBS volumes)
Launch a new instance
Reconfigure the shared resources
Reconfigure instance monitoring
Reconfigure the database clients
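
As a rough illustration only, the ordering of these steps can be sketched in Python. This is not the Pacemaker and Heartbeat setup described in this series: the `run_failover` function, the `is_healthy` check, and the `actions` mapping are hypothetical placeholders, and no real EC2 or cluster API is called.

```python
from typing import Callable, Dict, List

# Step names follow the list above. Detecting the failure is handled by the
# health check; the remaining steps run in order once a failure is seen.
FAILOVER_STEPS: List[str] = [
    "kill the faulty instance",
    "free the shared resources",
    "launch a new instance",
    "reconfigure the shared resources",
    "reconfigure instance monitoring",
    "reconfigure the database clients",
]


def run_failover(
    is_healthy: Callable[[], bool],
    actions: Dict[str, Callable[[], None]],
) -> List[str]:
    """Run the recovery steps in order if the health check fails.

    `is_healthy` and `actions` are hypothetical hooks supplied by the caller.
    Returns the names of the steps that completed. An exception raised by any
    step stops the sequence, so later steps never run on a half-recovered
    setup.
    """
    if is_healthy():
        return []
    completed: List[str] = []
    for name in FAILOVER_STEPS:
        actions[name]()
        completed.append(name)
    return completed
```

The point of the sketch is that the order matters: shared resources such as EBS volumes have to be freed before the replacement instance can use them, and clients should only be redirected once everything else is in place.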

Facing these challenges, a customer recently asked me if I could build a viable HA solution for MySQL on EC2. The goal is to have a cheap small instance monitor the availability of a large instance (of any size) and take measures to keep MySQL available. A few weeks later, I ended up with a solution that works and is decently elegant, using Pacemaker and Heartbeat. The setup is fairly complex; being in the Amazon EC2 virtual world does not simplify things, far from it. Because of the complexity, the story will be broken into multiple posts:

Part 1 – Intro (this post)
Part 2 – Setting up the initial instances
Part 3 – Configuring the HA resources
Part 4 – The instance restart script
Part 5 – The instance monitoring script
Part 6 – Publishing the MySQL server location
Part 7 – Pitfalls to avoid

Hopefully, I will be able to write those posts quickly, but since consulting is my primary duty, I don’t have much control over my workload.

Stay tuned!

8 Comments


Jonathan

14 years ago

Scalarium will do all of this for you automatically for a simple MySQL setup.

Niklas

14 years ago

Looking forward to reading these articles!

Michael Lucas

14 years ago

Or you could just use Amazon’s new multi-AZ RDS service and be done in 10 minutes with one step.
http://aws.amazon.com/rds

Patrick Casey

14 years ago

Michael,

I think it’s going to depend on your definition of HA and what you’re willing to accept for a failover scenario.

With Amazon’s multi-AZ deployment, you have about a 3 minute failover after the loss of the primary before the secondary comes online. While that’s pretty good and totally viable for most use cases, it’s certainly not going to work in a lot of other cases where true “no downtime” availability is required.

IMHO the Amazon offering is a bit on the weird side since it solves a (usually) unrelated pair of HA problems.

Typical HA solutions I’ve worked on guarantee either:

A) Low production impact, some failover latency, some data loss (asynchronous replication)
B) Some production impact, low failover latency, no data loss (synchronous replication)

Amazon has an unusual offering (to my eye) in that they’re offering:

C) Some production impact, some failover latency, no data loss

Usually if I’m on an app that can’t afford data loss *at all* then it also can’t afford failover latency.

Oren

14 years ago

You could also try using MMM with the nameserver patch found in its contrib directory. Although having drawbacks, it works on EC2.

Sheeri K. Cabral (Pythian)

14 years ago

Most of the issues here are very similar to issues folks face even if they’re not on EC2; it might be useful to point that out. Pretty much every point except #4 (launch a new instance) is the same even in a non-cloud environment. (Of course, steps 3 and 5, dealing with shared resources, may be automatic or harder: for instance, an NFS disk mounted in two places is automatically already there, but with a physically attached RAID setup it is nearly impossible to automate moving a physical cable 🙂 )

Anyway, I think this series will be incredibly useful for folks even if they’re not in a cloud environment, because no matter what, in any HA scenario, you have to:

1. Detect the failure
2. Remove the failed instance from the “usable” instance list (i.e., ‘take it out of the load balancer’)
3. Free the shared resources
4. Replace the failed instance with a new one
5. Reconfigure the shared resources
6. Reconfigure instance monitoring
7. Reconfigure the database clients

(This made me think fondly of a decade ago: part of my Master’s degree project was a Beowulf cluster to split up processing load and deal with any failures gracefully. We had a few dozen 386s and 486s, which were dinosaurs back then... it was fun...)

Yves Trudeau

Author

14 years ago

Hi,
I am aware that there are other solutions. I will not discuss the merits of each, but I have never seen a one-size-fits-all solution. This solution is not one-size-fits-all either, but it is very flexible and can be applied not only to MySQL but to almost any sort of application. I am onsite this week, so I am progressing slowly on part 2.

Amazon Blog

14 years ago

I am quite familiar with other solutions, but all I want is a one-window solution, and it’s not as if this is the one solution for all issues.
