Heartbleed Defeated - Engineering at Prezi

Huge issues are rarely the result of one tiny mistake. But when that little error happens to occur within the context of a widely used crypto library, the impact can be catastrophic. I’m writing, of course, about Heartbleed, which affected up to ⅔ of all online businesses.

For those of you without an engineering background, Heartbleed took advantage of a security hole in recent versions of the OpenSSL library that let an attacker read information from the server’s memory. Such information might be nothing more than meaningless junk with no context, or, in the worst cases, it could contain vital password data. Fortunately, the evidence indicates that we caught this problem before our users were negatively impacted.
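For readers who do want the engineering detail, here is a toy sketch of the kind of mistake involved. It is not OpenSSL code: the function names and the flat "memory" layout are made up for illustration. It assumes only the publicly documented shape of the bug: the server echoed back as many bytes as the heartbeat request claimed to contain, without checking that claim against what was actually sent.

```python
# Toy model of a Heartbleed-style over-read. Hypothetical names throughout;
# real process memory is not laid out this neatly.

def make_memory(payload: bytes, neighbouring_data: bytes) -> bytes:
    """Pretend the request payload sits right next to unrelated server data."""
    return payload + neighbouring_data


def vulnerable_echo(memory: bytes, claimed_len: int) -> bytes:
    """Echo back claimed_len bytes, trusting the length the client sent."""
    return memory[:claimed_len]


def patched_echo(memory: bytes, actual_len: int, claimed_len: int) -> bytes:
    """Drop the request when the claimed length exceeds what was received."""
    if claimed_len > actual_len:
        return b""  # discard silently instead of replying
    return memory[:claimed_len]


memory = make_memory(b"ping", b"secret-password")
leaked = vulnerable_echo(memory, 10)     # client sent 4 bytes but claimed 10
safe = patched_echo(memory, 4, 10)       # same request against the checked version
```

In this toy model the unchecked version hands back the four payload bytes plus six bytes of whatever happened to sit next to them, while the checked version returns nothing.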

Let me jump ahead to the end of the story. We were able to quickly and proactively respond to the threat of Heartbleed and eliminate any risk of unauthorized access to our users’ accounts. But what makes the story interesting is HOW we were able to get it done so fast.

The minute we identified the severity of the threat, we declared what we call a “Security Prio1” status. Let’s start with what that means. A regular “Prio1” means that a large number of our users are unable to access their presentations due to some technical problem. We escalate rapidly in this situation to ensure our users aren’t left hanging just as they’re about to present.

During a Security Prio1, we are working to prevent the opposite problem: unauthorized access to presentations. Security Prio1 status is a signal to all of our developers to drop whatever else they’re working on and devote their full and immediate attention to the problem until it’s solved.

All the infrastructure engineers came together and collaborated to wrestle Heartbleed to the ground. With all hands on deck, we were able to work with rapid efficiency, and we patched all our servers within just a few hours. When Amazon patched our load balancers, we stopped leaking data. Heartbleed was defeated within a day of first rearing its ugly head.

Here are some of the steps we took along the way:

Applied the security patch to every one of our known vulnerable servers–more than 100 of them
Identified corner-cases, such as services that didn’t restart automatically
Detected and assessed third-party service providers (e.g. Amazon ELBs)
Changed our SSL certificates, and requested the revocation of the old ones
Wrote monitoring to detect which services were unpatched or were using compromised certificates
Performed a forced log-out of every user to renew their session cookies. (Since we use signed session cookies, we also needed to invalidate the old ones.)
Assessed the impact on the users
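To make the session-cookie step above more concrete, here is a minimal sketch of one way signed cookies can be invalidated all at once: rotate the signing key so that every cookie signed with the old key stops verifying. This is an assumption for illustration, not a description of our actual implementation; the cookie format, the key values and the `sign` and `verify` helpers are all hypothetical.

```python
import hashlib
import hmac
from typing import Optional


def sign(value: str, key: bytes) -> str:
    """Return 'value.signature', where the signature is an HMAC-SHA256 hex digest."""
    mac = hmac.new(key, value.encode(), hashlib.sha256).hexdigest()
    return f"{value}.{mac}"


def verify(cookie: str, key: bytes) -> Optional[str]:
    """Return the cookie value if the signature matches the key, else None."""
    value, sep, mac = cookie.rpartition(".")
    if not sep:
        return None  # no signature part at all
    expected = hmac.new(key, value.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if hmac.compare_digest(mac.encode(), expected.encode()):
        return value
    return None


old_key = b"key-in-use-before-the-incident"   # placeholder value
new_key = b"key-generated-after-patching"     # placeholder value

old_cookie = sign("user=42", old_key)
still_valid = verify(old_cookie, new_key)     # None: the old cookie is rejected
```

The design point is that a signed cookie is only as trustworthy as its signing key. If the key may have been exposed in leaked memory, logging users out is not enough on its own, because an attacker holding the old key could mint fresh cookies; replacing the key closes that door.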

Heartbleed is traceless, and nearly impossible to detect. Until the last step, when Amazon upgraded the ELBs, we weren’t really sure what had happened.

Fortunately, Security Prio1 was invented specifically for these types of shitstorms, so we were ready. But it’s about more than just that. The real reason we were able to act so rapidly has less to do with established procedures or protocols and more to do with Prezi culture. Everyone here recognizes the importance of our customers above all else, and is ready to do whatever it takes to ensure the best possible experience for them.

Now, despite our success against Heartbleed, we strongly recommend that you change your internet passwords as soon as possible, just to be on the safe side. You should do this on a regular basis anyway, not just for Prezi but for any of the sites you frequently visit.
