What the Amazon AWS S3 Outage Reminds Us: It’s all about Trade-Offs

December 04, 2017

Early this year, Amazon’s S3 service experienced a major outage. For about 4-5 hours on February 28th, many high-profile software-as-a-service (SaaS) providers that depend on S3’s East Coast region experienced outages as well, leading to the claim that “Amazon broke the internet.”

In the time since Amazon recovered from their service outage, I have seen a few different types of reactions around the internet community, which I’ll attempt to summarize below:

After witnessing these groups of reactions, one thing became very clear to me – for a development team working with the cloud, their choices (as with so many choices in development) are all about trade-offs.

The flip side of reducing risk by using multiple regions or providers is that the mitigation has a cost of its own. Beyond the explicit cost of keeping a copy of a company’s data in another data center, there is also the risk and associated cost of synchronizing that data, securing it, testing it, and ensuring that appropriate failover mechanisms are in place.

Given the low likelihood of a regional failure on a service like Amazon’s S3, businesses felt confident enough that their trade-off was worth the risk. Now that an outage on this scale has actually occurred, I imagine businesses will be re-evaluating the impact of that risk.

Personally, my bet is that enough customers understand scale and upstream dependencies that, when something on the scale of this outage happens, they’re willing to be forgiving if they’re informed about it by the media and by the provider itself (in this case, Amazon). The customer relationship therefore doesn’t suffer as much. I’ve also noticed customers being much more forgiving when a business employs good DevOps practices, such as status pages and clear communication.

But what if an outage scenario involved data loss? The calculus would change altogether.

So, “expect things to fail” really means “make your trade-offs based on a small likelihood of downtime, not a 0% chance of downtime, and plan accordingly.” For a service that can survive an hours-long outage, the cost-saving trade-offs are a no-brainer. Services that are mission-critical or have volatile customer bases do not have this luxury, and they may need to build fault tolerance across regions and cloud providers, at a cost to themselves and possibly their end users.
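
As a rough illustration of this trade-off, the Python sketch below compares the expected yearly loss from an outage with the yearly cost of cross-region mitigation. Every number in it (the outage probability, the cost per hour of downtime, the mitigation cost) is a hypothetical placeholder, not a figure from Amazon or from any real business; only the 4.5-hour duration echoes the outage described above. The names `OutageScenario`, `expected_annual_loss`, `availability`, and `worth_mitigating` are likewise invented for this example.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365  # 8760 hours in a non-leap year


@dataclass
class OutageScenario:
    """Hypothetical model of one kind of outage; all fields are assumptions."""
    annual_probability: float  # assumed chance of such an outage in a given year
    duration_hours: float      # how long the outage lasts
    cost_per_hour: float       # assumed revenue and goodwill lost per hour down


def expected_annual_loss(scenario: OutageScenario) -> float:
    """Probability-weighted yearly cost of the outage."""
    return (scenario.annual_probability
            * scenario.duration_hours
            * scenario.cost_per_hour)


def availability(downtime_hours: float) -> float:
    """Fraction of a year a service is up, given total hours of downtime."""
    return 1.0 - downtime_hours / HOURS_PER_YEAR


def worth_mitigating(scenario: OutageScenario,
                     annual_mitigation_cost: float) -> bool:
    """True when the expected loss exceeds what the mitigation costs per year."""
    return expected_annual_loss(scenario) > annual_mitigation_cost


if __name__ == "__main__":
    # Placeholder figures: a service that tolerates downtime versus a
    # mission-critical one, each facing the same hypothetical outage risk.
    tolerant = OutageScenario(annual_probability=0.25,
                              duration_hours=4.5,
                              cost_per_hour=10_000.0)
    critical = OutageScenario(annual_probability=0.25,
                              duration_hours=4.5,
                              cost_per_hour=500_000.0)
    mitigation_cost = 60_000.0  # assumed yearly cost of a second region

    for name, scenario in (("tolerant", tolerant), ("critical", critical)):
        print(name,
              "expected loss:", expected_annual_loss(scenario),
              "mitigate:", worth_mitigating(scenario, mitigation_cost))
    print("availability with 4.5 hours down:", availability(4.5))
```

The model only prices downtime. It leaves out data loss, which, as noted above, would change the calculus altogether, and it treats customer goodwill as a flat hourly cost, which is a simplification.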

In IT, we are always in the business of value and trade-offs. As SaaS providers are now finding out, the key is knowing your customers’ expectations for the value your business provides, and making the correct trade-offs to deliver that value.
