What the Amazon AWS S3 Outage Reminds Us: It’s all about Trade-Offs

December 04, 2017

Early this year, Amazon’s S3 service experienced a major outage. For about four to five hours on February 28th, many high-profile software-as-a-service (SaaS) providers that depend on S3’s East Coast region experienced outages as well, leading to the claim that “Amazon broke the internet.”

In the time since Amazon recovered from their service outage, I have seen a few different types of reactions around the internet community, which I’ll attempt to summarize below:

After witnessing these groups of reactions, one thing became very clear to me: for a development team working with the cloud, the choices (as with so many choices in development) are all about trade-offs.

The flip side of reducing risk by using multiple regions or providers is that mitigating the risk has a cost of its own. Beyond the explicit cost of keeping a copy of a company’s data in another data center, there is also the risk and associated cost of synchronizing that data, securing it, testing it, and ensuring appropriate failover mechanisms.
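To make the "failover mechanism" part of that cost concrete, here is a minimal sketch of reading an object from a primary region and falling back to a replica. Everything in it is an assumption for illustration: `read_with_failover`, `AllRegionsFailed`, and the idea of passing in one fetcher function per region are hypothetical names, not part of any AWS SDK. A real setup would also have to handle replication lag, credentials, and timeouts.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: a fetcher takes an object key and returns its bytes,
# or raises an exception if the region cannot serve the request.
Fetcher = Callable[[str], bytes]


class AllRegionsFailed(Exception):
    """Raised when every configured region failed to return the object."""


def read_with_failover(
    key: str, regions: Sequence[Tuple[str, Fetcher]]
) -> Tuple[str, bytes]:
    """Try each (region_name, fetcher) pair in order; return the first success."""
    errors: List[str] = []
    for region_name, fetch in regions:
        try:
            return region_name, fetch(key)
        except Exception as exc:  # broad on purpose: any failure triggers failover
            errors.append(f"{region_name}: {exc}")
    # Every region failed, so surface all collected errors to the caller.
    raise AllRegionsFailed("; ".join(errors))
```

Even this small function hints at the hidden costs: the replica is only useful if the data was synchronized beforehand, and the fallback path only works if someone tests it regularly.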

Given the low likelihood of a regional failure on a service like Amazon’s S3, many businesses felt confident that going without this redundancy was a trade-off worth the risk. Now that an outage on this scale has actually occurred, I imagine businesses will be re-evaluating the impact of that risk.

Personally, my bet is that enough customers understand scale and upstream dependencies that, when something on the scale of this outage happens, they’re willing to be forgiving, provided they’re informed about the outage by the media and by the provider itself (in this case, Amazon). As a result, the customer relationship doesn’t suffer as much. I’ve also noticed customers being much more forgiving when a business employs good DevOps practices, such as status pages and clear communication.

But what if an outage scenario involved data loss? The calculus would change altogether.

So, “expect things to fail” really means “make your trade-offs based on a small likelihood of downtime, not a 0% chance of downtime, and plan accordingly.” For a service that can survive an hours-long outage, the cost-saving trade-offs are a no-brainer. Services that are mission-critical or have volatile customer bases do not have this luxury, and they may need to build fault tolerance across regions and cloud providers, at a cost to themselves and possibly their end users.
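The trade-off described above can be roughed out as simple arithmetic. The sketch below is only that: the function names and every number passed to them are hypothetical, and it deliberately ignores harder-to-price factors such as reputation damage or data loss.

```python
def expected_downtime_cost(outage_hours_per_year: float,
                           loss_per_outage_hour: float) -> float:
    """Expected yearly loss from downtime, given an assumed outage rate."""
    return outage_hours_per_year * loss_per_outage_hour


def redundancy_pays_off(outage_hours_per_year: float,
                        loss_per_outage_hour: float,
                        redundancy_cost_per_year: float,
                        residual_outage_hours_per_year: float = 0.0) -> bool:
    """True if the downtime cost avoided exceeds the yearly cost of redundancy.

    residual_outage_hours_per_year is the downtime still expected after
    adding redundancy (assumed to be zero by default, which is optimistic).
    """
    avoided_hours = outage_hours_per_year - residual_outage_hours_per_year
    avoided_cost = expected_downtime_cost(avoided_hours, loss_per_outage_hour)
    return avoided_cost > redundancy_cost_per_year
```

With made-up figures, a service that expects 5 outage hours a year and loses 1,000 per hour would not justify 20,000 a year in redundancy (`redundancy_pays_off(5, 1_000, 20_000)` is `False`), while a mission-critical service losing 100,000 per hour would (`redundancy_pays_off(5, 100_000, 20_000)` is `True`).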

In IT, we are always in the business of value and trade-offs. As SaaS providers are now finding out, the key is knowing your customers’ expectations for the value your business provides, and making the correct trade-offs to deliver that value.
