Amazon Cloud Storage

Is there a final writeup from DC regarding using Amazon, backups, redundancy, etc.? How safe is our data? Are you looking into a backup to provide classes should the next outage last longer?

 

thanks!

 

blake eitniear

Following the Amazon outage of April 21st, we took an extensive look at our processes and architecture to explore what steps we could take to reduce the risk of service interruption in the future. We also looked at what worked well. There is a lot you can learn from a service interruption, and our goal was to make sure we captured those lessons and applied new processes to further harden our infrastructure. Here are some of the things we learned:

1.  Not everyone was down. While the problem did affect DigitalChalk, many others who didn't have critical services in the affected data centers were unaffected. The media reported a blanket outage, but in fact it was limited. The likelihood of AWS suffering a catastrophic failure of a single service across multiple regions is extremely low, and no such failure happened on April 21st. With a few specific steps we are taking, the chance of an extended outage in the future will be very low.

2.  Our backup and recovery processes worked correctly; the challenge we faced was access to the services needed to bring our systems back online quickly. Our confidence in our processes for storing and safeguarding critical data, and in the AWS services that support them, remains very high.

3.  We gained greater transparency from Amazon about how some services work within, or relative to, its geography. This gives us new insight into how to reduce risk through distributed redundancy of services. We are using that information to further spread our "eggs" across multiple "baskets".

We are taking steps to increase the redundancy of our automated recovery services and spread them across AWS regions in ways that reduce the likelihood of this type of event affecting us in the future. Our data integrity and backups were not affected, and our confidence in them remains very high. Ultimately, we all need to step back from this specific event and look at the bigger picture. Cloud services are not without some risk of failure, but the resources and skilled professionals working at AWS are unequaled. Any service you select will come with some level of risk. The best actions we can take at DigitalChalk are to remain vigilant in our processes and architecture, and to remain redundant and agile. Our business depends on your trust, and we never take that for granted.

 

The Team at DigitalChalk
