“In this rapidly changing world of ours, this is the new face of reliability. It is no longer about preventing failure. It is about designing resilient services in which inevitable failures have a minimal effect on service availability and functionality.” —David Bills, Chief Reliability Strategist, Trustworthy Computing, Microsoft.
Cloud server crashes are nothing new; they aren’t even rare. Nor do they happen only to small, new players. The biggest and most adept of them all (Amazon, Facebook, Google, Microsoft, and even GoDaddy, the largest domain name registrar) have had their cloud servers crash in recent years. Amazon’s crashed three times in the second half of 2012 alone and, yes, the sites and services hosted on those servers went down too, as did those hosted on Angani recently.
Angani Limited, a public cloud services provider based in Nairobi, formally launched its cloud services back in April. It reportedly suffered an outage on November 4 at around 11:30 pm EAT. Quite a few customers have been affected, and some are venting on Twitter.
According to some reports, service has been degraded for some weeks, and the disruption appears to be at least partly due to some “corporate restructuring”.
Although Angani’s systems were corrupted, there appears to have been no data loss. I will provide more information as it comes in from customers or from Angani itself (which is staying quite mum at the moment).
As of this writing, I cannot access Angani’s site from my location, so I don’t know what their service level agreements look like, what their data recovery policy and capabilities are, or what their incident response plans are. It looks like the issue will take a while to resolve, though, since they have not given a timeline for the fix.
The moral of the story: if you use cloud services, back up your backups (and mirror those backups!).
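For readers who want something concrete, here is a minimal sketch of what "mirroring a backup" could look like in Python. It is only an illustration under assumptions of my own: the function names `sha256_of` and `mirror_backup` are hypothetical, the mirror locations are plain directories (in practice they would be mounts or buckets at different providers), and it has nothing to do with how Angani or any other provider handles backups.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mirror_backup(source: Path, mirror_dirs: list) -> list:
    """Copy one backup file into every mirror directory and verify each copy.

    Hypothetical helper: `mirror_dirs` is assumed to be a list of Path
    objects pointing at independent storage locations.
    """
    expected = sha256_of(source)
    copies = []
    for directory in mirror_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / source.name
        shutil.copy2(source, target)  # copies contents plus file metadata
        if sha256_of(target) != expected:
            raise IOError(f"checksum mismatch for {target}")
        copies.append(target)
    return copies
```

Calling `mirror_backup(Path("site-backup.tar.gz"), [Path("/mnt/provider_a"), Path("/mnt/provider_b")])` (paths made up for the example) would leave a verified copy in each location, so a single provider going dark does not take every copy with it.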