So – first things first, everyone is going on and on about the AWS outage last week. Here’s my short 2 cents on it:
- People who say “hey see, public cloud availability is not ‘enterprise ready’!” IMO are being alarmist, and are off base. Many traditional enterprise apps can (sadly) only dream of the overall availability that many of the cloud services have exhibited for years now.
- It highlighted that you need to know what you’re doing when using these public cloud services – just because it’s a public cloud doesn’t absolve you from that. Arguably, the app-side folks need to know more (they have to build in app-level resiliency as a core part of their design). If apps were written to have app-level resiliency and leverage multiple AWS availability zones, they would have been OK.
- It also showed that many people whose use cases demand more (more availability/recoverability/compliance) really weren’t thinking enough. Man, I HOPE that this thread on the AWS support forums is a gag. If not, it’s really scary. https://forums.aws.amazon.com/message.jspa?messageID=241597
I will make one critical statement though – it highlights why I believe in the EMC/VMware view of cloud: that being able to federate in and out is VERY important (along with the ability to attest to trust/compliance in the cloud).
My point here is that Amazon going down isn’t the main thing (again, if you had a well-designed app, and planned for the architectural model of AWS, you would have been fine). But let’s say that it dragged on and on, or multiple availability zones started to fail, or Amazon had a huge breach – or heck, even went out of business (unlikely – but could happen to us all) – if switching is virtually impossible (due to API lock-in, app-dev lock-in, or the huge problem of getting massive amounts of data out) – now THAT’S a scary proposition…
That’s why I think the whole vendor community working towards portability (at every layer), and towards technologies that make workload and data federation easier, is very, very important. None of us are perfect – but this case really does highlight where that potential “Hotel California” effect Paul Maritz always talks about could become very, very bad.
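To put rough numbers on the multi-availability-zone point above, here is a back-of-the-envelope sketch in Python. The 99.95% per-zone figure and the function names are illustrative assumptions, not measured AWS numbers, and the math assumes zone failures are independent of each other, which is optimistic.

```python
# Back-of-the-envelope availability math.
# All figures are illustrative assumptions, not measured AWS numbers.

HOURS_PER_YEAR = 365 * 24  # ignores leap years


def downtime_hours_per_year(availability: float) -> float:
    """Hours per year a service may be down at the given availability (0..1)."""
    return (1.0 - availability) * HOURS_PER_YEAR


def combined_availability(per_zone: float, zones: int) -> float:
    """Availability of an app that stays up while at least one zone is up.

    Assumes zone failures are independent (hypothetical, optimistic).
    """
    return 1.0 - (1.0 - per_zone) ** zones


if __name__ == "__main__":
    per_zone = 0.9995  # assumed availability of a single zone
    for zones in (1, 2, 3):
        total = combined_availability(per_zone, zones)
        print(f"{zones} zone(s): availability {total:.8f}, "
              f"about {downtime_hours_per_year(total):.4f} hours/year down")
```

The independence assumption is the weak spot. If zones share a failure mode, the real number is worse than this math suggests, which is one more reason the ability to move workloads and data elsewhere matters.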
Switching gears…
EMC World is right around the corner. If you haven’t registered yet, there’s something wrong with you :-)
So – the hands-on labs (vLabs) are becoming epic. This will be very, very cool – check out the link here for details. At the link you can see a list of all the on-demand labs we will offer, and there is a picture as we stage everything. All powered by the EMC Demo Cloud. A big shout out to the team (which spans all of EMC) who is working so hard to make it happen.
To give you an idea of how epic this will be, here’s a shot of our much, much smaller PLAN B.
The biggest risk for us will be a telecom failure – as all the HoL will be literally powered out of our Demo Cloud which is in RTP. These travel half-racks have VNXes and UCS C-series rack-mount systems which at a moment’s notice SHOULD be able to jump in if things go horribly awry. But then again, the risk of a “no net” environment is half the fun :-)
Speaking of “no net” – Chad’s World Live is coming along – will have some very cool demos, and a bunch of customers on stage. Of course, there are some bits we’re trying to “gag up a bit”. Here’s a shot of Wade and me crashing Joe Tucci’s office…
And then inevitably getting kicked out by Pam :-)…
Make sure you register! EMC World is going to be a gas!
How cool are those half racks!!!
Posted by: Duncan | April 25, 2011 at 07:11 AM
Man. A small stack of C-Series, a VNX 3300 and some management/support magic sauce and that could be a little Vblock! Hmmmm....
Posted by: Jeramiah Dooley | April 25, 2011 at 07:28 AM
OMG... Whilst I appreciate that "md76040303317" is trying to make a point (albeit in *very* poor taste), thankfully the post is in part a hoax (see section of post thread below).
I have managed complex life critical/mission critical systems for a large Medical Research organisation in the UK; whilst I did everything practicable to prevent unforeseen IT disruption/outages (off-site DR facility, fail-over circuits/servers, etc.), when I worked on implementing a BCM/DR plan, we also had appropriate manual 'last resort' protocols in place for absolutely critical systems, so that in the event primary and secondary IT services let us down, we could work around these.
--Yogesh
"Re: Life of our patients is at stake - I am desperately asking you to contact
Posted by: md76040303317
Posted on: Apr 23, 2011 2:08 PM
in response to: Marc Spitzer Reply
This is a home based system, not an intra-hospital system. So the promised 99.95% uptime is fine. But this situation showed that the promised 99.95% = fiction...
BTW. All three servers are working - hopefully the situation will remain stable."
Posted by: Yogesh Sharma | April 25, 2011 at 09:09 AM
HA HA HA - You crashed Joe's office...good call. Whose MB Air is that on his desk? Let's hope Wade "liberated" it on the way out.
BTW - spot on with your AWS stuff. As our world becomes more simplified and commoditized, one phrase will begin to resonate MUCH louder:
Caveat Emptor.
AWS's SLAs (which have not changed since 2008) say EC2's "Service Commitment" is 99.95%; S3's is 99.9%...if you go premium. These are AWESOME SLAs for stretching into temporary hybrid cloud space when needed, but should NEVER EVER EVER EVER be considered for the sole location of "life critical" (to borrow from your link) applications.
And THAT is the second challenge of a commoditized Public Cloud - it gives more people the ability to simply bypass a seasoned and cynical IT pro who ALWAYS thinks in pairs - "If this function is so critical, what happens when it goes down?". You don't need to think about that if you have a dream, a credit card, and access to AWS.
May we live in interesting times.
DP
Posted by: DP | April 25, 2011 at 09:42 AM
> If apps were written to have app-level resiliency and leverage multiple AWS availability zones, they would have been OK.
I think that was the problem here -- all of the availability zones in US-EAST had the problem. People who were following Amazon's guidelines were still hooped.
The way that Amazon bills for traffic between regions makes it expensive to replicate data between regions, and they don't offer a solid way to use both data centres at the same time if you do so.
I foresee some new product offerings along those lines though ;)
But I agree -- not a reason to dismiss *wavy hands* the cloud
Posted by: Sean | April 25, 2011 at 10:18 AM
Mike @NetApp - Great comments on AWS.
It is a scary proposition when speaking with customers and it becomes obvious 'The Cloud' does not always go hand in hand in their minds with a conversation about doing it right - things to keep applications and data properly protected and available.
"Cloud" does not equal "Safe" in all cases, and it never absolves the IT buyer from building and buying the solution that meets the required SLAs. There has always been a basic set of business requirements in every shop, and the big question is this: Will The New Architecture Meet My Requirements. The answer is a solid YES, but only if we ignore the glitter and keep our core requirements close at hand. Else, nasty surprises lurk.
It brings me to an adjacent point, one that I've experienced as a vendor (mostly) and a customer (a little).
Who is at fault when a failure like this occurs? In my experience, everyone owns a piece of the blame - vendor, customer, consultant - all of them. I've never seen a disaster occur that has brought a business to its knees where it was not a perfect storm of circumstances owned by all the participants. Ever.
It does happen occasionally that the full set of the accused stand up together and take ownership, make speedy repairs and then strengthen not only their infrastructure, but the business and personal bonds between them. That is a real big win in the face of failure. People doing good by one another.
Posted by: Mike Shea | April 25, 2011 at 10:32 AM
Just curious -- what do you use for the half-height racks? I'm considering one for my home server setup, and those look pretty good :)
Posted by: Jeff McJunkin | April 25, 2011 at 03:01 PM
Love the way that Plan B is the half racks whereas everyone else has that as Plan A and cloud as Plan B.
Outages happen and always will. I'd much rather Amazon were managing my outages than some snotty 20-year-old who is more concerned with getting back to Gears of War.
Posted by: PlayBlue Judy | October 18, 2011 at 05:33 AM