
December 04, 2008

Listed below are links to weblogs that reference Does VMware DPM shorten ESX server lifespan?:

Comments

Mike Shea

Sure would like some clarification.

Seagate does not use MTBF anymore, but has opted for a clearer AFR.
http://forums.seagate.com/stx/board/message?board.id=BeforeYouBuyBoard&message.id=3

Why not test to the clear standards the manufacturer uses? MTBF is almost meaningless.

How did EMC find the time to test this to get a statistically accurate idea of the failure rate? It would take a very long time to pull this off.

Also, how did you come to the conclusion that firing up your disk 1 time per day is statistically relevant? I don't know what the answer would be, but since I have a degree in Comp Sci, I think the answer might be "It depends". ;-)

I don't positively *know* that spin-down affects drive life, but I do know that heating and cooling cycles are negative. I do know that spinning up is a stressful condition for any mechanical system.

Frankly, I hope you are correct. But time will be the ingredient needed to prove it all out.

I'd also rather hear this kind of thing coming from Seagate themselves. Agree?

Always enjoy stopping in Chad! Hope the EMC diaspora find new homes soon too. :-(

Daniel Eason

Chad,

Interesting post. This is something I raised on VMTN a year ago (http://communities.vmware.com/message/818774#818774), with a few people chipping in that disks nowadays are sensible enough, with preemptive commands, to spin down safely...

Anyone who has performed a migration of physical servers from one datacenter to another knows that some of the kit will fail, either immediately or over a period of time.

At the end of the day, there is a low probability of both drives in a RAID 1 set failing at once and, to be honest, with the smart opportunity and cost savings DPM offers, in my opinion it's worth the risk!

Chad Sakac

Mike - thanks for the comment.

Unfortunately, I was in Paris this week (EMC forum, meeting with the VMware Sales, Partner and SE teams for South EMEA, and customers of course), so I didn't have the amount of time I would normally like to spend on any post.

So - any error is mine alone. I asked the CX product/engineering team for their testing results, and they didn't use "MTBF"; they used the word "reliability". They may be using a different metric now; I'll double check. How did they do it? Well, we HAVE been testing this for a long time (more than a year before we introduced the feature mid-year), and of course you know the facility in Franklin where they do all the mechanical testing... It's BIG.

I also considered putting a qualifier in the heading (where I said "NO"), but decided to put the qualifiers in the body (where I used words like "strongly suspect"). In the end, I agree with Mr. Eason's comment - even if there were a higher chance of ESX server failure, I think the upside outweighs it, because a good VI design is in essence stateless on the server (this gets better in the next version with the Distributed vSwitch or Nexus 1000v/VN-link).

Re: the statistical significance of periodic vs constant soft-error checking, I'm an EE with a CS minor, so I've taken my fair share of stats :-) The point here is that over a slightly longer period of time the entire dataset is checked. During normal IO operation (drive spun up for backup or restore, which will happen once a day), all the normal checks occur. The periodic check they added (drives get spun up, a quick set of checks runs, then they are spun down) ensures that, over time, every block of the drive is exercised and checked, because otherwise "stale" parts of the drive - particularly with the B2D use case - wouldn't get checked.
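To make the "every block gets checked over time" point concrete, here is a minimal sketch in Python of a rotating check cursor. It is not the CX implementation; the block counts, the per-spin-up check budget, and the function names are all hypothetical and chosen only for illustration.

```python
import math


def cycles_to_cover(total_blocks: int, blocks_per_check: int) -> int:
    """Number of periodic checks needed before every block has been visited once."""
    if total_blocks <= 0 or blocks_per_check <= 0:
        raise ValueError("both arguments must be positive")
    return math.ceil(total_blocks / blocks_per_check)


def simulate_checks(total_blocks: int, blocks_per_check: int, cycles: int) -> set:
    """Simulate periodic checks that resume where the previous one stopped.

    Each cycle stands for one spin-up: check a fixed number of blocks,
    remember the position, spin down. Returns the set of blocks visited.
    """
    checked = set()
    cursor = 0  # position carried over between spin-ups
    for _ in range(cycles):
        for _ in range(blocks_per_check):
            checked.add(cursor)
            cursor = (cursor + 1) % total_blocks  # wrap around at the end of the drive
    return checked


if __name__ == "__main__":
    total, per_check = 1000, 64  # hypothetical sizes
    needed = cycles_to_cover(total, per_check)
    covered = simulate_checks(total, per_check, needed)
    print(f"{needed} checks cover {len(covered)} of {total} blocks")
```

The design point is only that a short check per spin-up, combined with a cursor that persists between spin-ups, bounds how long any block can go unchecked, including blocks that normal backup IO never touches.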

I do agree that it would be great to hear the drive manufacturers themselves pipe in on the thread - any Seagate/WD/STEC folks out there? Will be interesting to see how Enterprise Flash drives change the dynamic here over the next few years.

Re: diaspora - I hope that every person finds a place where they enjoy working. Since you are a former EMCer at NetApp - I sincerely hope you're happy there. Life is too short to not enjoy your work, I certainly love mine.

Quoting one of the most recent former NetApp folks now at EMC (who sent this email on 12/2): "To be honest, I felt little uncomfortable adjusting to EMC environment during the first 30 days. I always used to have a nagging question in the back of my mind about my decision to move. The good news is that I don't have it anymore and I really I feel very happy to have made a decision to move to EMC. I would like to take this opportunity to thank you for bringing me here." Moral of the story, IMHO - EMC and NetApp are both great companies, fiercely competitive with each other, and I think it consistently forces us to be the best we can be for the customer (though it occasionally brings out the worst in us towards each other).

As a frequent commenter on the EMC blogs, what are your thoughts about prefacing posts on competitor blogs with "Disclosure - I'm a ____ employee"? (For anyone interested, Mike's other comments are here: http://virtualgeek.typepad.com/virtual_geek/2008/09/my-likely-last.html#comments and here: http://virtualgeek.typepad.com/virtual_geek/2008/08/welcome---my-fr.html) Lately I've been trying to do that when I post on a NetApp blog - you know, just so the people reading the comment can apply their own judgement knowing the dynamic?

Charlie Dellacona

The drive vendors all claim 50K spin-ups reliability for their enterprise quality drives. At 10 spin-ups per day, this is 13 years of use before failure - and 10 per day is probably high. So, even given some exaggeration on the reliability ratings this is longer than the useful life of most servers.

Your comprehensive response is more than low-quality FUD like this deserves.
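The spin-up arithmetic in the comment above is easy to check. Here is a minimal sketch in Python: the 50,000 spin-up rating and the rate of 10 per day are the figures quoted in the comment, while the function name and the 365-day year are assumptions made for illustration.

```python
def years_until_rated_spinups(rated_spinups: int, spinups_per_day: float) -> float:
    """Years of use before a drive reaches its rated number of spin-ups."""
    if spinups_per_day <= 0:
        raise ValueError("spinups_per_day must be positive")
    return rated_spinups / (spinups_per_day * 365)  # assumes a 365-day year


if __name__ == "__main__":
    for per_day in (10, 1):
        years = years_until_rated_spinups(50_000, per_day)
        print(f"{per_day} spin-ups/day: {years:.1f} years")
```

At 10 spin-ups per day this works out to a little under 14 years, consistent with the "13 years" quoted above; a lower daily rate stretches the figure proportionally.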


Disclaimer

  • The opinions expressed here are my personal opinions. Content published here is not read or approved in advance by EMC and does not necessarily reflect the views and opinions of EMC. This is my blog, it is not an EMC blog.
