
November 14, 2013

Comments


Vaughn Stewart

Kudos to you, EMC and the XtremIO team on the launch. The adoption rate of flash is exceeding everyone's expectations, whether the need is for performance or a means to address data center resource constraints (i.e. power & rack space).

I look forward to our healthy debates around technical details but for the moment, Nice Job!

Cheers,
v

Matteovari

Great post, Chad! It explains well why XtremIO is so unique, and the great work EMC is doing.

Cris Danci

Great post! Really good to see some transparency at this level.

Sadly, for all its greatness, flash as a technology has enabled many new entrants into the market at an extremely low cost point. I once heard a wise man say: “A little flash goes a long way; imagine what a lot can do” :) Architecturally, a lot of flash allows almost any array (regardless of its architecture) to perform by today's standards. This is not purely because flash is fast; it's because we have not matured in general application development and requirements (most applications are still being developed to deal with the shortcomings of traditional storage technologies being the bottleneck), and only specific workloads (generally large aggregated ones, or poorly designed ones) have flash requirements (again, this is why hybrid arrays are a good fit for most workloads). Simply having an array that supports a lot of flash does not itself suggest good design; and realistically, when we are talking about performance at scale, it is all about good design - the backend will become a problem sooner or later.

My mantra when it comes to performance is “bad design works, good design scales,” and as a new generation of applications emerges and workload characteristics increase, the true nature (from an architectural and design perspective) of the different flash arrays on the market will become apparent.

I think it was extremely (no pun intended) wise of EMC to spend time on XIO up front rather than just rushing it to market. The last thing we need is another ill-designed array backed by lots of cheap commodity flash that works today but fails to scale tomorrow.

Well done.

conrad walker-simmons

Great post.
From my perspective, we run a lot of great EMC kit here at Sportsdirect in the UK. We adopted XtremIO early on, with four bricks in two-node arrays, and I can tell you the results have been staggering since removing XenApp servers and RDS servers from the VNX and VMAX and loading them on the XtremIO clusters. We have seen great de-duplication ratios, about 5:1 real world on servers, and around 50k IOPS per array.

All this performance comes in at a 0.1ms disk response time. So the results have been staggering and surprising, and we're still loading them daily. I expect to hit 100k IOPS per cluster within a year, hosting around 400 virtual servers on them.

Hope this has been helpful for those who have yet to experience how groundbreaking this is and what it means for server farm and VDI deployments.

Endre Peterfi

Great to see a customer post this about XtremIO! It is indeed a great product, but what is more important is that it has the EMC support structure - something that took EMC years to build and that all the other AFA startups don't have ...

Thom Moore

...all hash calculations (inherently, as they are a more dense representation of a set of data) involve some insanely remote probability of hash collision – but these are astronomical.

Do you have any supporting analysis of XtremIO's hashing methodology for this claim or is this just a 'trust me'?

BTW, there are lossless hash algorithms; collisions only occur in the lossy ones.

Thom Moore

Do you have an answer to my question on the hash analysis? You say they 'took the time to do it right', so they must have done one. Can you share it?

Thom Moore

What analysis was done to validate your claim that the odds of collision are astronomical? They took the time to 'do it right', so it must have been done. Will you share it here?

Chad Sakac

@Thom - thanks for the persistent question. My apologies - have been really slammed.

Yes, analysis has been done.

Summary: Less than one in 10 septillion (a trillion trillion) chance of a hash collision after storing one petabyte of data. That's 2.58494E-26 if you want the specific value.

Put another way - even if you stored all the data created on Earth in 2011 on an XtremIO array, the probability of a hash collision is less than one in 10 trillion.

There is a similar probability of a meteor landing on your house (roughly 1 in 182 trillion).

reference: http://preshing.com/20110504/hash-collision-probabilities

This is the formula used, with our given hashing algorithm and our hash function size.
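For the curious, the quoted figure can be reproduced with the birthday-bound approximation from the linked Preshing article, p ≈ n(n−1)/2 · 1/2^b. The 4 KB deduplication block size below is my assumption for illustration (not a confirmed XtremIO detail); with it, 1 PB of data and a 160-bit SHA-1 digest land right on the 2.58E-26 value quoted above:

```python
from fractions import Fraction

# Birthday-bound collision probability: p ~= n*(n-1)/2 * (1 / 2**b)
# (approximation from the Preshing article linked above)
BITS = 160                # SHA-1 digest size in bits
BLOCK = 4 * 1024          # 4 KB dedupe block -- an assumption for illustration
PETABYTE = 2**50

n = PETABYTE // BLOCK     # number of blocks in 1 PB: 2**38
p = Fraction(n * (n - 1), 2) / Fraction(2**BITS)

print(float(p))           # ~2.58e-26 -- about 1 in 4e25
```

Exact arithmetic via `Fraction` avoids any floating-point loss before the final conversion; the result is essentially 2^-85.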

It's also the root of the expression "astronomical" :-)

Honestly, we run into far more practical issues so much earlier than hash collisions that the math is humorous. Human error (by customers or EMC/EMC Partner services) and bugs in code are... materially more likely :-)

I've always found this argument to be an interesting one, because the math is so compelling that the "hash collision = something to be scared of!" argument (often from people who don't dedupe) doesn't hold up - at least for me.

Thanks again for asking!

Thom Moore

Thanks for the response. Jeff Preshing's analysis assumes a hash function with uniform output probability. He states it up front. But XtremIO uses SHA1, and no one to my knowledge has ever characterized its output probability, so Preshing's assumptions can't be met. Recent cryptanalysis provides attacks showing collisions at much less than the expected birthday-paradox difficulty (2^63 operations for the cryptanalysis vs. 2^80 for the birthday bound), suggesting anything but output-distribution uniformity.

I wouldn't call the math humorous; I'd call it incomplete. Did anyone prove the output of SHA1 isn't all concentrated in one small sub-range? If not, this is just a 'trust me', not a 'done right'.

Chad Sakac

@Thom - thanks for your comment.

The birthday paradox (at least to my knowledge) is generally applied to brute-force attacks on hash functions when used for crypto; in this case we're talking about hash functions and collision likelihood in a data set. To me those are different.

People who are interested - good reading here:
https://en.wikipedia.org/wiki/Birthday_problem

I suspect (and I encounter many folks and many opinions - some just want to prove how smart they are) - and I'm sure you are very smart - that in my experience this type of exchange will turn into a pissing match.

So, I'll leave my comment to stand (and yours and any others you choose to post) and people can judge for themselves.

And personally, I DO think it's funny - 2^63 operations (a brute-force attempt to determine a hash value, per the birthday paradox) is still ~1:100 billion odds.

I like those odds.

Thanks!

Thom Moore

I'll leave it at this. To know whether it's safe to use a hash function, you have to know the odds of collision, and to know that you have to know the probability mass function of the hash, i.e. you have to characterize the output probabilities. No one has ever done so for SHA1 that I know of. Without it you're just guessing at the collision rate, and you can't claim to have 'done it right'.

And not that I agree with your number, but 1 in 100 billion is 1E-11, or 10,000 times more likely (worse) than the undetected error rate of enterprise-grade magnetic disks (1E-15). I doubt most IT people would find that funny.

Chad Sakac

@Thom - thanks for your comment

I get that you are stating that since no one (to your knowledge or mine) has studied the randomness of the probability distribution of the SHA1 hash, it could have a non-uniform distribution or concentrated "probability mass" - and the effective collision resistance could be lower than 2^80.

I can tell you, as someone who is intimately aware of customer results - both good and bad (all the EMC execs are informed when we have a Sev 1 issue that is profound, with material data unavailability or data loss) - that while this is academically fun and interesting (it really is! Thank you for the dialog)... hash collision doesn't even enter the realm of material consideration. There are much more material considerations in making sure that customer data is secure.

BTW - if I were advocating SHA1 as a core crypto basis, I would be arguing with you, but that's not the case here, and the Wikipedia article I noted draws the same distinction (using how Git uses hash functions as an example).

Thom - sincerely, thank you for adding to the dialog!

Thom Moore

I could go on about the lack of necessary analysis but you raise an even more interesting question about error detection and issue handling.

Suppose a collision did happen at a customer using this device. The data would be misevaluated as a duplicate: a pointer to previously stored data (supposedly the same, but not really) would be entered, and the user's data, not recognized as unique, would be discarded. No error would be recorded or reported to the host on the write. It would look successful.

Then someday the host would read it. From the point of view of both the device and the host, up to but not including the top application, the read would also look successful. No errors recorded or reported.

The top (or near) level application would receive data other than expected, it might detect bad data or it could just erroneously process it.

When the app guy tried to debug the bad processing, it would look like programming error because there would be no device error indication worth chasing.

You would have material data loss but no one would recognize it as such. Stealth loss.

Since you are intimately aware, how is EMC set up to handle a problem that hides in this fashion? Would it even catch it?
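The failure mode described above is easy to demonstrate with a toy content-addressed store. The 8-bit "hash" below is deliberately crippled so a collision can be found in a few hundred tries; the store, block names, and functions are all hypothetical illustration, not XtremIO's actual design:

```python
import hashlib

# Toy content-addressed store illustrating the silent-loss scenario above.
# The 8-bit hash is deliberately weak so a collision is easy to brute-force;
# everything here is a hypothetical sketch, not a real array's design.
store = {}

def weak_hash(data: bytes) -> int:
    return hashlib.sha1(data).digest()[0]   # keep 8 of SHA-1's 160 bits

def write_block(data: bytes) -> int:
    h = weak_hash(data)
    # On a hash hit the block is assumed to be a duplicate and discarded;
    # the write still reports success -- no error is surfaced anywhere.
    if h not in store:
        store[h] = data
    return h

def read_block(h: int) -> bytes:
    return store[h]                          # the read also reports success

first = b"block-0"
i = 1
while weak_hash(b"block-%d" % i) != weak_hash(first):
    i += 1                                   # brute-force a colliding block
second = b"block-%d" % i

write_block(first)
h = write_block(second)                      # silently "deduped" against first
assert read_block(h) == first and first != second  # wrong data back, no error
```

With a full 160-bit digest the brute-force loop above would be astronomically long, which is the crux of the disagreement: the write and read paths report success either way.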



Disclaimer

  • The opinions expressed here are my personal opinions. Content published here is not read or approved in advance by Dell Technologies and does not necessarily reflect the views and opinions of Dell Technologies or any part of Dell Technologies. This is my blog; it is not a Dell Technologies blog.