Today, the EMC XtremIO system became GA. There will be a lot of back-and-forth in the industry on this topic. There will be us singing high praises (with customers of course), and then a lot of mud slinging from many fronts :-)
There are three common questions I’m seeing come up – in social media, from competitors, from customers:
- Why buy vs build? More specifically – why not build it, why not just run with all-flash VNX/VMAX (perhaps with some focus on differentiating on the media in some way)
- Isn’t this going to cannibalize some VNX/VMAX business?
- Why so long? More specifically – the acquisition was back in May 2012, and we originally targeted a Q2 2013 general availability – how should people read into it, and read into others who have been in market longer?
Everyone will come out swinging here (and I’m sure EMC will as well). Everyone will trot out their “battlecards” (and trust me, everyone has them).
I’m (as always) going to try to be as transparent as possible. This transparency and detailed technical discussion is “sunlight”.
I think that sunlight and explaining engineering rationale is always good.
I think I have a somewhat unique/fortunate perspective as the leader of the systems engineering team at EMC - the largest player in this space. That gives some degree of “insider access” through early stages of thinking (including M&A) through the phases of bringing technologies and solutions to market at scale… And frankly, how we approach that at scale, is, well… pretty cool.
So, without further ado, here’s the story of how we got here, what are the unique architectural things that I think are really cool about XtremIO and what I think is going to happen next…
So – 5 years ago, the whole company embarked on the premise that "flash is going to radically disrupt the business of persisting information", and set upon a broad "Apollo mission". We rapidly came to the conclusion that the disruption of flash was going to be broad and pervasive – affecting existing architectures & enabling new ones + affecting existing workloads & enabling new ones.
On the architectural front, a few things (even 5 years ago!) jumped out:
- Assuming that everyone would have access to the same core NAND technology from a narrow set of fabs (there are really only a few biggies: Micron, Samsung, Hynix, Intel, Toshiba) – flash tech (SLC, eMLC, cMLC, and TLC) and packaging (SSDs, PCIe) would be pretty common across the industry – not the place to try to innovate, as this would intrinsically commoditize quickly. One large early player is feeling this harshly now – and I suspect you know who I mean. This is a really, really hard problem. It DOESN'T MEAN you can't innovate around the hardware layer, but that it's very tricky – and you tend to get outrun over short timeframes.
- Hybrids would continue to serve, for the foreseeable future (and remember, in IT land, anything past a 3 year horizon is pretty unknowable), the bulk of the IT workloads. This was driven by the intrinsic economics of magnetic and flash media, and by the fact that a huge swath of the market has blended workloads and puts them in a common place.
- Hybrids would be pretty heavily impacted – mostly because their IO paths and code were not designed for ultra-low latency (it messes with caching), mature RAID would cause write amplification (bad with flash), and perhaps most fundamentally, the architectures were not designed for hundreds of thousands to millions of IOps being commonplace.
- That the biggest potential disruption (even more than All Flash Arrays or "AFA") is in the world of server flash – not so much on the hardware side (see point #1), but more on the opportunity for new "mashups" (sometimes called "hyper-converged") architectures where internal storage in the server is shared/distributed. This architectural model is fundamentally enabled by server-based flash (for low latency transactional workloads as a cache/buffer/tier) and 10GbE. Think of VMware VSAN, EMC ScaleIO, Nutanix, Simplivity and others as early examples of more and more to come.
- All-Flash Arrays ("AFA") would disrupt the Hybrids, in two particular places: 1) where the dataset coupled with inline dedupe (and importantly RELIABLE, CONSTANT inline dedupe) drives a different economic curve for flash media; and 2) workloads where the dataset may not be dedupable, but is relatively small – and focused more on max IOps, per-IO latency, IOps density, and $/IOps. Those two spots "define" the AFA sweet spot for now – and will expand as flash media cost of all types continues to drop. BTW – people shouldn't over-rotate on this topic. Disruption != displace. It's more like overlapping Venn diagrams, and there is also an aspect of "time". Think of all of the AFA players – and you can see these workloads are their target. We did note that this was going to be a brutal game. What's interesting to me is that as people talk about AFA startups, they talk about revenues, but the companies are rarely profitable. Their burn rates are often enormous, and the flame-out is huge. One large early player is feeling this harshly now – and I suspect you know who I mean.
These 5 observations led to a macro conclusion: a real flash strategy is more than a product. It's Hybrids, Server Flash, distributed server storage stacks, and all-flash arrays – and the goal is to be the best in all of these areas.
One thing I just love about EMC is that we do believe that we need to continuously innovate, do it organically and inorganically, disrupt ourselves – and try like crazy to stay focused on the customers. If we do – while we do disrupt ourselves – we continuously gain marketshare as customers vote with their dollars. This, BTW, is the answer to "will AFA cannibalize some of the Hybrid market" – the answer is "sure!", but to be clear – when people are hyperbolic and say "Hybrids are dead!" – they are wrong (and I suspect that when you look, they are an AFA pure play :-) We don't worry too much about cannibalization. As you'll see from some of the below, people will keep selecting Hybrids (in our view) for some workloads. In some cases, they'll even use all-flash configurations of those hybrid architectures (for specific use cases, or for specific data services or host types). But – for the right workloads (which will expand over time), AFA designs that are built ground-up for flash will indeed cannibalize other architectures to some degree.
Now, while we embrace self-cannibalization, AND believe that it’s a net positive (for the customer and for us), we’re not perfect, and we make mistakes – but damn the torpedoes, we will fight, and fight to win.
Ok, let's look at these 5 observations about the disruptive effect of flash and their strategic implications, one by one:
- This first observation led to: don't follow FusionIO (and we didn't), instead focus on the software around server flash (we did), and partner with PCIe flash vendors. It also said to not assume that drives/PCIe would be a differentiator for anyone (and increasingly they aren't), but to partner as closely as we can with the flash manufacturers and leverage every volume advantage we can (and we did). I don't think we've nailed this yet – but as we continue to do things that add software value to server flash (like ScaleIO), I'm pretty confident that this is the right way to go.
- This second observation led to: put the pedal to the metal on the hybrids – and continue to invest in innovation there (and we did – VNX MCx and Rockies, VMAX Enginuity 5876 – and things to come), as they will remain (again, for the foreseeable future) the vast majority of the market – while other things will have more buzz. Think of it as steak and sizzle. A good meal needs both :-)
- This third observation led to: start to re-architect Hybrids like crazy for tiering as a fundamental feature (which we did), look at ways to augment the Hybrids using flash as a cache (which we did), and plan for the full "any hybrid" spectrum – from "dense GB" all-SAS configs all the way to rarer "dense IOps" all-flash configs (focused narrow use cases where either you want no data services, or conversely specific data services like consistency groups/SRDF at scale). It led to the huge engineering work in VNX land to start to view configurations with 10-20% flash and hundreds of thousands of IOps as "common" (which we did with MCx). You can fully expect that similar work is going on in VMAX-land.
- This fourth observation led to: VMware going to town on VSAN, and EMC ScaleIO focusing on "larger scale" use – many nodes, and blended hypervisors/physical. If you want to get a sense of why I say this is potentially MORE disruptive than AFA – all you need to do is look at this: 200 ScaleIO nodes running in AWS, driving huge bandwidth and low latency. Interestingly, this is a much, much more compelling performance envelope than just using EBS itself :-)
- … and then the fifth observation, about AFAs – the critical piece for today – which led to these conclusions:
- EMC won't be competitive in the cases where AFA really plays by simply doing 2 + 3 (in other words, just using Hybrid array intellectual property for all-flash is a losing strategy WHEN THE WORKLOAD demands AFA characteristics and data services like inline dedupe). It will be interesting to see whether (and how) other folks came to the same or different conclusions. You can look across the industry and SOME are going hard down the "our previous technology loaded with flash is an 'all flash array'" path. Others are taking a view more like ours: "AFA needs a ground-up re-architecture". Look at HP and HDS, and then on the other side look at NetApp. In the NetApp example, it's interesting to see them positioning E-Series today as an "All Flash Array" (I would argue it's more like an "all-flash VNX"), but seemingly (?) working on FlashRay as their real AFA strategy. It's not to say that all-flash variations of hybrids aren't valid, but IMO they aren't sufficiently "architected differently".
- Inline dedupe is critical, and not just the checkbox but HOW it works – because it needs to be a "basic artifact" that is always on, because it's SO critical to the economics of AFA. Expect a lot of competitive "I HAVE/WILL HAVE INLINE DEDUPE!"… and I would encourage customers to look deeply at how people architect their variation of this – it's central to AFAs (there's a minimal sketch of the idea right after this list).
- Scale-out models – and here I'm talking about real scale-out, not "managing/automating multiple units" (like we can with Unisphere and in a much more sophisticated way with pooling/abstraction/automation via the ViPR controller), or "federated" models (think of NetApp cluster mode, which still has files inherently behind a single "brain" – but can move data and the virtual brains to continuously work to rebalance), but real scale-out (inherently and automatically balanced) – is very important. Expect a lot of competitive "I WILL EVENTUALLY DO SCALE OUT!". Fundamental scale-out is not a feature, it is an architectural choice. It's harder to build up front, but impossible to do well after the fact. And when I say that scale-out is important in this use case, I mean not that it is a "little important", but a lot. Why? Because the AFAs will be "islands" that will start small, but tend to grow. I'll reiterate the "why" in my view… Look back at the quote in the observation on what workloads are the target: "...focused more on max IOps, per-IO latency, IOps density, and $/IOps." If you have these workloads, and don't have linearity under normal load, dynamic load, lifecycle load, and frankly growing load… the array doesn't deliver what customers are targeting AFAs for in the first place.
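Circling back to the inline dedupe point for a second – here's a minimal sketch of the "always on" idea, in Python. This is my own illustration of the general technique (content fingerprinting on ingest, with reference counting), not the XtremIO implementation:

```python
import hashlib

class InlineDedupeStore:
    """Toy content-addressed store - illustrative only, not XtremIO's design."""

    def __init__(self):
        self.blocks = {}     # fingerprint -> the block actually written to flash
        self.refcounts = {}  # fingerprint -> number of logical references
        self.lba_map = {}    # logical block address -> fingerprint

    def write(self, lba: int, block: bytes) -> None:
        fp = hashlib.sha1(block).hexdigest()  # fingerprint computed inline, on ingest
        if fp in self.blocks:
            self.refcounts[fp] += 1           # duplicate: add a reference, no new flash write
        else:
            self.blocks[fp] = block           # unique: the block is written exactly once
            self.refcounts[fp] = 1
        self.lba_map[lba] = fp

    def read(self, lba: int) -> bytes:
        return self.blocks[self.lba_map[lba]]

store = InlineDedupeStore()
store.write(0, b"A" * 4096)
store.write(1, b"A" * 4096)                # identical content, deduped on the way in
print(len(store.blocks), store.refcounts)  # 1 physical block, refcount of 2
```

The reason "always on" matters is visible even in a toy like this: if the fingerprinting step can be turned off or deferred, both the data layout and the economics change underneath you.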
EMC makes a huge number of investments – and we looked at almost all of the all-flash startups – and XtremIO stood out head and shoulders above the rest. This is an XtremIO X-brick:
Here are the four fundamental, unique architectural reasons WHY XtremIO is so different, why EMC acquired them, why we think it was worth the time to harden and get right, and why we think it's the best AFA on the market:
- First: Content based data placement:
- Every IO on ingest gets a multi-stage hash value. People will poop on the possibility of hash collisions. IMO, this is competitive noise. Yes, all hash calculations (inherently, as they are a more dense representation of a set of data) involve some insanely remote probability of hash collision – but the odds are astronomically remote. If you're worried about this, you should really be worried about more likely issues. What are astronomically remote scenarios? Well… think of the probability of a comet impact on your city, or the startup company you bought your product from going out of business, or being acquired. I'm not saying that you should worry about startups going out of business or being acquired – but if you worry about statistically astronomically remote scenarios, or someone is encouraging you to worry about that – those sorts of business scenarios are WAY, WAY more likely :-)
- This means that inherently, all data is balanced across a cluster (because the hash function determines the placement on a single X-brick and across X-bricks). This (to me) is one of the architectural definitions of "true" scale-out designs (it's certainly true of ScaleIO, Isilon, and well-configured VMAX… and more true of VMAX going forward). The video below shows this clearly – note how the distribution of "lights blinking" with a random mixed load is very evenly distributed:
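If it helps to see the idea in code, here's a tiny sketch of "the hash determines placement" – again my own illustration, assuming a simple split of the fingerprint space across bricks rather than anything XtremIO-specific:

```python
import hashlib
import os
from collections import Counter

NUM_BRICKS = 4  # illustrative cluster size, not a product limit

def owning_brick(block: bytes) -> int:
    # The content fingerprint itself decides where the block lives, so
    # placement stays balanced without a background rebalancing process.
    fp = hashlib.sha1(block).digest()
    return int.from_bytes(fp[:2], "big") % NUM_BRICKS

# Quick check: random 4KB blocks land evenly across the bricks.
counts = Counter(owning_brick(os.urandom(4096)) for _ in range(100_000))
print(counts)  # roughly 25,000 per brick
```

Because the fingerprint is effectively uniform, the balancing falls out of the math rather than out of a data-movement service.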
- Second: Dual-Stage (and DISTRIBUTED) Metadata Engine
- This is a very important architectural point. The metadata (the hash value/location) is a critical element in these architectures. If it's not in memory, you inevitably lose performance linearity. If you destage to SSDs (or god forbid magnetic media), there is an apparent "downshift" in performance. BTW – this isn't just in AFA land. In VNX, for example, the metadata/memory relationship is the thing that ultimately determines the performance envelope when using pools that use FAST, that use Thin, that use Snaps, that use post-process dedupe. If the amount of metadata (proportional to the size of the pool) destages from memory to SSDs to magnetic media in a pool – the performance of the system takes a dump. I don't feel bad about saying this – because VNX in Rockies does this pretty well relative to the platforms it competes with. Interestingly, all the AFA players I can think of except XtremIO share this design behavior (very "VNX like" in implementation). There's a reason each XtremIO node has 256GB of RAM. There's also a reason that we're supporting 10TB X-Bricks and 20TB X-Bricks at launch – it's linked to this metadata topic. For XtremIO (for the sake of performance linearity) it's got to be in memory.
- NOW THIS IS IMPORTANT (as there will be a lot of "how does this work" from competitors) – of course this information needs to be reliably persisted, because if you don't nail it, you have a bad day. The way one does this matters. In the XtremIO case, metadata is stored in DRAM, and the journal is mirrored across the two brains in each X-brick over a redundant IB interconnect. The journal is also stored on local SSDs in each brain in an X-Brick in the case of an IB interconnect failure. The scenario of "total loss of power" or "isolation of an X-brick" is covered through batteries to destage. As you can see – this is a very robust model with no SPOF. Why design it this way? The view was that the linearity of low latency performance (why people look at AFA) meant that in our view, keeping metadata in DRAM at all times was the right architectural approach. BTW – it's not an "unproven" model – think of IB-interconnected NVRAM on NetApp as analogous (but clearly that wouldn't work here, because you could never get enough NVRAM), or the mirrored CMI write cache of VNXes that use batteries to destage. The key differences are that the system memory needs to be bigger, and that the metadata journal mirroring/destaging to SSDs acts as another protection mechanism.
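To give a feel for why the metadata/DRAM relationship and the launch brick sizes go together, here's a back-of-envelope sketch. The 4KB block size and the bytes-per-block figure are assumptions of mine for illustration, not published XtremIO internals:

```python
# Back-of-envelope: how much metadata does an all-flash brick generate?
# Assumptions (mine, for illustration): 4KB logical blocks, ~48 bytes of
# metadata (fingerprint + physical location + bookkeeping) per block.
BLOCK_SIZE = 4 * 1024
METADATA_PER_BLOCK = 48  # assumed, not a published figure

def metadata_gb(capacity_tb: float) -> float:
    blocks = capacity_tb * 1e12 / BLOCK_SIZE
    return blocks * METADATA_PER_BLOCK / 1e9

for cap in (10, 20):
    print(f"{cap} TB brick -> ~{metadata_gb(cap):.0f} GB of metadata")
# 10 TB -> ~117 GB, 20 TB -> ~234 GB: with 256GB of RAM per node, you can
# see why "keep it all memory-resident" bounds the capacity per brick.
```

The specific numbers are guesses – the point is the relationship: metadata grows linearly with capacity, so if it must never destage, RAM per node bounds usable capacity per brick.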
- Third: A unique data protection model – XDP (Xtreme Data Protection)
- Classic RAID creates an unnecessary wear load on flash: the extra read (and more importantly write) operations for parity and mirroring have a big downside, as they accelerate the wear process on the flash media. This is one of the reasons that most of the media in things like VNX and VMAX arrays started with SLC, and added eMLC to withstand those write cycles over the life of the drive.
- XDP is the protection model used – and has only an 8% capacity overhead (important because customers pay a lot for the media in AFA), and a 1.22x IO load on reads and writes (there's a quick back-of-envelope on these numbers right after this list).
- The upside also includes that there are no “hotspares” and that there’s a very fast rebuild (with no performance impact)
- There's a huge difference here that manifests itself in "linear behavior always" that is worth exploring a little more. In most (?) "traditional storage models" (common in both Hybrids and most other AFA designs), there is some sort of data layout/log journal (something that looks like the "low-level parts of a filesystem") that optimizes around finding "stripes" to dump down a bunch of data. This means that inevitably, as fewer and fewer free pools of space are available, there's some background "cleaning" process, sometimes called "garbage collection". This process, when it pops up, breaks the theme of "linear low latency, always". The video below shows this clearly – and it's a challenge for every other all-flash array in the market. The performance (latency and IOps) is the SAME when the array is empty and when it's full:
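Here's the back-of-envelope on the XDP numbers mentioned above. The 23+2 stripe width is my working assumption for the sketch (the bullet above only states the 8% and 1.22x results); the RAID-6 comparison is the standard read-modify-write arithmetic:

```python
# Capacity overhead of a double-parity stripe with N data + 2 parity columns.
def capacity_overhead(data_cols: int, parity_cols: int = 2) -> float:
    return parity_cols / (data_cols + parity_cols)

print(f"23+2 stripe overhead: {capacity_overhead(23):.1%}")  # 8.0%

# Contrast with a classic RAID-6 small-block update, which costs
# 3 reads (old data, old P, old Q) + 3 writes (new data, new P, new Q)
# per user write - a 3x write amplification that wears flash quickly.
raid6_reads, raid6_writes = 3, 3
print(f"RAID-6 small write: {raid6_reads} reads + {raid6_writes} writes")
# The 1.22x figure quoted above comes from amortizing parity updates across
# the emptiest stripes instead of updating in place - a much gentler wear profile.
```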
- Fourth: Shared (and DISTRIBUTED) in-memory metadata
This is a funky, and intrinsic, architectural thing. For this architecture to work, each of the nodes must share and distribute their metadata. The diagram below highlights how this works. Each XtremIO storage processor shares its metadata (while the two access unique user data), using an IB interconnect as the inter-node RDMA fabric. Without this sort of model (I'm not saying the specific implementation), each node's metadata (and indirection model) becomes a scaling bottleneck, and you don't have a "symmetrical scaling model".
Now, I've noticed that some are scratching their heads about "why 4 nodes vs. 8 or more". This, as you can see from all of the above, is what you would call a "tightly coupled" scale-out model. The more tightly coupled a scale-out mechanic, the lower the small-IO latency (think of the shared memory model of VMAX as another example of this). The inverse is also true: the more tightly coupled a scale-out architecture, the harder it is to scale the number of nodes. Think of Isilon and ScaleIO being in the "middle" of the architectural continuum ("loosely coupled"), and things like the ViPR object/Atmos models ("eventually consistent") being at the far end of the extreme.
With tightly coupled architectures – testing, qualifying, and coding for "more nodes" is non-trivial. It's for this reason that at GA, up to 4 nodes are supported; that will increase rapidly in 2014 to 8.
It's also the reason that out of the gate, there is one important thing that customers need to know. In the GA code (v2.2) of XtremIO, the code for redistribution of data is not enabled (also targeted for early 2014) – so customers should be thinking about their particular needs in terms of total system IOps and capacity, and look at one, two, three, or four X-Bricks.
These four architectural characteristics are ones that I believe NO OTHER ALL-FLASH ARRAY on the market does. If you look back at the criteria that (in our minds) drives the AFA use cases, XtremIO was far and away the best technology we saw out there – and that’s the answer to the first question of “why buy vs. build”, and why XtremIO was the right choice for EMC, and for our customers.
Beyond that – it gets, day one, a bunch of great EMC family goodness:
- VPLEX and Recoverpoint support for stretched active-active use cases and perhaps the most powerful remote replication capabilities on the market.
- Integration with the EMC vCenter plugins day one for strong VMware integration (including the most facemelting XCOPY implementation on the market due to the inline dedupe/hashing model), and ultimately, via ViPR, rich integration with orchestration frameworks.
- VBlock Specialized Systems focused on VDI use cases – and to be sure, VDI is the most sweet of the sweet spots. This Vblock has the awesomeness of UCS (IMO, the best server platform for VDI), XtremIO for boot images, and Isilon for user data. This is what one of these specialized Vblock systems looks like:
So – if it’s great – why did it take time to bring it to GA?
Is it that it’s not ready? Nope. Remember – 4 XtremIO X-bricks supported almost the entire VMware HoL load this year:
http://www.xtremio.com/vmworld-2013-cool-facts-about-xtremio-powering-the-hands-on-labs/
The reason why it took time was simple – we needed to get it right.
- We ran a directed availability period where customers were using it, trying it, putting it through its paces. Many customers surprised us by insisting (seriously) that they wanted to buy it pre-general availability.
- Throughout the directed availability period, we discovered things that needed fixing in the HA functionality – i.e. "pull wires out" scenarios. We learnt a lot in the first half of the directed availability, and fixed a lot of code. I fully expect (and see already) that there will be a lot of FUD from competitors on this note. We do encourage customers to put us through the same paces (both planned and unplanned) that our late-stage directed availability customers did.
- NDU code – the NDU code is in the GA product. Like anything – you can expect us to be conservative, as we pushed really, really hard on the HA code and scenarios, and less (on a relative basis) on NDU scenarios. We'll set expectations of a disruptive upgrade for the next release – not because the code isn't NDU, but because we're exhibiting the same conservative approach we've taken throughout the whole directed availability process.
- Throughout the directed availability period, we discovered things in the DA hardware that needed to be fixed before shipping the GA hardware. The first wave of hardware was SuperMicro based, and had lots of issues. LOTS. The GA hardware is EMC manufactured.
- We used the time to build up the services and support organization to get ready for mass market demand.
DA customers loved it, BTW (even ones that hit the HA issues – I think there was one exception that I know about). We got a lot of quotes like these:
“There are few things in history that have a significant impact on advancing technology – I see XtremIO as one of those technologies” – Craig Englund, Principal Virtualization Architect & Sean Collier, Sr. Administrator, Boston Scientific.
Here’s one more thing to think about. When you’re a startup, your ability to get customers is “naturally gated” by your “startup-ness”, and your ramp in the marketplace. If you look at some of the startups that have the largest “volume” to date – getting to 1000 customers is a big achievement (one to be proud of), and takes more than a year, two or three of being in the marketplace. You learn a ton at the 10 customer scale, 50 customer scale, 100 customer scale, etc. Each step up is a period where you can harden and mature your product (both the hardware and the software). Note that as some of the players are learning, even at that point, sometimes profitability can be elusive.
Now ask yourself this: "how long will it take for EMC to get 1000 XtremIO customers?". The answer is: not long. As a huge company, not only do we need to think about our customers first, and our brand – but also the basic premise that there is no "let's go small" gear in the gearbox. Once it's GA – rapidly there are a ton of customers with the software/hardware, and there is NO TURNING BACK. That's why we've pounded on this so hard.
I was in Israel a couple weeks ago with the engineering and product team, and have been working with many of the DA customers. The product is ready, and the customers love it.
So… XtremIO is here, it’s global! Xpect more performance. Xpect more scale. Xpect more efficiency. Xpect more endurance. Xpect the unxpected. If you have workloads that are an all-flash array fit - push on EMC XtremIO and our partners – you will be amazed. Comments ALWAYS welcome!
Kudos to you, EMC and the XtremIO team on the launch. The adoption rate of flash is exceeding everyone's expectations whether the need is for performance or a means to address data center resource constraints (i.e. power & rack space).
I look forward to our healthy debates around technical details but for the moment, Nice Job!
Cheers,
v
Posted by: Vaughn Stewart | November 14, 2013 at 04:54 PM
Great post Chad! Explains well why XtremIO is so unique and the great work EMC is doing.
Posted by: Matteovari | November 15, 2013 at 10:23 AM
Great post! Really good to see some transparency at this level.
Sadly, for all its greatness, flash as a technology has enabled many new entrants into the market at an extremely low cost point. I once heard a wise man say: "A little flash goes a long way, imagine what a lot can do" :) Architecturally, a lot of flash allows almost any array (regardless of its architecture) to perform by today's standards. This is not purely because flash is fast, but because we have not matured in general application development and requirements (most applications are still being developed to deal with the shortcomings of traditional storage technologies being the bottleneck) and only specific workloads (generally large aggregated ones or poorly designed ones) have flash requirements (again, this is why hybrid arrays are a good fit for most workloads). Simply having an array that supports a lot of flash does not itself suggest good design; and realistically, when we are talking about performance at scale, it is all about good design – the backend will eventually become a problem sooner or later.
My mantra when it comes to performance is "bad design works, good design scales", and as a new generation of applications emerges and workload characteristics increase, the true nature (from an architectural and design perspective) of the different flash arrays on the market will become apparent.
I think it was extremely (no pun intended) wise of EMC to spend time on XIO upfront rather than just rushing it to the market. The last thing we need is another ill designed array backended by lots of cheap commodity flash that will work today, but fail to scale tomorrow.
Well done.
Posted by: Cris Danci | November 16, 2013 at 08:12 AM
Great post.
From my perspective we run a lot of great EMC kit here at Sportsdirect in the UK. We adopted XtremIO early on with four bricks in two node arrays. I can tell you the results have been staggering since removing XenApp servers and RDS servers from the VNX and VMAX and loading them on the XtremIO clusters. We have seen great de-duplication ratios – about 5:1 real world on servers – and around 50k IOps per array.
All this performance, from a disk response time perspective, comes in at 0.1ms. So results have been staggering, surprising, and we're still loading them daily. I expect to hit 100k IOps per cluster array within a year, hosting around 400 virtual servers on them.
Hope this has been helpful for those that have yet to experience how ground breaking this is and what it means for farm servers and vdi deployments.
Posted by: conrad walker-simmons | November 21, 2013 at 07:46 PM
Great to see a customer post this about XtremIO! It is indeed a great product but what is more important is that it has the EMC support structure, something that took EMC years to build and all the other AFA startups don't have ...
Posted by: Endre Peterfi | November 28, 2013 at 04:58 AM
...all hash calculations (inherently, as they are a more dense representation of a set of data) involve some insanely remote probability of hash collision – but these are astronomical.
Do you have any supporting analysis of XtremIO's hashing methodology for this claim or is this just a 'trust me'?
btw there are lossless hash algos. collisions only occur in the lossy ones.
Posted by: Thom Moore | January 12, 2016 at 05:53 PM
Do you have an answer to my question on the hash analysis? You say they 'took the time to do it right' so they must have done one. Can you share it?
Posted by: Thom Moore | January 22, 2016 at 06:16 PM
What analysis was done to validate your claim that the odds of collision is astronomical? They took the time to 'do it right' so it must have been done. Will you share it here?
Posted by: Thom Moore | January 30, 2016 at 11:09 AM
@Thom - thanks for the persistent question. My apologies - have been really slammed.
Yes, analysis has been done.
Summary: Less than one in 10 septillion (a trillion trillion) chance of a hash collision after storing one petabyte of data. That's 2.58494E-26 if you want the specific value.
Put another way - even if you stored all the data created on Earth in 2011 on an XtremIO array, the probability of a hash collision is less than one in 10 trillion.
There is a similar probability of a meteor landing on your house (roughly 1 in 182 trillion).
reference: http://preshing.com/20110504/hash-collision-probabilities
This is the formula used, with our given hashing algorithm and our hash function size.
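If you want to reproduce the number, here's the arithmetic as a minimal sketch – the standard birthday-bound approximation from the link above, with a 4KB block size and a 160-bit fingerprint as the working assumptions behind the figure:

```python
# Birthday-bound approximation: p ≈ n^2 / (2 * H)
# n = number of 4KB blocks in 1 PB (binary), H = size of the 160-bit hash space.
n = 2**50 // 2**12   # 2^38 blocks of 4KB in a petabyte
H = 2**160           # possible 160-bit fingerprints
p = n**2 / (2 * H)
print(p)             # ~2.58e-26, i.e. 2^-85 - the figure quoted above
```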
It's also the root of the expression "astronomical" :-)
Honestly, we run into far more practical issues so much earlier than hash collisions that the math is humorous. Human error (by customers or EMC/EMC Partner services) and bugs in code are... materially more likely :-)
I've always found this argument to be an interesting one, because the math is so compelling that the "hash collision = something to be scared of!" argument (often from people who don't dedupe) doesn't hold up - at least for me.
Thanks again for asking!
Posted by: Chad Sakac | February 01, 2016 at 01:17 PM
Thanks for the response. Jeff Preshing's analysis assumes a hash function with uniform output probability. He states it up front. But XtremIO uses SHA1 and no one to my knowledge has ever characterized its output probability so Preshing's assumptions can't be met. Recent cryptanalysis provides attacks showing collisions at much less than the expected birthday paradox difficulty (2^63 operations cryptanalysis vs 2^80 birthday) suggesting anything but output distribution uniformity.
I wouldn't call the math humorous, I'd call it incomplete. Did anyone prove the output of SHA1 isn't all concentrated in one small sub-range? If not this is a just a 'trust me' not a 'done right'.
Posted by: Thom Moore | February 02, 2016 at 04:32 PM
@Thom - thanks for your comment.
The birthday paradox (at least to my knowledge) is generally applied to brute force attacks on hash functions when applied to crypto, and of course, in this case we're talking about hash-functions and collision likelihood in a data set. To me those are different.
People who are interested - good reading here:
https://en.wikipedia.org/wiki/Birthday_problem
I suspect (and I encounter many folks, many opinions – some just want to prove how smart they are) – I'm sure you are very smart, and in my experience with that type, this will turn into a pissing match.
So, I'll leave my comment to stand (and yours and any others you choose to post) and people can judge for themselves.
And personally, I DO think it's funny - 2^63 operations (brute force attempt to determine a hash value, birthday paradox) is still ~1:100 Billion odds.
I like those odds.
Thanks!
Posted by: Chad Sakac | February 02, 2016 at 07:58 PM
I'll leave it at this. To know whether it's safe to use a hash function you have to know the odds of collision, and to know that you have to know the probability mass function of the hash, i.e. characterizing the output probabilities. No one has ever done so for SHA1 that I know of. Without it you're just guessing at the collision rate and you can't claim 'to have done it right'.
And not that I agree with your number, but 1 in 100 Billion is 1E-11, or 10000 times more likely (worse) than the undetected error rate of enterprise grade magnetic disks (1E-15). I doubt most IT people would find that funny.
Posted by: Thom Moore | February 03, 2016 at 08:17 PM
@Thom - thanks for your comment
I get that you are stating that since a study of the randomness of the probability distribution of the SHA1 hash has not been done (to your knowledge or mine) - it could have a non-random distribution or "probability mass" - it could be lower than 2^80.
I can tell you as someone that is intimately aware of results of customers - both good and bad (all the EMC execs are informed when we have a Sev 1 issue that is profound, and there is material data unavailable or data loss) - this is academically fun and interesting (it really is! Thank you for the dialog)... Hash collision doesn't even enter into the realm of material consideration. There are much more material considerations to make sure that the customer's data is secure.
BTW - if I was advocating SHA1 as a core crypto basis - I would be arguing with you, but that's not the case here, and the Wikipedia article I noted also drew that distinction (using how Git uses hash functions as an example).
Thom - sincerely, thank you for adding to the dialog!
Posted by: Chad Sakac | February 03, 2016 at 10:55 PM
I could go on about the lack of necessary analysis but you raise an even more interesting question about error detection and issue handling.
Suppose a collision did happen at a customer using this device. The data would be misevaluated as duplicate, a pointer to previously stored data supposedly the same but not really would be entered and the user's data, not recognized as unique, would be discarded. No error would be recorded or reported to the host on the write. It would look successful.
Then someday the host would read it. From the point of view of both the device and the host, up to but not including the top application, the read would also look successful. No errors recorded or reported.
The top (or near) level application would receive data other than expected, it might detect bad data or it could just erroneously process it.
When the app guy tried to debug the bad processing, it would look like programming error because there would be no device error indication worth chasing.
You would have material data loss but no one would recognize it as such. Stealth loss.
Since you are intimately aware, how is EMC set up to handle a problem that hides in this fashion? Would it even catch it?
Posted by: Thom Moore | February 04, 2016 at 07:53 PM