It’s always interesting to watch vendors (EMC included) make the “you don’t really need _____” argument when they don’t have a given feature, all the while furiously working on it.
Likewise, it’s always frustrating (to everyone) when a customer expresses requirements in the form of a feature/implementation, rather than a broader requirement (ergo they’ve fallen for a vendor’s “cheat sheet” selling motion).
This post was triggered by two things that happened last week: 1) watching the SNIA twitter feed from @jpwarren (disclosure: I wasn’t there); and 2) some conversations at the EMC TC conference in Paris.
First… the SNIA conversation seemed to be about a multi-vendor panel (@Rodos, @jpwarren, you were there, perhaps you can comment):
- Vendor A seemed to be all about automation, virtualization of 3rd-party storage, and autotiering. When pressed on the topic of dedupe, they talked about dedupe on their NAS heads, but stated that the bulk of dedupe needs are in the backup storage market (where inline is a prerequisite), not primary storage.
- Vendor B seemed to be all about dedupe of ALL storage being the answer, convergence of protocols, and vCenter plugins: mega-caches and primary storage dedupe are all you need, and automated tiering is rubbish.
I’m wondering if readers can guess who Vendor A and Vendor B are :-) They weren’t EMC; in fact, I didn’t see EMC in @jpwarren’s twitter feed at all (though he did write a blog post subsequently). Do you think either of them is flat-out wrong or right, or are they both kinda right and wrong at the same time?
Second… At the TC conference, I asked a room full of TCs whether, in their eyes, dedupe of primary storage was a good or a bad thing (a simple question that glosses over a ton of “it depends” variables). On the whole, they responded that primary storage dedupe is a BAD thing. I personally disagree. Of course, the answer depends on a ton of factors, but there’s nothing intrinsically wrong with primary storage dedupe.
BTW – this pattern applies to ANY industry, not just storage land. I would also bet that Vendor A is furiously working on primary storage dedupe, and that Vendor B is furiously working on automated tiering.
If you’re interested in a few more examples, including customer perspectives, read on…
OK – first, on the topic of one of those “features”: primary storage dedupe.
Here’s an excerpt directly from the internal IT blog of a customer who deployed their first enterprise Unified platform (previously they used just SAN and standalone file servers):
“…Well, by migrating a File Server to the Celerra, we can in essence eliminate the physical (or virtual) server and free up the Microsoft licensing. PLUS, we can take advantage of the fancy features the Celerra offers. One of the cool tools is the dedupe.
So here are the stats for 1 drive:
- Storage was reduced by 40% or 800+ GB!
- 57% of the files were redundant (over 2million files scanned on the E drive alone)
- Backups were reduced from 94 hours (yes, that's right! 94 hours) down to 10 hours!!! That's almost a 90% improvement! Woo!
Awesomeness. That's all I can say.”
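A quick sanity check on the customer’s numbers, as a minimal sketch: the only inputs are the figures in the quote (94 → 10 hours, 40% / 800 GB), and the “implied” pre-dedupe size is derived from them rather than reported by the customer. The function name is just for illustration.

```python
# Back-of-the-envelope check of the figures quoted above.
# Inputs come straight from the quote; the implied original size is derived, not reported.

def percent_reduction(before: float, after: float) -> float:
    """Return the reduction from `before` to `after` as a percentage of `before`."""
    return (before - after) / before * 100.0

# Backup window: 94 hours down to 10 hours.
backup_cut = percent_reduction(94.0, 10.0)

# "Reduced by 40% or 800+ GB" implies the drive held roughly 800 / 0.40 GB beforehand.
saved_gb = 800.0
saved_fraction = 0.40
implied_original_gb = saved_gb / saved_fraction

print(f"Backup window cut: {backup_cut:.1f}%")
print(f"Implied pre-dedupe footprint: ~{implied_original_gb:.0f} GB")
```

So “almost 90%” holds up (about 89%), and because the saving was “800+ GB”, the E drive was on the order of 2 TB or more before dedupe.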
Three comments:
- Customer – you know who you are, but from me to you, THANK YOU.
- Anyone, including EMCers – every customer needs NAS, and I would argue that every customer needs Unified. They often need MORE than that (think scale-out enterprise storage, inline dedupe in the backup space and much, much more), and EMC’s in a great position to do both.
- Anyone, including EMCers – who say that primary storage dedupe/compression/thin/autotiering/megacaches etc. are intrinsically “bad” are smoking something.
But wait – is primary storage dedupe always the answer? Is Vendor A out of their mind (recall, they were pooh-poohing primary storage dedupe)? NO. What about Vendor B (recall, they were all about “dedupe everything”)? NO. Huh?
The answer is you can make a customer happy and efficient in MANY ways.
For example, EMC’s approach to primary storage dedupe and capacity efficiency today is:
- single instancing of files on NAS, which has virtually no performance downside and is VERY efficient. Look at the customer example above: just single-instance the 57% of files that were redundant and, poof, a huge part of the job is done very fast. That’s also why our approach doesn’t have any 2nd-order effects on other NAS parameters (sizing, number, features, whatever).
- compression (file and block), which carries a light performance penalty (~10% on reads) for light point workloads, but a heavier one (think ~40% on reads) for heavier point workloads (think of something like a database or a VMDK that you run an iometer load against).
- mega-caches, autotiering, thin provisioning, dense disk configs (measured not by physical disk size, but by total TB/U or TB/floor tile), and so on and so on…
- being as efficient as we can in the overall conversion of raw-to-usable, and in overall utilization efficiency.
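To make “single instancing of files” concrete, here is a minimal sketch of the general technique: group files by a hash of their full contents and count every extra byte-identical copy as something that could become a reference to one stored instance. This is a hypothetical illustration, not how Celerra implements it; the function names are made up, it only reports (it never modifies files), and a production system would byte-compare candidates rather than trust the hash alone.

```python
import hashlib
from pathlib import Path

def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file's contents, read in chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def single_instance_report(root: Path) -> dict:
    """Report how much whole-file single instancing could save under `root`.

    Only whole, byte-identical files count as duplicates; nothing on disk is changed.
    """
    first_seen: dict[str, Path] = {}
    total_files = redundant_files = 0
    total_bytes = redundant_bytes = 0
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        size = path.stat().st_size
        total_files += 1
        total_bytes += size
        digest = file_digest(path)
        if digest in first_seen:
            # An identical copy already exists, so this file could be a pointer to it.
            redundant_files += 1
            redundant_bytes += size
        else:
            first_seen[digest] = path
    return {
        "files": total_files,
        "redundant_files": redundant_files,
        "bytes": total_bytes,
        "bytes_saved": redundant_bytes,
    }
```

Usage would be something like `single_instance_report(Path("E:/"))`. Note that “57% of files redundant” and “40% of capacity saved” in the customer quote are different ratios, which is exactly what `redundant_files / files` versus `bytes_saved / bytes` would show.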
Together these all effectively lower the $/GB for a given config. Across this set, we lead in some areas and follow in others. This isn’t intended to be a post about EMC vs. the other guy, but rather a more general discussion.
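One way to see why these levers add up is that they multiply. The sketch below is a hypothetical model, not an EMC sizing tool: the function name, parameters, and example numbers are all assumptions chosen only to show the shape of the calculation (cost divided by the logical GB you can actually store).

```python
def effective_cost_per_gb(system_cost: float, raw_tb: float,
                          usable_fraction: float, utilization: float,
                          data_reduction_ratio: float) -> float:
    """Cost per GB of logical data actually stored (hypothetical model).

    usable_fraction      -- raw-to-usable conversion after RAID, spares, metadata (0..1)
    utilization          -- share of usable capacity actually filled (0..1)
    data_reduction_ratio -- logical/physical from dedupe + compression (1.0 = none)
    """
    logical_gb = raw_tb * 1000 * usable_fraction * utilization * data_reduction_ratio
    return system_cost / logical_gb

# Hypothetical config: $100k for 100 TB raw, 70% usable, 80% utilized.
baseline = effective_cost_per_gb(100_000, 100, 0.70, 0.80, 1.0)
with_reduction = effective_cost_per_gb(100_000, 100, 0.70, 0.80, 1.5)
print(f"${baseline:.2f}/GB without data reduction, ${with_reduction:.2f}/GB at 1.5:1")
```

Because the factors multiply, a better raw-to-usable conversion or higher utilization helps just as much as an equal improvement in dedupe/compression ratio.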
But – back to the pure dedupe/compression discussion.
This means EMC is VERY effective for general-purpose NAS. For “other use cases” it’s fair to say that we are less effective (but still good): compression applies generally, but has a variable performance impact (though no impact on other functional operations). But we’re not perfect – it’s not as effective at reducing the GBs used for VMDK storage (VMFS or NFS).
When you’re in the “I’m virtualizing craplications” phase, those workloads have small performance requirements but high aggregate GB requirements (so dedupe/compression effects on $/GB efficiency are very material).
When you’re in the “I’m virtualizing mission critical, heavy IO workload” phase, those workloads can have any blend of performance/capacity requirements (sometimes $/IO efficiency or other factors outweigh $/GB factors).
In that stage, efficiency is a broader topic. Thin, dedupe, and compression have benefit ($/GB), but mega-caches (do more with fewer spindles for cache-friendly workloads), high bandwidth connectivity (do more with fewer ports), platform throughput/bandwidth scale (do more before you need to buy another box), and solid state/auto-tiering (do more with fewer spindles for all workloads) all start to come into play.
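The difference between the two phases can be sketched as “which constraint sets the spindle count”. This is a hypothetical sizing model with made-up per-disk figures (600 GB and 180 IOPS per spindle) and made-up workloads; it just shows that data reduction only helps when capacity is the binding constraint, while caching/SSD helps when IOPS is.

```python
import math

def spindles_needed(capacity_gb: float, iops: float,
                    gb_per_spindle: float, iops_per_spindle: float,
                    data_reduction_ratio: float = 1.0,
                    cache_hit_ratio: float = 0.0) -> dict:
    """Size a pool by whichever constraint bites first: capacity or IOPS (hypothetical model).

    data_reduction_ratio shrinks the capacity requirement (dedupe/compression/thin);
    cache_hit_ratio shrinks the back-end IOPS requirement (mega-caches, SSD).
    """
    for_capacity = math.ceil(capacity_gb / data_reduction_ratio / gb_per_spindle)
    for_iops = math.ceil(iops * (1.0 - cache_hit_ratio) / iops_per_spindle)
    return {
        "for_capacity": for_capacity,
        "for_iops": for_iops,
        "needed": max(for_capacity, for_iops),
        "bound_by": "capacity" if for_capacity >= for_iops else "iops",
    }

# Hypothetical "craplications": lots of GB, few IOPS -> capacity-bound, dedupe halves the pool.
light = spindles_needed(60_000, 3_000, 600, 180)
light_dedupe = spindles_needed(60_000, 3_000, 600, 180, data_reduction_ratio=2.0)

# Hypothetical mission-critical DB: few GB, many IOPS -> IOPS-bound, dedupe changes nothing.
heavy = spindles_needed(6_000, 30_000, 600, 180, data_reduction_ratio=2.0)
heavy_cache = spindles_needed(6_000, 30_000, 600, 180, cache_hit_ratio=0.5)
```

With these assumed numbers the light pool drops from 100 to 50 spindles with 2:1 reduction, while the heavy pool stays at 167 spindles regardless of dedupe and only drops (to 84) when half the I/O is absorbed by cache.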
***and customers generally don’t just have one workload***
Another customer example… I was with another customer in London this week – a huge financial customer. That kind of customer has many PB of EMC stuff, and many PB of all SORTS of non-EMC stuff.
That customer asked me: “what good is your primary dedupe if it’s not good for VMDKs, where your compression is the only thing that adds efficiency?”
First – to that customer – THANK YOU for being an EMC customer, and we look forward to continuing to serve you.
Back to the thread… I asked “how much storage is consumed by your 9,000 VMs today relative to your general purpose NAS?” Their response: “the VMware footprint is a tiny proportion of our overall NAS footprint”. So you can see that our approach (with its associated tradeoff) may be a very good choice for that blended workload/dataset.
I’ve said it before – efficiency is measured in TOTAL, across MANY workloads/datasets, and is not about any given feature. It’s really about combined featuresets, but MOST importantly, about how people use technologies.
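“Efficiency measured in TOTAL” is just a capacity-weighted average. The sketch below uses entirely hypothetical footprints and savings rates (not this customer’s data) to show how a dataset that is a tiny share of the footprint barely moves the overall number, whichever approach is better for it.

```python
def blended_savings(datasets: dict[str, tuple[float, float]]) -> float:
    """Overall fraction of capacity saved, weighted by each dataset's footprint.

    datasets maps name -> (footprint_tb, fraction_saved_by_efficiency_features).
    """
    total_tb = sum(tb for tb, _ in datasets.values())
    saved_tb = sum(tb * frac for tb, frac in datasets.values())
    return saved_tb / total_tb

# Hypothetical approach X: strong on general NAS, weak on VMDKs.
approach_x = blended_savings({"general_nas": (1000.0, 0.40), "vmdk": (50.0, 0.10)})
# Hypothetical approach Y: weaker on general NAS, strong on VMDKs.
approach_y = blended_savings({"general_nas": (1000.0, 0.25), "vmdk": (50.0, 0.50)})
print(f"X saves {approach_x:.1%} overall, Y saves {approach_y:.1%} overall")
```

With this made-up mix, X comes out ahead overall (about 39% vs. 26%) even though Y is five times better on the VMDK slice; flip the footprints and the answer flips too.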
Frankly, I think **know-how** about how to make technologies work together is the key. This is very true with the “mix and match” open systems model. It’s less important in the integrated stack approaches, but still material – we aren’t yet at the point where those are SO integrated that these capabilities are irrelevant.
In general, I would place know-how, and how much a given vendor/partner works with you to meet your business requirements, on a pedestal above any given vendor feature.
Beyond that, of course every vendor (EMC included) is working furiously in areas where they see useful things across their competitive landscape (including broadening the envelope of primary storage dedupe).
As another example, take this post here (May 2010), where I was excited to introduce what EMC had done with FAST Cache and FAST, and shared some of the data. The post had NOTHING to do with anyone but EMC (except to acknowledge where another vendor had done something similar to FAST Cache, which seemed like the right thing to do). If you look at the comment thread, several competitors jumped ALL OVER the post. The general tone of the comments (go read the public record) was that “automated tiering and SSD as a tier are not needed – megacaches are ALWAYS the right way”. Lo and behold, that vendor just introduced SSD as a tier. I’m sure they are working on their own automated tiering. Sigh.
Tomato. (read the post to understand that)
Consider that the next time you see a vendor pooh-poohing something someone else does. Challenge them to show HOW they would meet your requirements, expressed as a higher-level request rather than as a feature. That’s a good challenge: you’ll get a better basis for comparison, and it pushes them all to put away their “competitive cheat sheets”.
Chad, it's really too bad the EMC sales people don't read your blog.
Posted by: Bob Plankers | November 10, 2010 at 02:22 PM
Hi Chad,
You need to be careful about putting words in other people's mouths. I was there, so I can say that the assertion that Vendor B said "Auto-tiering was rubbish" is incorrect. I also looked at @jpwarren's twitter feed and I didn't see him make that assertion there either.
I did have a slide that was deliberately contentious and designed to stimulate some debate. It was entitled "What's not so hot" and included the following bullet points:
- Old school bulk copy backups
- Stub based archiving
- Physical Tiering
- Closed “stacks”
- Scale-out for performance
- Compression/Variable block for primary
- Databases as blob stores
- Badly designed ethernet storage networks
- 8Gb+ Fibre Channel
That's not saying they're rubbish, but that compared to other approaches, say megacaching or converged networking and other really high-value technologies, they're "not so hot".
For more information on what was said, I'd recommend looking at the subsequent blog posts and follow-ups:
http://nsrd.info/blog/2010/11/10/a-tale-of-4-vendors/
http://www.eigenmagic.com/snia-blogfest-2010/
http://rodos.haywood.org/2010/11/snia-blogfest-interview-with-netapp.html
Posted by: John Martin | November 10, 2010 at 06:02 PM
@Bob - thank you. I try man, I try. Many of them do. My philosophy (again, not claiming attainment of nirvana here :-) is "be the change you want to see in the world".
Many EMCers DO read the blog, and hey, if it helps them help their customers the right way - man, that's great.
@John - thanks for the comment. I tried to be clear, used the "seemed" wording loud and clear, and heavily emphasized that I wasn't there.
I would love to hear from @jpwarren or @rodos (as I'm sure you and I are biased).
I would argue with you on many fronts on your list (and agree with many other items on the list).
The key point of the post though, WASN'T the SNIA thread per se, rather that: we ALL (you, me, everyone) tend to think the stuff WE have is "hot" and discount what others have. For example, man, for customers I talk to:
- For many, Scale-Out performance is HOT.
- For many, inline dedupe (particularly around backup) is HOT HOT HOT.
- For many, NAS primary storage dedupe is HOT HOT
- For many, BI/DW appliances are HOT HOT HOT.
- For many, storage Auto-Tiering is HOT HOT.
- For many, server-based SSD (think Fusion-IO) is HOT.
- For many, converged infrastructure (as a product, not a reference arch) is HOT HOT.
- For many, Object storage is HOT HOT.
- For many, orchestration is HOT HOT HOT.
There's stuff on that list EMC does, and there's stuff we don't. There's stuff there you probably disagree with, and I bet it has a high correlation with the items NetApp doesn't do. I probably discount the things EMC doesn't do.
I think that's an instinct we need to fight.
Posted by: Chad Sakac | November 12, 2010 at 11:37 AM
Spot on, Chad. Long-winded version here:
http://solori.wordpress.com/2010/11/12/quick-take-is-your-marriage-a-happy-one/
Cheers!
Posted by: Collin C. MacMillan | November 12, 2010 at 12:20 PM