I’ve gone on in the past (see here) about my personal way of looking at storage efficiency as “3 dimensions” ($/W/sq.ft. per GB, $/W/sq.ft. per IO, $/W/sq.ft. per unplanned change).
Down the dimension of storage efficiency per GB, compression and deduplication capabilities will become more and more pervasive across the industry.
EMC has an incredible deduplication portfolio – from the industry-leading target- and source-based dedupe technologies to a killer primary storage dedupe technology.
The exciting news is we’re not stopping here. You can expect compression and data deduplication technologies to appear across the EMC family, and to keep getting more efficient across more and more datasets. Next up to the plate for us is the addition of fixed-block level dedupe (particularly useful in the VMware set of use cases).
I love hearing from customers how these technologies are helping them – today. An email came in on Tuesday this week raving about the savings they were seeing in their environment – on some NAS use cases 57% of their storage was reclaimed – totally transparently. Simple, easy, efficient.
I’ve removed the name of the customer – suffice it to say they are a FT500 customer that everyone knows and interacts with in some way every day.
I’ve also blanked out anything that would reflect negatively on EMC competitors, as I don’t want this to be construed in a negative way, but rather as an example of how this capability can help customers. Otherwise, this is the direct email from the customer.
To the customer – thank you for choosing EMC. We are dedicated to continuing to serve you well!
If you’re interested in the voice of a customer – read on past the break.
“In case you run into other _____customers (but of course you can’t mention ___ or ______) – but you get the idea “A fortune 500 firm is getting great de-dup without a severe performance hit on the highly unstructured NFS data – with upwards to 57% de-dup, while also driving down their CAPEX and OPEX costs” :-)
I just wanted to share this with you ______, as I know you spent A LOT of your time A) convincing us that this will work and B) hand-holding and in-person training.
It really shows me that EMC and you are a true partner.
Thanks for all you both do for us.
______
<snip>
From: ______
Sent: Tuesday, March 22, 2011 3:21 PM
To: _______
Cc: #_____Systems Engineering - International; #____Storage Services; ___
Subject: Great De-Dup on the new EMC TIER 3 NAS Solution
Hey _____/______,
The Storage team wanted to share some great news with regards to the exceptional data de-dup rates we are getting for the various NAP7 INTL NFS shares (TIER 3):
/intlshare – 57% (3.2TB! de-dup/saved)
/intlhome – 47% (11GB de-dup/saved)
/intllogs – 37% (361GB de-dup/saved)
/intlsecure – 16% (6GB de-dup/saved)
It’s a prime example where SE and IT Operations work together to provide a solution that meets the BU’s needs, helps drive down costs (going from expensive ______ to EMC), and also maximizes the use of our storage infrastructure assets ( via use of EMC de-dup technology). I hope you and your group take pride in this, as much as the Storage team does, as it was your group’s hard work and contribution that helped us test and integrate this solution in a safe and transparent manner. We really appreciate you all stepping “out on the limb” with us, and migrating the data over to this platform.
Really great work to ____/_____ and _______for doing initial POC load tests in the early stages of the project.
Sample Screenshot from the EMC Array.
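A rough back-of-the-envelope check on the figures in the email: if each percentage is read as "space saved divided by the size before dedupe" (an assumption, since the email does not define it), the saved amounts imply how large each share was to begin with and what the blended savings rate across all four shares would be. The sketch below also assumes 1 TB = 1024 GB; the function names are illustrative only.

```python
# Figures copied from the customer's email: (share, dedupe %, space saved in GB).
# Assumptions: 1 TB = 1024 GB, and "%" means saved / pre-dedupe size.
SHARES = [
    ("/intlshare", 57, 3.2 * 1024),
    ("/intlhome", 47, 11),
    ("/intllogs", 37, 361),
    ("/intlsecure", 16, 6),
]


def implied_size_gb(percent_saved, saved_gb):
    """Pre-dedupe size implied by a savings percentage and the space saved."""
    return saved_gb / (percent_saved / 100)


def summarize(shares):
    """Return (total saved GB, total implied pre-dedupe GB, blended savings %)."""
    total_saved = sum(saved for _, _, saved in shares)
    total_before = sum(implied_size_gb(pct, saved) for _, pct, saved in shares)
    return total_saved, total_before, 100 * total_saved / total_before


if __name__ == "__main__":
    for name, pct, saved in SHARES:
        print(f"{name}: ~{implied_size_gb(pct, saved):,.0f} GB before dedupe")
    saved, before, blended = summarize(SHARES)
    print(f"total: {saved:,.0f} GB saved of ~{before:,.0f} GB ({blended:.0f}%)")
```

Because /intlshare dominates the total, the blended rate under these assumptions lands close to its 57% rather than near the average of the four percentages.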

Hey cool! I would love to see the Tier 1 and 2 results. BTW: Why is VP disabled?
Posted by: Olli Walsdorf | March 24, 2011 at 10:09 AM
Hi Chad,
what about block level storage? Can customers expect FAST II to be able to do dedupe on primary storage in the near future?
Posted by: pfuhli | March 24, 2011 at 02:48 PM
Being an EMC customer myself, and running an older NS502 and NS120, I'm impressed with the space savings when looking at dedupe, but I often question how much space is actually being saved once you take into account the amount of checkpoint storage that ends up getting used. Am I wrong to assume the savings figure doesn't take into account the checkpoint storage required to facilitate the dedupe process?
Posted by: Duane Haas | March 24, 2011 at 09:31 PM
Hi!
I'm using a CX4-240 with compression, and I reclaimed about 35% on compressed LUNs only.
Here is proof - http://clip2net.com/clip/m31849/1298458900-emc_compession_in_real_life-28kb.png
Posted by: philzy | March 25, 2011 at 02:49 AM
Chad,
I am interested in the performance impact of block level compression.
We are intending on connecting up a VNX7500 with a nice large amount of tiered storage (450 TB usable - in 50TB virtual pools, more or less), using FastCache, FC connected to the servers.
Is there any reason for me not to turn on compression on all of it day one? I am getting doubts expressed about the ability of the storage processors to handle the load. We are talking about a fairly generic mix of databases, applications and file data (file data is presented as block to MS servers, not as direct NAS), and no email. Most of this storage, over 85%, will be going to VMware farms through Brocade DCX FC directors.
To be clear about expected performance - if the VNX matches my existing CX4 arrays pre Flare 30 I am a very happy camper.
I think the new VNX should eat this for breakfast from everything I have seen - Is the promo real, or are the cautions I am getting from the local support people real?
Posted by: Alby Cartner | April 05, 2011 at 01:00 AM
@olli - re: Tier 1/2 use cases - for this particular customer, they call general purpose NAS "Tier 3/4". Performance of our primary storage file-level dedupe and compression on NAS is very good (has little to no impact).
@pfuhli - Thanks for your question! EMC's view is that dedupe/compression will start to become more pervasive in all use cases - which means you'll see all sorts of variants. Today we do file-level dedupe and compression for NAS, and compression for block (with the sorts of efficiency gains at our customers like they point out in this post in their own words). Interestingly, in these use cases (general purpose NAS) - block-level dedupe is actually generally LESS efficient. BUT there are use cases and data sets where block-level dedupe is more efficient (VMDKs being one). Good news for EMC customers - it's coming soon. You'll see more in the coming months.
@Duane - great to hear from you again - thanks as always for being a great EMC customer! Not sure if I understand your question? Could you elaborate? Checkpointing is not a central part of the dedupe process.
@philzy - that's awesome to see - 35% savings on your block storage using EMC! Thank you for sharing!
@Alby - thanks for being an EMC customer! Personally, I wouldn't turn on compression on it all day one. While the amount of performance in a VNX7500 is astronomical, AND you can see from @philzy's example that there can be real material savings from compression, our block target compression does have an impact on VERY transactional, low-latency workloads. In my experience, EMC tends to be quite conservative in our field - with the best interest of the customer in mind.
Sometimes, though, this makes us inadvertently stop a customer from getting the most out of what they have.
While it's very difficult to make sweeping performance guarantees (performance, unlike capacity, is much more "it depends") - I do feel pretty darn confident that a VNX7500 with compression on would still soundly spank a pre-F30 CX4.
Rather than making it a "we just do it everywhere", personally I would do it on a workload-by-workload basis. Remember that you have control over everything (FAST policy, FAST Cache use, compression, etc.) on a device-by-device basis.
The other factor to consider is the answer to @pfuhli - since most of this will be going to VMware use cases, I can tell you that you can expect much better space savings, with no material caveats, from the upcoming block-level dedupe feature on VNX.
Would you be willing to share your findings here one way or another?
Posted by: Chad Sakac | April 07, 2011 at 10:03 AM
I can help with Duane's issue since we experienced this ourselves. If you are a customer who is "living the dream" - by that I mean you have two Celerras set up to replicate and you are using checkpoints on both ends to handle your backup needs - then you will hit this issue if you turn on de-dupe. What happens is that all of the (changed) blocks the de-dupe process flags to eliminate from the primary FS get copied into the savvol. When that happens the savvol grows to accommodate this new influx of data. The issue is that EMC has *still* not released a savvol shrink utility. So any space you "saved" via de-dupe is now taken up - permanently - in your savvol.
I have the case notes to back this all up - EMC did not properly plan for all the customer use cases here. If you don't approach de-dupe very, very carefully, you save exactly zero.
Oh - this happened 2 *years* ago - and we still don't have a savvol shrink utility. There's a definite disconnect in EMC engineering about how important this issue is.
Posted by: Craig Dodson | April 12, 2011 at 03:58 PM
@Craig - thank you for being an EMC customer. Please email me the case #, and I'll help by being an advocate on your behalf.
Posted by: Chad Sakac | April 12, 2011 at 04:17 PM
Thanks for the response.
The project is progressing slowly but once we start getting some numbers I will find out what I can share. I do not expect to have real numbers until about 4 months or so though.
The plan for now is at a high level we are going to load the arrays progressively.
All physical hosts will be uncompressed as they are typically Oracle databases or SAP that remain physical because of licensing issues.
We will turn it on for all VMware and watch it like a hawk and map the load to see how it flies. If it looks unhappy we will do a controlled uncompress. a 40% savings/space gain is too huge to not at least make a try for it.
thanks again for the feedback
Posted by: Alby Cartner | April 14, 2011 at 02:08 AM
Full disclosure: I work for EMC. I happen to be one of the folks that designed the file dedupe/compression feature in Celerra/VNX. I just want to offer a little more detail on how the system works and why.
The Celerra / VNX file system dedupe/compression feature does not cause all freed blocks to be copied into the savvol. It does make changes in the file system (obviously), and hence some portion of the data that it changes may be copied into the savvol, but it copies only the data required to preserve a previous point in time of interest (i.e. a checkpoint). It is impossible to predict how much data will be copied into the savvol during a dedupe run, because that is influenced by things like the utilization and contents of the file system and the number, age and content of the checkpoints.
It is correct that once a savvol has been extended there is no way to shrink it without deleting all the checkpoints for the file system, something that is often not an option for customers who have to meet recoverability SLOs. For this reason the default behaviour of the system is to abort the dedupe/compression processing of a file system if it risks causing the savvol to extend. The system will then, by default, try the affected file system again in 7 days. The hope is that by then older checkpoints might have been deleted and there will be more space available in the savvol to allow the dedupe/compression processing to make more progress.
All of this means that it may take the system a number of attempts and some time to complete its first full pass through the file system. After that, in most environments, the system can usually complete a full scan of the file system without issue. Of course you have the choice to disable the savvol extension protection and/or configure the system to scan the file system more often. We generally do not recommend disabling the savvol extension protection mechanism – the fact that it is enabled by default is a hint.
BTW – if anyone sees any documentation that does recommend disabling the savvol extension protection mechanism please speak up.
As for the savvol shrink issue – we are working on addressing that as well, but there are no quick fixes available, sorry.
Posted by: Chris Stacey | May 02, 2011 at 09:57 PM
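The abort-and-retry behaviour described in the comment above can be pictured with a toy model. This is only an illustrative sketch, not EMC's implementation: the class, field and function names are hypothetical, and it pretends the savvol space each batch of changes needs is known up front, which the comment says cannot be predicted on a real system. The only figures taken from the thread are the 7-day default retry and the protection being on by default.

```python
from dataclasses import dataclass

RETRY_DAYS = 7  # default retry interval mentioned in the comment above


@dataclass
class FileSystem:
    """Toy stand-in for a file system with checkpoints (all fields hypothetical)."""
    name: str
    savvol_free_gb: float        # free space currently left in the savvol
    protect_savvol: bool = True  # savvol extension protection, on by default


def run_dedupe_pass(fs, savvol_need_per_batch_gb):
    """Process batches until done, or until the next one would extend the savvol.

    Returns (batches_completed, days_until_retry); days is None when finished.
    """
    done = 0
    for need in savvol_need_per_batch_gb:
        if fs.protect_savvol and need > fs.savvol_free_gb:
            return done, RETRY_DAYS  # abort rather than grow the savvol
        # With protection off the savvol would simply be extended here;
        # the toy model just clamps the free space at zero.
        fs.savvol_free_gb = max(0.0, fs.savvol_free_gb - need)
        done += 1
    return done, None
```

In this model, a first pass stops partway when the savvol is nearly full, and a later pass gets further once expired checkpoints have freed savvol space, which mirrors the "several attempts to complete the first full scan" behaviour described above.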
Just checking in 6 months after the last post on this thread. Are we any closer to seeing a savvol shrink utility see the light of day? I could sure use the space back...
Much appreciated!
Craig
Posted by: Craig Dodson | November 29, 2011 at 03:14 PM