
March 09, 2010

Comments


Jason Boche

Pretty cool Chad!

Itzik Reich

The plugin itself can now be downloaded from Powerlink.

Mark Burgess

Hi Chad,

Do you know what impact the compression has on replication and snapshots?

I assume that extra data will be replicated as the write data is compressed.

Or do the replication and snapshot engines operate at a level above the compression, so that they are not aware the data has been compressed?

If that is the case, will the data that is replicated be compressed or in its original state?

I guess there has to be a trade-off somewhere.

Many thanks
Mark

Vaughn Stewart

Nice work guys, it looks like the VMware integration with Celerra & NFS is progressing well.

A few questions if I may...

File level dedupe, compression, full clones, fast clones. Why so many choices?

What guidance does EMC and/or Acadia provide around when to use each technology? I ask the question in this manner as it seems that providing multiple options will inevitably lead to human error.

(this is similar to the question of when to use RAID 10, 6, or 5)

I also have to ask you to address concerns around performance.

EMC (this includes you) has (incorrectly) touted NetApp dedupe as resulting in lowered performance. By this line of logic, should one be scared by the thought of enabling EMC's storage savings technologies?

I ask my question in such a manner not to be difficult, but in order to further my understanding. I come from the NetApp perspective, where we provide block-level dedupe and instant zero-cost cloning. Both of these technologies produce the exact same result: multiple VMs sharing blocks for any 4KB of identical data. This enables use with both SAN & NAS.

Thanks for the forthcoming reply. Again, nice work; while we are competitors it is very cool to see the bar raised around storage integration.

Vaughn

Rainer

Hi Chad,
nice work
could you please fix your link to the demo videos?
looks like the login to ftp.documentum.com isn't correct
thanks
Rainer

Rainer

Here's the working link to the WMV video:
ftp://ftp.documentum.com/vmwarechampion/Demonstration_Tools/Celerra_NFS_Plugin/nfs-plugin-v3.wmv

Chad Sakac

@ Mark - the file-level dedupe occurs high up in the stack, the compression happens at a very low level.

Turning on filesystem-level file-level dedupe and file-level or filesystem-level compression has no impact on snapshots (or any of the other Celerra features).

This simple behavior (and a low performance impact) means that we expect that as people get comfortable with the feature, they will just leave it on.

The trade-off is that it's not a block-level dedupe, which with the VMDK dataset can reclaim even more capacity.

@ Vaughn, thank you, we're working hard to try to make NFS use cases with VMware better for EMC customers, as I know NetApp is with theirs.

Why so many choices? Well - a cloned VM can be copied between filesystems (analogous to a "hardware-accelerated Storage VMotion"). A "Fast Clone" is a snapshot - and must be in the same filesystem.

As you know, the vStorage APIs for Array Integration fast/full copy (an extent-based copy, which can be described as "hardware-accelerated clone/template/Storage VMotion") is block-only.

Let me clarify the comment about "reduced performance" on NetApp dedupe. I don't believe I said that, but I can imagine how it might be interpreted that way.

Let's say you have a 28-spindle aggregate, in which you have a 5TB flexvol. That flexvol is exported via NFS. You mount that as a datastore to a 16-node ESX cluster. You put VMs in there, and push IOs. Let's just say that between the VM stack, network config, FAS processor resources/cache, the PAM, and the flexvol/aggregate config you can push 5000 IOps (not saying that would be the number, but let's just say).

So, based on that, you put in 100 VMs, each 50GB in size, and each doing 50 IOps on average.

Things are going great, the filesystem is nearly full, and you're using all the available IOps.

You dedupe that flexvol (either via CLI on demand or via the "capacity triggers" you folks have added), and come back after the process is done. While the process is running, I'm sure you and I would both agree that there is a performance impact. There is an impact for our compression as well. It runs as a background process, so if the Data Mover was maxed out on CPU and never had time, you wouldn't see a space savings.

Performance goes up, let's say by the 38% you point out in your View post. PAM is more effective, as "effectively" it's larger (storing fewer blocks, as duplicate blocks are eliminated).

You look at the df and see that the filesystem now is mostly free.

Could you reduce the number of backing spindles in the aggregate by 80%? By 50%? Could you put it in a smaller aggregate?

The point is not to say this is bad. The point is that people confuse capacity efficiency and performance efficiency. They are both important factors.

Sometimes you're gated by GB, sometimes by IOps/MBps. If you're gated by IOps/MBps, even if you COULD fit your GB on fewer backing spindles, you MIGHT not be able to remove them - that's all.
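If it helps, here's that back-of-the-envelope math as a quick sketch (a minimal illustration using only the made-up numbers from the example above - 5TB, ~5000 IOps, 50GB and 50 IOps per VM, ~80% of the capacity back and ~38% more IOps after dedupe):

# Back-of-the-envelope sketch of "gated by GB vs. gated by IOps", using only
# the illustrative numbers from the example above (not real benchmarks).

def max_vms(capacity_gb, iops_ceiling, vm_size_gb, vm_iops):
    """How many VMs fit, and which dimension is the gate?"""
    by_capacity = capacity_gb // vm_size_gb
    by_performance = iops_ceiling // vm_iops
    gate = "capacity (GB)" if by_capacity < by_performance else "performance (IOps)"
    return min(by_capacity, by_performance), gate

# Before dedupe: 5TB datastore, ~5000 IOps envelope, 50GB / 50 IOps per VM
print(max_vms(5000, 5000, 50, 50))                        # -> (100, 'performance (IOps)') - both gates hit at once

# After dedupe: say ~80% of the capacity comes back, and performance improves ~38%
print(max_vms(int(5000 / 0.2), int(5000 * 1.38), 50, 50)) # -> (138, 'performance (IOps)')
# Capacity would now allow 500 VMs, but the IOps envelope still caps you around 138.
# The freed GB doesn't automatically translate into more workload (or fewer spindles).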

That applies to EMC and NetApp (heck, to everyone).

Does that make sense?

In the fury of sales campaigns, sometimes (by every vendor) these ideas get blurred (think "hey, I just showed you a demo where I took a 5TB filesystem that was full, and all those VMs are the same, and then we ended up with the capacity of just one VM - so I can save you 10x in your configuration!"). I'm not saying YOU do that, I'm just saying that it happens.

So, if in a given case a block-level dedupe can save more capacity than compression plus file-level dedupe (which will be the case in some scenarios), but you are IOps-bound, then the net configurations would cost the same to the customer.

In other cases, the reverse is true.

It's not just me that sees that. Fallout of that at a customer is what triggered Duncan's post here: http://www.yellow-bricks.com/2009/12/23/iops/

Likewise, I see "blurring" where NetApp implies that RCU functionality is the same on SAN/NAS. It's not. You cannot do a VM-level snapshot on SAN (and neither can we). We both can do LUN-level, VMware-integrated replicas but those replicate the LUN (and the entire VMFS datastore).

@Rainer - thanks for the catch - fixing the link.

Mark Burgess

Hi Chad,

Just to confirm are the following correct:

1. Deduplication and compression of files will not cause any additional data to be saved for a checkpoint
2. Deduplication and compression of files will not cause any additional data to be replicated with Celerra Replicator
3. When a compressed VMDK is replicated the full uncompressed quantity of data is replicated?
4. When the writes are compressed for a VMDK running over NFS this will not cause any additional data to be saved for a checkpoint
5. When the writes are compressed for a VMDK running over NFS this will not cause any additional data to be replicated with Celerra Replicator

A couple of other points as follows:

1. All of these enhancements to Celerra require DART to be upgraded often many times per year - when will the customer be able to perform a simple automated upgrade?

2. The Fast Clone feature appears to be a more cost effective alternative to View Composer - what do you see as the pros and cons of these two features?

3. Are we likely to see a similar plugin for the block protocols on Celerra and CLARiiON?

4. When will the ESX NFS client be updated so that it supports multiple sessions/paths per file system?

Many thanks
Mark

forbsy

Seems like EMC has nicely followed the NetApp lead in terms of vCenter Server vStorage API integration - nice to see. Very similar solutions to NetApp deduplication, file-level FlexClone and the Rapid Cloning Utility 3.0.
Chad, I really didn't understand your example about capacity vs. performance utilization as it relates to dedupe on NetApp. To me, reducing the number of spindles in the aggregate isn't what I'm looking to achieve when I turn on dedupe. Are you saying that if I've filled my aggregate with VMs, and those VMs have used almost all the available I/O, there will be no available I/O left for additional VMs after dedupe? So the fact that I've achieved greater capacity utilization is offset by the fact that there is little to no performance left for more VMs? Just wanted to make sure I understood :)
Does EMC have anything like PAM to increase read performance? I've read how NetApp and EMC see the value of SSD fundamentally differently. NetApp chooses to use "flash as cache" to increase read performance (and eliminate spindles), while EMC seems to want to use SSD as a storage tier. Both seem to have advantages and disadvantages, but it would seem it's up to the customer requirements as to which approach they choose. Thoughts?

Chad Sakac

Okay - taking these one at a time...

@Mark Burgess - here were your questions:

1. Deduplication and compression of files will not cause any additional data to be saved for a checkpoint

Chad - file-level dedupe reduces the amount of data to be saved for a checkpoint. This has a VERY large effect for the general-purpose NAS use case. It has almost NO effect in the VMware use case. Compression can marginally increase the amount of data saved in a filesystem-level checkpoint (aka snapshot). I'm being an engineer about this - in general those increases are small and transient, as they are folded into the file. In general, compression reduces the amount of capacity needed for a checkpoint due to the ~40-60% capacity reduction overall. NET: it generally reduces the capacity needed for local snapshots of the filesystem, though in some cases it's neutral.

2. Deduplication and compression of files will not cause any additional data to be replicated with Celerra Replicator

Chad - ditto to the above.

3. When a compressed VMDK is replicated the full uncompressed quantity of data is replicated?

Chad - will double check this.

4. When the writes are compressed for a VMDK running over NFS this will not cause any additional data to be saved for a checkpoint

Chad - in this case, it's always less.

5. When the writes are compressed for a VMDK running over NFS this will not cause any additional data to be replicated with Celerra Replicator

Chad - in this case, it's always less.

A couple of other points as follows:

1. All of these enhancements to Celerra require DART to be upgraded often many times per year - when will the customer be able to perform a simple automated upgrade?

Chad - no easy answer on this one, but we are working on a much, much easier DART upgrade.

2. The Fast Clone feature appears to be a more cost effective alternative to View Composer - what do you see as the pros and cons of these two features?

Chad - personally, I view Fast Clone as an option where View Composer is not an option (increasingly rare). I think it's actually a better fit for server use cases. Ultimately, the best TCO and most flexibility come from desktop composition and application virtualization being applied wherever/whenever they can be.

3. Are we likely to see a similar plugin for the block protocols on Celerra and CLARiiON?

Chad - VERY soon (and on Symmetrix too!)

4. When will the ESX NFS client be updated so that it supports multiple sessions/paths per file system?

Chad - this one is surprisingly hard. VMware, EMC and others are working furiously on this. Don't expect core NFS VMkernel client changes for some time.

Chad Sakac

Next... @forbsy

Thanks, I'm glad you like it!

I disagree (though you probably expect that) about following NetApp's lead on vCenter integration :-)

I hear you if you limit the comment to THIS plugin. The EMC NFS plugin and the addition of production compression do compete in two areas where NetApp has led the way (use of array technology beyond thin provisioning for capacity efficiency, and VM/file-level snapshots integrated with vCenter in their Rapid Cloning Utility).

In other areas, we have done things first that either they have followed, or not.

In the area of "vCenter Integration":

- the EMC Storage Viewer was GA before NetApp's Virtual Storage Console (they aren't the same, but are roughly analogous, in the same way that this plugin is roughly analogous to RCU)
- EMC has developed vCenter plugins to simplify VMware SRM failback based on customer feedback. While SRM failback will be part of a future SRM release, insofar as I know NetApp still has not followed.
- EMC shipped what is (as far as I know) still the only vStorage API for Multipathing plugin (PowerPath/VE), while also supporting the native multipathing.
- EMC Navisphere and RecoverPoint integrate with the vCenter APIs to provide VM- and ESX-level detail for common storage tasks in the storage admin's context.
- EMC Replication Manager shipped vCenter API integration nearly identical to NetApp's SnapManager for Virtual Infrastructure, but did it 2 months earlier.
- EMC Avamar integrates with both the vStorage APIs for Data Protection (for faster backup via the new CBT function in vSphere 4, and using the VDDK for simple file-level restore from VM-level agentless backup) and the vCenter APIs (for simple views and management of backup tasks).
- RSA enVision integrates with the vCenter APIs (and the ESX CIM API) to provide security event correlation, and RSA SecurID integrates with the VMware View Manager for hardening View configurations.

This is just a SHORT list - the complete list is far, far, FAR longer, and evolving so fast it's ridiculous.

There are, of course, areas where NetApp innovates and we follow. They are an incredibly strong competitor in the mid-range storage part of EMC's business. And of course when we do similar things, each of our approaches has its pros and cons, usually applying in different use cases and in different areas.

Even in the platform features themselves, no vendor has a monopoly on innovation. Platform features can have an "affinity" (this is my opinion, like most things on the blog!) for virtualization (dedupe, compression, "scale out", automated tiering as examples) without being "integrated with" VMware. This means that the inherent behaviour of the platform has benefit in the VMware use case (this isn't "less valuable", but it's technically not "integration").

I'll say this - we have NO "not invented here" syndrome that I can see, at least.

EMC has led the way in many technological innovations in storage that have become mainstream. Examples include: non-disruptive midrange upgrades; the ability to completely change a storage object's underlying construction non-disruptively; flash as a non-volatile storage tier; production storage compression amongst mainstream vendors; non-disruptive IO interface upgrades; use of commodity hardware in enterprise-class arrays while maintaining a global cache model; and so on.

But in other areas, we see innovations delivered by others that we agree with (and customers demand), and in those cases, we work to add value in similar ways.

- fully automated sub-LUN, sub-file tiering - Compellent (EMC's upcoming FASTv2)
- dense storage configurations - Copan (EMC now shipping dense configurations)
- single midrange platforms that support NAS and block protocols - NetApp (EMC Unified - our fastest growing core storage product)
- thin provisioning - 3PAR (Virtual Provisioning is now universal, and our de facto default - but not only - provisioning model)

and so on... In some cases the ideas have become widely used, in others, EMC has added that capability. In some of them, our implementations are arguably better, in others not as good. Where they are not as good, we will work like mad until they are as good or better.

In the end, it's up to the customer to decide, of course.

Ok - on to your next question. You hit the nail on the head. If you can increase the amount of free space in the flexvol by 80%, but are at or near the "carrying capacity" for MBps/IOps, then the savings is not 80%. Now, deduplication and the application of PAM can help performance, I'm not implying that

I'm trying to ensure that EMC doesn't do the same thing.

- If you are currently at 80% utilized from a capacity standpoint and 60% from a performance standpoint, and you turn on thin provisioning, most customers gain about 40-50% of their capacity back.
- Ok, great, you're at 40% utilized from a capacity standpoint, and 60% from a performance standpoint.
- Then you turn on compression, and gain another 40% capacity efficiency.
- Great, you're now at 26% capacity utilized, and 60% performance utilized.
- Then you apply file-level dedupe for your unstructured NAS, which is half of your storage use, and that adds 50% efficiency against that dataset.
- You're now at 13% capacity utilized and 60% performance utilized.

For us to say: "we just gave you back 77% of your capacity - for FREE!" would be accurate, but misleading, as it implies you can put on 77% more - but the performance utilization stayed the same. It's not like more IOps magically appeared. It's not like you could REMOVE 77% of the disks (because they are supporting the performance envelope).

NOTE: this is an engineer's world-view. The reality is that each of the capacity reduction techniques has a little bit of an impact on performance, sometimes positive, sometimes negative. But on the whole, they are only lightly coupled.
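
Laid out as a quick sketch (the percentages are just the illustrative ones from the walk-through above, not measurements):

# Capacity vs. performance utilization as the capacity-efficiency features stack up.
# The figures are the illustrative ones from the walk-through above.
steps = [
    ("baseline",                       0.80, 0.60),
    ("+ thin provisioning",            0.40, 0.60),
    ("+ compression",                  0.26, 0.60),
    ("+ file-level dedupe (NAS half)", 0.13, 0.60),
]
for name, cap_util, perf_util in steps:
    print(f"{name:<32} capacity {cap_util:>4.0%}   performance {perf_util:>4.0%}")
# The capacity column keeps dropping; the performance column doesn't move. That's
# why "we gave you back X% of your capacity" doesn't mean you can pull X% of the
# spindles out - they are still carrying the IOps.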

Let me try expressing this in another way:

Point 1: Efficiency has 3 dimensions.
- capacity efficiency (measured in $/GB or watts per, or square feet per)
- performance efficiency (measured in $/IOps/MBps/ms or watts per, or square feet per)
- a flexibility efficiency (measured in the $ impact or disruptive effect of change) - this third one people don't think about, but it's a big deal. If you're terrified of "getting it wrong", you "stick a fluff factor in there, just in case". Those "fluff factors" add up. Plus, since you're terrified of "underconfiguring", you just start with more.

Point 2:
- features tend to affect one dimension and not the others, or mostly one but not the others. Ergo, use of SATA is a capacity-efficiency technology, and use of EFD is a performance-efficiency technology. Dedupe/compression/thin provisioning can all have some performance impact (positive or negative), but they are primarily capacity-efficiency techniques (a quick numeric sketch of these first two measures follows Point 4).

Point 3:
- some workloads are much more dominated by one of the dimensions, some are affected by all three. Ergo, B2D is hugely affected by capacity efficiency - so long as the ingest rate means the backup gets done in the window, and you can hit your recovery time objective - you then care most about effective $/GB. In that use case, capacity-efficiency is critical.
- a ton of very light IO VMs are dominated by the capacity-efficiency angle.
- a performance centric VM is dominated by the performance-efficiency angle.
- workloads that have a regular shift between types, or that are very variable, benefit most from the flexibility and automation angles.

Point 4 - Lastly - most customers have ALL KINDS OF WORKLOADS ALL AT THE SAME TIME. In other words, it's not about one of the three axes, or any ONE given feature, but rather how they all play together.
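
To put rough numbers on the first two dimensions, here's a tiny sketch - the drive prices, capacities and IOps figures below are invented, purely illustrative assumptions, not real product numbers:

# Compare two hypothetical shelf configurations on the first two efficiency
# dimensions from Point 1. All prices/capacities/IOps are invented assumptions.

def efficiency(drive_count, cost_per_drive, gb_per_drive, iops_per_drive):
    cost = drive_count * cost_per_drive
    capacity_gb = drive_count * gb_per_drive
    iops = drive_count * iops_per_drive
    return round(cost / capacity_gb, 2), round(cost / iops, 2)   # ($/GB, $/IOps)

print("SATA-heavy:", efficiency(15, 300, 1000, 80))     # great $/GB, poor $/IOps
print("EFD-heavy: ", efficiency(15, 2500, 200, 5000))   # poor $/GB, great $/IOps
# Capacity-efficiency features (thin, dedupe, compression) mostly move the $/GB
# number; EFD and cache mostly move the $/IOps number. A workload gated by one
# dimension sees little benefit from improving only the other.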

So - in the VDI use case, increasing capacity efficiency and increasing performance efficiency - together - is the goal, both for NetApp and EMC (and everyone). Arguing about the merits of one absent the other is fruitless (not claiming that anyone specifically does that). A practical example: if EMC and NetApp (coupled with VMware technologies) can increase capacity efficiency by, say, 80%, but performance efficiency by 20%, it's incorrect to wave the "80% savings!" flag.

Likewise, I really hope we point out that it's "20% **CAPEX** savings". The bulk of the opex savings in the desktop use case comes from re-evaluating how new technologies around desktop composition and app virtualization can improve the operational model of clients, and how security technologies can make virtualized clients more secure than laptops/desktops.

Regarding your last question - Does EMC have something like PAM? Well, let me leave that one at this:

- EMC has very large caches today, up to 1TB and more in fact. Those large caches aren't inexpensive, and in our case are currently in the enterprise arrays. The EMC midrange arrays have smaller caches - up to 64GB of read/write cache. I think we can improve here - and I personally think the idea of large Flash as volatile cache has a place.
- EMC has consistently stated that we see Flash having a huge impact across the IT landscape, with no "one answer" being right. It will appear in the server, in the network, in the array as volatile cache, and as a non-volatile tier.
- Remember, we have NO "not invented here" syndrome as discussed above.

Stay tuned.

Personally, if I were a customer, I would prefer a technology partner with the engineering resources to be an innovative leader, but ALSO willing to accept that they don't have exclusive rights to innovation :-) They should also be a fast follower where others do something cool and the idea is good - not above looking to see how they could leverage it, and perhaps improve on it.

comment system

It's a great day to be a happy EMC user - getting this free plug-in and improving an already great system. Keep the integration going! Thank you for the post!



Disclaimer

  • The opinions expressed here are my personal opinions. Content published here is not read or approved in advance by Dell Technologies and does not necessarily reflect the views and opinions of Dell Technologies or any part of Dell Technologies. This is my blog; it is not a Dell Technologies blog.