
February 25, 2010

Comments


Stu

Tweaking and upgrading our Clariion array got our clone times from 40 minutes down to 2 minutes. Our engineers almost bought me a cake.

Primary tweaks: using a striped metaLUN to give our datastores more spindles (that got us to 20 minutes), and upgrading our SP to enable write caching (that brought it down to 2 ;)

Matt Meyer

I have been battling this very issue since upgrading to ESX 4.0. My SAN is not exactly on the HCL, but with ESX 3.5, I was able to push cloning operations to 120 MB/s. With ESX 4.0, they run at about 5 MB/s. This sheds a lot of light on why that happens, so a big thank you for that. Now, can you think of any way to tweak the Data Mover in ESX 4.0?

Ron Singler

So I was just at this customer the past two days installing their new Hitachi USP-V. We were walking through things during some knowledge transfer (KT), and they mentioned that their new litmus test for storage performance was a VMware clone operation, because of performance issues they'd found on their V-Max boxes. They said their Clariion boxes were way faster than the V-Max at this operation, and we put a little wager on whether the USP-V would be faster than the Clariion.

I took that bet and ended up winning. ;) The clone operation finishes in a consistent 10 minutes or less. The resulting time is what the customer called "untuned": we didn't touch the VMware I/O size and left everything at the defaults. I'm sure we could improve things further if we aligned to the 512KB stripe boundary on the USP-V.
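For anyone who wants to reproduce this kind of clone-time litmus test, here's a minimal sketch using pyVmomi (the vSphere Python SDK). The vCenter address, credentials, source VM, and datastore names below are placeholders you'd swap for your own; treat it as an illustration, not a tuned benchmark harness.

# Minimal clone-timing sketch using pyVmomi.
# Hostnames, credentials, and object names are placeholders.
import ssl
import time

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

context = ssl._create_unverified_context()  # lab use only; skips cert checks
si = SmartConnect(host="vcenter.example.com", user="administrator",
                  pwd="password", sslContext=context)
content = si.RetrieveContent()

def find_by_name(vimtype, name):
    # Walk the inventory and return the first managed object with this name.
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vimtype], True)
    try:
        return next(obj for obj in view.view if obj.name == name)
    finally:
        view.DestroyView()

src_vm = find_by_name(vim.VirtualMachine, "gold-image")        # VM to clone
datastore = find_by_name(vim.Datastore, "test-datastore-01")   # clone target

clone_spec = vim.vm.CloneSpec(
    location=vim.vm.RelocateSpec(datastore=datastore),
    powerOn=False, template=False)

start = time.time()
task = src_vm.Clone(folder=src_vm.parent, name="clone-timing-test",
                    spec=clone_spec)
while task.info.state not in (vim.TaskInfo.State.success,
                              vim.TaskInfo.State.error):
    time.sleep(5)

print("Clone finished in %.1f minutes (final state: %s)"
      % ((time.time() - start) / 60.0, task.info.state))
Disconnect(si)

Run it once against an idle datastore and once under normal load and you get a simple, repeatable number to compare arrays with.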

I just thought it was funny that you posted this today, because I was getting ready to write something up on it in the next couple of days. Thanks for saving me the time! Great article!

Mihai

No matter how you try to spin it, the EMC V-Max should be smart enough to recognize a sequential stream and optimize for it; it's not exactly new technology :).

And you even call it V-MAX, the best virtualization storage...

I wonder how it performs with an MS SQL Server doing a full table scan for a big report query...

kraigk

Great thread. I think I have this problem with vSphere and a CX3-10c. My vRanger performance when doing iSCSI backups from the CX3 is terrible: half as fast as a non-iSCSI backup.

Conversely, I have an EqualLogic PS6000XV and an MD3000i in the same fabric and they don't have performance issues. In fact, their non-iSCSI backups are half as fast as iSCSI.

Any ideas?

Andrew Fidel

I actually have the opposite problem on a midrange virtualized array: vSphere is able to push enough IOPS during a clone that it actually drives up response time on the volume. Can't wait for I/O DRS =)

Toudin

Has anyone been able to identify the reason for this behavior?

John Martin

Interesting post. Overall I'd agree with your categorisation of Mid-Range vs Enterprise, though in the past I'd always thought that the DMX was better suited to sequential workloads and single-threaded large I/Os, due to its mainframe heritage and the associated tendency towards batch processing, and that mid-range boxes typically had the advantage when it came to cache-hostile random I/O.

I based that assumption on a casual conversation almost a decade ago, where someone said that Exchange 2000 workloads were fairly toxic to early DMX implementations. I brought up that conversation with an EMC engineer, who said that a revision of Enginuity addressed this. Even though this is based mostly on hearsay, this cloning situation looks like it may be another example of a mismatch between the array's design center and the particular workload.

I wonder whether, if this test had been done on an old-school, hand-tuned DMX, the result would have been the same, as I suspect the V-Max design center is more highly optimised for small-block random I/O than the previous Sym.

For single-threaded workloads, Little's Law determines overall throughput, and the additional overheads associated with a scale-out architecture (even when measured in microseconds) can really drop sequential throughput, but I'm surprised that the impact is as large as it was.
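To put rough numbers on that (purely illustrative figures, not measurements from this customer's arrays): with a single outstanding I/O, Little's Law reduces to throughput = I/O size / round-trip latency, so even a couple of hundred microseconds of extra front-end latency takes a visible bite out of a single-threaded copy stream.

# Illustrative Little's Law arithmetic for a single-threaded copy stream.
# All numbers below are assumptions for the example, not measured values.
io_size_kb = 64             # assumed size of each copy I/O
base_latency_ms = 0.5       # assumed round-trip service time per I/O
overhead_ms = 0.2           # assumed extra latency from a scale-out front end

def throughput_mb_s(latency_ms, outstanding=1):
    # Little's Law: throughput = (outstanding I/Os * I/O size) / latency
    return (outstanding * io_size_kb / 1024.0) / (latency_ms / 1000.0)

print("1 outstanding I/O, no overhead:    %.0f MB/s" % throughput_mb_s(base_latency_ms))
print("1 outstanding I/O, with overhead:  %.0f MB/s" % throughput_mb_s(base_latency_ms + overhead_ms))
# More outstanding I/Os hide the per-I/O latency (until the link or array saturates):
print("32 outstanding I/Os, with overhead: %.0f MB/s" % throughput_mb_s(base_latency_ms + overhead_ms, outstanding=32))

That last figure is of course capped by link and array limits in practice; the point is just how much a single-threaded stream is at the mercy of per-I/O latency.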

As a matter of interest, did the customer try using PowerPath? Your mention that using Round Robin would have improved things dramatically implies that they didn't, but if they'd gone to the expense of installing a V-Max, why not complete the picture?

Chad Sakac

John - thanks for the comment. The customer was using VI3.5, so use of Round Robin or PP/VE was out of the question, unfortunately. In the testing we completed, we did show that going to vSphere 4 and using Round Robin or PP/VE ultimately drove much higher bandwidth.
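For the native multipathing case in vSphere 4 (PP/VE claims devices through its own rules, so this doesn't apply there), a rough sketch of switching a device to Round Robin through the vSphere API with pyVmomi might look like the following; the vCenter, host name, and naa device ID are placeholders, and many people would simply do the same thing with esxcli on the host instead.

# Hedged sketch: set the native Round Robin PSP (VMW_PSP_RR) on one device
# via the vSphere API with pyVmomi. Names and the naa ID are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

context = ssl._create_unverified_context()   # lab use only
si = SmartConnect(host="vcenter.example.com", user="administrator",
                  pwd="password", sslContext=context)
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True)
host = next(h for h in view.view if h.name == "esx01.example.com")
view.DestroyView()

storage = host.configManager.storageSystem
device_info = storage.storageDeviceInfo

target_naa = "naa.600009700001926012345330303039"   # placeholder device ID
lun_names = {lun.key: lun.canonicalName for lun in device_info.scsiLun}

for mp_lun in device_info.multipathInfo.lun:
    if lun_names.get(mp_lun.lun) == target_naa:
        storage.SetMultipathLunPolicy(
            lunId=mp_lun.id,
            policy=vim.host.MultipathInfo.LogicalUnitPolicy(policy="VMW_PSP_RR"))
        print("Set VMW_PSP_RR on", target_naa)
        break

Disconnect(si)

Usual caveat: test against your own paths and array before rolling anything like this out broadly.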

The comments to this entry are closed.


Disclaimer

  • The opinions expressed here are my personal opinions. Content published here is not read or approved in advance by Dell Technologies and does not necessarily reflect the views and opinions of Dell Technologies or any part of Dell Technologies. This is my blog; it is not a Dell Technologies blog.