
August 31, 2010

Comments


Mark Burgess

Hi Chad,

I have reviewed the documents above and have a few questions I hope you can help me with:

1. Where is the link to the EMC VDI sizing tool (I am hoping this will allow us to accurately size each tier using FAST Cache, EFDs, FC and SATA disks)?

2. The Windows 7 reference architecture seems a little over the top (i.e. 12GB of linked clone capacity per desktop, plus 15 x 450GB FC and 7 x 1TB SATA drives) - what are the algorithms for sizing each component?

3. How would you scale the Windows 7 reference architecture to support 1,000 or 2,000 users?

4. At what point does the 100GB FAST Cache on the NS-120 become a limiting factor?

5. The disk sizing suggests a peak of 875 IOPS - how is this possible from a desktop that, in the physical world, would have a single SATA drive supporting about 50 IOPS?

6. The disk sizing calculations do not take into account the write penalty for RAID 1/0 - is this a mistake?

Many thanks
Mark

Dustan Terlson

Do you have a link to the PowerPoint slides for the charts? I'm very interested in the high-resolution versions of those.

Thanks!

Tomi Hakala

Chad, with VAAI-assisted locking, is there any reason not to use only maximum-size 2TB (minus 512 bytes) datastores? Can I just put as many VMDKs per datastore as will fit and not worry about locking?

Mark Burgess

Hi Chad - a couple more comments on the 500 user Windows 7 reference architecture:

1. Why was it done using FC - is this not overkill for 500 users?
2. What sizing has been done around host bandwidth - at what point are you likely to need to move beyond GbE?
3. What about the performance of iSCSI vs. NFS - which one is better for View?
4. Am I right in thinking that VAAI hardware-assisted locking is not required for NFS, as it has always had equivalent functionality?

Many thanks
Mark

Tami Booth

I too would like the slides - the link isn't live. Additionally, I'm trying to come up with a list of questions to best help architect the solution. Maybe the VDI tool referenced would help, but I can't find it either...

Thanks
Tami Booth

Chad Sakac

@Tami - sorry about that. I was underwater with VMworld stuff and posted knowing that I would go back and link the high-res material and PPTs - it should all be there now.

The tool we use right now is internal only. Pushing like mad to get it posted externally. If you're an EMCer or EMC Partner, please ping your local vSpecialist. Please bear with us.

Chad Sakac

@Mark - thanks for your questions!

1) FC is still the dominant protocol used in vSphere deployments (though iSCSI is the fastest growing, followed by NFS). As Vaughn and I covered in TA8133 @ VMworld, the real answer on protocol is "leverage what you've got, leverage what you know, and if you're deploying greenfield, strongly consider 10GbE converged".

2) ESX host-storage bandwidth is really not the bottleneck in the VDI use case (either View or Xen on ESX) in the VAST majority of cases. In almost all cases, you are IOps constrained. The majority of client virtualization IOs tend to be small (4-64K), unlike, for example, a backup or a guest doing data warehousing (which tend to be in the 256K+ IO size range). If you do some quick math, assuming 20 IOps per user and an 8K average IO size, each user will be driving about 160KBps. That means that about 500 users will saturate a 1GbE link, assuming 80MBps unidirectionally (100% read or 100% write).
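For the curious, here's that arithmetic as a quick Python sketch - the inputs are just the assumptions above (20 IOps/user, 8K IOs, ~80MBps usable on 1GbE), nothing array-specific:

```python
# Back-of-envelope check of the host-bandwidth math above.
# Assumptions (from the comment): 20 IOps per user, 8 KB average IO size,
# and ~80 MBps of usable unidirectional throughput on a 1GbE link.

IOPS_PER_USER = 20
AVG_IO_BYTES = 8 * 1024                   # 8 KB average IO size
LINK_BYTES_PER_SEC = 80 * 1024 * 1024     # ~80 MBps usable on 1GbE

per_user_bps = IOPS_PER_USER * AVG_IO_BYTES        # ~160 KBps per user
users_per_link = LINK_BYTES_PER_SEC // per_user_bps

print(f"Per-user throughput: {per_user_bps / 1024:.0f} KBps")
print(f"Users to saturate 1GbE: {users_per_link}")   # ~512, i.e. "about 500"
```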

3) iSCSI vs. NFS - the battle is REALLY over. Pick what works best for you. Historically there was the question of locking - which, while blown out of proportion in most cases, does matter in use cases which drive periods of very busy metadata updates. As you note in question 4, VAAI makes this behavior similar across VMFS (all block protocols) and NFS.

NFS bigots say "VAAI does nothing but make VMFS catch up to what NFS has always had!". I say that's partially correct, but of course, NFS needs to catch up on more robust path scaling (NFS v4, v4.1, and pNFS support will bring this) and more robust failover behavior. It's a silly exercise to argue protocols when there are much, much larger and more important design decisions.

The question of protocol is RARELY the thing that makes client virtualization projects succeed or fail, rather it's the end-to-end system design, and finding the right use cases.

That said, we do have an analogous doc we're finishing which is an all-NFS design, so customers can deploy what works best for them - as, of course, we support both.

Robert Kadish

One question - what LWL report did you run to get your IOPS?

Robert Kadish

Chad,

I recommend running the average peak IOPS report in LWL UX. I think it will change the sizing model.

Robert Kadish

If you are wondering why I'm asking which report, read through this Citrix blog, including the comments. It highlights the dangers of sizing based on average IOPS.

http://community.citrix.com/display/ocb/2010/08/06/Saving+IOPS+with+Provisioning+Services

Robert Kadish

Your numbers are very confusing. You say this is for XP, yet the average IOPS says Windows 7? Since most companies are moving to Windows 7 and using VDI as a tool to move from XP to 7, why would you test XP?

Looking at the NS-120 array layout, you say that it can handle 2,250 desktops. Is that Windows 7 or XP?

You mentioned the array handled 13,000 IOPS at peak when you were concurrently booting 500 desktops within 30 minutes. How many desktops did you boot per minute?

If this is a configuration to handle 2,250 non-persistent desktops, wouldn't you need to test a much greater number of desktops booting? The reason I say this is that the user configuration you're testing suggests a call center, where all users would be booting at almost the same time - probably 3 or 4 times during a 24-hour period (shift changes). So the number should be more like 1,500 users booting within a 15-minute period.

I know you mention this is not a marketing document, but tools such as Login VSI are not indicative of the real world. For example, what is the effect of a cached IE session vs. an IE session which is not cached? Not to mention that using VMware's new Data Disk could have a huge impact on IOPS.

Also, the concern companies have when looking at average IOPS is that they have no way to control how often their users log in and out, reboot, and run resource-intensive applications.

I mention this because corporations are using reports like this one to size their Windows 7 environments, and those assumptions will most likely lead to a lot of pain.

Chad Sakac

@Robert - thanks for the questions, and sorry for the delay in approving your comments. The deluge of spam made me kick in comment approval (just to keep the filth off), but that means I need to be more diligent in rapidly approving anything from a human.

Like I noted (in big bold italics) - MARCHITECTURE WARNING! Using average IOPs is indeed dangerous. The purpose of the XP document was to see "how low can we go?". The use case is real, but VERY narrow: non-persistent, kiosk-type use cases (called out throughout the doc). In those use cases, XP is still very much used, and the client IOps profile is very different from yours or mine.

If you look at the second document, it's much more around the design center you're describing: 500 users on a similar config (representing, therefore, a higher cost/client), Windows 7. There, we used the 95th percentile IOps, which was 37 IOps. The efficiency technologies still applied, and the cost was 60% lower than it would be otherwise, but it's still in the $100-$120/client range as opposed to $38/client for the light kiosk-style worker.
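To make the average-vs-percentile point concrete, here's a toy Python sketch. The trace is synthetic (a bursty lognormal distribution), not our measured data - it just shows why sizing on the mean under-provisions the bursts that the 95th percentile catches:

```python
# Illustrative only: average vs. 95th percentile on a synthetic IOPS trace.
# The distribution parameters are arbitrary, chosen to look "desktop-bursty";
# they are NOT measurements from the reference architecture.
import numpy as np

rng = np.random.default_rng(42)
per_user_iops = rng.lognormal(mean=2.0, sigma=0.9, size=10_000)  # hypothetical samples

avg = per_user_iops.mean()
p95 = np.percentile(per_user_iops, 95)

print(f"Average IOPS/user:  {avg:.1f}")   # sizing to this misses the bursts
print(f"95th pct IOPS/user: {p95:.1f}")   # the Win7 doc sized at 37 IOps here

users = 500
print(f"Array-level 95th pct demand for {users} users: {users * p95:,.0f} IOPS")
```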

If you look through the post and the docs again, I think you'll see I tried to be VERY explicit about the different user workloads.

I will say that the mass boot isn't the problem it used to be. With the large caches you can get today at low cost-points on EMC and NetApp storage models, as soon as the first desktop boots, the remainder are largely handled by cache. The trickier periods are the patch/AV periods in the client. In those cases the caches (EMC FAST Cache as an example) still help, but less so. Patch impact can usually be mitigated through app virtualization but not eliminated, and AV can't practically be eliminated via any method, though NAS-based AV can mitigate a lot of the scan against user content - but only if you're able to give up check-in/out.
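A toy model of why that's so - linked clones read mostly the same replica blocks at boot, so the cache absorbs most of the storm, while patch/AV touches per-desktop unique blocks and caches far worse. Every number here is an assumption for illustration, not a measurement:

```python
# Toy model: backend IOPS = desktops * per-desktop burst * cache miss rate.
# All inputs below are assumed values, purely for illustration.

DESKTOPS = 500
BURST_READ_IOPS = 200     # assumed per-desktop read burst
BOOT_CACHE_HIT = 0.95     # assumed: replica blocks are cache-warm after first boot
PATCH_CACHE_HIT = 0.30    # assumed: patch/AV hits per-desktop unique blocks

boot_backend = DESKTOPS * BURST_READ_IOPS * (1 - BOOT_CACHE_HIT)
patch_backend = DESKTOPS * BURST_READ_IOPS * (1 - PATCH_CACHE_HIT)

print(f"Backend IOPS during boot storm: {boot_backend:,.0f}")
print(f"Backend IOPS during patch/AV:   {patch_backend:,.0f}")  # an order of magnitude worse
```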

Ok - now onto load-generation tools...

I reached out to one of the primary folks who worked on the tests, and here's his commentary:

"
1) RAWC – developed in house by VMware - not really used too widely
2) LoginVSI – developed by LoginConsultants.com – this benchmark is very CPU intensive, but not disk resource intensive, so it makes for a great server benchmark, but not so great IO generator
3) Scapa VDI Benchmark – I have no hands on experience with it, but Cisco is using it for the View 4.5 RA they are doing
4) View Planner – Similar to RAWC, but supposedly less resource intensive to set up than RAWC (based on virtual appliance). (Chad's note: my comment is that this isn't out yet).

The View 4.5 PSG used LoginVSI for the load generation. All previous RAs used RAWC. "

Long and short, creating these sorts of workloads right now is REALLY HARD, and none of the tools are perfect.

I would agree with your feedback if the sizing guideline was based purely around the XP document and we said "apply this broadly!"

I disagree if you look at both docs, and for the use case you describe (Win7, non-kiosk use) leverage the second one, which is designed for that purpose. Used that way, they won't lead to "a lot of pain", but will (IMO) reduce it - as they are more explicit (most people don't even think about IOPS until the pain hits).

Thanks for the feedback!

Robert Kadish

I'll look - which doc are you calling the second one?


