Mucho going on in View (and more generally VDI) land. Part I was posted here.
If you’re interested in a quick catch-up, read on…
View 4.5 beta
The existence of this has been discussed by others (here, and here) – I will neither confirm nor deny. What I can say is that the ongoing march of improved simplicity, scale, and function in the hosted virtual desktop use case is well underway, and that every day, more and more customers are starting to embrace it.
I’m part of the internal EMC View 4 pilot rollout. For me personally, Windows 7 and check-in/check-out are a huge deal (and neither is supported in View 4). Then again, I wouldn’t describe myself as the ideal target user (at least right now) for Client Virtualization – I’m an extremely mobile laptop user who is also constantly rebuilding my own environment (I’m running the Office 2010 beta right now, as an example).
That said – expanding the use cases out to these types of users is very important.
I do use it all the time – but not as my primary machine. How do I use it? The vSpecialist team uses View as the front-end to much of our demo lab gear.
If you want to hear about our own experiences with View 4 in our internal rollout – there was a recent EMC IT webcast that was recorded. You can see that here.
VMware View Launch Tour
The VMware View/EMC Launch tour is coming to a city near you… I should have posted this earlier, as there were dates I missed, but there are several left in Canada and the US. You can register here.
The “who’s better for View” vendor histrionics continue…
At Partner Exchange, my colleagues from NetApp claimed that “9 of 10 customers using VDI use NetApp”. As you can imagine, that caused some eyebrows to be raised. Vaughn reiterated that claim recently at Mike Laverick’s blog/”chinwag” here… (35 minutes in), as well as making some comments about me (calling me “Rupert Murdoch” for hiring good folks, and EMC the “Evil Galactic Empire”, about 10 minutes in :-)
BTW, correcting one comment in the recording – as EMC, I can’t hire from partners who are focused and committed EMC partners. But I will also say this – people are what make the world go round, and I’m certainly OK with taking top-notch people from competitors and competitive partners!
Vaughn is a good guy, and you can hear that in Mike’s podcast (which I would recommend listening to). He and I are both passionate about virtualization, and love our gigs. I personally agree with almost everything he said, particularly around “stack”… and disagree vehemently with the characterizations of me and EMC, the View market share claim, and the implication that the VCE Coalition (integrated tech, selling, support, and joint venture) and the partnership between Cisco and NetApp are equivalent. But then again, you’d probably expect that :-).
First things first… there is no data (none) to support the NetApp claim. I was going to let it go and not respond publicly, but it keeps getting brought up, so…
While I think NetApp has great technology, and a strong View go-to-market, this claim didn’t sound right to me, so I pinged the View product team. Their comments:
"based on our VDI run rate and overall VDI market size, also supported by what we are hearing from our software partners, we don't see how this can possibly be true."
On a similar thread, I also ran into several customers in Southern California who were told by NetApp that “VMware runs their internal production View deployment on NetApp”. This is also not correct. If anyone would like to talk to the VMware IT folks about what they deploy internally, I’m happy to arrange that discussion.
NetApp has solid solutions and is a fine company – I don’t think they need to do these things. I suppose in some way it’s effective, as it forces me/us to spend time correcting things.
IO scaling in the VDI use case
More on this front… The question of IO scaling is very real, particularly at larger scales (thousands of clients), and came up at the two biggest financial customers in NYC whom I visited last week.
To recap: "while the discussion of VDI storage costs tends to focus on $/GB (as people who are not knowledgeable about storage generally think in “GB”), the cost of a configuration that supports the IO load through the lifecycle of the client community is often governed by many factors – including BOTH capacity and performance, as well as functional use cases”.
Vaughn’s posted on his blog here some comments about the VMware Express van (the van itself is very cool – I would highly recommend checking it out if you can). I think it was interesting to see the comments in response to the thread.
Personally, I was blown away by the claim Vaughn initially made about “12U, two disk shelves, 5000 users”.
So, just like any extreme claim (like the incorrect statements in the section above), I did some digging. I happen to know the fellow on the VMware GETO team he was talking to, and I asked him. His comment: “we currently have enough *compute* capacity to run around 5000 desktops”. He also took umbrage at being taken so out of context, but I won’t post the rest of his comments as they weren’t nice :-)
I also personally followed up with Vaughn – as a statement implying that 28 disks could support 5000 VDI users seemed… “off”.
To understand why it seemed “off”, here’s some quick back-of-napkin math: assuming the users drive 10 IOps on average, with a 50/50 read/write mix, that’s 25,000 random read and 25,000 random write IOs per second coming into the storage array. Assuming no magic (and there is some magic) between the host and the back-end disks (the most conservative assumption), each of those disk drives would have to do 1,785 random IOps :-) That’s about 10x what you would expect them to do. Array magic can make the backend do more than it could with no magic – but not that much more.
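If you want to sanity-check this kind of claim yourself, here’s a minimal back-of-napkin sketch of that math in Python – the user count, IOps/user, read/write mix, and spindle count are just the assumptions from the paragraph above, so plug in your own numbers.

```python
# Back-of-napkin VDI IO sizing - no "array magic" assumed.
# All inputs are illustrative assumptions from the discussion above.

def per_spindle_iops(users, iops_per_user, spindles, read_fraction=0.5):
    """Return (total, read, write, per-spindle) front-end IOps,
    assuming the backend spindles see the host IO 1:1 (no cache benefit)."""
    total = users * iops_per_user
    reads = total * read_fraction
    writes = total - reads
    return total, reads, writes, total / spindles

total, reads, writes, per_disk = per_spindle_iops(users=5000, iops_per_user=10, spindles=28)
print(f"Total: {total:,.0f} IOps ({reads:,.0f} read / {writes:,.0f} write)")
print(f"Per spindle: {per_disk:,.0f} IOps vs ~180 for a 15K drive")
# -> ~1,786 IOps per spindle, roughly 10x what a 15K RPM drive can deliver
```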
So, let’s talk about array magic.
- Caching techniques… To understand cache better (in a generic “across vendors” way) I would highly recommend this post, which is correctly titled “Storage Caching 101”. These get positioned furiously as magic that solves everything.
- Read cache can help, but only during some parts of the desktop’s life-cycle – in the View case, the biggest win is in reading the common blocks of the boot disks. Vaughn points out in his post a partner’s data (here) that shows a ~38% improvement in the bootstorm case based on dedupe and PAM. Conversely, read cache helps far less for things that are more random (remember that a read cache doesn’t help on the first read, only on the second). So – read cache’s impact on patching is far less, and reading the more random elements (apps, OSTs) benefits far less as well.
- Write cache can help in the VDI use case, particularly in absorbing periodic bursts.
- Greater cache efficiency comes from a) more efficient caching algorithms which squeeze more utility out of the same amount of cache; b) deduplication/compression of blocks in the array, which means you fit more of them into the read cache; c) deduplication of the files above the array (aka composition techniques), which means the base replica only points to a small number of blocks, so they fit into the read cache – i.e. View Composer or XenDesktop means the read cache on most current midrange arrays can contain the bulk of the boot data; d) application of very large, low-cost cache models.
- Write caching means you can coalesce and restructure IO patterns… When the host write I/O is decoupled from the array actually writing the IO, a bunch of optimizations can happen.
- Restructuring IO patterns: the amount of “randomness” can be reduced if you accept that the back-end structure and the front-end structure don’t match. The idea of “Locality of Reference” (LoR) is slowly dying out – displaced by thin-provisioning schemes, deduplication, and journaling mechanisms (which transmogrify the IO pattern to try to maximize sequential magnetic media performance). Are there still cases where having the host “view” of blocks and the array’s “actual location” be related helps? Sure, but it’s a declining set, and the disadvantages of ditching LoR (performance losses in narrow use cases that are sequential-read dominated) tend to be overruled by the advantages (large cost savings). It’s handy if you can do both, but that usually comes at the cost of more operational complexity.
- Coalescing: as write IOs come down from the host, the write cache means they can be “batched” from smaller (think 4-8K) host IOs into larger (think 64K-1MB) backend IOs, reducing the backend IO load.
All arrays have technologies that apply in varying cases to the above.
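To make the effect of that “array magic” a bit more concrete, here’s a hedged sketch that extends the earlier napkin math with two knobs – a read-cache hit rate and a write-coalescing factor. The specific values are purely illustrative assumptions, not measurements of any vendor’s array.

```python
# Illustrative model of "array magic": read-cache hits remove backend reads,
# write coalescing batches small host writes into fewer, larger backend writes.
# The hit rate and coalescing factor below are made-up assumptions for illustration.

def backend_iops(front_end_reads, front_end_writes,
                 read_cache_hit_rate=0.5,   # fraction of reads served from cache
                 write_coalesce_factor=4):  # e.g. 4x 8K host writes -> 1x 32K backend write
    backend_reads = front_end_reads * (1 - read_cache_hit_rate)
    backend_writes = front_end_writes / write_coalesce_factor
    return backend_reads + backend_writes

front_reads, front_writes = 25_000, 25_000          # from the 5,000-user example above
backend = backend_iops(front_reads, front_writes)
print(f"Backend IOps: {backend:,.0f} -> {backend / 28:,.0f} per spindle on 28 disks")
# Even with a 50% read hit rate and 4:1 coalescing, that's ~670 IOps per 15K spindle -
# still far beyond what the drives can do, which is why the "28 disks, 5000 users"
# claim didn't pass the smell test.
```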
After the outreach about the number of disks, Vaughn asked some of his technical folks and they ran it through the NetApp sizing tool for VDI. His correction on the post (thank you, Vaughn – and BTW, there have been times where Vaughn has corrected me – like when he found an error in our VM alignment docs that got fixed) is that for 4 IOps/user and a 50/50 read/write mix, the configuration would need 56 spindles, not 24. I’m assuming these were 15K drives.
I will note that 4 IOps per user is much lower than I’m seeing in practice. In practice, I tend to see 8-15 per user, and it’s not unusual to see 25 at peak… But let’s continue with an assumption of 4. If it were larger, the number of drives just scales up.
This makes more sense, as that number translates to 357 IOps per drive (4 per user, 5000 users, divided across 56 drives). That is something to be proud of, because absent “array magic” the max for a 15K drive would be around 180 IOps – so if they can get ~360 “effective IOps” per drive, that’s a 2x improvement. Of course, I want to note that these calculations involve no RAID loss considerations.
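If you want to run the same sizing exercise yourself, here’s a rough sketch of the calculation, this time including the RAID write penalty that the paragraph above deliberately ignores. The per-drive IOps ceiling and write penalty are generic rules of thumb, not any particular vendor’s numbers.

```python
import math

# Rough spindle-count sizing. Rule-of-thumb inputs only - adjust to your environment.
DRIVE_IOPS_15K = 180      # rough ceiling for a 15K RPM drive, random IO, no cache help
RAID_WRITE_PENALTY = 2    # e.g. RAID 1/0; RAID 5 would be ~4, RAID 6 ~6

def spindles_for_performance(users, iops_per_user, read_fraction=0.5,
                             drive_iops=DRIVE_IOPS_15K,
                             write_penalty=RAID_WRITE_PENALTY):
    total = users * iops_per_user
    # Backend IO = reads + (writes * RAID write penalty)
    backend = total * read_fraction + total * (1 - read_fraction) * write_penalty
    return math.ceil(backend / drive_iops)

print(spindles_for_performance(5000, 4))    # ~167 spindles with no array magic
print(spindles_for_performance(5000, 10))   # ~417 spindles at a more typical 10 IOps/user
# Compare that to the 56-spindle answer from the sizing tool to see how much
# "array magic" (cache, coalescing, dedupe-aware caching) is being credited.
```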
If a vendor (NetApp or EMC – we do seem to be the ones most focused on the VDI use cases) says they can do a 2x “array magic” reduction in IO in their processors that sit between the host and the spindles, lean forward and listen. Poke at it a lot – because 2x is still a big number, but listen. If they claim 10x “array magic”, walk away slowly – they are dangerous to you as the customer, and themselves as the vendor :-)
Here’s my advice:
- Step 1. Figure out how much you need for your performance envelope,
- Step 2. THEN figure out the capacity angle (drives * capacity/RAID loss * capacity efficiency gains that come from user data dedupe/VMDK compression/thin and more) and see if you need more.
There’s a certain break-point where configurations tend to be capacity-gated vs. performance gated.
Then, go through and look at the cost-to-serve a desktop with that given configuration.
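As a worked illustration of those two steps (and of the capacity-vs-performance break-point), here’s a hedged sketch. The drive capacity, RAID overhead, efficiency gains, and array cost are placeholder assumptions you’d replace with your own profiling data and quotes.

```python
import math

def drives_needed(users, iops_per_user, gb_per_desktop,
                  drive_iops=180, drive_gb=300,
                  raid_capacity_overhead=0.25,   # fraction of raw capacity lost to RAID (assumed)
                  efficiency_gain=0.5):          # thin/dedupe/compression savings (assumed)
    # Step 1: performance envelope
    perf_drives = math.ceil(users * iops_per_user / drive_iops)
    # Step 2: capacity angle, after RAID loss and efficiency gains
    usable_needed = users * gb_per_desktop * (1 - efficiency_gain)
    cap_drives = math.ceil(usable_needed / (drive_gb * (1 - raid_capacity_overhead)))
    gated_by = "performance" if perf_drives >= cap_drives else "capacity"
    return max(perf_drives, cap_drives), gated_by

def cost_to_serve(array_cost, users):
    return array_cost / users

drives, gated = drives_needed(users=5000, iops_per_user=10, gb_per_desktop=20)
print(f"{drives} drives, {gated}-gated")
print(f"$ per desktop: {cost_to_serve(array_cost=500_000, users=5000):.2f}")  # placeholder cost
```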
When you go through that process, I’m confident that EMC’s VDI solutions will prove to have a competitive cost-to-serve-a-client (that, plus the functional use cases and availability requirements). I know, because I (and my team) do it every day.
BTW – the way (IMO) to achieve order-of-magnitude capex improvements is to:
- Spend a little time profiling your users. You can use VMware Capacity Planner; I also dig a tool from LiquidwareLabs for this purpose (it does a lot more, but is not free). You’ll save a lot of money by doing a LITTLE up-front design. Knowing this answer is critical to Step 1.
- Increase the user density per blade (this is one of the largest economic drivers in scaled-up View use cases, as it affects all factors, including VMware licensing). Today, this mostly means optimizing around RAM per blade.
- In the near future, an order-of-magnitude improvement may be possible by placing vSwap on SSD – expect more support for this from VMware later this year. If confused, think of how much it costs to put 128GB of DRAM in a blade. Then, look up the price for a consumer 256GB SSD. Then, do the math (there’s a rough sketch of that math after this list), and extrapolate to mid-summer. If further confused, read this post. Note that vSwap on local SSD performed almost as well as no overcommit.
- Look at whether you can apply composition and application virtualization techniques – while they impact how you deploy and manage your clients, this is a huge win on both capex and opex, even if you apply it to just SOME of your users.
- Put user data on cheap, deep, deduped, compressed, SATA-based NAS storage. Offload anti-virus to that platform, and use that to make archiving and backup orders of magnitude cheaper.
- Apply VMware’s guest best practices. These can take you from 25 IOps per guest to 10 by making simple, free changes (silly little things like turning off guest defrag and disk-layout optimization, making the swap file a fixed size, and avoiding vSwap – if not using SSD :-)
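Here’s the rough DRAM-vs-SSD math referenced in the vSwap point above, as a sketch. The prices are purely illustrative placeholders (they move constantly and vary by vendor and form factor), so treat this as a template, not a quote.

```python
# Rough cost comparison: extra DRAM per blade vs. a local SSD for vSwap.
# Prices below are illustrative placeholders only - plug in current street prices.
DRAM_COST_PER_GB = 50.0   # assumed $/GB for server DRAM
SSD_COST_PER_GB = 3.0     # assumed $/GB for a consumer-grade SSD

extra_dram_gb = 128       # DRAM you'd add to avoid memory overcommit
ssd_gb = 256              # local SSD used to hold vSwap instead

dram_cost = extra_dram_gb * DRAM_COST_PER_GB
ssd_cost = ssd_gb * SSD_COST_PER_GB
print(f"Extra DRAM: ${dram_cost:,.0f}  vs  SSD for vSwap: ${ssd_cost:,.0f}")
print(f"Roughly {dram_cost / ssd_cost:.0f}x cheaper per blade (with the assumed prices)")
```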
Then, move on to “tens of percentages” optimizations.
- Leverage read cache to deal with mass reads against the same blocks.
- Leverage write cache to let the array do some magic, but remember that you must be able to sustain the baseline steady-state load (and even long bursts) – see the sketch after this list.
- Squeeze out more by putting the base replica on EFD. If your array can auto-tier, this will happen automatically. If not, future View composer releases will allow you to put the base replica in one datastore, and the linked clones in another.
- At the end of this process, further capacity savings for VMDKs may apply via VMDK dedupe and compression.
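To illustrate the point above about sustaining the steady-state load, here’s a hedged sketch of how long a write cache can absorb a burst before the backend has to keep up – the cache size, burst rate, and drain rate are invented round numbers.

```python
# How long can a write cache absorb a burst that exceeds what the backend can drain?
# All numbers are invented round figures for illustration.
cache_gb = 8                      # usable write cache
io_size_kb = 8                    # typical small random write
burst_write_iops = 25_000         # incoming writes during the burst
backend_drain_iops = 10_000       # what the spindles can actually sustain

net_fill_iops = burst_write_iops - backend_drain_iops
cache_ios = cache_gb * 1024 * 1024 / io_size_kb      # how many 8K writes fit in cache
seconds = cache_ios / net_fill_iops
print(f"Cache absorbs the burst for ~{seconds:.0f} seconds before writes slow to backend speed")
# ~70 seconds here - fine for a short spike, not for a sustained boot/login storm,
# which is why the steady-state (and long-burst) load still has to fit the spindles.
```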
Remember – whether it’s VMDK block-level dedupe (which regularly shows 90% capacity savings) or VMDK compression (which shows 40-60% on top of thin), the question isn’t how much capacity is saved, but rather how much the total configuration costs (the $/VM, physical space/VM, and so on). You should challenge every vendor (this certainly applies to EMC) to express the solution in those types of metrics, not in terms of any given feature.
Here’s data from lab analysis of the effect of the base replica being on EFD…
Here’s data from a customer (in this case a Citrix case of user.dat files where behavior is governed by CIFS operations per second, but applies similarly to the View use case). Here we have an NS-960 with 5 EFDs, compared with the older NS-80s with 20 15K drives.
Not only were the NS-960s able to sustain 80K CIFS Op/Sec (which is a LOT) - the EFDs were able to do 33x more random IO per backend spindle, and do it with a 1.5ms response time rather than 93ms response time at the filesystem level observed on traditional 15K RPM disks.
I hate to be so pedantic about this topic (VDI storage design), but it’s important to me for the reason I mentioned back in this post. This IO density question is the #3 reason I see View projects not starting (#2 being total TCO, and #1 being client experience), and the #1 reason why they go sideways in late-stage scale-up. There are a bunch of interesting startups here – Atlantis and others…
These two blog posts discuss this in good detail and additional perspectives:
- Duncan’s IOPs post: Duncan’s blog post (and comments)
- Travers Blog: VMware View, Virtual Desktop Infrastructure
Thanks for setting the record straight Chad.
Sometimes, I wonder what happens at NetApp. They've got a good product (shouldn't that be enough?), but these continual overstatements erode credibility for all of us.
And that's not good for anyone ...
-- Chuck
Posted by: Chuck Hollis | March 10, 2010 at 04:59 PM
Just a quick correction on an otherwise great update! View 4 is licensed on the number of desktops managed, not per physical host - it helps tilt the TCO equation even more towards storage related costs.
Posted by: Dave B | March 10, 2010 at 07:06 PM
@Dave B - agreed, but vSphere ESX/ESXi is licensed per host :-) That's what I meant, but thanks for the clarification.
Posted by: Chad Sakac | March 10, 2010 at 07:20 PM
I think what Dave B. means is: for example, when you purchase a View 100 pack, it includes vSphere AND View licenses for 100 concurrent users. It doesn't matter if you spread those 100 users across 5 vSphere hosts or 6 or 10, because you get "unlimited" vSphere licenses for *desktop* use (this is key, as you are NOT allowed to run server workloads on vSphere "for desktop" licenses per the EULA). You can install vSphere on as many hosts as it takes to support your 100 users. This is a recent change in View 4 compared to View 3. So technically, the number of View users on a vSphere host has no bearing on licensing TCO in View 4 when purchased this way. In View 3 you got a fixed number of ESX licenses; in View 4 you do not have that restriction with the bundle. You can choose to purchase View as an "add-on" license which does NOT include vSphere licensing, only View -- this lets you mix desktop workloads into an existing vSphere server virtualization environment, useful for small environments that have excess capacity in their vSphere environment and want to introduce some VDI.
Again, it's just licensing semantics, and probably better left for sales guys :) and not really relevant to the overall technical post, but it IS relevant to understanding VDI TCO (VMware has made it more capex "friendly" with this View 4 licensing scheme), so I thought I'd mention it for posterity.
Posted by: Vijay Swami | March 10, 2010 at 08:45 PM
@vijay @ Dave B - thank you for your clarification. I think I need to go back and review the licensing changes :-)
In most of the View deals I'm involved in, the customer has an ELA and is looking for a View ELA addendum, or the View stuff is simply incremental standalone licenses, which may be borking my understanding.
BUT, the point you're raising is valid. Will go back and understand the changes in licensing better.
Thank you!
Posted by: Chad Sakac | March 11, 2010 at 11:01 AM
Good luck with the licensing manual - even our sales guys have had issues... ;-)
(There seems to be a sweet spot of 32 or 64 vdi guests per host - we originally were trying to cram guests on hosts before we realised how the licensing works. Just wanted to make sure google had it indexed for anyone else that hits this!)
Posted by: Dave B | March 13, 2010 at 09:30 AM
Great article. IOPS is everything in VDI. Quick note: View 4.5 will have storage tiering. This will completely change the layout and storage design.
Posted by: Creedom2020 | March 14, 2010 at 04:49 AM
Chad,
Ouch! You're being rather critical here on a number of points.
Regarding market share I'm surprised with your comments as you and I share emails which include market share estimates from a third party which is in the know. Dare I suggest you have cherry picked the quote you have in this post to fit your needs?
On accuracy, we did have someone promote a market share figure and incorrectly cite the source as VMware. As you know, we have addressed this issue.
On the subject of accurate communications I find it ironic that in this post you state "the VMware View/EMC Launch tour..." I trust this is a typo and not a deliberate misrepresentation of the VMware View Launch tour.
http://info.vmware.com/forms/7903_REG?src=KSVIEW4_EMC
Maybe you could clean up your post (for accuracy's sake).
I really like the foundation this post provides around storage caching techniques. Unfortunately, you fail to cover advanced caching technologies available on non-EMC array platforms.
Any reason why you would misrepresent your competitors' technology? Again, I don't believe this is your intent; however, I would advocate that you ask for some feedback before you attempt to tackle storage technologies that are beyond EMC's capabilities.
Transparent Storage Cache Sharing is HUGE with virtual infrastructures. Check out this post, it will begin to demystify the 'magic.'
http://blogs.netapp.com/virtualstorageguy/2010/03/transparent-storage-cache-sharing-part-1-an-introduction.html
Cheers,
V
Posted by: Vaughn Stewart | March 17, 2010 at 12:11 AM
@Vaughn - I'm sorry if it hurt, Vaughn. I told the NetApp folks at PEX (ask around) that I was just going to let the crazy claims slide unless they kept popping up - but that claim has no support, and did keep popping up.
Next - you and I exchanged emails on the earlier dialog with the View team, and some data came back that linked View and share with DR use cases. You correctly stated that View+DR may not correlate with DR, so that's not the thread I used - because I agree.
I asked the View team directly what they felt was a fair comment. They aren't saying NetApp doesn't have a great solution.
What they are saying about the "9 of 10 customers" claim is: "based on our VDI run rate and overall VDI market size, also supported by what we are hearing from our software partners, we don't see how this can possibly be true."
Heck, they aren't even saying "NetApp isn't a leader in this space" (I'm not saying that either), but simply that a claim of 9 of 10 customers isn't supported by any data of any kind, and doesn't seem reasonable.
Next - the VMware View launch tour is sponsored by parties (sometimes NetApp, sometimes EMC, sometimes both, sometimes others). The link I provided is for the EMC-sponsored locations, hence "VMware View/EMC launch tour". If it helps, I will remove that comment.
Next - If I have done any misrepresenting of PAM-II and the impact of block de-duplication on PAM, forgive me and correct me, but please be specific.
Note that in my caching section, I give several props to NetApp (and direct them to you, where you are free to say whatever you would like), but also point out there are other ways of solving the problem.
I think that cache (read and write) efficiency (just like non-volatile storage efficiency) is an ongoing effort across the storage vendors, and I'm glad to say we're delivering a lot there. I DO disagree that all workloads can be served via a read cache backed by SATA, regardless of efficiency. But in the end, customers choose.
NOTE IT HERE FOR THE RECORD. EMC's view is that flash technology will exist on servers, as cache (read and write), and as a non-volatile storage tier. Like most things, there is more than one answer, and more than one use case.
Noted for the record?
Speaking of "misrepresentation of competitor's technology" I'll refrain here from pointing out consistent claims made that I think are incorrect regarding vCenter integration (and VMware integration in general), and will make them on your blog, where you can comment. I have noted them several times, but they continue.
Posted by: Chad Sakac | March 17, 2010 at 07:09 PM
Chad,
I'm not sure how we are so far apart in terms of IOPs per desktop. I cited 4-8 in my post, you cite 8-15. Maybe we could cite a 3rd source in order to better educate your readers?
http://blogs.netapp.com/virtualstorageguy/2010/03/vmware-community-podcast-85-vmware-view-with-john-dodge.html
Cheers
V
Posted by: Vaughn Stewart | March 24, 2010 at 02:39 AM
That's quite an informative VMware update, thanks for the thorough research!
Posted by: Server Virtualization | November 30, 2010 at 01:34 PM