This was an internal thread that was very useful, so I asked the PowerPath/VE team to post it to Everything VMware at EMC… Here’s the simple and easy way to add PowerPath VIBs to an image:
For internal EMCers, EMC Partners, and EMC Customers: I can’t recommend Everything VMware at EMC (EV@E) highly enough as the place to share info (as opposed to internal mechanisms or emails).
After all – that means Google finds it. That means a lot to getting good info OUT to help people.
One popular topic of discussion in VMware and storage land is how VMware is affecting core storage architecture design principles.
I commonly summarize this via this slide:
On the left you have “Type 1” transactional (defined as being good for small block random workloads that tend to characterize many VM workloads) systems that are “classic” clustered heads. Think of EMC VNX as being an example (and you can think easily of their competitors) on the left.
Typically a great choice at moderate scale, they are often easy to use (since the architectures scale down relatively well), and have the most “swiss army knife” (ergo do a bit of everything) characteristics (being one of the most mature architectural models). Their downside tends to be in failure conditions. Since storage is generally “owned” in some way (LUNs, filesystems), the failure behavior to the workload is invariably more complex. There are great solutions to mitigate this (ALUA, NPIV, host failure handling), but it is an architectural thing. There’s also the challenge of balancing workloads across brains, and across storage platforms as you get bigger and bigger.
In the middle you have “Type 2” transactional systems. Think of EMC VMAX as being an example (and you can think easily of their competitors). They “scale out” in the sense that any IO can be served up and through any port, any “brain”.
These tend to pull away for customers as they get larger for several reasons – failure modes are much cleaner (as the storage device – block or NAS – is visible broadly). Done right – the storage model also deals with balancing workload across a broader set of pooled resources (network, memory, CPU, cache, etc).
On the right, you have the much more loosely coupled object storage models. Think of EMC Atmos as an example. These are usually pure software with no hardware dependency (and Atmos indeed can be layered on anything as a virtual appliance).
They tend to be very cloud-like, awesome characteristics for supporting a lot of next-gen web apps – but NOT good for transactional workloads (and therefore hosting VMs). Making these systems transactional (with gateways, caches and the like) takes away their fundamental strengths.
When I present this, I’m commonly asked: “why do you say ‘NAS emerging’ in the middle column?”
The answer is twofold:
First, you can’t call it a scale-out model if all the traffic for a given thing ends up at a single brain, a single interface for a given datastore. This is the case with NFS deployments in vSphere 4.x and VI3.x.
Second, scale-out NAS systems (even the best) typically have transactional latencies that are about 2x-3x the latencies of “Type 1” block and NAS systems, and “Type 2” block systems. They also typically have a higher practical $/IOps. These aren’t functions of “pricing models”, but actually are reflections of underlying engineering challenges that are substantial. You can see this very clearly in the SPEC SFS results.
BUT – MAN, if you could scale-out that way… AND have workloads that are a fit, wouldn’t it be cool if….
…well, at VMworld 2011, in SPO3977 we showed that you CAN.
Ok, so let’s take a look at the characteristics of a true scale-out NAS model:
Multiple Nodes Presenting a Single, Scalable File System and Volume – a single volume file system from multiple nodes. This is obvious when you think about it. You want a filesystem to be a volume spread across all the nodes, all the way down to a single file.
N-Way, Scalable Resiliency. This is also obvious. There are the clear challenges where RAID double, triple, and other disk-level parity schemes fail as disks grow AND filesystem objects become petabyte scale. This is why erasure coding techniques are generally used at that scale. You also need a model where any one of the nodes can support IO for failed node(s), because BOY that would make failure behavior simple and elegant…
Linearly Scalable IO and Throughput. This is important. If you are using a global namespace, but then directing the file-IO to a single node that hosts the file – you’re not spreading the load across a “big global pool”, and invariably failure models are more complex.
Storage Efficiency. This is an outflow of the items above. In a true scale-out model – there is NO SUCH THING as “this data is here”- it’s everywhere. That means that there is no “balancing” (unless, for whatever reason, you WANT to do that).
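To make the resiliency point a bit more concrete, here is a minimal sketch of the simplest possible erasure code: a single XOR parity block across a stripe of data blocks, where any one lost block can be rebuilt from the survivors. This is purely illustrative and is an assumption for teaching purposes, NOT a description of how Isilon (or any product) protects data. Real systems at this scale use codes that survive multiple simultaneous failures, and the function names below are made up.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_stripe(data_blocks):
    """Return the data blocks plus one parity block (hypothetical layout)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def rebuild(stripe, lost_index):
    """Rebuild the block held by a failed node from all surviving blocks."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)

# Three data blocks, each imagined to live on a different node.
stripe = make_stripe([b"vmdk", b"data", b"1234"])

# Pretend each node fails in turn and rebuild what it held.
recovered = [rebuild(stripe, i) for i in range(len(stripe))]
```

The takeaway is only that protection is computed across nodes rather than inside one box, so no single node "owns" the data.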
In vSphere 5, the NFS client is still NFSv3. This means that for a single ESX host, at any given moment, all the traffic to a given datastore uses a single TCP connection. BUT there are improvements: if you use a DNS name, it will leverage DNS round robin to access the datastore across a pool of IP addresses. (Yes, NFSv4 and more are coming in the future.)
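Here is a rough sketch of why round robin helps even though each host still has just one TCP connection. It does not touch real DNS or vSphere. It simply simulates a resolver that rotates through a pool, and it assumes each host resolves the name once at mount time and then sticks with the address it got. The host names and IP addresses (from the documentation range) are hypothetical.

```python
from collections import Counter
from itertools import cycle

def round_robin_resolver(pool):
    """Mimic a DNS server that rotates through a pool of addresses."""
    rotation = cycle(pool)
    return lambda: next(rotation)

# Hypothetical addresses, one per NAS node.
node_ips = ["192.0.2.11", "192.0.2.12", "192.0.2.13", "192.0.2.14"]
resolve = round_robin_resolver(node_ips)

# Assumption: each host resolves the datastore name once when it mounts,
# then keeps its single NFSv3 TCP connection to the address it was handed.
mounts = {f"esx{n:02d}": resolve() for n in range(1, 9)}

# How many host connections land on each node.
connections_per_node = Counter(mounts.values())
```

With eight hosts and four addresses, every node ends up serving two hosts. No single host gets faster, but the cluster of hosts as a whole spreads across the nodes.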
Put this basic fundamental change together with the core architecture of EMC Isilon (which can distribute that same filesystem not only across a set of IP addresses like VNX and NetApp can, but unlike those more “classic” models can do it across a LARGE number of “brains” where the underlying filesystem is completely distributed) and you get what we did in this amazing demonstration:
What did the demo show?
That the configuration is EXTREMELY simple.
That as ESX hosts are powered up, they automatically balance the load across nodes.
That as filesystems are created (via our simple VSI plugin without leaving vCenter), they automatically balance across all the nodes. Same is true as filesystems are grown – up to a current limit of 15PB for a single filesystem.
That as additional Isilon nodes are added (to grow IOps, MBps, and capacity), all loads are automatically balanced. BTW – this could be up to a current limit of 144 nodes.
That as complete nodes are taken down, there is no disruption or failover behavior (since the filesystem is presented via all nodes all the time) at the vSphere layer.
Is that kick-a$$ or what?
It’s notable (and please, folks, correct me if I’m wrong) that there are no mainstream scale-out NAS competitors whose systems would behave the way we showed in this demonstration.
This is what makes EMC Isilon a KILLER choice for customers today who are deploying VMs that are OK with the one other remaining caveat: 2x-3x higher transactional latency. Think 10-30ms as opposed to the 2-10ms you would see for block or NAS workloads on VNX or block workloads on VMAX. Also, the $/IOps characteristics for EMC Isilon mean that the fit is best for VMs that are relatively large, but do a relatively small number of IOps.
Now if you put that all together, 10-30ms latency and sizing for capacity rather than IOps is perfectly OK for many VMware workloads today – test/dev, many vCloud Director use cases, vFabric Data Director use cases, etc.
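If you want to reason about the fit for your own VMs, the two caveats can be turned into a quick back-of-the-envelope check. This is only a sketch: the 30ms figure is the top of the range quoted above, while the IOps-per-GB cutoff, the class name and the sample VMs are all made-up placeholders, not sizing guidance.

```python
from dataclasses import dataclass

@dataclass
class VMProfile:
    name: str
    capacity_gb: float
    iops: float
    max_latency_ms: float  # worst latency the workload tolerates

# 30 ms is the upper end of the range quoted above; the density cutoff
# is a hypothetical placeholder, not a vendor sizing figure.
EXPECTED_LATENCY_MS = 30.0
MAX_IOPS_PER_GB = 0.5

def fits_scale_out_nas(vm):
    """True when the VM tolerates the latency and is capacity-heavy."""
    tolerates_latency = vm.max_latency_ms >= EXPECTED_LATENCY_MS
    capacity_heavy = vm.iops / vm.capacity_gb <= MAX_IOPS_PER_GB
    return tolerates_latency and capacity_heavy

# Made-up sample workloads.
test_dev = VMProfile("test-dev", capacity_gb=500, iops=50, max_latency_ms=50)
oltp_db = VMProfile("oltp-db", capacity_gb=200, iops=4000, max_latency_ms=10)
```

Here `fits_scale_out_nas(test_dev)` comes out true and `fits_scale_out_nas(oltp_db)` comes out false, which matches the "large but quiet" profile described above.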
Imagine if hypothetically, we were able to make Isilon work well with transactional workloads… :-)
Ok – this could be a REALLY long post – or a short one :-) Dear readers, you know me well enough at this point to know that I suck at short. Also warning – while this was crazy cool tech, there’s opinions littered in this post. With those disclosures out of the way….
…So, I’ve commented before about all the interesting startup action around storage and the trends of flash and virtualization. I’ve also made the observation (coming from a startup) that storage startup land is VERY tough. You’re talking about persistent data – one of the most difficult things to displace.
That means that to succeed, you need to innovate fast (something startups are very good at), and you need to find a place where the big guys are a little asleep at the wheel. You need both because being a “little” better isn’t enough.
Having one standout feature will win you some customers, no doubt. But the key is that massive R&D/M&A budgets and broad technology portfolios, brought together into integrated value, can be very compelling – and that’s something the big folks can do.
Don’t take this as arrogance on my part – I have a huge soft spot for the startups, and that is a land of incredible energy and innovation. There are also great examples of startups that broke through. While mine barely made it to acquisition, I remember it very fondly.
Speaking for myself and what I see in EMC – there’s a hyper-awareness of disruptive technologies, and a willingness to cannibalize ourselves where it’s the right thing, and at the right time. We believe in Andy Grove’s “only the paranoid survive” mantra.
So – with all that said – what the heck did we show that would result in THAT intro?
At VMworld 2011, session SPO3977 was called “Next-Generation Storage and Backup for Your Cloud”. We discussed the current state of the art around backup and recovery in the VMware context – which is about vCenter integration, the vStorage APIs for Data Protection for agentless backup and single step file level restore, use of Changed Block Tracking to accelerate BOTH backup and restore. That, and of course the fact that source and target dedupe approaches are now universal “gotta do it” capabilities.
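For anyone who hasn’t looked at why Changed Block Tracking helps, here is a toy sketch of the idea: remember which blocks were written since the last backup and copy only those. This is a simplified model with made-up class and function names. It is NOT the vStorage APIs for Data Protection, and it makes no claim about how Avamar or any other product uses the change information.

```python
class TrackedDisk:
    """Toy virtual disk that records which blocks changed since last backup."""

    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks
        self.changed = set(range(num_blocks))  # first backup is a full copy

    def write(self, index, data):
        self.blocks[index] = data
        self.changed.add(index)

    def backup_changed(self):
        """Return only the changed blocks, then reset the tracking set."""
        delta = {i: self.blocks[i] for i in sorted(self.changed)}
        self.changed.clear()
        return delta

def restore(backups, num_blocks):
    """Replay a full backup plus incrementals, oldest first."""
    blocks = [b""] * num_blocks
    for delta in backups:
        for index, data in delta.items():
            blocks[index] = data
    return blocks

disk = TrackedDisk(num_blocks=8)
disk.write(0, b"boot")
full = disk.backup_changed()         # copies all 8 blocks
disk.write(5, b"log")
incremental = disk.backup_changed()  # copies just the 1 changed block
restored = restore([full, incremental], num_blocks=8)
```

The second backup moves one block instead of eight, and the same kind of change map is what lets a restore avoid rewriting blocks that already match.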
But then we looked a little further out. There is a “weakly addressed” (in vendor speak – in customer speak, I bet they would call it “non-addressed”!) use case – which is vCloud Director.
Backing up and restoring vCD is not simple:
You have to back up the vCD database.
You need to back up the core vCenter structures which reflect the vCD structures (resource groups as an example).
You need to back up the vApps themselves.
You need to back up the vCD catalog.
You need to be able to restore – respecting the core structures of multi-tenancy – after all, you need to restore the objects backed up in the list above without affecting adjacent tenants.
You need to be able to offer this backup/restore service to the tenant themselves – after all, cloud is all about self service.
Ok – to be clear, EMC does have a good answer to the above (and has done it for service providers) based on Avamar 6. It involves scripting, on-site integration and customization (read “not out of the box”). Core lessons learnt have been written up in this whitepaper (which includes big parts of the solution, but not all):
But… Wouldn’t it be cool if instead of doing it that way – we made it SO tightly integrated that it was PART of how vCD worked? If it looked the same? If it naturally linked in to all the core vCD structures at the API level – and fully understood and respected multi-tenancy? If backup and recovery became a natural part of vCD? If it was all exposed programmatically via APIs (another cloud pre-requisite)?
Of course we’re working on it :-)
Here’s the technology preview demonstration we did in the session:
I think the work is amazing… Feedback welcome! Is this something you would like?
One thing we’ve been working on with VMware for years now is making the virtual, cloud world more secure than the physical one. I’ve said it before, and it’s worth saying again – the view we share is that security will need to evolve, and will be disrupted by virtualization.
Policy enforcement will need to become “part of the virtual infrastructure” and become very close to the information – because the environments will be so fluid, so elastic – “clamping devices on the network choke points” simply won’t work.
If this isn’t immediately obvious, think about someone taking a VM and the data that constitutes it and using vCloud connector to move it to a vCloud service provider as an example. If enforcement of the policy doesn’t “follow the compute/data” – you’re hosed.
The fruits of years of labor are showing up – the most recent example being the collaboration to embed data security, and checking for compliance into the vShield 5 App capabilities.
If you are a person responsible for security, or have ever been audited, or have been told “we can’t virtualize this because we can’t audit” – this demo (shown in Pat Gelsinger’s supersession – SUP1006 during the “Chad’s World” bit) will knock your socks off.
That demo shows:
How easy it is to check for compliance against global standards.
How easy it is to catch “data leakage”
How vShield 5, vCenter, ESX all integrate with RSA Envision to provide an end-to-end view across the datacenter (physical, virtual – everything)
How they all can tie into RSA Archer for a Governance, Risk, and Compliance dashboard and automated workflows (to isolate/remedy/notify).
Remember that this all is simplified by orders of magnitude because the workload is a virtual machine. The next time someone says:
“we can’t virtualize/go to cloud because of security”
…the right answer is:
“we should virtualize/go to cloud BECAUSE of security”
This is the sister post to the one on a new bandwidth record – here.
At EMC’s mega-launch in January, we commented that 2011 would be the year of EMC breaking records. We weren’t kidding. See here, here, here, here, here, here, here, and so on,….
So… With every major vSphere release, the EMC and VMware performance engineering teams get together and brainstorm: “what would be a ridiculous, over the top test to see where the performance envelope is today?”
This time – the gang at VMware (Chethan Kumar and others) and EMC Symmetrix Performance Engineering (Dan Ahroni and others) said “let’s see if we can break the 1,000,000 IOps barrier”. After all, you get to keep saying “one meeeellion” with the Dr. Evil pinky :-)
So they got cracking in the lab in Hopkinton. A few weeks later – here you have it:
The new world record benchmark for IOps through a reasonable vSphere 5 configuration is 1,000,000 IOps.
That’s around 4x the previous record. This is the story behind the story :-) Read on for more.
This is the sister post to the one on a new bandwidth record – here.
At EMC’s mega-launch in January, we commented that 2011 would be the year of EMC breaking records. We weren’t kidding. See here, here, here, here, here, here, here, and so on,….
So… With every major vSphere release, the EMC and VMware performance engineering teams get together and brainstorm: “what would be a ridiculous, over the top test to see where the performance envelope is today?”
This time – the gang at VMware (Chethan Kumar and others) and EMC (Radha Manga and others) said “let’s see how much bandwidth we can drive through a reasonable config”. So they got cracking in the lab in Santa Clara. A few weeks later – here you have it:
The new world record benchmark for bandwidth (MBps) through a reasonable vSphere configuration is 10GBps.
That’s around 4x the previous record. This is the story behind the story :-) Read on for more.
I blogged on the EMC support for View 5 via reference architectures and View 5 testing here. I made a comment about the difference in the core value proposition of the traditional (and still very fine) technology acquisition model of “mix and match” (best of breed, near infinite flexibility) versus the value proposition of converged infrastructure (acceleration of business value).
If it seems weird that I bring this up again – it’s because some folks (ahem) continuously try to suggest that we do one or the other. The reality is we do BOTH. We give customers the choice, and some prefer one model over the other.
Now, let’s examine the VCE new product announcement of the VCE FastPath solution for View.
Look at the core problem they are trying to solve:
This is very different from the problem we’re trying to solve with the VNX-based reference architectures – which is to optimize the set of components to get the densest user design with the minimum infrastructure (ergo optimize best of breed components). We do that – answering things like “how can you minimize the cost of storage for View?” (the answer, more than anything else, is FAST Cache, but also filesystem compression/dedupe, archiving and backup approaches). But that is NOT an end-to-end answer.
That’s what VCE FastPath Solution for View is all about…
It’s pretty darn simple: It is an “End User Computing Appliance”.
View 5 unleashed on the world… And we’re here to support it at EMC. In this post, there are interesting performance testing results of View 5 vs. View 4.6.
But the main things between View 5 and View 4.6 have little to do with storage. The major pieces in View 5 to me are: the enhancements to PCoIP on low bandwidth and high latency links (let’s be frank as we always should be – this was always a View exposure); integrated persona management in Premier Edition; 3D graphics support, and don’t underestimate the importance of better unified communication support. Big release for the gang behind View. And here I am in a dorky video (once again proving I’m willing to do any silly thing) to say so :-)
I’ve sent VNX kit to support the View team as they work on their VMware-home grown View 5 Reference guides. At the same time, the EMC Solutions Group have been busy at work in View land.
They have published a new Reference Architecture/Practitioners Guide for VMware View on VNX. The new bits:
Lots of NFS datastore best practices, and testing results. Over 2011, we’ve gotten about a 2x-3x improvement in transactional NFS performance (measured by IO latency), which helps a lot in these use cases. BTW – all VNXes ship with that code now. Also a neat factoid – customers, you played a key role in that (see this blog post – it helped us optimize around the VMware use case a lot).
View 4.6 and Composer 2.6. While of course not the latest, it’s important to understand – these can only be done properly once you have RTM code. The View 5 work is underway (see early testing results below).
The testing showed the incredible effects of the VNX’s “Flash 1st” approach – using Flash as both a read/write cache (FAST Cache) and as a tier (using FAST VP) in the combined FAST Suite. While awesome with View 4.6 – using View 5.0 we saw even faster boot times – about 50% faster.
BUT before I show the 4.6 and 5.0 comparisons, I want to highlight an interesting observation.
Do you like to customize? Do you build your own stuff? Are you an iPad kind of person? Huh?
On the day before EMC launched our View-based reference architectures, VCE launched their FastPath View solution. What’s the difference?
It’s the core difference between reference architectures and converged infrastructure.
The View based reference architectures from EMC help reduce risk, a little. They are infinitely flexible – after all, we did it with Cisco switches, but it doesn’t “break” the reference architecture if you choose something else - what is there to break after all? The more you diverge the less value they have, but there’s no line. They are infinitely flexible. Will it accelerate your deployment – sure. A moderate amount. But hey – you could have any servers, any network, and in fact, diverge from the storage config in the doc to scale up or down. It’s “Flexible”. It’s how most IT is acquired and deployed today. If you like this model, you’re like me, and can’t wait to read past the break where we have chart after chart comparing View 4.6 to View 5 under load.
The View FastPath solution from VCE represents a different consumption model. Sure, it’s a little more rigid – it needs to be within the parameters of a Vblock 300, and comes in 500, 1000, and 1500 user sizing. But, in exchange, you get extreme acceleration. You get a single SKU (including ALL the required software). You get provisioning tools that automatically configure to View and vSphere 5 best practices – in an automated fashion. You get support from one place. Converged infrastructure is less about the ingredients themselves, but rather getting down to business. It’s all about “Acceleration”. It’s how more and more IT is likely to be deployed in the future. If you’re not even REMOTELY interested in the View 4.6/View 5 performance results below, and just need the project to succeed – you’re the kind of person who wants FastPath. More on FastPath in a separate post.
Ok – back to nerdy View 4.6 to View 5 compare and contrast! Read on for more!
The opinions expressed here are my personal opinions. Content published here is not read or approved in advance by EMC and does not necessarily reflect the views and opinions of EMC. This is my blog, it is not an EMC blog.