If it isn’t clear, I totally dig working at EMC. There is so much innovation, so much change, so much action, so much opportunity – it’s nuts. I try to stay as even-handed as I can on the blog, but sometimes the enthusiasm just can’t be contained :-)
With Pat Gelsinger leading the products team over the last year, the rate of innovation (both organic and inorganic) has increased, and with Jeremy Burton leading our marketing, we talk to the market in a “go big or go home” way. It makes for a lot of fun… and as a nerd, so many toys to play with!
Today EMC officially makes VFCache generally available. It’s been GA internally since mid-January, and has gone through a lot of beta time – which is why there are customers at launch. We’ve been talking about this as “Project Lightning” since EMC World, and just like we did with FAST – we show, we listen, and then it arrives.
With VFCache now on the market, I think EMC is the only vendor with an “end-to-end” solution that leverages the disruptive positive effect flash can have within an enterprise: from server PCIe-based read caches and server PCIe-based non-volatile storage, through to flash as shared array cache and shared non-volatile storage (with automated tiering to boot).
So – what’s the story? What’s the hardware? What’s the software? Where are the v1.0 holes (they always exist!)? What’s the roadmap? What about the startups in this space? What the heck is Project Thunder? This will be a long post (even by my verbose standards), which I’ll break into “Why”, “What”, and “What’s next” sections… Read on, dear reader, read on!
Why?
It starts by REALLY getting, REALLY grokking the disruption that flash represents for the storage business. And remember, the “storage business” doesn’t just mean the “storage array vendor” business – it includes anyone who is in the “data” business.
Think of this simple fact: for the last two decades, CPU power (and the cost of memory) has gotten 100x better every 10 years, while disk performance (not capacity) has been flat.
Storage cost/power/density per GB has improved a lot, but the cost per IOps has been flat. This is why technologies like dense storage, thin provisioning, and dedupe are very important – but mostly for workloads (primary, secondary, archive, whatever) where the primary measure is “GB” (capacity). I’ve been on this soap-box for a while, and always express “efficiency” as influenced by technologies along these three orthogonal (a fancy way of saying “independent”) vectors. Some internally at EMC have started calling it the “Chad Flux Capacitor” :-)
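To make that divergence concrete, here’s a quick back-of-the-envelope sketch (purely illustrative; the ~180 random IOps for a 15K RPM drive is my rounded assumption, while the 100x-per-decade compute figure comes from the point above):

```python
# Back-of-the-envelope: compound compute growth vs. flat per-spindle random IOps.
# All numbers are illustrative assumptions, not measured results.

compute_gain_per_decade = 100      # "100x better every 10 years"
decades = 2                        # roughly the last two decades
disk_random_iops = 180             # a 15K RPM drive, then and now (roughly flat)

compute_gain = compute_gain_per_decade ** decades   # 100^2 = 10,000x
print(f"Compute improved ~{compute_gain:,}x over {decades * 10} years")
print(f"A spinning drive still delivers ~{disk_random_iops} random IOps,")
print(f"so the IOps available per unit of compute fell by roughly {compute_gain:,}x")
```

That four-orders-of-magnitude gap is the whole story behind the flash disruption on the performance vector.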
It really aggravates me when any of us vendors makes it sound like one thing is the only thing that matters – and it’s usually, strangely, correlated with the thing they happen to sell :-)
But when it comes to the question of performance – both in absolute terms (maximum!) and in efficiency (least $/Watts/sq.ft.!) – the biggest core disruptor is solid state storage. Solid state (which today is synonymous with flash, but that isn’t intrinsic) is commonly used in 4 distinct architectural use cases:
- in non-shared non-volatile use cases (“DAS in servers” – often via a PCIe device, but also via SSDs)
- in shared non-volatile use cases (in arrays) – whether it’s autotiered or not – though to not autotier is, well, silly...
- in non-shared cache use cases (server-side PCIe devices)
- in shared “mega caches” behind things that the servers see as “LUNs/Filesystems” (often in arrays in some way).
There’s an emerging 5th use case too, but we’ll come back to that.
EMC has been leading the way in bringing flash to the mainstream for the last 4 years – so this is nothing new to us. Furthermore, it was clear even 4-5 years ago that flash was going to change everything, so we’ve been investing heavily in all SORTS of ways. How much? I want to put out a couple of interesting facts:
- In 2011 alone, EMC shipped more than 24PB of flash.
- In 2011, more than 1.3 Exabytes of data was under management by EMC FAST. Wow.
- Over the last few years, EMC has put a TON of venture funding into flash companies.
That last one is interesting. We do a ton of organic innovation, but we also invest in a ton of things (no one has a monopoly on innovation, and during periods of disruptive technology, innovation accelerates and happens all over the place). EMC Ventures is very active. This has been eye-opening as I’ve seen more and more of it. While our organic innovation – culminating in shipping products with EMC logos – is the most apparent, our non-public investments are a FASCINATING part of EMC’s business. These flash investments are doing very well on every vector: as a hedge, as a vehicle to get more innovation leverage, and as a straight-up investment.
…So, if you look at the list of use cases being transformed by flash above, EMC is already the leader in two of them:
- in shared non-volatile flash use cases, with FAST functionality in all our primary storage targets (EMC VMAX, VNX, and Isilon all offer automated tiering)
- Where flash can act well as a large shared “mega cache” in the array, EMC leads there too (EMC VNX). It’s harder to leverage flash this way in the more enterprise-class arrays (think EMC VMAX), where the memory/cache model needs to be symmetrical and able to withstand multiple component failures without material performance impact. While these classes of arrays are dominated by shared DRAM cache models, IMO you can expect them to (over time) start using cache hierarchies that combine DRAM and flash – the sketch below shows the basic idea.
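To illustrate what a DRAM + flash cache hierarchy means in practice, here’s a minimal sketch (my own illustration of the general technique – not how any EMC array actually implements it): reads check the small, fastest DRAM tier first, fall back to the much larger flash tier, and only then go to disk, with blocks promoted and demoted between tiers as they heat up and cool off. The `backend_read` parameter stands in for a read from spinning disk.

```python
from collections import OrderedDict

class TieredReadCache:
    """Minimal sketch of a two-tier (DRAM + flash) read cache.
    Illustrative only -- not any vendor's actual implementation."""

    def __init__(self, dram_slots, flash_slots, backend_read):
        self.dram = OrderedDict()        # small, fastest tier (LRU ordering)
        self.flash = OrderedDict()       # larger, slower-than-DRAM tier (LRU)
        self.dram_slots = dram_slots
        self.flash_slots = flash_slots
        self.backend_read = backend_read # e.g. a read from spinning disk

    def read(self, block):
        if block in self.dram:                   # hottest data: DRAM hit
            self.dram.move_to_end(block)
            return self.dram[block]
        if block in self.flash:                  # warm data: flash hit
            data = self.flash.pop(block)
            self._put_dram(block, data)          # promote toward DRAM
            return data
        data = self.backend_read(block)          # cold data: go to disk
        self._put_dram(block, data)
        return data

    def _put_dram(self, block, data):
        self.dram[block] = data
        self.dram.move_to_end(block)
        if len(self.dram) > self.dram_slots:     # evict the LRU DRAM entry...
            old_block, old_data = self.dram.popitem(last=False)
            self._put_flash(old_block, old_data) # ...by demoting it to flash

    def _put_flash(self, block, data):
        self.flash[block] = data
        self.flash.move_to_end(block)
        if len(self.flash) > self.flash_slots:
            self.flash.popitem(last=False)       # falls out of cache entirely
```

The point of the sketch is just the shape of the idea: the DRAM tier stays small and absorbs the hottest blocks, while the flash tier makes the effective cache orders of magnitude larger than DRAM alone could economically be.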
But EMC was not active in the other two use cases, and there is a material fact: server-side PCIe flash can deliver a huge level of latency and IOps performance – about 4000x more IOps for a given amount of GB compared with magnetic media, and about 20x more IOps (and about 1000x lower latency) than SSDs.
Think about it: if someone offered you an upgrade to 1-terabit Ethernet – Ethernet that operated 1000 times faster than the commonly deployed Gigabit Ethernet of today – how much of a technology disruptor would that be? That’s the effect flash, in all its forms, is having on the storage industry.
So… If PCIe based flash offers the highest IOps and lowest latencies of all the architectural models, why isn’t it used universally?
The answers are simple:
- When used as a cache/extension of memory, they are effectively volatile, which means you can only use them for reads unless you are willing to accept some risk of data loss (OK in some use cases, but not others).
- When used as DAS, they can help, but this tends to restrict you to non-clustered use cases – and the failure domain is the same as the server itself.
- Server-based PCIe flash cards are “captive” – meaning the resources (both capability and capital cost) are “trapped” in the host. Sometimes this is a great trade-off, but not always.
So – this is what we decided to tackle.
What is VFCache?
EMC Project Lightning was the code-name for VFCache – EMC’s initiative in the server-side, PCIe-based flash side of this technology revolution. What does VFCache stand for? I think of it as “Virtual” (all things EMC) “Flash” “Cache”… or “Very” “Fast” “Cache” :-)
This was previewed at EMC World in May 2011 (where we showed the hardware), and again at VMworld in Sept/Oct (where we showed the vCenter plugin) – and every time we discussed it, people tended to focus on the hardware. Yes, there is hardware in VFCache, but the project has always been about software: an integrated view of server-side flash as part of an integrated flash strategy and an integrated view of storage.
What is the hardware? Our main launch partner is Micron – whose PCIe x8 hardware is state of the art, delivering roughly double the read bandwidth and total read IOps of the leading competitor, in a 300GB capacity.
But VFCache is not primarily about the hardware. In fact, at launch we have multiple hardware partners – and in the future we’re even contemplating “bring your own hardware” models. This makes all the sense in the world for the following reasons:
- the importance of multiple component sources has been highlighted by the Thailand flooding – vendors with more volume and broader supply chains were less affected than others, and therefore their customers were less affected too.
- this is an area of crazy fast iteration and innovation. LSI Logic, Micron, Samsung, OCZ and Intel (Ramsdale) are all iterating on their hardware so fast that the game of leap-frog will play out constantly in the near term.
- Think of the array business. The fact of the matter is that the EMC (and EMC competitor) arrays people buy are REALLY software delivering value on top of commodity componentry (disk drives from Western Digital, Seagate, Samsung). Our view is that the same thing will happen in the PCIe-based flash industry too.
- Some use cases require something specific. For example, at the moment of GA, LSI’s vSphere core vmkernel driver stack is one of the most mature, so we are leveraging it for customers who are deploying on vSphere immediately. It is important to note that this is transient.
The VFCache project is fundamentally about host-side software – which in this first iteration is all about being an extremely robust, extremely efficient bit of logic in the IO stack of the host OS. There’s an interesting back story here, BTW: the project had its early start within the EMC PowerPath team (note: VFCache does NOT depend on using PowerPath).
Consider: EMC has an unbelievably deep heritage here – more than a decade in the IO path on the host side at a ton of customers. As anyone in IT knows, kernel-mode filter drivers have to be very mature. Have a bad day there, and it’s a BAD DAY. Furthermore, when it comes to caching, think of the uncountable IOs prioritized, cached, and served by EMC over the last few decades. All of that know-how and IP is applied here. It also translates to broad “day 1” server host support.
VERSION 1 NOTE: on Day 1 there isn’t blade system support. This is commonly asked about re: UCS B-series systems and Vblocks. Mezzanine form factor VFCache cards are coming soon.
Between the efficient, heavily multi-threaded software and the best-of-breed hardware, our view is that we can offer lower latency, more throughput, and dramatically lower CPU utilization than other solutions on the market.
The code is designed to operate either as a read cache (the primary use case) or as DAS (for things like log files) – and can also operate in a “split” mode, with part of the card used as cache and part as DAS.
In the VMware use case (and this is something we demoed at VMworld), the next major EMC Virtual Storage Integrator (VSI) plugin update shows VM-level statistics, and makes enabling and disabling VFCache on a VM-by-VM basis simpler.
VERSION 1 NOTE: as of GA, enabling VFCache on a VM links it to a resource that is not shared and “looks” like DAS at the VMkernel level. This means it will disable any use of vMotion for that VM. As a result, its VMware use case is, IMO, really focused on the most performance-centric VMs only – where the loss of vMotion flexibility is an acceptable trade-off. Read on in the “What’s next” section for more info.
Architecturally, VFCache v1.0, when operating as a cache, is always a write-through cache – which has a huge incremental benefit when coupled with the array-side effects of EMC’s technologies (which can help with write latency).
You can expect to see even more leverage between VFCache and EMC’s array technologies in the future (see the “What’s next” section below).
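To make “write-through” concrete, here’s a minimal sketch of the behavior described above (my own illustration of the general technique – NOT the actual VFCache driver code, and the class names are invented): reads are served from the card when possible and populated on a miss, while every write goes to the array synchronously, so the card can never hold newer data than the array.

```python
class InMemoryArray:
    """Stand-in for the shared back-end array (the source of truth)."""
    def __init__(self):
        self.blocks = {}
    def read(self, lba):
        return self.blocks.get(lba, b"\x00" * 512)
    def write(self, lba, data):
        self.blocks[lba] = data


class WriteThroughCache:
    """Minimal sketch of a host-side write-through read cache.
    Illustrative only -- not the actual VFCache driver logic."""
    def __init__(self, array):
        self.array = array   # acknowledged writes always land here first
        self.cache = {}      # stands in for blocks held on the PCIe card

    def read(self, lba):
        if lba in self.cache:            # hit: served at card latency
            return self.cache[lba]
        data = self.array.read(lba)      # miss: go to the array
        self.cache[lba] = data           # populate for future reads
        return data

    def write(self, lba, data):
        self.array.write(lba, data)      # write-through: the array acks the
                                         # write, so nothing acknowledged can
                                         # be lost if the card or host dies
        self.cache[lba] = data           # keep the cached copy consistent


# Example: after the first read or write, reads are served from the cache copy.
host_cache = WriteThroughCache(InMemoryArray())
host_cache.write(42, b"hello")
assert host_cache.read(42) == b"hello"
```

The key property is that every write is acknowledged by the array before it completes, which is why a card (or server) failure can’t lose acknowledged data – and why pairing this with array-side write caching matters so much for write latency.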
So – how does it perform in the real world? Well…
First, in the words of a customer:
“Putting Flash in the server was an easy decision for us. However, we need more than just crazy fast I/O. Our data is the lifeblood of our operations. That is why we chose EMC VFCache for our Oracle environment, which complements our Symmetrix VMAX storage running FAST VP. With EMC we get crazy fast I/O coupled with EMC’s trusted networked storage. There is no other solution in the market today that comes close to offering VFCache’s comprehensive performance, intelligence, and protection.” -Frank W. Smith, Senior IT Infrastructure Manager at PPG Industries
Second, from the performance engineering teams, there are several great whitepapers showing testing results (I will update with links as they post – which will be shortly). For Oracle, we showed that VFCache delivered 60% better response times and 310% better overall transactions per minute.
What about other apps? We had our friends at Cisco send us a honkin’ UCS C-series host, put a VNX5300 behind it, and used a 750GB database. Adding VFCache to the host delivered 87% better response times and 360% better overall app performance (TPM).
What’s next?
Like any v1.0 product, the engineering teams had a laser focus on hitting the main use case – which here is an extremely high-performance, very robust read cache implementation (with optional DAS and split-card use cases), using a write-through approach (which can be coupled with EMC arrays to help with write performance).
But – there’s a very robust roadmap – here are some of the highlights we’re working on:
- De-Duped Cache (applying core Avamar, Data Domain, RP and other EMC IP to get more leverage out of the server-side flash resource)
- Enhanced Array Integration: Hinting, Tagging, Pre-Fetch (this is a whole whackload of stuff rooted in the fact that there is code in the IO path that can be leveraged: apps hinting to VFCache about content they particularly want; VFCache tagging content it would like pinned in array cache to accelerate re-hydration; write-back cache models; and even more aggressive pre-fetch models)
- Distributed Cache Coherency For Active-Active Clustered Environments (imagine using pieces of the VPLEX IP that provides cache coherency across VPLEX nodes to add cache coherency across multiple hosts – important for Oracle RAC and clustered VMware use cases; a rough sketch of the idea follows this list)
- VMAX & VNX Management Integration (Expect to see host-level VFCache information in Unisphere and SMC soon!)
- Larger Capacities (Expect to see 1TB and more configs)
- MLC Flash (While MLC is still not the right choice for the broad use cases, there’s no doubt we’re keeping VERY close to this – remember those flash investments I mentioned earlier?)
- New Form Factors (Mezzanine card form factors front and center).
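On the distributed cache coherency item above, here’s a very rough sketch of the general invalidate-on-write idea (my own illustration of the concept – not VPLEX or VFCache code, and the class is invented; `array` is any object with `read`/`write` methods, like the stub in the earlier sketch): when any host writes a block of a shared LUN, its peers drop any cached copy of that block so no host can serve stale data.

```python
class CoherentHostCache:
    """Rough sketch of invalidate-on-write coherency across host caches.
    Illustrative only -- not how VPLEX or VFCache actually implement it."""

    def __init__(self, name, array):
        self.name = name
        self.array = array       # shared back-end LUN (source of truth)
        self.cache = {}          # this host's local flash cache
        self.peers = []          # other hosts caching the same LUN

    def read(self, lba):
        if lba in self.cache:
            return self.cache[lba]
        data = self.array.read(lba)
        self.cache[lba] = data
        return data

    def write(self, lba, data):
        self.array.write(lba, data)     # still write-through to the array
        self.cache[lba] = data
        for peer in self.peers:         # tell every other host that its copy
            peer.invalidate(lba)        # of this block is now stale

    def invalidate(self, lba):
        self.cache.pop(lba, None)       # next read re-fetches fresh data
```

In a real implementation the invalidation messages would travel over a low-latency cluster interconnect rather than direct method calls, but the net effect is the same: the local caches stay coherent without having to mirror each other’s contents.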
So – what’s missing?
Even when we deliver cache coherency across the VFCache cards in multiple hosts (important for clustered use cases), they still won’t be a shared, pooled resource – which is increasingly how all of IT wants to consume things. The downside of server-side PCIe-based flash cache designs is that they are completely captive (the flip side of EXTREME locality of data). If you wanted to deploy more cache into a host, you would need to… well… pop open the host and slam in another PCIe-based flash card.
EMC’s answer is the next “secret” project to be publicly outed: Project Thunder.
Imagine a 2U/4U appliance with…
- a shared, scale-out deployment model
- could scale to many terabytes of PCIe flash
- supporting 10s of GB/sec of bandwidth
- delivering millions of real-world IOPs
- low latency @ load
- optimized RDMA host data path
- deep VFCache integration
As you can see – there’s always more fun coming from the walls of EMC. We’re never standing still – because our customers demand we don’t sit still.
In the end – in my opinion (and clearly I’m biased) – I think all storage vendors will need to offer a comprehensive set of answers to the disruption that is flash – from the array, through networked server flash, to server PCIe flash – and do it in a way that gets integration and leverage from end to end. Some may choose different routes, but all will need to adapt to the continuing changes in the marketplace.
Would LOVE your feedback… Is this something you think is right/wrong? Is it something you see a need for in your shop? What do you think others will say (will be interesting) and do (even more interesting) in response to this?
Hi Chad,
When do you expect vMotion and shared VMFS support?
Also I was hoping that there would be a use case for VMware View - can you comment?
From what I have seen of pricing combined with the 1.0 limitations it looks like it will be very niche in the short-term.
Hopefully as the year progresses most of the limitations will be removed and we will see low cost MLC based cards.
I was hoping that every VNX level customer would want VFCache, but clearly initially this will not be the case.
Just my thoughts.
Best regards
Mark
Posted by: Mark Burgess | February 06, 2012 at 04:09 PM
Great post Chad!
Posted by: Keith | February 06, 2012 at 08:31 PM
I find it curiously ironic that EMC chose Lightning and Thunder as project names for these products. Lightning and Thunder were the names for the Hitachi high-end and mid-range modular products that transformed Hitachi Data Systems to a storage company and an EMC competitor, as seen in the press releases below:
http://www.hds.com/corporate/press-analyst-center/press-releases/2000/gl000626c.html
http://www.hds.com/corporate/press-analyst-center/press-releases/2001/gl010123.html
I know this history well, as I named the Hitachi products.
FYI: I left HDS in 2006. Prior to that I worked at Data General's AViiON and CLARiiON divisions.
Posted by: Carlos Soares | February 07, 2012 at 12:32 PM
Using this in conjunction with FAST is going to be mind blowing...
Posted by: Gary | February 09, 2012 at 02:20 PM
I think that VFCache is an amazing project and most customers will be interested.
But I also agree with Mark. No clusters, no DRS, no automatic vMotion – these are strict limitations of the first VFCache release.
Let's see.
Umberto.
Posted by: Umberto | February 15, 2012 at 08:52 AM
Hi Chad - great blog. I get to talk to a lot of experienced storage people at some quite large customers and something curious is coming out of all this. The products vary a bit so I will just focus on the VNX. If you count FAST Cache as a storage tier and then add SSD, SAS and NL-SAS we have 4 tiers of storage here. Factor in 10K vs. 15K drives and you could call that five tiers. With VFCache we now have six. Up until only a couple of years ago storage was FC or SATA. The quick applications went on the FC and everything else wound up on the SATA. Many of the people to whom I speak are at a loss as to how to manage the new model and I've been asked more than once "what is everybody else doing?". The technology is accelerating away from the technologists and customer techies are really struggling to keep up. To help we are working on improving our own and our partners' consultative, application-focused skills and use of tools to get levels of application information that were irrelevant before. We live in interesting times.
Posted by: Dennis Ryan | February 16, 2012 at 05:03 AM
Wow. Vfcache is a flawed and weak design. Its an SSD on a PCIe card. Rubbish. Lots of data loss exposure. No wonder best practice is to RAID them like hard drives because that's what this product emulates. While your implementation use cases are cool, they are supported by bad technology. VMWare is best of breed, pity emc storage isn't anymore.
Posted by: Mike Galford | March 29, 2012 at 09:18 AM
VFCache provides a lot of read-only cache capacity. However, for the database use case, the same could be achieved with a lot of RAM. The latest server hardware supports up to 4TB of RAM that could be dedicated for read-only caching (via database memory partitioning). As we know RAM is quite a bit faster than Flash PCIe based cache. Moreover, the price point per GB for RAM and VFCache is on par, with RAM actually being cheaper in the 1-2TB server capacity range. What we are struggling with is to understand the VFCache v1.0 business case? If I can get the same amount of RAM for the same or lower price than Flash PCIe-based cache (and RAM being so much faster with lower latency), what is the main use case for VFCache? I would really appreciate your input.
Posted by: Arkadiusz | May 14, 2012 at 09:30 AM
Great post Chad, really like the way EMC's flash story is panning out. Tons of options and lots of power when used in combination = customer win! Great things coming out of EMC lately, both internal dev and acquisition wise.
Joe
Posted by: Joe Onisick | June 05, 2012 at 10:08 PM