[UPDATE: 5/7/2016 – 9:30am ET] – general updates and corrections.
One thing I really love about EMC is that everywhere you look, there are secret projects, innovative efforts being forked off, efforts to disrupt ourselves, and investments and acquisitions always in flight.
I take pride in trying to keep a mental map of everything going on, and staying as dialed in as I can be. But sometimes, I totally miss something.
That’s not a bummer, because when that inevitable moment of discovery comes – instead of discovering something that is half-baked, I discover something that is nearly done. UPDATE: by “done” what I meant here is that unlike a “from a zero start” project or an investment in a new startup, this is something that has dates and targets, with “first customers” measured in months, not years. Note that later in the article the release date we’re putting out is 2017.
That’s the case with Project Nitro. We were in a senior staff meeting in April when, WHAMMO, I saw this thing for the first time (not much earlier than the world at large found out!) – now THAT’S a pleasant surprise!
First – understand the use case… There are several extreme-performance NAS markets – Electronic Design Automation (EDA) is one, media/CGI is another, HPC is another, and some analytics use cases are another.
When I say “extreme” performance – it’s a case of “as much as possible please”.
Project Nitro aims to tackle that – and we think it will smoke anything on the market (including emerging players’ efforts that are still NOT on the market).
This is really, really facemelting.
Project Nitro is several things coming together:
- A new bladed architecture for Isilon (you won’t see it without the faceplates or turned around). This is designed for extreme flash density. How much? Think 200TB with the 3.2TB SSDs and 900TB with the 15TB SSDs… in 4U nodes. Like everything else EMC does – we are pushing ourselves to be right on the edge of NAND/SSD/NVMe (and NGNVM of several forms) – and Nitro is planning to be right on the edge. TONS of flash. And… TONS of bandwidth – each node would have 8 x 40GbE front-end interfaces and 8 x 40GbE back-end interfaces (see the quick arithmetic sketch after this list). UPDATE: many people have scratched their heads at this – note that nowhere have I explicitly stated the relationship between blades and nodes (how many blades per node). That’s intentional. There’s lots of time before GA, and through that time, more will become evident. It’s not uncommon for some details to be left blank (sometimes to keep cards close to one’s chest, sometimes because there are still likely variations in the plan). With Project Nitro we’re keeping some blade details back. In similar pre-GA statements from EMC – and from almost everyone – some details are kept back.
- A re-architected OneFS stack focused on all-flash use cases. How fast? Think 15GB/s per node, 250,000 IOPS per node. Much lower latency than what people expect from OneFS. What are we talking about latency-wise? Think about a 10x improvement relative to OneFS today (which is usually in the 5-10ms band).
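To make the headline numbers a bit more concrete, here’s a quick back-of-the-envelope sketch. The implied drive counts and the line-rate math are just arithmetic on the figures above – not official configuration details.

```python
# Back-of-the-envelope arithmetic on the quoted per-node numbers.
# Illustrative only -- implied drive counts are inferences, not official specs.

quoted_capacity_tb = {"3.2TB SSD": 200, "15TB SSD": 900}
ssd_size_tb = {"3.2TB SSD": 3.2, "15TB SSD": 15}

for drive, capacity in quoted_capacity_tb.items():
    implied_drives = capacity / ssd_size_tb[drive]
    print(f"{drive}: ~{implied_drives:.0f} drives implied per 4U node")

# Front-end line rate per node: 8 x 40GbE (ignoring protocol overhead).
front_end_gbps = 8 * 40                 # 320 Gb/s
front_end_gBps = front_end_gbps / 8     # ~40 GB/s
print(f"Front-end line rate ~{front_end_gBps:.0f} GB/s vs the quoted 15 GB/s per node")
# => at 15 GB/s per node, the 8 x 40GbE front end is not the bottleneck.
```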
Of course, if you compare the density and performance stats to something insane like DSSD, they seem a little pedestrian. But that’s missing one important point: Nitro will have all of the Scale-Out NAS awesomeness that is in Isilon. So… if you have 400 of these bad boys, you get something like this…
We’ve been engaging with customers in those vertical markets – and they are STOKED.
What’s particularly important is that many of them use Isilon today, and love it… but would really love an “extreme performance pool” they could snap in – while still enjoying all the things they love about Isilon.
Let me repeat that:
- Mature scale-out NAS stack (it takes YEARS to make scale-out NAS stacks solid) – and not scaling out to tens of nodes, but to hundreds of nodes.
- All the features like rich snapshots, SyncIQ, Cloud Pools, and more…
- Multi-protocol (SMB 3.x, NFS v3/v4, HDFS, Object interfaces) transparently – not just NFS.
… all things they LOVE, now with facemelting, record-setting performance for these use cases.
For fun, we looked at the stats relative to what’s likely going to be positioned as Nitro’s primary competition (there aren’t too many flash-optimized, bladed NAS offerings targeted at EDA, media, and HPC :-)
Now – neither of these two are generally available yet – so time will tell. We’re aiming for Nitro to be generally and broadly available in 2017. If you want more detail sooner – reach out to your EMC Isilon Specialist!
Chad,
please help me out with the math: if 400 nodes produce 1.5 TB/s, it means each contributes only 3.75 GB/s – how does that map to the 15 GB/s number?
Each 40Gb/s link is 5 GB/s, so assuming 15 GB/s per box, the 8/16 wires are significantly over-subscribed – and if it’s 3.75 GB/s, that’s even more over-subscribed. BTW, for a box that would GA in 2017, I suggest looking at 50/100GbE as an option.
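Spelling out the numbers I’m working from (rough line-rate arithmetic, ignoring protocol overhead – the 1.5 TB/s figure is from the keynote slide):

```python
# Rough line-rate arithmetic behind the question above (illustrative only).

cluster_tbps = 1.5                     # ~1.5 TB/s aggregate quoted for ~400 nodes
nodes = 400
per_node_gBps = cluster_tbps * 1000 / nodes
print(per_node_gBps)                   # 3.75 GB/s per node at that aggregate

link_gBps = 40 / 8                     # one 40GbE link ~ 5 GB/s line rate
front_end_gBps = 8 * link_gBps         # 8 front-end links ~ 40 GB/s per node
print(front_end_gBps)
# => the wires carry far more raw bandwidth than either 15 GB/s or 3.75 GB/s,
#    hence the question about where the per-node limit actually sits.
```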
Re the IOPS: the S210 model (out in 2014) does ~100K IOPS, so 250K IOPS is sort of aligned with Moore’s law. I believe new all-flash solutions should aim at >1M IOPS, and I know it’s possible even for file & object (if you have the right architecture).
Yaron
http://SDSBlog.com
Posted by: yaron haviv | May 05, 2016 at 12:40 PM
As I walked into the keynote yesterday, I said to a colleague "you know, the only primary storage product that isn't all-flash is Isilon, I wonder when that will happen?" Then 14 minutes later it did...
Posted by: David Holmes | May 05, 2016 at 06:30 PM
After reading the complete BS of a premise of this post - Chad Sakac being unaware of a "new bladed Isilon architecture" and a "complete OneFS re-architecture for Flash", I couldn't read on and take the rest of this post seriously.
Things sure have gone down the toilet at EMC if the President of the technical Presales organisation and the President of VCE is unaware of this "facemelting" new project in the works. And if you were, then why say you weren't??
Sad.
Posted by: TheDude | May 05, 2016 at 10:36 PM
@yaron - thanks for the question! The final specifications won't be landed until we GA. From my understanding, some of the limits are indeed associated with the switching fabric. And yes, you can count on the fact that we will be evaluating the state of the state with 100GbE at that time. I would also expect the node performance specs to go up. The main point here isn't node IOPS (if all you need is IOPS, you would likely use a block target)... the main point is the very low latency (and the high file ops/s).
@David - glad you dug it!
@TheDude - ah, internet trolls :-) Hiding in anonymity and casting aspersions. The fact that I didn't know about a secret project inside the company? That's not BS - it's a fact. Personally, if you were in my shoes, you'd know that it sounds MORE crazy to claim that I track all the stuff that happens inside EMC, VMware, and Pivotal - there's a LOT. You know what I think is sad? Your Debbie-downer attitude. Have a GREAT day!
Posted by: Chad Sakac | May 06, 2016 at 09:37 AM
"...instead of discovering something that is half-baked, I discover something that is nearly done."
"(including emerging players efforts that are still NOT on the market)."
"Now – neither of these two are generally available yet...We’re aiming for... 2017."
Someone proofread this, right?
Hilarious.
Should be writing for comedy or a Presidential candidate.
Discs aren't the only things still spinning at EMC.
Posted by: Peter | May 06, 2016 at 02:12 PM
Chad,
thanks for the clarification. As a hardware guy you would know that producing 12 GB/s on a distributed storage node requires at least 40 PCIe lanes – enough to pretty much max out Intel dual-socket capabilities.
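As a rough sketch of where that number comes from (PCIe Gen3, and assuming the same bytes cross the node more than once):

```python
# Rough PCIe lane math behind the "40 lanes" remark (a sketch, not a spec).

gen3_lane_gBps = 8 * 128 / 130 / 8   # ~0.985 GB/s usable per Gen3 lane, per direction
target_gBps = 12
lanes_per_pass = target_gBps / gen3_lane_gBps
print(lanes_per_pass)                # ~12-13 lanes just to move 12 GB/s once

# The same payload crosses the node several times (front-end NIC, back-end NIC,
# SSD/NVMe media), so the practical lane budget is a multiple of that.
```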
Today, with the winds shifting to distributed cloud-native architectures, big data, IoT, etc., block is becoming somewhat irrelevant, and app developers need files, objects, and NoSQL. So we have to provide storage that is both faster and higher-level. I can see a bunch of IoT scenarios that easily drive millions of IOPS on small files/objects, so we can't rely on block for the performance part. Same for latency.
It is possible to drive millions of IOPS and bare-metal latency for the upper-layer abstractions – you do some of it in DSSD – but that is not something you can do with a 10-year-old stack; it requires a complete redesign.
You can read about the fundamental SW/HW principles to get there in:
http://sdsblog.com/wanted-a-faster-storage-stack/
Yaron
Posted by: yaron haviv | May 07, 2016 at 03:19 AM
Yaron - I read your blog, thanks for adding to the dialog. I've also updated the post (intentionally, we're keeping some of the node/blade relationship and blade details to ourselves).
Personally, I think you might (?) underestimate the difficulty in the upper levels of the stack, and your observation in your post that no one seems to have cracked it all might be rooted in "it's harder than you think". For example, a team of smart folks have been cranking on it in DSSD (which is, after all, a complete redesign and a native object storage model) - but note that it doesn't have the data services or scale-out models that many of these HPC/EDA use cases want.
Suggestion - you have a passion for this. Consider giving it a shot! If you do succeed in doing this in a software stack, you'll make a fortune and remake this part of the world. If you gather together an engineering team and build a prototype, I'm happy to help connect you with the VC community (at least those I know).
Posted by: Chad Sakac | May 07, 2016 at 10:22 AM
Thanks for the suggestion – keep an eye on what iguaz.io will announce in the not-too-distant future, you will be amazed :)
Indeed, this is far from trivial and requires an exceptional team ...
Yaron
Posted by: yaron haviv | May 08, 2016 at 12:58 PM
@Chad,
hey Chad, long time, no talk! (it's Dave Graham from back in the Atmos days...lol). Nice article.
@Yaron,
so, as a hardware guy, I think you're missing a few key pieces.
a.) PLX PCIe switches. 40 lanes of PCIe Gen3 from Intel Xeon E5s can easily be split into n-number of locally switched or orthogonal lanes within the complex of a 4RU box. As someone who was privy to some internal EMC architectures that have more recently come to market, this makes complete and logical sense. At some point, however, yes, you're contextually over-subscribed on your internal/external links for bandwidth.
b.) One thing that's always been curious to me (and is really a relic of the days when SM and Mellanox ruled the inner fabric bus of Isilon) is the relatively pervasive use of IB as a transport bus, given the older 8b/10b encoding schema. Moving to FDR/EDR obviously changes the encoding overhead (iirc 64b/66b) and allows for better utilization of bandwidth, but with no specific tunneling offload present in Xeons...oy. This again points to an ASIC-based approach to handle front-end connectivity (more than likely Mellanox, though I know EMC has been rather loath to use their technology in a directly correlated box). Rough numbers on the encoding point are in the sketch after this list.
c.) Intel Omniswitch/Omnifabric. This makes the most sense as an INNER node fabric technology. EMC has a significant vested interest in Intel (and vice-versa) and their technology curves, and with something that keeps a proprietary edge (though it can be commodity-built, as evidenced by SM building Omnifabric switches) while looking like 100Gbps EDR InfiniBand but combining the simplicity of a switched transport layer like Ethernet... well, you get where I'm going. Omnifabric has the capability of offload (e.g. through a many-core solution like Knights Landing) or through DMA-style access to QPI/ring-bus technologies.
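Putting numbers on the encoding point in (b) – these are just the standard InfiniBand 4x link rates, nothing Nitro-specific:

```python
# Effective bandwidth vs. signalling rate for the encodings mentioned in (b).
# Standard InfiniBand 4x link rates -- nothing Nitro-specific here.

def effective_gbps(signal_gbps, payload_bits, total_bits):
    return signal_gbps * payload_bits / total_bits

qdr = effective_gbps(4 * 10.0, 8, 10)        # QDR, 8b/10b  -> 32 Gb/s of 40
fdr = effective_gbps(4 * 14.0625, 64, 66)    # FDR, 64b/66b -> ~54.5 Gb/s of 56.25
edr = effective_gbps(4 * 25.78125, 64, 66)   # EDR, 64b/66b -> ~100 Gb/s of ~103

print(f"QDR ~{qdr:.1f} Gb/s, FDR ~{fdr:.1f} Gb/s, EDR ~{edr:.1f} Gb/s effective")
# 8b/10b burns 20% of the raw signalling rate; 64b/66b burns about 3%.
```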
I like the sound of Nitro and it'd be an interesting application of some "tools" that are lying around in the hardware/software space. Heck, if OneFS got more GPFS-like, that wouldn't be bad either...rumour has it, Lustre is getting there. :P
cheers,
D
Posted by: Dave Graham | May 09, 2016 at 07:24 AM
Dave,
I consider myself a software guy who knows a lot about HW :)
For the 12 GB/s math: you need 40 lanes without over-subscription just for the networking side – 16x for the front end and 24x for the back end, assuming they use erasure coding (not that I think it's realistic to expect 12 GB/s of erasure coding without HW offload, nor do I think they know how to process TCP/IP at those rates) – and 32x for 2-replica. If you have SATA/NVMe, that's more lanes still; you need at least 16x of those, which may be split using PLX.
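Adding my assumptions up (a rough sketch of the lane budget – obviously not EMC's actual design):

```python
# Adding up the lane budget above (my assumptions, not EMC's actual design).
# Figure ~1 GB/s usable per PCIe Gen3 lane, per direction.

front_end = 16           # e.g. two dual-port 40GbE NICs at x8 each
back_end_erasure = 24    # erasure-coded writes fan extra traffic out the back end
back_end_2replica = 32   # 2-replica writes push ~2x the payload to peer nodes
media = 16               # NVMe/SATA attach, possibly split behind a PLX switch

dual_socket_e5 = 2 * 40  # 40 Gen3 lanes per Xeon E5 socket

print(front_end + back_end_erasure + media)   # 56 lanes with erasure coding
print(front_end + back_end_2replica + media)  # 64 lanes with 2-replica
print(dual_socket_e5)                         # 80 lanes available in a dual-socket box
```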
Knowing Intel's Omni-Path schedule and EMC's test cycles (and the OneFS clustering stack), it's not practical to expect EMC GA in 2017 for an Omni-Path-based product.
So my guess is Nitro has 2-4 blades per node, which helps sort out the math.
BTW, Mellanox 50/100GbE NICs would make a better choice than IB IMO, and the new model has a line-rate erasure-code engine I helped to design :)
Yaron
Posted by: yaron haviv | May 09, 2016 at 03:37 PM