I won’t “bury the lead” here. The big news is that ScaleIO is available for frictionless and free unlimited download and use.
I’ll say that again:
- The industry's BEST,
- Most PERFORMANT,
- Most SCALABLE,
- Most BROADLY APPLICABLE,
- Most FEATURE RICH,
- OPEN SDS transactional storage stack…
… is available for free and frictionless unlimited download and use.
Wow.
Why “wow”?
ScaleIO is perhaps the most disruptive technology in the EMC arsenal in the sense that it is capable of out-performing and out-scaling everything out there (including a lot of our own other stuff), while doing it with a totally different economic model (software, bring your own hardware – and grow elastically in small increments). That makes it disruptive in the sense that it is not only innovative, but disrupts the giant “base” of the storage ecosystem.
Further, while it’s not the only SDS data plane we have – ScaleIO can cannibalize our (and the industry’s) cash cows: the transactional block storage market that serves broad, general use cases. Conversely, things like ECS Software (an SDS stack that covers Object/HDFS) are for the explosion of new Platform 3 use cases and new unstructured use cases.
Hmmm… Maybe we can “protect” the business by isolating it to specific workloads… Let’s see if that works:
ScaleIO – does it support vSphere? Yup. Hyper-V, KVM, OpenStack Cinder integration? Yup. Linux general use (Red Hat, SLES, CentOS, and soon Ubuntu LTS releases and CoreOS)? Yup. Oracle and other databases? Yup – and with a level of performance that will blow your mind. You want fully hyper-converged? Yup. Want a 2-layer SDS (a pool of storage/IOps on dense rackmount servers consumed by blades that scale independently)? Yup.
Making something so awesome, so potentially disruptive available freely and frictionlessly is a BOLD, and TERRIFYING move – but that’s what makes it awesome.
It’s also a reflection of a clear understanding of our emerging intellectual property: if the only way to get to know it is to talk to salespeople, come to briefings, watch marketing, and then sign a big cheque… well, then customers will frankly tend to select inferior stacks that are available in a frictionless and easy way.
And hey – while this has risk for us as EMC – it’s not like we’re flying by the seat of our pants :-) We are THAT SURE the core intellectual property in ScaleIO is awesome that you will buy it.
Since (of course) the free version has no formal support (only best effort community support – just like Fedora/RHEL) – we’re confident you will fall in love with ScaleIO and want to buy it with full enterprise support. And BTW – don’t be stupid – I personally would never deploy something in production, any software, anything open source – without a clear support model :-)
Right now, I’ll be more black and white than I am normally (and bend/break Presales Manifesto principle #5: never go negative):
- ScaleIO smokes Ceph for transactional use cases in every dimension: ease of use, performance, latency, failure behaviors.
- ScaleIO crushes NDFS (storage stack in Nutanix) on scale-out performance at scale, overall scalability, latency, resilience.
- ScaleIO demolishes the performance of Oracle’s own Exadata results.
- What about for VMware use cases? It’s true that nothing is simpler and more integrated with vSphere than VSAN when it comes to vSphere-only use cases. VSAN limits (cluster sizes, workload support) and design center (data locality for example) make sense and aren’t really limits if every workload you have or ever will have is a VM on vSphere. But… ScaleIO’s performance and scaling is in a different universe. If you want an open SDS layer (inclusive of, but not limited to vSphere), one with rich QoS, rich partitioning/multi-tenancy functions, and something that scales to infinity and beyond… you want ScaleIO.
Don’t believe me?
Read on – I’ll share the data. I’ll even list out some of the weaknesses! Want benchmarks? We’ve got ‘em! Think a million IOps is hot? How about a 128-node cluster doing about 31,000,000 IOps (!). Think your Oracle 12c environment is fast? How about an 8-node UCS B-series config doing 20GBps of bandwidth (!), nearly 1M IOps, and a ~700 us IOWAIT time.
Not interested in reading on? Skeptical about bold performance and scaling claims? Fine :-) Download and try for yourself.
On May 29th, the ScaleIO 1.32 bits will be available for download here. The ScaleIO community is here. Have at it.
I **CAN’T WAIT** to see what people do (please share – good/bad/ugly!). A great EMC SE (Matt Cowger) in the past took the ScaleIO bits – and ran it on hundreds of AWS EC2 instances to see what it could do (he didn’t buy the “scales to infinity” claim). What will YOU do?
I’m generally not a fan of marketing videos, but this one does a good job of summarizing how awesomely facemelting ScaleIO really is, and does it to cool dramatic music :-) If you’re not a reader (and don’t want the detailed scoop) watch the video and download the bits on 5/29.
Conversely, if you’re the type of person that wants to learn more, access the whitepapers, see some awesome demos, understand how it works (and its weaknesses – nothing is perfect), and also get insight into what’s next (we talked about ScaleIO 2.0 today at EMC World) – read on!
At its core – the magic of ScaleIO is its simplicity. There’s a core design principle that says only things that are simple scale (complexity builds on complexity – and you end up in a bad place). ScaleIO is very simple.
Let me give examples of what I’m talking about. I’ll do it with a picture of how ScaleIO works (and for virtual geek followers – ScaleIO is a “Type III loosely coupled cluster” persistence model).
At its core – there are two simple components – the SDS (Server) and SDC (client).
The SDS consumes any local HDDs, SSDs (persistence/read/write cache), or PCIe NAND devices (persistence/read/write cache) you give it, and can also leverage local host RAM as a read cache.
You can specify as little or as much of any of the server resources as you want. Heck, nodes don’t even need to be symmetrical in any way.
The SDC is a lightweight client that communicates with any number of SDS nodes across an Ethernet or IB network via a proprietary protocol (not iSCSI) designed to be super-lightweight.
There’s a 3rd component which is the management cluster (which can run on any nodes) – which is used to manage and resolve split-brain states of the cluster. The Management Cluster is NOT in the IO path. In other SDS scale-out architectures, there is sometimes a “centralized mapper” – that’s not good for scaling. ScaleIO has a completely distributed map model (each SDC knows where its data is).
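To make the distributed-map idea concrete, here is a minimal, hypothetical Python sketch. It is not ScaleIO code, and not its actual placement algorithm or wire protocol; it just shows the principle that every client can compute (or cache) where each volume slice lives, so per-IO lookups never touch a central metadata service.

```python
# Hypothetical illustration only - not ScaleIO code or its real algorithm.
# The point: each client resolves "which SDS holds this piece of the volume"
# locally and deterministically, so there is no central mapper in the IO path.
import hashlib

class ClientSideMap:
    """Sketch of an SDC-style lookup: volume offset -> (primary, mirror) SDS."""

    def __init__(self, nodes, slice_size_mb=1):
        assert len(nodes) >= 2, "need at least two SDS nodes for mirroring"
        self.nodes = list(nodes)              # SDS endpoints known to the client
        self.slice_size = slice_size_mb * 1024 * 1024

    def locate(self, volume_id, volume_offset):
        # Which fixed-size slice does this offset fall into?
        slice_idx = volume_offset // self.slice_size
        # Deterministic pseudo-random placement: every client computes the
        # same answer, so no lookup service is consulted per IO.
        digest = hashlib.sha1(f"{volume_id}:{slice_idx}".encode()).digest()
        primary = int.from_bytes(digest[:4], "big") % len(self.nodes)
        step = 1 + int.from_bytes(digest[4:8], "big") % (len(self.nodes) - 1)
        mirror = (primary + step) % len(self.nodes)   # always a different node
        return self.nodes[primary], self.nodes[mirror]

# Example: a client resolving an IO at offset 10 GiB on a 4-node cluster.
sdc_map = ClientSideMap(["sds-1", "sds-2", "sds-3", "sds-4"])
print(sdc_map.locate("vol-001", 10 * 1024**3))
```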
ScaleIO can be deployed in a myriad of ways – with compute and storage together, completely separate, or anything in between.
Further, I’ll do something that happens so rarely – enumeration of architectural strengths and weaknesses.
| Detail | Strength | Weakness |
| --- | --- | --- |
| ScaleIO has no coupling between the client (SDC) and the server (SDS). | | |
| ScaleIO has a very simple random data distribution model with no data locality. ScaleIO slices each volume into many medium-sized slices and shotguns them, mirrored, across the whole cluster (see the sketch after this table). | | |
| ScaleIO has a very, very lightweight client. There is always a single hop between the SDC and SDS. | | |
| ScaleIO is a loosely coupled SDS stack – and scales to hundreds/thousands of nodes. | | |
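As a concrete illustration of the second row (random distribution of mirrored slices, no locality), here is a small hypothetical Python sketch. It is not ScaleIO's real placement logic; it simply shows that when slices are shotgunned across the cluster, the surviving copies of a failed node's data are spread over nearly every remaining node, which is why rebuilds can proceed in parallel rather than hammering a single partner.

```python
# Hypothetical sketch (not ScaleIO's placement logic): spread many mirrored
# slices randomly across a cluster, fail one node, and count how many of the
# surviving nodes hold copies needed for the rebuild.
import random

NODES = [f"sds-{i}" for i in range(16)]
SLICES = 10_000                                  # slices in one volume

# Each slice gets a primary and a mirror on two distinct, randomly chosen nodes.
placement = {s: tuple(random.sample(NODES, 2)) for s in range(SLICES)}

failed = "sds-0"
# For every slice that had a copy on the failed node, the node holding the
# surviving copy becomes a rebuild source.
rebuild_sources = [p if m == failed else m
                   for p, m in placement.values()
                   if failed in (p, m)]

print(f"slices to re-protect            : {len(rebuild_sources)}")
print(f"nodes serving as rebuild sources: "
      f"{len(set(rebuild_sources))} of {len(NODES) - 1} survivors")
```

With thousands of slices and no locality, essentially every surviving node ends up contributing a small share of the rebuild, and the new copies can likewise be written to many targets at once.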
If you’re still with me, you should be thinking: “this thing sounds awesome”. It is. And you should be thinking “I don’t buy it…”. Don’t just listen to me – download and give it a shot :-)
Want more info? How about some benchmarks? Here’s some data from some of the EMC World breakout sessions.
The first question is: what performance can I expect from each SDS participating?
The answer depends of course on the host configuration.
Here, we showed an SDS node that was a UCS C240M3 (a pretty modern, solid server, but nothing exotic).
The “NULL” device eliminates the storage subsystem as a variable – and therefore gives you the upper limit for that particular host – in this case, around 200K reads/sec and 90K writes/sec.
Now – of course – the null device example above is the theoretical maximum for that particular host. The actual throughput will be a function of the IO devices themselves – whether that’s PCIe flash devices, SSDs, or HDDs. An AFA has an IO stack that presumes ALL IOs will land on NAND, with designs that minimize write amplification and garbage collection and carefully handle wear levelling. BTW – this is a quick test to see whether you have an AFA or an IO stack designed to be a hybrid that masquerades as an AFA simply by being configured with all SSDs. ScaleIO can be configured with gobs of NAND (and will be a rocket), but it is not an AFA. In those cases, you need to select your NAND devices carefully (some are surprisingly poor at writes), and you need to consider wear. ScaleIO will protect you of course (because all the data is distributed) – but this highlights why the answer isn’t always one way…
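To make that “quick test” idea concrete, here is a rough, hypothetical Python sketch (the path and sizes are placeholders, and a real evaluation should use a purpose-built tool such as fio). It issues small synchronous random writes against a probe file and reports the latency tail, which is where an IO stack that isn't genuinely designed for NAND tends to give itself away under sustained load.

```python
# Rough, hypothetical probe (POSIX-only; uses os.pwrite) - not a rigorous
# benchmark. It measures the latency distribution of small synchronous random
# writes; a long p99/max tail under sustained load hints that garbage
# collection and write amplification are not being handled gracefully.
import os
import random
import statistics
import time

PATH = "/tmp/write_probe.bin"        # placeholder: put this on the device under test
FILE_SIZE = 256 * 1024 * 1024        # 256 MiB probe file
BLOCK = 4096                         # 4 KiB random writes
SAMPLES = 5000

with open(PATH, "wb") as f:
    f.truncate(FILE_SIZE)            # sparse probe file

latencies_us = []
fd = os.open(PATH, os.O_WRONLY)
try:
    for _ in range(SAMPLES):
        offset = random.randrange(FILE_SIZE // BLOCK) * BLOCK
        buf = os.urandom(BLOCK)
        start = time.perf_counter()
        os.pwrite(fd, buf, offset)
        os.fsync(fd)                 # force the write to media, not the page cache
        latencies_us.append((time.perf_counter() - start) * 1e6)
finally:
    os.close(fd)

latencies_us.sort()
print(f"median : {statistics.median(latencies_us):8.1f} us")
print(f"p99    : {latencies_us[int(0.99 * SAMPLES)]:8.1f} us")
print(f"max    : {latencies_us[-1]:8.1f} us")
```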
The next question is: what performance can I expect from a single SDC?
Like with the SDS, the answer depends on configuration – but using the same host and driving it to its maximums, a single SDC can do about 260K reads/sec and writes/sec.
In short – wicked fast.
Does it consume some CPU resources? Yup, but not a lot. ScaleIO SDC and SDS are pretty thin and lightweight (max CPU utilization at those peak values in the charts above is around 20%). Does it use network bandwidth? Sure – but less than you think, particularly at scale where the load for data distribution is completely scaled out horizontally (in those cases, the ToR switch selection becomes paramount – because the traffic load there is heavy).
Next question: is the scaling really linear? A: Yup. We knew there would be skepticism around this, so what better way to resolve it than to test it out? Here’s a test where we went from 32 to 64 to 128 nodes (the same Cisco UCS C240M3 config).
That’s linearity.
For fun (and this really underscores how disruptive ScaleIO can really be – and why we’re nuts to put it out there for people to just use), we did a performance comparison against an enterprise “Type II” (tightly coupled cluster) storage array – one with rich enterprise data services, to be sure, that would compete against a VMAX3. That’s the unique power of the transactional scale-out SDS that is ScaleIO.
Now – because there would be skeptics on this topic, we did this under the careful gaze of ESG – who wrote up the whole thing here:
What about the other open SDS we see customers trying to deploy in support of some transactional workloads… like Ceph? Well – here’s what we found:
Now, Ceph is really first and foremost an object stack, so the more real compare would be with the ECS Software stack (and those do compete) – but we see a lot of customers trying to make Ceph work as a transactional storage model. When we would ask “why, when it’s so hard to get working and the performance is really, really bad?” – the answer tended to be “well, it’s easy to get, and openly available”.
Now ScaleIO is easy to get and openly available as well. Oh, and it costs less than Ceph Enterprise if you want to compare the TCO inclusive of support. Don’t buy the performance tests above? Why would you – I’m clearly biased! Download both and try for yourself.
What about benchmarks with classic transactional workloads like a TPC-C or TPC-H workload?
OK - we took Oracle 12c, and ran it on a hyper-converged configuration that looked like this:
Result? 20GB/s+ on full table scans. Almost 1M IOps. 690 us IOWAIT times. That’s FREAKING FAST. Try any other way to get that sort of performance at that cost, in that footprint.
Here’s a link to the detailed writeup:
The voice of customers is always the most powerful thing. Don’t listen to me. Listen to them instead (then download the software!).
Look – ScaleIO isn’t the answer for world peace :-)
- It doesn’t do NAS/Object. For Scale-Out NAS/HDFS – we lead the market with Isilon.
- It doesn’t have the linearity under all circumstances and rich data services people expect of AFAs. It doesn’t have the rich wear-leveling and write-amplification-avoidance schemes you see in AFAs (which, by definition, have designed their whole IO stack presuming the ONLY persistence media is NAND). For Scale-out AFAs – we lead the market with XtremIO.
- It doesn’t do Object/HDFS. For Object/HDFS with rich geo dispersion, we have ECS appliances and ECS software for those that want to go down the SDS route.
- It doesn’t do classic Enterprise Data Services that many apps depend on, or support Mainframes, or iSeries. For that, we lead the market with VMAX3.
- It’s not a workhorse able to do a broad set of things in a small footprint. For that – we lead the market with VNX/VNXe.
- It doesn’t act as a Data Protection Storage target. For that – we lead the market with Data Domain.
- It doesn’t act as a persistence layer for in-memory databases. For that – we are innovating with DSSD.
… but boy oh boy, ScaleIO can be used for a LOT.
This is why EMC is a portfolio company – in persistence layers and in CI. There’s no one tool that does it all. The name of the game is to select the minimal set of persistence layers and CI architectures that you need. For the persistence layers, you can abstract and automate them all one way – using the ViPR controller.
What’s next for ScaleIO? Well – you giving it a shot on your own stuff starting on 5/29! :-) Oh – you mean like roadmap? Without getting too specific – over this year you’ll likely see the following:
- ScaleIO 2.0 will have the following:
- Integration with ESRS
- Deep end-to-end integrity mechanisms
- RecoverPoint integration (for rich replication capabilities) – note that in vSphere environments, we’re finding customers use vSphere Replication or, if they need more, RecoverPoint for Virtual Machines. Today, in Oracle environments, some customers are using ScaleIO as a smoking-fast storage layer, and Oracle Data Guard for remote replication.
- … and more.
- More ScaleIO integration with OpenStack. There’s currently great support for the SDC in Cinder, but it’s not in the core trunk and distribution (it comes from EMC). We think we may be best off contributing this directly. More work needed, though.
- More “P3” platform support. We’re seeing more and more Ubuntu and Mirantis OpenStack deployments – so beyond Cinder itself, integration with their management tools would be good. Furthermore – we’re seeing all sorts of interesting use cases emerging at customers with CoreOS for very thin/light containerized stacks.
What do you think about us making this free and frictionless?
What are you waiting for? Download ScaleIO and start playing! Please give us feedback and start to contribute on the community!!!
Hi Chad
Congrats on the relaunch of your product. Data locality with Nutanix still allows you to rebuild from the remaining nodes as the secondary copies are evenly redistributed, with the exception of block awareness. In a cluster of 16 nodes, 3 will not have data to rebuild. In a cluster of 64 nodes, 3 nodes couldn't be used to rebuild data. So I think that is fair in the overall rebuild times.
What is the use case for creating such a large failure domain ie 1,000 of nodes? Large failure domains also mean need more than 2 copies of data if you care about losing multiple nodes. How many copies of data were used in these performance tests? What was the CPU and RAM used by the client? What was the working set? Was it in cache? Seems like a lot of focus on performance for Tier 2 apps.
Thanks for the post.
DL
Posted by: Dwayne Lessner | May 06, 2015 at 02:23 PM
I am a bit confused Chad.
April 17, 2015 - SalesAdvantage.
On April 13, 2015, EMC announced ScaleIO Splitter for RecoverPoint general availability. This announcement provides an optimized data protection solution for hyperconverged server SAN infrastructure by protecting ScaleIO systems with virtual RecoverPoint Appliances (vRPAs). ScaleIO customers can have the confidence that an integrated EMC solution will provide disaster recovery and business continuity for their software-defined ScaleIO deployment.
Posted by: Tony Watson | May 06, 2015 at 06:59 PM
I'd love to play with this, but: Release date: May 29, 2015
Posted by: Chet Walters | May 06, 2015 at 08:53 PM
Hi Chad,
Great stuff.
As the software is now "free" and you only pay for support does this mean that a fully supported solution will be much cheaper than before?
Also is the licence still based on raw capacity?
My only concern is the product is all about scale (performance and capacity), but in pretty much every other dimension it cannot match a conventional array (i.e. double disk protection).
I did a comparison of VSAN and an array at http://blog.snsltd.co.uk/are-vmware-vsan-vvols-and-evorail-software-defined-storage-and-does-it-really-matter/ and I think it is fair to say ScaleIO has most of the same limitations as VSAN.
As always these things come down to use case and cost, but it is great to see that ScaleIO is now free to use.
Many thanks
Mark
Posted by: Mark Burgess | May 07, 2015 at 04:22 AM
Is this free download also unlimited in use? So suitable for production use cases or only for test purpose?
Most FEATURE RICH SDS product?? I must be missing something here,..
ScaleIO doesn't do:
File,
Object,
Dedupe,
Compression,
Native Replication,
ScaleIO also does not offer:
Selectable copies for DP (2 only and 2 max),
Protection against multiple node failures,
Dynamic tiering,
Client side caching,
Hybrid disk pooling,
Distributed metadata,
Unlimited Heterogeneous OS support,
As I understand it, all ScaleIO does is pool a bunch of disks together and stripe the data over these disks; additional features are snapshots, QoS and limited RAM caching, and that's it, right?
So please elaborate on how this is as you say the Most FEATURE RICH SDS product.
Thank you
Posted by: BS | May 07, 2015 at 10:08 AM
"Maybe we can “protect” the business by isolating it to specific workloads"
I'm really excited about everything ScaleIO can do but right now EMC/VCE is "protecting" legacy business by repeatedly saying ScaleIO is for "tier 2" workloads under the guise of "feedback from customers". I just don't buy it. Which customers are saying "nah, instead of putting my tier 1 workloads on this system that is incredibly easy to manage and can perform and scale on ridiculous levels, I'll just put tier 1 on less performant and scalable solutions that are harder to manage".
Remove the "tier 2" legacy business protecting lingo from all your documentation and then I will be VERY excited.
Posted by: Nick | May 07, 2015 at 12:25 PM
Just deployed three petabytes using ScaleIO. We are very pleased by the ease of use, especially since the availability of the loadable VIB in ESX. ScaleIO really can fill your 10Gb Ethernet interfaces with sub-1 ms latency IOps. Being released from traditional silo complexity and scaling locks, the questions are now quite simple: "Is using 2 x 10Gbit interfaces enough for my workload or should we just add two more?" Our setup uses 'just 56 SDS servers' but can deliver way more than most of the client SDC servers can handle. We solved the Compression and DP wishes using ZPOOLs with mirror sets using two autonomous ScaleIO clusters. ZFS adds the well-known qualities of checksums, LZ4 compression and real-time snapshots. All data stored is "tier 1" and, looking at the roadmap, more reasons to stay on this Type III technology are added. To be fair there are some limits and weaknesses - specifically the MDM should be dynamically discovered when using the GUI, and it could use some historical monitoring graphs. Otherwise the notification of events is very clever because you drill down from Protection Domain -> Storage Pools -> SDS into the raw block devices and finally the SDC health.
Re to the post of BS: It does provide Protection against multiple node failure by using Fault Sets. We actually use these because our servers are grouped by four in each chassis. So every 2nd copy of 1MB gets directed to another chassis. Last weekend Power was lost to 4x60TB SDS and availability was not affected - the Rebuild and Rebalance algorithms intimidatingly solved the outage.
Posted by: Leroy van Logchem | May 07, 2015 at 06:07 PM
When will ScaleIO be able to do basic storage operations like migrating a LUN from one storage pool to another without having to do host based migrations? This is a feature that other EMC storage arrays have had for a long time and seems to be lacking with ScaleIO.
Posted by: CMD | May 07, 2015 at 07:14 PM
So as long as you are able to control which servers fail and within which fault set, you are safe? Sounds like a great protection against multiple node failure,.... An easier way would be to write 3 copies instead of 2, but that would increase the required capacity too much if you don't have compression and dedupe.
Sounds to me like it's time for a real feature rich distributed storage platform,...
Posted by: BS | May 09, 2015 at 04:56 AM
@Dwayne - thank you!
Respectfully, it's much more than a relaunch. It is a ScaleIO Appliance (VxRack), it's an update on the next release, and it's making it freely and frictionlessly available for download and use (with community vs. EMC support, of course).
On your first comment - I want to be clear - the very, very simple data distribution (and lookup) model of ScaleIO makes rebuilds extremely, extremely fast. I would encourage you to download and try it yourself (use AWS EC2 if you don't have sufficient hardware), and compare to anything else - perhaps the Nutanix community edition. It's not simply a question of "where is the data", but "how much work needs to be done to map/discover/copy" and "how much parallelism is there in the rebuild process".
On your second comment - I agree, a 1000 node cluster would be a crazy failure domain. In ScaleIO deployments, generally, the customers create protection domains within their clusters (failure domain limiting), as well as logical partitioning for tenants and their storage pools.
The beauty of making this stuff available (and I **believe** Nutanix has done something similar) - don't listen to me :-) Download, build a 64 node cluster, and try for yourself!
@Chet - agreed, I pushed hard (and everyone did) to make the bits available at the moment of the announcement at EMC World. They will be there by the end of the month, with no barrier. The delay was an update (v1.32) that further improved the ease-of-install over the current version (v1.31). We've been putting v1.32 through its paces amongst the EMC SEs, with very good results. Please download at the end of the month, and please let me know what you find! (good/bad/ugly!)
@Mark - thanks for the feedback. The freely available version has no capacity, feature, or time limits. It comes with community-only support. If you need EMC Support, you license the software, and pay for maintenance and support.
I would encourage people to evaluate all their SDS options - there are material differences in behaviors, and economics (which vary customer to customer).
@BS - thanks for your comment!
Note that I am calling out transactional use cases. Transactional SDS stacks on top of object stacks perform very poorly for transactional use cases. Don't trust me? Fine - give it a shot.
Likewise, there are NAS SDS stacks, but none that have the performance, scale, and QoS behaviors that ScaleIO has. Most of the commercial NAS SDS stacks aren't scale-out NAS stacks either. Some that are (Gluster as an example) have extremely poor transactional behaviors. This doesn't mean they are BAD, but they are bad for VMs, bad for relational databases - and hence are less "disruptive" to the mass of the storage ecosystem.
It's interesting to note that right after posting this, a customer (see Leroy below your comment) described deploying several PB of ScaleIO as a low-level hyper-transactional system - and using a ZFS variation on top where they needed NAS and some of the other things you point out. The voice of a customer is, shall we say, more powerful than BS (IMO). Thanks for the comment!
Re your subsequent comment re "controlling server failure" - I suspect you might (?) be coming from a point of view of a distributed object storage model. These invariably have much richer geo-distribution models, and multiple copy writes - and many, many other attributes. I AGREE :-) That's why ECS Software exists - and competes in the market for SDS object stacks. What I have consistently noted is that putting transactional storage ON TOP of object stacks means you get the worst of both worlds rather than the best of both worlds. IMO (and I'm sure you would disagree) - there's no need.
Posted by: Chad Sakac | May 13, 2015 at 04:58 PM
Can you confirm if ScaleIO handles: Dedupe, Compression, Native Replication, Dynamic tiering?
Posted by: VL | May 14, 2015 at 05:42 PM
Any idea if 2.0 will support auto-tiering using capacity and performance disks in the same storage pool, but with intelligence to place the hot/cold data as required?
Posted by: Michael | July 10, 2015 at 07:27 AM
"ScaleIO is available for frictionless and free unlimited download and use."
User guide states different: "Using ScaleIO in a production environment requires a license."
For me when someone says unlimited use. It means unlimited. Not limited to test clusters.
So what's the case? Since this blog doesn't represent EMC, I guess that the user guide is correct.
Posted by: Elias | November 16, 2015 at 05:10 AM
ScaleIO does not do Dedupe, Compression, Replication or Tiering.
Posted by: Chris | November 19, 2015 at 12:56 PM
All - thanks for the comments!
@VL @Michael @Chris - ScaleIO doesn't do Dedupe/Compression/Dynamic tiering. It's not uncommon for people to implement filesystems on top of ScaleIO to provide additional data services. Think of ScaleIO as doing one thing very, very well - being a very horizontally scalable transactional engine - delivering every ounce of performance and latency out of a collection of industry-standard systems.
The whole architecture (for now :-) is what I think can be characterized as "performance optimized" vs. "capacity optimized" (or a balance of the two).
@Elias - it is available for frictionless and free unlimited use. The clause in the user guide is there to represent an important idea. The download is there with only community support - PERIOD. If you want EMC to take your support calls, you have to buy and license the product. I think that's a pretty common and reasonable position - don't you?
Posted by: Chad Sakac | November 23, 2015 at 09:50 AM