I remember when we first started to really look at the Big Data Analytics space 2 years ago – one of the first realizations was a simple, clear, and blunt observation: if we think we can provide value and compete in the data analytics space with a storage array, whether it be SAN, DAS, or NAS – we’d be smoking something.
Don’t get me wrong – there’s a place for external storage, but man, it’s a tiny niche (for “not so big big data”).
In these markets – the software stack was and is the most important thing, and it needs to be offered both as pure software and as appliances. Yes, people wanted it packaged in a variety of forms – some as integrated software/hardware appliances (think the Greenplum appliance, Netezza, Exadata), but sometimes they wanted it as software to run on their own hardware.
Hence – the acquisition of Greenplum, which is the leader in the software-based, scale-out, shared-nothing structured data analytics space, and then shortly afterwards, the first EMC Data Computing Division appliance – the Greenplum Appliance.
It’s also important to note that the Greenplum acquisition not only brought critical technology, but, just as importantly, a ton of new DNA into the EMC family. That DNA around Big Data Analytics is captured in the brains and the hearts of the Greenplum folks.
In Pat Gelsinger’s keynote – Matt from Greenplum showed the power that can be delivered by the model – doing advanced queries on massive datasets (live) – in seconds, by leveraging hundreds of Intel Xeon CPUs, terabytes of RAM, a ton of SSDs – all in a shared nothing pile of servers running Greenplum.
But – right adjacent to the structured big-data use cases are the semi-structured use cases, where Hadoop, an open-source software framework designed for analytics on that type of data, dominates.
So – with the announcement of the Greenplum Hadoop Enterprise and Community editions – EMC is now a huge player in this space.
Don’t take my word for it.
It’s a big deal. It also makes EMC’s Big Data Analytics stack more complete (I would argue one of the most complete and advanced in the industry) – and one appliance can accelerate analytics of both structured and semi-structured data.
Furthermore – the EMC Hadoop distribution can deliver 2-5x the performance of what customers are doing today with Hadoop, and also adds a lot of other things that enterprises are looking for (snapshots, replication, and a lot more on both the technology and packaging/support fronts).
Within the community, the response I’ve seen so far seems very positive. A big part of participating in anything that at its core is open source is about partnering, and also contributing source code back into that community. That’s important, and it’s what the Community Editions of BOTH the Greenplum database and Greenplum HD are there for.
Exciting stuff – and it’s amazing to see (and be a small part of) the ongoing evolution of EMC – I love how we continue to adapt, expand, change, integrate as the market changes. If all we did was DAS/SAN/NAS, wow – it would be tough to have a serious conversation about the world of Big Data.

I really enjoy reading your blog, and am rooting big-time for EMC in the big data analytics space. From your blog posts during EMC World, everything is lining up beautifully for EMC, and I think I understand your long-term strategy -- sheer brilliance!
Posted by: Michelle Agul | May 13, 2011 at 08:44 PM
What would your BI stack look like with this baby at the core?
Posted by: Twindude | May 15, 2011 at 03:07 PM