At its core is a simple idea: people are spending way too much time on the analytics lifecycle versus on actionable intelligence – and the Analytics Insight Module (AIM) makes that lifecycle simple.
Like many of the blueprints we develop, AIM was born out of a lot of customer interaction.
What we found when we were talking to customers about their efforts at Digital Transformation was a pattern:
- Our customers deploying the Native Hybrid Cloud (the industry’s fastest, easiest way to get Pivotal Cloud Foundry on elastic, cost-effective hyper-converged infrastructure from Dell EMC) were looking for an easy way to tackle data, analytics and actionable insight at the same time that they were focused on building new Cloud Native applications.
- There was a common set of “mega” Hadoop ecosystem players (Cloudera, Hortonworks/Pivotal/ODP) that customers tended to want to be able to mix and match, but there was also a set of common challenges above and beyond the Hadoop distro and the various data tools themselves:
- “Data Curation” – the simple ability to search across many datasets, find patterns, quickly index, and ultimately ingest into the Data Catalog. We found that data scientists were spending 80% of their time just finding what data they HAVE versus getting insight from that data.
- Building a “Data Catalog” – a simple single “stop” for a broad set of data sets, data types, and data analytic toolsets – without “pulling it all into a single monster data lake” (which never works)
- “Data Governance” – the ability to look at the lineage of data, and to apply strict governance (including data obfuscation), security and access controls. Amongst the customers, we found that 71% of the data scientists were using data sets that they shouldn’t have had access to, or that at a minimum should have been obfuscated.
- There was a very, VERY dynamic ecosystem (if you go to Strataconf, every year there are a ton of new players, and a few who are gone) – and customers wanted some trusted party to “pull it all together”.
Our customers wanted the analytics piece to be as turnkey as what we have done for the developer with NHC. They wanted the solution to be as focused on the data scientist as NHC is on the developer. They wanted the ability to immediately bind the new data to an application that can make it useful – thus bringing together the cloud native application world and the cloud native data world, and making the insight actionable.
That’s what AIM is all about.
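For readers who haven’t seen what a “bind” looks like from the developer side, here is a minimal sketch. It assumes the curated dataset is published to Pivotal Cloud Foundry as a service instance; the instance name `curated-sales-dataset`, the `uri` credential key and the HDFS address are all made up for illustration, and this is not AIM’s actual interface. What is standard Cloud Foundry behaviour is that a bound service shows up in the app’s `VCAP_SERVICES` environment variable as JSON.

```python
import json
import os


def find_bound_credentials(service_name, environ=None):
    """Return the credentials dict of a bound service instance, or None.

    Cloud Foundry injects bound services into the app as JSON in the
    VCAP_SERVICES environment variable, grouped by service label.
    """
    environ = os.environ if environ is None else environ
    vcap = json.loads(environ.get("VCAP_SERVICES", "{}"))
    for instances in vcap.values():
        for instance in instances:
            if instance.get("name") == service_name:
                return instance.get("credentials", {})
    return None


# Hypothetical environment, standing in for what the platform would
# inject after a bind. The names and the URI are illustrative only.
example_env = {
    "VCAP_SERVICES": json.dumps({
        "user-provided": [{
            "name": "curated-sales-dataset",
            "credentials": {
                "uri": "hdfs://datalake.example.internal/curated/sales"
            },
        }]
    })
}

creds = find_bound_credentials("curated-sales-dataset", example_env)
print(creds["uri"])
```

The point is that the developer never hard-codes where the data lives: the application asks the platform for whatever it was bound to, which is the same pattern already used for databases and message queues on Cloud Foundry.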
In the diagram below, all the pieces that are in the “grey/black” are what’s in the AIM solution, which runs on top of the Native Hybrid Cloud.
Let’s look at the solution “layers” – and we’ll start at the bottom.
- At the infrastructure layer, the purpose is to expose a simple, easy, turnkey IaaS and PaaS. It runs on Hyper-Converged Infrastructure (VxRail) to make starting small and scaling up easy. On top of the HCI, we package the Native Hybrid Cloud Foundation – which is the easiest path to a Pivotal Cloud Foundry powered PaaS. In addition, there is the necessary infrastructure for a Data Lake – Dell EMC Isilon x340 nodes (for unstructured file/HDFS) and Dell EMC ECS (for object/HDFS) – which, like the compute component, are scale-out architectures, so the solution can start small and scale up. There is also the necessary ToR networking to connect the IaaS/PaaS to the Data Lake.
- At the engineered components layer, you have several components where we have worked with key partners to fully integrate their software into the frictionless workflows that are key for the data scientist. We carefully evaluated the players in the marketplace and chose the ones that were solving these problems better than anyone else.
- AIM Data Curator - finding, evaluating, and bringing data into the Data Lake using:
- Attivio for Search and Data discovery, sampling, and evaluation.
- Zaloni for Data Ingestion and transformation onto the Data Lake
- some software and workflows the team built (patent pending).
- AIM Data Governor - policy-based security and Data Lineage using:
- BlueTalon for policy-driven, fine-grained, attribute-based access control
- Zaloni for keeping track of the lineage (data source and multiple transformations) of data brought onto the Data Lake
- AIM Data and Analytics Catalog – this is developed by the Dell EMC platform and solutions team, and is software which makes viewing all these data assets, analytics tools and all the associated metadata simple and easy
- AIM Platform Manager – this is developed by the Dell EMC platform and solutions team, and delivers a persona-focused user experience for business value delivery from analytics. It creates a simple portal used to access all aspects of the platform: self-service workspace (sandbox) provisioning, analytic tool wizard deployment, data set deployment and more.
- At the open analytics layer (blue part of the diagram, and outside the scope of the solution support model), customers can bring in tools of their choice – including, but not limited to:
- Choice of Hadoop distro - Hortonworks or Cloudera
- Pivotal Big Data Suite with all its goodies
- MongoDB
- … and pretty well anything they want :-)
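To make the Data Governor idea of attribute-based access control plus obfuscation a bit more concrete, here is a toy sketch of the concept. To be clear, this is not BlueTalon’s or Zaloni’s API: every class, function, role and column name below is hypothetical, and letting columns without a policy pass through is an assumption made for brevity (a real deployment would more likely deny by default).

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnPolicy:
    """Hypothetical per-column policy keyed on the user's role attribute."""
    column: str
    allowed_roles: frozenset  # roles that may see the clear value
    masked_roles: frozenset   # roles that may see only an obfuscated value


def obfuscate(value):
    """Replace a value with a short, stable, non-reversible token."""
    digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    return "masked:" + digest[:12]


def apply_policies(row, user_roles, policies):
    """Return the view of a row that a user with these roles may see."""
    roles = set(user_roles)
    by_column = {policy.column: policy for policy in policies}
    visible = {}
    for column, value in row.items():
        policy = by_column.get(column)
        if policy is None:
            visible[column] = value             # assumption: no policy, pass through
        elif roles & policy.allowed_roles:
            visible[column] = value             # clear value
        elif roles & policy.masked_roles:
            visible[column] = obfuscate(value)  # obfuscated value
        # any other role: the column is dropped from the result
    return visible


policies = [
    ColumnPolicy(
        column="ssn",
        allowed_roles=frozenset({"compliance"}),
        masked_roles=frozenset({"data_scientist"}),
    )
]
row = {"customer_id": "C-1001", "region": "EMEA", "ssn": "123-45-6789"}

print(apply_policies(row, ["data_scientist"], policies))
```

Because the token is derived from the value, a data scientist can still join and count on the masked column without ever seeing the underlying data, which is the situation the 71% figure above says is usually handled badly.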
The whole thing – just like the Enterprise Hybrid Cloud and the Native Hybrid Cloud generally – is engineered, sustained, and supported as one. Yes, that means single-call support for the whole thing noted in grey/black in the diagram.
Now – I suspect that this will be gobbledygook for many readers. BUT – for those of you who are screaming:
- “YES – I know how long it takes my team to wrangle and transform data from multiple datasets!”
- “YES – I know that data curation and data governance are a trainwreck today!”
- “OMG – are you telling me that at the end, I can just share the new dataset and platform and just make it a simple bind for our developers?!”
… then you are exactly who we built AIM for :-)
If you aren’t sure – we have services that can help you navigate this data analytics domain. Just reach out to your Dell EMC team and probe. I guarantee it will be a good learning experience!
Congrats to the Analytics Insight Module team who has been working so hard and so long – your baby is born!
Chad
Is x340 a typo or a new isilon node type?
Posted by: Michael Duke | October 20, 2016 at 12:13 AM