Embedded in the Monday VxRack tech preview and the Wednesday Project Caspian tech preview – there was an interesting tidbit: OnRack.
What the heck is OnRack? Answer: a thin, low-level hardware abstraction layer (HAL) for all sorts of industry-standard hardware. It's not a product yet; it's something we've been working on both organically and inorganically.
This is something you don't think about unless you are in the hardware and software business, or operating a large, at-scale datacenter (and certainly if you're a hyper-scale player).
- Q: How easy is it to remotely boot a server? A: If your servers are homogeneous = easy. A: If you are heterogeneous or at hyper-scale = ridiculously hard. Why? Because things like IPMI vary so much across the industry, and are so tied to the proprietary tools of one vendor or another… And where those tools exist, they are not designed for hyper-scale.
- Q: How easy would it be to update the firmware on 1000 hosts? A: Freaking really, really hard.
- Q: How easy would it be to blam down a low-level persona (think vSphere + VSAN) onto 1000 hosts, the supporting network, and COTS disk enclosures? A: freaking really, really hard. It would take a lot of engineering.
- Q: How easy would it be to add something like "low-level fault detection/reporting" to EVO:RAIL? A: Hard. We needed to customize the BMC hardware in the EMC VSPEX Blue hardware to do this fundamental low-level function. If we wanted to support a broad range of hardware, we would need to repeat that engineering effort over and over again.
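To make the IPMI-variance problem concrete, here is a minimal Python sketch of what a thin HAL buys you. The vendor names and command strings below are invented for illustration; real BMC quirks are messier, but the shape of the problem is the same: callers want to say *what* (reboot this host), and only the per-vendor driver should know *how*.

```python
# Sketch: why heterogeneous remote boot is painful, and what a thin
# hardware abstraction layer (HAL) buys you. Vendor names and command
# strings are hypothetical, for illustration only.

from abc import ABC, abstractmethod


class BMCDriver(ABC):
    """One driver per vendor quirk: same intent, different wire commands."""

    @abstractmethod
    def reboot_command(self) -> list[str]: ...


class VendorA(BMCDriver):
    def reboot_command(self) -> list[str]:
        # Hypothetical: vendor A honors stock IPMI chassis control.
        return ["ipmitool", "chassis", "power", "cycle"]


class VendorB(BMCDriver):
    def reboot_command(self) -> list[str]:
        # Hypothetical: vendor B needs a proprietary raw command instead.
        return ["ipmitool", "raw", "0x00", "0x02", "0x03"]


DRIVERS = {"vendor-a": VendorA(), "vendor-b": VendorB()}


def remote_reboot(host: str, vendor: str) -> list[str]:
    """The HAL's job: look up the right driver, hide the vendor quirk."""
    driver = DRIVERS[vendor]
    return driver.reboot_command() + ["-H", host]
```

Without that driver layer, every orchestration tool above it has to encode every vendor's quirks itself, which is exactly the duplicated effort described above.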
This is all very low level – well before something like Puppet/Ansible (or vSphere's build from bare metal) kicks in – but it is a management/orchestration/abstraction layer.
Those answers highlight something that gets glossed over in the “Software Defined” era – and yet is super-important when using open, industry standard hardware. The problem above (a lack of a good standard low level hardware abstraction) has been tackled in one of two ways:
- Constraining the variation of the hardware down to a tight degree. This is the world of appliances and tight constraints.
- The hyper-scale folks have tackled this with teams of people that maintain their own "bare metal as a service" standards. They of course have buying power with ODMs that no one else has (not even the biggest enterprises, or vendors like EMC).
We use a ton of industry-standard hardware, and these problems still plague us. When we have to update the firmware in the InfiniBand controller in an Isilon node – the amount of work to do this is IMMENSE. Furthermore – slight variations mean that if we have to do the same with an XtremIO X-Brick – the work is exactly double. No less.
Strangely – there are no open projects tackling this particularly well. It could live in OpenCompute (but doesn't, to my knowledge) or OpenStack (Ironic is starting to tackle it).
When you start to contemplate things like VxRack and Project Caspian that run on hundreds or thousands (heck even tens of thousands) of industry standard servers, enclosures, switches in many variations… well then you need to solve some problems.
OnRack is a project/technology designed to tackle this head on – to industrialize what the hyper-scale folks have done internally: an open, programmable low level hardware abstraction layer.
OnRack can interrogate and program a broad range of hardware at a low level. It can blam on personas (KVM, vSphere, ScaleIO on bare metal, CoreOS, etc.). It can do it for servers, switches, and HDD/SSD/NAND enclosures. It can instrument, detect, and aggregate low-level fault information. It can gather telemetry data at scale. In fact, it can SCALE – the design target is hyper-scale deployments.
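What "at scale" means in practice can be sketched simply: fan one operation (a firmware update, a telemetry pull, a persona install) out to thousands of nodes with bounded concurrency, and aggregate per-node failures instead of stopping at the first one. This is not OnRack's actual implementation – `apply_persona()` below is a stub standing in for the real hardware calls – just the shape of the problem.

```python
# Sketch of the scale problem: fan one operation out to thousands of
# nodes with bounded concurrency, and aggregate per-node failures.
# apply_persona() is a stand-in for real hardware work (flash firmware,
# lay down an image, pull telemetry, etc.).

from concurrent.futures import ThreadPoolExecutor


def apply_persona(node: str, persona: str) -> tuple[str, bool]:
    # Stub: pretend nodes whose number ends in 13 fail, so the
    # aggregation path below has something to report.
    ok = not node.endswith("13")
    return node, ok


def fan_out(nodes: list[str], persona: str, concurrency: int = 64) -> dict:
    """Apply one persona to many nodes; collect successes and failures."""
    results = {"ok": [], "failed": []}
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for node, ok in pool.map(lambda n: apply_persona(n, persona), nodes):
            results["ok" if ok else "failed"].append(node)
    return results


if __name__ == "__main__":
    nodes = [f"node-{i:04d}" for i in range(1000)]
    report = fan_out(nodes, "vsphere-vsan")
    print(len(report["ok"]), "succeeded,", len(report["failed"]), "failed")
```

The point of the sketch: the orchestration pattern is easy; what's hard – and what OnRack targets – is making `apply_persona()` work the same way across wildly different servers, switches, and enclosures.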
Here’s a quick demo of OnRack.
We don't think this is a "product" – but rather an ingredient of a product – and frankly the way to have the broadest impact is to make it an open-source, community effort. More work is needed to figure out how best to do this – but just like with the ViPR Controller becoming an open-source project, we are all in.