Well – this was a popular session :-) It was in one of the super-session rooms, and was packed to the gills. The recording of the session is below:
PDF of the session can be downloaded here.
Long-Distance VMotion can be used for many, many things – datacenter load balancing, maintenance, and disaster avoidance (among other things). In the long term – it also is the route to internal/external cloud import/export.
The key is that the use case demands non-disruptive operation in all cases – and customers generally expect that to include moving the actual storage as part of the transition.
This was a VMware, Cisco and EMC joint session where we were showing the results of joint solution testing around the use case of VMotion between vSphere 4 clusters that are geographically dispersed across various distances/latencies, and with various workloads (a SQL Server OLTP workload and Exchange 2007).
Shudong Zhou (VMware), Balaji Sivasubramanian (Cisco) and I presented. I also want to thank Ravi Neelakant (VMware), Stephen Spellicy (EMC), and Shawn Roberts (EMC), who did a lot of work behind the scenes during testing and validation.
One thing that was fascinating to me was when Shudong asked “how many of you would want this” – almost everyone said “YES!”. Now, that was before some of the WAN infrastructure requirements (which are significant) were discussed - but man, clearly there is interest :-)
The biggest news was VMware officially shifting its general support stance to support bounded long-distance VMotion use cases. Pay close attention to Shudong’s closing comments on this front, which are near the end of the video.
Now – as cool as this was, there’s still a lot of work to do.
To be VERY explicit on the storage front: only option 1a (VMotion across distance – note the data doesn’t move) and 1b (Storage VMotion across distance before VMotion across distance) are widely and generically possible today (and they deliver the use case, though with ~15 minutes of transit time for a 20GB VM). While the solution validation applied to VMFS on block, as we noted, you could use options 1a and 1b today in NFS (heck, even blended) scenarios. You can also do it on any array from any vendor.
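As a rough back-of-the-envelope, that ~15-minute figure for a 20GB VM is consistent with a couple hundred Mbit/s of effective WAN throughput for the Storage VMotion copy. A quick sketch (the function name and throughput figure here are illustrative, not from the session):

```python
def svmotion_transit_minutes(vm_size_gb: float, throughput_mbps: float) -> float:
    """Estimate Storage VMotion transit time, assuming the copy is
    WAN-bandwidth bound (ignores dedupe, compression, and dirty-block
    re-copies)."""
    megabits = vm_size_gb * 8 * 1000   # GB -> Mbit (decimal units)
    seconds = megabits / throughput_mbps
    return seconds / 60

# ~180 Mbit/s of effective throughput lands right around the
# ~15 minutes observed for a 20GB VM.
print(round(svmotion_transit_minutes(20, 180), 1))  # -> 14.8
```

The point being: Option 1b works today on any array, but the transit time scales linearly with VM size and inversely with WAN bandwidth – which is exactly why some customers want the faster path Option 2 provides.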
Option 2 is a preview of something to come from EMC. We had a lot of internal debate about whether or not to show this – historically, EMC didn’t show things prior to GA, though this is starting to change. We thought: 1) there was a lot of interest; 2) we had data on solution behavior; 3) enough customers would like Options 1a/b but desire a faster transit time; 4) the solution is relatively close. Based on all that, we decided we should share the current data and demonstrate it. This also allows us to start to get customer feedback on our approach.
There are other vendor solutions on the market aimed at this use case (MetroCluster, stretched active/passive storage virtualization designs, distributed software RAID and logical volume management). Every customer should evaluate them for themselves. Personally, I STRONGLY suggest you hold them against the list of storage solution requirements I outlined in the presentation – as I don’t think there is a good, clear solution on the market yet that meets those requirements. Obviously that’s the customer’s call – not mine. I think that list defines the critical solution behavior from the storage standpoint (and of course, those are the product requirements in our approach).
Option 2 is EMC’s primary locus of effort for this use case (as we think it meets all the requirements the most broadly) and will be the first one available from EMC as a “hardware accelerated” option (it simply looks like a VMotion – the underlying storage mechanism is transparent to vSphere). But Option 2 is not the only way we will support this use case long term. In classic EMC fashion (our strength and our weakness), we are working on several ways to solve this (not every customer is the same). These various approaches include next-generation replication variations (which we didn’t discuss in the session) – which have the advantage of “leveraging what you already have”.
I also REALLY want to reinforce another item. As I stated in the session, Option 3 has not been tested or validated in any way. This doesn’t mean it’s right around the corner, or that only one little validation step is needed. The slide denoted a short distance – but it’s more accurate to state “very short distances only”. We take support seriously. Until that particular solution is done and validated, this discussion is purely academic.
You can see the level of excitement in the videos below that show the “post-session dialog” with customers. Each of these was interesting: some liked one storage model (Option 2, which we demonstrated), some wanted it for SRDF and RecoverPoint, some wanted it in the model of Option 3. All great feedback, and it validates why this use case needs more than a single approach.
I know that this is very exciting, but PLEASE: don’t immediately reach out to your EMC team and ask to get in on this – it will only slow us down. We’re on it around the clock – let us focus on finishing it with the quality customers expect from EMC.
The demonstration showed the behavior of the various options on the storage front and the time to complete the operation for the 3 main options discussed:

Option 1a: simple VMotion across distance, storage remains in one place.
Option 1b: simple Storage VMotion + VMotion over distance.
Option 2: storage virtualization over distance.