In this webcast, I talked about “All Paths Down” or APD – an important storage state for the vmkernel – where all the paths to a known device are down, and it’s unknown whether the condition is transient or permanent. The topic of APD behavior has been a sensitive one since the early vSphere 4 days (where some APD bugs, long since fixed, caused still-remembered pain).
Two fascinating (at least to me) things with vSphere were:
- There was no inherent “timeout” for guest (and even vmkernel) IO when a device entered APD. (BTW, this is still true even with vSphere 5. It’s easy to test: yank a device, like I did here, and watch what happens.)
- There was no easy, proper way to remove a device. Heck, unmounting a filesystem wasn’t easy :-)
In vSphere 5, the engineering team introduced a new device state called Permanent Device Loss (PDL).
PDL means that the ESX host can see the array target (implying that connectivity is OK), but the array is saying “hey, I don’t have that device”. This is done via SCSI sense codes: e.g. the target returning 5/25h/00h (ILLEGAL REQUEST; LOGICAL UNIT NOT SUPPORTED) or 4/3Eh/01h (HARDWARE ERROR; LOGICAL UNIT FAILURE). It’s important to note that if for whatever reason the device isn’t responding (but those sense codes are not returned), the vSphere host will go down the APD code path. So think of APD as the general case, and PDL as a specific one.
But once a device is in PDL, it is not expected to reappear.
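To make the APD-versus-PDL split concrete, here is a minimal Python sketch of the decision described above. It is purely illustrative and is not how the vmkernel is implemented: the names `DeviceState`, `PDL_SENSE_CODES`, and `classify` are made up for this example, and only the two sense codes mentioned above are listed.

```python
from enum import Enum
from typing import Optional, Tuple

# (sense key, ASC, ASCQ) as returned by the array target
SenseCode = Tuple[int, int, int]


class DeviceState(Enum):
    OK = "ok"
    APD = "apd"  # all paths down: could be transient or permanent
    PDL = "pdl"  # the array explicitly says the device is gone


# Only the two examples from the text; an assumption for illustration,
# not a complete list of what a host might treat as PDL.
PDL_SENSE_CODES = {
    (0x5, 0x25, 0x00),  # ILLEGAL REQUEST; LOGICAL UNIT NOT SUPPORTED
    (0x4, 0x3E, 0x01),  # HARDWARE ERROR; LOGICAL UNIT FAILURE
}


def classify(device_responding: bool, sense: Optional[SenseCode]) -> DeviceState:
    """Classify a device from whether it answers IO and any sense data seen."""
    # The target answered and explicitly disowned the device: the specific case.
    if sense in PDL_SENSE_CODES:
        return DeviceState.PDL
    # No answer and no PDL sense code: the general case.
    if not device_responding:
        return DeviceState.APD
    return DeviceState.OK
```

The order of the checks mirrors the text: a PDL sense code is the specific signal, and anything else that leaves the device unreachable falls through to APD.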
PDL can be planned or unplanned. The main thing that people should now know is that device removal is safe and simple (device removal is the subject of a very popular thread on an oldie-but-goodie post I did here).
Step 1: Unmount the filesystem – like in the screenshot below.
Step 2: Remove (detach) the device from the host.
Simple and easy. Gotta love the new vSphere 5!
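For the command-line inclined, the two steps above can be sketched as a small Python helper that only builds the `esxcli` command lines and does not run anything. Treat it as a sketch under assumptions: the subcommands and flags are the vSphere 5 `esxcli storage` ones as I understand them, so verify them against your build before use. The helper name `removal_commands` and the datastore label and device ID used with it are hypothetical.

```python
from typing import List


def removal_commands(datastore_label: str, device_id: str) -> List[List[str]]:
    """Return the two commands for a planned removal, in the order to run them.

    Nothing is executed here. The esxcli syntax is an assumption based on the
    vSphere 5 'esxcli storage' namespace; check it on your own host first.
    """
    return [
        # Step 1: unmount the filesystem (VMFS datastore) by its label.
        ["esxcli", "storage", "filesystem", "unmount", "-l", datastore_label],
        # Step 2: detach the device from the host by its identifier.
        ["esxcli", "storage", "core", "device", "set", "--state=off", "-d", device_id],
    ]
```

Calling `removal_commands("my_datastore", "naa.0000000000000000")` (both values are placeholders) returns the unmount command first and the detach command second, which matches the order of the steps above.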
Hi Chad, I have been in an APD situation before and this is very good info. Now that vSphere 5 has a "Datastore Maintenance mode" option, how does this change the APD situation? If a datastore in vCenter is put in maintenance mode and the source storage is removed or not available, do you think we would still see the APD issue? From past experience, I remember seeing APD when source storage was removed but the associated LUNs/datastores remained in vCenter and appeared in a "disconnected" state. I am curious whether having the datastore in maintenance mode will take the APD away.
Posted by: Parikshith Reddy | July 12, 2011 at 04:10 PM