In this webcast, I talked about “All Paths Down” (APD), an important storage state for the vmkernel in which all the paths to a known device are down and it’s unknown whether the condition is transient or permanent. The topic of APD behavior has been a sensitive one since the early vSphere 4 days, when some APD bugs (long since fixed) caused still-remembered pain.
Two fascinating (at least to me) things about vSphere 4 were:
- There was no inherent “timeout” for guest (or even vmkernel) IO when a device entered APD. (BTW, this is still true even with vSphere 5.) This is easy to test: yank a device (like I did here) and watch what happens.
- There was no easy, proper way to remove a device. Heck, unmounting a filesystem wasn’t easy :-)
In vSphere 5, the engineering team introduced a new device state called Persistent Device Loss (PDL).
PDL means that the ESX host can see the array target (implying that connectivity is OK), but the array is saying “hey, I don’t have that device”. This is signaled via SCSI sense codes: e.g. the target returning 5/25h/00h (ILLEGAL REQUEST; LUN NOT SUPPORTED) or 4/3Eh/01h (HARDWARE ERROR; LUN FAILURE). It’s important to note that if for whatever reason the device isn’t responding (and those sense codes are not returned), the vSphere host will go down the APD code path. So think of APD as the general case and PDL as a specific one.
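To make the APD-versus-PDL distinction concrete, here is a minimal sketch of the decision described above. It is not vmkernel code: the `classify` function, the `DeviceState` names, and the two inputs (whether the device responded, and the sense key/ASC/ASCQ triple it returned) are hypothetical simplifications, and the PDL set contains only the two example codes mentioned here, not an exhaustive list.

```python
from enum import Enum
from typing import Optional, Tuple

# (sense key, ASC, ASCQ) triples given above as examples of PDL indicators.
# Illustrative only; not the complete set a real host recognizes.
PDL_SENSE_CODES = {
    (0x5, 0x25, 0x00),  # ILLEGAL REQUEST; LOGICAL UNIT NOT SUPPORTED
    (0x4, 0x3E, 0x01),  # HARDWARE ERROR; LOGICAL UNIT FAILURE
}


class DeviceState(Enum):
    PDL = "permanent device loss"
    APD = "all paths down"
    NORMAL = "responding; ordinary error handling applies"


def classify(responded: bool, sense: Optional[Tuple[int, int, int]]) -> DeviceState:
    """Hypothetical classifier mirroring the general (APD) vs specific (PDL) split."""
    # PDL is the specific case: the target is reachable and explicitly
    # reports that the device is gone via a known sense code.
    if responded and sense in PDL_SENSE_CODES:
        return DeviceState.PDL
    # Anything that simply stops responding, with no PDL sense code,
    # falls into the general APD path (transient or permanent is unknown).
    if not responded:
        return DeviceState.APD
    # Other responses are outside the scope of this sketch.
    return DeviceState.NORMAL
```

For example, `classify(True, (0x5, 0x25, 0x00))` yields `DeviceState.PDL`, while `classify(False, None)` yields `DeviceState.APD`. Checking for PDL first reflects that PDL is the narrower, explicitly signaled condition.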
But once a device is in PDL, it is not expected to reappear.
PDL can be planned or unplanned. The main thing people should now know is that device removal is safe and simple (a very popular topic of discussion on an oldie-but-goodie post I did here).
Step 1: Unmount the filesystem – like in the screenshot below.
Step 2: Remove (detach) the device from the host.
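The two steps above can also be done from the command line. The sketch below only builds the command lines in order; it assumes the vSphere 5 `esxcli storage filesystem unmount -l <label>` and `esxcli storage core device set --state=off -d <device>` forms (check them against your host's `esxcli` help), and the helper name and example IDs are hypothetical.

```python
from typing import List


def removal_commands(datastore_label: str, device_id: str) -> List[List[str]]:
    """Hypothetical helper: ordered esxcli invocations for a planned device removal.

    Assumes the vSphere 5 esxcli syntax; nothing is executed here.
    """
    if not datastore_label or not device_id:
        raise ValueError("datastore label and device ID are both required")
    return [
        # Step 1: unmount the VMFS filesystem, identified by its datastore label.
        ["esxcli", "storage", "filesystem", "unmount", "-l", datastore_label],
        # Step 2: detach the device by setting its state to off.
        ["esxcli", "storage", "core", "device", "set", "--state=off", "-d", device_id],
    ]
```

Keeping the order explicit matters: the filesystem should be unmounted before the device underneath it is detached. The returned lists could be printed for review or run on the host (for example from the ESXi Shell), repeating the steps on each host that sees the device.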
Simple and easy. Gotta love the new vSphere 5!