When first presented with the question of “what does USB have to do with file system drivers?”, an easy response might be, “not very much”. After all, file systems are largely independent of the details of the connection used to get data to and from the disk: a file system’s model of disk I/O is generally quite simple (“it’s a block store”). But to ensure that our file system readers had something of interest to read this issue, we decided to discuss the peculiar semantics of plug and play as they apply to file system drivers.
Of course, if you have a file system filter driver layered on top of the file system, you can also observe the same fundamental behaviors; so while we will be talking about “file systems” we expect that most everything we talk about will be of use to filter drivers as well.
When we talk about file systems, USB devices are our traditional example of the class of devices that are removable. To complicate the conversation, the ability to remove a device is considered independently of the ability to remove the media in the device. In other words, a CD-ROM drive might use the IDE bus (in which case it is a fixed device supporting removable media) or it might use the USB bus (in which case it is a removable device supporting removable media). From our perspective, the most interesting of these devices is the USB disk storage drive (“flash drive”).
USB flash drives come in two varieties: those with partition tables and those without. A typical “flash drive” or “pen drive” (we hear both terms used) will not have a partition table, which in turn means it will not have a “sticky” drive letter when inserted into the machine.
In all fairness, most of this will not matter to a file system driver. What does matter is the OS advising us that the volume is going to disappear. Thus, the four PnP requests a file system “cares” about are: query remove, remove, surprise remove, and cancel remove. File system drivers do not typically care about any other PnP requests, because most of those are related to resource management, and file systems do not normally handle hardware resources.
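The filtering decision can be sketched in a few lines. This is a user-mode illustration, not driver code: the four minor function codes are reproduced with the values from wdm.h so the sketch compiles outside the WDK, and the function name is hypothetical. It assumes, as is common, that every other minor code is simply passed down the stack unchanged.

```c
#include <assert.h>
#include <stdbool.h>

/* Minor function codes as defined in wdm.h, reproduced for a user-mode build. */
#define IRP_MN_QUERY_REMOVE_DEVICE  0x01
#define IRP_MN_REMOVE_DEVICE        0x02
#define IRP_MN_CANCEL_REMOVE_DEVICE 0x03
#define IRP_MN_SURPRISE_REMOVAL     0x17

/* Hypothetical helper: true for the PnP requests a file system acts on.
 * Anything else would be forwarded to the lower stack untouched. */
bool fs_handles_pnp_minor(unsigned char minor)
{
    switch (minor) {
    case IRP_MN_QUERY_REMOVE_DEVICE:
    case IRP_MN_REMOVE_DEVICE:
    case IRP_MN_CANCEL_REMOVE_DEVICE:
    case IRP_MN_SURPRISE_REMOVAL:
        return true;
    default:
        return false;
    }
}
```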
Another thing that file systems do care about, and have cared about since before plug and play was introduced into Windows, is volume dismount. We therefore tend to think of device removal as just one more way in which a volume may be dismounted. A volume may be dismounted:
- Explicitly via FSCTL_DISMOUNT_VOLUME (with or without the surrounding FSCTL_LOCK_VOLUME and FSCTL_UNLOCK_VOLUME calls)
- Implicitly via media change detection (CD-ROM removal, floppy removal, etc.)
- Implicitly via a device removal (PnP)
Regardless of the trigger, each of these only becomes “effective” when the volume itself is dismounted, and the actual dismount occurs only when the file system driver gives up control of the volume. For example, the function FatCheckForDismount (strucsup.c) demonstrates one way in which a volume dismount is handled (because all the references to the volume are now gone), while FatDismountVolume (fsctrl.c) demonstrates the case in which the volume is explicitly dismounted. A quick scan of the FAT source code shows that the “check for dismount” logic is invoked in many places, including the PnP path, while the explicit dismount is done only in response to the FSCTL_DISMOUNT_VOLUME call.
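The “dismount when the last reference goes away” idea can be modelled with a reference count and a pending flag. This is a simplified user-mode sketch with hypothetical names (`Volume`, `check_for_dismount`, and so on); it is not FAT’s code, which tracks far more state, but it shows why removal only marks the volume while every reference-dropping path re-checks whether teardown can now proceed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, much-simplified volume control block. */
typedef struct {
    int  open_count;       /* references held by open files              */
    bool dismount_pending; /* device removed, media changed, etc.        */
    bool torn_down;        /* volume structures have been released       */
} Volume;

/* Teardown happens only when the volume is marked as going away AND no
 * references remain. Returns true if the teardown happened on this call. */
bool check_for_dismount(Volume *v)
{
    if (v->dismount_pending && v->open_count == 0 && !v->torn_down) {
        v->torn_down = true; /* real code would free the VCB and friends */
        return true;
    }
    return false;
}

/* Every path that drops a reference re-checks for dismount. */
bool volume_close(Volume *v)
{
    assert(v->open_count > 0);
    v->open_count--;
    return check_for_dismount(v);
}

/* Removal (or a media change) just marks the volume; teardown is deferred
 * until the last reference is released. */
bool volume_mark_gone(Volume *v)
{
    v->dismount_pending = true;
    return check_for_dismount(v);
}
```

With two files open, marking the volume gone does nothing visible; the second close is the one that actually tears it down.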
In fact, one of the real complexities of dismounting a volume is that even when the underlying device is “gone,” a file system must continue to maintain state to ensure that applications with open files on the device continue to work properly. Even a cursory review of the FAT code base demonstrates the many concerns here. For example, here is a typical comment from FatCommonClose:
//
// We have our locks in the correct order. Remove our
// extra open and check for a dismount. Note that if
// something changed while we dropped the lock, it will
// not matter, since the dismount code does the correct
// checks to make sure the volume can really go away.
//
A common issue in handling device removal is that at the same moment someone is pulling the plug on the device, an application might be closing a file. Code of this type is therefore hard to “get right” and must guard against race conditions.
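One way to make the close path and the removal path agree on who performs the teardown is sketched below. FAT itself uses locks, as the comment above shows; this sketch swaps in C11 atomics instead, and all names are hypothetical. Each path first publishes its own change (close decrements the count, removal sets the flag) and then reads the other path’s state. With sequentially consistent atomics at least one path is guaranteed to see both “removed” and “no opens left,” and the compare-and-swap ensures only one of them does the work. The sketch assumes no new opens are admitted once removal has begun.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical volume state shared by the close path and the removal path. */
typedef struct {
    atomic_int  open_count;
    atomic_bool removed;
    atomic_bool torn_down;
    atomic_int  teardown_calls; /* instrumentation for the sketch */
} SharedVolume;

/* Only the caller that flips torn_down from false to true does the work,
 * so teardown runs at most once however the two paths interleave. */
static void try_teardown(SharedVolume *v)
{
    bool expected = false;
    if (atomic_compare_exchange_strong(&v->torn_down, &expected, true))
        atomic_fetch_add(&v->teardown_calls, 1);
}

/* Close path: drop our reference first, then look at the removal flag. */
void shared_close(SharedVolume *v)
{
    int before = atomic_fetch_sub(&v->open_count, 1);
    assert(before > 0);
    if (before == 1 && atomic_load(&v->removed))
        try_teardown(v);
}

/* Removal path: publish the removal first, then look at the open count. */
void shared_remove(SharedVolume *v)
{
    atomic_store(&v->removed, true);
    if (atomic_load(&v->open_count) == 0)
        try_teardown(v);
}
```

If the removal arrives while a file is still open, the removal path sees a nonzero count and leaves the work to the final close.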
For file system filter drivers, much of this is simplified by waiting for the file system to do the proper teardown. A file system filter driver will, in turn, receive the relevant callback indicating that the file’s state is being “torn down.” Even so, filters must still be written to handle the possibility that data structures are dismantled at “unusual times.”
Let’s return our attention to “what happens when a device is removed from the system.” As we noted, there are four relevant plug and play operations, and file systems do not handle them the same way device drivers do. So, let’s describe each of the four types of PnP operations with respect to the way file systems handle them. Note, for example, that if you examine the PnP implementation in the FAT sample, all of the processing is done “bus first”; file systems do not perform “driver first” processing.
IRP_MN_QUERY_REMOVE_DEVICE
This operation is requested when something is attempting an orderly shutdown of the device to prepare for removal. If the file system says “no” at this stage, the user will normally see an error that the device is in use. In our experience, such indications are mostly useless, because the user will simply remove the device anyway.
Processing here is generally straightforward: we attempt to push any cached state out to disk and then send the request to the lower storage stack. Once we have passed the request down to the lower stack, subsequent I/O operations might fail. See FatPnpQueryRemove (pnp.c) for an example of how FAT implements this.
While it takes a bit of digging through the sample FAT code, the primary reason a query remove is rejected is that the volume is still in use (see FatLockVolumeInternal in fsctrl.c). This includes critical files (paging/registry) as well as normal “in use” files. This can be frustrating, since any open file will cause the query remove to fail. If you construct a file system filter driver that opens files on the underlying file system, it is important to close those files at this point; otherwise, your own open files will cause the remove operation to fail.
IRP_MN_REMOVE_DEVICE
The remove device operation is the OS’s indication to us that the device is, in fact, “gone” with respect to the OS view of the system (the user might not have physically removed the device yet, but the storage stack below us will reject any attempt to perform I/O on the device). If you review the FAT implementation (FatPnpRemove in pnp.c), it actually sends the request down to the underlying device stack first and then attempts to clean up any “dangling” state. Naturally, there is no guarantee that all the files are closed, and the code is written with the understanding that state may not “go away” quickly.
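The ordering described above, forward first and then clean up, with open files keeping the volume alive, can be illustrated with a small event log. This is a hypothetical user-mode model, not FatPnpRemove: `send_to_lower_stack` merely records an event where a real driver would send the IRP down, and “freeing the VCB” is just a flag.

```c
#include <assert.h>
#include <stdbool.h>

enum { EV_LOWER_REMOVE = 1, EV_FS_CLEANUP = 2, EV_VCB_FREED = 3 };

typedef struct {
    int  open_files;
    bool device_gone;  /* lower stack now rejects all I/O      */
    bool vcb_freed;    /* volume structures released           */
    int  events[8];    /* order in which things happened       */
    int  nevents;
} Volume;

static void log_event(Volume *v, int ev)
{
    assert(v->nevents < 8);
    v->events[v->nevents++] = ev;
}

/* Hypothetical stand-in for sending the IRP to the storage stack below us. */
static void send_to_lower_stack(Volume *v) { log_event(v, EV_LOWER_REMOVE); }

static void maybe_free_vcb(Volume *v)
{
    if (v->device_gone && v->open_files == 0 && !v->vcb_freed) {
        v->vcb_freed = true;
        log_event(v, EV_VCB_FREED);
    }
}

/* Remove handling: forward first, then tear down whatever we can. */
void fs_remove_device(Volume *v)
{
    send_to_lower_stack(v);
    v->device_gone = true;
    log_event(v, EV_FS_CLEANUP);
    maybe_free_vcb(v);
}

/* Files still open at remove time keep the volume alive until the last close. */
void fs_close_file(Volume *v)
{
    assert(v->open_files > 0);
    v->open_files--;
    maybe_free_vcb(v);
}
```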
IRP_MN_SURPRISE_REMOVAL
Surprise removal occurs when the user physically removes the device without performing the “clean shutdown”, or perhaps tried, was denied, and removed the device anyway. FAT handles this in essentially the same way as remove device. After we see the surprise removal, the OS will also send us a normal remove, which gives us an opportunity to repeat these steps.
IRP_MN_CANCEL_REMOVE_DEVICE
A cancel occurs when something else in the system refuses to allow a remove operation on the given device. For a file system (or filter) this means that normal operation resumes.
By and large, PnP’s impact on file systems is modest in terms of the code needed to implement it; the complication is that numerous other “common case” paths must now be written to handle the underlying device “going away.” Thus, when I/O operations arrive inside the file system driver (or the filter driver) after the device is gone, they must be properly rejected. Of course, this is just one more complication in the overall process of developing file systems.
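A “common case” path that has to cope with a vanished device might look like the read routine below. This is a user-mode sketch with hypothetical names and status codes (`ST_DEVICE_GONE` is not a real NTSTATUS value), and the “device” is just an in-memory buffer. The entry check rejects requests early, but it cannot close the window completely: the device can still vanish after the check, so real code must also handle failures coming back from the lower stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { ST_SUCCESS, ST_DEVICE_GONE } Status;

typedef struct {
    bool                 gone;  /* set by the removal path          */
    const unsigned char *data;  /* in-memory stand-in for the media */
    size_t               size;
} Volume;

typedef struct {
    Volume *vol;
    size_t  pos;                /* current file offset, <= vol->size */
} FileHandle;

Status fs_read(FileHandle *f, void *buf, size_t len, size_t *bytes_read)
{
    assert(f != NULL && bytes_read != NULL && f->pos <= f->vol->size);
    *bytes_read = 0;
    /* Re-check volume state on every entry: the device may have
     * disappeared while this handle was open. */
    if (f->vol->gone)
        return ST_DEVICE_GONE;
    size_t avail = f->vol->size - f->pos;
    size_t n = len < avail ? len : avail;
    memcpy(buf, f->vol->data + f->pos, n);
    f->pos += n;
    *bytes_read = n;
    return ST_SUCCESS;
}
```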