It’s a known fact these days that Microsoft is feeling the Git love. As stated by Microsoft and reported by Ars Technica, even the Windows operating system is moving from its long-lived centralized source control system to Git. Strange days indeed!
However, in all of this there’s an interesting technical bit related to Windows file systems. As it turns out, decentralized source control has some hiccups when your project contains ~3.5 million files. In addition to making some upstream changes to tune Git to these scenarios, Microsoft has also introduced the “Git Virtual File System for Windows.” The goal of this file system is to optimally store local copies of Git repos, allowing for the appearance of a complete local repo but delaying the actual population of these files until the data is needed (they refer to this process as “hydration”).
This type of technology isn’t new. Folks have been creating Hierarchical Storage Managers (HSMs) for decades. There are even lots of Windows implementations that do exactly what the Git Virtual File System does through use of a file system filter, reparse points, and sparse files. So, when we first heard of this feature it honestly didn’t sound all that exciting. However, the Microsoft and Ars posts referenced above got our attention with this line:
GVFS relies on a new Windows filter driver (the moral equivalent of the FUSE driver in Linux)
Wait… We’re getting FUSE for Windows now?? That is a pretty bold claim and certainly cause for some excitement. Given that we have successfully ported FUSE-based file systems to Windows with our User Mode File System Development Kit, we’re pretty familiar with all of the complexities involved in creating this sort of solution. Our heads started swimming with possibilities. “Maybe we’re getting user mode file system filters!”, cried one OSR staff member. “Or maybe third-party file systems will start getting some love in Windows!”, cried another. The excitement was too much to not sit down and start playing with the feature.
What we have discovered is that this new feature is something far less broad and transformative than a true incarnation of FUSE for Windows. Instead, the Git Virtual File System looks a lot more like a traditional HSM solution. Looking at the (impressively commented, seriously) interface header file GvLib.h, the user mode interface to the virtual file system is much more narrowly focused than the FUSE interface.
Rather, this interface appears to be tuned for the creation of a user mode HSM provider. The user mode component claims a “Virtualization Root” and is then responsible for satisfying directory enumerations (GV_GET_DIRECTORY_ENUMERATION_CB) and hydrating files as they are accessed (GV_GET_FILE_STREAM_CB, which is expected to call GvWriteFile). There are also APIs for creating empty placeholder files (GvCreatePlaceholderFile) as well as converting existing files into placeholders (GvConvertFileToPlaceholder).
From this interface header it appears that all we’re really able to do is control the view and population of an existing directory on an NTFS volume. Once the file is hydrated, it’s just a Plain Old File (POF) on an NTFS volume.
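To make the placeholder and hydration model concrete, here is a toy sketch in Python. It does not call GvLib or any Windows API. The class and method names (ToyProvider, ToyVirtualizationRoot, enumerate_directory, get_file_stream) are our own inventions for illustration, and the mapping to the real callbacks is only a loose analogy based on our reading of the header. The provider answers directory enumerations and supplies file contents on first access, after which the file is just an ordinary file in the root directory.

```python
import os
import tempfile


class ToyProvider:
    """Hypothetical stand-in for the user mode component that owns the data."""

    def __init__(self, backing):
        self.backing = dict(backing)  # file name -> contents (bytes)
        self.hydrations = 0           # number of times contents were fetched

    def enumerate_directory(self):
        # Loosely plays the role of GV_GET_DIRECTORY_ENUMERATION_CB.
        return sorted(self.backing)

    def get_file_stream(self, name):
        # Loosely plays the role of GV_GET_FILE_STREAM_CB.
        self.hydrations += 1
        return self.backing[name]


class ToyVirtualizationRoot:
    """Hypothetical stand-in for the filter: full listing, lazy contents."""

    def __init__(self, root, provider):
        self.root = root
        self.provider = provider

    def listdir(self):
        # The view comes from the provider, not from what is on disk.
        return self.provider.enumerate_directory()

    def read(self, name):
        path = os.path.join(self.root, name)
        if not os.path.exists(path):
            # First access: hydrate by writing a plain file into the root.
            data = self.provider.get_file_stream(name)
            with open(path, "wb") as f:
                f.write(data)
        # From here on it is an ordinary file read.
        with open(path, "rb") as f:
            return f.read()


def demo():
    with tempfile.TemporaryDirectory() as root:
        provider = ToyProvider({"a.txt": b"alpha", "b.txt": b"beta"})
        vroot = ToyVirtualizationRoot(root, provider)
        listing = vroot.listdir()
        on_disk_before = sorted(os.listdir(root))
        first = vroot.read("a.txt")
        second = vroot.read("a.txt")
        on_disk_after = sorted(os.listdir(root))
        return (listing, on_disk_before, first, second,
                on_disk_after, provider.hydrations)
```

In this sketch, the listing shows both files before anything exists on disk, and reading a.txt twice asks the provider for its contents only once. The real filter does this transparently to applications, which is exactly the part a user mode toy cannot show.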
That’s not to say that this isn’t a cool feature, especially if they allow for general purpose use of this API or if they open source the kernel mode pieces. Using NTFS for the local storage also makes sense in terms of application compatibility and complexity: in the end, it’s just NTFS. We’re not quite ready to call this equivalent to FUSE, though.