Sunday, September 18, 2011

The OpenAFS IFS Edition is Finally Here

I first proposed the idea of a native redirector based OpenAFS client at the 2004 AFS Best Practice Workshop held at SLAC in March 2004 as part of my Future Directions for the AFS Client on Windows talk. The talk was my first public assessment of the OpenAFS client for Microsoft Windows. In fact it was my first presentation as an OpenAFS gatekeeper, having worked with the code base for only four months. In that time a large amount of low hanging fruit was picked but there was so much more to be done. I wonder how many of the attendees actually believed that even half of the known issues would be resolved in the years to come, let alone that an installable file system driver would be delivered. Prior to 1.3.60 it wasn't even possible to deploy OpenAFS clients on Microsoft Windows with a uniform name space. Instead of accessing resources via the \\AFS\cellname UNC path, all paths were accessed via \\%HOSTNAME%-AFS\ALL\cellname where %HOSTNAME% was the local machine's NetBIOS name.

By September 2004, CITI at the University of Michigan agreed to fund a graduate student, Eric Williams, to develop an IFS interface for the OpenAFS cache manager. Eric's implementation was delivered during the Summer of 2005. The first code dropped in mid-June and the final code dropped in early August. Eric's implementation was built using Microsoft's IFS Kit and implemented a mini-redirector interface. It provided support for anonymous \\AFS access without the use of a loopback adapter but did so by mimicking the SMB message flows. Eric was able to demonstrate 5x performance improvements over the SMB interface. At the end of the Summer Eric moved on to other obligations and work on the redirector interface stalled.

On August 28, 2006, I was introduced to Peter Scott of Kernel Drivers. Peter is a Microsoft MVP and a world renowned Windows kernel specialist with a passion for file systems. Peter volunteered to review the goals I had laid out for the OpenAFS client and the code that Eric Williams had developed. Three major issues were identified during the review. First, OpenAFS is a caching file system and the method used to deliver data to satisfy paging requests made it impossible to guarantee that data cached by Windows would be purged in response to a data version change produced by another machine. Second, the mini-redirector interface underwent a significant change with the introduction of Windows Vista, and maintaining a common code base across XP, Vista and beyond would have been impossible. Third, the implemented functionality was sufficient to create, open, close, read from, and write to files, but the OpenAFS client failed to support a large number of features required by Windows applications such as Unicode character sets, 64-bit file sizes, 64-bit kernels, the WNet API, volume information queries, security information queries, quotas, RPC services such as WKSSVC and SRVSVC, reparse points, and more.

The long term goal for the OpenAFS client for Microsoft Windows was not simply a file system that did not rely on the Microsoft SMB redirector and a loopback adapter.  The goal was to produce a best in class file system that integrated AFS into the Microsoft Windows experience.  Peter and I concluded that we should start over and design an architecture that could support all of the functionality that I desired for OpenAFS and meet some very aggressive performance goals.

Peter had developed a full redirector file system called KDFS which he used for the development of custom file systems for Kernel Drivers clients.  Peter agreed to license the code under a BSD style license to OpenAFS.  This permitted us to use KDFS as a starting point.  On April 21, 2007 we began coding.

We designed an architecture that would not only permit use of a native redirector on Windows XP SP2 through current and future Windows releases but also provide a low-risk transition strategy for individuals and organizations to use when migrating from SMB to redirector based interfaces. One of the key decisions was to maintain both the SMB and IFS interfaces as peers and require that all application visible functionality be implemented in both. This approach permitted all new functionality to be deployed to end users as updates to the existing 1.5 release series. Major functional improvements that were shipped prior to 1.7.1 included:
  • Unicode (UTF-8) encoded file names [1.5.50]
  • Interface independent Path Ioctl processing [1.5.50]
  • Pipe Service RPC emulation for wkssvc and srvsvc [1.5.62]
In addition, literally hundreds of bugs in the cache manager were uncovered and corrected as part of the isolation of the SMB server from the generic AFS cache management layer.  All of these improvements were released as the work was completed providing the end user community immediate benefits and a guarantee that when the IFS interface did ship the cache manager would be unchanged.

The selected architecture permits a single afsd_service.exe to be used either in conjunction with an AFS Redirector driver (afsredir.sys) or with the AFS SMB Server that has been in use for the last fifteen years.  When the AFS Redirector driver is present and active on the system, the SMB Server is disabled.  If the driver is not active, the SMB Server is automatically started.  In addition to the afsredir.sys driver there is one other new component, the AFSRDFSProvider.dll which comes in both 64-bit and 32-bit flavors.  This Network Provider permits the Explorer Shell to browse \\AFS and its cells under the "Network" object as its own category "OpenAFS Network".  To switch back and forth between the SMB-mode and the AFS-Redirector-mode, all that needs to be done is to disable the AFSRedirector driver in the registry.

In general the application behavior when using the AFS Redirector interface should be the same as the AFS SMB Server.  However, there are some differences:
  • The AFS Redirector interface publishes AFS mount points and symlinks as file system reparse points using a Microsoft assigned OpenAFS reparse tag. 
    • Applications that are reparse point aware may no longer cross the reparse point without explicit direction.
    • Applications that are reparse point aware but not OpenAFS tag aware will not understand what to do with the reparse point data.  Ask vendors to contact openafs-gatekeepers@openafs.org to learn how to make their applications OpenAFS aware.
  • Drive mappings to UNC paths that were made using the SMB interface will not be accessible via the AFS Redirector interface until they are removed and recreated.  This is because Windows assigns a drive mapping to a particular file system driver.  When the SMB interface was used, the network in use was "Microsoft Windows Network".  When the AFS Redirector interface is active, the network is "OpenAFS Network".
  • Drive mappings made with the SMB Redirector were not considered to be available when the target path could not be resolved due to either no network access or lack of appropriate authentication credentials.  The AFS Redirector does not disable a drive mapping due to lack of network access or necessary permissions.
  • The AFS Redirector does not require the presence of the Microsoft Loopback Adapter.  When the AFS Redirector is in use, the loopback adapter is ignored.  There are no delays in accessing the \\AFS name space after a suspend or reboot.
  • Applications that report the speed of file copies will report the speed of writing to the Windows cache, not the speed of writing to the AFS file server. This is because the AFS Redirector does not require synchronous writes to the file server for each write by the application. The behavior is closer to that of the Unix cache manager: data is written to the file server only when the Windows cache manager (not to be confused with the AFS cache manager on Windows) flushes dirty extents to the backing store.
  • Due to the existence of the new Network Provider DLL, it is extremely important that the 64-bit WOW MSI be installed on 64-bit systems.  Otherwise, 32-bit applications will not be able to open files in \\AFS when using UNC paths.
  • There is no support for Offline Folders when using the AFS redirector interface.  This is because Offline Folders is a feature of the SMB redirector and not a generic capability layered above arbitrary network file systems.
  • Drive letter substitutions (SUBST D: \\UNC\path) to \\AFS paths will appear as a disconnected network file system when SMB is used but will be connected when the AFS redirector is active.
  • When the \\AFS name space is viewed via the SMB redirector the directory pointed to by the share name is assumed to be the root directory of the entire name space regardless of how many AFS mount points are crossed. When the AFS redirector is used, every AFS volume is recognized by Windows as a separate file system.
On the whole, the behavioral changes when switching from SMB to AFS redirector favor the new implementation.  This is especially true when the performance improvements are taken into account.

There are a number of subtle design decisions that are worth discussing.

One of the benefits of the SMB only OpenAFS service is that it ran entirely as a user-space service that could be stopped at any time, be replaced with new binaries, and restarted.  Microsoft Windows file system drivers once loaded cannot be unloaded.  In order to permit upgrades to the afsd_service.exe and kernel driver to be applied without a reboot Peter and I decided to implement the afsredir.sys driver as a framework only driver which in turn loads a kernel library driver, afsredirlib.sys that contains the vast majority of the AFS specific implementation details.  When the OpenAFS Service is stopped, the afsredirlib.sys library is unloaded by afsredir.sys and all operations on \\AFS file objects are suspended until the OpenAFS Service is restarted.  This permits upgrades to be performed on live systems with active applications.

The major benefit of the AFS redirector architecture is an improvement in data throughput between the OpenAFS Service and the AFS redirector. Both the service and the kernel driver share access to the memory mapped AFS cache file. As a result, instead of sending data in-band within a FetchData or StoreData ioctl, the service and redirector simply exchange ownership over file extents within the cache. This avoids a large number of data copies and reduces the CPU cost of each ioctl. With this model in place, reads from the AFS cache of nearly 800MB/second have been observed. This is approximately 12 times the best performance ever observed with the SMB interface.
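To make the extent hand-off idea concrete, here is a minimal user-space sketch. It is not the actual driver or service code: the `SharedCache`, `Extent`, and `Owner` names are hypothetical, and a `bytearray` stands in for the memory mapped cache file. The point it illustrates is that only a small ownership record changes hands while the data itself stays where it is.

```python
from dataclasses import dataclass
from enum import Enum


class Owner(Enum):
    SERVICE = "service"        # stand-in for afsd_service.exe
    REDIRECTOR = "redirector"  # stand-in for the kernel driver


@dataclass
class Extent:
    """A byte range within the shared cache plus its current owner."""
    offset: int
    length: int
    owner: Owner = Owner.SERVICE


class SharedCache:
    """Hypothetical model of a cache buffer visible to both sides."""

    def __init__(self, size: int) -> None:
        # Assumption: a bytearray plays the role of the memory mapped cache file.
        self.buf = bytearray(size)

    def service_fill(self, offset: int, data: bytes) -> Extent:
        # The service writes fetched data directly into the shared buffer.
        self.buf[offset:offset + len(data)] = data
        return Extent(offset, len(data), Owner.SERVICE)

    def transfer(self, extent: Extent, new_owner: Owner) -> None:
        # Only the ownership record changes; no file data is copied.
        extent.owner = new_owner

    def redirector_view(self, extent: Extent) -> memoryview:
        # The redirector may touch an extent only while it owns it.
        if extent.owner is not Owner.REDIRECTOR:
            raise PermissionError("extent is not owned by the redirector")
        return memoryview(self.buf)[extent.offset:extent.offset + extent.length]


cache = SharedCache(4096)
extent = cache.service_fill(1024, b"file data")
cache.transfer(extent, Owner.REDIRECTOR)
view = cache.redirector_view(extent)  # a zero-copy window onto the same bytes
```

The `memoryview` slice refers to the same underlying bytes that the service wrote, which mirrors the "exchange ownership instead of sending data in-band" design described above.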

The AFS redirector has a sophisticated Authentication Group implementation. For those that are unaware, the UNIX AFS client implements Process Authentication Groups (PAGs). A PAG is a collection of processes that share a common set of network credentials. A process inherits PAG membership from its parent process but can choose to remove itself from the PAG or create a unique PAG for itself. This permits different processes running as local root to execute with different sets of network credentials.

For Microsoft Windows, where a Thread object is as much a first-class object as a Process object, the Authentication Group model has been extended to permit processes to belong to more than one authentication group at a time. Each process has one default authentication group active at a given time and each thread can select its own active group or use the process default group. This approach permits applications such as IIS to create a unique authentication group for each remote identity and activate that authentication group for each thread handling a request on behalf of that identity. When a new process is created it only inherits the one authentication group that was active.
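The rules above can be sketched as a small model. This is an illustration only, not the redirector's data structures: the `AuthGroup`, `Process`, and `Thread` classes are hypothetical, and the sketch assumes that "the one authentication group that was active" means the group active on the thread that creates the new process.

```python
import itertools

_next_group_id = itertools.count(1)


class AuthGroup:
    """Hypothetical stand-in for a set of shared network credentials."""

    def __init__(self, label: str) -> None:
        self.id = next(_next_group_id)
        self.label = label


class Process:
    def __init__(self, default_group: AuthGroup) -> None:
        # A process may belong to several groups, with one default at a time.
        self.groups = {default_group.id: default_group}
        self.default_group = default_group

    def create_group(self, label: str) -> AuthGroup:
        group = AuthGroup(label)
        self.groups[group.id] = group
        return group

    def spawn(self, creating_thread: "Thread") -> "Process":
        # Assumption: the child inherits only the creating thread's active group.
        return Process(creating_thread.active_group())


class Thread:
    def __init__(self, process: Process) -> None:
        self.process = process
        self._selected = None  # None means "use the process default"

    def select_group(self, group: AuthGroup) -> None:
        if group.id not in self.process.groups:
            raise ValueError("process is not a member of this group")
        self._selected = group

    def active_group(self) -> AuthGroup:
        if self._selected is not None:
            return self._selected
        return self.process.default_group


# Usage: a server process gives one worker thread a per-identity group.
server = Process(AuthGroup("service-default"))
worker = Thread(server)
worker.select_group(server.create_group("remote-user"))
child = server.spawn(worker)  # child belongs only to "remote-user"
```

Other threads in `server` that never call `select_group` keep using the process default, which is the behavior the IIS example relies on.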

Authentication groups are tracked as part of the Windows DACL in the Process or Thread Token.  When a process or thread performs a Local Procedure Call to a background service these tokens permit the background service to impersonate the caller.  When impersonation is active, the background service inherits not only the Windows SID of the calling process but also the active authentication group.  This ensures that LPCs execute with exactly the AFS permissions of the calling process.

Microsoft Windows supports multiple subsystems. The most well known is the Win32 subsystem. When NT was originally shipped there were also OS/2 and POSIX subsystems. On 64-bit Windows, in addition to Win32, there is the WOW64 subsystem, which provides the 32-bit application compatibility layer. The AFS redirector tracks which subsystem is in use and can use the active subsystem to select which @sys search list should be used. A separate list is maintained for each subsystem.
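As a rough illustration of per-subsystem @sys expansion, the sketch below keeps one search list per subsystem and produces the candidate paths in search order. The function name, the dictionary, and the sysname values are all placeholders chosen for the example; they are not taken from the OpenAFS configuration.

```python
# Hypothetical per-subsystem @sys search lists (values are placeholders).
SYSNAME_LISTS = {
    "win32": ["amd64_w2k", "i386_w2k"],
    "wow64": ["i386_w2k"],
}


def candidate_paths(path: str, subsystem: str) -> list:
    """Return the paths to try, in order, for the caller's subsystem."""
    if "@sys" not in path:
        return [path]
    return [path.replace("@sys", name) for name in SYSNAME_LISTS[subsystem]]


# A 32-bit process under the compatibility layer consults only its own list.
paths = candidate_paths(r"\\AFS\example.org\@sys\bin", "wow64")
```

With lists like these, a 64-bit process and a 32-bit process opening the same `@sys` path could resolve to different directories, which is the reason for tracking the subsystem.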

The first official OpenAFS.org release to include the new AFS redirector was 1.7.1, published on September 15, 2011. That is seven and a half years after the initial proposal and 1608 days after Peter and I began the current implementation. The Basic COCOMO model (with coefficients a=2.4 and b=1.05) estimates the cost of implementing the AFS redirector and the changes to the OpenAFS Service at approximately US$1.2 million. It can be honestly said that this project would never have been completed if it weren't for the fact that Peter Scott and I were willing to work unpaid for long stretches of time while we searched for additional funding to bring the project to completion.
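For readers who want to check the numbers, here is a short sketch. The day count uses the two dates given in this post. The effort function is the Basic COCOMO formula for organic-mode projects, effort = a * KLOC^b person-months with a = 2.4 and b = 1.05. The line count passed to it is a made-up input, since the post does not state the code size or the labor rate behind the dollar figure.

```python
from datetime import date

# Dates taken from the post: coding began, and 1.7.1 was published.
coding_started = date(2007, 4, 21)
release_1_7_1 = date(2011, 9, 15)
elapsed_days = (release_1_7_1 - coding_started).days


def basic_cocomo_effort(kloc: float, a: float = 2.4, b: float = 1.05) -> float:
    """Basic COCOMO (organic mode) effort estimate in person-months."""
    return a * kloc ** b


# Hypothetical input: 100 KLOC is an illustration, not the project's real size.
example_effort = basic_cocomo_effort(100.0)
```

Turning person-months into a dollar estimate requires multiplying by an assumed monthly labor cost, which is why the same model can yield different totals for the same code base.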

The release of 1.7.1 does not mean that the project is complete.  There are still many features that I want to see implemented.  Here is a partial list:
  • The Windows File System Volume Query Quota Interface is not implemented. As a result, AFS quota information is not available to application processes or end users via Windows dialogs.
  • The Windows Volume Shadow Copy Service is not implemented. As a result, AFS backup volumes are not accessible via the Explorer Shell.
  • There is no support for storing DOS attributes such as Hidden, System, or Archive.
  • There is no support for Alternate Data Streams as required by Windows User Account Control to store Zone Identity data.
  • There is no support for Extended Attributes.
  • There is no support for Access Based Enumeration.
  • There is no support for Windows Management Instrumentation.
  • There is no support for Distributed Link Tracking and Object Identifiers.
  • There is no support for storing Windows Access Control Lists. Only the AFS ACLs are enforced.
  • There is no support for offline folders or disconnected operations.
  • There is no Management Console for the OpenAFS Service.
The funding for the AFS redirector came from a handful of organizations. Now that OpenAFS 1.7.1 is available I request that any organization that relies on the use of the OpenAFS client on Microsoft Windows contribute US$20 per copy to cover unfunded expenses and future development.

To end on another positive note, the OpenAFS 1.7.1 release has been tested on the Microsoft Windows 8 Developer Preview and it runs flawlessly. Now all we need are some nice Metro applications to take advantage of \\AFS.

2 comments:

Daniel said...

nice flow chart...post more

Ken Dreyer said...

Massive congratulations to you and Peter. Thank you very much.