Monday, November 5, 2012

OpenAFS Windows IFS Thirteen Months Later

On 18 September 2011, I discussed the first OpenAFS release to include a native installable file system redirector.  It is often said that it takes ten developer years to shake out all of the bugs and performance glitches in a new file system.  The last year has certainly seen its fill of BSODs, deadlocks, hiccups, and application interoperability issues.  Today, I am releasing version 1.7.18.  Over the last thirteen months more than 750 changes have been implemented, improving performance, stability, and application compatibility.  This post will highlight some of the challenges and lessons learned in the process.

Antimalware Filter Driver compatibility
The vast majority of problems that end users have experienced with the AFS redirector have been related to interactions with Anti-Virus and other forms of content scanners which install filter drivers on the system.  Life would be much easier if there was a standard set of hooks that these products could use to scan files and deny access, quarantine, or otherwise alter the normal application data access patterns.  Unfortunately that is not the case and learning what works and what doesn't has often been left to trial and error.

Since AFS is a network file system that relies upon credentials that are independent of the local operating system, there are added complexities.  For example, when Excel opens a spreadsheet file it uses the AFS tokens which are available to the active logon session.  The anti-virus service, on the other hand, runs as an NT service under the SYSTEM or another account in a different logon session.  As such, it does not have access to the user's AFS tokens unless the request to scan the file content is performed by borrowing the File Object from Excel or impersonating the Excel process's security context.  Most anti-virus products do impersonate the calling thread or borrow the File Object, but not all do.  Versions of Microsoft Security Essentials prior to 2.0 did not, and it was a significant problem for OpenAFS.

Anti-virus scanners can choose to scan during the CreateFile operation and during the CloseHandle operation (aka File Cleanup).  The challenge here for the AFS redirector is that it must hold various locks in order to protect the integrity of the data and provide cache coherency with the file server managed data versions.  Anti-virus scanners can hijack the thread performing the CreateFile or Cleanup and inherit the locks that are already held, or they can spawn a worker thread to re-open the file, perform a scan, and close it again while the application initiated CreateFile or Cleanup is blocked.  Any locks that are held across CreateFile or Cleanup which are required by the anti-virus worker thread will result in a deadlock.  Failure to hold the locks can result in data corruption.  Sophos and Kaspersky were two of the most challenging products to learn to interact with safely.
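The worker-thread deadlock can be reproduced in miniature.  The following is only an illustrative sketch, not the redirector's locking code: the names open_with_scanner and object_lock are hypothetical, a user-mode lock stands in for a kernel lock, and a timeout is used so the demonstration terminates where a real scanner thread would block forever.

```python
import threading

def open_with_scanner(hold_lock_across_scan: bool, timeout: float = 0.2) -> bool:
    """Return True if the scanner's worker thread managed to re-open the file."""
    object_lock = threading.Lock()   # stands in for a non-reentrant per-file lock
    scan_results = []

    def scanner_worker():
        # The scanner re-opens the file, which needs the same per-file lock.
        got_it = object_lock.acquire(timeout=timeout)
        scan_results.append(got_it)
        if got_it:
            object_lock.release()

    def create_file():
        object_lock.acquire()
        if not hold_lock_across_scan:
            object_lock.release()    # drop the lock before calling out to the filter
        worker = threading.Thread(target=scanner_worker)
        worker.start()
        worker.join()                # the application's open waits for the scan
        if hold_lock_across_scan:
            object_lock.release()

    create_file()
    return scan_results[0]
```

Calling open_with_scanner(True) models the deadlock case (the worker never obtains the lock), while open_with_scanner(False) lets the scan proceed.  Dropping the lock is not free: it reopens the window for the data corruption mentioned above, which is why each call-out has to be examined individually.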

Microsoft periodically organizes File System Filter Driver PlugFests which give file system developers, anti-virus vendors, encryption product vendors, content scanner vendors, and others the opportunity to test their forthcoming products against Microsoft's upcoming operating system releases.  The PlugFest is also an opportunity for third-party vendors to perform interoperability testing with each other.  It was unfortunate that, due to increased secrecy regarding the development of Windows 8 and Server 2012, Microsoft was unable to hold a PlugFest for more than a year.  But in 2012 there were two events, in February and August.

The February PlugFest was the first opportunity to interop with a broad range of vendors since the release of 1.7.1.  At that event every Interop session was a painful experience.  During that week 1.7.7 was scheduled to be released but it had to be pulled because of the many problems (deadlocks, BSODs, and data corruption) that were identified during the interop testing sessions.

This past August's experience was the complete opposite.  The code that would become the 1.7.17 release, including Windows 8 and Server 2012 specific functionality, was tested.  Other than a minor error that was uncovered during the first interop session with Microsoft's own anti-virus engine used in Security Essentials and Windows Defender, there was not a single hiccup the rest of the week.  As it turns out, the AFS redirector was the only non-Microsoft file system to implement all of the required new interfaces for Windows 8.

Application Compatibility
Of course, compatibility with deployed applications is the goal.  Whenever possible, applications should be unaware that their data is being stored in AFS as opposed to Windows built-in file systems such as NTFS and CIFS.  This challenge is made more complicated by the fact that most applications do not implement feature tests for optional file system APIs.  Instead they just assume that every feature implemented by NTFS or CIFS will be available everywhere.  Whether an application treats the file system as local or remote is often decided by whether or not UNC path notation is used.  Things should become easier for non-Microsoft file systems now that Microsoft has introduced ReFS, a new file system that does not implement many features of NTFS including transactions, short names, extended attributes or alternate data streams; none of which are implemented by the AFS redirector.

Still, it is worth noting that the AFS redirector is a very complete implementation of the NTFS and CIFS feature set including support for CIFS Pipe Services such as WKSSVC and SRVSVC and a full implementation of the Network Provider API.  Both the Pipe Services and the Network Provider API are used by applications to browse the capabilities of the network file system and the available resources such as server and share names.  The Network Provider API is also responsible for managing drive letter to UNC path mappings and path name normalization.  One example of a Network Provider incompatibility was the failure to implement network performance statistics, which resulted in periodic 20 second delays from within the Explorer Shell.

Reparse Points
One of the most significant visible changes between the SMB gateway interface and the native AFS redirector is the use of file system Reparse Points to represent AFS Mount Points and Symlinks.  Unlike POSIX symlinks, which are unstructured data, a Windows File System Reparse Point is a tagged structured data type.  Microsoft maintains a registry of all of the tag values and which organization they are assigned to.  More than 50 reparse point tags have been registered and OpenAFS is the proud assignee of IO_REPARSE_TAG_OPENAFS_DFS (0x00000037L).  The OpenAFS Reparse Tag Data has three sub-types (Mount Point, Symlink, UNC Referral) which are used to export the target information for each.

When the SMB gateway was used, the entire AFS name space appeared to applications as a single volume exported as a single Windows File Share.  It was not possible for Windows to report volume information (quota, readonly status, etc) or detect out of space conditions prior to the application filling the Windows page cache.  Now that reparse points are in use, Windows applications can recognize that a path might have crossed from one volume to another.  Tools such as robocopy that are Junction (aka Reparse Point) aware can perform operations without crossing volume boundaries.

While this is a major improvement in capability, it is also a dramatic change in behavior for applications.  Some applications rely upon the assumption that a Windows File Share can only refer to a single volume and further assume that any file path using UNC notation is a path to a Windows File Share.  Such applications can become confused when they query the volume information of \\afs\example.org\ and are told that the volume is READ_ONLY when the full target path \\afs\example.org\user\j\johndoe\ is not.  This is a deficiency in the application and not a fault of the file system.

One downside of the reparse point model is that applications need to understand the format of the structured data to make use of it.  Tools such as JPSoftware's Take Command are reparse point aware but cannot at present properly display the target information.  The same is true for Cygwin and related tools.

Authentication Groups
The SMB gateway client associated credentials with Windows account usernames (or SIDs).  The AFS redirector tracks process creation and associates credentials with Authentication Groups (AG).   Each process inherits an AG from the creating thread and can create additional AGs to store alternate sets of credentials.  When background services such as csrss.exe and svchost.exe execute tasks on behalf of foreground processes they impersonate the credentials of the requesting thread.  By impersonating the caller, the background thread informs the AFS redirector which credentials should be used.

Sometimes a mistake is made and the background service fails to impersonate the caller and instead attempts to rely upon the service's own credentials to perform its job.  This is the case with conhost.exe when it attempts to access or manipulate the contents of the "Command Prompt.lnk" shortcut.  As a result the contents of cmd.exe shortcuts are ignored when initiating command prompt console sessions.

When Will 1.8 Ship?
Users frequently ask "when will 1.8 ship?  I don't want to deploy the new OpenAFS client until it is production quality."  The reason that the OpenAFS client is 1.7.x and not 1.8.x has less to do with stability than it has to do with the rate of change and unfinished work.  The Windows platform has new releases issued every one to two months whereas the rate of issue for the servers and UNIX clients is one every six to twelve months.  The rate of change to support new features or improve compatibility and performance on Windows is significantly higher.  Nearly 1/3 of all patches contributed to OpenAFS.org are new functionality for Windows.  Please do not focus so much on the version label.

1.8 will be issued when the rate of change in the Windows client drops to the point where a new release each month is no longer desirable.  The two most significant areas of work that need to be addressed before a 1.8 release are in the Kerberos bindings and the Installer.  At present, the 1.7.x binaries are built directly against the MIT KFW 3.2 libraries.  This permits OpenAFS to work with KFW 3.2 and the KFW translation layer provided by Heimdal 1.5.  However, the KFW 3.2 API does not permit fine-grained control over the use of DES encryption types nor is it guaranteed to work with future KFW releases from MIT.  The installer requires ease of use improvements.  The user should not be prompted when files are in-use but should always be prompted to provide a cell name unless the installation is an upgrade.

What Comes After 1.8?
With large scale deployment comes operational experience.  The AFS Redirector design has been shown to have weaknesses that result in a larger than desired in-kernel memory footprint.  There are four areas in which a redesign would be desirable:

1. The File Control Blocks (FCB) and the Object Information Control Blocks (OICB) are bound to one another even though they could very well have different life spans.  An FCB must exist as long as there is an open HANDLE.  Multiple open handles for the same file system object refer to the same FCB.  The FCB contains metadata about the file object that is specific to the file system in-kernel.  It tracks the allocated file size, the list of data extents that are present in-kernel, etc.  For each FCB there must exist an OICB which contains the AFS specific meta data associated with the file object including AFS data version, AFS FileID, etc.   While an OICB must exist for an FCB, it does not have to be the other way around.

The mutual binding of the OICB and the FCB makes garbage collection more difficult than it needs to be.  Some of the race conditions that were fixed in the 1.7.18 release were the result of this complexity.  One of the important goals of a redesign is to break this mutual dependency and instead only maintain a reference from the FCB to the OICB and not the other way around.  Doing so will permit FCBs to be garbage collected when the last handle is closed and OICB objects to be garbage collected when their active reference counts reach zero.  The garbage collection worker thread will hold fewer locks and have a smaller impact on file system performance.

2. The Directory Entry Control Blocks (DECB) also maintain a reference to the OICB.  In fact, each time a directory is enumerated to satisfy FindFirst/FindNext API requests, not only is a DECB allocated but an OICB is as well.  Permitting the OICB to be allocated only when an FCB is allocated instead of as part of directory enumeration will reduce the in-kernel memory footprint.

3. Directory enumeration is currently performed for the entire directory not only when the directory object is opened by an application but also when a FindFirst API is issued for a non-wildcard search.   The vast majority of FindFirst searches are non-wildcard searches for explicit names.  Instead of populating the full contents of the directory in-kernel, the memory footprint can be further reduced by pushing those queries to the afsd_service process.

4. File data is exchanged between the afsd_service and the Windows page cache by sharing a memory-mapped backing store between the AFS Redirector and the afsd_service.  The control over specific file extents is managed by a reverse ioctl interface between the redirector and the user-land service.  This protocol is racy and can result in inefficient exchanges of control.  Replacing the existing protocol with one that tracks extent request counts and active reference counts will reduce wasteful exchanges and improve data throughput.
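The one-way FCB-to-OICB reference proposed in item 1 can be sketched as follows.  This is a toy model written for illustration only; the class and method names are hypothetical and say nothing about how the kernel structures are actually laid out.

```python
class ObjectInfo:
    """Stands in for an OICB: AFS-specific metadata plus a reference count."""
    def __init__(self, file_id):
        self.file_id = file_id
        self.ref_count = 0

class FileControlBlock:
    """Stands in for an FCB: it points at its OICB, never the reverse."""
    def __init__(self, oicb):
        self.oicb = oicb
        oicb.ref_count += 1
        self.open_handles = 0

class Cache:
    def __init__(self):
        self.oicbs = {}
        self.fcbs = {}

    def open(self, file_id):
        oicb = self.oicbs.get(file_id)
        if oicb is None:
            oicb = self.oicbs[file_id] = ObjectInfo(file_id)
        fcb = self.fcbs.get(file_id)
        if fcb is None:
            fcb = self.fcbs[file_id] = FileControlBlock(oicb)
        fcb.open_handles += 1            # multiple handles share one FCB
        return fcb

    def close(self, file_id):
        fcb = self.fcbs[file_id]
        fcb.open_handles -= 1
        if fcb.open_handles == 0:        # last handle closed: the FCB can go
            del self.fcbs[file_id]
            fcb.oicb.ref_count -= 1

    def collect(self):
        """Drop every OICB whose reference count has reached zero."""
        dead = [fid for fid, o in self.oicbs.items() if o.ref_count == 0]
        for fid in dead:
            del self.oicbs[fid]
        return dead
```

Because nothing points from the OICB back to the FCB, the collector in this model only has to inspect a counter, which is the property that should let the real garbage collection thread hold fewer locks.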

These proposed changes are a significant undertaking and they will not appear in the 1.7.x/1.8.x release series. 

Credits
The OpenAFS for Windows client is the product of Your File System, Inc., Kernel Drivers, LLC, and Secure Endpoints, Inc.  To support the development of the OpenAFS for Windows client, please purchase support contracts or make donations.  The recommended donation is $20 per client installation per year.

Saturday, November 3, 2012

I want my Windows IFS OpenAFS Client to be fast

In 2008 I wrote "I want my OpenAFS Windows client to be fast", which described the options I used to tune the Windows OpenAFS client that used the SMB server gateway.  As of this writing the current release of OpenAFS for Windows is 1.7.18, which is based upon a native Windows Installable File System, AFSRedir.sys.  This post is an update describing the configuration values I use with the native redirector interface.

The most important options related to throughput fall into two categories:

How much data can I cache?
CacheSize
Stats

How Fast Can I Read and Write?
BlockSize
ChunkSize
Daemons
RxUdpBufSize
SecurityLevel
ServerThreads
TraceOption

All of these options are described in Appendix A of the Release Notes.  Here are the values I use:

CacheSize = 4GB (64-bit)  1GB (32-bit)
Stats = 60,000 (64-bit)  30,000 (32-bit)

BlockSize = 4
ChunkSize = 21 (2MB)
RxUdpBufSize = 12582912
SecurityLevel = 1 (when I need speed I use "fs setcrypt" to adjust on the fly)
ServerThreads = 32
TraceOption = 0 (no logging)
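As a quick sanity check on the units, the arithmetic behind two of these values can be written out.  ChunkSize is a power-of-two exponent, as the "21 (2MB)" annotation above indicates; treating RxUdpBufSize as a byte count is my assumption here, so consult Appendix A for the authoritative units.

```python
def chunk_bytes(chunk_size_log2: int) -> int:
    """ChunkSize is an exponent: the chunk is 2**ChunkSize bytes."""
    return 1 << chunk_size_log2

def to_mib(n_bytes: int) -> float:
    """Convert a byte count to MiB (assumes the value is expressed in bytes)."""
    return n_bytes / (1024 * 1024)

chunk = chunk_bytes(21)        # 2097152 bytes, i.e. 2 MiB
rx_buf = to_mib(12582912)      # 12.0 MiB if RxUdpBufSize is in bytes
```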

Non-performance related options that I use:

DeleteReadOnly = 0 (do not permit deletion of files with the ReadOnly attribute set)
FollowBackupPath = 1 (mount points from .backup volumes search for .backup volumes)
FreelanceImportCellServDB = 1 (add share names for each cell in CellServDB file)
GiveUpAllCallbacks = 1 (be nice to file servers)
HideDotFiles = 1 (add the Hidden attribute to files beginning with a dot)
UseDNS = 1 (query DNS

Sunday, October 2, 2011

Heimdal: Now Playing on Windows Near You

Today, Heimdal 1.5.1 was announced including support for Microsoft Windows.  Asanka Herath gave an excellent presentation on the design plans at the 2010 AFS and Kerberos Best Practices Workshop.  The Heimdal port began in December 2008 in response to several motivations:
  1. Several large Secure Endpoints clients were experiencing significant upgrade problems with MIT Kerberos for Windows due to backward compatibility problems between versions 2.6.x and 3.x.  The problems were due to what is affectionately known as DLL Hell.  Applications built against old versions of KFW do not work with newer versions and vice versa because the list of function exports and the ordinal bindings changed.  To make matters worse, it isn't possible to have more than one version of KFW installed on a system at any given time.  This is because KFW libraries must be installed in a directory listed in the system PATH environment variable.  To address this problem Secure Endpoints issued a proposal to MIT in July 2008 that KFW be converted to use Windows Side-by-side Assemblies.  This proposal along with others to improve Network Identity Manager went over like a lead balloon at the Kerberos Consortium.
  2. Secure Endpoints began work on incorporating Hardware Security Modules such as Thales' nShield into a Kerberized Certificate Authority that could be approved by The Americas Grid Policy Management Authority.  TAGPMA requires that all certificate authorities store their keys in hardware.  This naturally led us to wonder if we could do the same for a Kerberos Key Distribution Center (KDC).  Heimdal already supported the OpenSSL crypto library which could be used with the nShield HSM.  Asanka presented our ideas at the 2009 AFS and Kerberos BPW.
  3. Finally, OpenAFS needed a number of changes to Kerberos and GSS-API in order to be able to implement the rxgk security class.  There have been numerous presentations on the need for rxgk over the years. Love gave a talk in 2007, Simon gave one in 2010, and another in 2011.  In fact, the rxgk work began back in 2004 at an AFS hackathon in Sweden.  Implementing rxgk requires that all supported platforms provide a Kerberos Crypto Framework (RFC 3961) and the GSS Pseudo-Random Function (RFC 4401).  MIT Kerberos doesn't export a 3961 compatible crypto framework in any version and with the failure to put any resources behind the Windows product there was no GSS PRF support.  The OpenAFS development community has found the Kerberos Consortium quite difficult to work with whereas Heimdal welcomed the proposed changes with open arms.  Heimdal redesigned their repository layout to make it possible for OpenAFS to import core functionality such as the cross-platform compatibility library libroken, the hcrypto library, and the rfc3961 framework.  This in turn permits OpenAFS developers to focus on building a best of breed distributed file system and avoid the need to build and support a Kerberos v5 and GSS-API implementation.  Heimdal is more than just a Kerberos implementation which will permit OpenAFS to more easily support non-Kerberos authentication mechanisms once rxgk is deployed.
The Secure Endpoints distribution of Heimdal is more than just a port to Microsoft Windows.  In order to properly address the needs of existing KFW users and developers, the Heimdal distribution includes a set of KFW 3.x compatible DLLs that act as a shim layer that converts requests issued using the MIT API and forwards them to the Heimdal assembly for processing.

For developers, Secure Endpoints is now distributing a Kerberos Compatibility SDK that will permit applications to be developed which can work seamlessly regardless of whether Heimdal or MIT Kerberos is installed on the system.  OpenAFS and all future Secure Endpoints applications such as Network Identity Manager and the Kerberized Certificate Authority will be built against this SDK.  Applications built against the SDK first search for a compatible Heimdal assembly.  If an assembly is not installed on the system, KFW DLLs are searched for in the PATH and manually loaded.
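The search order can be summarized in a few lines.  This is a sketch of the decision logic only, not the SDK's code: the function name is hypothetical, the caller is assumed to already know whether a Heimdal assembly is present, and the DLL file name is an illustrative placeholder.

```python
import os
from typing import Callable, Iterable, Optional, Tuple

def select_kerberos_backend(
    heimdal_assembly_present: bool,
    path_dirs: Iterable[str],
    kfw_dll_name: str = "krb5_32.dll",           # placeholder file name for the sketch
    exists: Callable[[str], bool] = os.path.isfile,
) -> Optional[Tuple[str, Optional[str]]]:
    """Prefer a Heimdal assembly; otherwise look for a KFW DLL along PATH."""
    if heimdal_assembly_present:
        return ("heimdal", None)
    for directory in path_dirs:
        candidate = os.path.join(directory, kfw_dll_name)
        if exists(candidate):
            return ("kfw", candidate)            # the caller would load this DLL manually
    return None                                  # no Kerberos implementation found
```

In real use the directory list would come from os.environ["PATH"].split(os.pathsep).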

One important difference between Heimdal and KFW relates to how credential caches and keytabs are implemented.  Instead of compiling all supported cache and keytab types into the Heimdal libraries, Heimdal loads credential caches and keytabs as registered plug-ins.  This permits weak cache and keytab implementations to be removed on systems where they shouldn't be supported and permits new implementations to be developed independently of the Heimdal distributions.  This functionality is going to become very useful for OpenAFS users on Microsoft Windows now that OpenAFS 1.7.x includes native authentication groups.  For the first time it will be possible to develop secure Kerberos credentials cache and keytab implementations whose contents become accessible to processes that are impersonating other processes, something that has only been possible with the Microsoft Kerberos SSP up to this point.
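The value of registering cache types rather than compiling them in can be shown with a generic registry pattern.  This is not Heimdal's plug-in API, which is not described in this post; the class, its methods, and the "TYPE:residual" name parsing are assumptions made purely for illustration.

```python
class CacheTypeRegistry:
    """Maps a cache type prefix (e.g. "MEMORY") to a factory for that cache type."""
    def __init__(self):
        self._types = {}

    def register(self, prefix, factory):
        self._types[prefix.upper()] = factory

    def unregister(self, prefix):
        # Removing a weak implementation makes names of that type unresolvable.
        self._types.pop(prefix.upper(), None)

    def resolve(self, name):
        prefix, sep, residual = name.partition(":")
        if not sep:
            raise ValueError("expected TYPE:residual")
        try:
            factory = self._types[prefix.upper()]
        except KeyError:
            raise LookupError("no cache type registered for " + prefix) from None
        return factory(residual)
```

A site that does not want a particular cache type simply never registers it, and a new type (for example one bound to an authentication group) can be added without rebuilding the library.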

All in all, the release of Heimdal for Microsoft Windows is an important step forward.


Sunday, September 18, 2011

The OpenAFS IFS Edition is Finally Here

I first proposed the idea of a native redirector based OpenAFS client at the 2004 AFS Best Practice Workshop held at SLAC in March 2004 as part of my "Future Directions for the AFS Client on Windows" talk.  The talk was my first public assessment of the OpenAFS client for Microsoft Windows.  In fact it was my first presentation as an OpenAFS gatekeeper, having only been working with the code base for four months.  In that time a large amount of low hanging fruit was picked but there was so much more to be done.  I wonder how many of the attendees actually believed that even half of the known issues would be resolved in the years to come, let alone an installable file system driver.  Prior to 1.3.60 it wasn't even possible to deploy OpenAFS clients on Microsoft Windows with a uniform name space.  Instead of accessing resources via the \\AFS\cellname UNC path, all paths were accessed via \\%HOSTNAME%-AFS\ALL\cellname where %HOSTNAME% was the local machine's NetBIOS name.

By September 2004, CITI at the University of Michigan agreed to fund a graduate student, Eric Williams, to develop an IFS interface for the OpenAFS cache manager.  Eric's implementation was delivered during the Summer of 2005.   The first code dropped in mid-June and the final code dropped in early August.  Eric's implementation was built using Microsoft's IFS Kit and implemented a mini-redirector interface.  It provided support for anonymous \\AFS access without the use of a loopback adapter but did so by mimicking the SMB message flows.  Eric was able to demonstrate 5x performance improvements over the SMB interface.  At the end of the Summer Eric moved onto other obligations and work on the redirector interface stalled.

On August 28, 2006, I was introduced to Peter Scott of Kernel Drivers.  Peter is a Microsoft MVP and a world renowned Windows kernel specialist with a passion for file systems.  Peter volunteered to review the goals I had laid out for the OpenAFS client and the code that Eric Williams had developed.  Three major issues were identified during the review.  First, OpenAFS is a caching file system and the method used to deliver data to satisfy paging requests made it impossible to guarantee that data cached by Windows would be purged in response to a data version change produced by another machine.  Second, the mini-redirector interface underwent a significant change with the introduction of Windows Vista and maintaining a common code base across XP, Vista and beyond would have been impossible.  Third, the implemented functionality was sufficient to create, open, close, read from, write to, etc. but the OpenAFS client failed to support a large number of features required by Windows applications such as Unicode character sets, 64-bit file sizes, 64-bit kernels, the WNet API, volume information queries, security information queries, quotas, RPC services such as WKSSVC and SRVSVC, reparse points, and more.

The long term goal for the OpenAFS client for Microsoft Windows was not simply a file system that did not rely on the Microsoft SMB redirector and a loopback adapter.  The goal was to produce a best in class file system that integrated AFS into the Microsoft Windows experience.  Peter and I concluded that we should start over and design an architecture that could support all of the functionality that I desired for OpenAFS and meet some very aggressive performance goals.

Peter had developed a full redirector file system called KDFS which he used for the development of custom file systems for Kernel Drivers clients.  Peter agreed to license the code under a BSD style license to OpenAFS.  This permitted us to use KDFS as a starting point.  On April 21, 2007 we began coding.

We designed an architecture that would not only permit use of a native redirector on Windows XP SP2 through current and future Windows releases but also provide a low-risk transition strategy for individuals and organizations to use when migrating from SMB to redirector based interfaces.  One of the key decisions was to maintain both the SMB and IFS interfaces as peers and require that all application visible functionality be implemented in both.  This approach permitted all new functionality to be deployed to end users as updates to the existing 1.5 release series.  Major functional improvements that were shipped prior to 1.7.1 included:
  • Unicode (UTF-8) encoded file names [1.5.50]
  • Interface independent Path Ioctl processing [1.5.50]
  • Pipe Service RPC emulation for wkssvc and srvsvc [1.5.62]
In addition, literally hundreds of bugs in the cache manager were uncovered and corrected as part of the isolation of the SMB server from the generic AFS cache management layer.  All of these improvements were released as the work was completed providing the end user community immediate benefits and a guarantee that when the IFS interface did ship the cache manager would be unchanged.

The selected architecture permits a single afsd_service.exe to be used either in conjunction with an AFS Redirector driver (afsredir.sys) or with the AFS SMB Server that has been in use for the last fifteen years.  When the AFS Redirector driver is present and active on the system, the SMB Server is disabled.  If the driver is not active, the SMB Server is automatically started.  In addition to the afsredir.sys driver there is one other new component, the AFSRDFSProvider.dll which comes in both 64-bit and 32-bit flavors.  This Network Provider permits the Explorer Shell to browse \\AFS and its cells under the "Network" object as its own category "OpenAFS Network".  To switch back and forth between the SMB-mode and the AFS-Redirector-mode, all that needs to be done is to disable the AFSRedirector driver in the registry.

In general the application behavior when using the AFS Redirector interface should be the same as the AFS SMB Server.  However, there are some differences:
  • The AFS Redirector interface publishes AFS mount points and symlinks as file system reparse points using a Microsoft assigned OpenAFS reparse tag. 
    • Applications that are reparse point aware may no longer cross the reparse point without explicit direction.
    • Applications that are reparse point aware but not OpenAFS tag aware will not understand what to do with the reparse point data.  Ask vendors to contact openafs-gatekeepers@openafs.org to learn how to make their applications OpenAFS aware.
  • Drive mappings to UNC paths that were made using the SMB interface will not be accessible via the AFS Redirector interface until they are removed and recreated.  This is because Windows assigns a drive mapping to a particular file system driver.  When the SMB interface was used, the network in use was "Microsoft Windows Network".  When the AFS Redirector interface is active, the network is "OpenAFS Network".
  • Drive mappings made with the SMB Redirector were not considered to be available when the target path could not be resolved due to either no network access or lack of appropriate authentication credentials.  The AFS Redirector does not disable a drive mapping due to lack of network access or necessary permissions.
  • The AFS Redirector does not require the presence of the Microsoft Loopback Adapter.  When the AFS Redirector is in use, the loopback adapter is ignored.  There are no delays in accessing the \\AFS name space after a suspend or reboot.
  • Applications that report the speed of file copies will report the speed of writing to the Windows cache, not the time writing to the AFS file server.   This is because the AFS Redirector does not require synchronous writes to the file server for each write by the application.  The behavior is closer to that of the Unix cache manager where data is written to the file server only when the Windows cache manager (not to be confused with the AFS cache manager on Windows) flushes dirty extents to the backing store.
  • Due to the existence of the new Network Provider DLL, it is extremely important that the 64-bit WOW MSI be installed on 64-bit systems.  Otherwise, 32-bit applications will not be able to open files in \\AFS when using UNC paths.
  • There is no support for Offline Folders when using the AFS redirector interface.  This is because Offline Folders is a feature of the SMB redirector and not a generic capability layered above arbitrary network file systems.
  • Drive letter substitutions (SUBST D: \\UNC\path) to \\AFS paths will appear as a disconnected network file system when SMB is used but will be connected when the AFS redirector is active.
  •  When the \\AFS name space is viewed via the SMB redirector the directory pointed to by the share name is assumed to be the root directory of the entire name space regardless of how many AFS mount points are crossed.  When the AFS redirector is used, every AFS volume is recognized by Windows as a separate file system.
On the whole, the behavioral changes when switching from SMB to AFS redirector favor the new implementation.  This is especially true when the performance improvements are taken into account.

There are a number of subtle design decisions that are worth discussing.

One of the benefits of the SMB only OpenAFS service is that it ran entirely as a user-space service that could be stopped at any time, be replaced with new binaries, and restarted.  Microsoft Windows file system drivers once loaded cannot be unloaded.  In order to permit upgrades to the afsd_service.exe and kernel driver to be applied without a reboot Peter and I decided to implement the afsredir.sys driver as a framework only driver which in turn loads a kernel library driver, afsredirlib.sys that contains the vast majority of the AFS specific implementation details.  When the OpenAFS Service is stopped, the afsredirlib.sys library is unloaded by afsredir.sys and all operations on \\AFS file objects are suspended until the OpenAFS Service is restarted.  This permits upgrades to be performed on live systems with active applications.
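The split between a permanent framework driver and a replaceable library can be modeled in user space.  The sketch below is an analogy under stated assumptions, not driver code: FrameworkDriver and its methods are hypothetical names, and a condition variable stands in for whatever mechanism afsredir.sys uses to park requests while afsredirlib.sys is unloaded.

```python
import threading

class FrameworkDriver:
    """Stays loaded for good and forwards requests to a swappable library."""
    def __init__(self):
        self._cond = threading.Condition()
        self._library = None

    def load_library(self, library):
        with self._cond:
            self._library = library
            self._cond.notify_all()      # wake requests parked during the upgrade

    def unload_library(self):
        with self._cond:
            self._library = None

    def dispatch(self, request, timeout=None):
        with self._cond:
            # Suspend the request until a library is present (or the timeout expires).
            if not self._cond.wait_for(lambda: self._library is not None, timeout):
                raise TimeoutError("library not loaded")
            library = self._library
        return library(request)
```

A request issued during an upgrade simply waits and is then served by the new library, which is the behavior that lets applications keep their \\AFS handles open across a service restart.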

The major benefit of the AFS redirector architecture is an improvement in data throughput between the OpenAFS Service and the AFS redirector.  Both the service and the kernel driver share access to the memory mapped AFS cache file.  As a result, instead of sending data in-band within a FetchData or StoreData ioctl, the service and redirector simply exchange ownership over file extents within the cache.  This avoids a large number of data copies and reduces the CPU cost of each ioctl.  With this model in place reads from the AFS cache of nearly 800MB/second have been observed.  This is approximately 12 times the best performance ever observed with the SMB interface.
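The idea of exchanging ownership instead of copying data can be illustrated with a shared buffer.  In this sketch a bytearray stands in for the memory mapped cache file and every class and method name is hypothetical; only the (offset, length) pair crosses the simulated ioctl boundary.

```python
from enum import Enum

class Owner(Enum):
    SERVICE = "service"
    REDIRECTOR = "redirector"

class SharedCache:
    def __init__(self, size, extent_size):
        self.data = bytearray(size)          # stands in for the mapped cache file
        self.extent_size = extent_size
        self.owner = {}                      # extent index -> current owner

    def _view(self, index):
        start = index * self.extent_size
        return memoryview(self.data)[start:start + self.extent_size]

    def service_fill(self, index, payload):
        """The service writes fetched data straight into the shared mapping."""
        if self.owner.get(index, Owner.SERVICE) is not Owner.SERVICE:
            raise PermissionError("extent is owned by the redirector")
        if len(payload) > self.extent_size:
            raise ValueError("payload larger than extent")
        self._view(index)[:len(payload)] = payload
        self.owner[index] = Owner.SERVICE

    def hand_to_redirector(self, index):
        """Transfer ownership; only the extent description is exchanged."""
        self.owner[index] = Owner.REDIRECTOR
        return (index * self.extent_size, self.extent_size)

    def redirector_read(self, index, length):
        if self.owner.get(index) is not Owner.REDIRECTOR:
            raise PermissionError("extent not released to the redirector")
        return bytes(self._view(index)[:length])
```

The bytes written by the service are the same bytes the redirector reads; the message between them carries a few integers regardless of how large the extent is.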

The AFS redirector has a sophisticated Authentication Group implementation.  For those that are unaware, the UNIX AFS client implements Process Authentication Groups (PAGs).  A PAG is a collection of processes that share a common set of network credentials.  A process inherits PAG membership from its parent process but can choose to remove itself from the PAG or create itself a unique PAG.  This permits different processes running as local root to execute with different sets of network credentials.

For Microsoft Windows, where a Thread object is just as much a first-class object as a Process object, the Authentication Group model has been extended to permit processes to belong to more than one authentication group at a time.  Each process has one default authentication group active at a given time and each thread can select its own active group or use the process default group.  This approach permits applications such as IIS to create a unique authentication group for each remote identity and activate that authentication group for each thread handling a request on behalf of that identity.  When a new process is created it only inherits the one authentication group that was active.
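A toy model may make the rules easier to follow. Everything here is hypothetical (the Process and Thread classes, new_auth_group, and the use of integers as group identifiers), and it assumes that "the one authentication group that was active" means the group active in the thread that creates the new process.

```python
import itertools

_ids = itertools.count(1)


def new_auth_group():
    """Return a fresh, unique authentication group identifier."""
    return next(_ids)


class Process:
    def __init__(self, parent_thread=None):
        if parent_thread is None:
            default = new_auth_group()
        else:
            # A new process inherits only the creator's active group.
            default = parent_thread.active_group()
        self.groups = {default}
        self.default_group = default

    def create_group(self):
        group = new_auth_group()
        self.groups.add(group)
        return group


class Thread:
    def __init__(self, process):
        self.process = process
        self.selected = None  # None means "use the process default"

    def select_group(self, group):
        if group not in self.process.groups:
            raise ValueError("process is not a member of this group")
        self.selected = group

    def active_group(self):
        if self.selected is not None:
            return self.selected
        return self.process.default_group
```

In the IIS scenario, the server process would call create_group() once per remote identity and each worker thread would call select_group() before touching \\AFS on that identity's behalf.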

Authentication groups are tracked as part of the Windows DACL in the Process or Thread Token.  When a process or thread performs a Local Procedure Call to a background service these tokens permit the background service to impersonate the caller.  When impersonation is active, the background service inherits not only the Windows SID of the calling process but also the active authentication group.  This ensures that LPCs execute with exactly the AFS permissions of the calling process.

Microsoft Windows supports multiple subsystems.  The most well known is the Win32 subsystem.  When NT was originally shipped there were also OS/2 and Posix subsystems.  On 64-bit Windows in addition to Win32 is the Wow64 subsystem which provides the 32-bit application compatibility layer.  The AFS redirector tracks which subsystem is in use and can use the active subsystem to select which @sys search list should be used.  A separate list is maintained for each subsystem.
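As a rough illustration of per-subsystem @sys expansion, the sketch below tries each name from the active subsystem's list in order and returns the first path that exists. The function name, the subsystem keys, and the contents of the lists are placeholders chosen for the example; the real lists are whatever the client has been configured with.

```python
# Placeholder search lists, one per subsystem (illustrative values only).
SYSNAME_LISTS = {
    "win32": ["amd64_w2k", "x86_win32"],
    "wow64": ["x86_win32", "i386_w2k", "i386_nt40"],
}


def expand_at_sys(path, subsystem, exists):
    """Replace an @sys path component with the first sysname that resolves.

    `exists` is a caller-supplied predicate that reports whether a
    candidate path is present.
    """
    parts = path.split("\\")
    if "@sys" not in parts:
        return path if exists(path) else None
    for name in SYSNAME_LISTS[subsystem]:
        candidate = "\\".join(name if p == "@sys" else p for p in parts)
        if exists(candidate):
            return candidate
    return None
```

With separate lists, a 32-bit process running under Wow64 and a native 64-bit process can open the same \\AFS path containing @sys and each be directed to binaries built for its own architecture.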

The first official OpenAFS.org release to include the new AFS redirector was 1.7.1, published on September 15, 2011, seven and a half years after the initial proposal and 1608 days after Peter and I began the current implementation.  The Basic COCOMO model (with coefficients a=2.4 and b=1.05) estimates the cost of implementing the AFS redirector and the changes to the OpenAFS Service at approximately US$1.2 million.  It can be honestly said that this project would never have been completed if it weren't for the fact that Peter Scott and I were willing to work unpaid for long stretches of time while we searched for additional funding to bring the project to completion.
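For readers unfamiliar with Basic COCOMO, the estimate has the form effort = a * KLOC^b, measured in person-months, with a=2.4 and b=1.05 for the model's "organic" class of projects. The sketch below shows the arithmetic only. The line count and monthly rate behind the US$1.2 million figure are not stated here, so both are left as caller-supplied inputs rather than guessed.

```python
def cocomo_basic_effort(kloc, a=2.4, b=1.05):
    """Estimated effort in person-months for `kloc` thousand lines of code."""
    return a * kloc ** b


def cocomo_basic_cost(kloc, monthly_rate, a=2.4, b=1.05):
    """Estimated cost: person-months multiplied by a monthly cost per developer."""
    return cocomo_basic_effort(kloc, a, b) * monthly_rate
```

Because the exponent is slightly greater than one, doubling the size of the code base slightly more than doubles the estimated effort.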

The release of 1.7.1 does not mean that the project is complete.  There are still many features that I want to see implemented.  Here is a partial list:
  • The Windows File System Volume Query Quota Interface is not implemented. As a result, AFS quota information is not available to application processes or end users via Windows dialogs.
  • The Windows Volume Shadow Copy Service is not implemented. As a result, AFS backup volumes are not accessible via the Explorer Shell.
  • There is no support for storing DOS attributes such as Hidden, System, or Archive.
  • There is no support for Alternate Data Streams as required by Windows User Account Control to store Zone Identity data.
  • There is no support for Extended Attributes.
  • There is no support for Access Based Enumeration.
  • There is no support for Windows Management Instrumentation.
  • There is no support for Distributed Link Tracking and Object Identifiers.
  • There is no support for storing Windows Access Control Lists. Only the AFS ACLs are enforced.
  • There is no support for offline folders or disconnected operations.
  • There is no Management Console for the OpenAFS Service.
The funding for the AFS redirector came from a handful of organizations.  Now that OpenAFS 1.7.1 is available I request that any organization that relies on the use of the OpenAFS client on Microsoft Windows contribute US$20 per copy to cover unfunded expenses and future development.

To end on another positive note, the OpenAFS 1.7.1 release has been tested on the Microsoft Windows 8 Developer Preview and it runs flawlessly.  Now all we need are some nice Metro applications to take advantage of \\AFS.

Sunday, September 6, 2009

When the impossible happens, reconsider the assumptions of what is possible

Ever since Secure Endpoints started receiving OpenAFS for Windows crash reports from Microsoft there have been a small number of reports each month in applications that load libafsauthent.dll (afscreds.exe, netidmgr.exe, ...) and others that perform afs pioctls. Only rarely has a minidump been available. The dumps that have been provided have made no sense. It's been clear that the stack or heap has been overwritten but other than that there has not been enough data to provide a clue where to start looking.

Last week OpenAFS 1.5.62 was released. It was an important release that fixed a long-standing data corruption error, something I had been trying to find for more than a year. Combine that with the support for WKSSVC and SRVSVC services providing vastly improved share name enumeration and Windows 7 compatibility, and 1.5.62 was a release that I wanted everyone to upgrade to. Unfortunately, the release proved to have two downsides that did not come out during testing. First, Cygwin applications could not access /afs. Second, roaming profiles in some environments failed to work. The Cygwin compatibility problem was traced to the addition of (supposedly mandatory) extended responses to NTCreateAndX requests. The roaming profiles issue was caused by previously unseen requests to open directories as "Directory::$DATA" instead of "Directory".

Given the importance of the 1.5.62 release and the showstopper nature of the two issues that had been introduced with it, I spent a good portion of this Labor Day weekend testing it. Lo and behold, during testing Network Identity Manager crashed in the Visual Studio 8 CRT memcpy(). The crash signature looked similar to many I have seen in the past but this time I had access to not just the stack trace but the entire memory image to examine in a live debugger. Not surprisingly, the state of the process made no sense. It was unclear if the stack had been damaged. Could the data be real? The memcpy() was attempting to read data out of a buffer populated by a pioctl(). The buffer size is 16KB. The data returned should not have been more than a few hundred bytes. Yet, the memcpy() was attempting to read beyond the end of the buffer. Examining the contents of the buffer closely showed that the data in the buffer did not match the request. Instead of the buffer containing a GetToken response it contained a WhichCell response. Parse the string "Freelance.Local.Root" as if it were a marshalled token and all hell breaks loose.

Two questions came to mind. First, why is there no data validation of the data received via the pioctl()? Second, how in the world did the wrong response end up being received in the first place? The lack of data validation although completely wrong is not all that surprising. This source code has not been modified since the original IBM contribution. It wasn't causing any problems and therefore didn't attract attention. The response confusion was surprising.

The OpenAFS pioctl() interface on Microsoft Windows works by implementing a transceive (an atomic write request / read response) operation using CreateFile(), WriteFile(), ReadFile(), CloseHandle(). The OpenAFS SMB server treats an NTCreateAndX operation on the magic file name "_._AFS_IOCTL_._" as the trigger to indicate that a pioctl() is being performed. Each time the file is opened a new smb file identifier is allocated. The caller writes the pioctl request to the file and then when the first read is issued, the requested operation is performed and the response data is queued up and sent in response. The caller issues ReadFile calls until end of file is reached and then the file is closed. Given this model, how is it that the response could possibly get confused?

My first theory was that a bug in the OpenAFS SMB server was issuing the same file id to two requestors. After close examination of the code it turns out that due to a thread safety issue there was a race that could result in that scenario. After fixing the race, I attempted to prove that the race was the cause of the problem. I kicked off five scripts, each executing a different pioctl operation 100,000 times. The client side bug was obviously being triggered but there was no evidence that the race I discovered had anything to do with it, especially since the problem continued to occur after the fix to prevent the race was installed.

The next step was to examine the behavior of the five scripts using Sysinternals Process Monitor while filtering on all access to paths beginning with "\\afs". The output was quite revealing. Judging solely by the lengths of the responses, it showed that requests and their responses were mismatched. Some ReadFile() operations failed with end of file errors on the first read.

At this point it was time to start examining the trace output of afsd_service. What I discovered was that the smb_IoctlPrepareWrite() and smb_IoctlPrepareRead() functions were being called multiple times on the same smb file id. The theory that the same pioctl instance was being used for requests from multiple processes proved to be correct. The question remained, why was it happening? Further examination of the trace output showed something even more curious. A large number of NTCreateAndX calls were missing from the output. I expected to see one NTCreateAndX operation for each pioctl request. In fact, the original author of the pioctl interface must have made the same basic assumption. Too bad for all of us that it isn't true.

As it turns out the Microsoft SMB redirector chooses to avoid multiple NTCreateAndX calls for a file if all of the active requests have the same security privileges and request the same access modes. Instead, the SMB redirector manages the various open/close operations locally and only closes the file after it has been idle. The CreateFile operations were issued with FILE_SHARE_READ|FILE_SHARE_WRITE share mode. This permitted multiple apps to open the file simultaneously and perform writes and reads. If two processes open the file and write a request before the first process reads its response, the first process will receive the response meant for the second process and the second process will receive an end of file error. One solution is to remove the FILE_SHARE_WRITE in order to ensure that only one process can open the pioctl file at a time.
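The failure is easy to reproduce in a toy model. The sketch below is a simulation, not OpenAFS or Windows code: IoctlFile, SmbRedirector, and the handler table are invented names, the response strings are placeholders, and a Python PermissionError stands in for the sharing violation that a second CreateFile would receive once FILE_SHARE_WRITE is removed.

```python
# Placeholder responses for two pioctl operations.
HANDLERS = {
    "GetToken": lambda: b"<marshalled token>",
    "WhichCell": lambda: b"Freelance.Local.Root",
}


class IoctlFile:
    """Server-side state for one smb file id opened on the ioctl file name."""

    def __init__(self):
        self.request = None

    def write(self, request):
        self.request = request  # a later write replaces an unread request

    def read(self):
        if self.request is None:
            return b""  # nothing pending: end of file
        response = HANDLERS[self.request]()
        self.request = None
        return response


class SmbRedirector:
    """Client-side redirector that reuses one open for compatible opens."""

    def __init__(self, share_write):
        self.share_write = share_write
        self.shared = None
        self.open_count = 0

    def create_file(self):
        if self.open_count and not self.share_write:
            raise PermissionError("sharing violation")
        if self.open_count == 0:
            self.shared = IoctlFile()  # only now is a new file id created
        self.open_count += 1
        return self.shared

    def close_handle(self):
        self.open_count -= 1


def interleaved(share_write):
    """Two callers each write a request before either one reads."""
    rdr = SmbRedirector(share_write)
    first = rdr.create_file()
    first.write("GetToken")
    second = rdr.create_file()
    second.write("WhichCell")
    return first.read(), second.read()
```

With share_write=True the first caller gets the WhichCell answer and the second gets an empty read, matching the behavior seen in Process Monitor. With share_write=False the second open fails immediately, so the second caller has to wait and retry instead of corrupting the first caller's exchange.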

It is now possible to run the five simultaneous pioctl performing scripts without a single error. Even so, data validation checks have been added to libafsauthent.dll to prevent invalid input from crashing applications in the future. I'm now looking forward to the 1.5.63 release and examining the Windows Error Reporting logs in a couple of months to confirm that the random crashes are no longer being reported.
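The kind of check involved can be sketched as follows. The wire format here is purely hypothetical (a 4-byte little-endian length followed by that many ticket bytes) and is not the actual marshalled token layout used by libafsauthent.dll. The point is only that every length read from the buffer is compared against the number of bytes actually received before anything is copied.

```python
import struct


def parse_token_response(buf):
    """Parse a hypothetical length-prefixed token, rejecting bad lengths."""
    if len(buf) < 4:
        raise ValueError("response too short to contain a length field")
    (ticket_len,) = struct.unpack_from("<i", buf, 0)
    if ticket_len < 0 or ticket_len > len(buf) - 4:
        raise ValueError("ticket length exceeds the data received")
    return bytes(buf[4:4 + ticket_len])
```

Fed the string "Freelance.Local.Root", this parser reads the first four characters as an enormous length and rejects the response rather than reading past the end of the buffer.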

Monday, February 23, 2009


It's been nearly two years since the release of Network Identity Manager 1.3 as part of MIT Kerberos for Windows. Network Identity Manager is preparing to break out on its own with version 2.0.

With version 2.0 the door is opened for identities based upon authentication technologies other than Kerberos v5. Whereas version 1.x is limited to providing a single sign-on experience when the initial authentication is performed with a Kerberos v5 principal name and password, version 2 permits KeyStore and Certificate initial authentication identities to be implemented. A KeyStore authentication can be used to automatically obtain Kerberos v5 ticket granting tickets for multiple Kerberos v5 identities. Each identity in turn can be used to obtain its own derived credentials such as AFS tokens, Kerberized Certificate Authority issued short lifetime X.509 client certificates, or various forms of web authentication credentials. Certificate based identities might be used with Public Key Initial Authentication for Kerberos (PKINIT) or the Globus Grid Security Infrastructure.

Version 2 also improves the end user experience with:
  • a new identity creation wizard
  • progress dialogs
  • a streamlined and less error prone mechanism for obtaining new credentials
  • an updated credential display that is cleaner, less confusing, and more informative
For additional information on the upcoming Network Identity Manager version 2 see:
http://www.secure-endpoints.com/netidmgr/roadmap.html

Saturday, August 2, 2008

OpenAFS for Windows with Unicode is Available

A couple of weeks ago OpenAFS for Windows with Unicode path name support was released.  I thought this was going to be a big deal.  Due to the lack of Unicode support there were all sorts of problems for organizations that wanted to use roaming profiles and redirected folders.  Even more important is the fact that the vast majority of the world does not limit their writing to the characters represented in Windows OEM Code Pages 437 and 850.   For years these individuals could not save their data into AFS using the language of their choice.  

Up to this point, 1.5.5x has had one of the slowest adoption rates of any OpenAFS for Windows release over the last five years.  Is this because it is Summer?  Is it because most users are Americans and they do not require Unicode?  Is it because everyone has given up on AFS?  I don't know.

What I do know is that the Unicode version has been downloaded (in small numbers) by a broad range of top-level domains other than the United States including Malaysia, Russia, Canada, Germany, Taiwan, Brazil, Hong Kong, Poland, Yugoslavia, Croatia, Japan, and Indonesia.  Hopefully, users from these countries will write in to describe how Unicode support has made their lives easier.