Sunday, October 2, 2011

Heimdal: Now Playing on Windows Near You

Today, Heimdal 1.5.1 was announced including support for Microsoft Windows.  Asanka Herath gave an excellent presentation on the design plans at the 2010 AFS and Kerberos Best Practices Workshop.  The Heimdal port began in December 2008 in response to several motivations:
  1. Several large Secure Endpoints clients were experiencing significant upgrade problems with MIT Kerberos for Windows due to backward compatibility problems between versions 2.6.x and 3.x.  The problems were due to what is affectionately known as DLL Hell.  Applications built against old versions of KFW do not work with newer versions and vice versa because the list of function exports and the ordinal bindings changed.  To make matters worse, it isn't possible to have more than one version of KFW installed on a system at any given time.  This is because KFW libraries must be installed in a directory listed in the system PATH environment variable.  To address this problem Secure Endpoints issued a proposal to MIT in July 2008 that KFW be converted to use Windows Side-by-side Assemblies.  This proposal along with others to improve Network Identity Manager went over like a lead balloon at the Kerberos Consortium.
  2. Secure Endpoints began work on incorporating Hardware Security Modules (HSMs) such as Thales' nShield into a Kerberized Certificate Authority that could be approved by The Americas Grid Policy Management Authority (TAGPMA).  TAGPMA requires that all certificate authorities store their keys in hardware.  This naturally led us to wonder if we could do the same for a Kerberos Key Distribution Center (KDC).  Heimdal already supported the OpenSSL crypto library which could be used with the nShield HSM.  Asanka presented our ideas at the 2009 AFS and Kerberos BPW.
  3. Finally, OpenAFS needed a number of changes to Kerberos and GSS-API in order to be able to implement the rxgk security class.  There have been numerous presentations on the need for rxgk over the years. Love gave a talk in 2007, Simon gave one in 2010, and another in 2011.  In fact, the rxgk work began back in 2004 at an AFS hackathon in Sweden.  Implementing rxgk requires that all supported platforms provide a Kerberos Crypto Framework (RFC 3961) and the GSS Pseudo-Random Function (RFC 4401).  MIT Kerberos doesn't export a 3961 compatible crypto framework in any version and with the failure to put any resources behind the Windows product there was no GSS PRF support.  The OpenAFS development community has found the Kerberos Consortium quite difficult to work with whereas Heimdal welcomed the proposed changes with open arms.  Heimdal redesigned their repository layout to make it possible for OpenAFS to import core functionality such as the cross-platform compatibility library libroken, the hcrypto library, and the rfc3961 framework.  This in turn permits OpenAFS developers to focus on building a best of breed distributed file system and avoid the need to build and support a Kerberos v5 and GSS-API implementation.  Heimdal is more than just a Kerberos implementation which will permit OpenAFS to more easily support non-Kerberos authentication mechanisms once rxgk is deployed.
The Secure Endpoints distribution of Heimdal is more than just a port to Microsoft Windows.  In order to properly address the needs of existing KFW users and developers, the Heimdal distribution includes a set of KFW 3.x compatible DLLs that act as a shim layer that converts requests issued using the MIT API and forwards them to the Heimdal assembly for processing.

For developers, Secure Endpoints is now distributing a Kerberos Compatibility SDK that will permit applications to be developed which can work seamlessly regardless of whether Heimdal or MIT Kerberos is installed on the system.  OpenAFS and all future Secure Endpoints applications such as Network Identity Manager and the Kerberized Certificate Authority will be built against this SDK.  Applications built against the SDK first search for a compatible Heimdal assembly.  If an assembly is not installed on the system, KFW DLLs are searched for in the PATH and manually loaded.
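
The lookup order described above can be sketched roughly as follows.  This is an illustrative Python model and not the SDK's actual loader: the function name, the `heimdal_assembly_present` flag, and the default DLL name are all assumptions made for the example.

```python
import os
from typing import Callable, Iterable, Optional, Tuple


def select_kerberos_provider(
    heimdal_assembly_present: bool,
    path_dirs: Iterable[str],
    file_exists: Callable[[str], bool] = os.path.isfile,
    kfw_dll: str = "krb5_32.dll",  # assumed DLL name, for illustration only
) -> Tuple[str, Optional[str]]:
    """Return (provider, location) following the order described in the text."""
    # 1. Prefer a compatible Heimdal assembly when one is installed.
    if heimdal_assembly_present:
        return ("heimdal", None)
    # 2. Otherwise walk the PATH directories looking for a KFW DLL to load.
    for directory in path_dirs:
        candidate = os.path.join(directory, kfw_dll)
        if file_exists(candidate):
            return ("kfw", candidate)
    # 3. Neither implementation was found.
    return ("none", None)
```

Passing `file_exists` as a parameter keeps the sketch testable without touching a real file system.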

One important difference between Heimdal and KFW relates to how credential caches and keytabs are implemented.  Instead of compiling all supported cache and keytab types into the Heimdal libraries, Heimdal loads credential caches and keytabs as registered plug-ins.  This permits weak cache and keytab implementations to be removed on systems where they shouldn't be supported and permits new implementations to be developed independently of the Heimdal distributions.  This functionality is going to become very useful for OpenAFS users on Microsoft Windows now that OpenAFS 1.7.x includes native authentication groups.  For the first time it will be possible to develop secure Kerberos credentials cache and keytab implementations whose contents become accessible to processes that are impersonating other processes, something that has only been possible with the Microsoft Kerberos SSP up to this point.
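
To make the plug-in idea concrete, here is a minimal sketch of a cache type registry.  It is not Heimdal's plug-in API; the class and method names are hypothetical, and only the "TYPE:residual" naming convention for credential caches is taken from common Kerberos usage.

```python
from typing import Callable, Dict


class CacheRegistry:
    """Hypothetical registry mapping a cache type name to a factory."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[str], object]] = {}

    def register(self, type_name: str, factory: Callable[[str], object]) -> None:
        self._factories[type_name.upper()] = factory

    def unregister(self, type_name: str) -> None:
        # Removing a weak implementation makes that cache type unavailable.
        self._factories.pop(type_name.upper(), None)

    def resolve(self, cache_name: str) -> object:
        # Cache names take the form "TYPE:residual", e.g. "MEMORY:demo".
        type_name, _, residual = cache_name.partition(":")
        factory = self._factories.get(type_name.upper())
        if factory is None:
            raise LookupError(f"no plug-in registered for cache type {type_name!r}")
        return factory(residual)


registry = CacheRegistry()
registry.register("MEMORY", lambda residual: ("memory-cache", residual))
registry.register("FILE", lambda residual: ("file-cache", residual))
registry.unregister("FILE")  # e.g. a site policy that forbids file caches
```

After the `unregister` call, resolving a "FILE:" name fails while "MEMORY:" names continue to work, which is the kind of per-system policy the plug-in model allows.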

All in all, the release of Heimdal for Microsoft Windows is an important step forward.


Sunday, September 18, 2011

The OpenAFS IFS Edition is Finally Here

I first proposed the idea of a native redirector based OpenAFS client at the AFS Best Practice Workshop held at SLAC in March 2004 as part of my Future Directions for the AFS Client on Windows talk.   The talk was my first public assessment of the OpenAFS client for Microsoft Windows.  In fact it was my first presentation as an OpenAFS gatekeeper having only been working with the code base for four months.  In that time a large amount of low hanging fruit was picked but there was so much more to be done.  I wonder how many of the attendees actually believed that even half of the known issues would be resolved in the years to come let alone an installable file system driver.  Prior to 1.3.60 it wasn't even possible to deploy OpenAFS clients on Microsoft Windows with a uniform name space.  Instead of accessing resources via the \\AFS\cellname UNC path, all paths were accessed via \\%HOSTNAME%-AFS\ALL\cellname where %HOSTNAME% was the local machine's NetBIOS name.

By September 2004, CITI at the University of Michigan agreed to fund a graduate student, Eric Williams, to develop an IFS interface for the OpenAFS cache manager.  Eric's implementation was delivered during the Summer of 2005.   The first code dropped in mid-June and the final code dropped in early August.  Eric's implementation was built using Microsoft's IFS Kit and implemented a mini-redirector interface.  It provided support for anonymous \\AFS access without the use of a loopback adapter but did so by mimicking the SMB message flows.  Eric was able to demonstrate 5x performance improvements over the SMB interface.  At the end of the Summer Eric moved on to other obligations and work on the redirector interface stalled.

On August 28, 2006, I was introduced to Peter Scott of Kernel Drivers.  Peter is a Microsoft MVP and a world renowned Windows kernel specialist with a passion for file systems.  Peter volunteered to review the goals I had laid out for the OpenAFS client and the code that Eric Williams had developed.   Three major issues were identified during the review.  First, OpenAFS is a caching file system and the method used to deliver data to satisfy paging requests made it impossible to guarantee that data cached by Windows would be purged in response to a data version change produced by another machine.  Second, the mini-redirector interface underwent a significant change with the introduction of Windows Vista and maintaining a common code base across XP, Vista and beyond would have been impossible.  Third, the implemented functionality was sufficient to create, open, close, read from, write to, etc. but the OpenAFS client failed to support a large number of features required by Windows applications such as Unicode character sets, 64-bit file sizes, 64-bit kernels, the WNet API, volume information queries, security information queries, quotas, RPC services such as WKSSVC and SRVSVC, reparse points, and more.

The long term goal for the OpenAFS client for Microsoft Windows was not simply a file system that did not rely on the Microsoft SMB redirector and a loopback adapter.  The goal was to produce a best in class file system that integrated AFS into the Microsoft Windows experience.  Peter and I concluded that we should start over and design an architecture that could support all of the functionality that I desired for OpenAFS and meet some very aggressive performance goals.

Peter had developed a full redirector file system called KDFS which he used for the development of custom file systems for Kernel Drivers clients.  Peter agreed to license the code under a BSD style license to OpenAFS.  This permitted us to use KDFS as a starting point.  On April 21, 2007 we began coding.

We designed an architecture that would not only permit use of a native redirector on Windows XP SP2 through current and future Windows releases but provide a low-risk transition strategy for individuals and organizations to use when migrating from SMB to redirector based interfaces.  One of the key decisions was to maintain both the SMB and IFS interfaces as peers and require that all application visible functionality be implemented in both.  This approach permitted all new functionality to be deployed to end users as updates to the existing 1.5 release series.  Major functional improvements that were shipped prior to 1.7.1 included:
  • Unicode (UTF-8) encoded file names [1.5.50]
  • Interface independent Path Ioctl processing [1.5.50]
  • Pipe Service RPC emulation for wkssvc and srvsvc [1.5.62]
In addition, literally hundreds of bugs in the cache manager were uncovered and corrected as part of the isolation of the SMB server from the generic AFS cache management layer.  All of these improvements were released as the work was completed providing the end user community immediate benefits and a guarantee that when the IFS interface did ship the cache manager would be unchanged.

The selected architecture permits a single afsd_service.exe to be used either in conjunction with an AFS Redirector driver (afsredir.sys) or with the AFS SMB Server that has been in use for the last fifteen years.  When the AFS Redirector driver is present and active on the system, the SMB Server is disabled.  If the driver is not active, the SMB Server is automatically started.  In addition to the afsredir.sys driver there is one other new component, the AFSRDFSProvider.dll which comes in both 64-bit and 32-bit flavors.  This Network Provider permits the Explorer Shell to browse \\AFS and its cells under the "Network" object as its own category "OpenAFS Network".  To switch back and forth between the SMB-mode and the AFS-Redirector-mode, all that needs to be done is to disable the AFSRedirector driver in the registry.

In general the application behavior when using the AFS Redirector interface should be the same as the AFS SMB Server.  However, there are some differences:
  • The AFS Redirector interface publishes AFS mount points and symlinks as file system reparse points using a Microsoft assigned OpenAFS reparse tag. 
    • Applications that are reparse point aware may no longer cross the reparse point without explicit direction.
    • Applications that are reparse point aware but not OpenAFS tag aware will not understand what to do with the reparse point data.  Ask vendors to contact openafs-gatekeepers@openafs.org to learn how to make their applications OpenAFS aware.
  • Drive mappings to UNC paths that were made using the SMB interface will not be accessible via the AFS Redirector interface until they are removed and recreated.  This is because Windows assigns a drive mapping to a particular file system driver.  When the SMB interface was used, the network in use was "Microsoft Windows Network".  When the AFS Redirector interface is active, the network is "OpenAFS Network".
  • Drive mappings made with the SMB Redirector were not considered to be available when the target path could not be resolved due to either no network access or lack of appropriate authentication credentials.  The AFS Redirector does not disable a drive mapping due to lack of network access or necessary permissions.
  • The AFS Redirector does not require the presence of the Microsoft Loopback Adapter.  When the AFS Redirector is in use, the loopback adapter is ignored.  There are no delays in accessing the \\AFS name space after a suspend or reboot.
  • Applications that report the speed of file copies will report the speed of writing to the Windows cache, not the time writing to the AFS file server.   This is because the AFS Redirector does not require synchronous writes to the file server for each write by the application.  The behavior is closer to that of the Unix cache manager where data is written to the file server only when the Windows cache manager (not to be confused with the AFS cache manager on Windows) flushes dirty extents to the backing store.
  • Due to the existence of the new Network Provider DLL, it is extremely important that the 64-bit WOW MSI be installed on 64-bit systems.  Otherwise, 32-bit applications will not be able to open files in \\AFS when using UNC paths.
  • There is no support for Offline Folders when using the AFS redirector interface.  This is because Offline Folders is a feature of the SMB redirector and not a generic capability layered above arbitrary network file systems.
  • Drive letter substitutions (SUBST D: \\UNC\path) to \\AFS paths will appear as a disconnected network file system when SMB is used but will be connected when the AFS redirector is active.
  •  When the \\AFS name space is viewed via the SMB redirector the directory pointed to by the share name is assumed to be the root directory of the entire name space regardless of how many AFS mount points are crossed.  When the AFS redirector is used, every AFS volume is recognized by Windows as a separate file system.
On the whole, the behavioral changes when switching from SMB to AFS redirector favor the new implementation.  This is especially true when the performance improvements are taken into account.

There are a number of subtle design decisions that are worth discussing.

One of the benefits of the SMB only OpenAFS service is that it ran entirely as a user-space service that could be stopped at any time, be replaced with new binaries, and restarted.  Microsoft Windows file system drivers once loaded cannot be unloaded.  In order to permit upgrades to the afsd_service.exe and kernel driver to be applied without a reboot Peter and I decided to implement the afsredir.sys driver as a framework only driver which in turn loads a kernel library driver, afsredirlib.sys that contains the vast majority of the AFS specific implementation details.  When the OpenAFS Service is stopped, the afsredirlib.sys library is unloaded by afsredir.sys and all operations on \\AFS file objects are suspended until the OpenAFS Service is restarted.  This permits upgrades to be performed on live systems with active applications.

The major benefit of the AFS redirector architecture is an improvement in data throughput between the OpenAFS Service and the AFS redirector.  Both the service and the kernel driver share access to the memory mapped AFS cache file.  As a result, instead of sending data in-band within a FetchData or StoreData ioctl, the service and redirector simply exchange ownership over file extents within the cache.  This avoids a large number of data copies and reduces the CPU cost of each ioctl.  With this model in place reads from AFS cache of nearly 800MB/second have been observed.  This is approximately 12 times the best performance ever observed with the SMB interface.
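
The ownership exchange can be pictured with a small model.  The sketch below is only an analogy written in Python: the real service and driver structures are not described here, so the `Extent`, `Owner`, and `SharedCache` names are invented for illustration, and a `bytearray` stands in for the memory mapped cache file.

```python
from dataclasses import dataclass
from enum import Enum


class Owner(Enum):
    SERVICE = "service"
    REDIRECTOR = "redirector"


@dataclass
class Extent:
    offset: int
    length: int
    owner: Owner = Owner.SERVICE


class SharedCache:
    def __init__(self, size: int) -> None:
        self.buffer = bytearray(size)  # stands in for the shared cache file

    def service_fill(self, extent: Extent, data: bytes) -> None:
        # The service writes fetched data directly into the shared cache.
        if extent.owner is not Owner.SERVICE or len(data) > extent.length:
            raise ValueError("service cannot fill this extent")
        self.buffer[extent.offset:extent.offset + len(data)] = data

    def release_to_redirector(self, extent: Extent) -> None:
        # Only the small extent descriptor changes hands; no payload is copied.
        extent.owner = Owner.REDIRECTOR

    def redirector_view(self, extent: Extent) -> memoryview:
        if extent.owner is not Owner.REDIRECTOR:
            raise PermissionError("extent is still owned by the service")
        # A memoryview exposes the same bytes without duplicating them.
        return memoryview(self.buffer)[extent.offset:extent.offset + extent.length]
```

The point of the model is that the message passed between the two sides carries an offset and a length, while the file data itself stays where it is.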

The AFS redirector has a sophisticated Authentication Group implementation.  For those that are unaware, the UNIX AFS client implements Process Authentication Groups (PAGs).  A PAG is a collection of processes that share a common set of network credentials.  A process inherits PAG membership from its parent process but can choose to remove itself from the PAG or create itself a unique PAG.  This permits different processes running as local root to execute with different sets of network credentials.

For Microsoft Windows where a Thread object is just as prime as a Process object the Authentication Group model has been extended to permit processes to belong to more than one authentication group at a time.  Each process has one default authentication group active at a given time and each thread can select its own active group or use the process default group.  This approach permits applications such as IIS to create a unique authentication group for each remote identity and activate that authentication group for each thread handling a request on behalf of that identity.  When a new process is created it only inherits the one authentication group that was active.
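
The relationships among processes, threads, and authentication groups can be modeled in a few lines.  This is a sketch of the rules as stated above, not the redirector's data structures; the names are hypothetical, and it assumes that a child process inherits the group active in the thread that created it.

```python
import itertools
from typing import Dict, Optional, Set

_group_ids = itertools.count(1)


class Process:
    def __init__(self, default_group: Optional[int] = None) -> None:
        # A new process starts with exactly one group, which is its default.
        self.default_group = default_group if default_group is not None else next(_group_ids)
        self.groups: Set[int] = {self.default_group}
        self.thread_groups: Dict[str, int] = {}

    def create_group(self) -> int:
        # A process may belong to more than one authentication group.
        group = next(_group_ids)
        self.groups.add(group)
        return group

    def set_thread_group(self, thread: str, group: int) -> None:
        if group not in self.groups:
            raise ValueError("process is not a member of that group")
        self.thread_groups[thread] = group

    def active_group(self, thread: str) -> int:
        # Threads without their own selection use the process default.
        return self.thread_groups.get(thread, self.default_group)

    def spawn(self, creating_thread: str) -> "Process":
        # Assumption: the child inherits only the creator's active group.
        return Process(default_group=self.active_group(creating_thread))
```

A server in the style of IIS would call `create_group()` once per remote identity and `set_thread_group()` on each worker thread before handling that identity's request.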

Authentication groups are tracked as part of the Windows DACL in the Process or Thread Token.  When a process or thread performs a Local Procedure Call to a background service these tokens permit the background service to impersonate the caller.  When impersonation is active, the background service inherits not only the Windows SID of the calling process but also the active authentication group.  This ensures that LPCs execute with exactly the AFS permissions of the calling process.

Microsoft Windows supports multiple subsystems.  The most well known is the Win32 subsystem.  When NT was originally shipped there were also OS/2 and Posix subsystems.  On 64-bit Windows in addition to Win32 is the Wow64 subsystem which provides the 32-bit application compatibility layer.  The AFS redirector tracks which subsystem is in use and can use the active subsystem to select which @sys search list should be used.  A separate list is maintained for each subsystem.
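
A per-subsystem @sys search might look like the following sketch.  The sysname values and the function name are illustrative assumptions, and for simplicity the code substitutes the literal string "@sys" wherever it appears in a path component.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical search lists; a real deployment configures its own values.
SYSNAME_LISTS: Dict[str, List[str]] = {
    "win32": ["amd64_w2k", "amd64_nt40"],
    "wow64": ["i386_w2k", "i386_nt40"],
}


def resolve_at_sys(
    component: str,
    subsystem: str,
    exists: Callable[[str], bool],
) -> Optional[str]:
    """Expand @sys using the search list that belongs to the caller's subsystem."""
    if "@sys" not in component:
        return component if exists(component) else None
    for sysname in SYSNAME_LISTS[subsystem]:
        candidate = component.replace("@sys", sysname)
        if exists(candidate):
            return candidate  # first match in list order wins
    return None
```

With this arrangement a 32-bit application running under Wow64 and a 64-bit application can follow the same path and land in different binary directories.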

The first official OpenAFS.org release to include the new AFS redirector was 1.7.1, published on September 15, 2011.  That was seven and a half years after the initial proposal and 1608 days after Peter and I began the current implementation.  The Basic COCOMO model (with coefficients a=2.4 and b=1.05) estimates the cost of implementing the AFS redirector and the changes to the OpenAFS Service at approximately US$1.2 million.  It can be honestly said that this project would never have been completed if it weren't for the fact that Peter Scott and I were willing to work unpaid for long stretches of time while we searched for additional funding to bring the project to completion.
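
For readers unfamiliar with Basic COCOMO, the model estimates effort in person-months as a × KLOC^b, where KLOC is thousands of lines of code, and a = 2.4 with b = 1.05 are the model's organic-mode coefficients.  The sketch below shows only the arithmetic.  The line count and the monthly cost in the example are hypothetical placeholders, not the figures behind the estimate quoted above.

```python
def basic_cocomo_effort(kloc: float, a: float = 2.4, b: float = 1.05) -> float:
    """Estimated effort in person-months: a * KLOC ** b."""
    if kloc <= 0:
        raise ValueError("kloc must be positive")
    return a * kloc ** b


def estimated_cost(kloc: float, cost_per_person_month: float) -> float:
    """Effort multiplied by an assumed cost per person-month."""
    return basic_cocomo_effort(kloc) * cost_per_person_month


# Hypothetical inputs chosen only to exercise the formula.
example_effort = basic_cocomo_effort(kloc=50.0)
example_cost = estimated_cost(kloc=50.0, cost_per_person_month=8000.0)
```

Because b is greater than 1, effort grows slightly faster than code size, so doubling the line count more than doubles the estimate.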

The release of 1.7.1 does not mean that the project is complete.  There are still many features that I want to see implemented.  Here is a partial list:
  • The Windows File System Volume Query Quota Interface is not implemented. As a result, AFS quota information is not available to application processes or end users via Windows dialogs.
  • The Windows Volume Shadow Copy Service is not implemented. As a result, AFS backup volumes are not accessible via the Explorer Shell.
  • There is no support for storing DOS attributes such as  Hidden, System, or Archive.
  • There is no support for Alternate Data Streams as required by Windows User Account Control to store Zone Identity data.
  • There is no support for Extended Attributes.
  • There is no support for Access Based Enumeration.
  • There is no support for Windows Management Instrumentation.
  • There is no support for Distributed Link Tracking and Object Identifiers.
  • There is no support for storing Windows Access Control Lists. Only the AFS ACLs are enforced.
  • There is no support for offline folders or disconnected operations.
  • There is no Management Console for the OpenAFS Service
The funding for the AFS redirector came from a handful of organizations.  Now that OpenAFS 1.7.1 is available I request that any organization that relies on the use of the OpenAFS client on Microsoft Windows contribute US$20 per copy to cover unfunded expenses and future development.

To end on another positive note, the OpenAFS 1.7.1 release has been tested on the Microsoft Windows 8 Developer Preview and it runs flawlessly.  Now all we need are some nice Metro applications to take advantage of \\AFS.

Sunday, September 6, 2009

When the impossible happens, reconsider the assumptions of what is possible

Ever since Secure Endpoints started receiving OpenAFS for Windows crash reports from Microsoft there have been a small number of reports each month in applications that load libafsauthent.dll (afscreds.exe, netidmgr.exe, ...) and others that perform AFS pioctls. It has been the rare case that a minidump has been available. The dumps that have been provided have made no sense. It's been clear that the stack or heap has been overwritten but other than that there has not been enough data to provide a clue where to start looking.

Last week OpenAFS 1.5.62 was released. It was an important release that fixed a long standing data corruption error. Something I have been trying to find for more than a year. Combine it with the support for WKSSVC and SRVSVC services providing vastly improved share name enumeration and Windows 7 compatibility and 1.5.62 was a release that I wanted everyone to upgrade to. Unfortunately, the release proved to have two downsides that did not come out during testing. First, Cygwin applications could not access /afs. Second, roaming profiles in some environments failed to work. The Cygwin compatibility problem was traced to the addition of (supposedly mandatory) extended responses to NTCreateAndX requests. The roaming profiles issue was caused by previously unseen requests to open directories as "Directory::$DATA" instead of "Directory".

Given the importance of the 1.5.62 release and the show stopper nature of the two issues that had been introduced with it, I spent a good portion of this Labor Day weekend testing it. Lo and behold, during testing Network Identity Manager crashed in the Visual Studio 8 CRT memcpy(). The crash signature looked similar to many I have seen in the past but this time I had access to not just the stack trace but the entire memory image to examine in a live debugger. Not surprisingly, the state of the process made no sense. It was unclear if the stack had been damaged. Could the data be real? The memcpy() was attempting to read data out of a buffer populated by a pioctl(). The buffer size is 16KB. The data that should have been returned should not have been more than a few hundred bytes. Yet, the memcpy() was attempting to read beyond the end of the buffer. Examining the contents of the buffer closely showed that the data in the buffer did not match the request. Instead of the buffer containing a GetToken response it contained a WhichCell response. Parse the string "Freelance.Local.Root" as if it were a marshalled token and all hell breaks loose.

Two questions came to mind. First, why is there no data validation of the data received via the pioctl()? Second, how in the world did the wrong response end up being received in the first place? The lack of data validation although completely wrong is not all that surprising. This source code has not been modified since the original IBM contribution. It wasn't causing any problems and therefore didn't attract attention. The response confusion was surprising.

The OpenAFS pioctl() interface on Microsoft Windows works by implementing a transceive (an atomic write request / read response) operation using CreateFile(), WriteFile(), ReadFile(), and CloseHandle(). The OpenAFS SMB server treats an NTCreateAndX operation on the magic file name "_._AFS_IOCTL_._" as the trigger to indicate that a pioctl() is being performed. Each time the file is opened a new SMB file identifier is allocated. The caller writes the pioctl request to the file and then when the first read is issued, the requested operation is performed and the response data is queued up and sent in response. The caller issues ReadFile calls until end of file is reached and then the file is closed. Given this model, how is it that the response could possibly get confused?

My first theory was that a bug in the OpenAFS SMB server was issuing the same file id to two requestors. After close examination of the code it turns out that due to a thread safety issue there was a race that could result in that scenario. After fixing the race, I attempted to prove that the race was the cause of the problem. I kicked off five scripts executing a different pioctl operation 100,000 times. The client side bug was obviously being triggered but there was no evidence that the race I discovered had anything to do with it. Especially considering the fact that the problem continued to occur after the fix to prevent the race was installed.

The next step was to examine the behavior of the five scripts using Sysinternal's Process Monitor while filtering on all access to paths beginning with "\\afs". The output was quite revealing. It showed that requests and their responses based solely upon the length of the response were mismatched. Some ReadFile() operations failed with end of file errors on the first read.

At this point it was time to start examining the trace output of afsd_service. What I discovered was that the smb_IoctlPrepareWrite() and smb_IoctlPrepareRead() functions were being called multiple times on the same SMB file id. The theory that the same pioctl instance was being used for requests from multiple processes proved to be correct. The question remained, why was it happening? Further examination of the trace output showed something even more curious. A large number of NTCreateAndX calls were missing from the output. I expected to see one NTCreateAndX operation for each pioctl request. In fact, that was a basic assumption the original author of the pioctl interface must have made. Too bad for all of us that it isn't true.

As it turns out the Microsoft SMB redirector chooses to avoid multiple NTCreateAndX calls for a file if all of the active requests have the same security privileges and request the same access modes. Instead, the SMB redirector manages the various open/close operations locally and only closes the file after it has been idle. The CreateFile operations were issued with FILE_SHARE_READ|FILE_SHARE_WRITE share mode. This permitted multiple apps to open the file simultaneously and perform writes and reads. If two processes open the file and write a request before the first process reads its response, the first process will receive the response meant for the second process and the second process will receive an end of file error. One solution is to remove the FILE_SHARE_WRITE in order to ensure that only one process can open the pioctl file at a time.
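
The failure mode is easy to reproduce in a toy model.  The sketch below is not the OpenAFS SMB server code; it is a hypothetical Python class that mimics one file id shared by two openers, where a write stores the pending request and the first read executes it.

```python
from typing import Callable, Dict, Optional


class SharedIoctlFile:
    """Toy model of a single SMB file id shared by every opener."""

    def __init__(self, handlers: Dict[str, Callable[[], bytes]]) -> None:
        self.handlers = handlers
        self.pending: Optional[str] = None

    def write(self, request: str) -> None:
        # A later write silently replaces the request that was waiting.
        self.pending = request

    def read(self) -> Optional[bytes]:
        if self.pending is None:
            return None  # models the end of file error
        request, self.pending = self.pending, None
        return self.handlers[request]()


handlers = {
    "GetToken": lambda: b"<marshalled token>",
    "WhichCell": lambda: b"Freelance.Local.Root",
}
shared = SharedIoctlFile(handlers)
shared.write("GetToken")    # process A writes its request
shared.write("WhichCell")   # process B writes before A has read
a_response = shared.read()  # A receives the WhichCell answer
b_response = shared.read()  # B receives end of file
```

In this model process A, which asked for a token, ends up holding a cell name, and process B gets nothing, which matches the behavior seen in the Process Monitor trace.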

It is now possible to run the five simultaneous pioctl performing scripts without a single error. Even so, data validation checks have been added to libafsauthent.dll to prevent invalid input from crashing applications in the future. I'm now looking forward to the 1.5.63 release and examining the Windows Error Reporting logs in a couple of months to confirm that the random crashes are no longer being reported.
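
The kind of validation that was missing can be illustrated with a bounds-checked reader.  This is a sketch in Python rather than the C code in libafsauthent.dll, and the wire layout it assumes (a 4-byte little-endian length followed by that many bytes) is hypothetical.

```python
import struct
from typing import Tuple

MAX_RESPONSE = 16 * 1024  # size of the pioctl buffer mentioned earlier


class MalformedResponse(ValueError):
    """Raised when a response does not have the expected shape."""


def read_counted_field(buffer: bytes, offset: int = 0) -> Tuple[bytes, int]:
    """Return (field, next_offset) or raise instead of reading out of bounds."""
    if offset < 0 or offset + 4 > len(buffer):
        raise MalformedResponse("buffer too short for a length prefix")
    (length,) = struct.unpack_from("<i", buffer, offset)
    start = offset + 4
    # Reject negative, oversized, or truncated fields before copying anything.
    if length < 0 or length > MAX_RESPONSE or start + length > len(buffer):
        raise MalformedResponse(f"implausible field length {length}")
    return buffer[start:start + length], start + length
```

Fed the string "Freelance.Local.Root", the first four characters decode to a length far larger than the buffer, so the reader raises an error where the unchecked code would have copied past the end.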

Monday, February 23, 2009


It's been nearly two years since the release of Network Identity Manager 1.3 as part of MIT Kerberos for Windows. Network Identity Manager is preparing to break out on its own with version 2.0.

With version 2.0 the door is opened for identities based upon authentication technologies other than Kerberos v5. Whereas version 1.x is limited to providing a single sign-on experience when the initial authentication is performed with a Kerberos v5 principal name and password, version 2 permits KeyStore and Certificate initial authentication identities to be implemented. A KeyStore authentication can be used to automatically obtain Kerberos v5 ticket granting tickets for multiple Kerberos v5 identities. Each identity in turn can be used to obtain its own derived credentials such as AFS tokens, Kerberized Certificate Authority issued short lifetime X.509 client certificates, or various forms of web authentication credentials. Certificate based identities might be used with Public Key Initial Authentication for Kerberos (PKINIT) or the Globus Grid Security Infrastructure.

Version 2 also improves the end user experience with:
  • a new identity creation wizard
  • progress dialogs
  • a streamlined and less error prone mechanism for obtaining new credentials
  • an updated credential display that is cleaner, less confusing, and more informative
For additional information on the upcoming Network Identity Manager version 2 see:
http://www.secure-endpoints.com/netidmgr/roadmap.html

Saturday, August 2, 2008

OpenAFS for Windows with Unicode is Available

A couple of weeks ago OpenAFS for Windows with Unicode path name support was released.  I thought this was going to be a big deal.  Due to the lack of Unicode support there were all sorts of problems for organizations that wanted to use roaming profiles and redirected folders.  Even more important is the fact that the vast majority of the world does not limit their writing to the characters represented in Windows OEM Code Pages 437 and 850.   For years these individuals could not save their data into AFS using the language of their choice.  

Up to this point, 1.5.5x has had one of the slowest adoption rates of any OpenAFS for Windows release over the last five years.  Is this because it is Summer?  Is it because most users are Americans and they do not require Unicode?  Is it because everyone has given up on AFS?  I don't know.

What I do know is that the Unicode version has been downloaded (in small numbers) by a broad range of top-level domains other than the United States including Malaysia, Russia, Canada, Germany, Taiwan, Brazil, Hong Kong, Poland, Yugoslavia, Croatia, Japan, and Indonesia.  Hopefully, users from these countries will write in to describe how Unicode support has made their lives easier.


Wednesday, May 14, 2008

File System Internationalization sucks

Internationalization in file systems really sucks.  There are two perspectives in the world.  First, there are the POSIX proponents who believe that names are simply nul terminated octet sequences that have no meaning except to the application that created them.  Second, there are those who believe that names should be portable between systems and therefore should all be encoded in a common character set.  Let's call this second group of folks the UNICODE camp.

I fall into the UNICODE camp.  This is most likely a side effect of having spent nearly fifteen years of my life working on Kermit, an application and file transfer protocol designed specifically to move files (by name) between computer systems using different architectures and locales.  I learned very early on that if you followed the POSIX approach the end result when a file is copied from an EBCDIC system to an ASCII system or a Latin-1 system to a CP437 system is gibberish.  Not only for human beings but for the applications as well.

A globally accessible file system such as AFS is in many regards similar to Kermit except that instead of copying files into a local file system from a remote system, the AFS client makes the entire remote file system accessible to the local machine.    The exact same character set conversion issues occur.  As long as all of the file names are in the same character set all is dandy and applications on one machine can access files created on another machine.

But what happens when the character sets are different?  In that circumstance, the names become gibberish to humans and applications.  In a worst case scenario, the file name as stored in the directory cannot even be represented on the local machine because the file name contains illegal code points according to the rules of the local environment.
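
A short Python experiment shows the effect.  The file names are made up for the example; cp037 is used as a representative EBCDIC code page, and Latin-1 stands in for an ASCII-compatible reader because it can decode any octet.

```python
name = "résumé.txt"

# Octets as written by a system whose locale is Latin-1.
stored = name.encode("latin-1")

# The same octets interpreted by a system that assumes CP437.
seen_as_cp437 = stored.decode("cp437")

# Interpreting them with the original character set recovers the name.
round_trip = stored.decode("latin-1")

# An EBCDIC name read as if it were ASCII-compatible text.
ebcdic = "README".encode("cp037")
seen_as_ascii = ebcdic.decode("latin-1")

print(repr(seen_as_cp437), repr(seen_as_ascii))
```

Nothing is lost in storage in either case; the octets are intact, and only the interpretation differs, which is why the name is usable on one machine and gibberish on another.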

This situation doesn't happen as frequently as it could because most of the world is still only storing US-ASCII or ISO-Latin-1 into the file system.  However, even with those restrictions there are still problems.  For example, the following characters are illegal on Windows systems:

  " / \ * ? < > | :

It doesn't matter what the underlying file system is.  If those characters are in the name, the name is illegal.  Any name with those characters will not be included in the directory listing.  This in turn means it is impossible to see the file, access the file, rename the file, delete the file, or delete the directory the file is located in.  File systems that include objects with such names must perform name translation in order for the Windows users or applications to be able to manipulate them.
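
One way to perform such a translation is a reversible character mapping.  The sketch below is only an illustration of the idea: the choice of a private-use offset and the function names are assumptions, not a description of what any particular file system client does.

```python
ILLEGAL = '"/\\*?<>|:'

# Assumed scheme: shift each illegal character into the Unicode private
# use area so the translated name is legal and the mapping is reversible.
_BASE = 0xF000
_TO_LOCAL = {ord(c): _BASE + ord(c) for c in ILLEGAL}
_TO_SERVER = {local: original for original, local in _TO_LOCAL.items()}


def to_windows_name(server_name: str) -> str:
    """Replace characters Windows rejects with private-use stand-ins."""
    return server_name.translate(_TO_LOCAL)


def to_server_name(windows_name: str) -> str:
    """Restore the original characters before talking to the file server."""
    return windows_name.translate(_TO_SERVER)
```

A name such as `notes: draft?.txt` then appears in directory listings under its translated form, and the original name is restored for any request sent to the server.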

Unicode brings another set of complications.  Unicode provides for multiple semantically equivalent encodings of the same string based upon whether composed or decomposed sequences are used.  For historical reasons, Mac OS X stores its file names using UTF-8 encoding of decomposed Unicode sequences, Microsoft Windows stores composed Unicode sequences, Linux also stores composed sequences, and all of the sequences for a given string can be different.  That means that a user who types the same string on all three platforms will obtain a different octet sequence for each platform.  So much for interoperability.
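
The composed and decomposed forms are easy to compare using Python's standard `unicodedata` module.  The example string is arbitrary.

```python
import unicodedata

name = "café"

composed = unicodedata.normalize("NFC", name)    # é as one code point, U+00E9
decomposed = unicodedata.normalize("NFD", name)  # e followed by U+0301

composed_octets = composed.encode("utf-8")       # b'caf\xc3\xa9'
decomposed_octets = decomposed.encode("utf-8")   # b'cafe\xcc\x81'

# The strings render identically but compare unequal octet for octet.
same_octets = composed_octets == decomposed_octets
same_after_normalizing = unicodedata.normalize("NFC", decomposed) == composed
```

Here `same_octets` is false and `same_after_normalizing` is true, so a directory lookup that compares raw octets will miss a file that a user can plainly see.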

The POSIX supporters make the claim that names must be treated as octet strings because the locale between two different processes on the same machine can be different.  All that tells me is that POSIX allows users to shoot themselves in the foot.  It doesn't mean it is right.  Of course, the POSIX folks do have a point.  If a UNIX system is incapable of communicating the character set that is being used to the file system, how is the file system supposed to do something sane with it to provide for interoperability between heterogeneous environments?

Microsoft Windows has an advantage here in that there is a standard character set for the entire operating system and all file systems: Unicode.  As a result a file system client on Windows can at least ensure that Unicode names are normalized on output, that directory entry names are normalized for display and lookup, that all illegal characters are mapped to something legal, and ensure that all strings communicated with the file server are the original directory entry names and not the normalized names used locally.  This is the approach that will be taken as Unicode is added to the OpenAFS for Windows client.
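
The approach of normalizing locally while preserving the server's names can be sketched as a lookup table.  The function names are hypothetical, and the sketch assumes directory entries arrive as UTF-8 octets and that NFC is the local normalization form.

```python
import unicodedata
from typing import Dict, Iterable, Optional


def build_lookup_table(server_entries: Iterable[bytes]) -> Dict[str, bytes]:
    """Map each normalized display name to the original directory entry."""
    table: Dict[str, bytes] = {}
    for raw in server_entries:
        display = unicodedata.normalize("NFC", raw.decode("utf-8"))
        # Keep the first entry if two names normalize to the same string.
        table.setdefault(display, raw)
    return table


def name_for_server(table: Dict[str, bytes], typed_name: str) -> Optional[bytes]:
    """Normalize what the user typed, then return the octets the server stored."""
    return table.get(unicodedata.normalize("NFC", typed_name))
```

For example, a file created on another platform as `b"cafe\xcc\x81.txt"` is found when a Windows user types the composed form of the name, and the request sent to the file server still carries the original decomposed octets.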

Wednesday, March 12, 2008

OpenAFS joins Google Summer of Code 2008

Today OpenAFS submitted an application to take part in the 2008 Google Summer of Code.  OpenAFS project ideas are listed at http://www.openafs.org/gsoc.html.

Thanks to Asanka Herath, Matt Benjamin, Simon Wilkinson and Derrick Brashear for volunteering to be mentors to the next generation of OpenAFS developers.

Update: Monday 17 March 2008, OpenAFS was accepted.