Sunday, March 24, 2013

Symbolic Links on Windows

 Over the last month I have learned more about symlinks on Windows than I ever wanted to know.  As many readers are aware, I am the lead developer of the OpenAFS client for Windows and the AFS name space supports two symbolic link type objects:
  • Mount Points: a directory entry that refers to the root directory of an AFS volume.
  • Symlinks: a directory entry that refers to any absolute or relative target path; traditionally in POSIX notation.
The original AFS client for Microsoft Windows was implemented as an SMB 1.2 to AFS gateway service and it pre-existed Windows 2000, the first version of Microsoft Windows to include NTFS 3.0 and support for reparse points.  Due to the lack of native OS support, AFS specific command-line tools "fs mkmount", "fs lsmount", "fs rmmount" and "symlink make", "symlink list", and "symlink remove" were provided.

In 2007, Peter Scott and I began work on a Windows Installable File System for AFS.  Technically, the new AFS client is a legacy file system redirector driver which has access to the same functionality and flexibility as NTFS.  In Windows Vista and beyond Microsoft added support for symbolic links to files and directories within NTFS.  They implemented this functionality by combining a directory object or a file object with Reparse Point Data.  The data consists of a Reparse Point Tag value (assigned by Microsoft) and a tag specific data structure.

Microsoft assigns reparse tag values and then includes them in future versions of the ntifs.h header file in the DDK.  If you are developing a file system driver for Windows and wish to have a reparse point tag allocated to your driver, follow the instructions at Microsoft's Reparse Point Tag Request page.  Microsoft is likely to assign only a single Reparse Point Tag value for your driver.  Therefore, I recommend that you request a tag value without the "high latency" or "name surrogate" bits set.  You can always combine those bits with your assigned tag value.  The DDK ntifs.h header includes macros, such as IsReparseTagMicrosoft() and IsReparseTagNameSurrogate(), to test various bits.
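
As a rough illustration of what such bit tests do, here is a minimal portable sketch.  The helper names are hypothetical stand-ins rather than the ntifs.h macros themselves, and the "high latency" bit position (bit 30) is my assumption based on headers of that era.  The tag values for symlinks and mount points are the ones published in the Windows SDK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the tag bits; the names are hypothetical, not ntifs.h's. */
#define TAG_BIT_MICROSOFT      0x80000000u /* bit 31: tag is owned by Microsoft */
#define TAG_BIT_HIGH_LATENCY   0x40000000u /* bit 30: assumed "high latency" bit */
#define TAG_BIT_NAME_SURROGATE 0x20000000u /* bit 29: same value as IO_REPARSE_TAG_SURROGATE */

#define TAG_SYMLINK     0xA000000Cu /* IO_REPARSE_TAG_SYMLINK */
#define TAG_MOUNT_POINT 0xA0000003u /* IO_REPARSE_TAG_MOUNT_POINT */
#define TAG_OPENAFS_DFS 0x00000037u /* IO_REPARSE_TAG_OPENAFS_DFS */

/* True when the tag has the Microsoft ownership bit set. */
static inline bool tag_is_microsoft(uint32_t tag)
{
    return (tag & TAG_BIT_MICROSOFT) != 0;
}

/* True when the tag has the (assumed) high latency bit set. */
static inline bool tag_is_high_latency(uint32_t tag)
{
    return (tag & TAG_BIT_HIGH_LATENCY) != 0;
}

/* True when the tag marks the object as a surrogate for another name. */
static inline bool tag_is_name_surrogate(uint32_t tag)
{
    return (tag & TAG_BIT_NAME_SURROGATE) != 0;
}

/* Combine an assigned tag value with the name surrogate bit. */
static inline uint32_t tag_with_name_surrogate(uint32_t tag)
{
    return tag | TAG_BIT_NAME_SURROGATE;
}
```

With these definitions the Microsoft symlink and mount point tags test as both Microsoft-owned and name surrogates, while the OpenAFS tag tests as neither until the surrogate bit is combined with it.
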
Reparse Points are a generic mechanism for turning a directory or file object into a reference to something else.  The IsReparseTagMicrosoft() macro is important because it determines which data structure will be set on the file system object.  A Microsoft Tag will use the REPARSE_DATA_BUFFER structure whereas a non-Microsoft Tag will use the REPARSE_GUID_DATA_BUFFER structure.  The latter structure can be customized by the driver vendor.  I recommend defining a structure that contains a driver specific sub-tag value and a union of purpose specific values.  In fact, this is what we did for the AFS redirector.

//
// Reparse tag AFS Specific information buffer
//


#define IO_REPARSE_TAG_OPENAFS_DFS 0x00000037L

#define IO_REPARSE_TAG_SURROGATE   0x20000000L

//  {EF21A155-5C92-4470-AB3B-370403D96369}
DEFINE_GUID (GUID_AFS_REPARSE_GUID,
        0xEF21A155, 0x5C92, 0x4470, 0xAB, 0x3B, 0x37, 0x04, 0x03, 0xD9, 0x63, 0x69);

 
#define OPENAFS_SUBTAG_MOUNTPOINT 1
#define OPENAFS_SUBTAG_SYMLINK    2
#define OPENAFS_SUBTAG_UNC        3

#define OPENAFS_MOUNTPOINT_TYPE_NORMAL   L'#'
#define OPENAFS_MOUNTPOINT_TYPE_RW       L'%'

typedef struct _AFS_REPARSE_TAG_INFORMATION
{
    ULONG SubTag;
    union
    {
        struct
        {
            ULONG  Type;
            USHORT MountPointCellLength;
            USHORT MountPointVolumeLength;
            WCHAR  Buffer[1];
        } AFSMountPoint;

        struct
        {
            BOOLEAN RelativeLink;
            USHORT  SymLinkTargetLength;
            WCHAR   Buffer[1];
        } AFSSymLink;

        struct
        {
            USHORT UNCTargetLength;
            WCHAR  Buffer[1];
        } UNCReferral;
    };
} AFSReparseTagInfo;
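
To show how a variable-length structure like this might be filled in, here is a minimal sketch for the symlink case.  It restates the structure with portable fixed-width types so that it compiles outside the DDK.  The helper names are hypothetical, and I am assuming for illustration that SymLinkTargetLength is a byte count and that the target is stored without a terminator; the real driver may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t ULONG;   /* portable stand-ins for the DDK types */
typedef uint16_t USHORT;
typedef uint16_t WCHAR;
typedef uint8_t  BOOLEAN;

#define OPENAFS_SUBTAG_SYMLINK 2

typedef struct _AFS_REPARSE_TAG_INFORMATION
{
    ULONG SubTag;
    union
    {
        struct { ULONG Type; USHORT MountPointCellLength;
                 USHORT MountPointVolumeLength; WCHAR Buffer[1]; } AFSMountPoint;
        struct { BOOLEAN RelativeLink; USHORT SymLinkTargetLength;
                 WCHAR Buffer[1]; } AFSSymLink;
        struct { USHORT UNCTargetLength; WCHAR Buffer[1]; } UNCReferral;
    };
} AFSReparseTagInfo;

/* Bytes needed to hold a symlink record with target_chars UTF-16 units. */
static size_t afs_symlink_info_size(size_t target_chars)
{
    return offsetof(AFSReparseTagInfo, AFSSymLink.Buffer)
           + target_chars * sizeof(WCHAR);
}

/* Hypothetical helper: allocate and fill a symlink record.
 * Assumes the length field counts bytes (an assumption, see above).
 * The caller frees the result with free(). */
static AFSReparseTagInfo *afs_make_symlink_info(const WCHAR *target,
                                                size_t target_chars,
                                                int relative,
                                                size_t *out_size)
{
    size_t size = afs_symlink_info_size(target_chars);
    AFSReparseTagInfo *info;

    if (target_chars > 0x7FFFu)            /* length must fit in a USHORT */
        return NULL;
    if (size < sizeof(AFSReparseTagInfo))  /* never allocate less than the struct */
        size = sizeof(AFSReparseTagInfo);

    info = calloc(1, size);
    if (info == NULL)
        return NULL;

    info->SubTag = OPENAFS_SUBTAG_SYMLINK;
    info->AFSSymLink.RelativeLink = relative ? 1 : 0;
    info->AFSSymLink.SymLinkTargetLength = (USHORT)(target_chars * sizeof(WCHAR));
    /* Copy through a byte pointer because the target extends past Buffer[1]. */
    memcpy((uint8_t *)info + offsetof(AFSReparseTagInfo, AFSSymLink.Buffer),
           target, target_chars * sizeof(WCHAR));

    *out_size = size;
    return info;
}
```

The sub-tag selects which member of the union is valid, so a reader of the data checks SubTag first and only then interprets the lengths and the buffer.
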


The motivation for using reparse points with the AFS redirector stems from limitations of the SMB to AFS gateway.  The global AFS name space consists of millions of individual volumes scattered across hundreds or thousands of AFS cells maintained by different organizations.  The entire name space can be thought of as being rooted at /afs, with /afs/<cellname>/ referring to the volume "root.cell" in that cell, whose volume location database servers can be found via a DNS SRV query that assumes a one-to-one mapping between the cell name and DNS domain name.  That is too much information, but the point is that when the UNC path \\afs\your-file-system.com\ is evaluated by an AFS client, the subset of the AFS name space it refers to is unlikely to be a single volume.  This is really important because the Win32 GetVolumeInformationByHandleW and GetDiskFreeSpaceEx APIs permit an application to query properties of the volume such as the amount of free space, the volume name, serial number, and system flags.

An SMB share UNC path is assumed to refer to a single volume.  The SMB 1.2 server does not return different volume information for different paths.  It always returns the volume information associated with the root of the share.  For AFS this is a nightmare.  Each AFS volume will have a unique name and id.  Each will also have an assigned quota, have a certain number of bytes free, and can be either read-only or read-write.  Since the AFS name space and its potential associated storage are infinite but a single volume has finite constraints, what should the GetVolumeInformation and GetDiskFreeSpace API families return when given an AFS path?  In the SMB world, AFS claims there is only one volume, "AFS"; it is read-write, the size of the volume is 2TB, and there is always 1TB free.

This lying by the SMB to AFS gateway results in some awkward behaviors.
  • Attempts to open a file for write, create a file, truncate a file, or create or remove a directory on a read-only volume return ERROR_WRITE_PROTECT even though the volume properties indicate that it is read-write.  This results in awkward error messages from applications such as the Explorer Shell, which checks the FILE_READ_ONLY_VOLUME flag to determine whether operations such as New ..., Rename, Delete, etc. should be removed from menus when the active directory is part of a read-only volume.
  • Since the volume size is hard coded to be 2TB with 1TB free, it is not possible for applications to create files that are larger than 2TB.
  • But worse, the Windows SMB client believes that there is 1TB free.  It can accept vast amounts of data from the application before it discovers that in fact there is no room on the file server to store it.  When the space suddenly disappears the application and the user will receive a "Delayed Write Error" which effectively means "I know I promised you that I would safely store your data for you but I misplaced it and you can't have it back."  In other words, a fatal data loss occurs which more often than not will result in application failure and perhaps a monetary loss.
  • Mount point and symlink objects are not exposed to Windows applications.  The applications believe that there are only directories and files.  This has some really negative consequences.  When an attempt is made to delete a directory object via the Explorer Shell, the shell will delete not only the directory entry but all of the contents of the directory tree below it.  If the directory entry were exposed as a reparse point, only the reparse point would be removed, leaving the target intact.  Instead, the Explorer Shell attempts to delete everything.  Likewise, when a symlink refers to a file, the symlink should be removed but the target should be left alone.  Finally, rename operations should be performed on the mount point or symlink and not on the target object.
When Peter and I designed the AFS redirector, one of the goals was to address these shortcomings.  Implementing reparse points for AFS mount points and symlinks was key because reparse point attributes on directory objects are the indication to an application that the directory entry and its target may not be in the same volume; therefore, the volume and disk free information must be fetched.  Of course, not all applications properly pay attention to reparse point attributes.  Application authors frequently assume that a UNC path or a network drive letter mapping must be to an SMB 1.2 share and therefore can only refer to a single volume.  I am tempted to produce a wall of shame for applications that get it wrong.  However, the failure of application authors to implement the correct behavior in their applications is not a reason for a file system to fail to make the data available to them.

Up until the 1.7.21(00) release the AFS redirector exposed mount point and symlink data using the Microsoft-assigned IO_REPARSE_TAG_OPENAFS_DFS tag value and the AFSReparseTagInfo structure wrapped by the REPARSE_GUID_DATA_BUFFER structure.  In principle this should have been fine.  Applications should not need to parse the reparse data in order to properly interpret a reparse point.  The file attributes of the reparse point object indicate whether it's a file or a directory.  The high latency bit of the reparse point tag indicates whether the target object is located in a Hierarchical Storage Management system that might not be able to answer queries about the target object in a reasonable period of time.  Unfortunately, many applications decide to ignore the FILE_ATTRIBUTE_REPARSE_POINT flag when it is returned by a GetFileAttributes or GetFileAttributesEx call, even though these APIs explicitly return information about a reparse point and not the target.  Some applications follow this behavior when the reparse point tag is not recognized, which usually means when IsReparseTagMicrosoft() returns false.  Others do it always.

What happens when the FILE_ATTRIBUTE_REPARSE_POINT bit is discarded and the rest of the file attributes are assumed to apply to the target file?  In addition to the file attributes field, the GetFileAttributesEx and FindFirstFile families of functions also return the file size.  Now the file size does not have much meaning when the object is a directory, but when the target of the reparse point is a file, using the wrong file size can be catastrophic.  File contents can be truncated when read or overwritten when written.  Applications will be mighty confused when they continue to append data to a file but believe the file size never changed.  They will be even more confused when they attempt to delete a file only to find that either the reparse point or the target file was deleted, but not both.  Regardless, bad things happen and that leaves end users with a bad taste in their mouths.
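
To make the hazard concrete, here is a minimal sketch of the check an application could apply to the attribute word it gets back from GetFileAttributesEx or a WIN32_FIND_DATA entry before trusting the reported size.  The function and enum names are hypothetical.  The two attribute values match the Windows SDK definitions, restated locally so the sketch compiles anywhere.

```c
#include <stdbool.h>
#include <stdint.h>

#define ATTR_DIRECTORY     0x00000010u /* FILE_ATTRIBUTE_DIRECTORY */
#define ATTR_REPARSE_POINT 0x00000400u /* FILE_ATTRIBUTE_REPARSE_POINT */

typedef enum
{
    ENTRY_FILE,              /* ordinary file */
    ENTRY_DIRECTORY,         /* ordinary directory */
    ENTRY_REPARSE_FILE,      /* reparse point whose object is a file */
    ENTRY_REPARSE_DIRECTORY  /* reparse point whose object is a directory */
} entry_kind;

/* Classify a directory entry from its attribute bits alone. */
static inline entry_kind classify_entry(uint32_t attrs)
{
    bool is_dir = (attrs & ATTR_DIRECTORY) != 0;

    if (attrs & ATTR_REPARSE_POINT)
        return is_dir ? ENTRY_REPARSE_DIRECTORY : ENTRY_REPARSE_FILE;
    return is_dir ? ENTRY_DIRECTORY : ENTRY_FILE;
}

/* The reported size describes the file contents only for ordinary files.
 * For a reparse point it describes the link object, not the target. */
static inline bool size_describes_contents(uint32_t attrs)
{
    return classify_entry(attrs) == ENTRY_FILE;
}
```

An application that sees ENTRY_REPARSE_FILE would need to open the target and query its size from the handle rather than reuse the size from the enumeration.
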

For the 1.7.22(00) release I decided to significantly flesh out the reparse point handling.  For starters, I had been working with Rex Conn on adding knowledge of AFS Reparse Points to Take Command.  Take Command (and its predecessor 4NT) has long had excellent support for AFS.  In its directory listings Take Command distinguishes symlinks to files, symlinks to directories, and junctions.  It does so for AFS as well as NTFS.  When Take Command 15 is combined with OpenAFS 1.7.22, users can not only view the target information for AFS mount points and symlinks but can also create them if the Take Command process has the SeCreateSymbolicLinkPrivilege, which permits the CreateSymbolicLink API to create a symlink to a directory or a file.

CreateSymbolicLink encapsulates the following operations:
  1. Determine the type of the target object (file or directory)
  2. Create either a directory or a file object to match the target type
  3. Construct the REPARSE_DATA_BUFFER structure using the IO_REPARSE_TAG_SYMLINK tag
  4. Issue FSCTL_SET_REPARSE_POINT to assign the reparse data to the directory or file
  5. Close the handle to the file or directory
In other words, CreateSymbolicLink only creates Microsoft symlinks.  Since the tag type is in the data structure, it is fairly easy for a file system driver to accept both the IO_REPARSE_TAG_SYMLINK data and the file system specific data.  Once this was implemented, it became possible for the Take Command MKLINK command to be used to create symlinks within AFS volumes.
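
Step 3 of that list can be sketched in portable C.  The function below serializes the symlink form of REPARSE_DATA_BUFFER into a byte buffer using the field order documented in ntifs.h: an 8-byte header, four USHORT offset and length fields, a ULONG flags field, and then the path buffer holding the substitute name followed by the print name.  The function name and constants are hypothetical stand-ins.  Creating the object, issuing FSCTL_SET_REPARSE_POINT, and closing the handle all require Windows calls and are not shown.

```c
#include <stddef.h>
#include <stdint.h>

#define TAG_SYMLINK         0xA000000Cu /* IO_REPARSE_TAG_SYMLINK */
#define FLAG_RELATIVE       0x00000001u /* SYMLINK_FLAG_RELATIVE */
#define HEADER_BYTES        8u          /* ReparseTag, ReparseDataLength, Reserved */
#define SYMLINK_FIXED_BYTES 12u         /* four USHORT fields plus ULONG Flags */

/* Store a 16-bit value in little-endian byte order. */
static void put_u16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFFu);
    p[1] = (uint8_t)(v >> 8);
}

/* Store a 32-bit value in little-endian byte order. */
static void put_u32(uint8_t *p, uint32_t v)
{
    put_u16(p, (uint16_t)(v & 0xFFFFu));
    put_u16(p + 2, (uint16_t)(v >> 16));
}

/* Build symlink reparse data in `out`.  Names are UTF-16 code units without
 * terminators.  Returns the total byte count, or 0 if it does not fit. */
static size_t build_symlink_reparse_data(uint8_t *out, size_t out_size,
                                         const uint16_t *subst, size_t subst_chars,
                                         const uint16_t *print, size_t print_chars,
                                         int relative)
{
    size_t subst_bytes, print_bytes, data_bytes, total, i;

    if (subst_chars > 0x7FFFu || print_chars > 0x7FFFu)
        return 0;
    subst_bytes = subst_chars * 2;
    print_bytes = print_chars * 2;
    data_bytes = SYMLINK_FIXED_BYTES + subst_bytes + print_bytes;
    total = HEADER_BYTES + data_bytes;
    if (data_bytes > 0xFFFFu || total > out_size)
        return 0;

    put_u32(out, TAG_SYMLINK);                 /* ReparseTag */
    put_u16(out + 4, (uint16_t)data_bytes);    /* ReparseDataLength */
    put_u16(out + 6, 0);                       /* Reserved */
    put_u16(out + 8, 0);                       /* SubstituteNameOffset */
    put_u16(out + 10, (uint16_t)subst_bytes);  /* SubstituteNameLength */
    put_u16(out + 12, (uint16_t)subst_bytes);  /* PrintNameOffset */
    put_u16(out + 14, (uint16_t)print_bytes);  /* PrintNameLength */
    put_u32(out + 16, relative ? FLAG_RELATIVE : 0u);

    for (i = 0; i < subst_chars; i++)          /* PathBuffer: substitute name */
        put_u16(out + 20 + 2 * i, subst[i]);
    for (i = 0; i < print_chars; i++)          /* PathBuffer: print name */
        put_u16(out + 20 + subst_bytes + 2 * i, print[i]);
    return total;
}
```

The offsets are relative to the start of the path buffer and the lengths are in bytes, which is why a driver that already parses its own tag format can recognize this layout by the tag value alone.  For absolute targets the substitute name normally carries the NT "\??\" prefix while the print name holds the user-visible path.
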

For the longest time I resisted squatting on Microsoft's tag and data structure but as long as FSCTL_GET_REPARSE_POINT returns the IO_REPARSE_TAG_OPENAFS_DFS data many applications do the wrong thing.  There simply wasn't any choice from the perspective of application compatibility.  As a result in the 1.7.23(00) release AFS Symlinks will be exposed using the IO_REPARSE_TAG_SYMLINK instead of the IO_REPARSE_TAG_OPENAFS_DFS tag.  Only AFS Mount Points will be exposed using the IO_REPARSE_TAG_OPENAFS_DFS tag.

With this change not only can Take Command understand AFS symlinks but so can the Explorer Shell, the Cygwin POSIX environment, the PowerShell Community Extensions, and anything else that can manipulate NTFS symlinks.  Even Hermann Schinagl's Link Shell Extension.

One might think that everyone would be happy at this point, except that end users are still faced with applications that do not know how to properly interpret Microsoft Reparse Points.  One example is Microsoft's own .NET.  In Microsoft's How to: Iterate Through a Directory Tree (C# Programming Guide) the author explains:

  NTFS file systems can contain reparse points in the form of junction points, symbolic links, and hard links. The .NET Framework methods such as GetFiles and GetDirectories will not return any subdirectories under a reparse point. This behavior guards against the risk of entering into an infinite loop when two reparse points refer to each other. In general, you should use extreme caution when you deal with reparse points to ensure that you do not unintentionally modify or delete files. If you require precise control over reparse points, use platform invoke or native code to call the appropriate Win32 file system methods directly.

That is not the only thing that .NET does.  It also hides the FILE_ATTRIBUTE_REPARSE_POINT bit in the file attributes from applications and returns the file size of the reparse point data.  As a result, parsing a file stream through a symlink to a file results in the data truncation bug.  If the .NET team truly wanted to hide reparse points from application developers, they should have substituted the file attribute information for the target files in all directory enumeration output.  Providing compatibility for broken applications such as this should not be the responsibility of a file system.  However, applications are more important to end users than file systems, and if the applications do not work, the file system will be replaced (or never adopted in the first place).  As a result, a future version of the Windows AFS client will probably include a mechanism for requesting that Symlinks to Files be reported as Files and not IO_REPARSE_TAG_SYMLINK reparse points.

While on the subject of Symlinks and Windows, I would also like to discuss other approaches to implementing symlinks on Windows that have been used over the years.  As I mentioned, Cygwin supports Microsoft IO_REPARSE_TAG_SYMLINK reparse points as Symlinks.

$ ls -l af*
lrwxrwxrwx 1 Administrators None 9 Sep 19  2012 afs -> //afs/all

However, "ln -s target link" cannot be used to create IO_REPARSE_TAG_SYMLINK reparse points.  This is because "ln -s" creates Cygwin specific symlink objects in the file system.  Instead of using reparse points, Cygwin writes a file that begins with a cookie "!<symlink>", followed by a Unicode BOM and the target path in Unicode.  The file has the FILE_ATTRIBUTE_SYSTEM attribute set as an indicator that the file might be a Cygwin symlink.
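
A reader for that file format can be sketched in a few lines.  This is only an illustration: the function name is hypothetical, it assumes the cookie is the ten ASCII bytes `!<symlink>` followed by a UTF-16 little-endian BOM (bytes FF FE), and it does not check the FILE_ATTRIBUTE_SYSTEM attribute, which would have to be queried separately on Windows.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Examine the raw contents of a file.  If it looks like a Cygwin-style
 * symlink file, point *target at the UTF-16LE target path inside `data`
 * and return its length in 16-bit code units (trailing NULs excluded).
 * Returns -1 if the contents do not match the assumed format. */
static long cygwin_symlink_target(const uint8_t *data, size_t len,
                                  const uint8_t **target)
{
    static const char cookie[] = "!<symlink>";
    const size_t cookie_len = sizeof(cookie) - 1;
    const uint8_t *path;
    size_t rest, units;

    if (len < cookie_len + 2)
        return -1;
    if (memcmp(data, cookie, cookie_len) != 0)
        return -1;
    if (data[cookie_len] != 0xFF || data[cookie_len + 1] != 0xFE)
        return -1;                    /* no UTF-16LE byte order mark */

    path = data + cookie_len + 2;
    rest = len - cookie_len - 2;
    if (rest % 2 != 0)
        return -1;                    /* UTF-16 data must be an even byte count */

    units = rest / 2;
    while (units > 0 && path[2 * (units - 1)] == 0 && path[2 * (units - 1) + 1] == 0)
        units--;                      /* drop any trailing NUL terminators */

    *target = path;
    return (long)units;
}
```

Because the target is stored as an opaque string, it can hold a POSIX path that has no Windows equivalent, which is one reason the format survives alongside native reparse points.
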

On Windows Server, Microsoft provides both a POSIX environment, Interix, and an NFSv3 implementation.  Interix implements symlinks similarly to Cygwin except that the cookie is "IntxLNK\1" and the format of the target path is different.  The NFS implementation, by contrast, identifies its Symlinks by use of an extended attribute, "NfsSymlinkTargetName", which stores the target path.

There is one more type of link object in Windows which is sometimes interpreted as a symlink.  That is the Windows Shortcut .LNK file which is interpreted by the Windows Shell.  One thing that is quite odd is that Cygwin at the present time is capable of writing .LNK files but is not capable of creating IO_REPARSE_TAG_SYMLINK reparse points.
[Update: Corinna Vinschen of Cygwin indicates the reason is that POSIX paths can be stored in .LNK files but IO_REPARSE_TAG_SYMLINK fields require the use of Windows file paths and foreknowledge of the target type.]

Microsoft Windows Reparse Points are an extremely powerful and flexible mechanism for implementing file system specific control points.  They are much more powerful than the traditional POSIX symlink, although much more complex.  An example of a tool that is more powerful because of its reparse point awareness is Microsoft's "Robust File Copy for Windows" tool, better known as RoboCopy.  RoboCopy can be configured to exclude junction points (/XJ), by which they mean reparse points; exclude junction points for directories but not files (/XJD); exclude junction points for files (/XJF); and even copy the symlink instead of the target (/SL).  All of these switches work with the Windows AFS client.

My final comment for this post is that evaluating AFS directories which contain symlinks is an extremely expensive operation.  Unlike the POSIX equivalents, a Windows directory enumeration always returns the WIN32_FIND_DATA structure for each directory entry, which contains the file attributes.  A reparse point to a directory must have the FILE_ATTRIBUTE_DIRECTORY bit set and a reparse point to a file must not.  All of the other fields of the WIN32_FIND_DATA structure can be determined from the reparse point itself, but AFS does not have a method of hinting to the client what the type of the target object is.  As a result, the target path must be evaluated for each and every directory listing.  A directory such as /afs/andrew.cmu.edu/ which contains more than 30,000 relative symlinks to directories will require nearly twice that number of RPCs to the file server to complete the directory enumeration.  Something to think about when planning your AFS name space.
