Friday, October 14, 2005

OpenAFS vs ntvdm.exe wildcard searches

The OpenAFS RT has had a ticket open for over a year because 16-bit applications crashed when executed out of AFS.  It turns out that ntvdm.exe performs wildcard searches for files as FOO?????.C?? instead of FOO*.C*.  OpenAFS for Windows failed to match this pattern to FOO.C because it implemented the semantic that '?' must match exactly one character other than (dot).  The real rule is that (dot) is the component separator, and '?' matches a single character but may match no characters if it is at the end of a component and no input remains for that component.
With this fix all of those people who wish to execute the DOS versions of Quattro or Microsoft Word out of AFS (you know who you are) can rejoice.
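
To make the matching rule concrete, here is a minimal sketch in Python.  It is not the OpenAFS code; the function names are made up for illustration, and it assumes 8.3-style names with at most one (dot), case-insensitive comparison, and no '*' support.

```python
def match_component(pattern: str, text: str) -> bool:
    """Match one dot-free component (base name or extension)."""
    pos = 0
    for idx, ch in enumerate(pattern):
        if ch == "?":
            if pos < len(text):
                pos += 1  # '?' consumes exactly one character
            else:
                # Input exhausted: '?' may match nothing, but only when
                # everything left in the pattern component is also '?'.
                return all(c == "?" for c in pattern[idx:])
        elif pos < len(text) and text[pos].upper() == ch.upper():
            pos += 1  # literal match (case-insensitive, an assumption)
        else:
            return False
    return pos == len(text)


def dos_wildcard_match(pattern: str, name: str) -> bool:
    """Split pattern and name at the (dot) and match each component."""
    pat_base, _, pat_ext = pattern.partition(".")
    name_base, _, name_ext = name.partition(".")
    return (match_component(pat_base, name_base)
            and match_component(pat_ext, name_ext))
```

With this rule, dos_wildcard_match("FOO?????.C??", "FOO.C") succeeds because the trailing '?' characters in each component are allowed to match nothing, while a '?' in the middle of a component (as in F?O.C against FO.C) still has to consume a character.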

Saturday, October 8, 2005

Today we set up a read/write volume that is moved between two servers every 30 minutes.  The MIT Stress Test was then run against this volume.   While the test fails (because the volume becomes busy for six minutes during the move), the OpenAFS client cleanly fails over from one server to the next.  Good job!!!

We have been tracking what appeared to be a weird bug in the rx library for the last several weeks.   After some period of time a server would respond to a client with the last packet in a call.  The client would respond by sending an ACK and then terminating the call.   The client would create a new call and issue a request to the server.  The server would appear to have ignored the previous ACK and begin resending the last message of the previous call.   The client, no longer believing the previous call exists, would ignore the duplicate messages, but it would also *never* resend the first packet of the new call.  This deadlock situation would remain essentially forever. 

After pulling our hair out for several weeks we discovered that the hardware clock on the machine was set to the year 2015.  ntpupdate was resetting the clock to the correct time.  The reason the client wasn't resending the new call packet was that the resend timer was set to expire ten years from now.   I wonder if the server will still be there by the time the resend takes place.  :-D
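
The arithmetic is easy to reproduce.  The sketch below is not rx code; it assumes the resend deadline was computed as an absolute wall-clock time while the clock still read 2015, and that the clock was then stepped back to 2005.  The two-second interval and the helper name are made up for illustration.

```python
from datetime import datetime, timedelta

RESEND_INTERVAL = timedelta(seconds=2)  # hypothetical retransmit interval


def seconds_until_resend(deadline: datetime, now: datetime) -> float:
    """Time left before a wall-clock deadline fires (never negative)."""
    return max((deadline - now).total_seconds(), 0.0)


# The deadline is computed while the clock is ten years fast ...
wrong_clock = datetime(2015, 10, 8, 12, 0, 0)
deadline = wrong_clock + RESEND_INTERVAL

# ... and then the clock is stepped back to the correct time.
corrected_clock = datetime(2005, 10, 8, 12, 0, 0)
wait = seconds_until_resend(deadline, corrected_clock)
years = wait / (365.25 * 24 * 3600)
```

A timer driven by a clock that cannot be stepped backwards (Python exposes one as time.monotonic()) would not be affected by this kind of correction.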

Friday, October 7, 2005

Tested executing Microsoft Office 2003 out of AFS with OAFW Byte Range Locking support and it works when the 'k' privilege is provided as part of the ACLs for the directories containing the executables.   It would be useful to compile a list of applications that do and do not work.

We have discovered another instance in which a Windows client might be unable to access volume data.  If the volume was cached by the client and then moved, salvaged or otherwise taken offline, the Windows client would never move the volume from the "offline" state to the "not busy" state.   In this case, being "offline" is distinct from all of the known servers being "down".   In the "down" case, the Windows client has a background thread to periodically test the reachability of the server.   In this case, there was no code in place to attempt to find a new source for the volume data.
A correction for this problem has been developed and was committed to the OpenAFS repository.  The fix will be available in the final 1.4.0 release.
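
For illustration only, the missing behaviour can be sketched as a periodic recheck of "offline" volumes.  This is not the committed fix: the class, the state names, and the lookup_location callback (a stand-in for asking where the volume now lives) are all hypothetical.

```python
from enum import Enum
from typing import Callable, Iterable, Optional


class VolState(Enum):
    NOT_BUSY = "not busy"
    OFFLINE = "offline"


class CachedVolume:
    def __init__(self, name: str, server: str) -> None:
        self.name = name
        self.server = server
        self.state = VolState.NOT_BUSY


def recheck_offline(volumes: Iterable[CachedVolume],
                    lookup_location: Callable[[str], Optional[str]]) -> None:
    """Try to find a new source for every volume marked offline.

    lookup_location(name) returns a server name, or None if the volume
    is still unavailable (hypothetical callback).
    """
    for vol in volumes:
        if vol.state is not VolState.OFFLINE:
            continue
        server = lookup_location(vol.name)
        if server is not None:
            vol.server = server
            vol.state = VolState.NOT_BUSY
```

Run from a background thread, a loop like this would give "offline" volumes the same kind of periodic retry that "down" servers already get.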

Thursday, October 6, 2005

New installers for OpenAFS for Windows pre-1.4.1 with Byte Range Lock support have been uploaded to http://web.mit.edu/jaltman/Public/OpenAFS/ByteRangeLocks/.   The latest builds are dated 20051006.   These builds fix two issues related to lock management.  I am now confident that I can use this version on a daily basis.  Perhaps we should skip the 1.4.0 release.