OpenAFS, Kerberos, and Network Identity Manager: October 2006

Developers have a tendency to focus on source code management. We maintain source code repositories to help us manage the development process. Within the repository we construct release branches. Each branch allows a set of sources to be shaped for a specific purpose. Typical branching strategies include separate branches for the maintenance of a public release, for development of the next release, and for risky experimental development that might not work out or might have an adverse impact on other developers. Developers often give these branches somewhat arbitrary names ("stable", "unstable", "maintenance", "development", "project foo", etc.) that only have meaning to the developers.

As is often the case, the names assigned to the branches have no relationship with the quality of the code on a particular branch. This is especially true for a software project which supports large numbers of operating system platforms. Given the rate of development, it is often true that one branch is a better choice than the others for a given platform.

OpenAFS has traditionally labeled its branches as "stable" and "unstable". The even numbered branches are "stable" and the odd numbered branches are "unstable". This has resulted in significant amounts of confusion and frustration for end users. At any given time end users have been presented with up to three current releases:

the last final release off of the "stable" branch
the most recent test release off of the "stable" branch
the most recent release off of the "unstable" or "development" branch

What's an end user to do? More importantly, what's an administrator responsible for choosing the release to distribute throughout their organization to do?

When presented with the choice of selecting among "stable", "beta", or "unstable", which do you think the majority of individuals will choose? End users don't want to install software that is going to cause them to lose data and they don't want to be guinea pigs, so more often than not they are going to choose the "stable" release, even if this release is years old and has a list of known bugs a mile long.

The distinction between the various source code branches is of meaning only to the developers. End users do not think of software as source code. They think of it as a product, and the labels associated with different versions of a product will significantly influence the end user's decisions, especially when faced with complex choices they are not qualified to distinguish between. It is unrealistic to assume that an end user is going to understand the importance of file locking or the meaning of a 64-bit file size or the terminology surrounding deadlocks and reference count leaks. When a typical end user is presented with a choice among two or three complex options without a strong recommendation specifying which should be used, simplistic labels such as "unstable", "stable", "final", "development", "test", "beta", "candidate", etc. are much more influential than they are intended to be.

The reputation of OpenAFS on the Microsoft Windows and MacOS X platforms is suffering in part because of the choices given to end users and the terminology used to describe them. End users want something that works. They want to visit a web site and see that version X.Y.Z is the best version available for their platform and that this is what they should be using. When they experience a problem and see that they are not currently running the recommended version, then they will upgrade. If they experience a problem and are presented with choices that they can't make heads or tails of, they are going to take the path that appears to have the least risk. End users will choose the "stable" or "final" release over something labeled "test", "beta", "unstable", or "development" 9 times out of 10, even though the problem they are experiencing might very well be fixed in one of these apparently riskier releases.

For Windows users the availability of multiple releases has been a serious problem. The 1.4 series lacks significant functionality that is meant to protect end users from data loss. This functionality is only available in the 1.5 series. Unfortunately, because end users are presented with new releases from both the 1.4 and 1.5 branches as they are released, it is all but impossible for end users to know which to use without a very clear recommendation from the gatekeepers and perhaps the broader user community.

One of the other significant problems facing OpenAFS versioning is the length of time it takes to get through a test cycle. It is often the case that a small number of problems on specific operating system versions or hardware architectures can prevent a test cycle from being completed. In the meantime, the release that should be considered the best choice on all of the other operating system versions and hardware architectures is stuck with a label of "test", "beta", or "candidate", which keeps organizations and end users from being willing to install it.

As a result I am recommending that OpenAFS (and all other cross-platform open source projects) avoid the "one version is best for all platforms" mentality. Instead of labeling releases as "stable-1-4-2", "stable-1-4-2-beta-1", "stable-1-4-2-rc3", or "unstable-1-5-9", just use numbers such as "1-4-41", "1-4-42", "1-4-43", "1-5-9". This removes the negative connotations associated with the labels. For each platform a recommended release number can be provided.
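To make the idea of a per-platform recommendation concrete, here is a minimal sketch of how a download page or update checker might use one. The platform names, the build numbers in the table, and the function names are all hypothetical and chosen only for illustration; nothing here describes an existing OpenAFS tool.

```python
# Hypothetical table: platform name -> recommended build number.
# The platforms and numbers are illustrative, not real recommendations.
RECOMMENDED = {
    "windows": "1-5-9",
    "macosx": "1-4-43",
    "linux": "1-4-42",
}


def parse_build(label):
    """Turn a numeric label such as '1-4-41' into a tuple of ints: (1, 4, 41)."""
    return tuple(int(part) for part in label.split("-"))


def advice(platform, installed):
    """Compare an installed build with the recommendation for its platform."""
    recommended = RECOMMENDED[platform]
    if parse_build(installed) < parse_build(recommended):
        return "upgrade to " + recommended
    if parse_build(installed) > parse_build(recommended):
        return "newer than the recommended " + recommended
    return "running the recommended build"
```

Because the labels are purely numeric, comparing them is ordinary tuple comparison, and the only judgment the end user has to trust is the single recommended number for their platform.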

This new approach provides a number of side benefits. No longer do the developers need to guess at what version numbers to assign to test builds. Today, when preparing for a new release we want the final version number to be X.Y.Z.00. Therefore, the developers typically try to assign test builds numbers starting with X.Y.(Z-1).90, in order to ensure that version numbers always increase while avoiding the confusion that might arise if end users thought the test release was in fact the final release.

Another benefit is that it will be much easier for administrators to convince management to deploy fixes. Management is always reluctant to deploy a "beta" or "candidate" release because such a release must have bugs. The reality is that all software has bugs. Even if there are no known bugs in a given release at the time the release is announced, it is guaranteed that over time bugs will be discovered and that they will be fixed in later releases. A "final" release is simply one that is believed to build and run on all supported platforms without known faults.

The requirement that a "final" release build and run on all supported platforms, including all new Linux kernels, often results in significant delays before important bug fixes can make it out to the user community. For example, at the AFS & Kerberos Best Practice Workshop a demonstration was given of a bug fix to a problem in the 1.4.1 file server that adversely affects client mobility. The bug fix was committed on June 3rd and yet it took until October 17th for a 1.4.2 final release to be issued. In the meantime, more than four months of end user frustration has accumulated and many sites have deployed 1.4.1 on their file servers instead of one of the "beta" or "candidate" releases that contained the fix.

In speaking with end users, I have been told that as long as the version label does not contain negative terminology they can push out any build that is recommended. However, once doubt is raised in the minds of management regarding the quality of the release, all bets are off.

It is my hope that OpenAFS and other open source projects will abandon the traditional release labeling and replace it with incremental build numbers and platform-specific recommendations.

Wednesday, October 18, 2006

The need to avoid release labeling and choice for end users