Wednesday, September 26, 2007

Windows Error Reporting versus Open Source Development

Windows Error Reporting is one of the greatest services that Microsoft has ever provided to developers of applications and device drivers for Microsoft Windows operating systems.  It provides a registered and verified software developer with access to crash report data for that developer's applications.

How does it work?
When an application terminates unexpectedly or a user terminates an application due to a lack of responsiveness, Windows will capture a mini-dump of the application, the version information of all loaded modules, and the version information for the Windows operating system on which it is being run.  The user is then presented with a dialog requesting permission to deliver this information to Microsoft.

Registered application developers provide Microsoft with a mapping file that describes each binary in a product release including version info, link times, and other traits that can be used to uniquely identify the module.  When crash reports are received by Microsoft, the WER servers compare each report against the mapped modules.  When a match occurs, a WER event is generated and the application developer is notified.
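The matching step can be pictured with a small Python sketch.  This is not how the WER servers are actually implemented; the dictionary layout, the match_report function and the module and product names are all assumptions made up for illustration.  The only thing it takes from the description above is that a report's loaded modules are compared against what developers registered.

```python
def match_report(loaded_modules, mapping):
    """Return the registered products whose modules appear in a crash report.

    loaded_modules: list of dicts with "name" and "file_version" keys
                    (hypothetical shape of the report's module list).
    mapping:        dict of (lowercased module name, file version) -> product,
                    standing in for the developer-supplied mapping file.
    """
    matched = set()
    for module in loaded_modules:
        # Windows file names are case-insensitive, so normalize before lookup.
        key = (module["name"].lower(), module["file_version"])
        if key in mapping:
            matched.add(mapping[key])
    return sorted(matched)


# Hypothetical mapping data and report contents.
mapping = {("example.dll", "1.2.3.4"): "Example Product"}
report_modules = [
    {"name": "Example.DLL", "file_version": "1.2.3.4"},
    {"name": "kernel32.dll", "file_version": "5.1.2600.0"},
]
products = match_report(report_modules, mapping)
```

Here products contains only "Example Product", because the second module is not in the registered mapping.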

One of the really nice benefits of WER is that it can sort the events into buckets based upon the type of crash, hang, and process state at the time of the crash.  If the same type of crash occurs 50 times, all of the matching events will be placed into the same event bucket.  Application developers can easily compare the state of all of the crash reports to assist in tracking down the cause.
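To make the bucketing idea concrete, here is a minimal Python sketch.  The fields used for the bucket key (application, module, versions, event type and fault offset) are my assumption of what a reasonable signature looks like; they are not taken from WER documentation, and the report dictionaries are invented.

```python
from collections import defaultdict


def bucket_key(report):
    """Build a signature so that identical failures share one bucket.

    The chosen fields are an assumption for illustration only.
    """
    return (
        report["app"],
        report["app_version"],
        report["module"],
        report["module_version"],
        report["event_type"],  # e.g. "crash" or "hang"
        report["offset"],      # faulting offset within the module
    )


def bucket_reports(reports):
    """Group crash reports by signature."""
    buckets = defaultdict(list)
    for report in reports:
        buckets[bucket_key(report)].append(report)
    return dict(buckets)


# Fifty identical (hypothetical) crashes plus one unrelated hang.
crash = {"app": "example.exe", "app_version": "1.0.0.0",
         "module": "example.dll", "module_version": "1.2.3.4",
         "event_type": "crash", "offset": 0x1A2B}
hang = dict(crash, event_type="hang", offset=0)
buckets = bucket_reports([crash] * 50 + [hang])
```

The fifty matching crashes land in a single bucket and the hang gets a bucket of its own, which is what lets a developer compare all reports of one failure side by side.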

When a fix is available, the application developer can register a response which will be delivered to subsequent users that experience the same type of crash with the same version of the module or application.  These responses can indicate that the software is not supported on the OS version that it is installed on, or that a new version is available, or that a workaround can be found by reading a provided web page.

This mechanism benefits both the developers and the end users.  A bug can be fixed as soon as it is found, without requiring that the end users go through a long process of reporting a crash to the developers directly, often without being able to provide enough technical detail for the developers to fix it.  Once the fix is available, end users are automatically notified.  Less frustration for end users and for developers.  Everyone wins.

Unless you are an open source developer or end user....

What is the problem with Open Source?

Secure Endpoints is an open source vendor.  We distribute pre-built installers for Kerberos for Windows and OpenAFS for Windows.  For each of these distributions we have binaries and matching symbol data.  When a crash report arrives from WER, the mini-dump is loaded into a debugger along with the matching binaries and symbol data.  Without the binaries or the symbols, the mini-dump information is much less useful because the stack addresses cannot be matched up with specific functions in the application modules.
As long as the version of the application that is installed is the one Secure Endpoints built, we can make use of the crash reports to identify problems, fix them and notify end users via the WER response mechanism. 

What happens when an organization decides to build the product from the published source code instead of using the pre-built binaries?  In that case, WER matches the module names and file version information and places an event into a crash bucket.  Secure Endpoints downloads the crash report, loads it into the debugger only to find that we have neither matching binaries nor matching symbols.  The end result is that the WER report is useless.  The best I can do is file a response to the end user recommending the use of the pre-built binaries.
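One way to picture the mismatch is a triage check that compares a report's module against the link times of the official builds.  The Python below is only a sketch under assumptions: the is_official_build helper, the dictionary layout and the sample values are hypothetical, and it presumes the link time of the crashing module can be read from the report.

```python
def is_official_build(report_module, official_builds):
    """Return True only if the module matches an official build exactly.

    report_module:   dict with "name", "file_version" and "link_time" keys
                     (hypothetical shape).
    official_builds: dict of (module name, file version) -> link time of
                     the vendor's own build.
    """
    key = (report_module["name"], report_module["file_version"])
    # A locally rebuilt module keeps the name and version but is linked
    # at a different time, so the comparison fails.
    return official_builds.get(key) == report_module["link_time"]


# Hypothetical values: same name and version, different link time.
official_builds = {("example.dll", "1.2.3.4"): 1190000000}
vendor_copy = {"name": "example.dll", "file_version": "1.2.3.4",
               "link_time": 1190000000}
local_rebuild = dict(vendor_copy, link_time=1190777777)
```

With these values is_official_build returns True for vendor_copy and False for local_rebuild, even though the name and file version alone would place both in the same bucket.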

I can certainly understand why organizations wish to build their own binaries.  In most cases it's because they want to be able to debug problems they experience in-house.  For that they need matching symbol files.  This is exactly the reason why both the Kerberos for Windows and OpenAFS for Windows distributions include the symbol files from the official build.  This way organizations have all the necessary pieces: binaries, symbol files and source code.  Organizations that identify problems internally should file bug reports to the open source maintainers so that fixes can be developed and incorporated into future releases.