February 21, 2010 — The Upgrade Treadmill, Part 3


We are back on the upgrade treadmill once again, and it is time to examine one of the nasty little variables involved in an upgrade to Microsoft Office 2010. For the past twenty years or so, Office has been the standard for text input, numeric manipulation, presentation, and departmental database management. Competing products make claims of their strength in various niche markets, but Microsoft is the elephant in the room and we cannot forget it.

Having a single dominant player in a given market has some advantages which we cannot ignore. Microsoft Office gives highly valued stability to the lives of knowledge workers around the globe. People can switch jobs in the safe assumption they will be able to learn the business of their new company without having to learn a new way of interacting with the computer. The term PowerPoint has become almost generic, in the same sense we speak of Kleenex. Large numbers of files are passed between businesses with little thought about file compatibility, because Office is the standard.

File standards are the fly in the ointment. The .doc file standard, along with its related formats, is ancient in terms of how we count time in the computing universe. File formats like this help us to understand the meaning of the term antediluvian. Considering the changes in technology and culture, including the Internet tsunami, the .doc format has had remarkable staying power. In fact, this format largely eclipsed Microsoft's own format for interoperability, the RTF schema. Most software vendors, including those competing directly with Microsoft, developed the ability to reliably read and write .doc files. Corporations and end-users rightly saw this as a boon. Microsoft, understandably, was less enthusiastic.

In the early part of the century Microsoft began developing XML-based data formats for Microsoft Office. Their published goals were certainly laudable, and included interoperability and openness. They clearly desired to lay a foundation which would serve the requirements of the office suite into the future. The .doc format was good and solid but limited. An XML-based format would eventually allow new types of content, not foreseen by the current authors of the programs. Although Microsoft would deny this, many users as well as competitors assumed Microsoft was working to maintain their hegemony in the office space. Their efforts to establish OOXML as a standard via the Ecma and ISO/IEC standards processes were considered by some to be a blatant attempt to lock the Office standard in for the next 20 years.

Billions of bytes have been scattered about the Internet in the discussion of Microsoft's motives in the OOXML effort. It is not my purpose to criticize Microsoft. If you owned 90% of the market in a large industry, and had to answer to a large group of stockholders, I can guarantee your actions would be constrained in some ways. Besides, this represents life as we know it, and this is not likely to change in the short term. The question of the day is how we manage a rather drastic file format change.

Because my corporate installed base largely stayed with Office XP, we have a fairly homogeneous group of files related to Office. While there are some islands of users with newer versions of Office within our organization, they are not significant in terms of volume. This may turn out to be an advantage for us, because Microsoft's XML implementations have drifted across the three versions of Office subsequent to XP. Office 2007 does not fully conform to OOXML. In fact Microsoft is still tweaking the OOXML standard, so I am a bit doubtful of its promise of compatibility with Office 2010.
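Before trusting the word "homogeneous," it may be worth measuring it. What follows is a minimal sketch, not a tool we have deployed: it counts files by container type by reading their first bytes rather than trusting the extension. It assumes the legacy binary formats (.doc, .xls, .ppt) begin with the OLE2 compound-file signature and that OOXML files are ZIP packages containing a `[Content_Types].xml` part. The names `classify_office_file` and `inventory` are hypothetical, invented for this illustration.

```python
import zipfile
from pathlib import Path

# Signature of the OLE2 compound file used by the legacy binary formats.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Signature of a ZIP local file header; OOXML packages are ZIP archives.
ZIP_MAGIC = b"PK\x03\x04"


def classify_office_file(path):
    """Return 'legacy-binary', 'ooxml', 'zip-other', or 'unknown' (hypothetical helper)."""
    path = Path(path)
    with path.open("rb") as handle:
        header = handle.read(8)
    if header.startswith(OLE2_MAGIC):
        return "legacy-binary"
    if header.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(path) as archive:
                # Assumption: every OOXML package carries this part at its root.
                if "[Content_Types].xml" in archive.namelist():
                    return "ooxml"
        except zipfile.BadZipFile:
            return "unknown"
        return "zip-other"
    return "unknown"


def inventory(root):
    """Count the files under root by container type (hypothetical helper)."""
    counts = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            kind = classify_office_file(path)
            counts[kind] = counts.get(kind, 0) + 1
    return counts
```

Pointing `inventory` at a departmental file share would give a rough count of how many files are still in the old binary container and how many have already drifted to the XML formats. It says nothing about which version of Office wrote a file, only which container it uses.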

Obsolete file formats are beginning to be a problem for businesses in general and governments in particular. If you are an archivist working for a state government, or perhaps the federal government, you are managing documents that may be hundreds of years old. In most cases you're dealing with the actual paper document as a source, even if it has been scanned. However, since the middle to late 1980s, records of regulation, legislation, and legal issues have been stored digitally. Because of the drift of file formats, we have discovered we may not be able to read old digital files, because no current software supports those formats.

I believe this summarizes the conundrum facing not just my organization but others. Next week we will explore some options in dealing with this problem.

Contact Marvin at mpreem@gmail.com