Spring 1998



Writer's Block




Technology

XML: Better Grist from a Better Mill?

by Peter Vasdi

In a world of documentation acronyms — ASCII, ANSI, SGML, HTML, DTD, FOSI — I have recently been exposed to the current newcomer on the scene: Extensible Markup Language (XML). For the first time, I'm seriously impressed with the possibility of a true breakthrough documentation solution. I'm particularly impressed that, at last, the industry is thinking of documentation in a way that promises to be practical and useful to us all.

Losing Sight of the Process

An uneasy relationship has always existed between the world of documentation and the technical industry that produces tools for documentation producers and consumers. The search for simpler solutions has typically been pursued by people who straddle the technical and documentation worlds but who are expert in neither. Unfortunately, this "little knowledge" has been a dangerous thing. It has led to a reliance on simplified models of what is really a most complex activity. Seizing on those simpler models, the technical industry has built tools that, while adequate for certain purposes, cannot sustain the full breadth of the process. Yet that is exactly how "documentation" workers often sell them to management: as complete solutions. The simplified view of the documentation world only obscures the larger problems, lulling the average manager into believing that the tool resolves all documentation challenges. The buzzword "documentation management" falls into this category. It implies a global solution, yet most tools sold under its banner provide only the ability to manage collections (libraries) of documents stored electronically as files. The smallest component handled is still the document itself. The industry's perception of document management has yet to take into account the efficient creation of content and the sharing of that content among a multitude of viewable outputs.

The Lessons of History

A society that forgets its history is forced to repeat it. We can learn much about a more realistic, and more realizable, model for true documentation management by retracing the history of documentation. The brief history that follows touches on some key transitions in the evolution of documentation technologies. Each transition is presented from three perspectives: the way in which documentation was being created; the technological advance that occurred; and the underlying principle that surfaced as a result.

In the 1500s

Handwritten documents. Documents are written as pure, free-flowing text. Individual authors determine all of a document's qualities: the text components that group information (phrases, sentences, paragraphs, chapters), and their flow and relationships to one another. The technical world responds with the printing press: lead "sorts" in different fonts are created for each letter. Without recognizing it, the industry and the authoring community give each letter of the alphabet two properties: one identifies the letter as a component in text; the other identifies the form (shape, size, and style) of the letter. In hindsight, society in the 1500s assigned two tags to each letter in a document: one to identify the text component in which it occurs (document title, section title, body), and the other to establish the form of the letter when it appears in that text component.

Between about 1500 and 1900

Printed books. The task of authoring content separates itself from the task of structuring content into a form that appeals to the intended audience. Typesetting becomes an industry. Typesetters develop expertise in analyzing documents, identifying information elements ("tagging"), and assigning style and format to the elements and to the document as a whole. In thousands of independent shops around the world, conventions and systems are established to divide written words into an increasingly complex collection of text components and associated letter formats ("fonts"). A sophisticated tagging language evolves, although it is not formally recognized as such.

From about 1960 into the early 1980s

Formally coded text components. As computers evolve to handle text as well as numbers, information begins to be authored electronically. Tools and systems are developed both for authors and for typesetting companies. Because computers store textual information only as a sequence of characters, additional ("meta") information must be added if the computer is to format the text correctly for display and printing. A large gap still exists between the tags used by authoring tools and those used for typesetting and page layout.
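To make the point concrete, here is a minimal Python sketch (Python is used purely for illustration). The inline codes `@title` and `@para` are hypothetical, not taken from any real system of the period: given plain characters, the program has nothing to format by; given the same characters plus codes, it can set a title and indent a paragraph.

```python
# Sketch: characters alone carry no formatting; added meta information does.
# The "@title" and "@para" codes are hypothetical, invented for illustration.

PLAIN = "XML: Better Grist\nIn a world of acronyms..."
CODED = "@title XML: Better Grist\n@para In a world of acronyms..."


def render(source: str) -> str:
    """Format each line according to its leading code, if it has one."""
    out = []
    for line in source.splitlines():
        code, _, text = line.partition(" ")
        if code == "@title":
            out.append(text.upper())      # titles: set in capitals
        elif code == "@para":
            out.append("    " + text)     # paragraphs: indented
        else:
            out.append(line)              # no meta information: left as typed
    return "\n".join(out)


print(render(PLAIN))   # unchanged: the program cannot tell a title from body text
print(render(CODED))
```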

Mid-1980s

Standards. Computing tools evolve, proliferate, and become available to greater numbers of people. More people use the computer to record and communicate information. People who are information experts, rather than purely authors, begin to realize that they spend a large fraction of their time authoring information and worrying about its appearance and audience, rather than pursuing their field of expertise. The industry responds with word processing and other tools that are simpler and more "user friendly". Standardization is seen as a way to make the tools simpler, and there is a drive to define the text components of which documents are composed, the possible attributes of each text component, and the "meta" information for each text component. Standardization begins to take hold: experienced documentation professionals link sequences of tags in complicated mini-programs to format specific text elements in different ways, thus developing "document templates". The idea arises that a text component has not only an identity and a format but also a relationship to other text components.
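A document template in this sense can be sketched in a few lines of Python. Everything below is hypothetical: the component names, the format values, and the `FOLLOWS` table, which stands in for the newer idea that components also have relationships (here, simply which component may come next). Because the format lives in the template rather than in the text, restyling every body paragraph means changing one entry.

```python
# Sketch of a "document template": each text component has an identity (its
# name), a format, and a relationship to other components. All names and
# values are hypothetical.

TEMPLATE = {
    "heading": {"font": "Helvetica", "size": 14, "bold": True},
    "body": {"font": "Times", "size": 10, "bold": False},
}

# Relationship rule: which components may directly follow which.
FOLLOWS = {
    "heading": {"body"},
    "body": {"body", "heading"},
}


def apply_template(components):
    """Pair each (name, text) component with its format, enforcing FOLLOWS."""
    formatted = []
    previous = None
    for name, text in components:
        if previous is not None and name not in FOLLOWS[previous]:
            raise ValueError(f"{name!r} may not follow {previous!r}")
        formatted.append((text, TEMPLATE[name]))
        previous = name
    return formatted


doc = [("heading", "Mid-1980s"), ("body", "Standards. Computing tools evolve...")]
for text, fmt in apply_template(doc):
    print(fmt["font"], fmt["size"], text)
```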

Late 1980s

Library management. The proliferation of communication tools continues. Virtually everyone is recognizably involved in authoring information in some form. Large volumes of electronically stored information now demand serious management attention. Conceptually, Standard Generalized Markup Language (SGML) is accepted as a universal, formal tagging language that can both identify text components and establish their properties and relationships. Industry solutions are based on the concept of "data plus metadata". However, because the authoring/publishing process is not clearly defined, the solutions address only the management of completed documents, the completed document file being the smallest data element that they can successfully manage.

Early 1990s

Authors’ rebellion. In spite of the improvements in conceptual standards and industry-provided tools, authoring remains difficult and time-consuming. Most people's expertise lies elsewhere, and they resent the effort required to produce documentation. Industry latches onto SGML as the solution to efficient document development, publication, and distribution in many media. SGML tools begin to be developed as industry tries to use SGML to solve documentation problems. Hardcopy media begin to be complemented and challenged by on-line media. The way in which the industry is applying the SGML standard reveals itself as directed primarily toward hardcopy output. Much effort (millions of dollars) is spent developing SGML document type definitions (DTDs), only to discover that the tagging paradigm used is hardcopy-directed; that the effort required to design an effective DTD is equivalent to that required to design a computer system; and that each output medium requires its own stylesheets, each demanding an effort similar to that of the original DTD. Other standards, parallel to SGML (ISO 8879) but oriented toward other media and audiences, spring up: ISO 9069 for interchange format, ISO 9070 for registering public text owner identifiers, ISO 9573 and others for SGML techniques, ISO 12200 for machine-readable terminology interchange format, ISO 13673 for conformance testing, and other standards for on-line documentation, graphics, presentations, audio, and so on. Documentation is as complex as ever, and little progress in quality and efficiency is made.

Mid-1990s

The search for a simpler, cheaper way. Interactive authoring and publishing efforts are applied on networks and the Internet, with limited success; they serve to highlight problem areas more than to meet documentation requirements. Industry takes a hard look at what a document really is and discovers that it is a repository of information, like a database and yet not like a database. Each document is too individual to be manipulated easily by any existing database management system (DBMS), and future documents promise to be just as different.

1997-98

XML and XSL appear. The XML standard evolves, greatly simplifying SGML's function-based coding so that it rests essentially on the concept of begin and end tags. An information expert establishes what kinds of information are to be recorded and creates a name for each. Matching begin/end tags are arranged, and nested, in a specific order to create an author-friendly DTD geared toward information input and ready to accept information content. The DTD no longer needs to hardcode the relationships that data elements have with one another: those relationships and attributes are implied by how deeply tagged text components are nested within other components' begin/end tags, and by how often and where information elements occur relative to other elements. The cost of creating a DTD is therefore greatly reduced, and the publishing concern is removed from the authoring process. Stylesheet tagging is being standardized as the Extensible Stylesheet Language (XSL), to give publishing experts the ability to establish further relationships (attributes) between text elements based on the intended publishing medium and audience.
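The claim that nesting alone carries the relationships can be illustrated with Python's standard `xml.etree.ElementTree` module. The element names in `SAMPLE` are hypothetical, and `relationships` is an illustrative helper, not part of any standard; note also that an XML parser will read a well-formed document like this one even when no DTD is supplied at all. The helper reports each element's parent and its position among its siblings, facts that come only from where the begin and end tags sit.

```python
import xml.etree.ElementTree as ET

# Hypothetical author-facing instance: begin/end tags only, no formatting.
SAMPLE = """<article>
  <title>XML: Better Grist from a Better Mill?</title>
  <section>
    <heading>Losing Sight of the Process</heading>
    <para>An uneasy relationship has always existed...</para>
    <para>The documentation management buzzword...</para>
  </section>
</article>"""


def relationships(xml_text):
    """Derive (parent, child, position) facts purely from how the tags nest."""
    root = ET.fromstring(xml_text)
    facts = []
    for parent in root.iter():                    # every element, document order
        for position, child in enumerate(parent, start=1):  # direct children
            facts.append((parent.tag, child.tag, position))
    return facts


for parent, child, position in relationships(SAMPLE):
    print(f"{child} is child #{position} of {parent}")
```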

Gleaning the Basics

Although we seem to have made a quantum leap in documentation concepts and solutions, there are still some hurdles to overcome. Most of these are conceptual: technical people, management, and reluctant authors need to realize some basic truths about documentation. In summary, I'd like to throw some concepts, gleaned from the history above, into the general documentation bucket.

The documentation process is more than just library management. It must address the efficient creation of information content, its retention, and its publication, distribution, and overall management.

Documentation is a process that can be helped by carefully selected technical tools. It is not something that can be customized to suit the demands of any given technological tool. For any documentation effort to succeed, a process must first be developed manually to suit the given requirements. Tools can then be acquired and customized, or developed specifically, to help the process along.

The documentation process is based on three distinct efforts, each requiring its own expertise and experts: the recording of information, the storage of information, and the publishing of that information. The true author, who writes for a living or as a hobby, knows how to do all three; the reluctant author, who is an information expert in another field, is perhaps good at only the first: recording information. To enable information experts to record information easily, you must not also task them with its storage and publication.
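The separation of the three efforts can be sketched as independent Python functions. This is an illustration under assumptions, not a prescribed design: the field names, the in-memory `STORE` dictionary, and the two renderers are hypothetical stand-ins for a real authoring form, repository, and publishing system. The point is that `record` asks the information expert for content only; choosing capitals or HTML tags happens later, in code the author never sees.

```python
from html import escape

# Sketch: recording, storage, and publishing as separate efforts.
# Field names, STORE, and both renderers are hypothetical stand-ins.


def record(topic_id: str, title: str, text: str) -> dict:
    """Recording: the information expert supplies content only, no layout."""
    return {"id": topic_id, "title": title, "text": text}


STORE: dict = {}


def store(topic: dict) -> None:
    """Storage: keep each topic under its id, independent of any output."""
    STORE[topic["id"]] = topic


def publish_html(topic: dict) -> str:
    """Publishing for the Web: escape the text and wrap it in HTML tags."""
    return f"<h2>{escape(topic['title'])}</h2>\n<p>{escape(topic['text'])}</p>"


def publish_text(topic: dict) -> str:
    """Publishing for plain-text output: capitalized title, blank line, body."""
    return f"{topic['title'].upper()}\n\n{topic['text']}"


store(record("basics", "Gleaning the Basics", "Documentation is a process."))
print(publish_html(STORE["basics"]))
print(publish_text(STORE["basics"]))
```

Adding a third medium means adding a third `publish_*` function; nothing the author recorded has to change.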

Because SGML standards and DTDs are difficult to implement, the finest level of information granularity is still the document file: the complete document. Current information management solutions still manipulate complete documents, not the information within them. Until documentation management solutions refine this granularity down to the topic level, recording information will remain a major task, and reluctant authors will continue to protest and to produce substandard documentation under stressful and uncomfortable circumstances.
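What topic-level granularity could mean in practice is sketched below with Python's standard `xml.etree.ElementTree` and `copy` modules. The `<topic id="...">` markup and both helper functions are hypothetical: `split_topics` breaks one tagged manual into independently stored topics, and `assemble` builds a different document from a chosen subset, something that is awkward when the whole file is the smallest unit managed.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical manual tagged at topic level.
MANUAL = """<manual>
  <topic id="install"><title>Installing</title><para>Run the installer.</para></topic>
  <topic id="backup"><title>Backing up</title><para>Copy the data folder.</para></topic>
</manual>"""


def split_topics(xml_text: str) -> dict:
    """Break one document into topic units keyed by their id attribute."""
    topics = {}
    for topic in ET.fromstring(xml_text).iter("topic"):
        unit = copy.deepcopy(topic)
        unit.tail = None  # drop layout whitespace that belonged to the old document
        topics[unit.get("id")] = unit
    return topics


def assemble(topics: dict, ids: list, root_tag: str = "document") -> str:
    """Build a new document from the selected topics, in the order given."""
    root = ET.Element(root_tag)
    for topic_id in ids:
        root.append(copy.deepcopy(topics[topic_id]))
    return ET.tostring(root, encoding="unicode")


topics = split_topics(MANUAL)
print(sorted(topics))                 # ['backup', 'install']
print(assemble(topics, ["backup"]))   # a one-topic quick guide
```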

XML?

XML promises to refine this granularity and to give information experts a practical way of simply recording their knowledge. The XML standard removes much of the onerous task of working out the relationships between data (text or information) elements and coding them into a DTD, and it therefore promises to be simple enough for the industry to produce XML add-ons to existing word processing systems. With XML, the relationship between data elements is implied by the level of "nesting" and by the frequency and position of each element relative to other elements. A documentation expert can therefore quickly tag a sample document and have the XML add-on generate a DTD. That DTD can then be given to an information expert, prompting him or her to enter information only, without worrying about format and other publication- and audience-related issues. Because XML makes DTDs easy to create, it enables reluctant authors to write topics rather than complete documents.
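The workflow described here, tagging a sample and letting a tool derive a DTD, can be approximated in a few lines of Python. This is a naive sketch, not the behavior of any actual XML add-on: `infer_dtd` is a hypothetical helper that reads only the nesting and repetition in one sample, treats childless elements as text-only `(#PCDATA)`, marks consecutive repeats with `+`, and ignores attributes, optional elements, and mixed content. Any generated declarations would still need a documentation expert's review.

```python
import xml.etree.ElementTree as ET

# A hypothetical sample document tagged by a documentation expert.
SAMPLE = """<article>
  <title>XML?</title>
  <section>
    <heading>Gleaning the Basics</heading>
    <para>Although we seem to have made a quantum leap...</para>
    <para>The documentation process is more than library management.</para>
  </section>
</article>"""


def infer_dtd(xml_text: str) -> str:
    """Derive one <!ELEMENT> declaration per element name from a tagged sample."""
    models = {}
    for elem in ET.fromstring(xml_text).iter():
        children = [child.tag for child in elem]
        if not children:
            model = "(#PCDATA)"  # no child elements: assume text only
        else:
            parts = []
            for tag in children:
                # Collapse consecutive repeats of the same child into "tag+".
                if parts and parts[-1].rstrip("+") == tag:
                    parts[-1] = tag + "+"
                else:
                    parts.append(tag)
            model = "(" + ", ".join(parts) + ")"
        models.setdefault(elem.tag, model)  # first occurrence wins in this sketch
    return "\n".join(f"<!ELEMENT {tag} {model}>" for tag, model in models.items())


print(infer_dtd(SAMPLE))
```

For this sample, the declaration inferred for `section` is `<!ELEMENT section (heading, para+)>`: the two consecutive `para` elements become "one or more".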

The other side of the coin, and the remaining challenge, is to simplify the creation of suitable stylesheets for publishing the information in each output medium and for each audience. To meet this challenge, an Extensible Stylesheet Language (XSL) is being developed which, if adopted as a standard, will enable publishers everywhere to use the same language to convert the source information, stored as instances of the DTD, into true published documents. The End

 
