Tools and Technology
Teaching the New Art of Authoring in XML
FarPoint Technologies, Inc.
Although XML has been around for years, its use is far from mainstream for most technical writers. Writers need a grounding in XML that goes beyond the syntax covered in most XML courses. This article describes one such course, developed by the author, that combines the concepts content developers need with practical exercises and easily understood analogies to give the next generation of technical writers the basis they need.
Not just another fad
Although XML has been around for years, technical writers have only recently begun to develop content using XML. Their use of XML is moving beyond the simple tagging of content for output to multiple display formats; now technical writers are beginning to use XML to meaningfully label the various information types in their content. Labeling is a necessary step as delivery technologies change more rapidly and the amount of information to maintain grows. This use of XML will help us to reap the benefits of content management and reuse.
But the use of XML is far from mainstream for most technical writers, for reasons ranging from inertia to intimidation. Not wanting to be seen as proponents of yet another technological fad or becoming too intimate with publishing technology, many technical writers have kept their distance from XML. I have developed a class for Duke University's Continuing Education Program (for their Certification in Technical Communication) that I hope addresses some of these concerns and steers professionals in the right direction.
The intended audiences for the class include technical writers, content engineers, or anyone responsible for maintaining technical content or documentation. It is intended for professionals who have not yet discovered the value of working with XML, which I believe is crucial for professionals to stay employable as enterprises share more content online and move that content into databases.
Let me start by saying that I'm not a professor or university instructor. So when I was asked to teach a course on XML for non-programmers at a local university, I saw it as an opportunity to convince others that XML is not just another fad. Designing a six- to eight-week class that meets once a week proved to be as challenging as any project I have done. But once I came up with a suitable analogy, the course framework fell into place. The analogy that, for me, best conveys the structure and usefulness of XML is that of Tupperware containers. XML uses containers of various sizes (elements); some fit inside others, and some do not. You should always put the lid on (start and end tags) and label the containers with useful information (meaningful tag names and attributes). And, like Tupperware, there's a catalog of all the possible sizes and shapes (DTD), whether you use them all or not.
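The container analogy can be sketched in a few lines. The example below is a minimal illustration, not course material: the element names (procedure, title, step), the audience attribute, and the sample text are all hypothetical. It uses Python's standard xml.etree.ElementTree parser, which checks only that every container has its lid (well-formedness); it does not validate against a DTD, so the catalog is shown as plain text for comparison.

```python
import xml.etree.ElementTree as ET

# Hypothetical element and attribute names, chosen only for illustration.
# Containers nest inside one another, each has a lid (end tag), and the
# outer one carries a label (the audience attribute).
WELL_FORMED = """\
<procedure audience="operator">
  <title>Changing the blade</title>
  <step>Disconnect the spark plug wire.</step>
  <step>Remove the blade bolt.</step>
</procedure>"""

# Same content, but the first <step> container is missing its lid.
MISSING_LID = """\
<procedure audience="operator">
  <title>Changing the blade</title>
  <step>Disconnect the spark plug wire.
  <step>Remove the blade bolt.</step>
</procedure>"""

# The "catalog" of allowed containers, written as a DTD fragment.
# It is shown for reference only; ElementTree does not validate against it.
CATALOG = """\
<!ELEMENT procedure (title, step+)>
<!ATTLIST procedure audience (operator | technician) #REQUIRED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT step (#PCDATA)>"""


def is_well_formed(xml_text: str) -> bool:
    """Return True if every container is properly opened, nested, and closed."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(is_well_formed(WELL_FORMED))
    print(is_well_formed(MISSING_LID))
    root = ET.fromstring(WELL_FORMED)
    print(root.get("audience"), [child.tag for child in root])
```

A validating parser would be needed to check a document against the catalog itself; the sketch above stops at the lids.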
The course begins with a brief introduction that makes the case for XML (or structure) by naming the pressures on technical content: the sheer amount of information, the extent of the (universal) enterprise, the limitations of stove-piped documentation (which is available only to one department and thus cannot be reused), audiences that are around the world (and around the clock), and the growing burden of maintenance as time goes by. For all of these, the solution is XML. Then I quickly show the ubiquity of XML in the enterprise and its pervasive use in data storage and electronic transactions. This includes database publishing, content management, syndication (RSS), enterprise information portals (EIP), and e-commerce.
The course then briefly reviews the basic syntax and workings of XML, but it focuses less on syntax than on typing the information for reuse. I like to use the example of an industrial lawn mower manufacturer whose models share some of their documentation. In my example, the company is bought out and integrated with a larger line of machines, and then expands to include residential lawn mower models with common parts and procedures. So the necessity of modularizing the content and reusing it becomes apparent.
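The lawn mower scenario can be sketched as follows, assuming each shared piece of documentation is stored once as its own module and each manual is simply a list of module ids. The model names, module ids, element names, and sample text are hypothetical, and the dictionaries stand in for whatever file system or content store would hold the modules in practice.

```python
import xml.etree.ElementTree as ET

# Each reusable module is stored exactly once (hypothetical ids and text).
MODULES = {
    "safety": "<topic id='safety'><title>Safety</title>"
              "<p>Stop the engine before servicing.</p></topic>",
    "oil": "<topic id='oil'><title>Changing the oil</title>"
           "<p>Drain the oil while the engine is warm.</p></topic>",
    "bagger": "<topic id='bagger'><title>Attaching the bagger</title>"
              "<p>Hook the bag frame onto the rear bracket.</p></topic>",
}

# A manual is only a list of module ids, so common content is never copied.
MANUALS = {
    "industrial-model": ["safety", "oil"],
    "residential-model": ["safety", "oil", "bagger"],
}


def assemble(model: str) -> ET.Element:
    """Build one manual by pulling its modules from the shared store."""
    manual = ET.Element("manual", {"model": model})
    for module_id in MANUALS[model]:
        manual.append(ET.fromstring(MODULES[module_id]))
    return manual


def titles(manual: ET.Element) -> list:
    """List the topic titles in an assembled manual, in order."""
    return [topic.findtext("title") for topic in manual.findall("topic")]


if __name__ == "__main__":
    for model in MANUALS:
        print(model, titles(assemble(model)))
```

Under this arrangement, a correction to the oil-change module would appear in every manual that lists it the next time the manuals are assembled.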
Four basic concepts
This leads into the four basic concepts covered in the course. The first, already mentioned, is modularization. Once students see how content can be segmented and modularized, they see how the common content can be handled efficiently. The second concept is that of creating categories or names for content containers. Here the Tupperware analogy returns. By now the class is familiar with XML terminology, so I explain how to create XML elements and attributes and how to begin shaping a DTD. The third concept is that of adding attributes to allow filtering or sorting of the elements based on audience or delivery mechanism. The fourth concept, standardizing, builds on the previous three: it means creating parallel containers for similar types of products. By articulating the need for certain containers, you can look at a new product and make sure that all the requisite containers are filled with content before declaring the product fully documented.
Anyone who has put information in a database knows that you have to attach metadata to it so you can find it again; the metadata helps you to understand what you have. So while this aspect of XML is not familiar to most technical writers, others, such as database analysts, web designers, and e-commerce developers, take it easily in stride. What I emphasize, then, is the design of containers: figuring out the types of content, how to name the elements, and how to put the content in the containers in the most meaningful way.
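The third and fourth concepts lend themselves to a short sketch. The example below assumes a hypothetical product document whose containers carry an audience attribute, and a hypothetical list of containers that every product is expected to have. None of these names come from the course or from a published DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical product documentation with audience labels on each container.
PRODUCT_DOC = """\
<product name="Mower A">
  <overview audience="all">Self-propelled mower.</overview>
  <procedure audience="operator">Check the oil level.</procedure>
  <procedure audience="technician">Replace the drive belt.</procedure>
</product>"""

# Hypothetical standard: the containers every product must fill.
REQUIRED_CONTAINERS = ["overview", "procedure", "troubleshooting"]


def filter_by_audience(root: ET.Element, audience: str) -> list:
    """Keep the containers labeled for this audience or for everyone."""
    return [el for el in root if el.get("audience") in (audience, "all")]


def missing_containers(root: ET.Element) -> list:
    """Report which required containers the product does not yet have."""
    present = {el.tag for el in root}
    return [name for name in REQUIRED_CONTAINERS if name not in present]


if __name__ == "__main__":
    doc = ET.fromstring(PRODUCT_DOC)
    print([el.text for el in filter_by_audience(doc, "operator")])
    print(missing_containers(doc))
```

Here the operator view drops the technician procedure, and the completeness check reports that the troubleshooting container has not been supplied, so the product would not yet count as fully documented.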
The course is also a chance for students to discuss the challenges of authoring technical content in XML from a content developer's perspective. I attempt to convince students of the benefits of restructuring the way content is maintained by asking each to choose a real-world example that includes more than a few pages of content. The goal of the exercise is to "containerize" the content; students are asked to put content into containers so they can find it, sort or filter it depending on audience, and reuse it as required. Of course there is more to architecting the content for reuse than simply tagging it, but it is a good first step. So whether students create original text or use existing text, the exercise involves figuring out what the containers should be, how they can be nested and ultimately how to work with content once it is in containers.
I also explain to the students that they can choose industry DTDs, so they need to be aware of the ones applicable to their industry. But there are not many public-domain DTDs for software product documentation. While there are some DTDs in the public domain (such as DocBook), I recommend against treating technical content as simply material to be formatted for hard-copy book publishing. Besides the sheer size of the DocBook DTD, its elements were designed with book publishing in mind, not topic generation. In my view, DITA is the best so far, and IBM is moving it into the public domain. It is topic-based and extensible, so you can create custom elements that inherit all the characteristics of a more general element.
But the course emphasizes creating your own element names so students learn something about the process of content management. Regardless of what tool or DTD students eventually use, they need to develop the ability to look at content and, with the audiences in mind, develop meaningful element names. In fact, understanding content and developing meaningful element names is probably the most valuable experience gained from this class.
Another aspect of XML authoring is that the world of XML is currently divided into two camps: the data camp and the document camp. But content does not reside completely in either of those camps. Just as technical writing is neither all engineering nor all writing, XML authoring is neither all database transactions nor all layout and presentation. XML is really a child, a mix, of both.
There needs to be an assortment of learning opportunities that go beyond XML syntax; one such approach is to use the Tupperware analogy and the concept of modules. This helps students understand the concepts of content management and reuse. With the available tools and a few easily understood concepts, technical writers can quickly become productive with XML. You do not need a team of tools experts or programmers to make XML work. While the course briefly introduces the concepts of single-sourcing and content management, the skills I teach prepare students for a course that could cover these topics more fully. It is my hope that with this beginning course, we can begin to develop a curriculum that meets the demands of the business environment but remains general enough to include discussions of larger conceptual issues surrounding content development.
For further reading, many of these concepts are fleshed out in articles on keycontent.org, an online clearinghouse for articles on technical communication and content engineering. You can also read more about the Certificate Program at Duke at http://www.learnmore.duke.edu/techcomm/tcclassdetail.asp?ClassID=8612.