By Peter Fournier
DITA XML conversion projects can fail simply because they get too complicated. Major stumbling blocks leading to complexity include:
- Too many styles, sometimes running into the hundreds in large document suites.
- Badly applied styles, for example a “Heading 1” followed directly by a “Heading 4”.
- A “Normal” paragraph manually formatted to look like something else.
- Inconsistent content such as multiple ways to label procedures, for example “To clean XYZ” and “Cleaning XYZ”.
Because of these and many other problems, it has been very common to clean up a document suite before converting to DITA XML, and often to do post-conversion cleanup as well. Pre- and post-conversion cleanup can be the most time-consuming, complex, hard-to-manage and expensive part of a conversion project.
Enter Stilo’s Migrate software. For instance, suppose we have a list in a document that looks like this:
In Microsoft Word, or any other WYSIWYG editor, all the program knows is that
- this text contains a heading (maybe),
- paragraphs, some of which are in a numbered list,
- and formatting to make the text look the way it does.
Migrate assists in the automated conversion to DITA XML by helping you create conversion rules and enhance the final DITA XML output. For example, you can create:
- A rule that detects that the first word in the heading ends in “ing”, indicating it is probably a task heading. The heading and the subsequent text will be placed in a “task” topic, and the heading paragraph itself will be placed in a “title” element.
- However, sometimes a heading isn’t really a heading; it just looks that way because it’s a manually formatted “Normal” paragraph. That’s OK. You can create a rule that detects the manual formatting and then applies the first rule above.
- A rule that adds a registered trademark symbol to the first occurrence of the word “QuickTrace”.
- A rule that detects that some paragraphs are numbered and that these should be wrapped in “steps/step/cmd” elements.
- A rule that detects that the word following the word “Click” is likely a UI element. The word following “Click” will be wrapped in a “uicontrol” span.
- However, sometimes the word “Click” is followed by a word that is in turn followed by a “→” character, in which case the whole string with embedded arrows will be wrapped in “menucascade/uicontrol” elements.
- A rule that detects that a paragraph before the actual steps should be wrapped in a “context” element.
- A rule that detects that the last paragraph is indented at the same level as the numbered list and is not followed by another list item. Therefore, it should be wrapped in a “result/p” element.
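To make the idea of rule-based detection more concrete, here is a minimal Python sketch of a few of the rules above: the “ing” heading test, the first-occurrence trademark rule, and the “Click” rules. It is not how Migrate defines or runs its rules; the function names, the regular expression, the `sample_task` id and the sample sentences are all assumptions made for illustration, and the rule for manually formatted headings is left out because it depends on formatting information that plain strings do not carry.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical pattern: "Click" plus one word, optionally extended by "→ word" segments.
CLICK_RE = re.compile(r"\bClick (\w+(?:\s*→\s*\w+)*)")


def is_task_heading(text):
    """Guess that a heading introduces a task when its first word ends in 'ing'."""
    words = text.split()
    return bool(words) and words[0].lower().endswith("ing")


def mark_ui_controls(text):
    """Wrap the word after 'Click' in uicontrol, or in menucascade for arrow chains."""
    def wrap(match):
        parts = [part.strip() for part in match.group(1).split("→")]
        controls = "".join(f"<uicontrol>{part}</uicontrol>" for part in parts)
        if len(parts) > 1:
            # The arrows are dropped here; they are treated as a rendering concern.
            controls = f"<menucascade>{controls}</menucascade>"
        return f"Click {controls}"
    return CLICK_RE.sub(wrap, escape(text))


def add_trademark(paragraphs, term="QuickTrace"):
    """Append the registered trademark symbol to the first occurrence of term only."""
    marked, done = [], False
    for paragraph in paragraphs:
        if not done and term in paragraph:
            paragraph = paragraph.replace(term, term + "®", 1)
            done = True
        marked.append(paragraph)
    return marked


def build_task(heading, context, steps, result):
    """Assemble a DITA task from a heading, a lead-in, numbered steps and a result."""
    if not is_task_heading(heading):
        raise ValueError("heading does not look like a task heading")
    context, *rest = add_trademark([context, *steps, result])
    steps, result = rest[:-1], rest[-1]
    lines = [
        '<task id="sample_task">',
        f"  <title>{escape(heading)}</title>",
        "  <taskbody>",
        f"    <context><p>{mark_ui_controls(context)}</p></context>",
        "    <steps>",
    ]
    for step in steps:
        lines.append(f"      <step><cmd>{mark_ui_controls(step)}</cmd></step>")
    lines += [
        "    </steps>",
        f"    <result><p>{mark_ui_controls(result)}</p></result>",
        "  </taskbody>",
        "</task>",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_task(
        "Cleaning XYZ",
        "Use QuickTrace to clean XYZ.",
        ["Click File → Clean.", "Click OK."],
        "QuickTrace reports success.",
    ))
```

Each rule is a small, independent function, so a new pattern found in the source documents can be handled by adding a rule instead of cleaning up the documents by hand. Note that the sketch writes the step text into `cmd`, which is the DITA element name for a step's command.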
The final result of the conversion to DITA will look like this:
So, starting with just formatted paragraphs, you can create enriched, valid DITA XML, as you can see in the picture, and reduce, perhaps even eliminate, pre- and post-conversion cleanup.
To get a better idea of just how enriched the DITA XML really is, here is the file that generated the improved output:
Note that the “→” character isn’t present in the XML; it’s added as part of generating the output.
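The following Python sketch illustrates that separation between markup and presentation: the XML holds only the “menucascade/uicontrol” structure, and a renderer inserts the separator when it produces output. The `render_inline` function and its `separator` parameter are hypothetical, written for this example; this is not the code that any particular DITA publishing tool uses.

```python
import xml.etree.ElementTree as ET


def render_inline(element, separator=" → "):
    """Flatten an element to plain text, joining menucascade children with separator."""
    if element.tag == "menucascade":
        # The separator exists only in the output, never in the source XML.
        return separator.join(render_inline(child, separator) for child in element)
    text = element.text or ""
    for child in element:
        text += render_inline(child, separator) + (child.tail or "")
    return text


if __name__ == "__main__":
    cmd = ET.fromstring(
        "<cmd>Click <menucascade><uicontrol>File</uicontrol>"
        "<uicontrol>Clean</uicontrol></menucascade>.</cmd>"
    )
    print(render_inline(cmd))
```

Because the separator is chosen at output time, the same source could be published with a different separator without touching the XML.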
The next time Migrate encounters a task formatted this way it will know what to do. That makes your conversion project simpler, more accurate and faster as you go along.
About the Author
Peter Fournier has extensive experience in the BNR/Nortel documentation space. In the late ’80s and early ’90s he studied the feasibility of moving the Technical Documentation to SGML. He later developed, with his team, an advanced online help system for Network Management and other software produced by Nortel. The core of the online help software was based on SGML principles of containerization but had only five or six base elements and a lot of attributes. It was engineered to be compatible with SGML, so the group had no trouble generating valid XML when the draft standard appeared in late 1996 or early 1997. In 2005 he discovered, with great joy, DITA XML. He introduced DITA to JDSU (now Lumentum) in 2008 and served as DITA manager and technical prime until 2018. Between 2010 and 2014 he also found time to get a startup going and developed software to help groups of 1 to 20 people get into DITA and manage all the background complexity, including publication. As of 2021 he’s back in the DITA space and loving the Stilo philosophy of making highly complex transformation software easily accessible to customers.