
Published on July 7th, 2013 | by Rahel Bailie


DITA-to-Web: The next Big Thing – Part 2

DITA (Darwin Information Typing Architecture) and the Web

Until now, there has been a comfortable, if somewhat artificial, difference between the world of content in a Web CMS and the world of structured authoring. The marketing content lived and died in the Web CMS, and the enabling content (product content, user guides, training material, customer support content, and so on) was created in a HAT (help authoring tool) or CCMS (component content management system) and was then pushed into a walled garden in the Web CMS. And never the twain did meet, nor did that content get leveraged for the greater good of the organization. Or for users.

Why the Web needs DITA

So what’s different now? Mobile, for one. The issue of meeting multiple audiences on different platforms, all on a tight deadline, has moved upstream to the people who were used to entering content into the WCMS and walking away. Now mobile solutions are being commissioned, often completely separately from the “main” system. As content becomes desynchronized and the pain points grow, organizations are getting the deer-in-the-headlights look as it dawns on them that they have to support content in a raft of forms and outputs that they’re not prepared to handle.

The next step, then, is to take an already highly functional system for authoring structured content (and all the semantic advantages it brings to the table), and get it to work well within a Web CMS ecosystem. The possibilities are endless, but here is a basic scenario.

Your organization is going from one market to multiple markets, and you need to vary the product information in a news release for each market – what’s available in the US vs Canada, India, or Pakistan, for example.

How this works in a Web CMS

The sample text below comes from an announcement of a tablet from 2011, when 4G was available only in the US.

Here is a look at the specs [for our new tablet]:

  • 10.1-inch 1280×800 display
  • 1GHz Tegra 2 dual-core processor
  • 1GB RAM
  • 16 or 32 GB internal storage
  • Always connected 4G Robust 3G

Here’s what a typical Web editor would do:

  1. Create the news release with the product description for the US site, as shown above.
  2. Copy the news release and adjust the product description for the Canada, India, and Pakistan sites (change 4G to 3G, as shown above, for Canada and India, or leave off the last line for Pakistan, which had neither 4G nor 3G).
  3. Repeat: copy the news release and adjust the product description for every other site.

How this works in DITA

One news release is created with the product description for every market.

  1. The product specs for specific markets are tagged with the appropriate audience: 4G (for the US) and 3G (for Canada and India). Because Pakistan had neither 4G nor 3G at the time, those bullet points would simply be excluded from the Pakistan site.
  2. Click “Generate”, let the transformation rules sort out what gets included in the multiple news releases and route the output to where it needs to appear.

Here is a look at the specs [for our new tablet]:

  • 10.1-inch 1280×800 display
  • 1GHz Tegra 2 dual-core processor
  • 1GB RAM
  • 16 or 32 GB internal storage
  • [audience=”4G”]Always connected 4G[/audience=”4G”]
  • [audience=”3G”]3G network[/audience=”3G”]
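The bracketed tags above are shorthand. As a minimal sketch, assuming the same bullets are written as a real DITA topic using the `audience` attribute, the Python below imitates what exclude rules in a DITAVAL file do when you click “Generate”: it builds one filtered copy of the list per market from a single source. The market table `EXCLUDE_BY_MARKET` and the helpers `filter_topic` and `bullets_for` are hypothetical; in a production toolchain this filtering would normally be driven by DITAVAL files in the publishing tool rather than hand-written code.

```python
import xml.etree.ElementTree as ET

# Hypothetical DITA topic: the two network bullets carry an audience
# attribute; every other bullet applies to all markets.
SPEC_TOPIC = """<topic id="tablet-specs">
  <title>Tablet specifications</title>
  <body>
    <p>Here is a look at the specs:</p>
    <ul>
      <li>10.1-inch 1280×800 display</li>
      <li>1GHz Tegra 2 dual-core processor</li>
      <li>1GB RAM</li>
      <li>16 or 32 GB internal storage</li>
      <li audience="4G">Always connected 4G</li>
      <li audience="3G">3G network</li>
    </ul>
  </body>
</topic>"""

# Assumed per-market rules, standing in for one DITAVAL file per site:
# each set lists the audience values to exclude for that market.
EXCLUDE_BY_MARKET = {
    "US": {"3G"},
    "Canada": {"4G"},
    "India": {"4G"},
    "Pakistan": {"4G", "3G"},
}


def filter_topic(xml_text, excluded):
    """Parse the topic and drop every element whose audience values are all excluded."""
    root = ET.fromstring(xml_text)
    # Snapshot the tree first so removing children does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            values = set(child.get("audience", "").split())
            if values and values <= excluded:
                parent.remove(child)
    return root


def bullets_for(market):
    """Return the spec bullets that survive filtering for one market."""
    topic = filter_topic(SPEC_TOPIC, EXCLUDE_BY_MARKET[market])
    return [li.text for li in topic.iter("li")]


# "Generate": one filtered variant per market from the same source.
for market in EXCLUDE_BY_MARKET:
    print(market, bullets_for(market))
```

An element is dropped only when every one of its audience values is excluded, which follows the DITA rule for multi-valued conditional attributes; untagged bullets always survive.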

Barriers to DITA-to-Web Adoption

So if this system can bring so much benefit, why haven’t we seen any widespread adoption? Good question, and it comes down to education, technology, and governance.

Education: Those making technology decisions have often never heard of DITA, HATs, or structured authoring environments, and have no experience with them. In my experience, there are no courses being taught on this, nor books written on the technical side of how to implement it. So it remains the domain of a very niche group of developers, and for the vast majority of “PHP guys” or “Java guys” or “.NET guys”, this whole world is a mystery. It’s not much better on the editorial side. The vast majority of writers have never heard of this way of authoring and have no experience with it. It is not taught except in the most esoteric of programs, and learning a whole new way of writing can be intimidating.

Technology: Very few commercial CMS vendors have put any effort into getting these systems to work as part of a larger ecosystem. This problem is being solved on a case-by-case basis by very large corporations, but they aren’t about to hand out their solutions for their competitors to use. And even the most explicit examples of direct DITA-to-Web (the Contelligence Group site comes to mind) aren’t visible to the average viewer. You would have to be told what is behind the scenes, and be shown the post-login dashboard, to really appreciate what is going on. The vast majority of structured content authors don’t have the technical acumen to describe to developers what needs to be done to make the DITA-to-Web magic happen.

Governance: DITA-to-Web is a disruptive technology. It can change the locus of control, implicit power structures, and organizational processes. The blocker could be management, who don’t understand the breadth of the value of solving their content problems, or it could be management reluctance to try new processes and technologies that they view as experimental or untested (despite DITA being around for over a decade). The blocker could be technical staff who are reluctant to go down this very different path, or a project manager who wants to do something simple and known, even if the effectiveness is limited. The blocker could equally be communicators who see structured authoring as more work, different work, or a writing method that somehow compromises their creativity.

It will likely take another few years for the DITA-to-Web technologies and processes to stabilize, and another decade, at least, for these to be widely adopted. The number of stakeholders affected by such a change is staggering, from those directly affected by changes in workflow to those indirectly affected, such as third-party agencies and communicators who can anticipate steep learning curves. Yet, as corporate pain points grow, so does the need for solutions that provide relief.

DITA-to-Web: The next Big Thing – Part 1


About the Author

Rahel Anne Bailie is a synthesizer of content strategy, requirements analysis, information architecture, and content management to increase the ROI of content. She has consulted for clients in a range of industries, and on several continents, whose aim is to better leverage their content as business assets. Founder of Intentional Design, she is now the Chief Knowledge Officer of London-based Scroll. A Fellow of the Society for Technical Communication, she has worked in the content business for over two decades. She is co-author of Content Strategy: Connecting the dots between business, brand, and benefits, and co-editor of The Language of Content Strategy, and is working on her third content strategy book.

9 Responses to DITA-to-Web: The next Big Thing – Part 2

  1. Don Day says:

    Thanks for mentioning the Contelligence Group web site, Rahel. Another blocker is the case of content strategists and information architects who simply have not understood the value that a structured XML content system like DITA can provide, particularly for how it can enable new HTML5 features to be exploited more consistently. Responsive themes, for example, benefit from consistently structured content such as that generated directly from XML structures. I’m looking forward to a Part 3 on this line of thinking!

  2. Larry Kunz says:

    Thanks, Rahel. I agree with your analysis, except for the Education paragraph. There are plenty of classes and tutorials that cover the principles of structured authoring. There are also good books about how to implement DITA: three that come to mind are Tony Self’s DITA Style Guide, Eliot Kimber’s DITA for Practitioners, and Julio Vazquez’s Practical DITA.

    Are you saying that these resources are incomplete or inadequate? That they’re not reaching the audience they need to reach? Or something else?

  3. rahelab says:

    Larry, I’d love to be wrong on this one but have to tell you that I stand by my research. I did a search on “learn DITA Vancouver” and came up with nothing. Compare this to “learn HTML5 Vancouver”, where every community college and university, and some private-sector organizations, have classes available. I don’t know of any tech writing program that teaches structured, semantic content (editorially structured content for help gets you only so far), and none of the other communications programs (business comms, marketing comms, public relations) go anywhere NEAR content structure and semantics. I know there are DITA books out there – I own them all, but who can learn this stuff from a book? In a recent exercise with 20 writers, none of them learned more than some general theory from a book. (I tried to read the Tony Self book, but all it did was make me cry, so I loaned it to the development team.) At best, reading sparks a lot of questions.

    So taking a course on this means (a) finding one, (b) scheduling one’s life to attend it, (c) likely travelling to it, (d) buying software to really dig in and use it, and (e) setting up an environment to practice it before you forget it.

    If I’m missing information about which continuing ed programs have DITA courses or courses of creating semantically-rich content, please share. I’m sure people would like to know.

  4. Larry Kunz says:

    Thanks for clarifying that, Rahel. I’m afraid you’re right: I don’t know of any continuing-education programs that are teaching DITA or structured authoring. Many programs will introduce you to the concepts of structured authoring, but not nearly at the depth it would take to make you proficient.

    It seems like a chicken-and-egg thing: we’ll see training courses that are affordable and widely available when the skills are in demand; employers will demand the skills when people have been trained.

  5. Alan Houser says:

    The Web community has clearly embraced an annotation model for metadata (e.g., write a topic, add some keywords in your CMS). This approach may seem simplistic to the techcomm community, but it presents a far, far lower barrier to entry, and a far simpler toolchain, than the embedded markup model (XML-based) that techcomm embraces. And, for the Web community, this approach generally works.

    I don’t think acceptance of DITA in the web community is contingent on the right books, right education, or even right evangelism. Web folks are justified in saying “That’s nice, but you’re telling me I need to convert all of my content to XML? That my tools will suddenly be far more complex than currently? Oh, and that my entire website may break because of a single XML syntax error?”. These are reasonable and valid objections to DITA, and I don’t see a lot of progress in making DITA adoption easier.

    We (the techcomm community) shouldn’t be evangelizing DITA for the web. We should, however, work to identify appropriate solutions to publishing problems. Rahel’s example is sound, but is certainly not dependent on DITA as a solution. Why can’t we adopt a syntax, similar to what she proposed, in something like Markdown? That solution would be a couple of lines that could be deployed as needed, instead of requiring a total rework of the authoring and publishing toolchain.
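Alan’s “couple of lines” idea can be sketched roughly as follows. This is a minimal illustration, assuming a hypothetical Markdown extension that reuses the article’s bracket syntax with straight quotes; `resolve_conditions` and both regular expressions are invented for this example and are not part of any Markdown specification or library.

```python
import re

# Hypothetical inline syntax modelled on the article's example, with straight
# quotes: [audience="4G"]...[/audience="4G"]. The closing marker must repeat
# the opening value.
CONDITION = re.compile(
    r'\[audience="(?P<val>[^"]+)"\](?P<body>.*?)\[/audience="(?P=val)"\]',
    re.DOTALL,
)
# A Markdown list item left empty after its conditional text was dropped.
EMPTY_ITEM = re.compile(r"^[ \t]*[-*+][ \t]*(?:\n|\Z)", re.MULTILINE)


def resolve_conditions(text, audiences):
    """Keep conditional spans whose value is in `audiences`, drop the others,
    strip the markers, and remove list items that end up empty."""
    def keep_or_drop(match):
        return match.group("body") if match.group("val") in audiences else ""

    return EMPTY_ITEM.sub("", CONDITION.sub(keep_or_drop, text))


SOURCE = """Here is a look at the specs:

- 1GB RAM
- [audience="4G"]Always connected 4G[/audience="4G"]
- [audience="3G"]3G network[/audience="3G"]
"""

print(resolve_conditions(SOURCE, {"3G"}))  # Canada/India variant
```

This handles only flat, single-valued conditions; nested or multi-valued conditions would need more than a regular expression.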

  6. Don Day says:

    We all tend to bring our preconceptions about DITA into a discussion like this. While TechComm drives some of the content on the Web, so do marketing, education, policies and procedures, news, and that massive engine of SEO for commerce, blog posts. Markdown is brilliant for casual content creation, but as an alternate syntax for a subset of HTML, it cannot assert the level of business rules-based enforcement for the structure and semantics needed for the truly intelligent content that so many content strategy specialists have been talking up lately.

    Alan Pringle of Scriptorium just posted about what he calls Adaptive DITA, a view of DITA usage that is perfectly standards compliant for a particular publishing niche. I’ve been using DITA in a similar way for direct-to-Web content, learning what to leave out and what to embrace in order to capitalize on some of XML/SGML’s 30+ year history of problem solving. Personalized and/or conditional content? Random access to any discrete structure for reuse or alternate rendition? Potential for systematic reuse/auditing/exchange of content across the company and with partners? Side-effect free reuse of processing logic? If we throw out XML and these solutions it can offer, well… “Those who ignore history are doomed to repeat it.” And some brilliant Web developers are all over the map on non-standard reinvention of this disdained knowledge.

    Any XML vocabulary can bring some order to the peculiar ways of HTML-based publishing, but DITA comes up because the basic topic design already closely models the HTML page paradigm. “DITA for the Web” by my definition is the smart use of this affinity to effectively trick out HTML with the content intelligence it lacks otherwise, thereby enabling Web tools to do new and better things. I’ll even let you author your content in Markdown; after all, it’s just a user interface for editing, but if I can feed that content into XML middleware, interesting new things can happen, particularly if we can bring in the best of DITA as a content architecture that augments what HTML lacks.

  7. Colin Maudry says:

    Regarding the use of DITA, I second your analysis: the problem is that true Web professionals know about technologies such as XHTML, Oracle or MySQL, because that’s what they have been taught at school and what they feel comfortable with. It’s an ecosystem in which they combine these various ingredients according to their requirements.

    If you tell them DITA is a new potential ingredient, it doesn’t fit in the recipes they have always been working with, and the gurus in their field have never mentioned DITA.

    And I think that’s normal: Web developers don’t need to know so much about DITA.

    At NXP Semiconductors, we publish DITA content to our main Web site. We publish all the category and product descriptions (the text that displays in the middle when you browse our product tree, on the left).

    To do so, we don’t send DITA content to our Web agency, because either they wouldn’t know what to do with it, or they would do real nasty things with it.

    Instead (all this is automatic):
    1- when a value proposition is released, we generate a plain XHTML rendition in our CMS, no classes, no styling
    2- we ZIP it
    3- we FTP it to a drop in folder on our agency’s servers, and they know what to do with it
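The three automated steps above can be sketched roughly as follows. This is a hypothetical illustration, not NXP’s actual pipeline: it assumes the rendition starts as well-formed XHTML, produces the “no classes, no styling” version by stripping `class` and `style` attributes, zips the result in memory, and uploads it with Python’s standard `ftplib`. The helper names, host, credentials, and file names are placeholders.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from ftplib import FTP

# Keep the XHTML namespace as the default (unprefixed) one on output.
ET.register_namespace("", "http://www.w3.org/1999/xhtml")

# Assumed definition of "no classes, no styling".
PRESENTATION_ATTRS = {"class", "style"}


def plain_xhtml(xhtml_text):
    """Strip presentational attributes so only the content structure remains."""
    root = ET.fromstring(xhtml_text)
    for element in root.iter():
        for attr in PRESENTATION_ATTRS & set(element.attrib):
            del element.attrib[attr]
    return ET.tostring(root, encoding="unicode")


def zip_rendition(files):
    """Bundle {file name: XHTML text} into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, text in files.items():
            archive.writestr(name, text)
    return buffer.getvalue()


def upload(zip_bytes, host, user, password, remote_name):
    """Push the archive to the agency's drop folder (all arguments are placeholders)."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.storbinary(f"STOR {remote_name}", io.BytesIO(zip_bytes))


page = '<div class="product"><p style="color:red">Value proposition text</p></div>'
bundle = zip_rendition({"value-proposition.html": plain_xhtml(page)})
# upload(bundle, "ftp.example.com", "user", "secret", "value-proposition.zip")
```

The upload call is left commented out because it needs a real server; the point of the design is that the agency only ever receives plain markup and applies its own styling.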

    They are very happy, because they know perfectly what to do when they receive plain XHTML: they store it and style it (classes, javascript) on the fly when someone requests the page.

    The cherry on the cake, as we say in France: the same plain XHTML is reused by our Indian team to fuel the NXP mobile application.

    My conclusion: use DITA as a source to manage and publish authored content to the Web, but use XHTML as long as you don’t have a DITA-savvy engineer at all steps of the publication flow.

    Related reading: How we’re using DITA at NXP

    Finally, regarding the education bit, I graduated in 2009 from the University of Rennes, France, as an engineer in multilingual and multimedia communication*. A major course was about topic-oriented authoring and DITA, taught by Nolwenn Kerzrého (now Componize Software).

    I agree these courses are not easy to find, but they exist!

    Thanks for the thought provoking article.

    * The course description, in French

  8. Siddhesh Deshmukh says:

    Can index entry tags in a new DITA topic written in XMetaL be kept as the “search area” for web crawlers? This should bring the advantages of SEO as well as DITA.
    (PS. I have little experience in SEO. Something like the above looks like a good strategy to get the advantages of both.)

  9. rahelab says:

    This is probably a longer answer, but the bottom line is that you can do SEO tagging, and as long as the presentation layer can preserve that metadata during ingestion, it will work at the front end.
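As a rough sketch of that point, assuming a DITA topic with `<indexterm>` entries in its prolog, the Python below collects the index terms and carries them into an HTML `keywords` meta element, so the metadata survives the move to the presentation layer. The helpers `index_terms` and `keywords_meta` are hypothetical, and how much weight any given search engine gives this element is not assumed here.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical DITA topic with index entries in its prolog metadata.
TOPIC = """<topic id="tablet">
  <title>Tablet specifications</title>
  <prolog>
    <metadata>
      <keywords>
        <indexterm>tablet</indexterm>
        <indexterm>4G</indexterm>
      </keywords>
    </metadata>
  </prolog>
  <body>
    <p>Here is a look at the specs.</p>
  </body>
</topic>"""


def index_terms(xml_text):
    """Collect the text of every indexterm once, in document order."""
    root = ET.fromstring(xml_text)
    terms = []
    for term in root.iter("indexterm"):
        text = (term.text or "").strip()
        if text and text not in terms:
            terms.append(text)
    return terms


def keywords_meta(terms):
    """Render the terms as an HTML keywords meta element, escaping attribute text."""
    return '<meta name="keywords" content="%s"/>' % escape(", ".join(terms), quote=True)


print(keywords_meta(index_terms(TOPIC)))
# prints <meta name="keywords" content="tablet, 4G"/>
```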
