Published on July 22nd, 2013 | by Rahel Bailie
To DITA or not to DITA: That’s a Good Question – Part 1
Well, that is one question to ask, certainly. There has been a lot of discussion during the past week on the Content Strategy Google group about the last post (DITA: Not Just for Technical Content), which leads to the question: how do you know when DITA (Darwin Information Typing Architecture) is the right standard for the job? A big thank you to Noz, Rick, Mark, and Karen, who all asked provocative questions and/or made provocative declarations that led to this topic.
Examining this issue will take two posts. The first looks at content authoring processes; the second looks at production points for content manipulation, and how to decide when DITA is the right fit.
How authors work vs how systems publish
In a Web CMS, authoring and publishing are done in a single system. There is a tight relationship between the two, which is a variation on the following theme:
In the minds of many, the stage where “associated content” (sometimes called components or related articles or something similar) is added is seen as content re-use. It is content re-use, in the Web world. Karen McGrane’s example was articles on Paleo diets. You may have several trending articles about Paleo, and you want to associate existing articles as “related articles”, or you may want to pull together a selection of articles into an ebook. (We’ll get to that scenario a little later.)
Authoring environment within a Web CMS
That’s all well and good, but each article is usually something BLOBish within the CMS. That is, there will be some form fields that look similar to this:
The metadata is split into separate fields, so that the CMS can figure out how to process each component. But the “enter content here” section is a BLOB. Let’s say that one of the paragraphs is boilerplate about what paleo is. How do you re-use that paragraph from within the text blob? The problem, from an authoring perspective, is consistency, particularly with re-use. The paragraph gets copied and pasted from blob to blob. Then someone finds an error, so they go back and search for it. But in one of the instances, there’s been a slight change, so the search doesn’t find it. You’ve caught most of the instances, but not all of them. And all it takes is one good lawsuit for that oversight to backfire on a company.
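To make the copy-and-paste problem concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the article IDs, the boilerplate sentence, and the `find_exact` helper are invented for illustration, and a real Web CMS would search its own database rather than a dictionary.

```python
# Hypothetical boilerplate sentence that was pasted into several article blobs.
BOILERPLATE = "Placeholder sentence that explains what the paleo diet is."

# Hypothetical article bodies, each stored as a single text blob.
articles = {
    "paleo-basics": "Intro text. " + BOILERPLATE + " More text.",
    "paleo-recipes": "Recipes intro. " + BOILERPLATE + " Recipe list.",
    # Someone edited this pasted copy slightly ("explains" became "describes").
    "paleo-myths": "Myths intro. Placeholder sentence that describes what the paleo diet is. Myth list.",
}


def find_exact(blobs, phrase):
    """Return the IDs of articles whose blob contains the phrase verbatim."""
    return sorted(article_id for article_id, body in blobs.items() if phrase in body)


hits = find_exact(articles, BOILERPLATE)
missed = sorted(set(articles) - set(hits))

print("found:", hits)
print("missed:", missed)
```

The search finds the two verbatim copies and silently skips the edited one, which is exactly the instance nobody corrects.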
You could, theoretically, break up the body block into multiple blocks, but that becomes unwieldy for both the author to use and the system to process. So overwhelmingly, the CMS will have a single text field, with a rich text editor that allows users to add heading levels and formatting such as bold and italics.
So we have some good aspects of re-use – leveraging existing content with new content to provide a richer reading experience for the reader – and some limiting aspects of re-use – inability to re-use content at a granular level.
Separating authoring from publishing
In a DITA system, there is a significant difference in where and how authors assemble content.
First of all, the authoring is done in a separate system (the CCMS) from the publishing system. Doing this means that you can get very granular control over your content. You still have the same metadata (titles, summaries, and so on) – though usually they’re not put into form fields, but simply shown as part of the topic. The real power of re-use, though, is when you get to the body copy.
I could talk about the re-use aspect in several different ways, but I will confine myself (hard as it may be) to the example about the article on paleo. (Noz Urbina will talk about all the other good ways you can handle re-use in content later in the summer.) Where there is a statement within the third paragraph that needs to be consistent with everything else on the site, you would isolate the phrase or paragraph so that it can be re-used. You store that bit of content in a file for later reference. In DITA, the mechanism that pulls in this isolated piece is called, logically, a content reference (or conref, for short). The piece gets stored with all of the other re-usable pieces, for easy look-up. Every time you need that statement, you simply reference it from the article you’re working on. If the statement gets changed, it gets changed at the source, and the change can be replicated across the entire body of work. (I say “can” because there are generally controls that allow you to choose where the replication happens and where it should not happen. Authors need that kind of control.) I’ve created a crude representation here:
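The same idea can be sketched in code. This is a simplified illustration in Python, not how a CCMS or DITA processor actually implements it: the file name `shared.dita`, the topic and element IDs, the placeholder wording, and the `resolve_conrefs` function are all assumptions made up for the example. The only part taken from DITA itself is the shape of the `conref` value, which addresses a target as file, then topic ID, then element ID.

```python
import xml.etree.ElementTree as ET

# Hypothetical library of re-usable statements, keyed by file name.
LIBRARY = {
    "shared.dita": (
        '<topic id="shared">'
        "<title>Shared statements</title>"
        "<body>"
        '<p id="paleo-definition">Placeholder definition of the paleo diet.</p>'
        "</body>"
        "</topic>"
    ),
}

# Hypothetical article: its second paragraph is only a pointer to the shared statement.
ARTICLE = (
    '<topic id="paleo-trends">'
    "<title>Paleo trends</title>"
    "<body>"
    "<p>Opening paragraph written for this article only.</p>"
    '<p conref="shared.dita#shared/paleo-definition"/>'
    "</body>"
    "</topic>"
)


def resolve_conrefs(article_xml, library):
    """Return the article tree with each conref replaced by the text it points to.

    Simplification: only the text of the target element is copied; nested
    markup inside the target is ignored.
    """
    root = ET.fromstring(article_xml)
    for element in root.iter():
        target = element.attrib.pop("conref", None)
        if target is None:
            continue
        file_name, _, fragment = target.partition("#")
        topic_id, _, element_id = fragment.partition("/")
        source_topic = ET.fromstring(library[file_name])
        if source_topic.get("id") != topic_id:
            raise KeyError(f"topic {topic_id!r} not found in {file_name!r}")
        matches = [e for e in source_topic.iter() if e.get("id") == element_id]
        if not matches:
            raise KeyError(f"element {element_id!r} not found in {file_name!r}")
        element.text = matches[0].text
    return root


resolved = resolve_conrefs(ARTICLE, LIBRARY)
paragraphs = [p.text for p in resolved.iter("p")]
print(paragraphs)
```

Because the article holds a pointer rather than a pasted copy, editing the statement in `shared.dita` and resolving again updates every article that references it, with no searching required.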
The article(s) would remain in the CCMS until ready for publishing. Then, once all the content manipulation within the article is done (which could also include specifying the content that should be related to it), the content is “transformed”, rather like doing a software build, into its final output forms. This final step is an important distinction between the two processes for authoring content, because the transformation assumes that you can have separate outputs for the Web, for an ebook, and for various mobile formats (whether this is a best practice is a different topic of discussion; I’m simply asserting that it can be done technologically). There could also be outputs to other flavours of XML that can be ingested into other systems for use elsewhere.
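For readers who think in code, the “software build” comparison might look something like this rough Python sketch. It is not the DITA toolchain: the `TOPIC` content, the two output names, and the `to_html`, `to_plain_text`, and `build` functions are all hypothetical, chosen only to show one source feeding several outputs.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical topic, as it might look once authoring is finished.
TOPIC = (
    '<topic id="paleo-trends">'
    "<title>Paleo trends</title>"
    "<body>"
    "<p>First paragraph.</p>"
    "<p>Second paragraph.</p>"
    "</body>"
    "</topic>"
)


def to_html(topic):
    """Render the topic as a small HTML fragment for the Web."""
    title = escape(topic.findtext("title", default=""))
    body = "".join(f"<p>{escape(p.text or '')}</p>" for p in topic.iter("p"))
    return f"<article><h1>{title}</h1>{body}</article>"


def to_plain_text(topic):
    """Render the topic as plain text, e.g. as raw material for an ebook."""
    lines = [topic.findtext("title", default="").upper()]
    lines += [p.text or "" for p in topic.iter("p")]
    return "\n\n".join(lines)


# One source, several outputs: each entry is a separate transformation.
TRANSFORMS = {"web": to_html, "ebook-text": to_plain_text}


def build(topic_xml):
    """Run every registered transformation over the same source topic."""
    topic = ET.fromstring(topic_xml)
    return {name: transform(topic) for name, transform in TRANSFORMS.items()}


outputs = build(TOPIC)
for name, rendered in outputs.items():
    print(f"--- {name} ---")
    print(rendered)
```

Adding another output format means registering one more function in `TRANSFORMS`; the source topic itself does not change.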
Next week, I’ll discuss more about how the level of content manipulation can contribute to a decision about when DITA is the right choice.