To DITA or not to DITA: That’s a Good Question – Part 1

Well, that is one question to ask, certainly. There has been a lot of discussion over the past week on the Content Strategy Google group about the last post (DITA: Not Just for Technical Content), which leads to the question: how do you know when DITA (Darwin Information Typing Architecture) is the right standard for the job? A big thank you to Noz, Rick, Mark, and Karen, who asked provocative questions and made provocative declarations that led to this topic.

This issue is big enough to span two posts. The first looks at content authoring processes; the second looks at the production points for content manipulation, and how to decide when DITA is the right fit.

How authors work vs how systems publish

In a Web CMS, authoring and publishing are done in a single system. There is a tight relationship between authoring and publishing, which is some variation on the following theme:

Web workflow

In the minds of many, the stage where “associated content” (sometimes called components, related articles, or something similar) is added is content re-use – and in the Web world, it is. Karen McGrane’s example was articles on Paleo diets. You may have several trending articles about Paleo and want to associate existing articles as “related articles”, or you may want to pull together a selection of articles into an ebook. (We’ll get to that scenario a little later.)

Authoring environment within a Web CMS

That’s all well and good, but each article is usually something BLOB-ish within the CMS. That is, there will be some form fields that look similar to this:

WCMS authoring environment

The metadata is split into separate fields, so that the CMS can figure out how to process each component. But the “enter content here” section is a BLOB. Let’s say that one of the paragraphs is boilerplate about what Paleo is. How do you re-use that paragraph from within the text blob? The problem, from an authoring perspective, is consistency, particularly with re-use. The blob gets copied and pasted. Then someone finds an error, so they go back and do a search. But one of the instances has been slightly changed, so the search doesn’t find it. You’ve caught most of the instances, but not all of them. And all it takes is one good lawsuit for that oversight to backfire on a company.

You could, theoretically, break up the body block into multiple blocks, but that becomes unwieldy for both the author to use and the system to process. So overwhelmingly, the CMS will have a single text field, with a rich text editor that allows users to add heading levels and formatting such as bold and italics.

So we have some good aspects of re-use – leveraging existing content alongside new content to provide a richer reading experience – and some limiting aspects – the inability to re-use content at a granular level.

Separating authoring from publishing

In a DITA system, there is a significant difference in where and how authors assemble content.

CCMS workflow

First of all, the authoring is done in a separate system (the CCMS) from the publishing system. This means you can get very granular control over your content. You still have the same metadata (titles, summaries, and so on), though it usually isn’t put into form fields but simply shown as part of the topic. The real power of re-use, though, comes with the body copy.

I could talk about re-use in several different ways, but I will confine myself (hard as it may be) to the example of the article on Paleo. (Noz Urbina will talk about all the other good ways you can handle re-use in content later in the summer.) Where a statement within the third paragraph needs to be consistent with everything else on the site, you would isolate the phrase or paragraph so that it can be re-used, storing that bit of content in a file for later reference. In DITA, this isolated content is pulled in, logically enough, by a content reference (or conref, for short). It gets stored with all of the other conref sources, for easy look-up. Every time you need that statement, you simply import it into the article you’re working on. If the statement changes, it changes at the source, and the change can be replicated across the entire body of work. (I say “can” because there are generally controls that allow you to choose where the replication happens and where it should not. Authors need that kind of control.) I’ve created a crude representation here:

CCMS authoring environment
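For readers who want to see the markup behind that representation, here is a minimal sketch (the filenames, IDs, and wording are hypothetical, not from any particular system): the boilerplate paragraph lives in its own topic, and any article pulls it in by reference instead of copying it.

```xml
<!-- boilerplate.dita (hypothetical): the shared source topic -->
<topic id="shared-statements">
  <title>Shared statements</title>
  <body>
    <p id="what-is-paleo">The Paleo diet emphasizes foods presumed to have
    been available during the Paleolithic era.</p>
  </body>
</topic>
```

An article topic then references that paragraph with the `conref` attribute; at build time the processor replaces the empty element with the source paragraph, so fixing the source fixes every article that references it:

```xml
<!-- Inside the article's body: pulled in by reference, not pasted -->
<p conref="boilerplate.dita#shared-statements/what-is-paleo"/>
```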

The article(s) would remain in the CCMS until ready for publishing. Then, once all the content manipulation is done within the article – which can also include specifying the content that should be related to it – the content is “transformed”, rather like doing a software build, into its final output forms. This final step is an important distinction between the two authoring processes, because the transformation means you can have separate outputs for the Web, for an ebook, and for various mobile formats (whether this is a best practice is a different topic of discussion; I’m simply asserting that it can be done technologically). There could also be outputs to other flavours of XML that can be ingested into other systems for use elsewhere.
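As a sketch of that build step (the map and topic filenames are hypothetical), the topics are assembled in a DITA map, and a processor such as the DITA Open Toolkit transforms the same map into each output format. Note that the “related articles” relationships can be declared once in the map, in a relationship table, rather than wired up in the publishing system:

```xml
<!-- paleo.ditamap (hypothetical): one source map, many outputs -->
<map>
  <title>Paleo article collection</title>
  <topicref href="what-is-paleo.dita"/>
  <topicref href="paleo-shopping-list.dita"/>
  <!-- The reltable declares the "related article" links in the source -->
  <reltable>
    <relrow>
      <relcell><topicref href="what-is-paleo.dita"/></relcell>
      <relcell><topicref href="paleo-shopping-list.dita"/></relcell>
    </relrow>
  </reltable>
</map>
```

From this single map, the DITA Open Toolkit’s `dita` command-line tool can generate the separate outputs, for example with `--format=html5` for the Web and `--format=pdf` for print-style output.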

Next week, I’ll discuss more about how the level of content manipulation can contribute to a decision about when DITA is the right choice.


Comments

6 Responses to “To DITA or not to DITA: That’s a Good Question – Part 1”

  1. Don Day on July 22nd, 2013 12:25 pm

    Rahel, you get it! I was on vacation during the heat of last week’s discussion, so it was not convenient to think through a thoughtful response at the time.

    I want to reiterate DITA’s usefulness for creating semantic, structured content for the Web–content that supports faceted search, associative links, managed reuse where necessary, complete-in-page authoring methodologies, seamless aggregation of collections of topics, and personalized views, potentially addressable at any level of XPath scope.

    Most public discussions about DITA content on the Web presume a build step that generates a compiled, static representation of a given set of those myriad conditions. This fits your description of a CCMS-based system, for example. I think you can still postulate a single CMS system where the XML source is user-facing and rendered on the fly in response to current conditions: browser or device type, apparent bandwidth, user profile and preferences, cookie values, and more to deliver real-time adaptive views of a request. The build-based process is crucial for content that needs to be published on managed approve-and-release schedules, but the on-demand system certainly maps directly to blogs, wikis, and directly-managed web site content. Is DITA the right choice for those use cases? Whenever the time comes for that question, I have some thoughts that bear consideration.

  2. rahelab on July 25th, 2013 8:47 am

    What’s interesting to me is that from any of the situations I’ve encountered, there’s rarely a need for “on-the-fly” content because content is one of those situations subject to workflow, review and sign-off, and so on. So “on the fly” would become “do it all in Word, and then paste into the publish view” all too quickly. Having a protected authoring environment (protected from wrong content being inadvertently published on the site, even briefly), and then deciding when to push it out is a prerequisite. But then, once the content *does* go into the web side of things, it should act like adaptive content, work within a responsive design framework, and so on.

  3. Jeff Eaton on July 26th, 2013 7:56 pm

    The challenge of an on-demand system that uses markup as its canonical storage format is the difficulty of building truly dynamic associative pages.

    Examples might include a list of the week’s most popular articles, a collection of help documents that have been rated as “unhelpful” by site visitors (and can only be seen by properly permissioned users), and so on. It’s definitely possible that I’m just unfamiliar with the mature answers to those kinds of requirements in this space, but it does seem like the answers tend to be either generate-and-cache solutions, or separate search/index servers.

    Rahel nails the really ugly problem with what I think of as the “blog-descended” web publishing world. The body field is treated as an HTML no-man’s-land, and the chunked-up discrete fields that surround it are considered the only “true” home of structured data. That approach ensures that sufficiently complex content will *always* bring reuse pain.

  4. rahelab on July 30th, 2013 8:13 am

Jeff, I think I’ve covered some of this in the next installment. I used the metaphor of manipulating photos in Photoshop and then importing them into the CMS (where they can then be routed to accompany “most popular” or aggregation pages) to demonstrate how a DITA-based CCMS works. There is a powerful authoring environment behind the scenes, and then the generated content output – once all the content references have been added, the variables replaced, and so on – is sucked into a Web-side publishing system for presentation. The idea that content authors work directly in the CMS is not the reality of content workflow, though writers have been told for decades that this is how they *should* work. Writing is a messy, iterative process that generally involves all sorts of back-end manipulation of content pieces long before the publication date ever arrives.

  5. […] good explanation of DITA and when it’s a good […]

  6. The Battle for the Body Field | SI Services on February 26th, 2014 7:47 am

    […] a parallel approach. Rather than chunking content into fields and re-assembling it later, the XML community embraces fluid, markup-based documents. To capture meaningful structure and avoid HTML’s browser-specific presentation pitfalls, […]
