
Published on January 15th, 2014 | by Rahel Bailie


Friends don’t let friends treat content like data

As happens from time to time, a discussion breaks out on Twitter that really deserves its own article. This is one of those topics.

The topic started out as a joking statement between me and a tech writer for a data analysis company, and then turned into one of those bizarre “who’s on first, what’s on second” conversations about whether content can be managed like data.

On the one hand, we have supporters of the “content is data” theory. They assert that, when done right, content can be managed as data sets, to combine and recombine content algorithmically to create new contexts.

On the other hand, we have the camp who consider content and data to be two separate entities. As Joe Gollner weighed in: “Data items make up content like rain drops make a river – there is a composition relationship but there is an obvious difference.”

It would be too easy to allow this discussion to descend into technicalities, but I would rather keep the topic accessible to the non-techies, so I will try to tease out the arguments while keeping things light.

Conventional databases are built on the concept of defined “sets,” and what is actually stored in the tables is a set of IDs. The idea is that if a set is defined correctly, then any element in the set can be processed just like any other element in the set. Claudia Wunder, a seasoned database architect who spoke with me at length about the nature of databases, used the metaphor of spice jars. When using the contents of a jar of pepper, it doesn’t matter which granules of pepper you use; the end effect will be the same.

Processing content this way is possible, but is certainly an edge case. One of the contexts mentioned is the premise behind Narrative Science. It’s worth a little detour to understand how it works. I will use the narrative of sports results, after a conversation with Scott Abel a while back, as the example.

Content managed as data

In a database, you could have data sets that build sentences algorithmically, as long as you stick to an extremely strict formula for the structure. Note that I am not creating a database structure, but showing this in a way that makes sense to the average reader. Let’s say we have the following pieces of data.

| Day | Winning Team | Won by 11+ points | Won by 10 or fewer points | Losing Team |
|---|---|---|---|---|
| Earlier today | Anaheim Ducks | slaughtered | defeated | Anaheim Ducks |
| Yesterday | Calgary Flames | trounced | beat | Calgary Flames |
| Last night | Edmonton Oilers | thrashed | triumphed over | Edmonton Oilers |
| | Los Angeles Kings | burned | edged out | Los Angeles Kings |
| | Phoenix Coyotes | killed | won over | Phoenix Coyotes |
| | San Jose Sharks | | | San Jose Sharks |
| | Vancouver Canucks | | | Vancouver Canucks |

Some scripts can be written to create sentences, based on a number of prescribed rules. We can assume that after a game, there will be some data entered about the date of the match, which teams were playing, and the score.

  1. Check publishing date and time against game date and time, and choose the right value from column 1. (Even though
  2. Add a comma, a space, and the word “the”.
  3. Check the teams that played and their scores for the winning team, and choose the right value from column 2.
  4. Check the point spread and choose a value either from column 3 or column 4, without repeating the same value twice in a row.
  5. Add the word “the”.
  6. Check the teams that played and their scores for the losing team, and choose the right value from column 5.
  7. Add a comma, a space, and the phrase “with a score of”.
  8. Enter the score, with the winning score first, a hyphen, and then the losing score.
  9. Add a full stop.

What you would get, using the data that we put in, is something like this:

Earlier today, the Calgary Flames trounced the Edmonton Oilers, with a score of 12-1.

Yesterday, the Calgary Flames defeated the Edmonton Oilers, with a score of 2-1.
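
For readers who want to see the steps above in script form, here is a minimal Python sketch. It is only an illustration, not how Narrative Science or any real product works. The function name `build_result_sentence` and the two list names are made up for this example, the verbs are copied from the table, and the day label is passed in directly because the rule for choosing between “Earlier today”, “Yesterday”, and “Last night” is not spelled out here.

```python
import random

# Verb pools copied from the table above; the list names are illustrative.
BIG_WIN_VERBS = ["slaughtered", "trounced", "thrashed", "burned", "killed"]
SMALL_WIN_VERBS = ["defeated", "beat", "triumphed over", "edged out", "won over"]


def build_result_sentence(day, team_a, score_a, team_b, score_b, last_verb=None):
    """Return (sentence, verb) for one game result.

    `day` is the already-chosen value from the Day column (an assumption:
    the timing rule itself is not implemented here).
    `last_verb` is the verb used in the previous sentence, so it is not repeated.
    """
    if score_a == score_b:
        raise ValueError("the rules above do not cover a tie")

    # Work out the winning and losing team from the scores.
    if score_a > score_b:
        winner, loser = team_a, team_b
    else:
        winner, loser = team_b, team_a
    high, low = max(score_a, score_b), min(score_a, score_b)

    # Pick the verb column from the point spread, skipping the last verb used.
    pool = BIG_WIN_VERBS if high - low >= 11 else SMALL_WIN_VERBS
    verb = random.choice([v for v in pool if v != last_verb])

    # Assemble the fixed sentence pattern around the chosen values.
    sentence = f"{day}, the {winner} {verb} the {loser}, with a score of {high}-{low}."
    return sentence, verb
```

Calling `build_result_sentence("Yesterday", "Calgary Flames", 2, "Edmonton Oilers", 1)` gives a sentence of the same shape as the second sample, with the verb drawn at random from the smaller-margin column. Returning the verb alongside the sentence lets the caller pass it back in as `last_verb` for the next game.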

You have here, in effect, a large, yet finite, set of sentences that can be built over and over again. Somewhere upstream, data will be received from the scoreboard, and that triggers the calculation and publishing of results. The problem, of course, is that if anything needs to be changed in the database – for example, if a team name changes – the database administrator needs to go into the appropriate table and make the fix. This is because conventional databases have no authoring interfaces or other tools that ensure content accuracy or quality.

As well, this algorithmic building of sentences requires an ever-increasing number of rules to make the content make sense. For example, if a new team name begins with “The”, then a new rule has to be created that prevents repetitive words. As authors can appreciate, there are so many grammatical constructions that need to be tested. The rules could become unwieldy, or there could be a new construction that doesn’t get tested, or one rule could contradict another rule.
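
To make the “The” problem concrete, here is a small Python sketch of what that one extra rule might look like. The function name `with_article` and the team name “The Example Eagles” are invented for illustration; no such team appears in the table.

```python
def with_article(team_name):
    """Return the team name prefixed by a single lowercase "the"."""
    if team_name.lower().startswith("the "):
        # The name already carries its own article: reuse it, lowercased,
        # rather than producing "the The ...".
        return "the " + team_name[4:]
    return "the " + team_name
```

Here `with_article("Calgary Flames")` returns `"the Calgary Flames"`, while `with_article("The Example Eagles")` returns `"the Example Eagles"`. That is one rule for one construction. A name that takes no article at all, or a sentence that starts with the team name, would each need a further branch, which is how the rule set grows.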

This is definitely an edge case, but has a lot of potential for those instances when you can commoditize content for production in very formulaic ways, such as sports results, or other generators.

A number of years ago, a CTO of a content management system demonstrated what he hoped would be a way for factory procedures to be automatically constructed from the software that ran industrial machinery. Using the same principles as shown in the table, the procedures came out something like this:

Machine operator send pipe to next station.
Machine operator receive pipe.
Machine operator punch hole.
Machine operator release pipe.
Supervisor receive pipe.
Supervisor inspect pipe.

Technically, the data made sense. However, having procedures with profound literacy challenges was distracting, and the lack of nuance raised a very real threat of introducing inaccuracies because of the need for human interpretation.
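
I never saw the code behind that demonstration, so the following Python sketch is only a guess at the general approach: each step is stored as a record of actor, action, and object, and the “sentence” is simply those fields joined in order. The record layout and the names `EVENTS` and `render_step` are assumptions made for this example.

```python
# Hypothetical step records: (actor, action, object, optional detail).
EVENTS = [
    ("Machine operator", "send", "pipe", "to next station"),
    ("Machine operator", "receive", "pipe", ""),
    ("Machine operator", "punch", "hole", ""),
    ("Supervisor", "inspect", "pipe", ""),
]


def render_step(actor, action, obj, detail=""):
    """Join the fields of one record into a line, skipping empty fields."""
    parts = [actor, action, obj, detail]
    return " ".join(part for part in parts if part) + "."


for event in EVENTS:
    print(render_step(*event))
```

Because the action is stored in its base form and nothing in the data says how to conjugate it or which article goes with the object, the output reads “Machine operator send pipe to next station.” Making it read naturally would mean adding grammar rules on top of the data, which brings back the rule-growth problem described earlier.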

Most often, content is not that cut and dried. In fact, it is a lot less predictable. So now that we’ve covered what happens when content is managed as data, we need to look at the other side: managing content in a content repository, which I will cover in a couple of weeks.




About the Author

Rahel Anne Bailie is a synthesizer of content strategy, requirements analysis, information architecture, and content management to increase the ROI of content. She has consulted for clients in a range of industries, and on several continents, whose aim is to better leverage their content as business assets. Founder of Intentional Design, she is now the Chief Knowledge Officer of London-based Scroll. A Fellow of the Society for Technical Communication, she has worked in the content business for over two decades. She is co-author of Content Strategy: Connecting the dots between business, brand, and benefits, and co-editor of The Language of Content Strategy, and is working on her third content strategy book.



3 Responses to Friends don’t let friends treat content like data

  1. OK. By the end of this I got where you were coming from, but I really didn’t get that from the title and opening sentences.

    My first thought was, “of course we want to treat content like data! We want to be able to query it, transform it, migrate it, sort it, filter it, etc. etc.” By the end I realise you mean “treat like data” in the sense of generating it in the first place as you would generate data: with algorithms and cookie-cutter blocks. Yes, most content creation processes that ape data generation processes create pretty wobbly content.

    So, agreed with the conclusion, but I found the wording of the premise threw me way off!

  2. Chris Parker says:

    We’ll have to agree to disagree on this one. A lot of the technical constraints you mentioned are pretty ably defeated with a sufficiently savvy database designer. As one of my coworkers likes to say, “Everything is technically possible, given enough time and money.”

    In fact, most modern database languages (SQL for sure, but Datalog, SQL subsets like DDL and DML, OQL, XQuery, etc.) have become enormously dynamic. I mean the semantics on these things allow for some very interesting, very complicated applications.

    Moreover, huge gains have been made in data analytics in recent years, giving us an unprecedented ability to visualize and act on data *about* data. This, in turn, allows us to more efficiently and readily respond to any changes that might be necessary to whatever repository we’re using to hold our data. I mean, content.

    And this is before we even start talking about gains made in natural language processing, deep language processing, etc.

    I’d even go so far as to suggest that the sheer amount of content being generated on a daily basis (Clive Thompson estimates we’re generating the entire United States Library of Congress *daily*) is going to require we start dealing with content as data.

    Anyway, obviously the example presented is a little too granular to be applied in any useful way, but I’d stop short of saying that content shouldn’t be treated like sets in a database.

  3. Ellis Pratt says:

    Numbers span the divide between content and data, and we normally see sports results presented as numbers. If I see Sheffield Wednesday 5-0 Nottingham Forest, I know that Sheffield Wednesday trounced Nottingham Forest. Converting the numbers into a narrative doesn’t tell me anything more (unless perhaps I was listening to the result on the radio). The sports reports generally tell me if the final result truly reflects the game.

    Icons (and images) can also give us content and be used in a structured way, and this is probably a more effective way to automate the creation of content. Robert Horn studied the use of arrows as a communication device, following his development of Information Mapping (ftp://lienlabnas.ym.edu.tw/Public/MO/Arrows.pdf and http://www.stanford.edu/~rhorn/a/recent/artclNSFVisualLangv.pdf).

    It would probably be much easier to use arrows and icons to present industrial machinery procedures than simplified sentences.
