The Rationale for JSON over TEI/XML in Literary Markup

This is a republication of an article I wrote for the Strange Bedfellows project. The original URL is here.

 Clutter and confusion are failures of design, not attributes of information.[1]

I recently wondered, “What is TEI used for?” I have known about TEI for about three years, but somehow its purpose had slipped my mind. (There is a basic introduction to TEI here.) All I could remember was that it took a lot of time. When I looked back at my dissertation (where I had written a little about it in relation to the Rossetti Archive), I could not recall much more than that it was used to structure the massive amounts of information housed in the Rossetti Archive. After a quick Google search asking the same question, I came up with an answer.

TEI is a structuring element, but only in the semantic sense. For example, TEI can be used to mark up a literary text and represent line structures in a poem, chapters in a book, and almost anything else you could think of. In contrast, HTML uses similar-looking tags, but it uses them to delineate the building blocks of a webpage’s structure. One argument is that HTML is linear whereas TEI is hierarchical, so that, in essence, they represent two different types of information.[2] The semantic information of a webpage’s content may be further subdivided into four kinds of information:

-physical: e.g. the binding of a codex as opposed to its gatherings, the erasures of some words on a version of a text, the presence of colours in a text etc.

-structural: e.g. divisions of a text into chapters, columns, sections, stanzas etc.

-implicit or extra-textual: e.g. information on the text’s author and date, or on the provenance of a textual source and the archival collection where it is physically held etc.

-semantic or interpretative: e.g. editorial interventions, identification of variants, abbreviations, readings, names of persons and places that occur in the text etc.
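To make the four kinds of information listed above more concrete, here is a minimal sketch of how they might sit side by side in a single JSON record. Every key name (`physical`, `structural`, `extratextual`, `interpretative`, and everything beneath them) is a hypothetical choice made for illustration, not an established schema, and all of the values are placeholders rather than data from a real text.

```python
import json

# Hypothetical record: all key names and values are placeholders chosen
# for illustration. They are not part of any published standard.
record = {
    "physical": {
        "binding": "placeholder binding description",
        "erasures": [{"stanza": 1, "line": 2, "word_index": 3}],
    },
    "structural": {
        "stanzas": [
            {"n": 1, "lines": ["placeholder line one", "placeholder line two"]}
        ]
    },
    "extratextual": {
        "author": "placeholder author",
        "date": "placeholder date",
        "provenance": "placeholder archive",
    },
    "interpretative": {
        "notes": [
            {"stanza": 1, "line": 2, "type": "editorial", "text": "placeholder note"}
        ]
    },
}

# Serialize to JSON text and read it back; the structure survives the round trip.
serialized = json.dumps(record, indent=2, ensure_ascii=False)
restored = json.loads(serialized)
assert restored == record


def lines_of(rec):
    """Yield (stanza number, line number, text) for every line in the record."""
    for stanza in rec["structural"]["stanzas"]:
        for line_number, text in enumerate(stanza["lines"], start=1):
            yield stanza["n"], line_number, text


for stanza_n, line_n, text in lines_of(restored):
    print(f"{stanza_n}.{line_n}: {text}")
```

Keeping the structural layer separate from the other three is only one possible design; the interpretative notes here point back at lines by stanza and line number rather than wrapping the text itself.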

TEI is proffered as a solution to organizing the “content” half of a webpage because it can deal with every side of the multifaceted object that is a literary text. However, TEI standards resemble a game that does not work unless everyone follows the rules. For instance, if I wanted to use a tag that is not part of the standard distribution, it would not be supported by others unless they also had my definition of it. TEI appears to be used for controlling messy data sets or complicated sources of information, as in the Rossetti Archive. However, as Edward Tufte notes above, the complicated nature of information, particularly literary information, is not an attribute of that information but of the design surrounding it. Yet TEI does not promote a feasibly “readable” text. Instead, levels of granularity are generally dictated by time and monetary constraints, not by the textual information. In fact, having to pay to be part of the club seems especially odd at a time when academics are usually clamoring for “open-source”, i.e. free, solutions.

JSON is a format that stands in relative opposition to TEI. Since TEI can be reduced to XML[3], which is arguably a very strong format across many kinds of computing, it is no wonder that an XML-based format would be a natural choice for marking up texts. However, JSON traces its roots back to JavaScript and, ultimately, to conventions used by the entire C family of programming languages.[4] In this, JSON is arguably more universal than XML.

One of the reasons often given for adopting TEI is that it is platform and software independent thanks to its XML heritage; however, JSON is as well. In fact, JSON is just as stable as XML, and it also supports Unicode. Another argument for TEI is that “TEI markup supports a wide range of useful functions including online publication, searching, text analysis, and conversion into other formats. There are many tools for manipulating, presenting and querying XML documents available”.[5] JSON can perform the same functions in a smaller format and is faster and easier to parse, both by humans and computers.[6] It may be argued that XML can achieve similar file sizes if it is seriously compressed, as with gzip,[7] and that TEI is easier and more logical to use in a literary markup system. Yet I would argue that the only feature TEI/XML has that JSON does not is that it can be styled with CSS immediately for a webpage, whereas JSON must first be imported with something like PHP.
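The size and parsing comparison can be tried out on a small scale. The sketch below encodes the same two-line stanza once as XML, using the TEI element names `<lg>` and `<l>`, and once as JSON, then parses both and reports their raw and gzipped sizes. The JSON key names and the placeholder lines are my own assumptions, and a sample this small is not a benchmark: the numbers depend heavily on how each markup is designed, and for very short inputs gzip's fixed overhead can make the compressed file larger than the original.

```python
import gzip
import json
import xml.etree.ElementTree as ET

# The same stanza in two encodings. The line text is placeholder content.
xml_text = (
    '<lg type="stanza" n="1">'
    '<l n="1">placeholder line one</l>'
    '<l n="2">placeholder line two</l>'
    "</lg>"
)
# Hypothetical JSON layout: the key names are assumptions, not a standard.
json_text = json.dumps(
    {"type": "stanza", "n": 1, "lines": ["placeholder line one", "placeholder line two"]}
)

# Both encodings parse with the standard library and yield the same lines.
xml_lines = [line.text for line in ET.fromstring(xml_text).iter("l")]
json_lines = json.loads(json_text)["lines"]
assert xml_lines == json_lines


def sizes(text):
    """Return (raw byte count, gzipped byte count) for a string encoded as UTF-8."""
    raw = text.encode("utf-8")
    return len(raw), len(gzip.compress(raw))


for label, text in (("XML", xml_text), ("JSON", json_text)):
    raw_size, gz_size = sizes(text)
    print(f"{label}: {raw_size} bytes raw, {gz_size} bytes gzipped")
```

Running the same comparison on a full text, rather than a two-line sample, would give a fairer picture of how much compression narrows the gap.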

Given the similarities between TEI and JSON, why is there still an emphasis on using TEI? JSON can be parsed in the same way regardless of which tags are used, is arguably easier for both humans and computers to read, and is free. I am not sure why JSON has not become as popular as TEI, but I think that the use of TEI should be reconsidered for literary markup. Much time and money could be saved if needs can be met by JSON, which is a strong argument for any scholar on a budget.
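The claim that JSON parses the same way whatever names are used can be sketched in a few lines. The `walk` function below is a hypothetical helper, not part of any library, and the key `myCustomTag` is invented to stand in for a tag outside any agreed vocabulary.

```python
import json


def walk(node, path=()):
    """Yield (path, value) for every leaf of parsed JSON, whatever the keys are."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from walk(value, path + (index,))
    else:
        yield path, node


# "myCustomTag" is an invented key; the parser needs no definition for it.
doc = json.loads('{"stanza": {"n": 1, "myCustomTag": ["placeholder text"]}}')

for path, value in walk(doc):
    print("/".join(str(part) for part in path), "=", value)
```

The parser accepts any key without complaint, though anyone else reading the file would still need to be told what a custom key is meant to convey.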

[1] Tufte, Edward. Envisioning Information. Cheshire: Graphics Press, 1990. 51.








One response to “The Rationale for JSON over TEI/XML in Literary Markup”

  1. Jacob

    TEI is also free to use. You only have to pay to vote which most people don’t care about. I would love to be able to use JSON instead of TEI. It would make my workflow a lot simpler. However, I’m not sure it can be done. TEI has established approaches for dealing with virtually any kind of textual information. I’m not sure how you would do the same things in JSON. I would love to hear any ideas for marking up manuscripts, for example.
