There is Nothing Either Good or Bad, But Thinking Makes it So

I have recently been working on an extension to shakesbook.org that uses Ajax to retrieve a file of Shakespeare’s plays from the server.  This is similar in principle to the Simile timeline I used before, and it likewise relies on a JSON data format.  I am hoping to apply a JSON data structure to complete works of literature instead of using XML.  Jon Bosak organized Shakespeare’s works into XML as early as 1996 (the files here are from 1999), but from what I can tell no one has tried to place the plays into JSON.  I want to return to this problem, rather than leave it settled in XML, for several reasons.  For one, JSON is arguably easier to read [1] for both humans and computers, and it derives from JavaScript, which, along with Java, is the language I work in best.  JSON documents are also typically smaller than their XML equivalents [2], which means faster communication between the server and the webpage.  Together these reasons led me to remake Shakespeare’s works; however, my acquaintance with both the plays and the data format did not result in a quicker solution.  Instead I inadvertently arrived at larger questions of meaning in these plays.

The problem with using Bosak’s model lies partly in the weight and redundancy of XML and partly in JSON’s more restricted use of object names.  That is, the <Line> element that Bosak repeats for every line of a speech can appear only once as the key “Line” within a single JSON object.  In addition, I wanted a format with a running line number system in which the line number is part of the data format but not part of the data itself, i.e. a line can be marked as “line one” without inserting the number into the line proper.  Bosak’s format allows lines to be counted, but only by processing the entire document each time; the number is not immediately accessible.  These requirements led me to set up objects nested within objects, where each key is unique within its own object but can be repeated in other objects.  For example, the author contains the list of works, each play in that list has a set of acts, and inside each act are scenes.  But the larger question is whether a line belongs to the character speaking it or whether a line encapsulates its speaker.
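The key restriction and the line-number requirement can both be illustrated with a short JavaScript sketch. The nested layout, the key names, and the line text here are assumptions made for illustration only; they are placeholders, not the exact format used on the site.

```javascript
// Placeholder text throughout, not quotations from any edition.

// 1. A repeated key inside one object does not survive parsing:
//    JSON.parse keeps only the last value for "Line".
const repeated = JSON.parse('{"Line": "first line", "Line": "second line"}');
console.log(repeated.Line); // second line

// 2. Assumed layout: every key is unique within its own object,
//    and the running line number is the key rather than part of the text.
const works = {
  "Shakespeare": {
    "Hamlet": {
      "Act One": {
        "Scene One": { "1": "first line", "2": "second line" },
        "Scene Two": { "1": "another first line" }
      }
    }
  }
};

// A line is reached directly by its number, without scanning the document.
const scene = works["Shakespeare"]["Hamlet"]["Act One"]["Scene One"];
console.log(scene["2"]); // second line
console.log(Object.keys(scene).length); // 2
```

Note that the key “1” appears in both scenes without conflict, because each scene is its own object.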

A diagram of this situation is below.

My first intuition was that characters possess their lines, since that is how print editions are structured.  The problem is that JSON expects each key to appear only once per object, so the object “Scene One” cannot contain more than one “King” or “Queen”, etc.  This means that the scene would hold only a list of the characters who speak in it, and each character would then have a set of lines numbered as they appear in the scene.  For me, this format sharply emphasizes a false dichotomy between how I would proceed as a reader and how I would proceed as a programmer.  It also undercuts the idea that the format is easier for humans to parse.  In contrast, Witmore and Hope [3] removed speech prefixes in their digital study of Shakespeare’s works, thereby suggesting that characters are not fundamental to the text of the play.  This is intriguing, since it runs counter to my personal experience of printed matter, yet it also makes sense to me.  Ordering the data by line rather than by speaker would make variants easier to recognize, such as the differences between quartos, folios, or editions that might be similarly marked up.
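The two orderings can be compared in a small JavaScript sketch. The speaker names, placeholder lines, and the `toLineOrder` helper are hypothetical and stand in for whatever a real scene would contain; the point is only that the speaker-owned shape must be re-sorted before it reads like the printed page.

```javascript
// Shape A (assumed): each character owns their lines,
// keyed by the line's running number within the scene.
const bySpeaker = {
  "King": { "1": "placeholder line a", "3": "placeholder line c" },
  "Queen": { "2": "placeholder line b" }
};

// Shape B: each line stands on its own and carries its speaker as a value.
// Hypothetical helper that converts shape A into shape B in reading order.
function toLineOrder(scene) {
  const lines = [];
  for (const [speaker, owned] of Object.entries(scene)) {
    for (const [number, content] of Object.entries(owned)) {
      lines.push({ number: Number(number), speaker, content });
    }
  }
  // Restore the order in which the lines appear in the scene.
  return lines.sort((a, b) => a.number - b.number);
}

const byLine = toLineOrder(bySpeaker);
console.log(byLine.map((line) => line.speaker).join(", ")); // King, Queen, King
```

In shape B the speaker becomes one attribute among others, so a variant reading could in principle sit beside it as a further attribute of the same line.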

Still, I feel that the question of possession and attribution in texts is not truly settled by choosing a data format; it is something to be thought on further.

1. http://www.w3schools.com/json/default.asp

2. http://www.json.org/xml.html

3. Hope, Jonathan, and Michael Witmore. “The Hundredth Psalm to the Tune of ‘Green Sleeves’: Digital Approaches to Shakespeare’s Language of Genre.” Shakespeare Quarterly 61.3 (2010): 357.

******* Revised JSON Format Following the Comments Below *******
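As a rough sketch of the approach suggested in the comments (general keys such as “act”, “scene”, and “line”, with anything that repeats held in an array), the JavaScript below generates the nested structure from a flat list of lines rather than writing it by hand. The `buildPlay` and `findOrAdd` functions, the shape of the flat records, and the placeholder text are all assumptions for illustration, and the nesting is a simplified variant of the one proposed in the comments.

```javascript
// Hypothetical flat input: one record per line, in reading order.
const records = [
  { act: 1, scene: 1, speaker: "First Speaker", content: "placeholder line one" },
  { act: 1, scene: 1, speaker: "Second Speaker", content: "placeholder line two" },
  { act: 1, scene: 2, speaker: "First Speaker", content: "placeholder line three" }
];

// Find the item with the given number in a list, or create and append it.
function findOrAdd(list, number, childKey) {
  let item = list.find((entry) => entry.number === number);
  if (!item) {
    item = { number, [childKey]: [] };
    list.push(item);
  }
  return item;
}

function buildPlay(author, title, flatLines) {
  const play = { title, act: [] };
  for (const record of flatLines) {
    const act = findOrAdd(play.act, record.act, "scene");
    const scene = findOrAdd(act.scene, record.scene, "line");
    // Line numbers restart in each scene and are stored beside the text,
    // not inside it.
    scene.line.push({
      number: scene.line.length + 1,
      speaker: record.speaker,
      content: record.content
    });
  }
  return { author, play: [play] };
}

const works = buildPlay("Shakespeare", "Hamlet", records);
const json = JSON.stringify(works, null, 2);

// Round-trip through JSON to confirm the generated text parses.
console.log(JSON.parse(json).play[0].act[0].scene[0].line[1].content);
// placeholder line two
```

Because every key is general and every repeated item sits in an array, no key is ever duplicated within a single object.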


Filed under Addressability, Shakespeare

7 responses to “There is Nothing Either Good or Bad, But Thinking Makes it So”

  1. While line numbers etc. can be easily implemented as XML attributes as specified by the TEI format, if you do want to go with JSON I would suggest writing a simple parser that creates a general set of tags marking out the structure, i.e. instead of “Act One”: [array], I’d prefer “Act”: “One”, “Content”: [array of scenes and so forth]. Similarly, “Speaker”: “Hamlet” etc. rather than having the speaker name as the key itself. Ideally keys in JSON (and in most data structures) should contain only general information about structure, while values should contain specific information (“act”, “scene”, “setting”, “speaker”, etc. as keys rather than “act 3”, “scene one”, “heath”, “Lear”, etc.).

  2. Hey Anupam,

    I agree completely with you. I think that the key names in JSON should be independent of their respective values. But the problem I am running into is that each array can contain only one item with a given key name. So when I use JSLint to evaluate the code, it reports an error on the second “line” variable in the first scene’s array. (I pasted the revised code at the bottom of the post above, with the error-free form on the left and the erring format on the right.) This is the problem that prompted the post above and, although I wish I did not have to mingle key names and values, I cannot find an alternative. Do you have any thoughts on this?

    Thanks, Mike

  3. Hi Mike,

    Any type of object that you have more than one of should go in an array. The code on the right breaks because you’re repeating the “Line” : (number) key without putting it in an array. The following code parses:
    {
      "author": "Shakespeare",
      "play": [
        {
          "title": "Hamlet",
          "text": [
            {
              "act": [
                {
                  "number": "one",
                  "contents": [
                    {
                      "scene": [
                        {
                          "number": "one",
                          "line": [
                            {
                              "number": "1",
                              "content": "Quidquid latinae dictum sit"
                            },
                            {
                              "number": "2",
                              "content": "altam viditur."
                            }
                          ]
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }

    So, you need to make each line its own object by putting curly brackets around them and then wrap sets of lines together as an array. Very tedious to do by hand, I know, but easy when you generate it with a script.

    best,
    -A

    • Hey Anupam,

      That looks great, thank you for the help! I always seem to get caught up in Java/JS/JSON’s use of brackets. I will be sure to post again when I have further updates in case you are interested.

      Cheers, Mike

  4. Good article. Check the following site; it seems good for experimenting with JSON.
    http://www.w3resource.com/JSON/introduction.php

