I think this is the last old post I had to write. This is focused on my final project for Prof. Witmore’s class in May:
Over the course of a semester, Professor Witmore introduced our class to writings about relational patterns and networks, then applied them to the study of literature. We read books such as Graham Harman’s “Prince of Networks: Bruno Latour and Metaphysics”, Franco Moretti’s “Graphs, Maps, Trees: Abstract Models for a Literary History”, and Alexander, Ishikawa, and Silverstein’s “A Pattern Language: Towns, Buildings, Construction”, which slowly coalesced in my mind and led to my final project: a Java program designed to help render Docuscope-quality text from a plain or formatted transcription.
The program, TextSnip, allows the user to identify and remove textual features such as speech prefixes, stage directions, and anomalies in a document. As Docuscope currently stands, such features could skew results, so the issue needed remedying. The program would also help in preparing new texts to work on in Docuscope. However, building such a program does not by itself fulfill the requirements of a literature class; the tool needed to be paired with an argument or proposition.
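To make the idea concrete, here is a minimal sketch of the kind of stripping described above. It is not TextSnip's actual code. The class name `TextSnipSketch`, the method `clean`, and both patterns are assumptions made for illustration: it supposes stage directions are wrapped in square brackets and speech prefixes are short capitalized abbreviations ending in a period at the start of a line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the real TextSnip implementation.
final class TextSnipSketch {
    // Assumption: stage directions appear in square brackets, e.g. "[Exit Ghost.]".
    private static final Pattern STAGE_DIRECTION = Pattern.compile("\\[[^\\]]*\\]");
    // Assumption: a speech prefix is a 2-10 letter capitalized abbreviation
    // followed by a period and whitespace at the start of a line, e.g. "Ham. ".
    private static final Pattern SPEECH_PREFIX = Pattern.compile("^[A-Z][A-Za-z]{1,9}\\.\\s+");

    // Removes bracketed stage directions and leading speech prefixes line by line,
    // collapses leftover runs of whitespace, and drops lines that end up empty.
    static String clean(String text) {
        List<String> kept = new ArrayList<>();
        for (String line : text.split("\\R")) {
            String s = STAGE_DIRECTION.matcher(line).replaceAll("");
            s = SPEECH_PREFIX.matcher(s).replaceFirst("");
            s = s.trim().replaceAll("\\s{2,}", " ");
            if (!s.isEmpty()) {
                kept.add(s);
            }
        }
        return String.join("\n", kept);
    }

    public static void main(String[] args) {
        String sample = "Ham. To be, or not to be, that is the question:\n"
                + "[Enter Ophelia.]\n"
                + "Oph. Good my lord, [Aside] how does your honour?";
        System.out.println(clean(sample));
    }
}
```

A fixed pattern like this will also catch ordinary short sentences that open a line (“No. I will not.”), which is one reason it makes sense for the user to identify the features rather than trusting a rule blindly.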
I eventually decided to analyze the differences that Witmore and Hope had found due to these textual features, but on a different scale. I began the entire process of curating texts for Docuscope anew and sourced the plays from a transcription of Shakespeare’s First Folio from EEBO (Early English Books Online) instead of the Moby Shakespeare. This created a new problem: the works had to be modernized to bring them up to Docuscope’s specs. I had intended for TextSnip to be able to modernize texts too, but soon realized the scale of such a project. I opted instead to use Alistair Baron’s VARD II, which had the advantage of years of dedicated programming and an established user base. http://www.comp.lancs.ac.uk/~barona/vard2/
Throughout the project, I wanted to run the texts through as prosthetic and normalized a process as possible, so that the final texts would be consistent. To do this, I lowered the tolerance level in VARD to 0%, which meant that it did not require any input from the user in making modernization decisions. Unfortunately this resulted in some noticeable errors, such as mistaking “Ephesus” for “Emphasis”, though these were minor in degree and number. It also meant that I had to put the texts through TextSnip first, so that the items it was meant to find would not be mistakenly modernized and then pass through TextSnip undetected.
After being modernized and culled, the new texts were run through Docuscope and JMP just as Witmore and Hope’s texts were. The results were very intriguing.
For one, the individual dendrograms for the EEBO texts and the old texts gave different results. When the two sets were compared together, almost every play clustered with its other version rather than by modernization technique. This is crucial because the VARD-processed texts had a higher average number of Docuscope tags and, when isolated, these same texts produced larger groupings by genre.
It does not seem logical that two versions of the same work would differentiate themselves individually, but attract collectively. This calls into question the network of relationships between the smaller units in a text and the larger, thematic grouping. These elements were what Moretti argued were the important pieces in literature. Yet essentially there are two results present.
To find where this variation is coming from, I would argue that TextSnip is only automating a predetermined process, so we should instead analyze the distinction between VARD and Martin Mueller’s modernization techniques. I do not wish to shy away from any blame for my own program, but the fact that these results revolve around discrepancies between individual words in the versions is truly interesting.
In light of these findings, I would like to continue thinking on this: If we could isolate single words that influence changes on a larger scale, would that signify a fault in the program or a successful discovery? And what exactly are the findings above telling us about literature?
If you would like a visual walkthrough, the PDF from the class’s poster session is underneath.
And more information about TextSnip can be found in my paper below.
A more detailed spreadsheet containing the counts is also below.
1. Witmore, Michael. “Shakespeare Out of Place?” Wine Dark Sea. 3 September 2010. 16 November 2011. <http://winedarksea.org/?p=801>.
2. Genre as defined by adherence to Heminges and Condell’s divisions in the First Folio.