This is a republication of an article I wrote for the Strange Bedfellows project. The original URL is here.
I watched a TED talk by Aaron Koblin about “artfully visualizing our humanity,” but I think his talk could also serve as a “how to” guide for the best way of thinking about the digital humanities. In it, Koblin speaks mainly about accessing large data sets and crowd-sourcing, both topics that might not seem applicable to the humanities. But one of the things I remember about Professor Michael Witmore’s work is a conversation we had about statistics. Professor Witmore relayed to me a conversation he had had with a professor of statistics about finding the best method for “artfully visualizing” Shakespeare’s corpus and literature in general. The statistics the other professor recommended for Witmore’s project were similar in kind to the methods used on the Human Genome Project, yet Witmore’s study of literature involved far more variables than DNA does. That literature could contain so many variables is, at first glance, hard to conceive of, but it becomes easier once one considers that individual words, even common “filler” words, are counted, as are phrases, clauses, and so on. In this respect, even a contained data set like Shakespeare’s corpus proved much harder to visualize clearly and meaningfully than the data sets of scientific studies. The “largeness” of the humanities, then, is a question of scope, and one that has proven difficult to answer as scholarship moves into the digital realm.
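As a rough illustration of why the variable count explodes, here is a minimal sketch in Python. It is my own toy example, not the method Witmore’s project actually used: every distinct word and every adjacent two-word phrase in even a single line becomes its own counted feature.

```python
# Minimal sketch: each distinct word and each adjacent two-word phrase
# becomes its own counted variable, so the feature count grows very quickly.
from collections import Counter

def text_features(text):
    """Count single words and adjacent word pairs in a passage."""
    words = text.lower().split()
    unigrams = Counter(words)                 # every word, "filler" words included
    bigrams = Counter(zip(words, words[1:]))  # every adjacent two-word phrase
    return unigrams, bigrams

passage = "to be or not to be that is the question"
unigrams, bigrams = text_features(passage)
print(len(unigrams), "word features and", len(bigrams), "phrase features")
```

Extend that counting across an entire corpus, and longer phrases and clause types as well, and the number of variables quickly dwarfs what most scientific data sets contain.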
At the beginning of his talk, Koblin cites a media theorist who states that:
the 19th century culture was defined by the novel,
the 20th century culture by cinema,
and the culture of the 21st century will be defined by the interface [1]
Time and time again, explaining Witmore’s work to others was as much a problem of explaining its visualizations as of explaining the methods behind them. I worked with Professor Witmore and his project for about two years, and I mainly used dendrograms (see an example here) to illustrate the relationships between texts and parts of texts. Unfortunately, by using the multivariate statistics that scientists use, we also ended up using the visualizations of scientists, which is why explaining them to our humanities counterparts was so difficult. For this and many other reasons, I firmly believe the quote Koblin mentions. The quality of the interface or visualization of the data is as important as the quality of the data itself. The visualization conveys information about the data, so it should be designed to convey the specific information the author is trying to highlight.
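For readers unfamiliar with dendrograms, the sketch below shows the general kind of figure involved. The play labels and frequency values are invented for illustration, and the clustering setup is an assumption rather than a record of what our project actually ran; it simply shows how texts with similar word-frequency profiles get grouped under a common branch.

```python
# Minimal sketch: hierarchical clustering of texts by their word frequencies,
# drawn as a dendrogram. The texts and numbers below are purely hypothetical.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Rows: texts; columns: relative frequencies of a few tracked words.
labels = ["Comedy A", "Comedy B", "History A", "History B"]
freqs = np.array([
    [0.021, 0.034, 0.002],
    [0.019, 0.031, 0.003],
    [0.008, 0.012, 0.017],
    [0.007, 0.011, 0.019],
])

tree = linkage(freqs, method="ward")  # build the cluster tree
dendrogram(tree, labels=labels)       # texts with similar profiles join low in the tree
plt.ylabel("distance between clusters")
plt.show()
```

The trouble, as noted above, is that a figure like this reads naturally to a statistician but needs considerable unpacking for a literary audience.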
I think much of the problem with the digital humanities has been both a “Field of Dreams” fallacy, where “if you build it, they will come,” and a semi-blind rush to digitization. The book has been the accepted form of knowledge transfer for the last four hundred-odd years, and people do not know what to do without it. For instance, Early English Books Online (EEBO) allows searching of its collection of digitized manuscripts and early modern books by keyword, author, title, subject, bibliographic number, and year range; in essence, it is a digitized call number system. But what if you want to compare copies of a certain work or analyze the entire data set by author or year published? EEBO provides an incredible range of resources, which I myself have used extensively. However, the project’s focus is on digitization and preservation, whereas mine was usually on comparison. I think, too, that one’s ideas about the interface often depend on whether one approaches it as a librarian or as a scholar. From a librarian’s point of view, preservation and digitization are often the goal, mandated at times, whereas the scholar’s goal is knowledge creation, synthesis, and dissemination. Neither perspective is more valuable than the other; however, without both goals in mind, the book as an interface will never be replaced.
There has been effort on the scholarly side of things to work on knowledge interfaces. For example, Professor Alan Galey’s work provides interesting models for viewing the text of literature. However, without a community base (and/or data to implement them with), these models will remain just that. Ideas like Galey’s are instrumental in shifting views of digital humanities work in literary studies away from creating a hyperlink archive (i.e., a book) or digitizing a text for the sake of digitizing it. One example of this reiteration of the book in digital form is the Internet Shakespeare Editions. A play like As You Like It is patched together with the option of two different texts (like a synoptic or multi-text edition); however, the look and feel of the website/interface is that of a normal book. I believe that this approach does not truly adapt to the “digital” of the digital humanities, and that emphasis on the interface and the visualization of information will be crucial for future scholarship.
[1] http://www.ted.com/talks/aaron_koblin.html (0:26)