Genre Dependence on Character Idiolects

And yet, we know that when human beings are involved, all findings are provisional. Odd.

To extrapolate from Michael Witmore’s comments in his previous post, it is indeed odd how provisional our results are. Case in point: I have been examining what John Burrows and Hugh Craig have called the idiolect of characters in connection with the plays in which these characters’ lines occur. I stumbled upon this idea while looking at Shakespeare’s Romeo and Juliet and asking how the characters of Romeo and Juliet help steer this play towards tragedy or comedy. (This was done for a panel I presented on with Witmore and William Blake (Carnegie Mellon) at a digital salon hosted at UW-Madison. Prof. Witmore and Bill Blake are themselves working on an analysis of Hamlet without the prince, and the 1 Henry plays/Merry Wives of Windsor without Falstaff: we’re all interested in this kind of “subtraction experiment.”) My initial findings are in the gallery below.

All pictures are JMP-generated hierarchical clusters, built with Ward’s method from Docuscope frequency counts, with a best-guess analysis and the distance scale shown in the dendrogram.
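For readers unfamiliar with Ward’s method, the following is a minimal sketch of the idea in plain Python. It is not the JMP procedure used for the pictures here, and the labels and frequency values are invented purely for illustration; real input would be one row of Docuscope frequencies per text. At each step, the two clusters whose merger adds the least within-cluster variance are joined.

```python
from itertools import combinations


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
    return a["size"] * b["size"] / (a["size"] + b["size"]) * sq_dist


def ward_cluster(profiles):
    """Agglomerate labelled frequency profiles; return the merge history."""
    clusters = {
        label: {"labels": (label,), "centroid": list(vec), "size": 1}
        for label, vec in profiles.items()
    }
    history = []
    while len(clusters) > 1:
        # Pick the pair whose merger adds the least within-cluster variance.
        key_a, key_b = min(
            combinations(clusters, 2),
            key=lambda pair: ward_cost(clusters[pair[0]], clusters[pair[1]]),
        )
        a, b = clusters.pop(key_a), clusters.pop(key_b)
        size = a["size"] + b["size"]
        # The merged centroid is the size-weighted mean of the two centroids.
        centroid = [
            (a["size"] * x + b["size"] * y) / size
            for x, y in zip(a["centroid"], b["centroid"])
        ]
        labels = a["labels"] + b["labels"]
        clusters[" + ".join(labels)] = {
            "labels": labels, "centroid": centroid, "size": size,
        }
        history.append((a["labels"], b["labels"], ward_cost(a, b)))
    return history


# Invented three-feature profiles (NOT real Docuscope counts).
profiles = {
    "RJ.rev": [4.0, 2.0, 1.0],
    "RJ.w-o": [4.1, 2.1, 1.0],
    "Romeo": [4.5, 2.5, 1.2],
    "Juliet": [2.0, 5.0, 3.0],
}
history = ward_cluster(profiles)
for left, right, cost in history:
    print(left, "+", right, "merge cost:", round(cost, 4))
```

The order of the merges plays the role of the dendrogram: pairs joined early, at low cost, are the tightly knit “twigs,” while items joined last sit on separate partitions of the tree.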

The first image is of Romeo and Juliet with and without Romeo and Juliet. The original text file [followed by .rev], that same text file without either Romeo’s or Juliet’s lines [followed by w-o in the label], all of Romeo’s lines in one text file, and all of Juliet’s lines in another are each highlighted with an asterisk in the image. There are also text files for the Act divisions, with and without Romeo and Juliet’s lines, as well as individual characters’ lines with Act and Scene divisions. Despite a lot of other chatter present in this picture, if we focus on the four starred elements we can see that the play with and without the main characters clusters closer together (more easily seen using the distance scale) than anything else in the diagram. This is curious, since the play is named after those two particular characters. In addition, Romeo (as defined by the sum of his lines) is on a twig of the tree that is shared with the two versions of the play, while Juliet is located on a different partition of the tree. Juliet’s relegation is also odd, even recognizing gender biases present at the time, as it wouldn’t seem plausible for those partialities to manifest themselves within the deeper texture of the play. (I use the word “texture” for lack of a better term.) This kind of diagram raises the question of how the different pieces of these plays really fit together. It could be that Romeo’s lines were written differently from Juliet’s within the play, with the effect of either promoting Juliet through lexical individuality or demoting her by making her an outlier at the most minute linguistic level. Or, working from previous genre distinctions, it could be that Romeo is an extremely tragic character who matches the overarching context of his tragedy. But would that mean that Juliet is a less tragic figure, or rather a comic figure by comparison?
And what does it mean that all four highlighted elements are on the deepest and densest “roots” of the tree?  Does the slight difference between the characters even allow for these conclusions, or are our conclusions not going far enough?
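For readers who want to try a subtraction of this kind themselves, the four starred files could be produced roughly as follows. This is only a sketch under a loud assumption: that the play is already available as a list of (speaker, speech) pairs. The function name, the speaker tags, and the four-speech sample are hypothetical stand-ins, not the files used for the diagrams.

```python
def subtract_characters(speeches, removed):
    """Return the play text without the removed speakers, plus one text per removed speaker."""
    remainder = []
    solo = {name: [] for name in removed}
    for speaker, text in speeches:
        if speaker in solo:
            solo[speaker].append(text)   # goes into that character's own file
        else:
            remainder.append(text)       # stays in the "w-o" version of the play
    without = "\n".join(remainder)
    by_character = {name: "\n".join(lines) for name, lines in solo.items()}
    return without, by_character


# A tiny stand-in for the full play (assumed markup: one tuple per speech).
speeches = [
    ("ROMEO", "But, soft! what light through yonder window breaks?"),
    ("JULIET", "O Romeo, Romeo! wherefore art thou Romeo?"),
    ("MERCUTIO", "A plague o' both your houses!"),
    ("NURSE", "I am aweary, give me leave awhile."),
]

whole_play = "\n".join(text for _, text in speeches)
without, by_character = subtract_characters(speeches, ["ROMEO", "JULIET"])
print(without)
```

The four resulting texts (`whole_play`, `without`, `by_character["ROMEO"]`, and `by_character["JULIET"]`) correspond to the four starred items in the first image.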

The second print has the same four asterisked items as the first; however, they have now been introduced to the rest of Shakespeare’s corpus, while the other components from the previous diagram have been eliminated. Using the same method of analysis, this new diagram offers more conclusive evidence of how these plays fit together. As in Michael Witmore’s initial diagram of many years ago, Shakespeare’s corpus tends to cluster in groups that have historically been defined by genre distinctions. With this prior knowledge in hand, the new analysis provides a striking result: it clusters Juliet with two comedies, Comedy of Errors and Taming of the Shrew. In addition, the next level of the tree connects this trio with a larger group composed of comedies (with the exception of Othello). A holistic look at this “comic” group can be had from the red color of the labels on the left. Surprisingly enough, both versions of Romeo and Juliet are still the tightest-knit pair of the analysis, and Romeo clusters with them, with A Midsummer Night’s Dream in between. But since Romeo is the farthest outlier of this group, different implications follow depending on our reading of the first diagram. If the “texture” reading is correct, that means that Juliet is very distinctly different from the main play. However, that also means that Romeo is different when compared with A Midsummer Night’s Dream. But it would seem strange for both of the main characters’ lines to be so drastically different from the rest of a play that is named after them. If genre distinctions are used instead, then Juliet would indeed appear to be a largely “comic” figure, but this analysis also poses problems, as Romeo is separated from the main plays by a comedy. At the same time, the question arises of why the plays as a whole are located within the half of the tree made up of comedies rather than the half with tragedies and histories.
Through this lens, though, Romeo, the farthest outlier on the farthest outlying cluster, could be noted as the closest object to the other half. While this could recognize Romeo’s “tragic” nature, it again raises the question of whether the distinction is significant in any application besides this diagram.

After my initial foray into this kind of slicing and dicing of plays, I chose to move to the other titularly binary plays in Shakespeare’s corpus, i.e. Antony and Cleopatra and Troilus and Cressida. My first hypothesis revolved around the notion that Juliet has been written as a “comic” figure, which would explain why she moves around the diagram while Romeo remains close to his original play. My next two sets of findings, in turn, supported and then negated that hypothesis. (The method of creation remains the same as for the first set of portraits.)

Both the top and bottom pictures use an experimental method that includes not only the two title characters but also a third character who may easily be considered the third wheel of the play’s dynamics. I ended up choosing the Nurse in Romeo and Juliet, Charmian in Antony and Cleopatra, and Pandarus in Troilus and Cressida. The first portrait continues what I had noted in my last findings. In it, Antony clusters with Antony and Cleopatra, with and without the main characters’ lines, in addition to Romeo and the two versions of his play. Cleopatra and Juliet also cluster together on a completely different light blue twig, supporting the observation made previously with Romeo and Juliet solus. Further evidence comes from the new “third character”: Charmian clusters loosely with Juliet and Cleopatra, which expands the possible hypothesis to the claim that all of the women’s lines were written differently from the men’s. This doesn’t seem like too much of a stretch; however, the Nurse quickly puts a damper on it once one notes her position in the tree. While she is an outlier, she is an outlier away from the other women and is technically closer to Romeo than to Juliet. This contorts my hypothesis, as the Nurse has always been described to me as a comic figure, at least within my own education and readings. Would that external knowledge invert my thoughts as they are applied to this diagram, meaning that Romeo is also comic, and make Romeo and Juliet a comedy overall? This turn of course seems less likely; however, the Nurse’s placement within the tree certainly makes a difficult situation out of my standing hypothesis. In addition, the women appear to be closer to the center of the diagram than the men, excluding the Nurse. Does that signify a more uniform authorial style among the women, possibly through the use of stock characters, or that the women provided the stability within the play while the men went off on dramaturgical tangents?

The final portrait is the culmination of the work I have been able to do so far. It covers all three of Shakespeare’s plays that have two characters’ names in the title. With this picture, I didn’t change any variables except for adding four new data points from Troilus and Cressida. These four data points create a flux in the data that hasn’t been seen before. Romeo has moved away from his play and is now joined with Juliet. However, this diagram clusters Antony and Troilus well with their respective plays. All three of the “third wheel” characters have been grouped nicely together as well. Cleopatra and Cressida have also been placed in a group of comedies such as Comedy of Errors and Taming of the Shrew. This data reveals a lot of potential in the way that it divvies up each of the targeted groups. However, it ultimately resolves into a problematic coupling between Romeo and Juliet, in the sense that their play is very nearly on the other end of the diagram. This could validate the earlier suspicion raised by the Nurse, as Romeo could truly be a comic character. But now the entire theory becomes null, despite accounting for two out of the three leading males. Another factor that becomes apparent is that the data from Antony and Cleopatra didn’t split Romeo from Romeo and Juliet, but these new data points do. This could be because Troilus and Cressida is considered a comedy, while Romeo and Juliet and Antony and Cleopatra are thought of as tragedies. But would the data points change as much if only tragedies were used, even tragedies without a nominal binary in the title? Does this diagram really need any more data, or should we only be looking at diagrams like the first, which isolates the play in question? And should other authors be used to sift through this data?
In the two other diagrams below, I added some of Shakespeare’s theatrical contemporaries to the data from the initial experiment with Romeo and Juliet only.

The first diagram is of Shakespeare’s, Jonson’s, Middleton’s, and Dekker’s respective corpora together with the data from Romeo and Juliet. Ignoring a large heap of chatter, the topmost portion of the diagram, marked with asterisks, is the part of interest. Here, a large number of Dekker’s comedies and one of his masques separate Romeo and Juliet (the characters) from Romeo and Juliet (the play). The next diagram includes only Dekker and Shakespeare, and we can see that the sheer number of comedies separating Romeo from the plays has decreased. However, Juliet and her surroundings remain the same across both pictures. In addition, Dekker’s play that starts with a “ma-” represents the masque that is still present. This method of sorting almost seems to create more headaches than it solves, as a whole new slew of questions now arises from the multiple variables that Dekker’s plays bring into the mix. Michael Witmore’s statement appears to still hold very true:

And yet, we know that when human beings are involved, all findings are provisional. Odd.

While an attempt at possible conclusions seems rather futile, I will at least speak to what I want to do next. First and foremost, I would like to analyze a larger selection of Shakespeare’s tragedies, and his corpus in general, to see if any remnants of my original hypothesis are visible at a later stage of analysis. This larger selection of data points could help me determine whether the question of genre plays into the characters’ lines, as when the data from Troilus and Cressida rearranged the data from Romeo and Juliet. I would also like to do this because I have read many articles by different literary critics regarding a kind of division between “comic” women and “tragic” men in Shakespeare, and particularly in his tragedies. If literary critics could see these traits before computers even existed, then this method of diagramming the data may indeed be catching onto something. In addition, I wish to question closely both the methods and the data that I am using. For example, a different kind of multivariate analysis might be appropriate to truly isolate and extract the particular features that are being investigated. A different selection of data may also prove to be the trick to answering some of these questions. Take all of the conclusions you may have arrived at through this article, for instance, and now apply the data present in this Excel spreadsheet, in which the same electronic text files have been run through a Microsoft Word word count.
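A word-count comparison of this sort can be approximated in a few lines. The sketch below is not the spreadsheet itself: the function name is hypothetical, the texts are toy stand-ins, and splitting on whitespace is an assumption that may not match Microsoft Word’s own counting rules exactly.

```python
def word_share(texts_by_speaker, speakers):
    """Fraction of all words in the play that are spoken by the given speakers."""
    counts = {name: len(text.split()) for name, text in texts_by_speaker.items()}
    total = sum(counts.values())
    return sum(counts[name] for name in speakers) / total


# Toy stand-ins for the per-character text files (NOT real counts).
texts_by_speaker = {
    "ROMEO": "one two three",
    "JULIET": "four five",
    "OTHERS": "six seven eight nine ten",
}

share = word_share(texts_by_speaker, ["ROMEO", "JULIET"])
print(f"Romeo and Juliet together: {share:.0%} of the words")
```

Run over the real text files, the same calculation would give the title characters’ combined share of the play.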

At least for me, it is shocking that Romeo and Juliet possess nearly forty percent of all of the words in the play between them, yet the two versions of the play still cluster closely together in all of the diagrams above. These strictly numerical data allow for a looser interpretation, but they might not have such a profound impact without the JMP diagrams. The possibilities, questions, and subdivisions to be examined seem only to expand with every direction taken. However, I hope that what I have provided shows that while my results are odd, they also possess recurring elements worth noting, especially where human beings are concerned.


