To Be or Not To Be a Romance

Whilst going over older materials I have stored, I came across an article by Witmore and Hope dealing with the Romances, or Late Plays, of Shakespeare. It appeared in Early Modern Tragicomedy (2007), the twenty-second installment of the “Studies in Renaissance Literature” series. In it, Witmore and Hope write that John Fletcher’s “definition of genre not only specifies what must be in a play to qualify it for membership in a genre, but also what it must lack”. Fletcher’s account of what must be present, and what must be absent, for a play to belong to a genre resembles what I tested in my last post by adding and removing the characters that plays were named after. That test returned a mixed bag of results, pointing towards both the idiolects of the plays’ main characters and the texture of the plays themselves as the primary reasons for clustering. However, Witmore and Hope’s article sparked a new line of thought. Since readers and critics as far back as Fletcher have noted the peculiar differences between the Romances and the rest of Shakespeare’s corpus, does it follow from Fletcher’s formula that adding or subtracting characters will affect a play’s genre classification? One of Witmore’s earliest views from Docuscope was a simplified dendrogram showing genre-specific clustering produced by an unbiased word-tagging program. I have since noticed genre-related movement in the Romeo and Juliet post, and I am now trying to combine the thinking of a critic from centuries ago with a modern machine’s processes. In sum, I wish to determine whether the idiolects of the main characters help the Romances cluster differently from the rest of Shakespeare’s corpus, and whether isolating these characters’ lines from the whole play interacts with the genre-specific clustering already present.

This post also differs slightly from its predecessor despite the intrinsic similarities in the way that it was shaped. I have modified my criteria for deciding which characters to isolate. In doing so, I referred to the Pelican Shakespeare’s (1969) chart on page 31 of its General Introduction. I isolated the character who speaks the most lines in each play, rather than relying solely on the character’s name being in the title. This selection process also brings up the other significant difference between this post and the last: I am only isolating one character from each play instead of two. I stayed with the Pelican’s definition of Romances or Late Plays as well, selecting only The Winter’s Tale, Cymbeline, Pericles, and The Tempest and leaving out Two Noble Kinsmen, which the Riverside II suggests adding. In the end, I separated Leontes, Imogen, Pericles, and Prospero from their respective plays.

While pursuing this course of action, I realized that I did not possess Shakespeare’s Pericles in my database. So I culled an edition of my own from the text file available on EEBO (Early English Books Online); overall, the edition I spliced into workability is designed to match the Riverside Shakespeare II in most spellings and textual disputes. I also took some editorial liberty in creating this text so that it differs from the Riverside in much the same ways that the original files I use do; i.e. I removed the stage directions, speakers’ names, and other bracketed information inserted by previous editors. Because I did this without trying to spend a whole day on it, the text is undoubtedly more modernized than intended, given the speed and method of the process. I did this task in part because I didn’t have the file and in part because I wanted the experience of doing it. But after reviewing my work on this single file, I have deemed it fair to make the file public so that others don’t have to repeat the process. You may access the file at the link below.


As far as the research results go, below is a picture of my edition of Pericles together with the other three Romances from the original data set. It is immediately the farthest outlier, which made it problematic to introduce my edition safely to the rest of the corpus, given its questionable origins.

But after bringing in more data points, like the other versions of the plays and the isolated lines of the selected characters, it quickly appeared to me that Pericles leveled out in terms of stability.

*Both of these diagrams were produced using Hierarchical Clustering from JMP, using Ward’s method, and Frequency Counts from Docuscope, with a best guess analysis and distance scale present in the dendrogram.  These data sets are analyzed at the cluster, or highest level, of Docuscope’s results.  All others following are the same unless otherwise marked.
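For readers who want to try the same kind of clustering outside JMP, here is a minimal sketch of agglomerative clustering with Ward's merge criterion, written in plain Python. It is not JMP's implementation, and the play labels and frequency values are invented placeholders, not Docuscope output. The `cost` reported for each merge is the increase in within-cluster sum of squares; JMP's distance scale on the dendrogram may be scaled differently.

```python
from itertools import combinations

def centroid(points):
    """Mean vector of a list of equal-length numeric vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist

def ward_cluster(profiles):
    """Agglomerate labelled frequency profiles; return the merge history.

    profiles: dict mapping a label to a list of per-category frequencies.
    Returns a list of (labels_a, labels_b, cost) tuples in merge order.
    """
    clusters = {(label,): [vec] for label, vec in profiles.items()}
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters whose merge is cheapest under Ward's criterion.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: ward_cost(clusters[pair[0]], clusters[pair[1]]),
        )
        cost = ward_cost(clusters[a], clusters[b])
        merges.append((a, b, cost))
        clusters[a + b] = clusters.pop(a) + clusters.pop(b)
    return merges

# Hypothetical, made-up frequencies (percent of tokens in three invented categories).
toy = {
    "Play A": [10.0, 5.0, 1.0],
    "Play B": [10.5, 5.2, 1.1],
    "Play C": [3.0, 9.0, 4.0],
    "Play D": [2.5, 9.5, 4.2],
}

for left, right, cost in ward_cluster(toy):
    print(left, "+", right, f"cost={cost:.3f}")
```

The merge history is the information a dendrogram draws: items that join early at low cost sit close together, and the cost at which two branches finally join corresponds to the point where their stems cross on the diagram.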

*The labels read as follows: Pericles rev = original text file (rev), Pericles w-o = original text file without Pericles’ lines (w-o), and Pericles = all of the character’s lines in one file (no suffix)
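The three versions named in these labels can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it supposes the play is available as a list of (speaker, text) pairs, which is a hypothetical input format and not how the files used in this post are stored, and the toy speeches are placeholders rather than lines from any play.

```python
def split_play(speeches, character):
    """Build the three texts the labels describe.

    speeches: list of (speaker, text) pairs in play order
              (a hypothetical input format, assumed for illustration).
    character: the speaker whose lines are to be isolated.
    """
    whole = [text for _, text in speeches]
    without = [text for speaker, text in speeches if speaker != character]
    only = [text for speaker, text in speeches if speaker == character]
    return {
        "rev": "\n".join(whole),        # the whole play
        "w-o": "\n".join(without),      # the play without the character
        "character": "\n".join(only),   # the character's lines alone
    }

# Placeholder speeches, not quotations from any play.
toy_speeches = [
    ("X", "x first"),
    ("Y", "y first"),
    ("X", "x second"),
]

parts = split_play(toy_speeches, "X")
for label, text in parts.items():
    print(label, "->", repr(text))
```

Each of the three texts would then be tagged and counted separately before entering the clustering.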

Be sure to note that Cymbeline and Winter’s Tale cluster in the first image when only surrounded by whole plays, but Pericles and Cymbeline become a pair when the more diverse data set enters.  Leontes and Prospero also cluster with their respective plays when compared to Pericles and Imogen.

To settle any lingering doubt about my edition of Pericles, since it shows the greatest statistical distance between its two versions (measured on the diagram from the name/stem to the point where the connections cross), I attempted to validate my claim to stability by comparing the strict numerical data against my previous post as well as its contemporary environment. What I calculated is below.

I noticed that almost all of the percentage values translate directly between words and lines. Only Leontes and Imogen vary slightly between their two percentages. However, I also noted that a single character’s share falls roughly between sixteen and thirty percent. Pericles does fall within that range, as established by the original data set. These findings also support my choice to use the Pelican’s data to choose the character from each play, as the “main” character conveniently happens to speak the most in the play overall.
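The two percentages compared here, a character's share of lines and share of words, can be computed along the following lines. This is a rough sketch, not the calculation actually used for the figures in this post: it assumes a hypothetical list of (speaker, text) pairs, counts lines by line breaks, and counts words by whitespace, all of which are illustrative choices.

```python
def share_of_play(speeches, character):
    """Return (percent_of_lines, percent_of_words) spoken by character.

    speeches: list of (speaker, text) pairs (hypothetical input format).
    Lines are counted by line breaks and words by whitespace splitting.
    """
    total_lines = total_words = 0
    char_lines = char_words = 0
    for speaker, text in speeches:
        n_lines = len(text.splitlines())
        n_words = len(text.split())
        total_lines += n_lines
        total_words += n_words
        if speaker == character:
            char_lines += n_lines
            char_words += n_words
    return (100 * char_lines / total_lines, 100 * char_words / total_words)

# Placeholder speeches, not quotations from any play.
toy_lines = [
    ("A", "one two three\nfour five"),
    ("B", "six seven"),
    ("A", "eight"),
    ("C", "nine ten eleven twelve"),
]

line_pct, word_pct = share_of_play(toy_lines, "A")
print(f"lines: {line_pct:.1f}%  words: {word_pct:.1f}%")
```

In the toy data the two shares differ because speaker A's lines are shorter than average; when line lengths are roughly even across speakers, the two percentages track each other closely, which is the pattern described above.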

It is interesting to note that in Troilus and Cressida, neither of them has a majority of lines yet their names are in the play’s title.  This also occurs with Cymbeline.  Despite the title, it is Imogen who claims the largest percentage of the play.  These reversals between title and “main” character are interesting indeed, but for pondering another day.

Returning to the visualizations, below is Shakespeare’s corpus incorporating thirty-seven plays and with the plays in question colorfully highlighted.

As you might notice by referring back to the first diagram, the introduction of new data points preserves the cluster of Winter’s Tale and Cymbeline. In fact, the two cluster closer together now than in the first diagram. Pericles and Tempest become closer as well but don’t form a complete cluster themselves. However, this diagram is problematic in more ways than meet the eye. Firstly, while it adeptly isolates the Histories, the other genre groups have been skewed and commixed. If you look at the top-left diagram in my previous post, Epic, you will see more clearly that three groups were defined initially rather than just one. As mentioned, this poses problems, as there is now no clearly visible group of Romances and therefore nothing to test our subtraction hypothesis on. Secondly, this debunks the security of Pericles in the set and raises the question of whether it is my editing or the text of the play itself that has so drastically changed the arrangement of the data set.

In Shakespeare’s Modern Collaborators, an installment of the Shakespeare Now series, Prof. Lukas Erne argues, using a case from Tempest, that modern editing can have significant impacts on a Shakespearean play. For instance, in Act 2.1.85-97:

“Gonzalo’s ‘Ay’ renders in modernized spelling the First Folio’s ‘I’ (TLN 766), and the Riverside edition plausibly glosses the word as ‘a sarcastic expression of approbation’.  It seems equally possible, however, that Gonzalo is starting a sentence with the first person pronoun but is immediately interrupted by Antonio.   If so, Antonio’s self-serving rudeness would seem very much in character, depriving honest Gonzalo of speech as he had earlier deprived Prospero of his dukedom.  As the present example suggests, editorial decisions in modernizing the spelling can easily affect characterization” (17).

This casts serious doubts on my own practices in making Pericles, namely the speed and generalizations with which it was done. However, many of the assumptions and decisions that my posts rest upon treat small instances like this as negligible when compared to the rest of the data. In addition, seeing that Docuscope only categorizes roughly seventy-five percent of its input, it is even probable that small differences like the one Prof. Erne points out would go unnoticed by Docuscope and never reach its output to JMP. But even if my sloppy methods of modernization tampered with the results, we are still left with relatively unworkable results for the subtraction theory.

Luckily, Michael Witmore published a similar analysis using the same methods with a far superior data set in his post Docuscope Goes Live. In that post all of Shakespeare’s works may be seen clustered together against one hundred and fifty years of contemporary plays. In that diagram, Cymbeline and Winter’s Tale cluster together in a smaller group, but I cannot find a Pericles play, at least not one labeled as Shakespeare’s. The presence of Pericles in that diagram under a different author’s name, or its absence, reinforces the doubts surrounding Pericles. Since a nearly identical structure can be seen in the linked pictures on both of our blogs, it is fairly safe to assume that it is the play itself that scrambles the previously defined data.

So I have now taken the original thirty-six plays and added the subtraction experiment previously outlined, minus the Pericles data.

This diagram shares many similarities with the original corpus, in the sense that Merry Wives is still an outlier and the Histories remain perfectly preserved. Tempest, Cymbeline and Winter’s Tale also remain in very similar locations on the map, but now the other elements have moved around. For instance, Hamlet, Troilus and Cressida, King Lear, and Timon of Athens all leave the Green labeled branch, but Midsummer Night’s, Romeo and Juliet, and Julius Caesar all remain with Tempest. A movement this large, with all four departing plays moving towards the exterior of the diagram, certainly cannot be incidental. In fact, the entire section of plays between As You Like It and Cymbeline remains unchanged yet moves toward the center of the tree. That also brings up the breaking of the previously unmalleable relationship between Cymbeline and Winter’s Tale. Throughout the corpus-only pictures in this post, these two plays have clustered together tightly, but now they are separated. However, these two plays also break up in the second picture when their respective characters are isolated. Does this mean that Imogen and Leontes are a type of principal component in the way that they polarize these plays? And is their presence providing the impetus for the large-scale movement of plays described earlier?

This rationale for the characters’ lines is supportable; however, if you look at one of the pictures in my previous post, Genre Dependence on Character Ideolects, you will see this separation of Cymbeline and Winter’s Tale without their main characters removed. Instead, it was the introduction of Dekker’s works (the playwright who also produced the most movement for elements of Romeo and Juliet) that separated them. Or rather than separating them, it caused them to cluster with All’s Well that Ends Well and Merchant of Venice instead of with each other, as we see here. Could that suggest an implicit statistical similarity between Dekker’s works and these characters’ lines? Maybe so; however, through this experiment I am able to conclude that isolating these characters’ lines does affect the plays’ genre clusters previously present, especially in redefining these three particular plays as a previous cluster of Romances.

Also included are three diagrams that didn’t end up supporting my post but still have interesting data within them.


The thirty-seven play corpus with Romance elements added on a cluster level analysis.

The thirty-seven play corpus with Romance elements added on a dimension analysis.

The thirty-six play corpus with only the plays with and without their characters, in Cluster Analysis.




2 responses to “To Be or Not To Be a Romance”

  1. This analysis is fascinating. Why, Stumpf is asking, are some characters “sheddable” — why can a play remain itself texturally without certain leading characters — and what does this say about the contribution of different characters to larger genres. The answer is, it depends. And in the case of Imogen, a stock female character, we see a lack of intrinsic connection between character and dramatic mode.

    So, what about Miranda? Definitely not a stock character, but perhaps there’s not enough of her as a speaker (in terms of percentage of lines) to offer an accurate text. I look forward to reading more!

  2. Pingback: Action Is Eloquence | All Is True
