Online articles

Thoughts on ‘Low-tech’ Digital Humanities

[Note: This article first appeared in the 4 April 2022 edition of the Talking Humanities blog at the School of Advanced Study, University of London.]

Last year I visited the Waste Age exhibit at the Design Museum in London. I will never forget my immediate sorrow at seeing a massive bottle-top chain made with collected waste from beaches in Cornwall in only a few weeks in winter 2015 by the Cornish Plastic Pollution Coalition, alongside the various small exhibitions of what the curators deemed our Throwaway Culture.

Through a powerful combination of data visualisations, multimedia exhibits of waste, art works, and educational material, the Waste Age exhibit provided ample reminders that an insatiable desire to consume, innovate, and profit is helping to destroy the natural world. But it also made clear that digital technology is not innocent in the matter: there is not only planned obsolescence in the tech industry, but also failed recycling initiatives and an innovation-at-all-costs attitude.

Every technical decision we make requires energy. Digital technology would not be possible without oil and minerals; even the so-called ‘cloud’ is powered by raw materials which need to be mined. Every phone we buy, computer we use, email we send, Google doc we create, Zoom call we hold, and digital project we curate relies on physical and virtual infrastructure that is still largely dependent on fossil fuel consumption. The Internet consumes a huge amount of resources, ranging from labour and electricity to infrastructure costs. Much-hyped new technologies such as machine learning, artificial intelligence, and blockchain technologies require as much energy as some small countries. All data transfers in this data-saturated world require electricity, which creates carbon emissions, and this contributes to climate change and accessibility issues in the Global South.

What can digital humanities researchers do to resist this techno-utopian, innovation-hungry, carbon-intensive technological trap? As we digital humanities practitioners often like to say, being involved in the digital does not necessarily mean that we believe that more technology will solve all of our problems. This is where the humanities bit comes in: one of our roles in DH is to question prevailing practices and feature creep in digital research, and to educate people about the gains and losses of using certain technologies. Along these lines, for example, we have seen important work from Shoshana Zuboff, Roopika Risam, Safiya Umoja Noble, Carl Benedikt Frey, and Alex Gil that shows how technology serves as a mirror of societal defects ranging from surveillance capitalism and racism to techno-determinism and income inequality. Yet there is still more work to do to show how digital humanities can answer to the climate crisis.

Minimal computing offers an important set of thinking tools to make responsible and low impact digital decisions. As Jentery Sayers has written, ‘a minimal approach reduces the need for not only substantial storage and processing power but also a reliance on middleware, databases, peripherals, and substantial pieces of hardware. Such reduction should increase access while decreasing technology’s environmental effects (eg, by reducing waste and energy consumption).’ Minimal computing is a set of practices that aims to reduce barriers to access, engagement, and critical nuance. By focusing on sharing content without bells and whistles (ie software dependencies independent of content), users will have a better chance to access the essential data that a project curates. The label ‘low tech’ does not mean unsophisticated—it means efficient and more accessible.

At the Digital Humanities Research Hub, one of our values is responsible and low-impact technology. This value aligns with a variety of climate interests, including minimal computing, green computing, and agile and collaborative computing. Low-tech solutions can reduce data transfer by up to 70 per cent in comparison to regular, database-driven websites by identifying only the most necessary components for communicating research online. In this respect we are inspired by Low Tech Magazine, which operates an entire web site with a solar panel situated in Barcelona.

The DH Hub is actively developing and supporting low-tech and minimum viable product-oriented solutions for digital research. For example, when I teach Digital Scholarly Editing courses at the London Rare Books School, I not only show students how to create minimalist data models for their projects but also give them a range of responsible and low-maintenance publishing options. One of the research projects I work on, the Herman Melville Electronic Library, also uses minimal computing principles for its web site, which I discussed during our 2021 seminar series. And finally, the DH Hub is leading a working group to develop a tool-kit, Sustainable Digital Technologies for the Arts and Humanities, which will be released this summer through the Digital Humanities Climate Coalition. These activities reflect our commitment to equip researchers with the tools to make more informed decisions about the environmental consequences of their technological choices.

Figures for Publishing Scholarly Editions: Archives, Computing, and Experience

Anyone who accesses the PDF or print copy of my book Publishing Scholarly Editions may notice that some of the figures came out a little blurry. I have included the original image files below in case anyone would like to examine them closely.

On Tony Harrison’s ‘The Icing Hand’, from Illuminations

[Note: this was originally posted on Newcastle Poetry Festival 2020 Inside Writing showcase.]

Dr Christopher Ohge, Lecturer in Digital Approaches to Literature at the IES, reads ‘The Icing Hand’ by Tony Harrison from the Bloodaxe Books online archive and shares his thoughts on the poem.

I have long admired Tony Harrison’s poetry. He is justly regarded for giving dignity to the working people of his youth. Harrison is also grouped with that famed cadre of ‘Leeds poets’ after the Second World War, which included some other of my favourites, Geoffrey Hill and Jon Silkin among them.

I chose ‘The Icing Hand’ for several reasons. The first is that I find it to be a beautiful combination of techne (craft) and thought, which is evident not just in Illuminations but also in the collection in which it appeared, The School of Eloquence (1978), which is my favourite of his collections. This accessible poem has an incredible economy of words and is also teeming with philosophical significance and literary echoes. It meditates on the power of ephemeral experiences. His father makes beautiful cakes, then they are demolished (with pleasure) at happy occasions; the child and his father make sand castles that eventually get swept away. This may be a rejoinder to Tennyson’s imagining the ‘topmost froth of thought’ in poetry.

I cannot love thee as I ought,
For love reflects the thing beloved;
My words are only words, and moved
Upon the topmost froth of thought. (In Memoriam LII)

Yet Harrison’s wonderful turn in the poem, that his father’s hand ‘guides / my pen when I try shaping memories of him’, also recalls those wonderful opening lines to Seamus Heaney’s ‘Digging’: ‘Between my finger and my thumb / The squat pen rests’. He allows the returning waves to fill his memory––the froth settling on the beach after the receding wave.

I also chose this poem from the Bloodaxe Archive because, as a textual scholar, I love to see page proofs and holograph corrections. This poem includes an intriguing variant: in the third stanza, the proof initially read that ‘one wave-surge sweep / our wrinkle-stuccoed edifice away’. That sounded right to me at first, but it does not make sense upon reflection. It is corrected to ‘winkled’, a tricky word, but exactly the right word, being a shortened version of periwinkle, a spiral shell of a mollusc that covers the castle. This is perhaps another echo of Tennyson, from the same stanza with the ‘topmost froth’: ‘Abide: thy wealth is gather’d in, / When Time hath sunder’d shell from pearl’. For Harrison, the wave is time; it is cyclical. Ruminative. An imitation of the father with a different medium, yet sitting in between two worlds––of craft and thought? A dance of the intellect, looping over constructed memories through the construction of poems.

Reading the poem aloud reinforces its meditative qualities. The meter is not consistently iambic (which I think is a virtue), and he employs thoroughly English tactics of alliteration and consonance. If you read the first two words as a trochee (about a kind of troche, no less?), as I do, you can hear how the poem begins with an energetic spirit. All good poems have an internal variation of pacing, and you can certainly see that working here. While the first three lines of the first stanza left me nearly breathless, as with the last three lines of the second stanza, Harrison ends that stanza with ‘hope to swim’––an apt phrase for me as the reader. The doubling of remembrance in the final stanza layers the poem about the similarity of cake-making and sand castles. It shows the dignity of quiet habit, of the creation of gritty beauty that might have no lasting value, of the likelihood that your work will be swept away by nature. It is, as T. S. Eliot said, in ‘the trying’ that the poet realises that ‘every attempt / Is a wholly new start, and a different kind of failure’ (‘East Coker’, Four Quartets). Yet there is something wonderfully earnest too in the page proof in which Harrison fixed a mistake.

The last two stanzas of ‘The Icing Hand’ have even more poetic abruptions, culminating in a final line with five commas, seemingly resisting ‘floods’ with the topmost froth of an agile mind.

The Making of an Anti-Slavery Anthology: Mary-Anne Rawson and The Bow in the Cloud

[Please note: this blog was also posted on the John Rylands Research Institute’s blog, but I have made a few minor revisions.]

Part 1

A Definitive Object, a Solidity of Purpose

“The plan appears to me very promising, and I hope, and that its success will further the amiable design of its formation––as a publication … which though so attractive to the Eye, and in some of their contributions, so touching to the heart, have always seemed to me as wanting a definitive object, a solidity of purpose…”

(Mary Sterndale to Mary-Anne Rawson, 28 February 1833)

The anti-slavery anthology The Bow in the Cloud, published in 1834, came with international aspirations. It was roughly an 8-year project, one that sought to bolster various abolitionist movements throughout the world. The making of the book spanned the period from when slavery had been abolished in the UK (but was still legal in the colonies) to when it was abolished in the colonies but still very much alive in many other parts of the world. It was initiated in 1826 by Mary-Anne Read, a young activist in Sheffield encouraged by the example of her parents, who were well-known philanthropists, to use literature to influence public opinion. Yet the project stalled, for reasons that are not yet known. By mid-1834, Mary-Anne, now married to George Rawson, and still living in Sheffield, had re-ignited the project and shepherded The Bow in the Cloud into publication with a major London publisher, Jackson and Walford.

Many of the figures in the anthology require more exploration: only about half of the contributors are in the Oxford Dictionary of National Biography or Wikipedia. Also, many of the existing entries on these contributors do not mention their abolitionism, or their literary contributions to this anthology. It is also no less significant that roughly half of the figures (including Rawson) appear in the sprawling painting by Benjamin Haydon commemorating the 1840 Anti-Slavery Convention in London, which you can see in the National Portrait Gallery in London. Notice that Rawson is prominently featured near the front (on the far right side, circled in red).

(Courtesy National Portrait Gallery, NPG 599.)

While searching for the Bow in the Cloud on Google Books, I got a result that erroneously lists the author of the book as Bonnie Barton (presumably an OCR error––a common metadata problem with Google Books––picking up the first contributor to the anthology, Bernard Barton). I have also found entries in WorldCat––and even in peer-reviewed scholarly works––mentioning Bonnie Barton’s Bow in the Cloud. This quasi-erasure of Rawson’s role in the making of this book exemplifies the amount of work that still needs to be done to recover the histories in our cultural archives.

Going beyond these oversights, the material contents surrounding the book’s publication have never been thoroughly investigated. Its editor, Mary-Anne Rawson, was a founding member of the Sheffield Ladies Anti-Slavery Society, one of the most radical abolitionist groups and one of the first to officially boycott plantation sugar and to call for universal abolition. The history of this text is crucial because the Bow in the Cloud is an early example of the politically-themed literary anthology, one which would be soon copied in the US. Its manuscript collection is significantly vast and revealing; particularly so for an anthology of this kind, with so many contributors, some of whom were not professional writers. Each submission to the anthology came with a covering letter (and some submissions have multiple letters spanning from 1826 to 1834) and some pieces also came with photographs, drawings, engravings of the authors, or newspaper clippings. The poems that Rawson chose to publish were also copied in her own hand, and several of those fair copies show evidence of her revisions to the pieces before she supplied a printer’s copy to Jackson and Walford.

The entire manuscript collection, housed at the John Rylands Library (Eng MSS 414-415), has just been digitised, as part of a digital humanities start-up grant that I received from the John Rylands Research Institute, University of Manchester, in 2018.[1] The digital images of more than 600 surviving manuscripts total 818 high-resolution files which will soon be available––with extensive metadata of each item based on my study of the manuscripts––as open-access images on a IIIF manifest.[2]
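For readers unfamiliar with IIIF, a manifest is a JSON document that lists the page images of an item alongside descriptive labels. The following is a minimal sketch of what such a manifest might look like under the IIIF Presentation API 3.0; the `build_manifest` helper, every `example.org` URL, the pixel dimensions, and the labels are hypothetical placeholders, not the John Rylands Library's actual data.

```python
import json


def build_manifest(base_url, label, pages):
    """Build a minimal IIIF Presentation 3.0 manifest as a Python dict.

    `pages` is a list of (image_url, width, height, page_label) tuples.
    All URLs are placeholders supplied by the caller.
    """
    canvases = []
    for index, (image_url, width, height, page_label) in enumerate(pages, start=1):
        canvas_id = f"{base_url}/canvas/{index}"
        canvases.append({
            "id": canvas_id,
            "type": "Canvas",
            "label": {"en": [page_label]},
            "width": width,
            "height": height,
            "items": [{
                "id": f"{canvas_id}/page",
                "type": "AnnotationPage",
                "items": [{
                    "id": f"{canvas_id}/annotation",
                    "type": "Annotation",
                    "motivation": "painting",  # the image "paints" the canvas
                    "body": {
                        "id": image_url,
                        "type": "Image",
                        "format": "image/jpeg",
                        "width": width,
                        "height": height,
                    },
                    "target": canvas_id,
                }],
            }],
        })
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": f"{base_url}/manifest.json",
        "type": "Manifest",
        "label": {"en": [label]},
        "items": canvases,
    }


# Hypothetical example: two page images from one manuscript item.
manifest = build_manifest(
    "https://example.org/iiif/bow-in-the-cloud",
    "The Bow in the Cloud manuscripts (illustrative)",
    [
        ("https://example.org/images/item-001-recto.jpg", 4000, 6000, "Item 1, recto"),
        ("https://example.org/images/item-001-verso.jpg", 4000, 6000, "Item 1, verso"),
    ],
)
print(json.dumps(manifest, indent=2)[:300])
```

Because the manifest is plain JSON served as a static file, it fits comfortably with a low-maintenance publishing approach.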

What also requires exploration is a meditation on the nature of the enterprise itself: this is an anthology, edited by a pioneering woman with specific aims that were difficult to articulate, at a crucial time in history. How were these pieces solicited, who declined, which pieces were rejected, and which were eventually published? How were they received, edited, organized, and designed? How did this kind of book affect the publishing and literary culture of abolitionism in the United States? The surviving evidence now being brought out will give the best sense yet of this unique volume’s textual history. I am currently creating a digital edition and a network analysis to bring out more of these connections in this under-appreciated anthology.


The Bow in the Cloud was published in 1834 by Jackson and Walford, in St. Paul’s Churchyard, London. The firm was probably best known as the publisher of the nonconformist, progressive magazine Eclectic Review as well as many ecclesiastical books.

The book was sold for 12 shillings, which is about 50 GBP in today’s money. Put another way, it was about two days’ pay for a skilled tradesman, or about the cost of a week’s supply of butchered meat and tea. In other words, this was a middle-class product, on the edge of affordability.


The volume appears handsome: its foolscap octavo pages (at 6¾” x 4¼”) were gilt on the edges and bound in turkey morocco with a gilded engraving on the cover. The publication’s advertisement pamphlet called attention to its quality, which is sort of true: while the goatskin-based binding is a sturdy, high-quality material, it was bound a bit too tightly, and the gilded pages were really an affectation of the publishing world that did not add significantly to the quality. The foolscap pages themselves were also small and fairly reflective of most publications of the time, so it was not an exceptional piece of craft.


The book is sizeable: it comes to 408 pages. In addition to Rawson’s Preface, it contains 95 pieces of poetry and prose.

The book’s frontispiece was designed by H. Corbould (that is, Edward Henry Corbould). Here is his original version of the etching that was printed before the book’s title page.


In 1833 the nineteen-year-old Corbould was at the beginning of a distinguished career as a book illustrator and watercolour artist. In the same year that The Bow in the Cloud was published, he received his first of several gold medals from the Society of Arts, and he later refined his literary appreciation by creating illustrated editions of Chaucer, Spenser, Shakespeare, and Tennyson. One of Corbould’s letters sent to the publishers Jackson and Walford even includes his fee for the illustration: 4 pounds 40.

But this was not the only illustration that could have been included in the book.

Ann (née Taylor) Gilbert, the well-regarded children’s poet who co-wrote with her sister Jane Taylor, sent a poem that was published as “The Mother” (she is listed in the volume as ‘Mrs. Gilbert’). Her submission came with a small watercolour illustration that Rawson did not use.


All three of Gilbert’s poems in the volume examine the pathos of shattered domesticity created by slavery, a voice that stands in contrast to what she is known for today.

Another pencil drawing by Mary Sterndale was to accompany her only published poem, “The Slave Ship”, but Rawson did not include it.


Here again is an image of the family that is intended to solicit sympathy.

These illustrations are unique examples of the wealth of textual and paratextual material in the collection, but the manuscript variations also offer several clues to Rawson’s process.

The collection includes two heavily revised drafts of her Preface. Each draft also includes unpublished notes. One of the goals of recovering this textual history is to produce a digital edition of The Bow in the Cloud, including not only unpublished manuscript material such as the covering letters, but also a genetic text showing the development of the texts that exist in multiple versions.

Here is a preview of the genetic editing tool, called TextLab, which I am using to mark up transcriptions of the manuscripts in XML according to the standards of the Text Encoding Initiative. This markup identifies several useful attributes, including the date of the manuscript, who made what changes, and other information about those changes and the medium. It also matches the transcription to its location on the page image. Here is the opening of Rawson’s Preface in TextLab’s editing environment (in minimal TEI).
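To give a sense of how revision markup supports a genetic text, here is a minimal sketch that reads a tiny TEI-style fragment and reconstructs the wording before and after revision. The sentence, the `hand` value `#ed`, and the `reading` helper are invented for illustration; they are not Rawson's text or TextLab's output, and the TEI namespace is omitted for brevity. Only the `<subst>`, `<del>`, and `<add>` element names come from the TEI Guidelines.

```python
import xml.etree.ElementTree as ET

# Invented sample: one substitution and one plain deletion.
SAMPLE = (
    '<p>This is a <subst><del hand="#ed">quick</del>'
    '<add hand="#ed" place="above">brief</add></subst> '
    'example <del hand="#ed">sentence </del>of revision.</p>'
)


def reading(element, skip_tag):
    """Return the text of `element`, leaving out any `skip_tag` descendants."""
    parts = [element.text or ""]
    for child in element:
        if child.tag != skip_tag:
            parts.append(reading(child, skip_tag))
        # Tail text follows the child but belongs to the parent's content.
        parts.append(child.tail or "")
    return "".join(parts)


paragraph = ET.fromstring(SAMPLE)
before = reading(paragraph, skip_tag="add")  # wording as first written
after = reading(paragraph, skip_tag="del")   # wording after revision

print(before)
print(after)
```

Skipping additions yields the earlier state and skipping deletions yields the later one, so a single encoded transcription can serve both readings.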


And this is how it is rendered in the diplomatic transcription interface.


I am currently marking up all the manuscripts in the collection for a digital edition so that researchers will be able to analyse the stages of revision that preceded the publication. These encoded documents will consist not only of poem and prose fair copies (many of which were revised by Rawson), but also of the original submissions that were included with covering letters. Encoding of unpublished documents will also give researchers a better sense of the varieties of work that went into bringing this anthology to the public, as well as the connections between the documents that will form the basis of a network analysis.

One unpublished note from the Preface reveals Rawson’s desire to clearly justify her role before the public:


The Editor of this little volume is not placed in the awkward predicament of many original writers, who feel it necessary to make an apology for (appearing before the public) or (for adding to the number of books already before the public). She has no apology to offer––nay––so far from feeling one needful and pleading for indulgence, she is enabled to take far higher ground––she feels that she has conferred a favour on the public especially the junior part of it, and she can unhesitating say, that she considers them a most valuable & rare collection of original papers, and that it will be the fault of the reader if he does not rise from the perusal with.

Rawson chose not to sign her Preface in the published version, so her “role” was as an anonymous editor from Wincobank Hall, Sheffield. In fact, her name does not appear anywhere in the book.

Rawson also solicited advice from some of the volume’s contributors about the Preface and other contributors’ pieces. One such clue left in the manuscripts comes from the congregationalist minister J. W. H. Pritchard’s letter from 11 April 1834. It proves not only that he helped her edit some poems in the book, but that he also offered suggestions to Rawson’s preface, which were adopted. Pritchard wrote in one instance:

The sentence [in the Preface] might admit of a change of this kind “It would indeed have been delightful if every hand which has taken a prominent part [or been actively employed] in pulling down the prison house, & in striking off the fetters of the bondsmen, could have put &c”

That phrase, as it was published on page 5, adopted some of his suggestions: ‘It would indeed have been delightful if every hand which has been actively engaged in pulling down the prison-house, and striking off the fetters of the bondman, could have put a stone into the monument here erected upon its ruins, to tell posterity where it stood, the curses it contained, and how it fell’. This phrase is also not to be found in the two surviving drafts of the Preface. It is also clear that there was an additional printer’s copy of the Preface, which does not survive, because there are still notable differences between the manuscripts in the collection and the published version. Her unpublished notes also show that she sent proof sheets to at least seven other readers––only some of whom had contributed to the book.

These hitherto undocumented details surrounding Rawson’s editing of the anthology show her as an active editor, organizer, and writer. They also reveal the extent to which she relied on her social network of anti-slavery activists, not just locally, through the Sheffield Ladies Anti-Slavery Society, but also in London and elsewhere.

In part 2 of this blog, I will go into some more detail about some of the contents of the manuscript collection surrounding this fascinating anthology.

Digital Text Analysis of Herman Melville’s Marginalia in Shakespeare [A Progress Report]

Remarks delivered at the Digital Humanities Congress, Sheffield, 6 September 2018[1]


“He had the tradition in him, deep, in his brain, his words, the salt beat of his blood. He had the sea of himself in a vigorous, stricken way… It enabled him to draw up from Shakespeare… History was ritual and repetition when Melville’s imagination was at its own proper beat.”

––Charles Olson, Call Me Ishmael (1947)

Melville was a keen appreciator of Shakespeare––this is not a groundbreaking revelation to many. Yet few scholars have closely examined Melville’s marginalia in Shakespeare to reveal that influence. In his copy of King Lear, for example, Melville noted many of the concise passages involving Lear’s tragic double-bind in the play. In the exchange that precipitates Gloster’s blinding, when Gloster declares he will live to see vengeance delivered upon Lear’s daughters, Cornwall responds, “See it shalt thou never!” Melville underlined the diabolical wit of the rebuttal and wrote in the margin, “Terrific!”

Figure 2

Terrific, as in terrifying (think of “The Serpent … with brazen Eyes | And hairie Main terrific” in Book VII of Paradise Lost). And terrifically true to Melville.

In the final scene of King Lear, when Edmund responds to Albany’s challenge by exclaiming “I will maintain | My truth and honor firmly,” Melville wrote at the top: “The infernal nature has a valor often denied to innocence.”

Figure 3

Consider that annotation in relation to the innocent nature of Starbuck in Moby-Dick: “That immaculate manliness we feel within ourselves, so far within us, that it remains intact though all the outer character seem gone; bleeds with keenest anguish at the undraped spectacle of a valor-ruined man” (Chapter 26, “Knights and Squires”). And Ahab’s valor does not seem to be isolated within himself but rather partakes in a natural will of malice, such that his crew “seemed specially picked and packed by some infernal fatality to help him to his monomaniac revenge” (Chapter 41, “Moby Dick”). “The infernal nature has a valor often denied to innocence.”

I show these examples from King Lear to illustrate, on the one hand, the importance of the evidence of Melville’s close engagement with Shakespeare and, on the other, the tendency to focus on annotation in studies of authorial reading. Markings such as underlinings, marginal scores, and checkmarks are just as important and can also reveal patterns of reading that affected Melville’s thinking.

Figure 4

Melville’s marginalia in his 1837 American edition of the Hilliard, Gray Dramatic Works of William Shakespeare offer a strong case for digital text analysis among books that survive from his library. He marked thirty-one plays in the seven-volume set, comprising 681 distinct passages with marginalia that can be attributed to Melville. Previous attempts by scholars to count the marginalia were significantly off, but working with Steven Olsen-Smith and a team of contributors, I have applied computational approaches to reading the data of the marginalia. What this shows is that text mining approaches, while often used for large data sets, can also effectively aid close reading and facilitate new discoveries. This kind of text analysis may be even more important for marginalia, which is fragmentary by nature. It is a curated selection of words within a coherent text.

Interpretive stakes are high for analysing Melville’s marginalia to Shakespeare’s plays. Aside from his comments on his friend and fellow author Nathaniel Hawthorne in “Hawthorne and His Mosses,” Melville’s pronouncements on Shakespeare in the same essay constitute his most detailed assessment of a writer whose works survive, heavily marked and annotated, from his library. Their main purport boils down to the following extraordinary passage:

But it is those deep far-away things in him; those occasional flashings-forth of the intuitive Truth in him; those short, quick probings at the very axis of reality;—these are the things that make Shakespeare, Shakespeare. Through the mouths of the dark characters of Hamlet, Timon, Lear, and Iago, he craftily says, or sometimes insinuates the things, which we feel to be so terrifically true, that it were all but madness for any good man, in his own proper character, to utter, or even hint of them.

Shakespeare’s genius hinges on interconnected notions of rhetoric, sentiment, and reception. His most profound disclosures, with their philosophically bleak implications, are made in few words, whether more or less directly or stealthily and by insinuation. Their potentially baneful effects on readers––who are intellectually and temperamentally unprepared for them––necessitate that craftiness.

Melville’s Marginalia Online is one of the most advanced digital projects devoted to an author’s personal library and marginalia. In addition to the data entry for the front end of the digital archive, staff members of the project have also been marking up the digitised books by Melville using coordinate-capture XML markup, which allows for word searching that also highlights the results on the facsimile image of the book page.

Figure 6

See, for example, Melville’s first marking in The Tempest:

<div id="2" x="277" y="2415" group="1" width="1299" height="129" type="checkmark" sealts="460_1_c011" attribution="HM" mode="comedy" play="1a">
  <w x="416">That</w>
  <w x="526">this</w>
  <w x="653">lives</w>
  <w x="726">in</w>
  <w x="815">thy</w>
  <w x="1023">mind?</w>
  <w x="1197">What</w>
  <w x="1344">seest</w>
  <w x="1469">thou</w>
  <w x="1574">else</w>
  <div id="3" x="277" y="2479" group="1" width="1075" height="74" type="underline" sealts="460_1_c011" attribution="HM" mode="comedy" play="1a">
    <w x="353">In</w>
    <w x="446">the</w>
    <w x="580">dark</w>
    <w x="836">backward</w>
    <w x="943">and</w>
    <w x="1124">abysm</w>
    <w x="1192">of</w>
    <w x="1345">time?</w>
  </div>
</div>

Each marking is encoded as a <div> element which comes with several attributes identifying aspects such as the play to which it belongs, the play’s mode, and other relevant information that aid the text analysis. Each word is marked with a word element (<w>) and a coordinate attribute that points to its place on the page facsimile. The example also features an embedded marking (an underline) within the checkmarked passage.
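As a rough illustration of how this encoding can be read by a program, the sketch below parses the Tempest fragment and lists each marking with its type, its words, and their x-coordinates. It uses Python's standard library rather than the project's own tooling; the `describe_markings` function is hypothetical, and the closing `</div>` tags are an assumption about how the nested markings end.

```python
import xml.etree.ElementTree as ET

# Fragment adapted from the Tempest example; the closing </div> tags are an
# assumption about how the nested markings are terminated.
FRAGMENT = """
<div id="2" x="277" y="2415" group="1" width="1299" height="129"
     type="checkmark" sealts="460_1_c011" attribution="HM" mode="comedy" play="1a">
  <w x="416">That</w> <w x="526">this</w> <w x="653">lives</w>
  <w x="726">in</w> <w x="815">thy</w> <w x="1023">mind?</w>
  <w x="1197">What</w> <w x="1344">seest</w> <w x="1469">thou</w>
  <w x="1574">else</w>
  <div id="3" x="277" y="2479" group="1" width="1075" height="74"
       type="underline" sealts="460_1_c011" attribution="HM" mode="comedy" play="1a">
    <w x="353">In</w> <w x="446">the</w> <w x="580">dark</w>
    <w x="836">backward</w> <w x="943">and</w> <w x="1124">abysm</w>
    <w x="1192">of</w> <w x="1345">time?</w>
  </div>
</div>
"""


def describe_markings(root):
    """Summarise every <div> marking, including markings nested inside others."""
    markings = []
    for div in root.iter("div"):  # iter() visits the root and its descendants
        # A marking's words include those of any marking embedded within it.
        words = [(w.text, int(w.get("x"))) for w in div.iter("w")]
        markings.append({
            "id": div.get("id"),
            "type": div.get("type"),
            "play": div.get("play"),
            "mode": div.get("mode"),
            "text": " ".join(text for text, _ in words),
            "x_positions": [x for _, x in words],
        })
    return markings


markings = describe_markings(ET.fromstring(FRAGMENT.strip()))
for marking in markings:
    print(marking["type"], "|", marking["text"])
```

Here the checkmarked passage is treated as containing the underlined words as well, which matches the embedded structure described above; a different analysis might prefer to count them separately.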

As a result of this XML markup, a user can now undertake word searches on the site. Here I have entered “fear,” which, as I show later, is a prominent negative-sentiment word among Melville’s markings.

Figure 8

Melville marked the word “fear” in several of his books, including his other major source for Moby-Dick, Thomas Beale’s Natural History of the Sperm Whale. But the word appears 17 times in the Shakespeare marginalia.

Figure 9

By clicking on the Shakespeare results one can see more details about the words––the first result features two instances of “fear” as well as a corresponding annotation.

Figure 10

One can click on that particular result to see the book page with the search results highlighted therein. Here Melville is showing his intertextual prowess, cross-referencing the death-as-sleep metaphor in Measure for Measure with another instance of it in The Tempest. These individual and nuanced explorations of Melville’s marginalia are a huge boon to researchers, but we are now using text analysis and data mining to reveal more about the total corpus of these markings.

How important was Shakespeare to Melville’s style? As an experiment, I ran a stylometry calculation using the “stylo” package in R.

Figure 11

Stylo’s accuracy shows in this visualisation, which groups most of Melville’s works near each other in what might be thought of as a stylistic family tree. The greater proximity of Melville’s writings to Shakespeare’s shows that, from a linguistic perspective, Melville’s style is a closer cousin to all of Shakespeare’s plays than to Homer or Milton.
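For readers curious about what sits behind such a tree, the following sketch computes Burrows's Delta, a standard stylometric distance that stylo offers among its options. The talk does not say which distance or settings were used, so this is an illustration of the general technique rather than a reconstruction of the analysis; it is written in Python rather than R, and the three toy texts and the `burrows_delta` function are invented.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_frequencies(text):
    """Map each word to its share of the text's tokens."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    return {word: count / len(tokens) for word, count in counts.items()}


def burrows_delta(texts, top_n=5):
    """Return {(name_a, name_b): delta} for every pair of texts."""
    freqs = {name: relative_frequencies(text) for name, text in texts.items()}
    # Features are the most frequent words across the whole toy corpus.
    corpus_counts = Counter(" ".join(texts.values()).lower().split())
    features = [word for word, _ in corpus_counts.most_common(top_n)]

    # Standardise each feature (z-score) across the texts.
    z_scores = {name: [] for name in texts}
    for word in features:
        values = [freqs[name].get(word, 0.0) for name in texts]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # the word does not discriminate between these texts
        for name in texts:
            z_scores[name].append((freqs[name].get(word, 0.0) - mu) / sigma)

    # Delta is the mean absolute difference between two texts' z-scores.
    names = list(texts)
    return {
        (a, b): mean(abs(x - y) for x, y in zip(z_scores[a], z_scores[b]))
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Invented toy texts: text_a and text_b share first-person phrasing.
toy_texts = {
    "text_a": "i see the sea and i fear the sea",
    "text_b": "i fear the dark and i see the sea",
    "text_c": "he sang of arms and he sang of the man",
}
deltas = burrows_delta(toy_texts)
for pair, value in sorted(deltas.items(), key=lambda item: item[1]):
    print(pair, round(value, 3))
```

A smaller Delta means two texts use the most common words in more similar proportions; clustering those pairwise distances is what produces a tree of stylistic neighbours.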


Figure 12

Stylo can also group the most distinctive words in Melville’s reading as compared to his own words in his works. Among the most distinctive words (other than function words) in the whole texts of Homer, Milton, and Shakespeare, “honour,” “grace,” “son,” and “father” stand out, suggesting themes of virtue and legacy. On the other hand, some words in Melville’s writings that diverge the most from these readings––such as “seemed,” “moment,” “like,” and “something”––are related to perception. These discoveries are catalysts for new analytical directions.

Why might Shakespeare be closer than Homer and Milton? Stylo also reveals that Melville, like Shakespeare, was drawn more to the first person (unlike Homer and Milton). This R code (shown in RStudio) bolsters this conclusion:

Figure 13
A table of the most frequent bigrams in the Shakespeare marginalia, highlighted in red, shows a heavy prevalence of first-person constructions––five out of the top ten, in fact. No other pronouns appear.
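The bigram table itself was produced in R; as a hedged equivalent, a bigram count can be sketched in a few lines of Python. The sample string and the small set of first-person words are invented for illustration and are not drawn from the marginalia data.

```python
import re
from collections import Counter

# Assumed list of first-person forms; a fuller analysis might extend it.
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our"}


def top_bigrams(text, n=10):
    """Return the n most frequent adjacent word pairs with their counts."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(zip(tokens, tokens[1:])).most_common(n)


def first_person_count(bigrams):
    """Count how many of the given bigrams contain a first-person word."""
    return sum(1 for (first, second), _ in bigrams
               if first in FIRST_PERSON or second in FIRST_PERSON)


# Invented toy sample, not text from the marginalia.
sample = "i will go and i will stay and i will see"
top_two = top_bigrams(sample, n=2)
print(top_two)
print(first_person_count(top_two), "of", len(top_two), "contain a first-person word")
```

Note that this simple tokeniser pairs words across sentence boundaries; for fragmentary marked passages one might instead count bigrams within each marking separately.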

The XML encoding of each marking on the word-level facilitates computational analyses of the markings.

Figure 14
With an XSLT transformation, we produced word counts for each play.

Figure 15

One can quickly learn a lot––and pose some questions––from these results. There are some surprises: Melville marked more words in the comedies than in the tragedies; among the comedies, he marked the most words in Measure for Measure (well, that might not be so surprising, because it is a dark comedy); the tragedy with the most markings is Antony and Cleopatra; and he marked more words in Henry VIII than he did in Hamlet or King Lear. We can already see how far we have come since I showed you the two intriguing annotations in King Lear––a play which actually represents a small percentage of his notes in Shakespeare.

Given the apparent differences between the comedies and tragedies, we realised that we needed to calculate the word count of each marking, as well as the average word count per marking, in each play mode.

Figure 16

It turns out that Melville marked much shorter passages in the tragedies than in the comedies––an average difference of about 10 words.
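
The team did this counting with XSLT. As a rough equivalent, the Python sketch below counts the words in each marking and averages them by play mode. The `<marking>` element and its `play` and `mode` attributes are hypothetical stand-ins, not the MMO encoding, and the three sample passages are only the ones quoted in this talk.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical markup for illustration; the real MMO files are encoded differently.
SAMPLE = """<markings>
  <marking play="Measure for Measure" mode="comedy">O, it is excellent to have a giant's strength</marking>
  <marking play="King Lear" mode="tragedy">Truth's a dog that must to kennel</marking>
  <marking play="Hamlet" mode="tragedy">Virtue itself of vice must pardon beg</marking>
</markings>"""

def words_per_marking(xml_text):
    """Return a list of (play, mode, word_count), one entry per marking."""
    root = ET.fromstring(xml_text)
    rows = []
    for marking in root.iter("marking"):
        n_words = len((marking.text or "").split())
        rows.append((marking.get("play"), marking.get("mode"), n_words))
    return rows

def average_by_mode(rows):
    """Average word count per marking, grouped by play mode."""
    grouped = defaultdict(list)
    for _play, mode, n_words in rows:
        grouped[mode].append(n_words)
    return {mode: sum(ns) / len(ns) for mode, ns in grouped.items()}

rows = words_per_marking(SAMPLE)
averages = average_by_mode(rows)
print(rows)
print(averages)
```

Splitting on whitespace is the simplest possible definition of a word; a word-level encoding like the one described above would make the count more exact.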

R code adapted from Matthew Jockers’s Text Analysis with R for Students of Literature can also calculate other linguistic features. Here is a graph of lexical uniqueness (calculating hapax legomena, or words that only occur once in a corpus).

Figure 17
Again, the markings in the tragedies have the highest lexical variety––that is, the highest percentage of unique words. This can also be read as a general index of briefness in passages and sections. Now let’s consider the low average word counts and high lexical variety in the tragedies in light of Melville’s quote about Shakespeare: “But it is those deep far-away things in him; those occasional flashings-forth of the intuitive Truth in him; those short, quick probings at the very axis of reality;—these are the things that make Shakespeare, Shakespeare.”

Short, quick probings––as reflected in his own notes to the text. Construed, then, within the framework of esoteric expression he attributed to Shakespeare, Melville’s preoccupation with the bleakness of worldly and human conditions in his marginalia to the Dramatic Works corresponds with the views he expressed in “Hawthorne and His Mosses.”

Melville’s mentioning the four primary dark characters moved us to test the lexical uniqueness in the plays in which they appear.

Figure 19

Indicators of brief marked passages carry heightened prospects of significance; for it is in such marginalia that we may expect to encounter what Melville described as Shakespeare’s “short quick probings at the very axis of reality”. The passages with maxed values offer a number of different candidates for the sorts of disclosures Melville had in mind, including “Virtue itself of vice must pardon beg” (Hamlet) and “Truth’s a dog that must to kennel” (King Lear). Overall this graph reveals more nuanced information than can be gleaned from the word count graph: the markings in King Lear contain almost three times more lexically unique marked passages than Hamlet. It would be a mistake, therefore, to deduce from the marked-word counts that Melville engaged more extensively with Hamlet than with King Lear. Instead, he engaged with each differently, marking fewer but longer passages in Hamlet, and a larger number of shorter passages in King Lear.
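
The lexical uniqueness measure can be sketched in a few lines. The figures were produced with R code adapted from Jockers; the Python below is only an assumed equivalent, in which the percentage is the number of words occurring once divided by the total number of words in the passage. The tokenising pattern is a simplification of my own.

```python
import re
from collections import Counter

def hapax_percentage(text):
    """Percentage of tokens in a passage that occur exactly once."""
    # Simplified tokeniser: runs of letters and straight apostrophes
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hapaxes = [word for word, n in counts.items() if n == 1]
    return 100 * len(hapaxes) / len(tokens)

# A short marked passage with no repeated words scores the maximum
lear = hapax_percentage("Truth's a dog that must to kennel")
# A toy line with repetition scores much lower
toy = hapax_percentage("to be or not to be")
print(lear, round(toy, 1))
```

This shows why short passages tend to reach the maximum value: the fewer the words, the less room there is for any of them to repeat.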

The two markings in Othello also have quite divergent lexical values.
Figure 20
Low values can call attention to passages marked by Melville for their rhetorical qualities along with their purported sense, which is the case for the second of these passages but less so for the first. Here Melville’s interest was focused primarily on the idea of Iago’s dark utterance of incisive profundity. But in the second passage, Melville’s attention to wordplay, as well as to sense, bespeaks a different but no less significant dimension of the verbal features that moved him to apply pencil to paper.

If we start to investigate the words themselves with the aid of computation, the results suggest new avenues for understanding Melville’s reading of Shakespeare. The three most common substantive words that Melville marked were “man”, “world”, and “love”.
Figure 21
As this table shows, the high-frequency terms follow an interesting trajectory, showing that “love” appears proportionally the most in comedies, “world” in the tragedies, and “man” in the histories.

Moreover, a wordcloud of word frequencies in the comedies shows a lot of what might be considered humanistic terms, whereas in the tragedies there is a cluster of “world” in tension with “man” and “good”.

Figure 22
Figure 23
In addition to these helpful word frequency results, I also created tables of all the markings with another XSLT script.

With that we are now able to provide HTML tables of each marking with its associated bibliographic reference as well as its word count.

Users of these tables for the recent special issue of Leviathan (June 2018) have already found it very efficient to have all the markings in one searchable table with their associated metadata.

The table is also sortable by word count, which is particularly important for gauging Melville’s attention to brevity.


Or prolixity.


(To access the full table of markings, go to

These fairly rudimentary calculations already provide a good number of research questions, but there are still many other ways to investigate this fairly small data set of marked words for close reading. One way of doing that is with sentiment analysis––which is particularly relevant to an author who, according to Melville, was esoterically dark.

Figure 28

Drawing on Julia Silge and David Robinson’s “tidytext” package in R, sentiment analysis shows the frequency of positive and negative words in a given data set. Melville clearly marked a much greater proportion of negative words in Shakespeare relative to the whole texts, so he was sincere in his estimation of what he called Shakespeare’s blackness.

The sentiment data also show that Melville marked a greater net number of negative words, and that he noted a small group of positive words with more frequency than the negative ones. The negative words are more variable as well as more numerous. A bar graph of the twenty most frequent positive and negative words allows one to posit new questions about frequently-used words and their implications within marked passages.

Figure 29

Recall that at the beginning I looked at “fear,” which is fourth from the top; “death” ranks second. But the positive words raise some questions as well––can “good” really be positive in Shakespeare, or “love” (also one of the highest frequency terms overall)? How does the word “great” function in context?[2] What can be inferred from the high frequencies of negative terms and the concentrated appearance of select positive terms?
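
Lexicon-based sentiment counting of this kind is simple enough to sketch. The Python below is an illustration rather than the tidytext code behind the figures: the seven-word lexicon is hand-made from terms mentioned in this talk, and the token list is invented.

```python
from collections import Counter

# Tiny hand-made lexicon for illustration only; a real analysis would load a
# published sentiment lexicon with thousands of entries.
LEXICON = {
    "fear": "negative", "death": "negative", "malice": "negative", "weak": "negative",
    "good": "positive", "love": "positive", "great": "positive",
}

def sentiment_counts(tokens, lexicon=LEXICON):
    """Count how often each lexicon word occurs, split by sentiment label."""
    counts = {"positive": Counter(), "negative": Counter()}
    for token in tokens:
        label = lexicon.get(token)
        if label is not None:
            counts[label][token] += 1
    return counts

# Invented token list standing in for the marked words
tokens = "in fear of death the good fear love and death and good good".split()
counts = sentiment_counts(tokens)
print(counts["negative"].most_common())
print(counts["positive"].most_common())
```

A lookup like this treats every word out of context, which is exactly why “good,” “love,” and “great” need to be checked against the passages in which they appear.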

To complement these data I produced a tidytext tibble, which is a dataframe that organises each variable in a column and each observation in a row. Each type of observational unit is a table––for us that boils down to one token per row with various attributes. We have organised the data such that each marking observation has a corresponding play title. First I created a table of bigrams without stopwords to look for some more clues.


Figure 30

In the top ten results, already two interesting ones come out: “peace peace” and “hate thee”––both of which indicate trouble.
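
A bigram table of this sort can be approximated as follows. This is a Python sketch under stated assumptions, not the R code used here: the stopword list is a small illustrative one, the play labels are placeholders, and the marking texts are invented. Bigrams are formed from the running text first, and any pair containing a stopword is then dropped.

```python
from collections import Counter

# Small illustrative stopword list; a real run would use a fuller one.
STOPWORDS = {"the", "a", "and", "i", "to", "of", "in", "that", "is", "it"}

def bigrams_without_stopwords(rows, stopwords=STOPWORDS):
    """rows: iterable of (play, marking_text). Returns a Counter of (play, w1, w2)."""
    counts = Counter()
    for play, text in rows:
        tokens = text.lower().split()
        # Pair each token with its successor, then filter out stopword pairs
        for w1, w2 in zip(tokens, tokens[1:]):
            if w1 not in stopwords and w2 not in stopwords:
                counts[(play, w1, w2)] += 1
    return counts

# Placeholder plays and invented marking texts
rows = [
    ("Play A", "peace peace and peace peace"),
    ("Play B", "i hate thee"),
]
bigram_counts = bigrams_without_stopwords(rows)
print(bigram_counts.most_common())
```

Keeping the play title in each key preserves the one-observation-per-row idea, so the same table can later be grouped by play.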

Next I created a trigram table: the presence of fewer trigram results reflects the small corpus size, so these might seem less relevant.


Figure 31

Yet it is still intriguing to notice the implication of tenuous personal relationships in the substantive trigrams, especially the ones highlighted in red. Testing trigrams is meant to achieve a sense of linguistic relations––subject, object, verb constructions can be very informative. In this case, however, what does the lack of repeatable trigrams show, other than the fact that this is a small corpus? It reinforces the previous results showing that Melville was generally not attending to repetition, that he was paying more attention to distinct utterances and ideas rather than rhetoric.

The following tibble shows the TF-IDF results of bigrams: standing for term frequency–inverse document frequency, TF-IDF is a numerical statistic that attempts to reflect the importance of a word or group of words.

Figure 32

TF-IDF is often weighted by the number of occurrences in a document and its appearances throughout the corpus, but here we can see the striking results within one document of Melville’s markings. Particularly with bigrams, the results suggest the importance of pessimistic pairings that are quite unique in the corpus: “life cancels,” in Henry IV (a play that scholars may have overlooked as an influence on Melville), but also the group from Othello: “blood burns”, “dangerous conceits”, “nature’s poisons”, “poison natures”.
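
For readers who want to see the arithmetic, here is a minimal Python sketch of one common TF-IDF formulation: term frequency is a term's count divided by the number of terms in the document, and inverse document frequency is the natural log of the number of documents divided by the number of documents containing the term. The play labels and bigram lists are placeholders, and other weightings exist.

```python
import math
from collections import Counter

def tf_idf(doc_terms):
    """doc_terms: {doc_name: [term, ...]}. Returns {(doc_name, term): score}."""
    n_docs = len(doc_terms)
    # In how many documents does each term appear at least once?
    doc_freq = Counter()
    for terms in doc_terms.values():
        doc_freq.update(set(terms))

    scores = {}
    for doc, terms in doc_terms.items():
        counts = Counter(terms)
        total = len(terms)
        for term, n in counts.items():
            tf = n / total
            idf = math.log(n_docs / doc_freq[term])
            scores[(doc, term)] = tf * idf
    return scores

# Placeholder documents: each "play" is a list of bigrams from its markings
docs = {
    "Play A": ["blood burns", "my lord", "my lord"],
    "Play B": ["my lord", "mad blood"],
}
scores = tf_idf(docs)
for key, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(key, round(value, 3))
```

A bigram that occurs in every document scores zero however frequent it is, which is why rare pairings rise to the top of a table like this one.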

A larger table of TF-IDF results also suggests some important bigrams in Richard III, with “darkness true” and “meaner creatures”.

Figure 33
In Romeo and Juliet, “blood stirring” and “mad blood”; and in the Taming of the Shrew, “bitter word”.

Figure 34
And these bigrams bespeak bitterness, indeed. Of course, this is not to suggest that the TF-IDF bigrams are the most important pairings among Melville’s markings; they are guideposts to new avenues of his reading, and they emphasise (with more context) the weight of negativity in the markings. They are nevertheless statistically provocative within the document, and are worth investigating further.

Having generated these bigram tables, I can also undertake other sentiment analyses in which the code finds pairings of the most frequent words preceded by “not”. This provides more context to the unigram sentiment results I showed earlier.

Figure 35

Given the dark implications of the markings, notice too that the highest frequency word to be preceded by “not” is the word “good.” Tinkering with the R code further to inner-join a vector of negation words with the sentiment calculation, I also produced a graph of sentiment words preceded by negations.

Figure 36

And again, the word “good” comes out on top, but some other interesting results appear: “satisfied”, “pleasure”, and “true”. Even the negation of a seemingly negative word such as “pity” provides no less comfort.
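
The negation step amounts to keeping only those bigrams whose first word is a negation and whose second word is in the sentiment lexicon. Here is a Python sketch of that filter, offered as an illustration rather than the R code itself; the negation list, the small lexicon, and the token list are all assumptions.

```python
from collections import Counter

# Illustrative negation list and lexicon; both are assumptions for this sketch.
NEGATIONS = {"not", "no", "never", "without"}
SENTIMENT_WORDS = {"good", "true", "satisfied", "pleasure", "pity"}

def negated_words(tokens, negations=NEGATIONS, lexicon=SENTIMENT_WORDS):
    """Count (negation, word) pairs where a lexicon word directly follows a negation."""
    counts = Counter()
    for previous, word in zip(tokens, tokens[1:]):
        if previous in negations and word in lexicon:
            counts[(previous, word)] += 1
    return counts

# Invented token list standing in for the marked words
tokens = "it is not good and never true but not good nor without reason".split()
negated = negated_words(tokens)
print(negated.most_common())
```

Pairs such as “without reason” are dropped because “reason” is not in the lexicon, which mirrors the effect of the inner join described above.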

Returning to the negative sentiment unigrams, a wordcloud of all the negative sentiment terms weighted by frequency can guide further exploration, complemented by the overall word frequencies.

Figure 37

Making sense of all these analyses requires some critical thinking and close readings of these marked words and ideas in relation to Melville’s own works. Given that we know Melville was reading Shakespeare closely while writing Moby-Dick, we focussed our attention on the play Henry VIII (a play which we might have ignored given its lack of notoriety but which we now investigate because it was the third-highest marked play). For example, he scores a speech by Cardinal Wolsey, which contains several terms duplicated or approximated in the negative wordcloud, such as “fear,” “weak,” and “malice”:

We must not stint
Our necessary actions, in the fear
To cope malicious censurers; which ever,
As ravenous fishes, do a vessel follow
That is new trimmed; but benefit no further
Than vainly longing. What we oft do best,
By sick interpreters, once weak ones, is
Not ours, or not allowed; what worst, as oft,
Hitting a grosser quality, is cried up
For our best act. If we shall stand still,
In fear our motion will be mocked or carped at…

Figure 38

These passages, never before cited as a potential source for Moby-Dick, presage the language of Ahab’s tragic striving, and his malicious and monomaniacal quest.

Ahab’s obsession for vengeance, which the narrator Ishmael compares to the obstinacy of a “thunder-cloven old oak” (125), also resonates with Melville’s attention to the dialogue in Measure for Measure featuring Isabel’s comments about misappropriated strength and the tyranny of hubris: “And he, that suffers. O, it is excellent / To have a giant’s strength; but it is tyrannous / To use it like a giant” (1: 358). Isabel’s next remark on the ludicrous pride of “man” (quoted earlier) is preceded by a suggestive analogy to the natural and phenomenal world:

Thou rather, with thy sharp and sulphurous bolt,
Split’st the unwedgeable and gnarled oak,
Than the soft myrtle:––But man, proud man!
Dressed in a little brief authority,—
Most ignorant of what he’s most assured,
His glassy essence,—like an angry ape,
Plays such fantastic tricks before high Heaven,
As make the angels weep; who, with our spleens,
Would all themselves laugh mortal.

Involving the high-frequency term “man” and the prominent sentiment-result “heaven,” as well as the sentiment graph data relating to pride, anger, strength, suffering, and death, the page of Dramatic Works containing Isabel’s laments also features her speculation, “could great men thunder,” and repeats “thunder” multiple times. Melville also marked Cleopatra’s comparing Antony to “rattling thunder.”
Figure 39
As you can see above, the image of a thunderstruck tree and the presence of outsized strength and hubris all figure in the simile at the end of chapter 119 of Moby-Dick: “As in the hurricane that sweeps the plain, men fly the neighborhood of some lone, gigantic elm, whose very height and strength but render it so much the more unsafe, because so much the more a mark for thunderbolts; so at those last words of Ahab’s many of the mariners did run from him in a terror of dismay.” The following examples from Henry VIII, Measure for Measure, and Antony and Cleopatra all illustrate the confluence of influences from plays across modes that also happen to have been among the plays in which he left the most markings. It is difficult to imagine Ahab’s character coming into being with such force––and nuance––if Melville had not studied Shakespeare’s plays.

Computational approaches to Melville’s marginalia allow readers to calculate word counts and frequencies, word variety, topic clusterings, and sentiment associations. Complemented with informed acts of careful reading and source elucidation, these text analyses reveal Melville constructing new paths in his own writing from his experiences of reading Shakespeare. By using distant reading strategies with the marginalia, in their own right and in the service of close reading, we arrive more informed than ever at the “very axis” of their genius.

The methods demonstrated in this talk were inspired by Matthew Jockers’s Text Analysis with R for Students of Literature and Julia Silge and David Robinson’s Text Mining with R. The XML files, XSLT, and R code used by the MMO team can be accessed on GitHub at

  1. Parts of this talk were adapted from a lengthier essay that was published as part of a special issue of Leviathan: A Journal of Melville Studies 20.2 (June 2018) devoted to digital text analysis of Melville’s marginalia, but this talk also revealed some new techniques and visualisations that did not make it into the publication (which can be accessed at It is important to stress that despite the illusion of completeness that a publication suggests, the text analysis of Melville’s marginalia is an organic process. We are still refining and improving our methods and learning more about ways to enhance our understanding of Melville’s reading. ↩︎
  2. In our Leviathan piece, we show that the word “great” is not always a positive word. Sometimes it is an enhancer of a negative word; other times it is just a flavoring word. This kind of word demonstrates the necessity for critical thinking about the data. ↩︎

Working with digital texts: regular expressions on the command line

Working with my colleagues Marty Steer and Jonathan Blaney (Digital Humanities at the School of Advanced Study), I have been teaching sessions on “Working with Texts” for the LAHP Digital Humanities short course this month. We have provided a range of introductions to markdown, html, xml, TEI-XML, document ontologies, the command line, and regular expressions. The second session in particular inspired a slideshow tutorial of using regular expressions on the command line for literary texts. I also wrote almost the entire slideshow in markdown, my favorite language for authoring web content.

In the spirit of the Institute of English Studies’s recently launched Literary Appreciation Lab, I am beginning to share (and reflect on) recent adventures in teaching and researching digital approaches to literature. To that end:

Click here to access the tutorial slides for using regular expressions on the command line.

Also, for more information on the “Remark js” html template for creating markdown slideshows, check out Remark’s GitHub repo. It is a fairly straightforward––and powerful––way to author html slideshows. Why create html slideshows? I find it easier to have all of my presentation material in a web browser––e.g., it is easier to open hyperlinks from the slideshow and navigate by simply switching browser tabs. Using html also gives the author more control of the document––there is no proprietary software involved (as there is with PowerPoint and Numbers). Any decent web browser will open the file with ease.


Now, for those of you who read through the slideshow, I will offer some brief and humble reflections.

The first is, I could not have learned regular expressions (regex) without the help of regex101––it allows you to not only test your regex but also to evaluate its efficiency (i.e., the number of steps taken to find a pattern match). I am aware that my regex (as reflected in my slideshow) possibly could be more efficient, but sometimes you stick with what works. With regex101, I can still test out alternatives for efficiency and consider adjustments.

Secondly, as with many adventures in computer-assisted literary analyses, I was interested to find the sheer preponderance of “madness” words in Shakespeare’s plays. I expected the words to come up more often in Melville––and even in Melville it is interesting to note that his “maddest” book, Pierre, has only about 27 occurrences of “madness” words compared to 51 in Moby-Dick. But 334 instances in Shakespeare––that is indeed more than I had expected. This led me to a 2016 article by Will Tosh that was published on the British Library’s web site.
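
A count of this kind can be sketched in Python as well as with grep. The pattern below is a hypothetical one written for this illustration, not the regex from the slideshow, so it should not be expected to reproduce the totals above; the sample sentence is likewise only a test string.

```python
import re

# Hypothetical pattern: "mad" plus a few common endings, as a whole word only.
MAD_PATTERN = re.compile(r"\bmad(?:ness|ly|der|dest|man|men)?\b", re.IGNORECASE)

def count_matches(text, pattern=MAD_PATTERN):
    """Number of non-overlapping matches of the pattern in the text."""
    return len(pattern.findall(text))

sample = "Though this be madness, yet there is method in't. Mad world! Made in haste, madam."
n_hits = count_matches(sample)
print(n_hits)
```

The word boundaries matter: without them, “Made” and “madam” would be counted too, and the total would say less about madness than about spelling.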

Several “conceptual” investigations into madness in Shakespeare already exist, but not statistical ones. Even if this activity does not lead to a critical breakthrough, with the simple adjunct of grep searches through multiple files on the command line, you can see how quickly I was able to direct the computational results into questions about several literary texts––questions which will be resolved with careful reading. I find that these are useful teaching examples of how close and so-called “distant” reading can be complementary activities. In a forthcoming series of posts, I will reflect on my recent work with a team of researchers at Melville’s Marginalia Online on digital text analyses of Melville’s reading of Homer, Shakespeare, and Milton. Essays on these analyses have just been published as a special issue on “Melville’s Hand” in Leviathan: A Journal of Melville Studies. In those essays we show how close and distant reading can be accomplished and communicated to a wide audience, and in each piece there are plenty of new discoveries about Melville’s sources and their relation to his own work. (More on that soon.)

Finally, I give a bonus example on using xmllint to conduct XPath and grep-style searches. I’ll admit being new to using the command line to deal with xml files (as I have always used the oXygen text editor’s outstanding XPath and XSLT features), but I can also see the value of using the command line for xml searches and transformations. For example, if I cannot access oXygen, I will still be able to query or modify files. Also, if I am not so well-practiced with XQuery or something like the eXist database, I can use xmllint on the command line to query multiple files. I have also recently become interested in XMLStarlet, which is a popular alternative for querying and modifying xml on the command line, but that will have to wait for another day.
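
The same idea, an XPath query followed by a grep-style filter, can also be sketched with Python's standard library, which supports a limited subset of XPath. This is offered as a parallel illustration, not as the xmllint example from the slideshow; the `<marking>` element and `play` attribute are hypothetical markup.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical markup for illustration only.
SAMPLE = """<markings>
  <marking play="King Lear">Truth's a dog that must to kennel</marking>
  <marking play="Hamlet">Virtue itself of vice must pardon beg</marking>
</markings>"""

def query(xml_text, xpath, pattern=None):
    """Return the text of elements matching xpath, optionally filtered by a regex."""
    root = ET.fromstring(xml_text)
    hits = [(element.text or "").strip() for element in root.findall(xpath)]
    if pattern is not None:
        hits = [hit for hit in hits if re.search(pattern, hit, re.IGNORECASE)]
    return hits

lear_hits = query(SAMPLE, ".//marking[@play='King Lear']")
vice_hits = query(SAMPLE, ".//marking", r"\bvice\b")
print(lear_hits)
print(vice_hits)
```

The first call selects by attribute, as an XPath predicate would; the second selects every marking and then searches the text, as a pipe into grep would.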