"I Fear the Man of a Single Book"

By Jules Thomson, Associate Archivist / Social Media Manager

Adapted from an article originally published in the CLA’s September 2019 Bulletin

Who invented the internet? (No, not Al Gore!) You may have heard of Ted Nelson and Tim Berners-Lee, usually hailed as the founding fathers of the World Wide Web. But if you answered “a Jesuit priest named Roberto Busa,” you wouldn’t be wrong either.

Father Busa was an Italian linguist and seminarian, who studied Theology at the Episcopal Seminary of Belluno - along with Albino Luciani, later known as Pope John Paul I. But Busa is best known as the creator of the Index Thomisticus, a vast digital database of the works of the prolific Saint Thomas Aquinas. In this role, he also pioneered experimental applications of digital technology which would become foundational to the internet as we experience it today.

In 1949 at the Pontifical Gregorian University in Rome, Busa published a PhD thesis on Aquinas; but it wasn’t until his collaboration with the top brass at IBM, 30 years later, that he was able to convert decades of linguistic research into a dynamic database using emergent computing and internet technology. It was the first time in history that a humanities project had been subjected to organization and analysis by computer algorithms. One of the abilities unique to this technology, and crucial for analyzing the works of Aquinas in particular, was its allowance for lemmatization - grouping together the different inflected forms of a word so that they can be analyzed as a single item. Another obvious advantage was the ability to share the database, first across local area networks and eventually globally through the World Wide Web.

The project is widely hailed as the first to introduce digital computing to the humanities – or vice versa, depending on how you look at it - so much so that a Roberto Busa Prize has been established by the international umbrella organization for humanities computing, ADHO, to recognize “outstanding lifetime achievements in the application of information and communications technologies to humanistic research”. Index Thomisticus both anticipated and typified what would later come to be known as "digital humanities", an ever-expanding field marrying the potential of digital technology with history, linguistics, sociology, and the arts.

Nigh on 40 years later, digital humanities (also known as DH) is more popular than ever. One has only to dip a toe into the world of grant-writing to understand that the phrase itself, for better or worse, is veritable currency. A combination of access potential, perceived value for money, and perceived digital preservation of data has contributed to an all-embracing attitude on the part of granting bodies such as the National Endowment for the Humanities (NEH). In 2018 the NEH awarded the fourth-largest share of its funds specifically to DH projects, after federal/state partnership funding, programming, and general preservation and access. And to quote the American Academy of Arts and Sciences, “activities with a substantial digital component receive funds under all NEH program areas, thus the percentage of funds allocated to ODH [the NEH’s Office of Digital Humanities] understates the share of monies NEH invests in digital projects and materials.”

Few technological advancements throughout recorded history have rivalled that of computing and the internet. As far as the arts and humanities are concerned, the last innovation of equal magnitude was the invention of the printing press. And as was the case in the 15th century, it behooves scholars, historians and information professionals to align with popular media as a means to preserve and promote their work. The Darwinian model, when applied outside the field of evolutionary science for which it was solely intended, is notoriously problematic. But the ‘change or die’ aphorism is nonetheless demonstrably relevant in the world of business and technology, in which innovations must keep pace with evolving formats and public expectations. By now, models of commercial technology have permeated academia, information science, and the heritage sector to such an extent that the demarcations between them are increasingly blurry. Faceted search algorithms pioneered by the likes of Amazon have been put to use in public library catalogues. Museums are ‘gamifying’ their collections with 3D modelling, VR headsets, and augmented reality. The science of analyzing and improving user experience (known as UX) is increasingly indispensable in the public sector as well as the private.

Far from being left in the dust, Father Busa’s work continues to resonate today. His harnessing of lemmatization, the grouping of differently-inflected words together, anticipated a wider need for embedded, machine-readable versions of human-readable text (i.e., hidden code behind the words you see on the page, which accomplishes all sorts of functions). There are as many reasons why this might be desirable as there are digital humanities projects, but they commonly include semantic grouping of related words and phrases, and the ability to search across non-standardized spellings. Such spellings are, of course, extremely common in manuscripts dating from before the 20th century. These functions assist not only with research and analysis but also with accessibility, as they facilitate search and retrieval within host websites, and discoverability on search engines such as Google.
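
To make those two functions a little more concrete, here is a minimal Python sketch of grouping inflected forms under one lemma and of searching across variant spellings. Everything in it is an assumption made for illustration: the two lookup tables are tiny and hand-made, the sample sentences are invented, and the function names are hypothetical. It is not how the Index Thomisticus or any particular DH project works; real projects rely on curated dictionaries or on encoding embedded in the text itself.

```python
from collections import defaultdict

# Hypothetical, hand-made lookup tables (assumptions for illustration only).
# Inflected Latin forms mapped to their dictionary headword (lemma).
LEMMAS = {
    "est": "sum", "sunt": "sum", "esse": "sum",
    "dei": "deus", "deo": "deus", "deum": "deus",
}
# Non-standardized historical spellings mapped to a normalized form.
SPELLING_VARIANTS = {
    "booke": "book", "boke": "book",
    "feare": "fear",
}


def normalize(word, table):
    """Lower-case a word, strip surrounding punctuation, then look it up."""
    cleaned = word.lower().strip(".,;:!?\"'()")
    return table.get(cleaned, cleaned)


def group_by_lemma(text):
    """Record the position of every word in `text` under its lemma."""
    groups = defaultdict(list)
    for position, word in enumerate(text.split()):
        groups[normalize(word, LEMMAS)].append(position)
    return dict(groups)


def search(query, lines):
    """Return the lines that contain `query` under any recorded spelling."""
    target = normalize(query, SPELLING_VARIANTS)
    return [
        line for line in lines
        if any(normalize(w, SPELLING_VARIANTS) == target for w in line.split())
    ]


# Invented sample data.
groups = group_by_lemma("Deus est et omnia sunt a Deo")
print(groups["sum"])   # positions of "est" and "sunt": [1, 4]
print(groups["deus"])  # positions of "Deus" and "Deo": [0, 6]

lines = [
    "I feare the man of one boke.",
    "A single booke suffices.",
    "Nothing relevant here.",
]
print(search("book", lines))  # the first two lines match
```

The design choice worth noticing is that the original words are never altered: the lemma or normalized spelling lives alongside the text as a separate layer, which is the same idea as the hidden, machine-readable encoding described above.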

The encoding itself is largely accomplished through the process of XML tagging. Most people have at least heard of HTML (Hypertext Markup Language), the primary language of the web, designed to display text. XML (Extensible Markup Language), by contrast, is the semantic, machine-readable encoding hidden behind the textual web, which allows for dynamic organization as well as mere display. In simple terms, it works by attaching “tags” to visible text which describe the meta-categories to which the text belongs, allowing for analysis and regrouping based on these. There are some immediately obvious applications, such as interactive websites and databases, whose functionality is built upon the language’s dynamism.
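
As a rough illustration of the tagging idea, the Python sketch below wraps parts of a sentence in tags and then regroups the text by category, using only the standard library's xml.etree.ElementTree module. The element names (person, place, date) and the "when" attribute are hypothetical choices made for this example rather than part of any real schema, and the helper function names are likewise invented.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# A made-up encoded sentence. The tags are hypothetical and exist only to
# show how categories can be attached to visible text.
ENCODED = """
<sentence>
  In <date when="1949">1949</date>, <person>Roberto Busa</person> published
  a thesis on <person>Thomas Aquinas</person> in <place>Rome</place>.
</sentence>
"""


def visible_text(xml_string):
    """Return only the human-readable text, with tags and extra spaces removed."""
    root = ET.fromstring(xml_string.strip())
    return " ".join("".join(root.itertext()).split())


def group_by_tag(xml_string):
    """Regroup the tagged spans of text by the category each tag names."""
    root = ET.fromstring(xml_string.strip())
    groups = defaultdict(list)
    for element in root.iter():
        if element is not root:  # skip the wrapping <sentence> element
            groups[element.tag].append(element.text)
    return dict(groups)


print(visible_text(ENCODED))
# In 1949, Roberto Busa published a thesis on Thomas Aquinas in Rome.

print(group_by_tag(ENCODED))
# {'date': ['1949'], 'person': ['Roberto Busa', 'Thomas Aquinas'], 'place': ['Rome']}
```

A reader sees only the sentence returned by visible_text, while a program can ask for every person or place mentioned, which is the kind of regrouping that interactive databases build on.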

XML is extensible – meaning that a custom set of tags can be tailor-made for any project. In the digital humanities field, one of the most important of these is the TEI (Text Encoding Initiative) format, which is utilized primarily by libraries, museums, publishers, and individual scholars to present texts for online research, teaching, and preservation. Notable examples include the Folger Shakespeare Library, which has used TEI in its Folger Digital Texts project to lemmatize words across different versions of Shakespeare’s manuscripts, and the Library of Congress’s American Memory project, which used a simplified form of TEI to encode a wide array of archival material types. A related schema, MEI (Music Encoding Initiative) format, was developed for the encoding of musical scores. It has enabled machine-reading of music archives, and has recently been applied to the works of Beethoven and Delius, as well as a large corpus of medieval sheet music.

Busa’s methodology also spawned an array of lauded pioneer projects in the decades after the development of his Index, largely in the fields of literature and history. These include the Walt Whitman Archive, the Slave Societies Digital Archive, and the Emily Dickinson Archives, among many others. In addition to textual analysis, these projects also allowed for the virtual confederation of historical materials physically housed in diverse locations – a revolutionary hallmark of the field, now largely taken for granted.

Today the web is littered with other examples, large and small, including dynamic literary compendiums, image libraries, interactive historical mapping projects, visualizations, phone apps, and crowdsourcing projects. You will also find masses of scholarly articles and publications discussing the potential merits and issues inherent to the field, and several degree programs in the subject at universities such as UC Berkeley and Tufts. Some DH projects explore and confederate myriad works or subjects. Sometimes the same subjects or creators are revisited in different ways. Increasingly, the detailed textual encoding which typified early projects is expanding to allow for analysis of “big data”, i.e. wider patterns and trends, and “linked data”, expanding access across subject silos to allow for a more holistic, big-picture understanding of a given topic.

The quote “hominem unius libri timeo” - I fear the man of a single book - is often attributed to Thomas Aquinas. The phrase has variously been interpreted as either critical or supportive of single-minded subject expertise (with ‘fear’ sometimes denoting reverence rather than admonition). Perhaps, in the world of digital humanities, both meanings apply equally. Father Busa’s Thomistic research was the result of decades of intensely focused academic study. And yet, without his interdisciplinary foray into the brave new world of computer science, the Index Thomisticus would have existed only in the confines of a few bound paper volumes, infinitely less dynamic and accessible, and – more importantly – its creator would not have spawned a global movement which continues to evolve new iterations and potentials today.

It is this marriage of academic rigor with the expansive potential of the digital world that typifies digital humanities as an emergent field, allowing knowledge previously encased in academic silos and physically static archives to be shared more easily with other institutions and simultaneously broadcast to the public at large. At their most successful, these projects breathe new life into the documentary and material record, recreate and enhance our knowledge and understanding of history, and bring it, with all its surprises and idiosyncrasies, into the limelight as never before.



