Beacon Street Diary blog

Transcription as Access

by Zachary Bodnar, Archivist

Let me present to you a story I am all too familiar with. You take from a box a well-preserved eighteenth-century record book. There is no title written onto the cover and no title page within the volume to describe what the pages within might contain. So, to determine what the record book is about, you begin flipping to random pages, only to realize that the book might as well be written in another language because the handwriting within is nearly indiscernible. Or perhaps it is, essentially, written in another language because, upon further study, you realize that the writer is using a form of shorthand. At that point, what are you to do?

If you are me, you pass that volume off to one of the CLA’s transcriptionists. But for most people, who lack years of experience reading old manuscripts, eighteenth-century handwriting can be an insurmountable obstacle to understanding the object. I still struggle to read old manuscript materials, even though I now have years of experience handling and reading this type of material. Some of it is just a matter of distance; I will forever struggle with most s’s looking like f’s. But the fact is that poor handwriting was just as endemic to the eighteenth century as it is today, and with cursive, where numerous letters are distinguished only by the location of a tail or the number of strokes, any poor handwriting can quickly turn an item into a comprehension nightmare.

It was for this exact reason that, within NEHH, the CLA set aside money to hire transcriptionists. Transcription, within the library and archives context, is the process of accurately representing text found on paper in a machine-readable format, such as a Microsoft Word document. By this process, we can elucidate texts which might otherwise see use only by those experienced in reading old manuscripts. And by ensuring that the transcription is in a machine-readable format, not only can we transmit it to the widest possible audience, we also ensure that databases and computer systems can read and interpret it.

Transcription, though, can take many forms, each with its own advantages and disadvantages. The style of transcription the CLA currently produces is highly accurate to the original in formatting and spelling. This means that every spelling error put onto paper is also reflected in the transcripts the CLA produces. This type of transcription, which is akin to creating a typed surrogate of the original, may be especially helpful to researchers whose careful study may hinge on a single misspelling. But equally valid is corrective transcription, which takes the original and “fixes” it for a modern audience, erasing spelling mistakes, clearing up shorthand, cleaning up symbols, and changing those fancy s’s into our modern s. This approach may not be originalist in the strictest sense, but for an audience who simply wishes to read the meaning of the original, this approach may be best.
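For readers who like to see the idea in concrete terms, here is a minimal sketch in Python of how a corrective reading copy might be derived from a faithful transcript. It is an illustration only, not the CLA's workflow: the `ABBREVIATIONS` table and the `corrective` function are hypothetical names invented for this example, and the sample sentence is made up. A real project would also have to decide when "ye" is an abbreviation for "the" and when it is the genuine pronoun, a judgment this sketch does not attempt.

```python
import re

# Hypothetical expansion table, invented for this example. A real project
# would build its own list from the manuscripts being transcribed.
ABBREVIATIONS = {
    "ye": "the",
    "yt": "that",
    "wch": "which",
}


def corrective(text: str) -> str:
    """Return a modernized reading copy of a faithful transcript."""
    # Replace the long s (Unicode U+017F) with the modern s.
    text = text.replace("\u017f", "s")

    # Expand whole-word abbreviations; any word not in the table is kept as-is.
    def expand(match: re.Match) -> str:
        word = match.group(0)
        return ABBREVIATIONS.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", expand, text)


faithful = "ye church voted yt ye paſtor ſhould preach"
print(corrective(faithful))
# prints: the church voted that the pastor should preach
```

Keeping the faithful transcript as the source and generating the corrected copy from it means both audiences can be served from one transcription, and no evidence from the original is lost.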

Transcription is a powerful tool for access, and as time goes on, there are more and more tools at our disposal that we hope to employ at the CLA. AI transcription technologies such as OCR (Optical Character Recognition), which can automatically transcribe print such as that found in books, and HTR (Handwritten Text Recognition), which does the same for handwriting, can do a lot to provide more transcription, even if both are prone to mistakes. We also hope to provide the ability to search within our transcriptions, a feature the NEHH viewer cannot yet offer, but which may finally become possible at the end of the CLA’s search for a DAMS. Transcription is important to the CLA. At the heart of our mission is access, and transcription provides significant access to all our users. And we are so very excited to show you how that will be accomplished in the months, and years, to come.