I’m Chloë Farr, the Kula: Library Futures Academy Graduate Student Fellow in AI. I hold bachelor’s and master’s degrees in Linguistics, and I am currently a PhD student in Computer Science at UVic. My background in the humanities often gives me an unusual perspective on problems in Computer Science.
Over the last few months, I’ve been deeply engaged in conversations with UVic Libraries staff and researchers across GLAM organizations, exploring innovative ways that AI and other algorithmic tools could support their work and contribute to library science. The opportunities that have surfaced are wide-ranging and energizing.
UVic Libraries has long been committed to preserving and sharing knowledge, meticulously documenting and storing materials and creating metadata and finding aids for archival collections. But much remains to be uncovered or publicized because of the overwhelming amount of manual labor such endeavors require, such as transcribing thousands of hours of audio recordings, typing out handwritten or typewritten documents, and writing detailed descriptions of tens of thousands of photos and other graphical materials.
The posts I’ll be making here will highlight opportunities, our lines of investigation, who would benefit from them and how, and how we plan to mitigate foreseeable risks.
Primary Investigation
Many of my conversations around AI focus on a well-established technology: optical character recognition, aka OCR, which converts the characters in image files into machine-readable text (a minimal code sketch of OCR appears after this list). When it comes to the historical documents found in many of our archives and collections, OCR still falls short for a number of reasons:
- Artifacts on the physical materials (stray hairs, smudged ink, type bleeding through thin paper)
- Artifacts created in the scanning/photographing process (bad lighting, folded pages, scratches on the film)
- Complex typography
- Complex textual layouts (nested articles in newspapers, sloped handwriting, warped data tables)
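To make the task concrete, here is a minimal sketch of what OCR does, using the open-source Tesseract engine through the pytesseract Python wrapper. The file name is hypothetical, and any OCR engine would serve the same role.

```python
# Minimal OCR sketch: convert a scanned page image to machine-readable text.
# Assumes Tesseract and the pytesseract wrapper are installed;
# "scanned_page.png" is a hypothetical file name.
from PIL import Image
import pytesseract

page = Image.open("scanned_page.png")      # a digitized archival page
text = pytesseract.image_to_string(page)   # run character recognition
print(text)  # on historical documents, often imperfect for the reasons above
```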
Each of these issues, individually, creates unique barriers to good OCR output. Together, they can seriously harm findability in a collection, since a word or phrase may not be rendered correctly and therefore cannot be found. For example, if you searched a collection for my name, “Chloe,” it might have been rendered as “CMoe,” making it impossible to find with an exact-match search box.
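To illustrate, here is a small sketch, using only Python’s standard-library difflib and a hypothetical list of OCR-extracted tokens, of how an exact search misses the mis-recognized name while a fuzzy match can still surface it:

```python
# Sketch: exact search misses OCR errors; fuzzy matching can recover some.
# The token list is hypothetical OCR output; 0.6 is a tunable similarity cutoff.
from difflib import get_close_matches

ocr_tokens = ["CMoe", "Farr", "UVic", "archives"]
query = "Chloe"

print(query in ocr_tokens)                                # False: exact match fails
print(get_close_matches(query, ocr_tokens, cutoff=0.6))   # ['CMoe']
```

Fuzzy search of this kind trades precision for recall, of course, which is part of why improving the underlying OCR is the more durable fix.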
OCR technology today is quite good, but it, too, has a history, and early OCR output doesn’t match what is possible now. Libraries that were at the forefront of digitization and OCR at the beginning of the 21st century might have thousands, if not millions, of documents with imperfect text recognition.
I’ve made OCR my primary focus, both in my own graduate research and in my graduate fellowship with KULA, working in concert with partners across North America. We are exploring new approaches to this problem with the assistance of AI, aiming to make already-digitized content accessible by focusing on pattern recognition rather than content interpretation.
Improving OCR technology would open access to a far deeper reservoir of historical knowledge than we can easily interact with now. With more accurate OCR for archival documents, the span of truly usable data can reach further back than ever, letting us investigate periods that were previously inaccessible at scale. Researchers would have more data to analyze, and history would be easier to navigate. And with better text conversion, students could spend their valuable time analyzing data and asking thoughtful questions rather than finding and cleaning it.
Keep an eye out for upcoming posts on our work with OCR, which will spotlight the variety of document types I am working with, as well as other projects underway at KULA.

