Collection Spotlight: AM Primary Source Databases and Optical Character Recognition (OCR)

Posted on: April 18, 2023, 2 p.m.

Let optical character recognition (OCR) technology help you search the past.
[Image: written text on a page with a word highlighted; an image of the website's search screen is superimposed in the upper left corner]

by Winn Wasson, Social Science Librarian

Remember the days when proving you were not a robot involved deciphering and typing in contorted letters and numbers? With artificial intelligence able to read text, those days are long gone. The same optical character recognition (OCR) technology that now allows machines to read words on a webpage, piece of paper or sign can also help you search digitized archives.

Syracuse University Libraries has access to 79 databases of digitized archival material published by AM. These draw on textual and visual documents from a wide array of archives in the United Kingdom, the United States and other countries. They cover subjects such as gender studies, African American studies, Native American and Indigenous studies, food and drink and other commodities, and international relations.

You can use OCR technology to search within the text of the digitized documents on most AM databases. When you click on a search result, you will see the instances of the search term within the primary source and can go directly to the relevant section of the document, where the search terms are highlighted. As the image above shows, the OCR technology in AM databases can read cursive handwriting as well as individual letters.

Of course, users should still apply their own knowledge when looking at search results. OCR can miss some instances of search terms or produce false positives. The shape of some letters has changed over time (consider the “s” in the Declaration of Independence and the U.S. Constitution that looks like an “f”), and older primary sources might use outdated terms for objects, groups of people or ideas. Most importantly, users should read the context in which highlighted search terms appear, both to check that the highlighted terms match the meaning they have in mind and to get the information contained in the full primary source. AM database users should take advantage of the databases' targeted OCR search capabilities, but they should also remember to read full documents and take time to browse the plethora of primary sources these databases offer. Sometimes the most interesting discoveries in primary sources answer the questions you never thought to ask.
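For readers comfortable with a little scripting, the “s” that looks like an “f” can be turned into a list of alternate spellings to try when a search comes up short. The Python sketch below is only an illustration: `long_s_variants` is a hypothetical helper, not a feature of the AM databases, and it assumes that OCR may read any lowercase “s” followed by another letter as “f” while leaving a word-final “s” alone, since the long “s” was not normally written at the end of a word.

```python
from itertools import product


def long_s_variants(term: str) -> list[str]:
    """Return spellings of `term` in which a non-final "s" may appear as "f".

    Hypothetical helper for illustration only; it is not part of any
    AM database or OCR product.
    """
    options = []
    for i, ch in enumerate(term):
        # Assumption: only a lowercase "s" followed by another letter could
        # have been written as a long s and then misread by OCR as "f".
        followed_by_letter = i + 1 < len(term) and term[i + 1].isalpha()
        if ch == "s" and followed_by_letter:
            options.append(("s", "f"))
        else:
            options.append((ch,))
    # Build every combination of the per-character choices.
    return ["".join(combo) for combo in product(*options)]


if __name__ == "__main__":
    for word in ("congress", "house"):
        print(word, "->", long_s_variants(word))
```

Each variant can then be entered as a separate search. A term with several eligible letters produces many combinations, so this approach works best for short words.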

To provide feedback or suggest a title to add to the collection, please complete the Resource Feedback Form.
