How OCR Performance can Impact on the Automatic Extraction of Dictionary Content Structures

Abstract

Over the last decade, progress in OCR has triggered a massive trend towards the digitisation of legacy documents, with several Digital Humanities projects exploring ways to structure retro-digitised dictionaries. However, there is a lack of awareness of the impact of OCR quality on the information extraction process. In this work, we shed light on the relationship between these two steps through experiments carried out with a TEI-based system for automatic parsing of dictionaries.

Publication
19th Annual Conference and Members’ Meeting of the Text Encoding Initiative Consortium (TEI): What is text, really? TEI and beyond
Pedro Javier Ortiz Suárez
PhD Student

I am a PhD student in Computer Science at Sorbonne Université and a member of the ALMAnaCH research team at Inria.