
Presentation Information: 2014-04-14 (14:15) • The seminar room at Vi2

Speaker: Anders Brun
Title: From quill to bytes, datamining collections of historical documents
Abstract: When printed or handwritten text is scanned or photographed, we obtain image files of the text. In this format, however, the text is mainly useful for manual browsing and archiving. The individual characters, the meaning of the text and other interesting aspects of its content remain hidden from further computerized analysis. Using Optical Character Recognition (OCR) or Handwritten Text Recognition (HTR), we can extract the text to make it searchable, editable and accessible for large-scale computerized data analysis. Historical collections in particular, however, vary widely in how difficult they are to process, and no universal solution to the problem of extracting text from images exists today. Eighty-five years after the first patent, OCR and HTR remain active fields of research. In this talk I will present q2b, "From quill to bytes", a VR-funded project that sets out to discover and develop novel methods for large-scale computerized analysis of historical printed and handwritten texts. The talk gives a brief historical overview of OCR, covers current state-of-the-art methods for HTR and presents some preliminary results.