Publications:A base-line character recognition for syriac-aramaic
Title | A base-line character recognition for syriac-aramaic |
---|---|
Author | Elizabeth Tse and Josef Bigun |
Year | 2007 |
PublicationType | Conference Paper |
Journal | |
HostPublication | IEEE International Conference on Systems, Man and Cybernetics Conference Proceedings |
Conference | IEEE International Conference on Systems, Man and Cybernetics, 7-10 Oct. 2007, Montreal, Que. |
DOI | http://dx.doi.org/10.1109/ICSMC.2007.4414012 |
Diva url | http://hh.diva-portal.org/smash/record.jsf?searchId=1&pid=diva2:408392 |
Abstract | Serto is the cursive alphabet of Syriac-Aramaic, used in the largest corpus of Aramaic documents held in libraries. A lingua franca, and often a source language, Aramaic has influenced major Judaic, Christian and Islamic thought as well as the development of science. The script is cursive, like Arabic, and consequently has a handwritten appearance compared to Latin script. Serto, and Aramaic in practice, has no automatic character recognition (OCR) system. Most library documents are reproductions using printed characters. Readers would benefit strongly from an OCR system, as these reproductions are predominantly books printed in the pre-computer era. We propose a segmentation-free OCR using linear symmetry features with an individual threshold for the tensors of the characters, and an ordered search sequence. It yields about 90% correctly identified characters on average. As a first recognition scheme for Serto, it represents a base-line OCR for Syriac-Aramaic. |