Publications:Ethiopic Document Image Database for Testing Character Recognition Systems
From ISLAB/CAISR
Title | Ethiopic Document Image Database for Testing Character Recognition Systems |
---|---|
Author | Yaregal Assabie and Josef Bigun |
Year | 2006 |
PublicationType | Report |
Journal | |
HostPublication | |
Conference | |
DOI | |
Diva url | http://hh.diva-portal.org/smash/record.jsf?searchId=1&pid=diva2:408389 |
Abstract | In this paper we describe the acquisition and content of a large database of Ethiopic documents for testing and evaluating character recognition systems. The Ethiopic Document Image Database (EDIDB) contains documents written in the Amharic and Geez languages. The database was built from a variety of documents such as printouts, books, newspapers, and magazines, covering various font types, sizes, and styles. Degraded and poor-quality documents were also included to represent real-life conditions. A total of 1,204 pages were scanned at a resolution of 300 dpi and saved as grayscale images in JPEG format. We also describe an evaluation protocol for standardizing the comparison of recognition systems and their results. The database is made available to the research community through http://www.hh.se/staff/josef/. |