2 September 2021
This document is related to:
info:eu-repo/semantics/altIdentifier/doi/10.1007/978-3-030-86334-0_33
Chahan Vidal-Gorène et al., "A Modular and Automated Annotation Platform for Handwritings: Evaluation on Under-Resourced Languages", HAL-SHS : histoire, ID: 10.1007/978-3-030-86334-0_33
There are today several approaches to automatic handwritten document analysis. Handwritten text recognition (HTR) systems in particular achieve convincing results in both layout analysis and text recognition, as well as in more recent tasks such as named-entity recognition, script identification, and manuscript dating. These systems are trained and evaluated on large open and specialized databases. Manual annotation and proofreading of handwritten documents is a key step in training such systems. However, it is a time-consuming task, especially when the formats required by the systems vary considerably, or when the interfaces do not handle several levels of information. We propose a new modular, collaborative, ready-to-use online interface for multilevel annotation and quick viewing of handwritten and printed documents, including those in right-to-left languages. The interface supports the creation of customized projects and the management, conversion, and export of data in the various state-of-the-art formats and standards. It includes automated tasks for layout analysis and text-line extraction, with extensive fine-tuning capabilities. We present this new interface through a case study: the creation of a database for Armenian, an under-resourced language with specific paleographical issues.