2012
Copyright PERSEE 2003-2024. Works reproduced on the PERSEE website are protected by the general rules of the Code of Intellectual Property. For strictly private, scientific or teaching purposes excluding all commercial use, reproduction and communication to the public of this document is permitted on condition that its origin and copyright are clearly mentioned.
Dominique Longrée et al., « Latin du haut Moyen Âge et annotation morphosyntaxique automatique : quelles perspectives ? », MOM Éditions (documents), ID : 10670/1.3d638f...
This paper assesses the performance and the usefulness of taggers for the morphosyntactic annotation of Early Medieval Latin texts. To this end, we used the TnT tagger, chosen for its good performance in tagging Classical Latin texts (see Poudat, Longrée 2009). The training corpora consist of Classical and Medieval texts from the LASLA (Laboratoire d’Analyse Statistique des Langues anciennes) database. The test subsets are taken from the hagiographic Latin texts of the same database and were chosen for their relevance in evaluating the tagger’s sensitivity to stylistic, diachronic or diatopic variation. The tagger’s reliability is assessed using various training corpora. Finally, we show how a tagger can serve as a genuine heuristic instrument for revealing proximities or distances between texts and between corpora or sub-corpora.
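As a rough illustration of the evaluation design described in the abstract, the sketch below scores the output of taggers trained on different corpora against reference annotations, text by text. It is not the authors' procedure and does not call TnT or read the LASLA format: the function names, the model and text labels, and the toy tag sequences are all hypothetical, and per-token accuracy is assumed here as the reliability measure.

```python
from typing import Dict, Sequence


def tagging_accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Share of tokens whose predicted tag equals the reference tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted sequences must have the same length")
    if not gold:
        return 0.0
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def accuracy_table(
    gold_by_text: Dict[str, Sequence[str]],
    predictions_by_model: Dict[str, Dict[str, Sequence[str]]],
) -> Dict[str, Dict[str, float]]:
    """Accuracy of each trained model (outer key) on each test text (inner key)."""
    return {
        model: {
            text: tagging_accuracy(gold_by_text[text], predictions[text])
            for text in gold_by_text
        }
        for model, predictions in predictions_by_model.items()
    }


# Hypothetical toy data: two test texts and the output of two taggers,
# one trained on a Classical corpus and one on a Medieval corpus.
gold = {
    "vita_a": ["NOUN", "VERB", "ADJ", "NOUN"],
    "vita_b": ["PRON", "VERB", "NOUN", "CONJ", "VERB"],
}
predictions = {
    "classical_model": {
        "vita_a": ["NOUN", "VERB", "NOUN", "NOUN"],
        "vita_b": ["PRON", "VERB", "NOUN", "ADV", "NOUN"],
    },
    "medieval_model": {
        "vita_a": ["NOUN", "VERB", "ADJ", "NOUN"],
        "vita_b": ["PRON", "VERB", "NOUN", "CONJ", "NOUN"],
    },
}

table = accuracy_table(gold, predictions)

if __name__ == "__main__":
    for model, scores in table.items():
        for text, score in scores.items():
            print(f"{model:16s} {text:8s} {score:.2f}")
```

Read as a whole, such a table suggests how accuracy might double as a proximity measure: under this assumption, a test text that a given training corpus tags more accurately would be taken as linguistically closer to that corpus.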