13 February 2019
info:eu-repo/semantics/openAccess
Simon Gabay, "OCRising 17th French prints", e-ditiones, ID: 10.58079/o2ot
In the past few years, OCR tools have become dramatically more efficient and accurate. We have therefore decided to create a ground truth bank for 17th-century French prints, both because of the research we are currently carrying out on stylometry and because it is a sure way to obtain non-normalised 17th-century French texts.

A first corpus

Most of our training data is taken from literary texts, especially plays. A first test has been carried out on the following texts (c. 110,000 words and 19,000 ...