19 January 2021
info:eu-repo/semantics/openAccess
Raphaël Barman et al., "Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers", Episciences.org, ID: 10.46298/jdmdh.6107
The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration. Research work seeking to automatically process facsimiles and extract information from them is multiplying, with document layout analysis as a first essential step. Although the identification and categorization of segments of interest in document images have seen significant progress over the last years thanks to deep learning techniques, many challenges remain, among them the use of finer-grained segmentation typologies and the consideration of complex, heterogeneous documents such as historical newspapers. Besides, most approaches consider visual features only, ignoring the textual signal. In this context, we introduce a multimodal approach for the semantic segmentation of historical newspapers that combines visual and textual features. Based on a series of experiments on diachronic Swiss and Luxembourgish newspapers, we investigate, among other things, the predictive power of visual and textual features and their capacity to generalize across time and sources. Results show consistent improvement of multimodal models in comparison to a strong visual baseline, as well as better robustness to high material variance.