Layout- and Activity-based Textbook Modeling for Automatic PDF Textbook Extraction

Fiche du document

Date

3 juillet 2023

Discipline
Type de document
Périmètre
Langue
Identifiants
Collection

Archives ouvertes

Licences

http://creativecommons.org/licenses/by/ , info:eu-repo/semantics/OpenAccess




Citer ce document

Elise Lincker et al., « Layout- and Activity-based Textbook Modeling for Automatic PDF Textbook Extraction », HAL-SHS : sciences de l'éducation, ID : 10670/1.0gslgh


Métriques


Partage / Export

Résumé En

Ensuring accessible textbooks for children with disabilities is essential for inclusive education. However, providing native accessibility for educational content remains a challenge. In the mean time, existing educational materials need to be adapted, for example by providing interactive versions to overcome difficulties caused by disabilities. In this context, our project aims to automatically adapt PDF textbooks to make them accessible to children with disabilities. The first step towards this adaptation involves extracting and structuring the content of textbooks. In this paper, we introduce textbook models, propose an automated extraction pipeline, and conduct preliminary experiments. Our textbook models are based on the various activities involved and provide layout and semantic information. They enable normalized and structured representations of educational content at both document and page levels, facilitating the automatic extraction process and the conversion to popular formats such as TEI and DocBook. In order to automatically extract PDF textbooks structure, our experiments, using a state-of-the-art multimodal transformer for a token classification task, demonstrate promising results. However, these experiments also highlight the difficulty of the task, especially cross-textbook collection generalization. Finally, we discuss the extraction pipeline and the directions of future work.

document thumbnail

Par les mêmes auteurs

Sur les mêmes sujets

Sur les mêmes disciplines

Exporter en