Enrichment by Elimination, or: How to turn HTML into simple TEI using Python

Document record

Date

23 March 2014

Relations

This document is linked to:
info:eu-repo/semantics/reference/issn/2197-7682

Organisation

OpenEdition

Licence

info:eu-repo/semantics/openAccess



Cite this document

Christof Schöch, "Enrichment by Elimination, or: How to turn HTML into simple TEI using Python", The Dragonfly's Gaze, ID: 10.58079/nwf0



Abstract

There are lots of full-text repositories of literary works out there, be it the venerable Project Gutenberg (founded in 1971, when the internet was just a few dozen computers), a pioneer like Gallica (with increasing amounts of plain text in the 90-95% correct OCR range), or a crowdsourced effort like Wikisource (with nifty quality indicators). Closer to my geographical location are initiatives like TextGrid's Digitale Bibliothek and the Deutsches Textarchiv (both very professional and acad...
