TEICORPO: A Conversion Tool for Spoken Language Transcription with a Pivot File in TEI

Document record

Date

September 3, 2021

Relations

This document is related to:
info:eu-repo/semantics/reference/issn/2162-5603

Organization

OpenEdition

Licenses

https://creativecommons.org/licenses/by/4.0/, info:eu-repo/semantics/openAccess




Cite this document

Christophe Parisse et al., "TEICORPO: A Conversion Tool for Spoken Language Transcription with a Pivot File in TEI", Journal of the Text Encoding Initiative, ID: 10.4000/jtei.3464



Abstract

CORLI is a consortium of Huma-Num, the French national infrastructure dedicated to the technical support and promotion of digital humanities. The goal of CORLI is to promote and provide tools and information for good and efficient research practices in corpus linguistics, especially on spoken language corpora. Because spoken language resources take a long time to collect and transcribe, they are limited in number, so corpora need to be interoperable and reusable in order to improve research on themes such as phonology, prosody, interaction, syntax, and textometry. To help researchers reach this goal, CORLI has designed a pair of tools: TEICORPO, to assist in the conversion and use of spoken language corpora, and TEIMETA, for metadata purposes. TEICORPO is based on the principle of an underlying common format, namely TEI XML as described in its specification for spoken language use (ISO 2016). The tool converts between transcriptions created with alignment software such as CLAN, Transcriber, Praat, or ELAN, as well as common file formats (CSV, XLSX, TXT, or DOCX), and the TEI format, which plays the role of a lossless pivot format. Backward conversion is possible in many cases, with limitations inherent in the target format. TEICORPO can run the TreeTagger part-of-speech tagger and the Stanford CoreNLP tools on TEI files and can export the resulting files to textometric tools such as TXM, Le Trameur, or Iramuteq, making it suitable for editing spoken language corpora as well as for various research purposes.
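
The pivot principle described in the abstract can be illustrated with a minimal sketch. It is not TEICORPO's code or command-line interface: the `Utterance` record is a hypothetical in-memory stand-in for the TEI pivot, and the function names, the CSV column layout, and the `SPEAKER: text` plain-text layout are assumptions made for illustration only. The point is that each format needs only one reader into the pivot and one writer out of it, and that a target format with no slot for some information (here, timestamps in plain text) makes the backward conversion lossy.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass


@dataclass
class Utterance:
    """Hypothetical stand-in for one time-aligned utterance in the pivot."""
    speaker: str
    start: float  # seconds
    end: float    # seconds
    text: str


def read_csv(data: str) -> list[Utterance]:
    """Reader: assumed CSV layout with speaker,start,end,text columns."""
    rows = csv.DictReader(io.StringIO(data))
    return [
        Utterance(r["speaker"], float(r["start"]), float(r["end"]), r["text"])
        for r in rows
    ]


def write_csv(utterances: list[Utterance]) -> str:
    """Writer: keeps every field of the pivot, so nothing is lost."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["speaker", "start", "end", "text"])
    for u in utterances:
        writer.writerow([u.speaker, u.start, u.end, u.text])
    return out.getvalue()


def write_txt(utterances: list[Utterance]) -> str:
    """Writer: plain text has no place for timestamps, so they are dropped."""
    return "".join(f"{u.speaker}: {u.text}\n" for u in utterances)


# One reader and one writer per format, instead of one converter per pair.
READERS = {"csv": read_csv}
WRITERS = {"csv": write_csv, "txt": write_txt}


def convert(data: str, source: str, target: str) -> str:
    """Convert by going through the pivot: source -> pivot -> target."""
    pivot = READERS[source](data)
    return WRITERS[target](pivot)


if __name__ == "__main__":
    sample = "speaker,start,end,text\nCHI,0.0,1.5,hello\nMOT,1.5,3.0,hi there\n"
    print(convert(sample, "csv", "txt"))
```

Adding a new format under this design means registering one more entry in `READERS` and `WRITERS`, which is the practical argument for a pivot over pairwise converters.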
