The CompWHoB Corpus: Computational Construction, Annotation and Linguistic Analysis of the White House Press Briefings Corpus

Document record

Date

11 November 2016

Collection

OpenEdition Books

Organisation

OpenEdition

Licences

https://creativecommons.org/licenses/by-nc-nd/4.0/, info:eu-repo/semantics/openAccess




Cite this document

Fabrizio Esposito et al., "The CompWHoB Corpus: Computational Construction, Annotation and Linguistic Analysis of the White House Press Briefings Corpus", Accademia University Press, ID: 10.4000/books.aaccademia.1467



Abstract

The CompWHoB (Computational White House press Briefings) Corpus, currently being developed at the University of Naples Federico II, is a corpus of spoken American English focusing on political and media communication. It is a large collection of the White House Press Briefings, that is, the daily meetings between the White House Press Secretary and the news media. At the time of writing, the corpus amounts to more than 20 million words and covers a period of twenty-one years, from 1993 to 2014; it is planned to be extended to the end of President Barack Obama's second term. The aim of the present article is to describe the composition of the corpus and the techniques used to extract, process and annotate it. Attention is also paid to the use of Temporal Random Indexing (TRI) on the corpus as a tool for linguistic analysis.

The CompWHoB Corpus, under development at the University of Naples Federico II, is a corpus of spoken American English comprising the conferences held by the United States press secretaries, known as Press Briefings. At present the corpus consists of more than 20 million words and extends from 1993 to the end of 2014. The aim of this article is to describe the composition of the corpus and the techniques used to extract and annotate the texts, and to show how it can serve as a source for linguistic analysis through the use of Temporal Random Indexing (TRI).
