Building a large dictionary of abbreviations for named entity recognition in Portuguese historical corpora

Fiche du document

Date

2008

Discipline
Type de document
Périmètre
Langue
Identifiants
Collection

Archives ouvertes

Licence

info:eu-repo/semantics/OpenAccess



Sujets proches En

Brachygraphy Contractions

Citer ce document

Oto Vale et al., « Building a large dictionary of abbreviations for named entity recognition in Portuguese historical corpora », HAL-SHS : linguistique, ID : 10670/1.4n70gt


Métriques


Partage / Export

Résumé En

Abbreviated forms offer a special challenge in a historical corpus, since they show graphic variations, besides being frequent and ambiguous. The purpose of this paper is to present the process of building a large dictionary of historical Portuguese abbreviations, whose entries include the abbreviation and its expansion, as well as morphosyntactic and semantic information (a predefined set of named entities – NEs). This process has been carried out in a hybrid fashion that uses linguistic resources (such as a printed dictionary and lists of abbreviations) and abbreviations extracted from the Historical Dictionary of Brazilian Portuguese (HDPB) corpus via finite-state automata and regular expressions. Besides being useful to disambiguate the abbreviations found in the HDBP corpus, this dictionary can be used in other projects and tasks, mainly NE recognition.

document thumbnail

Par les mêmes auteurs

Sur les mêmes sujets

Sur les mêmes disciplines

Exporter en