A Lexicon of Verb and -mente Adverb Collocations in Portuguese: Extraction from Corpora and Classification

Document record

Date

19 September 2012

Collection

Open archives

License

info:eu-repo/semantics/OpenAccess



Related subjects

Adverb

Cite this document

Lucas Nunes Vieira et al., "A Lexicon of Verb and -mente Adverb Collocations in Portuguese: Extraction from Corpora and Classification", HAL-SHS : linguistique, ID: 10670/1.581xon



Abstract

Collocations became a target of research in the twentieth century after FIRTH (1957) coined the term and called attention to the fact that the way we combine words in natural language is far from unconstrained. In Firth’s sense, a pair or group of words can be considered a collocation if the probability of their co-occurrence exceeds chance levels. For a long time this concept prevailed in the literature as the rationale behind the task of collocation extraction. However, the more recent formulation of MEL’ČUK (2003) provides a more semantically based view of the phenomenon, one that does not necessarily coincide with an attested high frequency of the word combinations. According to Mel’čuk, the meaning of certain words would dictate the adjacent use of others, forming groups or pairs of base words and collocates.

Concerning the linguistic pattern investigated in this study, namely pairs of verbs and adverbs ending in -mente (‘-ly’), the verb would be the base of the combination, while the adverb would be the collocate. The strategy adopted here for the extraction of this pattern draws on both Firth’s and Mel’čuk’s formulations, since, at different stages, it relies both on frequency of distribution and on meaning-oriented human annotations.

The corpus used for the extraction of verb-adverb bigrams was CETEMPúblico (SANTOS & ROCHA, 2001), a corpus of European Portuguese consisting of 192M words of journalistic texts. This is, to the best of our knowledge, the largest freely distributed corpus of Portuguese. Although they account for just over 10% of all simple adverb occurrences in the corpus, adverbs ending in -mente (henceforth Adv-mente) in fact represent the majority of the simple-word lemmas of this grammatical class in CETEMPúblico.

While a number of approaches to collocation extraction have relied substantially on a search for adjacent words, such as CHOUEKA (1988), newer studies suggest that methods involving some level of syntactic parsing might prove more precise for certain linguistic patterns (SERETAN, 2011). In view of this, we experiment with a syntax-based approach for the extraction of verb-adverb pairs from the corpus. This pattern could be deemed a challenging one with respect to the extraction task, since adverbs can occupy different positions in the sentence and are commonly associated with rather loose mobility in speech (BECHARA, 2003).

In the remainder of this paper, we describe the syntax-based approach adopted to extract collocations from the corpus, explain the linguistically motivated classification of collocation candidates, and present an empirical evaluation of statistical association measures in identifying {V, Adv-mente} collocations. We conclude by discussing the appropriateness of the methods we experimented with for {V, Adv-mente} pairs and by proposing future work.
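
As an illustration of the frequency-based (Firthian) criterion mentioned in the abstract, the sketch below scores verb-adverb pairs with pointwise mutual information (PMI), a common association measure that compares a pair's observed co-occurrence probability with the probability expected if verb and adverb were independent. The abstract does not say which association measures the authors evaluated, so the choice of PMI is an assumption, and the function name `pmi_scores` and the toy pairs are hypothetical and not taken from CETEMPúblico.

```python
import math
from collections import Counter


def pmi_scores(pairs):
    """Return {(verb, adverb): PMI in bits} for a list of (verb, adverb) pairs.

    PMI = log2( P(verb, adverb) / (P(verb) * P(adverb)) ).
    A positive score means the pair co-occurs more often than chance predicts.
    """
    pair_counts = Counter(pairs)
    verb_counts = Counter(verb for verb, _ in pairs)
    adverb_counts = Counter(adverb for _, adverb in pairs)
    total = len(pairs)

    scores = {}
    for (verb, adverb), count in pair_counts.items():
        p_pair = count / total
        p_independent = (verb_counts[verb] / total) * (adverb_counts[adverb] / total)
        scores[(verb, adverb)] = math.log2(p_pair / p_independent)
    return scores


# Toy data invented for illustration only (assumption, not corpus counts).
toy_pairs = (
    [("chover", "torrencialmente")] * 3
    + [("falar", "claramente")] * 3
    + [("chover", "claramente")]
    + [("falar", "torrencialmente")]
)

scores = pmi_scores(toy_pairs)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

In this toy sample the pairs that occur three times score above zero and the pairs that occur once score below zero. A purely frequency-based score of this kind covers only one side of the strategy described above; the meaning-oriented human annotation step is not modelled here.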
