Domain Adaptation for Text Classification with Weird Embeddings

Document record

Date

3 September 2021

Collection

OpenEdition Books

Organisation

OpenEdition

Licences

https://www.openedition.org/12554 , info:eu-repo/semantics/openAccess




Cite this document

Valerio Basile, "Domain Adaptation for Text Classification with Weird Embeddings", Accademia University Press, ID: 10.4000/books.aaccademia.8250


Abstract

Pre-trained word embeddings are often used to initialize deep learning models for text classification, as a way to inject precomputed lexical knowledge and boost the learning process. However, such embeddings are usually trained on generic corpora, while text classification tasks are often domain-specific. We propose a fully automated method to adapt pre-trained word embeddings to any given classification task, requiring no resources beyond the original training set. The method is based on the concept of word weirdness, extended to score the words in the training set according to how characteristic they are of each label in a text classification dataset. The polarized weirdness scores are then used to update the word embeddings to reflect task-specific semantic shifts. Our experiments show that this method improves performance on several text classification tasks in different languages.
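
The scoring step mentioned in the abstract can be illustrated with a minimal sketch. This record does not give the formula, so the code assumes the usual reading of weirdness as a ratio of relative frequencies, applied here to the texts of one label against all other texts. The function name `polarized_weirdness`, the whitespace tokenizer, and the add-one smoothing are illustrative assumptions, not details taken from the publication. The embedding update step is not sketched because the abstract does not describe it.

```python
from collections import Counter


def polarized_weirdness(texts, labels, target_label, smoothing=1.0):
    """Score words by how characteristic they are of `target_label`.

    Assumed definition: relative frequency of a word in texts carrying
    `target_label`, divided by its relative frequency in all other texts.
    Additive smoothing (an assumption) avoids division by zero.
    """
    in_counts, out_counts = Counter(), Counter()
    for text, label in zip(texts, labels):
        tokens = text.lower().split()  # naive tokenizer, for illustration only
        if label == target_label:
            in_counts.update(tokens)
        else:
            out_counts.update(tokens)

    vocab = set(in_counts) | set(out_counts)
    in_total = sum(in_counts.values()) + smoothing * len(vocab)
    out_total = sum(out_counts.values()) + smoothing * len(vocab)

    return {
        word: ((in_counts[word] + smoothing) / in_total)
        / ((out_counts[word] + smoothing) / out_total)
        for word in vocab
    }


if __name__ == "__main__":
    # Toy data, invented for this example.
    texts = ["spam spam offer", "hello friend", "offer hello"]
    labels = [1, 0, 0]
    scores = polarized_weirdness(texts, labels, target_label=1)
    for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{word}\t{score:.3f}")
```

Under this assumed definition, a score above 1 marks a word that is relatively more frequent in the target label's texts than elsewhere, and a score below 1 marks the opposite.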
