Topic Modelling with Word Embeddings

This work evaluates and compares two frameworks for unsupervised topic modelling of the CompWHoB Corpus, our political-linguistic dataset. The first approach applies Latent Dirichlet Allocation (henceforth LDA), whose evaluation serves as the baseline for comparison. The second framework uses the Word2Vec technique to learn word vector representations, which are then used to topic-model our data. Compared with the LDA baseline, results show that Word2Vec word embeddings significantly improve topic modelling performance, but only when an accurate, task-oriented linguistic pre-processing step is carried out.
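The abstract does not say how the learned word vectors are turned into topics. One common option, assumed here purely for illustration, is to cluster the vectors and read each cluster as a topic. The sketch below does this with a small k-means in plain Python. The two-dimensional vectors, the vocabulary, the seed words, and the function name `kmeans` are all hypothetical; they are not taken from the CompWHoB Corpus or from the authors' pipeline, and real Word2Vec vectors would come from a trained model with far more dimensions.

```python
import math

# Hypothetical toy "embeddings" (invented values, not learned from any corpus).
TOY_VECTORS = {
    "senate": (0.9, 0.1),
    "congress": (0.8, 0.2),
    "bill": (0.85, 0.15),
    "troops": (0.1, 0.9),
    "war": (0.2, 0.8),
    "army": (0.15, 0.85),
}


def kmeans(vectors, seeds, iterations=10):
    """Group words by k-means, starting from the vectors of the seed words."""
    centroids = [vectors[s] for s in seeds]
    dim = len(centroids[0])
    clusters = []
    for _ in range(iterations):
        # Assignment step: each word joins its nearest centroid (Euclidean distance).
        clusters = [[] for _ in centroids]
        for word, vec in vectors.items():
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(vec, centroids[i]),
            )
            clusters[nearest].append(word)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(vectors[w][d] for w in members) / len(members) for d in range(dim))
            if members
            else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return clusters


# Each resulting cluster is read as one "topic" (an assumption of this sketch).
topics = kmeans(TOY_VECTORS, seeds=["senate", "troops"])
for number, words in enumerate(topics):
    print(number, words)
```

Seeding the centroids with fixed words keeps the toy run deterministic. A real experiment would more likely use a library implementation with several random restarts and would compare the resulting topics against the LDA baseline using the same evaluation measure.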
