26 July 2017
https://creativecommons.org/licenses/by-nc-nd/4.0/ , info:eu-repo/semantics/openAccess
Fabrizio Esposito et al., « Topic Modelling with Word Embeddings », Accademia University Press, ID : 10.4000/books.aaccademia.1767
This work evaluates and compares two frameworks for unsupervised topic modelling of the CompWHoB Corpus, our political-linguistic dataset. The first approach applies Latent Dirichlet Allocation (henceforth LDA), whose evaluation is taken as the baseline for comparison. The second framework employs the Word2Vec technique to learn word vector representations that are later used to topic-model the data. Compared to the LDA baseline, results show that the use of Word2Vec word embeddings significantly improves topic modelling performance, but only when an accurate, task-oriented linguistic pre-processing step is carried out.
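The LDA baseline described in the abstract can be illustrated with a minimal sketch. This uses scikit-learn's `LatentDirichletAllocation` as a stand-in; the authors' actual pipeline, pre-processing, and corpus differ, and the toy documents below are hypothetical examples, not CompWHoB data.

```python
# Minimal sketch of an LDA topic-modelling baseline (scikit-learn stand-in;
# the paper's actual pipeline and the CompWHoB Corpus are not reproduced here).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy documents standing in for pre-processed press-briefing text.
docs = [
    "president press briefing foreign policy",
    "economy budget tax policy reform",
    "president speech foreign relations treaty",
    "budget deficit economy spending tax",
]

# Bag-of-words counts, the standard input representation for LDA.
counts = CountVectorizer().fit_transform(docs)

# Fit a 2-topic LDA model; fixed random_state for reproducibility.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Each row of doc_topics is a per-document distribution over topics.
doc_topics = lda.transform(counts)
print(doc_topics.shape)  # one row per document, one column per topic
```

Evaluating such a model (e.g. by topic coherence) is what the paper treats as the point of comparison for the Word2Vec-based framework.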