24 October 2023
This document is related to:
info:eu-repo/grantAgreement//965241/EU/European Consortium for Communicating Gene and Cell Therapy Information/EuroGCT
info:eu-repo/semantics/OpenAccess
Thomas Allouche et al., "Innovative therapy in European Parliament's positions: a numerical science-based vocabulary analysis", HAL SHS (Sciences de l’Homme et de la Société), ID: 10670/1.47df9b...
Once all documents containing our selected vocabulary had been identified, we wanted to search automatically for related documents. The method consists in producing a vector representation of each document in the collection so that a similarity metric can be computed between documents. One way to produce these representations is to define a common vocabulary for all documents and to compute a weight for each term in a given document; the resulting weights form that document's vector. Regardless of the method used to produce the vectors, finding new relevant documents amounts to looking for the nearest neighbours of the target documents in the document vector space. We performed a preliminary experiment following this methodology using the TF-IDF weighting method. Unfortunately, this analysis has not yielded useful results so far, as documents with similar vectors did not seem to match our theme of innovation in biotechnologies. Future research may include more advanced weighting methods based on embeddings, as well as the use of document graphs. However, the very strong homogeneity of the texts of Parliament resolutions makes this kind of tool difficult to operate.
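The nearest-neighbour search described above can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the choice of cosine similarity as the similarity metric, and the four toy documents are all assumptions made for illustration, since the text specifies only that TF-IDF weights were used.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    # Assumption: smoothed IDF, one common variant among several.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbours(target, vectors, k=2):
    """Return the k documents most similar to vectors[target] as (index, score)."""
    scored = [
        (cosine(vectors[target], v), i)
        for i, v in enumerate(vectors)
        if i != target
    ]
    scored.sort(reverse=True)
    return [(i, s) for s, i in scored[:k]]


# Hypothetical toy collection, not taken from the Parliament corpus.
docs = [
    "gene therapy clinical trials regulation",
    "cell therapy and gene therapy products",
    "fisheries quota agreement",
    "agricultural subsidies regulation",
]
vectors = tfidf_vectors(docs)

for index, score in nearest_neighbours(0, vectors, k=2):
    print(index, round(score, 3), docs[index])
```

In this toy collection the closest neighbour of the first document is the second one, which shares the terms "gene" and "therapy". On a highly homogeneous corpus, most documents share much of their vocabulary, so such scores would be expected to discriminate far less clearly, which is consistent with the difficulty reported above.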