16 March 2022
Esteban Marquer et al., "Siganalogies - morphological analogies from Sigmorphon 2016 and 2019", Recherche Data Gouv, ID: 10.12763/MLCFIE
The siganalogies dataset contains morphological analogies built upon Sigmorphon 2016 and Sigmorphon 2019, implemented in PyTorch. An analogical proportion is a 4-ary relation written A:B::C:D, which reads "A is to B as C is to D". This dataset deals with morphological analogies, i.e., analogies over character strings in which the transformations between the objects correspond to morphological transformations of words (e.g., conjugation or declension). Here A, B, C, and D are words; an example in English is "dog : dogs :: cat : cats". The dataset contains: (i) a copy of Sigmorphon 2019 and Sigmorphon 2016 extended with Japanese data, (ii) serialized objects, one for each language, containing the indices of the analogies and other relevant data, and (iii) the code necessary to manipulate the dataset and the serialized data.
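To make the A:B::C:D structure concrete, here is a minimal sketch of how analogies of this kind could be enumerated from Sigmorphon-style entries. It is not the code shipped with the dataset. The function name `build_analogies`, the `(lemma, inflected form, feature tag)` layout, and the rule of pairing entries that share the same feature tag are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_analogies(entries):
    """Enumerate analogies A:B::C:D from (lemma, form, features) triples.

    Assumption for this sketch: two entries form an analogy when they
    carry the same morphological feature tag, so that A -> B and C -> D
    are the same kind of transformation.
    """
    # Group (lemma, form) pairs by their feature tag.
    by_features = defaultdict(list)
    for lemma, form, features in entries:
        by_features[features].append((lemma, form))

    analogies = []
    for pairs in by_features.values():
        # Every unordered pair of entries with the same tag gives one analogy.
        for (a, b), (c, d) in combinations(pairs, 2):
            analogies.append((a, b, c, d))
    return analogies


# Hypothetical toy entries, not taken from the dataset files.
entries = [
    ("dog", "dogs", "N;PL"),
    ("cat", "cats", "N;PL"),
    ("walk", "walked", "V;PST"),
    ("jump", "jumped", "V;PST"),
]

for a, b, c, d in build_analogies(entries):
    print(f"{a} : {b} :: {c} : {d}")
```

On the toy entries, this yields one analogy per feature tag, including the "dog : dogs :: cat : cats" example from the description. Storing only index pairs into the entry list, rather than the four strings, would be one way to keep such a collection compact, in the spirit of the serialized per-language objects mentioned above.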