This project proposes to build parallel corpora for three sub-groups of the Sino-Tibetan family, covering a total of eight little-described oral languages. These corpora will be made up of texts and lexical data. Texts that share a similar (sometimes nearly identical) narrative frame will be selected, drawing from the strong mythological traditions of the Greater Himalayan region. One set of parallel texts will be assembled for each sub-group: Kiranti (from Nepal), and Rgyalrong and Naish (from China). (Note that “parallel texts” in the sense of our project does not refer to texts with an aligned translation, as in “parallel Classical Greek texts”, with Greek on one page and English on the facing page, but to traditional texts set in parallel according to the junctures of their narrative frames.) Aligning texts on the basis of their narrative frame will allow for the cross-language comparison of highly similar native materials; among other benefits for research, the typologically salient features of each sub-group will thus come to the fore with greater accuracy than can be obtained through the type of elicitation often used for cross-language comparison, e.g. the Pear Story. HimalCo includes the two essential steps of (i) first-hand data collection in the field (in Nepal and China) and (ii) state-of-the-art transcription, annotation and formatting of the entire data set. In addition to classical interlinear morphemic glossing, translation and sound synchronization as implemented in the Lacito archive (to which the participants are regular contributors), the narratives will be organized into parallel corpora in each of the three language sub-groups, and the lexical data will serve as the basis for talking dictionaries: dictionaries combined with sound recordings of individual entries (words spoken in isolation) and of example sentences.
The session of 10 November welcomed, by videoconference owing to the general public health situation, Lionel Maurel, deputy scientific director at INSHS, in charge of Open Science and research data, and also the long-standing author of the invaluable blog S.I.Lex. After an introduction exp...
The session of 6 October 2020 was led by Raphaëlle Chossenot and Alexis Michaud. The speakers presented their own experience of editing and publishing, and gave an overview, both practical and critical, of publication practices and venues in linguistics and phonology, without...
The opening session of the seminar took place on 30 September 2020 and was devoted to researchers' presence on general-purpose social networks. It was led by Julie Giovacchini, research engineer at the Centre Jean Pépin (UMR8230-CNRS-ENS-PSL). The recent covid-19 crisis has highlighted...
In March 2018, the CNRS Villejuif campus hosted a morning of reflection and awareness-raising organized by Julie Giovacchini and Elodie Chacon on the theme "Science ouverte et libres savoirs : l’Open Access, un défi scientifique et politique" (Open science and free knowledge: Open Access, a scientific and political challenge). The aims included: understanding the Loi pour u...
Resources related to the talks. In preparation for Lionel Maurel's talk, one essential read: the research blog S.I.Lex, which we warmly recommend following regularly. One could start reading here, for example (an entirely subjective choice!) Bibliography and other resour...
A group of colleagues from the CNRS Villejuif campus has set out to organize a doctoral seminar, "Science ouverte : enjeux et méthodes" (Open science: issues and methods). The seminar's page on the Sorbonne Nouvelle website sets out the rationale and presents the programme of sessions. The posts on this research blog will...
(This post is only available in French) At the request of Danièle Bourcier, who introduced Creative Commons licences in France, I wrote a short text on the use of these licences in the enterprise of documenting endangered languages which has constituted, since its founda...
Data sets on which our publications are based need to be made available, for obvious reasons of verifiability, replicability, and cumulative progress in research. This point is made eloquently in many publications. For instance: "Openness is one of the central values of science. Open scientific prac...
This is a follow-up on a previous post about experiments using automatic transcription for Yongning Na. It's been an eventful and exciting time! Oliver Adams has now released his automatic phoneme transcription tool. It's all open-source code. Here is the address: https://github.com/oadams/persephon...
Automatic speech recognition tools have strong potential for facilitating language documentation. This blog note reports on highly encouraging tests using automatic transcription in the documentation of Yongning Na, a Sino-Tibetan language of Southwest China. After twenty months of fieldwork (spread...
The volume Tone in Yongning Na is now published. It is open-access: the book is freely available as a PDF from the publisher's website. It is open-source: the LaTeX code is freely available on GitHub (from the same page as the PDF). Last but not least, the analyses ar...
The completion date of the HimalCo project (2013-2016) is the day after tomorrow: December 31st, 2016. It's a good time to look back at the past 48 months, and forward to more and more connected data: online grammars with one-click links to data, and other improvements for the greater benefit of lin...
The HimalCo project allowed for the completion of the linguist's "three treasures" for the Yongning Na language: a dictionary; a collection of texts; and a monograph. This monograph was submitted in June 2016 to Language Science Press. Language Science Press publishes high quality, peer-reviewed ope...
This is a {quick [progress note]} (i.e. a quick note about the project's progress) rather than a {[quick progress] note} (i.e. a note reporting quick progress in the project)! :-) Dictionaries are big pieces, which raise many (most?) issues that linguists can confront when tagging and encoding data....
How to avoid reverberation when recording. To study less-documented languages where they are spoken, know-how of various kinds is required, including the essentials of sound recording (along with some knowledge of botany and medicine, if possible). From a technical point of view, if one is going to s...
Following the earthquakes in Nepal this spring, it was difficult to get the authorizations to travel to Nepal on official French government business, and so the process was begun to try to bring Chandra Kala Thulung to Paris. While French citizens can buy visas upon arrival in Nepal, the reverse is...
Céline Buret brilliantly completed her two-year contract with CNRS as the main engineer in the HimalCo project. This post is to say a BIG Thank You and wish her all the best in the next steps in her career! The Python library that Céline developed to implement the LMF ISO standard for dictionaries i...
Like anyone with a strong attachment to Nepal--and many more besides--we were heartbroken to witness from afar the two major earthquakes which devastated Nepal on April 25th and May 12th. Thankfully, the individuals we work with most closely have all reported that they are safe. While there was seri...
Guillaume and I recently returned from fieldwork in Nepal. The main goal for the trip was to finalize--to the best of our abilities--the Khaling verb dictionary that we have been working on. Verbs are of course traditionally presented in dictionaries in their infinitive form, a form which--for Kiran...
The book “Endangered Languages and New Technologies”, edited by Mari C. Jones, has just been published by Cambridge University Press: http://www.cambridge.org/cr/academic/subjects/languages-linguistics/sociolinguistics/endangered-languages-and-new-technologies#contentsTabAnchor New technologies can he...
I have just returned from another very successful field trip to Nepal. The frequency of trips--3 a year thanks to the ANR grant--contributes to the efficiency of data collection, as it provides continuity from one visit to the next, not only for me but also for the speakers involved. Because I work...
Here is a short report on a field trip to Yongning that took place this month. This time, three students of Dr. Yang Liquan 杨立权 (on the right in the picture) asked to come along to learn more about fieldwork. Not that there is anything I can add to the wealth of available publications on the topic o...
The project blog has gone quiet in the past 5 months, as texts and dictionaries are in the making in a (nonpublic) GitHub repository. Tools for data migration from Toolbox / MDF format to XML and LaTeX are now operational. Painstakingly collected linguistic data and innovative technology make a head...
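For readers unfamiliar with this kind of migration, here is a minimal sketch of what converting Toolbox / MDF records to XML can look like. It is not the project's actual tooling (which lives in the nonpublic repository mentioned above): the function name `mdf_to_xml`, the XML element names and the sample entries are all hypothetical, chosen for illustration only. The sketch assumes the standard MDF markers `\lx` (lexeme), `\ps` (part of speech), `\ge` (English gloss) and `\xv` (vernacular example), one field per line.

```python
import xml.etree.ElementTree as ET

# Standard MDF field markers mapped to hypothetical XML element names
# (the element names are an assumption made for this illustration).
FIELD_NAMES = {"lx": "lexeme", "ps": "pos", "ge": "gloss", "xv": "example"}


def mdf_to_xml(text):
    """Convert MDF-style records (one marker and value per line) to an XML tree."""
    root = ET.Element("lexicon")
    entry = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # this sketch ignores blank lines and continuation text
        marker, _, value = line[1:].partition(" ")
        if marker == "lx":
            entry = ET.SubElement(root, "entry")  # each \lx field opens a new entry
        if entry is None or marker not in FIELD_NAMES:
            continue  # skip fields before the first \lx and unmapped markers
        ET.SubElement(entry, FIELD_NAMES[marker]).text = value.strip()
    return root


# Invented placeholder records, not real dictionary data.
sample = r"""\lx example
\ps n
\ge sample gloss

\lx other
\ps v
\ge second gloss
"""

print(ET.tostring(mdf_to_xml(sample), encoding="unicode"))
```

A real migration would also have to handle multi-line fields, sense and subentry hierarchy, and character encoding, all of which this sketch leaves out.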
The article Jacques (2013) has recently been published. While the HimalCo project has already produced several presentations at various conferences, this is the first published article of the project. Like all the scholarly productions of our project, it is based on corpus data (on the Japhug langua...
Last month our team organised the third edition of the Workshop on Sino-Tibetan Languages of Sichuan, which followed a first workshop in Taipei organised by Jackson T.-S. Sun, and a second one in Beijing in 2010. The conference was hosted at the EHESS. Alexis Michaud set up a lively website for the...