14 September 2022
Isabelle Auby et al., « Malabar project: datasets of pairwise distances between reads per sample for rbcL marker. », Recherche Data Gouv, ID : 10.57745/BAMEWX
What it contains

This dataset contains HDF5 files of pairwise distances between reads in a sample, with one file per sample.

Samples

Samples are combinations of:
- one location among 4
- one season among 4
- one tide coefficient among spring/neap
- one tide among high/low
- one position between pelagic and benthic

As there are no benthic samples at high tide, only 3 of the 4 tide/position combinations exist, which leads to 4 x 4 x 2 x 3 = 96 samples. Among them, 32 dissimilarity arrays have been computed and are available here.

Sample names

The naming of samples is explained here with an example: sample 190204_PM_PEL_Tey_f20_rbcL. This should be read as:
- 190204: the date of sampling; from the date, one can derive the season (winter) and the type of tide
- PM: the tide, PM for high tide and BM for low tide
- PEL: the water column, PEL for pelagic and BEN for benthic
- Tey: the location (B13 for Bouée 13, Com for Comprian, Jac for Jaquet, Tey for Teychan)
- f20: the filter used for filtering DNA (here: 20 µm); other values (in µm) are 3 and 0.2
- rbcL: the marker used, here rbcL; the other marker is 18S-V4, labeled as 18S

Short description

Each file is in HDF5 format with three datasets (in the HDF5 sense):
- distances: array of floats
- seqid: list of strings; the read identifiers
- word: list of strings; the reads

Dissimilarities have been computed with the Smith-Waterman algorithm for local alignment score.
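The naming convention and file layout described above can be sketched in Python as follows. This is a minimal illustration, not code distributed with the dataset: the helper names (`SampleName`, `parse_sample_name`, `load_sample`) are hypothetical, the date is assumed to be encoded as YYMMDD (which is consistent with 190204 falling in winter), and the filter label is kept as a raw string because the description does not say how the 0.2 value is written in file names. The shape of the `distances` array is not documented either, so it is returned exactly as stored.

```python
from dataclasses import dataclass
from datetime import date, datetime

# Code tables taken from the sample-name description.
TIDES = {"PM": "high", "BM": "low"}
WATER_COLUMNS = {"PEL": "pelagic", "BEN": "benthic"}
LOCATIONS = {"B13": "Bouée 13", "Com": "Comprian", "Jac": "Jaquet", "Tey": "Teychan"}


@dataclass(frozen=True)
class SampleName:
    sampling_date: date
    tide: str
    water_column: str
    location: str
    filter_label: str  # e.g. "f20"; kept raw, encoding of other sizes is not documented
    marker: str        # "rbcL" or "18S"


def parse_sample_name(name: str) -> SampleName:
    """Split a sample name such as 190204_PM_PEL_Tey_f20_rbcL into its fields."""
    parts = name.split("_")
    if len(parts) != 6:
        raise ValueError(f"expected 6 underscore-separated fields, got {len(parts)}: {name!r}")
    raw_date, tide, column, location, filter_label, marker = parts
    return SampleName(
        # Assumption: the date is written as YYMMDD.
        sampling_date=datetime.strptime(raw_date, "%y%m%d").date(),
        tide=TIDES[tide],                    # KeyError on an unknown code
        water_column=WATER_COLUMNS[column],
        location=LOCATIONS[location],
        filter_label=filter_label,
        marker=marker,
    )


def _to_str(value) -> str:
    """HDF5 strings may come back as bytes; decode them if so."""
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def load_sample(path: str):
    """Read the three HDF5 datasets (distances, seqid, word) from one sample file."""
    import h5py  # third-party; imported here so the parser above works without it

    with h5py.File(path, "r") as f:
        distances = f["distances"][()]               # array of floats, as stored
        seqid = [_to_str(s) for s in f["seqid"][()]]  # read identifiers
        word = [_to_str(s) for s in f["word"][()]]    # reads
    return distances, seqid, word
```

For example, `parse_sample_name("190204_PM_PEL_Tey_f20_rbcL")` yields a high-tide, pelagic sample from Teychan taken on 4 February 2019. `load_sample` would be called on a downloaded file, whose exact file name and extension are not specified in this description.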