Probabilistic base calling of Solexa sequencing data.

J. Rougemont; A. Amzallag; C. Iseli; L. Farinelli; I. Xenarios; F. Naef

Document record

Authors

Date

2008

Document type

Articles

Scope

Publications

Language

English

Identifiers

doi: 10.1186/1471-2105-9-431
issn: 1471-2105

Source

Serveur académique Lausannois

Relations

This document is related to:
info:eu-repo/semantics/altIdentifier/doi/10.1186/1471-2105-9-431

This document is related to:
info:eu-repo/semantics/altIdentifier/pmid/18851737

This document is related to:
info:eu-repo/semantics/altIdentifier/eissn/1471-2105

This document is related to:
info:eu-repo/semantics/altIdentifier/urn/urn:nbn:ch:serval-BIB_54200D66E5BC8

Collection

Serveur Académique Lausannois

Organization

Université de Lausanne and CHUV

Licenses

info:eu-repo/semantics/openAccess, Copying allowed only for non-profit organizations, https://serval.unil.ch/disclaimer

Keywords

Bacteriophage phi X 174/genetics; Base Sequence/genetics; Chromosome Mapping/methods; Cluster Analysis; DNA, Viral/analysis; Expressed Sequence Tags; Pattern Recognition, Automated/methods; Quality Control; Sequence Analysis, DNA/methods; Software; Spectrometry, Fluorescence/methods

Related subjects

Vocational guidance--Religious aspects Calling Deoxyribonucleic acid TNA (Nucleic acid) Desoxyribonucleic acid Thymonucleic acid

Cite this document

J. Rougemont et al., "Probabilistic base calling of Solexa sequencing data," Serveur académique Lausannois, ID: 10.1186/1471-2105-9-431

Abstract

BACKGROUND: Solexa/Illumina short-read ultra-high throughput DNA sequencing technology produces millions of short tags (up to 36 bases) by parallel sequencing-by-synthesis of DNA colonies. The processing and statistical analysis of such high-throughput data pose new challenges; currently a fair proportion of the tags are routinely discarded because they cannot be matched to a reference sequence, which reduces the effective throughput of the technology. RESULTS: We propose a novel base calling algorithm using model-based clustering and probability theory to identify ambiguous bases and code them with IUPAC symbols. We also select optimal sub-tags using a score based on information content to remove uncertain bases towards the ends of the reads. CONCLUSION: We show that, compared with Solexa's data processing pipeline, the method improves genome coverage and the number of usable tags by an average of 15%. An R package is provided which allows fast and accurate base calling of Solexa's fluorescence intensity files and the production of informative diagnostic plots.
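
The two ideas named in the abstract, coding ambiguous bases with IUPAC symbols and trimming reads with an information-content score, can be illustrated with a minimal Python sketch. It is not the authors' algorithm and not the API of their R package: it assumes that per-position probabilities for A, C, G and T are already available (the model-based clustering of fluorescence intensities is not shown), and the function names, the 0.9 probability threshold, the 1-bit cutoff and the "information minus cutoff" score are all hypothetical choices made for illustration.

```python
import math

BASES = "ACGT"

# Standard IUPAC nucleotide codes, keyed by the sorted set of bases they denote.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AC": "M", "AG": "R", "AT": "W", "CG": "S", "CT": "Y", "GT": "K",
    "ACG": "V", "ACT": "H", "AGT": "D", "CGT": "B", "ACGT": "N",
}


def iupac_call(probs, threshold=0.9):
    """Code one position as the smallest set of most-probable bases whose
    total probability reaches `threshold` (an assumed rule, not the paper's)."""
    ranked = sorted(zip(probs, BASES), reverse=True)
    chosen, total = [], 0.0
    for p, base in ranked:
        chosen.append(base)
        total += p
        if total >= threshold:
            break
    return IUPAC["".join(sorted(chosen))]


def information(probs):
    """Information content in bits: 2 minus the Shannon entropy of the
    base distribution (2 bits = certain call, 0 bits = uniform)."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2.0 - entropy


def best_subtag(prob_rows, cutoff=1.0):
    """Return (start, end) of the contiguous sub-tag maximizing the summed
    score `information - cutoff`, found with a maximum-subarray scan.
    The score itself is a hypothetical stand-in for the paper's."""
    best_start, best_end, best_score = 0, 0, 0.0
    start, running = 0, 0.0
    for i, row in enumerate(prob_rows):
        if running <= 0:
            start, running = i, 0.0  # restart the candidate window here
        running += information(row) - cutoff
        if running > best_score:
            best_start, best_end, best_score = start, i + 1, running
    return best_start, best_end


# Made-up probabilities for a 5-base read (columns: A, C, G, T).
read = [
    [0.97, 0.01, 0.01, 0.01],
    [0.01, 0.97, 0.01, 0.01],
    [0.02, 0.02, 0.94, 0.02],
    [0.48, 0.02, 0.48, 0.02],  # A or G, coded as R
    [0.25, 0.25, 0.25, 0.25],  # no information, coded as N
]

calls = "".join(iupac_call(row) for row in read)  # "ACGRN"
start, end = best_subtag(read)                    # (0, 3)
subtag = calls[start:end]                         # "ACG"
```

With these invented inputs the uncertain positions at the end of the read lower the running score, so the selected sub-tag stops before them. Raising the threshold or the cutoff makes the calls and the trimming more conservative.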
