Category Archives: Biopython
I’m pleased to announce the acceptance of OBF’s 2010 Google Summer of Code students, listed in alphabetical order with their project titles and primary mentors: Mark Chapman (PM Andreas Prlic) – Improvements to BioJava including Implementation of Multiple Sequence Alignment …
Another quirk in the FASTQ story: recent Illumina FASTQ files don’t actually use the full range of PHRED scores, and a score of 2 has a special meaning, the Read Segment Quality Control Indicator (RSQCI, encoded as ‘B’). …
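To make the encoding concrete, here is a minimal Python sketch. It assumes the Illumina 1.5+ convention of storing PHRED scores with an ASCII offset of 64, so a score of 2 becomes chr(66), which is ‘B’. It also assumes the RSQCI marker shows up as a run of ‘B’ characters at the end of the read. `decode_illumina` and `rsqci_tail_length` are hypothetical helpers for illustration, not Biopython functions.

```python
# Assumption: Illumina 1.5+ FASTQ stores PHRED scores with an ASCII offset of 64.
ILLUMINA_OFFSET = 64


def decode_illumina(qual: str) -> list[int]:
    """Turn an ASCII quality string into a list of PHRED scores."""
    return [ord(ch) - ILLUMINA_OFFSET for ch in qual]


def rsqci_tail_length(qual: str) -> int:
    """Count the trailing 'B' characters (PHRED 2) at the end of a read.

    Assumption: the RSQCI marker is applied as a run at the read's end.
    """
    return len(qual) - len(qual.rstrip("B"))


qual = "hhhhgggfffBBBB"
print(decode_illumina(qual))    # 'h' -> 40, 'g' -> 39, 'f' -> 38, 'B' -> 2
print(rsqci_tail_length(qual))  # number of trailing 'B' characters
```

A downstream tool could use the tail length to trim or mask those bases rather than treat them as genuine Q2 calls.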
This is another blog post to highlight one of the neat tricks you’ll be able to do with Biopython 1.54 (which you can help test with the Biopython 1.54 beta release). It is often useful to be able to extract …
One of the small changes coming in Biopython 1.54 (which you can try out already using the Biopython 1.54 beta) is to Bio.SeqIO and Bio.AlignIO. Previously the input and output functions had required file handles, but they will now also …
A beta release for Biopython 1.54 is now available for download and testing.
O|B|F is taking part in Google Summer of Code; student applications are due to Google by April 9, 2010.
I’m delighted to announce an open access publication in Nucleic Acids Research describing the FASTQ file format based on the conventions agreed by the OBF projects: The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ …
We are pleased to announce the availability of Biopython 1.53, a new stable release of the Biopython library, three months after the release of Biopython 1.52. This is our first release since migrating from CVS to git for source code …
This post is about paired-end data (FASTA or FASTQ) and manipulating it with Biopython’s Bio.SeqIO module (see also FASTQ conversions & speeding up FASTQ).
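One common paired-end task is interleaving the two mate files into one. The sketch below deliberately avoids Bio.SeqIO and works on plain text, assuming unwrapped four-line FASTQ records, both files in the same read order, and the older `/1` and `/2` mate-suffix naming convention. `fastq_records`, `base_name` and `interleave` are hypothetical helpers for illustration.

```python
import io
from itertools import zip_longest


def fastq_records(handle):
    """Yield (title, seq, qual) tuples from unwrapped four-line FASTQ text."""
    while True:
        title = handle.readline().rstrip("\n")
        if not title:
            return  # end of file (or a blank line)
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not title.startswith("@") or not plus.startswith("+"):
            raise ValueError("Not a four-line FASTQ record: %r" % title)
        yield title[1:], seq, qual


def base_name(title):
    """Drop a trailing /1 or /2 mate suffix from the read identifier."""
    name = title.split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def interleave(handle1, handle2, out):
    """Write each read from handle1 followed by its mate from handle2."""
    for rec1, rec2 in zip_longest(fastq_records(handle1), fastq_records(handle2)):
        if rec1 is None or rec2 is None:
            raise ValueError("Paired files have different numbers of reads")
        if base_name(rec1[0]) != base_name(rec2[0]):
            raise ValueError("Mates out of step: %s vs %s" % (rec1[0], rec2[0]))
        for title, seq, qual in (rec1, rec2):
            out.write("@%s\n%s\n+\n%s\n" % (title, seq, qual))


# Usage with in-memory files standing in for two real mate files.
left = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nGGCC\n+\nHHHH\n")
right = io.StringIO("@r1/2\nTTAA\n+\nIIII\n@r2/2\nCCGG\n+\nHHHH\n")
out = io.StringIO()
interleave(left, right, out)
print(out.getvalue(), end="")
```

Parsing both files into SeqRecord objects with Bio.SeqIO would be more robust, at some cost in speed.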
Biopython’s SeqIO interface revolves around SeqRecord objects, which can impose a speed penalty. For FASTQ files, the quality string is turned into a list of integers on parsing and then re-encoded back to ASCII on writing. Working directly with the raw strings is less flexible, but much faster.
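As an illustration of the raw-string idea, this sketch converts Sanger FASTQ (PHRED+33) to Illumina 1.3+ FASTQ (PHRED+64) by remapping each quality character with `str.translate`, without ever building a list of integers. It assumes unwrapped four-line records starting at a record boundary. Capping scores above 62 (the highest score that fits in printable ASCII with offset 64) is a choice made for this sketch. `SANGER_TO_ILLUMINA`, `convert_lines` and `slow_convert` are hypothetical names, not Biopython APIs.

```python
SANGER_OFFSET = 33    # Sanger FASTQ: PHRED score + 33
ILLUMINA_OFFSET = 64  # Illumina 1.3+ FASTQ: PHRED score + 64
MAX_ILLUMINA_Q = 126 - ILLUMINA_OFFSET  # '~' (ASCII 126) limits this to Q62

# One-off lookup table mapping every Sanger quality character (Q0 to Q93)
# straight to its Illumina character, capping scores above Q62.
SANGER_TO_ILLUMINA = str.maketrans(
    {
        chr(q + SANGER_OFFSET): chr(min(q, MAX_ILLUMINA_Q) + ILLUMINA_OFFSET)
        for q in range(94)
    }
)


def convert_lines(lines):
    """Yield FASTQ lines, translating every fourth (quality) line."""
    for i, line in enumerate(lines):
        yield line.translate(SANGER_TO_ILLUMINA) if i % 4 == 3 else line


def slow_convert(qual):
    """Reference version: decode to integers, then re-encode (the slower path)."""
    scores = [ord(ch) - SANGER_OFFSET for ch in qual]
    return "".join(chr(min(q, MAX_ILLUMINA_Q) + ILLUMINA_OFFSET) for q in scores)


record = ["@read1\n", "ACGT\n", "+\n", "!5I~\n"]
converted = list(convert_lines(record))
print(converted[3], end="")
```

The table is built once, so each quality line costs a single `translate` call. The trade-off is flexibility: nothing here validates the sequence or quality characters, and any character outside the table (such as the newline) passes through unchanged.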