<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>O&#124;B&#124;F News &#187; Community</title>
	<atom:link href="http://news.open-bio.org/news/category/community/feed/" rel="self" type="application/rss+xml" />
	<link>http://news.open-bio.org/news</link>
	<description>Open Source Bioinformatics news</description>
	<lastBuildDate>Wed, 10 Mar 2010 22:45:54 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=abc</generator>
		<item>
		<title>BOSC 2010 Call for Abstracts</title>
		<link>http://news.open-bio.org/news/2010/03/bosc-2010-call-for-abstracts/</link>
		<comments>http://news.open-bio.org/news/2010/03/bosc-2010-call-for-abstracts/#comments</comments>
		<pubDate>Wed, 03 Mar 2010 01:19:29 +0000</pubDate>
		<dc:creator>Kam Dahlquist</dc:creator>
				<category><![CDATA[BOSC/ISMB]]></category>
		<category><![CDATA[Community]]></category>
		<category><![CDATA[OBF]]></category>
		<category><![CDATA[OBF Projects]]></category>
		<category><![CDATA[BOSC]]></category>
		<category><![CDATA[open-source]]></category>

		<guid isPermaLink="false">http://news.open-bio.org/news/?p=612</guid>
		<description><![CDATA[<br/>Abstract submissions for the 11th Annual Bioinformatics  Open Source Conference (BOSC 2010) are now open.

At-a-glance
BOSC is an ISMB 2010 Special Interest Group (SIG)
Date: July 9-10, 2010
Location: Boston, Massachusetts, USA
BOSC 2010 web site: http://www.open-bio.org/wiki/BOSC_2010
Abstract submission via Open Conference System site:  http://events.open-bio.org/BOSC2010/openconf.php
E-mail: bosc@open-bio.org
Bosc-announce list:  http://lists.open-bio.org/mailman/listinfo/bosc-announce
Important Dates
April 15: Abstract deadline
May 5:  Notification of accepted abstracts
May 28: Early [...]]]></description>
			<content:encoded><![CDATA[<br/><p><span><span style="font-size: x-small"><strong>Abstract submissions for the 11th Annual Bioinformatics  Open Source Conference (BOSC 2010) are now open.</strong><br />
</span></span></p>
<p><span><span style="font-size: x-small"><strong>At-a-glance</strong><br />
BOSC is an ISMB 2010 Special Interest Group (SIG)<br />
Date: July 9-10, 2010<br />
Location: Boston, Massachusetts, USA<br />
BOSC 2010 web site: <a href="http://www.open-bio.org/wiki/BOSC_2010">http://www.open-bio.org/wiki/BOSC_2010</a><br />
Abstract submission via Open Conference System site:  <a href="http://events.open-bio.org/BOSC2010/openconf.php">http://events.open-bio.org/BOSC2010/openconf.php</a><br />
E-mail: bosc@open-bio.org<br />
Bosc-announce list:  <a href="http://lists.open-bio.org/mailman/listinfo/bosc-announce">http://lists.open-bio.org/mailman/listinfo/bosc-announce</a></span></span></p>
<p><strong>Important Dates</strong><br />
<strong>April 15: Abstract deadline</strong><br />
May 5:  Notification of accepted abstracts<br />
May 28: Early Registration Discount Cut-off date<br />
July 8-9:  Codefest 2010<br />
<strong>July 9-10: BOSC 2010</strong><br />
August 15:  Manuscript deadline for BOSC 2010 Proceedings published in  BMC Bioinformatics</p>
<p>The Bioinformatics Open Source Conference (BOSC) is sponsored by the  Open Bioinformatics Foundation (O|B|F), a non-profit group dedicated to  promoting the practice and philosophy of Open Source software  development within the biological research community. To be considered  for acceptance, software systems representing the central topic in a  presentation submitted to BOSC must be licensed with a recognized Open  Source License, and be freely available for download in source code  form.</p>
<p>We have some exciting things planned this year, including:</p>
<ul>
<li><span><span style="font-size: x-small">Codefest 2010 programming session for the two days preceding BOSC:   See <a href="http://www.open-bio.org/wiki/Codefest_2010">http://www.open-bio.org/wiki/Codefest_2010</a> for  details.</span></span></li>
<li>OpenBio Solution Challenge:  See session description below and <a href="http://www.open-bio.org/wiki/SolutionChallenge">http://www.open-bio.org/wiki/SolutionChallenge</a> for  details.</li>
<li>Student Travel Fellowships:  Through generous sponsorship from Eagle  Genomics and an anonymous donor, we are pleased to announce the  competition for three Student Travel Awards for BOSC 2010. Each winner  will be awarded $250 to defray the costs of travel to BOSC 2010.  See <a href="http://www.open-bio.org/wiki/BOSC_2010#Student_Travel_Awards">http://www.open-bio.org/wiki/BOSC_2010#Student_Travel_Awards</a> for details.</li>
<li>First-ever BOSC Proceedings will be published in the Open Access  journal, BMC Bioinformatics.  Manuscripts will be due after BOSC on  August 15.  See <a href="http://www.open-bio.org/wiki/BOSC_2010#First-ever_Published_BOSC_Proceedings">http://www.open-bio.org/wiki/BOSC_2010#First-ever_Published_BOSC_Proceedings</a> for details.</li>
<li>Sessions on approaches to analyzing high-throughput &#8216;omics data,  cloud-based approaches to improving software and data accessibility, the  semantic web in open source bioinformatics, see below:</li>
</ul>
<p><span><span style="font-size: x-small"><br />
We invite abstracts for talks at the following sessions:</span></span></p>
<p><strong>OpenBio SolutionChallenge</strong> &#8212; Bioinformatics library providers: please  join us in a friendly competition to solve a shared biological problem,  demonstrating the utility of your toolkit alongside other developers.  Instead of the traditional Bio* updates that we&#8217;ve had at previous  conferences, this year, we&#8217;re planning to organize these talks around a  central theme: the OpenBio Solution Challenge. We start with a  biological question of general interest, and the project talks will  focus around how you would solve that problem using your toolkit and  programming language. This is meant to provide a challenge for OpenBio  contributors, a nice tutorial style overview of various projects and  approaches for other programmers, and a fun opportunity to compete and  learn from other projects. Conference attendees will vote on their  favorite solution, with the winner receiving fame and fortune (warning:  fortune not guaranteed). Specific challenges are being discussed on the  SolutionChallenge page and through the various Bio* mailing lists.  Alternately, each project could highlight a challenge that they  particularly do well, focusing tutorial-style on how to solve a  particular problem.</p>
]]></content:encoded>
			<wfw:commentRss>http://news.open-bio.org/news/2010/03/bosc-2010-call-for-abstracts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>BioPerl at GMOD Meeting 2010</title>
		<link>http://news.open-bio.org/news/2010/01/bioperl-at-gmod-meeting-2010/</link>
		<comments>http://news.open-bio.org/news/2010/01/bioperl-at-gmod-meeting-2010/#comments</comments>
		<pubDate>Tue, 19 Jan 2010 03:58:58 +0000</pubDate>
		<dc:creator>Chris Fields</dc:creator>
				<category><![CDATA[BioPerl]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Community]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[OBF]]></category>
		<category><![CDATA[OBF Projects]]></category>

		<guid isPermaLink="false">http://news.open-bio.org/news/?p=603</guid>
		<description><![CDATA[<br/>BioPerl developers and users attended the BioPerl satellite meeting on January 13th, just prior to the GMOD Meeting.  Several items were covered on the agenda:

In order to start addressing whole genome data with more lightweight objects, we are planning on setting up a lightweight Bio::SeqI object that has a flexible DB backend (i.e. Bio::DB::SeqFeature::Store or similar). [...]]]></description>
			<content:encoded><![CDATA[<br/><p>BioPerl developers and users attended the <a href="http://www.bioperl.org/wiki/GMOD_2010_Meeting">BioPerl satellite meeting</a> on January 13th, just prior to the <a href="http://gmod.org/wiki/January_2010_GMOD_Meeting">GMOD Meeting</a>.  Several items were covered on the agenda:</p>
<ul>
<li>In order to start addressing whole genome data with more lightweight objects, we are planning on setting up a lightweight Bio::SeqI object that has a flexible DB backend (i.e. Bio::DB::SeqFeature::Store or similar).  We are also contemplating adding lazy parsing for some parsers, possibly using the Bio::PullParserI methods (or similar) that Sendu Bala created.</li>
<li>After a final  1.6 branch point release, we may &#8216;freeze&#8217; BioPerl in a maintenance mode, primarily so that we can reorganize core into several more easily installed subdistributions on a branch.  New modules will essentially be additional separate repos that will depend on BioPerl core.  This reorganization has been discussed for a few years now, and as we edge closer to starting this (probably this spring) we&#8217;ll announce more details.</li>
<li>Some initial thoughts on how to handle circular genomes more efficiently.  We essentially do this already, but it isn&#8217;t foolproof.</li>
<li>Need some significant time dedicated towards GFF3-based coding (reimplement FeatureIO but allow some flexibility).  Rob Buels had started the initial run at splitting out FeatureIO, so next step is a true reimplementation.</li>
<li>We don&#8217;t plan on including Moose support for the immediate future, feeling that it would be better to reimplement some of the classes from scratch using Moose and similar as a BioPerl 2.0, or possibly await the impending Rakudo Perl 6 alpha and start afresh using that instead of Moose.</li>
</ul>
<p>Anything we missed?  Anything you would like to address?  Please add comments and we&#8217;ll discuss them on list.</p>
]]></content:encoded>
			<wfw:commentRss>http://news.open-bio.org/news/2010/01/bioperl-at-gmod-meeting-2010/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>BOSC 2010 Request for Input</title>
		<link>http://news.open-bio.org/news/2009/12/bosc-2010-request-for-input/</link>
		<comments>http://news.open-bio.org/news/2009/12/bosc-2010-request-for-input/#comments</comments>
		<pubDate>Sat, 19 Dec 2009 01:39:24 +0000</pubDate>
		<dc:creator>Kam Dahlquist</dc:creator>
				<category><![CDATA[BOSC/ISMB]]></category>
		<category><![CDATA[Community]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[OBF]]></category>
		<category><![CDATA[OBF Projects]]></category>
		<category><![CDATA[BOSC]]></category>

		<guid isPermaLink="false">http://news.open-bio.org/news/?p=583</guid>
		<description><![CDATA[<br/>The BOSC organizing committee is soliciting input on the planning of BOSC 2010 so that we can make it a successful and productive conference for the O&#124;B&#124;F community.  You may send your suggestions to the bosc@open-bio.org e-mail address or add suggestions to the BOSC 2010 talk/discussion wiki page at: http://www.open-bio.org/wiki/Talk:BOSC_2010. Please respond to the questions in this post by January 8, 2010.
]]></description>
			<content:encoded><![CDATA[<br/><p>BOSC 2010 is currently in the planning stages. It will be held for 2 days in conjunction with the 18th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB 2010) in Boston, Massachusetts, USA. The dates of BOSC 2010 are July 9-10; the main ISMB Conference runs July 11-13, 2010.  The BOSC 2010 web site can be accessed here:  <a href="http://www.open-bio.org/wiki/BOSC_2010">http://www.open-bio.org/wiki/BOSC_2010</a>.</p>
<p>The BOSC organizing committee is soliciting input on the planning of BOSC 2010 so that we can make it a successful and productive conference for the O|B|F community.  You may send your suggestions to the <a class="external text" title="mailto:bosc@open-bio.org" rel="nofollow" href="mailto:bosc@open-bio.org">bosc@open-bio.org</a> e-mail address  or add suggestions to the BOSC 2010 talk/discussion wiki page at: <a href="http://www.open-bio.org/wiki/Talk:BOSC_2010">http://www.open-bio.org/wiki/Talk:BOSC_2010</a>. Please respond to any or all of the questions below:</p>
<p>1.  For the last several years BOSC has consisted mainly of one or two keynote presentations, other talks chosen from among the submitted abstracts organized into sessions by topic, updates from the Bio* projects, Lightning Talks, and informal Birds of a Feather sessions.  Would you rather see BOSC continue in this fashion, or would you support changing the format to one or all of the following:</p>
<ul>
<li> <strong>Tutorials</strong> with in-depth demonstrations and code walkthroughs. These could be led off by the OBF projects instead of the traditional update talks, but could feature any interested open source project. These would be hands-on sessions with real code examples, with a focus on teaching people how to leverage various code bases to make real-life work easier.  <strong>Would you be willing to organize/lead such a session for your project?</strong></li>
<li><strong>Discussion</strong> following the hands-on tutorials. These would be interactive sessions focused on unsolved issues. The &#8220;speaker&#8221; would be responsible for setting up a set of discussion topics around an issue of interest, and then facilitating ideas and opinions from the attendees. The goals would be to talk through problems and gather a consensus about options for solving them.  <strong>Would you be willing to organize/lead such a session for your project?</strong></li>
<li><strong>Mini-hackathon</strong> either before, during, or after the 2-day BOSC.  The subject of the hackathon would need to be organized by the individual project leaders/teams.  Some suggestions would be adding/extending support for next-gen sequencing; organizing bugs/tasks so that new beginners can start contributing to the project easily and working on some of those bugs/tasks; organizing some type of contest like the Genome Annotation Assessment Project (GASP) where solutions from different projects compete on arriving at some type of goal.  <strong>Would you be willing to organize/lead this type of session?</strong></li>
<li>Organizing/creating a <strong>LiveCD</strong> or Debian download of Bio* projects with documentation to support outreach to the larger bioinformatics community.  <strong>Would you be willing to organize/lead this type of session?</strong></li>
<li>What <strong>session topics</strong> would you like to see represented for traditional talks?</li>
<li> Who would you like to hear as a <strong>keynote speaker</strong>?</li>
</ul>
<p>2.  The BOSC 2010 organizing committee is in discussion with an open access journal to publish a formal Proceedings for BOSC.  If you are planning on submitting an abstract for BOSC 2010, are you interested in submitting a more formal paper to the BOSC proceedings, given that as the author you would need to pay the page charges that could run between US$500-1000?  We are likely to move ahead with plans to have a proceedings, but it would be helpful to know how many submissions to expect.</p>
<p>3.  Call for <strong>volunteers</strong>.  Organizing tutorials, hackathons and the like will only be possible if individuals step forward to lead these sessions.  Please let us know if you would be willing to serve in any capacity.  We also need volunteers to review abstracts for the more &#8220;traditional&#8221; sessions; please let us know if you are willing to do this as well.</p>
<p><strong>Timeline:</strong> We are planning on putting out the Call for Abstracts in mid-January.  To be on track, we would like to receive your input by <strong>Friday, January 8</strong>.  If you are willing to step forward to organize a tutorial/discussion/hackathon, you would need to commit by that time, although there would still be some more time to put the actual program together in the new year.</p>
<p>Thanks and Happy Holidays!</p>
<p>Kam Dahlquist<br />
Chair, BOSC 2010 on behalf of the BOSC 2010 Organizing committee:<br />
Brad Chapman, Michael Heuer, Darin London, Anton Nekrutenko, Steffen Moeller, Jim Procter<br />
And the O|B|F Board:<br />
Chris Dagdigian, Nomi Harris, Hilmar Lapp, Jason Stajich</p>
]]></content:encoded>
			<wfw:commentRss>http://news.open-bio.org/news/2009/12/bosc-2010-request-for-input/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sanger FASTQ format and the Solexa/Illumina variants</title>
		<link>http://news.open-bio.org/news/2009/12/nar-fastq-format/</link>
		<comments>http://news.open-bio.org/news/2009/12/nar-fastq-format/#comments</comments>
		<pubDate>Thu, 17 Dec 2009 16:28:55 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[BioJava]]></category>
		<category><![CDATA[BioPerl]]></category>
		<category><![CDATA[BioRuby]]></category>
		<category><![CDATA[Biopython]]></category>
		<category><![CDATA[Blogroll]]></category>
		<category><![CDATA[Community]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Documentation]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[OBF]]></category>
		<category><![CDATA[OBF Projects]]></category>
		<category><![CDATA[EMBOSS]]></category>
		<category><![CDATA[FASTQ]]></category>

		<guid isPermaLink="false">http://news.open-bio.org/news/?p=574</guid>
		<description><![CDATA[<br/>I&#8217;m delighted to announce an open access publication in Nucleic Acids Research describing the FASTQ file format based on the conventions agreed by the OBF projects:
The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants
Peter J. A. Cock (Biopython), Christopher J. Fields (BioPerl), Naohisa Goto (BioRuby), Michael L. Heuer (BioJava) [...]]]></description>
			<content:encoded><![CDATA[<br/><p>I&#8217;m delighted to announce an open access publication in <em>Nucleic Acids Research</em> describing the FASTQ file format based on the conventions agreed by the OBF projects:</p>
<blockquote><p><a href="http://dx.doi.org/10.1093/nar/gkp1137">The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants</a><br />
Peter J. A. Cock (<a href="http://www.biopython.org">Biopython</a>), Christopher J. Fields (<a href="http://www.bioperl.org">BioPerl</a>), Naohisa Goto (<a href="http://www.bioruby.org">BioRuby</a>), Michael L. Heuer (<a href="http://www.biojava.org">BioJava</a>) and Peter M. Rice (<a href="http://emboss.sourceforge.net/">EMBOSS</a>).<br />
Nucleic Acids Research, <a href="http://dx.doi.org/10.1093/nar/gkp1137">doi:10.1093/nar/gkp1137</a></p></blockquote>
<p>This will hopefully serve as a reference describing the original standard Sanger FASTQ, and the two variants from Solexa/Illumina, and how to inter-convert between them.</p>
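<p>As a rough illustration of what inter-converting involves (a minimal sketch based on the three encodings, not code taken from the paper or from any of the OBF libraries): Sanger FASTQ stores PHRED scores with an ASCII offset of 33, Illumina 1.3+ stores PHRED scores with an offset of 64, and the older Solexa variant stores Solexa scores with an offset of 64. The function and constant names below are made up for this example.</p>

```python
import math

SANGER_OFFSET = 33    # Sanger FASTQ: PHRED scores, ASCII offset 33
ILLUMINA_OFFSET = 64  # Illumina 1.3+ FASTQ: PHRED scores, ASCII offset 64
SOLEXA_OFFSET = 64    # Solexa FASTQ: Solexa scores, ASCII offset 64


def solexa_to_phred(q_solexa):
    """Map a Solexa score to the equivalent PHRED score (a float)."""
    return 10 * math.log10(10 ** (q_solexa / 10.0) + 1)


def illumina_to_sanger(quality_string):
    """Re-encode an Illumina 1.3+ quality string as Sanger (same PHRED scores)."""
    return "".join(
        chr(ord(letter) - ILLUMINA_OFFSET + SANGER_OFFSET)
        for letter in quality_string
    )


def solexa_to_sanger(quality_string):
    """Re-encode a Solexa quality string as Sanger, rounding to whole PHRED scores."""
    return "".join(
        chr(int(round(solexa_to_phred(ord(letter) - SOLEXA_OFFSET))) + SANGER_OFFSET)
        for letter in quality_string
    )
```

<p>Going from Illumina 1.3+ to Sanger only shifts the ASCII offset, whereas starting from Solexa also changes the score scale, so low scores do not map one-to-one.</p>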
]]></content:encoded>
			<wfw:commentRss>http://news.open-bio.org/news/2009/12/nar-fastq-format/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Interleaving paired FASTQ files with Biopython</title>
		<link>http://news.open-bio.org/news/2009/12/interleaving-paired-fastq-files-with-biopython/</link>
		<comments>http://news.open-bio.org/news/2009/12/interleaving-paired-fastq-files-with-biopython/#comments</comments>
		<pubDate>Mon, 14 Dec 2009 14:03:41 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Biopython]]></category>
		<category><![CDATA[Blogroll]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Community]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Documentation]]></category>
		<category><![CDATA[HOWTO]]></category>
		<category><![CDATA[OBF Projects]]></category>
		<category><![CDATA[FASTQ]]></category>

		<guid isPermaLink="false">http://news.open-bio.org/news/?p=513</guid>
		<description><![CDATA[<br/>This post is about paired end data (FASTA or FASTQ) and manipulating it with Biopython&#8217;s Bio.SeqIO module (see also FASTQ conversions &#38; speeding up FASTQ).

There are two main ways of presenting paired end data in FASTA or FASTQ files:

Paired files, with matching entries for the forward and reverse reads (probably the norm with Illumina data)
Single [...]]]></description>
			<content:encoded><![CDATA[<br/><p>This post is about paired end data (FASTA or FASTQ) and manipulating it with Biopython&#8217;s <a href="http://biopython.org/wiki/SeqIO">Bio.SeqIO</a> module (see also <a href="http://news.open-bio.org/news/2009/09/biopython-convert-function/">FASTQ conversions</a> &amp; <a href="http://news.open-bio.org/news/2009/09/biopython-fast-fastq/">speeding up FASTQ</a>).<br />
<span id="more-513"></span></p>
<p>There are two main ways of presenting paired end data in FASTA or FASTQ files:</p>
<ul>
<li>Paired files, with matching entries for the forward and reverse reads (probably the norm with Illumina data)</li>
<li>Single files, with alternating entries for the forward and reverse reads (used by Velvet)</li>
</ul>
<p>Converting between these two is a relatively common operation, and is normally pretty easy. There was a <a href="http://lists.open-bio.org/pipermail/biopython/2009-September/005584.html">short example</a> of how you might do this in Biopython on a recent (September 2009) Velvet users/Biopython mailing list discussion. That script didn&#8217;t check the record IDs matched up (but neither does the Perl script shuffleSequences_fastq.pl included with Velvet for this task).</p>
<p>It would be safer to check the record IDs do match. However, there are several different naming schemes for reads, most typically suffixes of <tt>/1</tt> and <tt>/2</tt>, but also things like <tt>.f</tt> and <tt>.r</tt> get used. In the case of FASTQ files from the NCBI SRA, the reads have no suffixes, so to feed those into Velvet you may want to check they are equal and then add a suffix as shown below.</p>
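<p>Where the read names do already carry suffixes, the comparison has to ignore them. Before the main script, here is a minimal sketch of such a check; the helper name is hypothetical and it only knows the two suffix styles mentioned above:</p>

```python
# Suffix pairs mentioned in the text; extend for other naming schemes
SUFFIX_PAIRS = [("/1", "/2"), (".f", ".r")]


def same_pair(forward_id, reverse_id):
    """Return True if the two read names look like mates of one pair.

    Identical names (no suffixes at all, as in the NCBI SRA files) are
    accepted as they are.
    """
    if forward_id == reverse_id:
        return True
    for f_suffix, r_suffix in SUFFIX_PAIRS:
        if forward_id.endswith(f_suffix) and reverse_id.endswith(r_suffix):
            # Compare the names with the suffixes stripped off
            f_stem = forward_id[: -len(f_suffix)]
            r_stem = reverse_id[: -len(r_suffix)]
            return f_stem == r_stem
    return False
```

<p>A call such as <tt>same_pair(forward.id, reverse.id)</tt> could then stand in for a plain equality test when the input files use one of these schemes.</p>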
<pre>#This Python script requires Python 3 and a Biopython release that supports it
from Bio import SeqIO

#Setup variables (could parse command line args instead)
file_f = "SRR001666_1.fastq"
file_r = "SRR001666_2.fastq"
file_out = "SRR001666_interleaved.fastq"
fmt = "fastq" #or "fastq-illumina", or "fasta", or ...

def interleave(iter1, iter2):
    #Walk both files in step (zip is lazy in Python 3)
    for (forward, reverse) in zip(iter1, iter2):
        #Stop if the two files are out of sync
        assert forward.id == reverse.id
        forward.id += "/1"
        reverse.id += "/2"
        yield forward
        yield reverse

#SeqIO.parse accepts filenames as well as handles
records_f = SeqIO.parse(file_f, fmt)
records_r = SeqIO.parse(file_r, fmt)

with open(file_out, "w") as handle:
    count = SeqIO.write(interleave(records_f, records_r), handle, fmt)
print("%i records written to %s" % (count, file_out))
</pre>
<p>&nbsp;</p>
<p>This example uses the <a href="http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=viewer&amp;m=data&amp;s=viewer&amp;run=SRR001666">SRR001666</a> files from the <a href="ftp://ftp.ncbi.nlm.nih.gov/sra/static/SRX000/SRX000430/">NCBI SRA FTP site</a>.</p>
<p>Now that works fine, and just by changing the filenames and the format name this could be used on FASTA data (or another supported file format). The bad news is it took 14 minutes to produce a 2GB FASTQ. However, going a little more low-level <a href="http://news.open-bio.org/news/2009/09/biopython-fast-fastq/">as discussed before</a> can really pay off. This FASTQ-only version takes just 2 minutes:</p>
<pre>#This Python script requires Python 3 and a Biopython release that supports it
from Bio.SeqIO.QualityIO import FastqGeneralIterator

#Setup variables (could parse command line args instead)
file_f = "SRR001666_1.fastq"
file_r = "SRR001666_2.fastq"
file_out = "SRR001666_interleaved.fastq"

count = 0
with open(file_f) as in_f, open(file_r) as in_r, open(file_out, "w") as handle:
    #Each iterator yields (title, sequence, quality) tuples of plain strings
    f_iter = FastqGeneralIterator(in_f)
    r_iter = FastqGeneralIterator(in_r)
    for (f_id, f_seq, f_q), (r_id, r_seq, r_q) in zip(f_iter, r_iter):
        assert f_id == r_id
        count += 2
        #Write out both reads with "/1" and "/2" suffix on ID
        handle.write("@%s/1\n%s\n+\n%s\n@%s/2\n%s\n+\n%s\n"
                     % (f_id, f_seq, f_q, r_id, r_seq, r_q))
print("%i records written to %s" % (count, file_out))</pre>
<p>&nbsp;</p>
<p>You can make this a little faster still by missing out most of the validation done by the Biopython FASTQ parser &#8211; but personally I wouldn&#8217;t take that risk. I&#8217;d much rather know about any errors in the data.</p>
<p style="text-align: right;">Peter</p>
<p>P.S.</p>
<p>Things get more interesting if you want to do quality filtering or trimming. If only one of a pair passes the quality assurance step, then you may want to keep it and treat it as an unpaired read. To give such cleaned up data to Velvet, you would need one file of alternating paired end reads, and a separate file of the orphaned effectively unpaired reads. That deserves another post going into more detail&#8230;</p>
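<p>As a starting point, here is a minimal sketch of that splitting step. It assumes (title, sequence, quality) tuples like those FastqGeneralIterator yields, Sanger-encoded qualities, and a made-up filter that requires every base to reach a minimum PHRED score; the function names and the threshold are purely illustrative.</p>

```python
def passes_qc(quality_string, min_phred=20, offset=33):
    """Made-up filter: every base must score at least min_phred (Sanger offset)."""
    if not quality_string:
        return False
    return min(ord(letter) - offset for letter in quality_string) >= min_phred


def split_pairs(f_iter, r_iter, paired_handle, orphan_handle):
    """Write pairs that both pass interleaved; write lone survivors as orphans.

    Returns a tuple (number of pairs written, number of orphans written).
    """
    template = "@%s\n%s\n+\n%s\n"
    pairs = orphans = 0
    for (f_id, f_seq, f_q), (r_id, r_seq, r_q) in zip(f_iter, r_iter):
        assert f_id == r_id
        f_record = template % (f_id + "/1", f_seq, f_q)
        r_record = template % (r_id + "/2", r_seq, r_q)
        f_ok, r_ok = passes_qc(f_q), passes_qc(r_q)
        if f_ok and r_ok:
            # Both mates survive: keep them together in the paired file
            paired_handle.write(f_record + r_record)
            pairs += 1
        elif f_ok:
            orphan_handle.write(f_record)
            orphans += 1
        elif r_ok:
            orphan_handle.write(r_record)
            orphans += 1
        # If neither mate passes, the pair is dropped entirely
    return pairs, orphans
```

<p>In the FASTQ-only script above, the two FastqGeneralIterator objects would be passed in as <tt>f_iter</tt> and <tt>r_iter</tt>, with two output handles opened for writing.</p>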
]]></content:encoded>
			<wfw:commentRss>http://news.open-bio.org/news/2009/12/interleaving-paired-fastq-files-with-biopython/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>BioPerl interview in latest FLOSS Weekly</title>
		<link>http://news.open-bio.org/news/2009/11/bioperl-interview-for-floss-weekly/</link>
		<comments>http://news.open-bio.org/news/2009/11/bioperl-interview-for-floss-weekly/#comments</comments>
		<pubDate>Sun, 22 Nov 2009 20:27:50 +0000</pubDate>
		<dc:creator>Chris Fields</dc:creator>
				<category><![CDATA[BOSC/ISMB]]></category>
		<category><![CDATA[BioPerl]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Community]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[OBF]]></category>
		<category><![CDATA[OBF Projects]]></category>

		<guid isPermaLink="false">http://news.open-bio.org/news/?p=508</guid>
		<description><![CDATA[<br/>Two of the core BioPerl developers, Jason Stajich and Chris Fields, were interviewed by FLOSS Weekly.  The interview is now available as an MP3 on the FLOSS Weekly website; several streaming versions (including podcast) are also available.
]]></description>
			<content:encoded><![CDATA[<br/><p>Two of the core BioPerl developers, Jason Stajich and Chris Fields, were interviewed by FLOSS Weekly.  The interview is now available <a href="http://www.podtrac.com/pts/redirect.mp3/twit.cachefly.net/floss0096.mp3" target="_blank">as an MP3</a> on the <a href="http://twit.tv/floss96">FLOSS Weekly</a> website; several streaming versions (including podcast) are also available.</p>
]]></content:encoded>
			<wfw:commentRss>http://news.open-bio.org/news/2009/11/bioperl-interview-for-floss-weekly/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://www.podtrac.com/pts/redirect.mp3/twit.cachefly.net/floss0096.mp3" length="33809093" type="audio/mpeg" />
		</item>
		<item>
		<title>BioPerl 1.6.1 released</title>
		<link>http://news.open-bio.org/news/2009/09/bioperl-1-6-1-released/</link>
		<comments>http://news.open-bio.org/news/2009/09/bioperl-1-6-1-released/#comments</comments>
		<pubDate>Tue, 29 Sep 2009 17:55:27 +0000</pubDate>
		<dc:creator>Chris Fields</dc:creator>
				<category><![CDATA[BioPerl]]></category>
		<category><![CDATA[Community]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Documentation]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[OBF]]></category>
		<category><![CDATA[OBF Projects]]></category>
		<category><![CDATA[FASTQ]]></category>

		<guid isPermaLink="false">http://news.open-bio.org/news/?p=448</guid>
		<description><![CDATA[<br/>We are pleased to announce the immediate availability of BioPerl 1.6.1, the latest release of BioPerl&#8217;s core code.  You can grab it here:
Via CPAN:
http://search.cpan.org/~cjfields/BioPerl-1.6.1/
Via the BioPerl website:
http://bioperl.org/DIST/BioPerl-1.6.1.tar.bz2
 http://bioperl.org/DIST/BioPerl-1.6.1.tar.gz
 http://bioperl.org/DIST/BioPerl-1.6.1.zip
The PPM for Windows should also finally be available this week, ActivePerl problems permitting (we will post more information when it becomes available).
Tons of bug fixes [...]]]></description>
			<content:encoded><![CDATA[<br/><p>We are pleased to announce the immediate availability of BioPerl 1.6.1, the latest release of BioPerl&#8217;s core code.  You can grab it here:</p>
<p>Via CPAN:</p>
<p><a href="http://search.cpan.org/~cjfields/BioPerl-1.6.1/">http://search.cpan.org/~cjfields/BioPerl-1.6.1/</a></p>
<p>Via the BioPerl website:</p>
<p><a href="http://bioperl.org/DIST/BioPerl-1.6.1.tar.bz2">http://bioperl.org/DIST/BioPerl-1.6.1.tar.bz2</a><br />
<a href="http://bioperl.org/DIST/BioPerl-1.6.1.tar.gz"> http://bioperl.org/DIST/BioPerl-1.6.1.tar.gz</a><br />
<a href="http://bioperl.org/DIST/BioPerl-1.6.1.zip"> http://bioperl.org/DIST/BioPerl-1.6.1.zip</a></p>
<p>The PPM for Windows should also finally be available this week, ActivePerl problems permitting (we will post more information when it becomes available).</p>
<p>Tons of bug fixes and changes have been incorporated into this release.  For a more complete change list please see the &#8216;Changes&#8217; file included with the distribution.</p>
<p>A few highlights:</p>
<ul>
<li>FASTQ parsing and interconversion of the three FASTQ variants (Sanger, Illumina, Solexa) now works (a concerted OBF effort!)</li>
<li>Significant refactoring of Bio::Restriction methods</li>
<li>Complete refactoring of Bio::Search-related tiling code, including HOWTO documentation</li>
<li>GBrowse-related fixes:</li>
<li>- <em>berkeleydb database now autoindexes wig files and locks correctly</em></li>
<li>- <em>add Pg, SQLite, and faster BerkeleyDB implementations</em></li>
<li>Infernal 1.0 output is now parsed</li>
<li>New SearchIO-based parser for gmap -f9 output</li>
<li>BLAST XML parsing essentially complete</li>
<li>Installation via CPANPLUS should now work</li>
<li>For those using Strawberry Perl on Windows, the latest build is expected to pass all tests.</li>
<li>&#8216;raw&#8217; sequence format now parsed by line or optionally as a single sequence</li>
<li>SCF parsing/writing now round-trips</li>
<li>Demo code for using RPS-BLAST and Bio::Tools::Run::RemoteBlast</li>
<li>Bio::Tools::SeqPattern now has a backtranslate() method</li>
<li>Bio::Tree::Statistics now has methods to calculate Fitch-based score, internal trait values, statratio(), sum of leaf distances</li>
<li>Scripts</li>
<li><em>- update to bp_seqfeature_load for SQLite</em></li>
<li><em>- hivq.pl &#8211; command-line interface to Bio::DB::HIV</em></li>
<li><em>- fastam9_to_table &#8211; fix for MPI output</em></li>
<li><em>- gccalc &#8211; total stats</em></li>
<li><em>- einfo  &#8211; simple script to find up-to-date NCBI databases, list field and link values for a specific database</em></li>
</ul>
<p>We will shortly release updates for BioPerl-db, BioPerl-run, and BioPerl-network.  Enjoy!</p>
<p>chris</p>
]]></content:encoded>
			<wfw:commentRss>http://news.open-bio.org/news/2009/09/bioperl-1-6-1-released/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Working with FASTQ files in Biopython when speed matters</title>
		<link>http://news.open-bio.org/news/2009/09/biopython-fast-fastq/</link>
		<comments>http://news.open-bio.org/news/2009/09/biopython-fast-fastq/#comments</comments>
		<pubDate>Fri, 25 Sep 2009 11:49:53 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Biopython]]></category>
		<category><![CDATA[Blogroll]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Community]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Documentation]]></category>
		<category><![CDATA[HOWTO]]></category>
		<category><![CDATA[OBF Projects]]></category>
		<category><![CDATA[FASTQ]]></category>

		<guid isPermaLink="false">http://news.open-bio.org/news/?p=432</guid>
		<description><![CDATA[<br/>Biopython's SeqIO interface revolves around SeqRecord objects which can impose a speed penalty. For FASTQ files the quality string gets turned into a list of integers on parsing, and then re-encoded back to ASCII on writing. Working directly with the raw strings is less flexible, but much faster.]]></description>
			<content:encoded><![CDATA[<br/><p><a href="http://news.open-bio.org/news/2009/08/biopython-1-51-released/">Biopython 1.51</a> onward includes support for Sanger, Solexa and Illumina 1.3+ FASTQ files in <a href="http://biopython.org/wiki/SeqIO">Bio.SeqIO</a>, which allows a lot of neat tricks very concisely. For example, the <a href="http://biopython.org/DIST/docs/tutorial/Tutorial.html">tutorial</a> (<a href="http://biopython.org/DIST/docs/tutorial/Tutorial.pdf">PDF</a>) has examples finding and removing primer or adaptor sequences.</p>
<p>However, because the Bio.SeqIO interface revolves around <a href="http://biopython.org/wiki/SeqRecord">SeqRecord objects</a>, there is often a speed penalty. For example, for <a href="http://en.wikipedia.org/wiki/FASTQ_format">FASTQ files</a>, the quality string gets turned into a list of integers on parsing, and then re-encoded back to ASCII on writing.</p>
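<p>To make that overhead concrete, here is a minimal sketch of the round trip for a Sanger FASTQ quality string, which stores each PHRED score as the character with ASCII code 33 plus the score. This is plain Python 3 for illustration only, not Biopython&#8217;s actual code; the function names and the example quality string are made up.</p>

```python
SANGER_OFFSET = 33  # Sanger FASTQ: ASCII code = PHRED score + 33


def decode_sanger(qual):
    """Turn a quality string into a list of integer PHRED scores."""
    return [ord(char) - SANGER_OFFSET for char in qual]


def encode_sanger(scores):
    """Turn integer PHRED scores back into a quality string."""
    return "".join(chr(score + SANGER_OFFSET) for score in scores)


qual = "IIII5555!!"  # hypothetical quality string for a 10 base read
scores = decode_sanger(qual)        # work done on parsing
round_trip = encode_sanger(scores)  # work done again on writing
```

<p>Here <code>scores</code> is <code>[40, 40, 40, 40, 20, 20, 20, 20, 0, 0]</code>, and encoding it again gives back the original string. Doing this for every read is wasted effort if the script never looks at the scores.</p>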
<p>The new <a href="http://news.open-bio.org/news/2009/09/biopython-convert-function/">Bio.SeqIO.convert(&#8230;)</a> function in <a href="http://news.open-bio.org/news/2009/09/biopython-release-152/">Biopython 1.52</a> onwards makes converting from FASTQ to FASTA, or between the FASTQ variants, about five times faster. It can do this because it doesn&#8217;t bother with creating any SeqRecord objects &#8211; it just uses Python strings.</p>
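<p>As a rough illustration of what working with strings only can look like, the sketch below maps an Illumina 1.3+ quality string (PHRED scores with ASCII offset 64) straight to the Sanger encoding (offset 33) with a translation table, never building a list of integers. It is written for Python 3, the names are hypothetical, and it is not taken from Biopython&#8217;s source.</p>

```python
# Illumina 1.3+ uses ASCII offset 64 and Sanger uses offset 33, so every
# quality character simply moves down by 31 code points.
ILLUMINA_TO_SANGER = str.maketrans(
    {chr(code): chr(code - 31) for code in range(64, 127)}
)


def illumina_to_sanger(qual):
    """Re-encode an Illumina 1.3+ quality string as a Sanger quality string."""
    return qual.translate(ILLUMINA_TO_SANGER)


converted = illumina_to_sanger("hhhhTTTT@@")  # hypothetical quality string
```

<p>With this input <code>converted</code> is <code>"IIII5555!!"</code>. Note that the old Solexa variant needs a real score conversion rather than a fixed shift, so a table for it would have to be built differently.</p>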
<p>You can use the same approach in your own scripts. For example, suppose you have a Solexa FASTQ file where you want to trim all the reads, taking just the first 21 bases (say). Why might you want to do this? Well, in Solexa/Illumina there is a general decline in read quality along the sequence, so it can make sense to trim, and some algorithms like to have all the input reads the same length. Here is how I would write this using the standard Bio.SeqIO functions:</p>
<p><code>
<pre>from Bio import SeqIO
# Lazily slice each record to its first 21 bases as it is parsed
records = (rec[:21] for rec in SeqIO.parse(open("untrimmed.fastq"), "fastq-solexa"))
handle = open("trimmed21.fastq", "w")
# SeqIO.write returns the number of records written
count = SeqIO.write(records, handle, "fastq-solexa")
handle.close()
print("Trimmed %i FASTQ records" % count)</pre>
</code></p>
<p>This works, and is very simple and general. The same template can be used on any file format supported by <a href="http://biopython.org/wiki/SeqIO">Bio.SeqIO</a>. However, it might be a bit slow for large next-generation sequencing files.</p>
<p>Instead, we can get a little more low level &#8211; and work directly with strings. This requires you to know more about the details of the FASTQ file format. Parsing FASTQ files is surprisingly complicated (with nasty things like line wrapping technically allowed), so we&#8217;ll still get Biopython to do that bit &#8211; but not bother with constructing SeqRecord objects and decoding the FASTQ quality strings. On the other hand, doing the FASTQ output explicitly isn&#8217;t actually too bad once you know how things work:</p>
<p><code>
<pre>from Bio.SeqIO.QualityIO import FastqGeneralIterator
trim = 21  # number of bases to keep from the start of each read
handle = open("trimmed21.fastq", "w")
# Each record arrives as three plain strings: title, sequence, quality
for title, seq, qual in FastqGeneralIterator(open("untrimmed.fastq")):
&nbsp;&nbsp;&nbsp;&nbsp;handle.write("@%s\n%s\n+\n%s\n" % (title, seq[:trim], qual[:trim]))
handle.close()</pre>
</code></p>
<p>Again, the solution is a very short script &#8211; but this time it is much less flexible, and not nearly as clear what is going on. On the bright side, it is many times faster. Deciding on this trade-off is down to you, but I hope this blog post has highlighted the potential usefulness of the FastqGeneralIterator function in Bio.SeqIO.QualityIO, which you might otherwise have overlooked. To find out more, please read the built-in documentation (also available <a title="Documentation for Bio.SeqIO.QualityIO function FastqGeneralIterator" href="http://biopython.org/DIST/docs/api/Bio.SeqIO.QualityIO-module.html#FastqGeneralIterator">online</a>):</p>
<p><code>&gt;&gt;&gt; from Bio.SeqIO.QualityIO import FastqGeneralIterator<br />
&gt;&gt;&gt; help(FastqGeneralIterator)<br />
...<br />
</code></p>
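<p>Coming back to the point about line wrapping, the following sketch shows why it is tempting but unsafe to parse FASTQ yourself by reading four lines at a time. The record and the helper function are invented for illustration and use only plain Python 3; the sequence and quality strings are each wrapped over two lines, which the format permits.</p>

```python
# Hypothetical FASTQ record whose sequence and quality are line wrapped
wrapped = (
    "@read1\n"
    "ACGTACGT\n"
    "ACGT\n"
    "+\n"
    "IIIIIIII\n"
    "IIII\n"
)


def naive_four_line_records(text):
    """Group lines in fours, assuming title, sequence, '+' and quality lines."""
    lines = text.splitlines()
    for start in range(0, len(lines), 4):
        yield lines[start:start + 4]


records = list(naive_four_line_records(wrapped))
first = records[0]
# The third line ought to be the '+' separator, but here it is more sequence
separator_ok = first[2].startswith("+")
```

<p>The naive reader splits the single record into two chunks and <code>separator_ok</code> is <code>False</code>, so anything built on it would silently produce nonsense. Leaving the parsing to FastqGeneralIterator avoids having to handle this yourself.</p>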
<p>Please sign up to the <a href="http://biopython.org/wiki/Mailing_lists">Biopython mailing list</a> if you want to discuss this topic further.</p>
<p>Peter</p>
]]></content:encoded>
			<wfw:commentRss>http://news.open-bio.org/news/2009/09/biopython-fast-fastq/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Biopython CVS to git migration</title>
		<link>http://news.open-bio.org/news/2009/09/biopython-cvs-to-git-migration/</link>
		<comments>http://news.open-bio.org/news/2009/09/biopython-cvs-to-git-migration/#comments</comments>
		<pubDate>Thu, 24 Sep 2009 13:00:04 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Biopython]]></category>
		<category><![CDATA[Blogroll]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Community]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Documentation]]></category>
		<category><![CDATA[OBF]]></category>
		<category><![CDATA[OBF Projects]]></category>

		<guid isPermaLink="false">http://news.open-bio.org/news/?p=427</guid>
		<description><![CDATA[<br/>Biopython has now moved from CVS to a <a title="Biopython on github" href="http://github.com/biopython/biopython">git repository</a>, hosted on <a href="http://github.com">github.com</a> who kindly provide git hosting for open source projects free of charge. The <a href="http://bioruby.org">BioRuby project</a> have been <a title="BioRuby on github" href="http://github.com/bioruby/bioruby">using github</a> for some time, so we are in good company.]]></description>
			<content:encoded><![CDATA[<br/><p>The release of <a href="http://news.open-bio.org/news/2009/09/biopython-release-152/">Biopython 1.52</a> earlier this week marked the end of an era: it was our last release using CVS for source code control.</p>
<p>As of now, Biopython is using a <a title="Biopython on github" href="http://github.com/biopython/biopython">git repository</a>, hosted on <a href="http://github.com">github.com</a> who kindly provide git hosting for open source projects free of charge. The <a href="http://bioruby.org">BioRuby project</a> have been <a title="BioRuby on github" href="http://github.com/bioruby/bioruby">using github</a> for some time, so we are in good company.</p>
<p>Our existing OBF hosted CVS repository will be maintained in the short to medium term as a backup, but will not be updated.</p>
<p>Although many people have been involved in this move, we&#8217;d like to thank Bartek Wilczynski in particular for handling the CVS to git conversion, and for mirroring our CVS updates to git during the transition period. In the next few weeks hopefully we&#8217;ll get our <a href="http://biopython.org/wiki/GitUsage">git usage wiki pages</a> perfected, as we start using git for real.</p>
]]></content:encoded>
			<wfw:commentRss>http://news.open-bio.org/news/2009/09/biopython-cvs-to-git-migration/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>BioRuby 1.3.1 released</title>
		<link>http://news.open-bio.org/news/2009/09/bioruby-1-3-1-released/</link>
		<comments>http://news.open-bio.org/news/2009/09/bioruby-1-3-1-released/#comments</comments>
		<pubDate>Wed, 02 Sep 2009 13:47:09 +0000</pubDate>
		<dc:creator>Naohisa Goto</dc:creator>
				<category><![CDATA[BioRuby]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Community]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[OBF Projects]]></category>

		<guid isPermaLink="false">http://news.open-bio.org/news/?p=390</guid>
		<description><![CDATA[<br/>We are pleased to announce the release of BioRuby 1.3.1. This new release fixes many bugs that existed in 1.3.0.
Here is a brief summary of changes.

Refactoring of BioSQL support.
Bio::PubMed bug fixes.
Bio::NCBI::REST bug fixes.
Bio::GCG::Msf bug fixes.
Bio::Fasta::Report bug fixes and added support for multiple query sequences.
Bio::Sim4::Report bug fixes.
Added unit tests for Bio::GCG::Msf and Bio::Sim4::Report.
License of BioRuby is clarified.

In [...]]]></description>
			<content:encoded><![CDATA[<br/><p>We are pleased to announce the release of <a title="BioRuby" href="http://bioruby.org/">BioRuby</a> 1.3.1. This new release fixes many bugs that existed in 1.3.0.</p>
<p>Here is a brief summary of changes.</p>
<ul>
<li>Refactoring of BioSQL support.</li>
<li>Bio::PubMed bug fixes.</li>
<li>Bio::NCBI::REST bug fixes.</li>
<li>Bio::GCG::Msf bug fixes.</li>
<li>Bio::Fasta::Report bug fixes and added support for multiple query sequences.</li>
<li>Bio::Sim4::Report bug fixes.</li>
<li>Added unit tests for Bio::GCG::Msf and Bio::Sim4::Report.</li>
<li>License of BioRuby is clarified.</li>
</ul>
<p>In addition, many other changes have been made, mainly bug fixes. For more information, see the <a title="ChangeLog" href="http://github.com/bioruby/bioruby/blob/e731c6e52bc9a672e4546eeca4f2d2d968bdba09/ChangeLog">ChangeLog</a>.</p>
<p>The archive is available at: <a title="http://bioruby.org/archive/bioruby-1.3.1.tar.gz" href="http://bioruby.org/archive/bioruby-1.3.1.tar.gz">http://bioruby.org/archive/bioruby-1.3.1.tar.gz</a></p>
<p>As always, we have also put the RubyGems package on RubyForge. You can easily install it using RubyGems:<br />
% sudo gem install bio</p>
<p>You can also obtain the BioRuby gem file from <a title="bioruby.org" href="http://bioruby.org/">bioruby.org</a>:<br />
<a title="http://bioruby.org/archive/gems/bio-1.3.1.gem" href="http://bioruby.org/archive/gems/bio-1.3.1.gem">http://bioruby.org/archive/gems/bio-1.3.1.gem</a></p>
<p>We hope you enjoy it.</p>
]]></content:encoded>
			<wfw:commentRss>http://news.open-bio.org/news/2009/09/bioruby-1-3-1-released/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
