<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>O&#124;B&#124;F News</title>
	<atom:link href="http://news.open-bio.org/news/feed/" rel="self" type="application/rss+xml" />
	<link>http://news.open-bio.org/news</link>
	<description>Open Source Bioinformatics news</description>
	<lastBuildDate>Tue, 31 Aug 2010 22:49:52 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=abc</generator>
		<item>
		<title>Biopython 1.55 released</title>
		<link>http://news.open-bio.org/news/2010/08/biopython-1-55-released/</link>
		<comments>http://news.open-bio.org/news/2010/08/biopython-1-55-released/#comments</comments>
		<pubDate>Tue, 31 Aug 2010 22:49:52 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Biopython]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[OBF]]></category>
		<category><![CDATA[OBF Projects]]></category>

		<guid isPermaLink="false">http://news.open-bio.org/news/?p=736</guid>
		<description><![CDATA[<br/>The Biopython team is proud to announce Biopython 1.55, a new stable release, about three months after our last stable release (Biopython 1.54) and the beta release earlier in August. A lot of work has gone towards Python 3 support (via the 2to3 script), but unless we broke something you shouldn&#8217;t notice any changes In [...]]]></description>
			<content:encoded><![CDATA[<br/><p>The Biopython team is proud to announce Biopython 1.55, a new stable release, about three months after our last stable release (<a href="http://news.open-bio.org/news/2010/05/biopython-release-154/">Biopython 1.54</a>) and the <a href="http://news.open-bio.org//news/2010/08/biopython-1-55-beta-released/">beta release</a> earlier in August.</p>
<p>A lot of work has gone towards Python 3 support (via the 2to3 script), but unless we broke something you shouldn&#8217;t notice any changes <img src='http://news.open-bio.org/news/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p>In terms of new features, the most noticeable highlight is that the command line tool application wrapper classes are now executable, which should make it much easier to call external tools. This is described in the updated <a href="http://www.biopython.org/wiki/Documentation">documentation</a>.</p>
<p>Additionally GenBank and EMBL parsing has been sped up, the <a href="http://www.biopython.org/wiki/BioSQL">BioSQL</a> classes act more like Python dictionaries, and Bio.PDB should handle model numbers and a missing element column better.</p>
<p>Note we are phasing out support for Python 2.4. We will continue to support it for at least one further release (i.e. Biopython 1.56). This could be delayed given feedback from our users (e.g. if this proves to be a problem in combination with other libraries or a popular Linux distribution).</p>
<p>(At least) 12 people have contributed to this release, including 6 new people &#8211; thank you all:</p>
<ul>
<li>Andres Colubri (first contribution)</li>
<li>Carlos Rios Vera (first contribution)</li>
<li>Claude Paroz (first contribution)</li>
<li>Cymon Cox</li>
<li>Eric Talevich</li>
<li>Frank Kauff</li>
<li>Joao Rodrigues (first contribution)</li>
<li>Konstantin Okonechnikov (first contribution)</li>
<li>Michiel de Hoon</li>
<li>Nathan Edwards (first contribution)</li>
<li>Peter Cock</li>
<li>Tiago Antao</li>
</ul>
<p>Source distributions and Windows installers are available from the <a href="http://www.biopython.org/wiki/Download">downloads page</a> on the <a href="http://www.biopython.org">Biopython website (biopython.org)</a>.</p>
<p>As usual, feedback is most welcome on the <a href="http://biopython.org/wiki/Mailing_lists">mailing lists</a> (or <a href="http://bugzilla.open-bio.org/">bugzilla</a>).</p>
]]></content:encoded>
			<wfw:commentRss>http://news.open-bio.org/news/2010/08/biopython-1-55-released/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>BioRuby paper published</title>
		<link>http://news.open-bio.org/news/2010/08/bioruby-paper-published/</link>
		<comments>http://news.open-bio.org/news/2010/08/bioruby-paper-published/#comments</comments>
		<pubDate>Fri, 27 Aug 2010 00:13:15 +0000</pubDate>
		<dc:creator>Naohisa Goto</dc:creator>
				<category><![CDATA[BioRuby]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Community]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Documentation]]></category>
		<category><![CDATA[OBF]]></category>
		<category><![CDATA[OBF Projects]]></category>

		<guid isPermaLink="false">http://news.open-bio.org/news/?p=731</guid>
		<description><![CDATA[<br/>After 10 years of development, the BioRuby paper is finally published in the Bioinformatics journal.  The article is open access, so please take a look. BioRuby: Bioinformatics software for the Ruby programming language Naohisa Goto, Pjotr Prins, Mitsuteru Nakao, Raoul Bonnal, Jan Aerts and Toshiaki Katayama Bioinformatics 2010; doi: 10.1093/bioinformatics/btq475]]></description>
			<content:encoded><![CDATA[<br/><p>After 10 years of development, the BioRuby paper is finally published in the <a href="http://bioinformatics.oxfordjournals.org/"><em>Bioinformatics</em></a> journal.  The article is open access, so please take a look.</p>
<p style="padding-left: 30px">BioRuby: Bioinformatics software for the Ruby programming language<br />
Naohisa Goto, Pjotr Prins, Mitsuteru Nakao, Raoul Bonnal, Jan Aerts and Toshiaki Katayama<br />
<em>Bioinformatics</em> 2010; <a href="http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btq475">doi: 10.1093/bioinformatics/btq475</a></p>
]]></content:encoded>
			<wfw:commentRss>http://news.open-bio.org/news/2010/08/bioruby-paper-published/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Biopython 1.55 beta released</title>
		<link>http://news.open-bio.org/news/2010/08/biopython-1-55-beta-released/</link>
		<comments>http://news.open-bio.org/news/2010/08/biopython-1-55-beta-released/#comments</comments>
		<pubDate>Wed, 18 Aug 2010 20:52:54 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Biopython]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[OBF]]></category>
		<category><![CDATA[OBF Projects]]></category>

		<guid isPermaLink="false">http://news.open-bio.org/news/?p=722</guid>
		<description><![CDATA[<br/>We&#8217;ve just released a beta of Biopython 1.55 for user testing. Since Biopython 1.54 was released three months ago, we&#8217;ve made a good start on work for Python 3 support (via the 2to3 script), but as a side effect of this we&#8217;ve had to update quite a lot of the older parts of the library. [...]]]></description>
			<content:encoded><![CDATA[<br/><p>We&#8217;ve just released a <em>beta</em> of Biopython 1.55 for user testing.</p>
<p>Since <a href="http://news.open-bio.org/news/2010/05/biopython-release-154/">Biopython 1.54</a> was released three months ago, we&#8217;ve made a good start on work for Python 3 support (via the 2to3 script), but as a side effect of this we&#8217;ve had to update quite a lot of the older parts of the library. Although the unit tests are all fine, there is a small but real chance that we&#8217;ve accidentally broken things &#8211; which is why we&#8217;re doing this beta release.</p>
<p>In terms of new features, the most noticeable highlight is that the command line tool application wrapper classes are now executable, which should make it much easier to call external tools. This is described in the updated <a href="http://www.biopython.org/wiki/Documentation">documentation</a>.</p>
<p>Note we are phasing out support for Python 2.4. We will continue to support it for at least one further release (i.e. Biopython 1.56). This could be delayed given feedback from our users (e.g. if this proves to be a problem in combination with other libraries or a popular Linux distribution).</p>
<p>(At least) 10 people have contributed to this release (so far), including 5 new people:</p>
<ul>
<li>Andres Colubri (first contribution)</li>
<li>Carlos Rios Vera (first contribution)</li>
<li>Claude Paroz (first contribution)</li>
<li>Eric Talevich</li>
<li>Frank Kauff</li>
<li>Joao Rodrigues (first contribution)</li>
<li>Konstantin Okonechnikov (first contribution)</li>
<li>Michiel de Hoon</li>
<li>Peter Cock</li>
<li>Tiago Antao</li>
</ul>
<p>Source distributions and Windows installers are available from the <a href="http://www.biopython.org/wiki/Download">downloads page</a> on the <a href="http://www.biopython.org">Biopython website (biopython.org)</a>.</p>
<p>So, please let us know what works and what doesn’t through the <a href="http://biopython.org/wiki/Mailing_lists">mailing lists</a> (or <a href="http://bugzilla.open-bio.org/">bugzilla</a>).</p>
]]></content:encoded>
			<wfw:commentRss>http://news.open-bio.org/news/2010/08/biopython-1-55-beta-released/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Biopython 1.54 released</title>
		<link>http://news.open-bio.org/news/2010/05/biopython-release-154/</link>
		<comments>http://news.open-bio.org/news/2010/05/biopython-release-154/#comments</comments>
		<pubDate>Thu, 20 May 2010 19:04:27 +0000</pubDate>
		<dc:creator>davidw</dc:creator>
				<category><![CDATA[Biopython]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Community]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Google Summer of Code]]></category>
		<category><![CDATA[OBF]]></category>
		<category><![CDATA[OBF Projects]]></category>
		<category><![CDATA[release]]></category>

		<guid isPermaLink="false">http://news.open-bio.org/news/?p=653</guid>
		<description><![CDATA[<br/>The Biopython team is proud to announce Biopython 1.54, a new stable release of the Biopython library. Biopython 1.54 comes five months after our last release and brings new features, tweaks to some established functions and the usual collection of bug fixes. This is the first stable release to feature the new Bio.Phylo module which [...]]]></description>
			<content:encoded><![CDATA[<br/><p>The Biopython team is proud to announce Biopython 1.54, a new stable release of the Biopython library. Biopython 1.54 comes five months after our last release and brings new features,  tweaks to some established functions and the usual collection of bug fixes.</p>
<p>This is the first stable release to feature the new <a title="Bio.Phylo documentation on the wiki" href="http://www.biopython.org/wiki/Phylo">Bio.Phylo</a> module, which can be used to read, write and extract data from phylogenetic trees in Newick, Nexus and <a title="PhyloXML description" href="http://www.phyloxml.org/">PhyloXML</a> formats. The module is the result of Eric Talevich&#8217;s Google Summer of Code project, which was supported by <a href="http://www.nescent.org/index.php">the National Evolutionary Synthesis Center (NESCent)</a>.</p>
<p>Biopython now supports the reading, writing and indexing of Standard Flowgram Format (SFF) files produced in 454 sequencing. Jose Blanca (the brains behind the widely used <a title="sff_extract homepage" href="http://bioinf.comav.upv.es/sff_extract/">sff_extract</a> tool) provided code to handle SFF files and Peter Cock has integrated that code with <tt>Bio.SeqIO</tt>. Adding SFF support to <tt>SeqIO</tt> makes it possible to convert these files to the FASTQ,  FASTA and QUAL formats (as trimmed or untrimmed reads).</p>
<p>As well as adding features the new release tweaks and extends some of  the core modules:</p>
<ul>
<li> Both <tt>Bio.SeqIO</tt> and <tt>Bio.AlignIO</tt> will accept filenames as well as  handles, <a href="http://news.open-bio.org/news/2010/04/biopython-seqio-and-alignio-easier/">as detailed here</a>.</li>
<li> The multiple sequence alignment object that underlies Bio.AlignIO  has been improved.</li>
<li> <tt>Bio.SeqIO</tt> can read and write EMBL nucleotide files.</li>
<li> The dictionary-like objects returned by <tt>Bio.SeqIO.index()</tt> have a new method &#8220;<tt>get_raw</tt>&#8221; that gets unparsed data from a file as a string, <a href="http://news.open-bio.org/news/2010/04/partial-seq-files-biopython/">as detailed here</a>.</li>
<li> <tt>Bio.Entrez</tt> includes some more DTD files, in particular <tt>eLink_090910.dtd</tt>, used by our NCBI Entrez Utilities XML parser.</li>
</ul>
<p>Binaries and source files for Biopython 1.54 are available from the  <a href="http://www.biopython.org/wiki/Download">downloads page</a>. The <a title="Biopython Documentation" href="http://www.biopython.org/wiki/Documentation">documentation</a> has been updated to include the changes made since our last release.</p>
<p>A big thanks to everyone who tested our beta release or submitted bugs since <a href="http://news.open-bio.org/news/2009/12/biopython-release-153/">Biopython 1.53</a>. And an especially big thanks to everyone who contributed to this release, including five first-time contributors:</p>
<ul>
<li>Anne Pajon (first contribution)</li>
<li> Brad Chapman</li>
<li> Christian Zmasek</li>
<li> Diana Jaunzeikare (first contribution) </li>
<li> Eric Talevich</li>
<li> Jose Blanca (first contribution)</li>
<li>Kevin Jacobs (first contribution)</li>
<li> Leighton Pritchard</li>
<li> Michiel de Hoon</li>
<li> Peter Cock</li>
<li> Thomas Holder (first contribution)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://news.open-bio.org/news/2010/05/biopython-release-154/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>BioPerl has moved to GitHub</title>
		<link>http://news.open-bio.org/news/2010/05/bioperl-has-moved-to-github/</link>
		<comments>http://news.open-bio.org/news/2010/05/bioperl-has-moved-to-github/#comments</comments>
		<pubDate>Fri, 14 May 2010 04:18:33 +0000</pubDate>
		<dc:creator>Chris Fields</dc:creator>
				<category><![CDATA[BioPerl]]></category>
		<category><![CDATA[Blogroll]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Community]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Documentation]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[OBF]]></category>
		<category><![CDATA[OBF Projects]]></category>

		<guid isPermaLink="false">http://news.open-bio.org/news/?p=695</guid>
		<description><![CDATA[<br/>BioPerl has migrated to git and GitHub!  We have also set up a mirror set of several key repositories at the great public git hosting site repo.or.cz. If you are a current BioPerl developer (had a previous account for direct access to our prior Subversion repository), please sign up for a GitHub account and let us [...]]]></description>
			<content:encoded><![CDATA[<br/><p>BioPerl has migrated to <a href="http://git-scm.com/">git</a> and <a href="http://github.com/bioperl">GitHub</a>!  We have also set up mirrors of several key repositories at the great public git hosting site <a href="http://repo.or.cz/w">repo.or.cz</a>.</p>
<p>If you are a current BioPerl developer (had a previous account for direct access to our prior Subversion repository), please sign up for a GitHub account and let us know your user ID.  Also, add the extra email <a href="http://news.open-bio.org/news/wp-content/uploads/2010/05/generic.jpg"><img class="alignnone size-full wp-image-703" src="http://news.open-bio.org/news/wp-content/uploads/2010/05/generic.jpg" alt="" width="137" height="15" /></a> (where &#8216;DEVNAME&#8217; is your <strong>original Subversion account ID</strong>).  This should map any previous commits from the older Subversion and CVS repository to your new GitHub account.</p>
<p>The following are ways everyone can download the latest code.</p>
<h2>Using git</h2>
<p>Note you can replace &#8216;bioperl-live.git&#8217; with any of the repository names (bioperl-db, bioperl-run, etc).  For BioPerl developers (GitHub collaborators) you have a choice of SSH or HTTP:</p>
<pre><span style="font-family: Consolas, Monaco, 'Courier New', Courier, monospace;line-height: 18px;font-size: 12px">  git clone git@github.com:bioperl/bioperl-live.git</span></pre>
<pre><span style="font-family: Consolas, Monaco, 'Courier New', Courier, monospace;line-height: 18px;font-size: 12px">  git clone https://bioperl@github.com/bioperl/bioperl-live.git</span></pre>
<p>The open read-only link (for everyone):</p>
<pre>  git clone git://github.com/bioperl/bioperl-live.git</pre>
<p>or using the mirror site:</p>
<pre><code>  git clone git://repo.or.cz/bioperl-live.git</code></pre>
<h2>Using SVN (read-only)</h2>
<p>We will also support read-only access to GitHub with Subversion.  We may allow write support at some later point.  To use svn:</p>
<pre>  svn checkout http://svn.github.com/bioperl/bioperl-live.git</pre>
<h2>Direct downloads</h2>
<p>Tagged releases can be found here:</p>
<p><a href="http://github.com/bioperl/bioperl-live/downloads">http://github.com/bioperl/bioperl-live/downloads</a></p>
<p>The latest source code here:</p>
<p><a href="http://github.com/bioperl/bioperl-live/archives/master">http://github.com/bioperl/bioperl-live/archives/master</a></p>
<h2><strong>Forking BioPerl and Pull Requests</strong></h2>
<p>We intend to use git and GitHub to their fullest.  With that in mind, we encourage users to <a href="http://help.github.com/forking/">fork</a> BioPerl code, make changes, commit them to their forked repository, and submit <a href="http://github.com/guides/pull-requests">pull requests</a>.</p>
<h2>Documentation</h2>
<p>We&#8217;re also in the process of updating our local developer documents to help those who haven&#8217;t used git before.  In particular, we have a <a href="http://www.bioperl.org/wiki/Using_Git">Using Git</a> page, and have added <a href="http://www.bioperl.org/wiki/Tracking_Git_commits">RSS feeds</a> for our repository commits.</p>
<p>Enjoy!</p>
<p>chris</p>
<p><strong>Update: </strong>SVN version fixed, thanks to DaveMessina++ for pointing it out.</p>
]]></content:encoded>
			<wfw:commentRss>http://news.open-bio.org/news/2010/05/bioperl-has-moved-to-github/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>O&#124;B&#124;F Google Summer of Code Accepted Students</title>
		<link>http://news.open-bio.org/news/2010/05/obf-google-summer-of-code-accepted-students/</link>
		<comments>http://news.open-bio.org/news/2010/05/obf-google-summer-of-code-accepted-students/#comments</comments>
		<pubDate>Sun, 02 May 2010 19:37:03 +0000</pubDate>
		<dc:creator>rbuels</dc:creator>
				<category><![CDATA[BioDAS]]></category>
		<category><![CDATA[BioJava]]></category>
		<category><![CDATA[BioLib]]></category>
		<category><![CDATA[BioMOBY]]></category>
		<category><![CDATA[BioPerl]]></category>
		<category><![CDATA[BioRuby]]></category>
		<category><![CDATA[Biopython]]></category>
		<category><![CDATA[Blipkit]]></category>
		<category><![CDATA[Community]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Google Summer of Code]]></category>
		<category><![CDATA[OBDA / BioSQL]]></category>
		<category><![CDATA[OBF]]></category>
		<category><![CDATA[OBF Projects]]></category>

		<guid isPermaLink="false">http://news.open-bio.org/news/?p=691</guid>
		<description><![CDATA[<br/>I&#8217;m pleased to announce the acceptance of OBF&#8217;s 2010 Google Summer of Code students, listed in alphabetical order with their project titles and primary mentors: Mark Chapman (PM Andreas Prlic) &#8211; Improvements to BioJava including Implementation of Multiple Sequence Alignment Algorithms Jianjiong Gao (PM Peter Rose) &#8211; BioJava Packages for Identification, Classification, and Visualization of [...]]]></description>
			<content:encoded><![CDATA[<br/><p>I&#8217;m pleased to announce the acceptance of <a href="http://www.open-bio.org/wiki/Google_Summer_of_Code">OBF&#8217;s 2010 Google Summer of Code</a> students, listed in alphabetical order with their project titles and primary mentors:</p>
<p>Mark Chapman (PM Andreas Prlic) &#8211; Improvements to BioJava including Implementation of Multiple Sequence Alignment Algorithms</p>
<p>Jianjiong Gao (PM Peter Rose) &#8211; BioJava Packages for Identification, Classification, and Visualization of Posttranslational Modification of Proteins</p>
<p>Kazuhiro Hayashi (PM Naohisa Goto) &#8211; Ruby 1.9.2 support of BioRuby</p>
<p>Sara Rayburn (PM Christian Zmasek) &#8211; Implementing Speciation &amp; Duplication Inference Algorithm for Binary and Non-binary Species Tree</p>
<p>Joao Pedro Garcia Lopes Maia Rodrigues (PM Eric Talevich) &#8211; Extending Bio.PDB: broadening the usefulness of BioPython&#8217;s Structural Biology module</p>
<p>Jun Yin (PM Chris Fields) &#8211; BioPerl Alignment Subsystem Refactoring</p>
<p>Congratulations to our accepted students!</p>
<p>All told, we had 52 applications submitted for the 6 slots (5 originally assigned, plus 1 extra) allotted to us by Google.  Proposals were extremely competitive: 6 out of 52 translates to an 11.5% acceptance rate.  We received a lot of really excellent proposals, and the decisions were not easy.</p>
<p>Thanks very much to all the students who applied. We very much appreciate your hard work.</p>
<p>Here&#8217;s to a great 2010 Summer of Code. I&#8217;m sure these students will do wonderful work.</p>
<p>Rob Buels<br />
O|B|F GSoC 2010 Administrator</p>
]]></content:encoded>
			<wfw:commentRss>http://news.open-bio.org/news/2010/05/obf-google-summer-of-code-accepted-students/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Illumina FASTQ files &#8211; Read Segment Quality Control Indicator</title>
		<link>http://news.open-bio.org/news/2010/04/illumina-q2-trim-fastq/</link>
		<comments>http://news.open-bio.org/news/2010/04/illumina-q2-trim-fastq/#comments</comments>
		<pubDate>Fri, 30 Apr 2010 07:49:45 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Biopython]]></category>
		<category><![CDATA[Blogroll]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Community]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Documentation]]></category>
		<category><![CDATA[HOWTO]]></category>
		<category><![CDATA[OBF]]></category>
		<category><![CDATA[OBF Projects]]></category>
		<category><![CDATA[FASTQ]]></category>

		<guid isPermaLink="false">http://news.open-bio.org/news/?p=677</guid>
		<description><![CDATA[<br/>In another quirk to the FASTQ story, recent Illumina FASTQ files don&#8217;t actually use the full range of PHRED scores &#8211; and a score of 2 has a special meaning, The Read Segment Quality Control Indicator (RSQCI, encoded as &#8216;B&#8217;). Hats off to Dr Torsten Seemann for raising awareness of this issue in his post [...]]]></description>
			<content:encoded><![CDATA[<br/><p>In another quirk to the <a href="http://news.open-bio.org/news/2009/12/nar-fastq-format/">FASTQ story</a>, recent Illumina FASTQ files don&#8217;t actually use the full range of PHRED scores &#8211; and a score of 2 has a special meaning, <i>The Read Segment Quality Control Indicator</i> (RSQCI, encoded as &#8216;B&#8217;).</p>
<p>Hats off to <i>Dr Torsten Seemann</i> for raising awareness of this issue in <a href="http://seqanswers.com/forums/showpost.php?p=17491&#038;postcount=3">his post on the seqanswers.com forum</a>, referring to a presentation by <i>Tobias Mann</i> of Illumina which says:</p>
<blockquote><p><i>The Read Segment Quality Control Indicator:</p>
<ul>
<li>At the ends of some reads, quality scores are unreliable. Illumina has an algorithm for identifying these unreliable runs of quality scores, and we use a special indicator to flag these portions of reads
</li>
<li>A quality score of 2, encoded as a &#8220;B&#8221;, is used as a special indicator. A quality score of 2 does not imply a specific error rate, but rather implies that the marked region of the read should not be used for downstream analysis.
</li>
<li>Some reads will end with a run of B (or Q2) basecalls, but there will never  be an isolated Q2 basecall.
</li>
</ul>
<p></i><i></i></p></blockquote>
<p>So, armed with this knowledge, you might want to apply a simple trimming criterion to any Illumina FASTQ files &#8211; remove anything after and including a PHRED quality score of 2 (encoded as ASCII &#8216;B&#8217;).</p>
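<p>As a small illustration of why a score of 2 shows up as the letter B, here is a sketch in plain Python. It assumes the Illumina encoding of this era, where the quality is the ASCII code of the character minus 64. The offset, the helper names and the example quality string are assumptions made for illustration and are not taken from the Illumina presentation.</p>

```python
# Assumption: Illumina FASTQ of this era stores quality as ord(char) - 64.
ILLUMINA_OFFSET = 64


def phred_scores(qual, offset=ILLUMINA_OFFSET):
    """Decode a FASTQ quality string into a list of integer PHRED scores."""
    return [ord(char) - offset for char in qual]


def first_q2(qual, offset=ILLUMINA_OFFSET):
    """Return the index of the first score of 2, or -1 if there is none."""
    scores = phred_scores(qual, offset)
    return scores.index(2) if 2 in scores else -1


example = "hhhgfBBB"  # made-up quality string ending in a run of B
print(phred_scores(example))
print(first_q2(example))
```

<p>Under that encoding, finding the first score of 2 is the same as finding the first B in the quality string, which is what the trimming below relies on.</p>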
<p>We could do this with the rich object orientated <tt>SeqRecord</tt> based API in Biopython, but when <a href="http://news.open-bio.org/news/2009/09/biopython-fast-fastq/">dealing with massive FASTQ files</a> this overhead matters. Instead we&#8217;ll stick with plain Python strings:</p>
<p><code>
<pre>from Bio.SeqIO.QualityIO import FastqGeneralIterator

min_length = 10
with open("untrimmed.fastq") as in_handle, open("B_trimmed.fastq", "w") as out_handle:
    for title, seq, qual in FastqGeneralIterator(in_handle):
        # Find the location of the first "B" (PHRED quality 2)
        trim = qual.find("B")
        if trim == -1:
            # No need to trim
            out_handle.write("@%s\n%s\n+\n%s\n" % (title, seq, qual))
        elif trim >= min_length:
            # Take everything up to the first B; shorter reads are dropped
            out_handle.write("@%s\n%s\n+\n%s\n" % (title, seq[:trim], qual[:trim]))</pre>
<p></code></p>
<p>The above will work fine on any recent Illumina FASTQ files using the RSQCI scheme, but on older Illumina FASTQ files the &#8220;B&#8221; character is just a low quality score &#8211; and can occur even in the middle of a read. Here trimming at the first &#8220;B&#8221; might be unwise. Instead, we can trim any trailing &#8220;B&#8221; characters &#8211; which will do the same thing on RSQCI based FASTQ files where the &#8220;B&#8221; should only appear at the end:</p>
<p><code>
<pre>from Bio.SeqIO.QualityIO import FastqGeneralIterator

min_length = 10
with open("untrimmed.fastq") as in_handle, open("B_trimmed.fastq", "w") as out_handle:
    for title, seq, qual in FastqGeneralIterator(in_handle):
        qual = qual.rstrip("B")  # Remove any trailing B characters
        length = len(qual)
        if length >= min_length:
            seq = seq[:length]  # Trim the sequence to match
            out_handle.write("@%s\n%s\n+\n%s\n" % (title, seq, qual))</pre>
<p></code></p>
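<p>To make the difference between the two approaches concrete, here is a sketch that applies both to a single made-up read in the older style, with an isolated B in the middle. The function names and the example read are hypothetical, and the minimum length filter is left out to keep the comparison short.</p>

```python
def trim_at_first_b(seq, qual):
    """First approach: cut at the first 'B', wherever it occurs."""
    cut = qual.find("B")
    if cut == -1:
        return seq, qual
    return seq[:cut], qual[:cut]


def trim_trailing_b(seq, qual):
    """Second approach: strip only a trailing run of 'B' characters."""
    trimmed = qual.rstrip("B")
    return seq[:len(trimmed)], trimmed


# Made-up older-style read: one isolated 'B' mid-read, then a trailing run
seq = "ACGTACGTACGT"
qual = "hhhBhhhhhBBB"

first = trim_at_first_b(seq, qual)
trailing = trim_trailing_b(seq, qual)
print(first)
print(trailing)
```

<p>On this read the first approach keeps only three bases, while the second keeps nine. On a read where B appears only as a trailing run, both return the same result.</p>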
<p>You could easily modify this example to read from stdin and write to stdout (see this <a href="http://www.biopython.org/wiki/Reading_from_unix_pipes">cookbook example</a>), or take filenames as command line arguments to turn this into a general purpose &#8220;FASTQ B-trimming script&#8221;.</p>
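<p>A possible shape for such a script is sketched below. To keep it self-contained it does not use Biopython. It uses a simplified reader that assumes strict four-line FASTQ records with no line wrapping, which is an assumption of this sketch. The names <tt>read_fastq</tt>, <tt>b_trim</tt> and <tt>main</tt> are hypothetical, and the demonstration runs on made-up reads held in memory rather than on real pipes.</p>

```python
import io
import sys


def read_fastq(handle):
    """Yield (title, seq, qual) tuples, assuming strict four-line records."""
    while True:
        title = handle.readline().rstrip("\n")
        if not title:
            break  # end of file (or a blank line)
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        qual = handle.readline().rstrip("\n")
        yield title[1:], seq, qual  # drop the leading '@'


def b_trim(in_handle, out_handle, min_length=10):
    """Strip trailing 'B' qualities and drop reads shorter than min_length."""
    for title, seq, qual in read_fastq(in_handle):
        qual = qual.rstrip("B")
        if len(qual) >= min_length:
            out_handle.write("@%s\n%s\n+\n%s\n" % (title, seq[:len(qual)], qual))


def main():
    """Act as a Unix filter: FASTQ in on stdin, trimmed FASTQ out on stdout."""
    b_trim(sys.stdin, sys.stdout)


# Demonstration on two made-up reads held in memory
demo_in = io.StringIO(
    "@read1\n" + "ACGT" * 4 + "\n+\n" + "h" * 12 + "B" * 4 + "\n"
    "@read2\n" + "ACGT" * 2 + "\n+\n" + "h" * 4 + "B" * 4 + "\n"
)
demo_out = io.StringIO()
b_trim(demo_in, demo_out)
print(demo_out.getvalue())
```

<p>Calling <tt>main()</tt> at the end of a script instead of the demonstration would give the pipe-friendly version. For real data the Biopython <tt>FastqGeneralIterator</tt> is the safer parser, since it also handles the details of the format that this simplified reader ignores.</p>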
]]></content:encoded>
			<wfw:commentRss>http://news.open-bio.org/news/2010/04/illumina-q2-trim-fastq/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Partial sequence files with Biopython</title>
		<link>http://news.open-bio.org/news/2010/04/partial-seq-files-biopython/</link>
		<comments>http://news.open-bio.org/news/2010/04/partial-seq-files-biopython/#comments</comments>
		<pubDate>Tue, 27 Apr 2010 13:03:04 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Biopython]]></category>
		<category><![CDATA[Blogroll]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[HOWTO]]></category>
		<category><![CDATA[OBF]]></category>
		<category><![CDATA[OBF Projects]]></category>

		<guid isPermaLink="false">http://news.open-bio.org/news/?p=659</guid>
		<description><![CDATA[<br/>This is another blog post to highlight one of the neat tricks you&#8217;ll be able to do with Biopython 1.54 (which you can help test with the Biopython 1.54 beta release). It is often useful to be able to extract a few records from a larger sequence file &#8211; for example, some sequences of interest [...]]]></description>
			<content:encoded><![CDATA[<br/><p>This is another blog post to highlight one of the neat tricks you&#8217;ll be able to do with Biopython 1.54 (which you can help test with the <a href="http://news.open-bio.org/news/2010/04/biopython-1-54-beta-released/">Biopython 1.54 beta</a> release).</p>
<p>It is often useful to be able to extract a few records from a larger sequence file &#8211; for example, some sequences of interest from a full UniProt or GenBank dump. One obvious way to try to do this is to parse the file into an object representation (i.e. <tt>SeqRecord</tt> objects using <tt>Bio.SeqIO.parse(...)</tt>), filter to pick out the entries you want, and then write them back to disk (using <tt>Bio.SeqIO.write(...)</tt>). However, for complex file formats like GenBank this can be lossy (<tt>Bio.SeqIO</tt> does not support a 100% identical round trip), and Biopython doesn&#8217;t currently support writing out the SwissProt plain text format used by UniProt. So, that approach won&#8217;t work.</p>
<p><a href="http://news.open-bio.org/news/2009/09/biopython-release-152/">Biopython 1.52</a> introduced a new <a href="http://news.open-bio.org/news/2009/09/biopython-seqio-index/">indexing function</a>, <tt>Bio.SeqIO.index(...)</tt>, which allows a large multi-sequence file to be treated like a Python dictionary &#8211; parsing records on request. This has been enhanced for Biopython 1.54 with a method <tt>get_raw(...)</tt> to extract the raw data for a record as a string.</p>
<p>How is this useful? Well, take your large (UniProt) file, index it, then extract the records you want and write them to your output file unmodified:</p>
<p><code>
<pre>from Bio import SeqIO
uniprot = SeqIO.index("uniprot_sprot.dat", "swiss")
handle = open("selected.dat", "w")
for acc in ["P33487", "P19801", "P13689", "Q8JZQ5", "Q9TRC7"]:
    handle.write(uniprot.get_raw(acc))
handle.close()
</pre>
<p></code></p>
<p>Another neat use of this functionality would be to sort entries in a sequential file format, and there is an example of that in the <a href="http://biopython.org/DIST/docs/tutorial/Tutorial.html">Biopython Tutorial</a> (<a href="http://biopython.org/DIST/docs/tutorial/Tutorial.pdf">PDF</a>).</p>
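<p>For readers curious about the general idea behind indexing a file and pulling out raw records, here is a rough sketch in plain Python. It is not how Biopython implements <tt>Bio.SeqIO.index(...)</tt>. It assumes every record starts with an <tt>ID</tt> line and ends with a <tt>//</tt> line, it keys records on the entry name from the <tt>ID</tt> line rather than on the accession, and the helper names and the two tiny records are made up for illustration.</p>

```python
import os
import tempfile


def build_offset_index(path, key_prefix=b"ID   ", terminator=b"//"):
    """Map each record key to its (start offset, length) in bytes.

    Assumes each record starts with an 'ID   ' line and ends with '//'.
    """
    index = {}
    with open(path, "rb") as handle:
        key = None
        start = 0
        offset = 0
        for line in handle:
            if line.startswith(key_prefix):
                # First word after the prefix is used as the record key
                key = line[len(key_prefix):].split()[0].decode()
                start = offset
            offset += len(line)
            if key is not None and line.rstrip() == terminator:
                index[key] = (start, offset - start)
                key = None
    return index


def get_raw(path, index, key):
    """Return the unparsed bytes of one record, using the offset index."""
    start, length = index[key]
    with open(path, "rb") as handle:
        handle.seek(start)
        return handle.read(length)


# Two made-up, heavily simplified records written to a temporary file
record_a = b"ID   REC_A   Reviewed;   3 AA.\nSQ   SEQUENCE\n     MKV\n//\n"
record_b = b"ID   REC_B   Reviewed;   2 AA.\nSQ   SEQUENCE\n     MA\n//\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".dat") as tmp:
    tmp.write(record_a + record_b)

index = build_offset_index(tmp.name)
raw_b = get_raw(tmp.name, index, "REC_B")
os.remove(tmp.name)
print(raw_b.decode())
```

<p>Only the offsets are held in memory, so the raw text of any record can be copied out unmodified without parsing the rest of the file.</p>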
]]></content:encoded>
			<wfw:commentRss>http://news.open-bio.org/news/2010/04/partial-seq-files-biopython/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Reminder:  BOSC Abstract Deadline April 15</title>
		<link>http://news.open-bio.org/news/2010/04/reminder-bosc-abstract-deadline-april-15/</link>
		<comments>http://news.open-bio.org/news/2010/04/reminder-bosc-abstract-deadline-april-15/#comments</comments>
		<pubDate>Wed, 07 Apr 2010 05:46:00 +0000</pubDate>
		<dc:creator>Kam Dahlquist</dc:creator>
				<category><![CDATA[BOSC/ISMB]]></category>
		<category><![CDATA[Community]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Mailing lists]]></category>
		<category><![CDATA[OBF]]></category>
		<category><![CDATA[OBF Projects]]></category>

		<guid isPermaLink="false">http://news.open-bio.org/news/?p=657</guid>
		<description><![CDATA[<br/>Just a friendly reminder that abstracts for BOSC 2010 are due next Thursday, April 15.  See the BOSC web site at http://www.open-bio.org/wiki/BOSC_2010 for details.  Submissions will only be accepted electronically at http://events.open-bio.org/BOSC2010/openconf.php. Graduate students, don&#8217;t forget we are offering $250 student travel awards this year. Be sure to check the box indicating that you are [...]]]></description>
			<content:encoded><![CDATA[<br/><p>Just a friendly reminder that abstracts for BOSC 2010 are due next Thursday, April 15.  See the BOSC web site at http://www.open-bio.org/wiki/BOSC_2010 for details.  Submissions will only be accepted electronically at http://events.open-bio.org/BOSC2010/openconf.php.</p>
<p>Graduate students, don&#8217;t forget we are offering $250 student travel awards this year. Be sure to check the box indicating that you are a graduate student to be considered for the award.</p>
<p>We are also pleased to announce that Guy Coates, Group leader of the Informatics Systems Group at the Wellcome Trust Sanger Institute, and Ross Gardler, Vice President of the Apache Software Foundation, will be giving keynote presentations at BOSC.</p>
<p>On behalf of the BOSC 2010 organizing committee, I hope to see you there!</p>
]]></content:encoded>
			<wfw:commentRss>http://news.open-bio.org/news/2010/04/reminder-bosc-abstract-deadline-april-15/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Making Biopython SeqIO and AlignIO easier</title>
		<link>http://news.open-bio.org/news/2010/04/biopython-seqio-and-alignio-easier/</link>
		<comments>http://news.open-bio.org/news/2010/04/biopython-seqio-and-alignio-easier/#comments</comments>
		<pubDate>Mon, 05 Apr 2010 16:37:11 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Biopython]]></category>
		<category><![CDATA[Blogroll]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Documentation]]></category>
		<category><![CDATA[OBF]]></category>
		<category><![CDATA[OBF Projects]]></category>

		<guid isPermaLink="false">http://news.open-bio.org/news/?p=643</guid>
		<description><![CDATA[<br/>One of the small changes coming in Biopython 1.54 (which you can try out already using the Biopython 1.54 beta) is to Bio.SeqIO and Bio.AlignIO. Previously the input and output functions required file handles, but they will now also accept filenames. This is a case of practicality beats purity (to quote the Zen of [...]]]></description>
			<content:encoded><![CDATA[<br/><p>One of the small changes coming in Biopython 1.54 (which you can try out already using the <a href="http://news.open-bio.org/news/2010/04/biopython-1-54-beta-released/">Biopython 1.54 beta</a>) is to <a href="http://www.biopython.org/wiki/SeqIO">Bio.SeqIO</a> and <a href="http://www.biopython.org/wiki/AlignIO">Bio.AlignIO</a>. Previously the input and output functions required file <em>handles</em>, but they will now also accept <em>filenames</em>.</p>
<p>This is a case of <em>practicality beats purity</em> (to quote <a href="http://www.python.org/dev/peps/pep-0020/">the Zen of Python</a>), and is particularly handy when writing very short scripts or working at the Python prompt.</p>
<p>For example, filtering a FASTA file to keep only entries with a length greater than 100 can be done like this (with handles):</p>
<p><code>from Bio import SeqIO<br />
in_handle = open("example.fasta", "r")<br />
out_handle = open("long.fasta", "w")<br />
records = (rec for rec in SeqIO.parse(in_handle, "fasta") if len(rec)>100)<br />
SeqIO.write(records, out_handle, "fasta")<br />
in_handle.close()<br />
out_handle.close()</code></p>
<p>Using filenames it becomes much more concise &#8211; just three lines:</p>
<p><code>from Bio import SeqIO<br />
records = (rec for rec in SeqIO.parse("example.fasta", "fasta") if len(rec)>100)<br />
SeqIO.write(records, "long.fasta", "fasta")</code></p>
<p>This also means Python and Biopython beginners can postpone learning about file handles a little longer, although that may not be an entirely good thing <img src='http://news.open-bio.org/news/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
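<p>As an aside, the "filename or handle" pattern is easy to apply in your own functions. The following is a minimal sketch using only the standard library; it is not how Bio.SeqIO implements this, and the names <code>open_if_filename</code> and <code>count_fasta_records</code> are made up for illustration.</p>

```python
from contextlib import contextmanager


@contextmanager
def open_if_filename(source, mode="r"):
    """Yield a handle for source, which may be a filename or an open handle.

    Hypothetical helper: files opened here are closed on exit, while
    handles passed in by the caller are left open for the caller to manage.
    """
    if isinstance(source, str):
        handle = open(source, mode)
        try:
            yield handle
        finally:
            handle.close()
    else:
        yield source


def count_fasta_records(source):
    """Count FASTA header lines in a filename or an already open text handle."""
    with open_if_filename(source) as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

<p>With this in place, <code>count_fasta_records("example.fasta")</code> and <code>count_fasta_records(open_handle)</code> both work, and only handles the helper opened itself get closed.</p>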
]]></content:encoded>
			<wfw:commentRss>http://news.open-bio.org/news/2010/04/biopython-seqio-and-alignio-easier/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
