<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>O&#124;B&#124;F News &#187; Code</title>
	<atom:link href="http://news.open-bio.org/news/category/development/code/feed/" rel="self" type="application/rss+xml" />
	<link>http://news.open-bio.org/news</link>
	<description>Open Source Bioinformatics news</description>
	<lastBuildDate>Thu, 20 May 2010 19:04:27 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=abc</generator>
		<item>
		<title>Biopython 1.54 released</title>
		<link>http://news.open-bio.org/news/2010/05/biopython-release-154/</link>
		<comments>http://news.open-bio.org/news/2010/05/biopython-release-154/#comments</comments>
		<pubDate>Thu, 20 May 2010 19:04:27 +0000</pubDate>
		<dc:creator>davidw</dc:creator>
				<category><![CDATA[Biopython]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Community]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Google Summer of Code]]></category>
		<category><![CDATA[OBF]]></category>
		<category><![CDATA[OBF Projects]]></category>
		<category><![CDATA[release]]></category>

		<guid isPermaLink="false">http://news.open-bio.org/news/?p=653</guid>
		<description><![CDATA[<br/>The Biopython team is proud to announce Biopython 1.54, a new stable release of the Biopython library. Biopython 1.54 comes five months after our last release and brings new features, tweaks to some established functions and the usual collection of bug fixes. This is the first stable release to feature the new Bio.Phylo module which [...]]]></description>
			<content:encoded><![CDATA[<br/><p>The Biopython team is proud to announce Biopython 1.54, a new stable release of the Biopython library. Biopython 1.54 comes five months after our last release and brings new features,  tweaks to some established functions and the usual collection of bug fixes.</p>
<p>This is the first stable release to feature the new <a title="Bio.Phylo documentation on the wiki" href="http://www.biopython.org/wiki/Phylo">Bio.Phylo</a> module, which can be used to read, write and extract data from phylogenetic trees in Newick, Nexus and <a title="PhyloXML description" href="http://www.phyloxml.org/">PhyloXML</a> formats. The module is the result of Eric Talevich&#8217;s Google Summer of Code project, which was supported by <a href="http://www.nescent.org/index.php">The National Evolutionary Synthesis Center (NESCent)</a>.</p>
<p>Biopython now supports the reading, writing and indexing of Standard Flowgram Format (SFF) files produced in 454 sequencing. Jose Blanca (the brains behind the widely used <a title="sff_extract homepage" href="http://bioinf.comav.upv.es/sff_extract/">sff_extract</a> tool) provided code to handle SFF files and Peter Cock has integrated that code with <tt>Bio.SeqIO</tt>. Adding SFF support to <tt>SeqIO</tt> makes it possible to convert these files to the FASTQ,  FASTA and QUAL formats (as trimmed or untrimmed reads).</p>
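<p>As a rough illustration of the conversion described above, the sketch below wraps <tt>Bio.SeqIO.convert</tt>. The helper names and file names are hypothetical. It relies on the <tt>"sff"</tt> format name giving untrimmed reads and <tt>"sff-trim"</tt> applying the trimming recorded in the file.</p>

```python
def sff_format_name(trimmed):
    """Return the SeqIO format name for trimmed or untrimmed SFF reads."""
    return "sff-trim" if trimmed else "sff"


def convert_sff(sff_path, out_path, out_format="fastq", trimmed=False):
    """Convert an SFF file to FASTQ, FASTA or QUAL; returns the record count."""
    if out_format not in ("fastq", "fasta", "qual"):
        raise ValueError("Unsupported output format: %r" % out_format)
    # Imported here so the helper above can be used without Biopython installed.
    from Bio import SeqIO

    return SeqIO.convert(sff_path, sff_format_name(trimmed), out_path, out_format)
```

<p>A call such as <tt>convert_sff("reads.sff", "reads.fastq", trimmed=True)</tt> would then write the trimmed reads as FASTQ, assuming a file named <tt>reads.sff</tt> exists.</p>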
<p>As well as adding features, the new release tweaks and extends some of the core modules:</p>
<ul>
<li> Both <tt>Bio.SeqIO</tt> and <tt>Bio.AlignIO</tt> will accept filenames as well as  handles, <a href="http://news.open-bio.org/news/2010/04/biopython-seqio-and-alignio-easier/">as detailed here</a>.</li>
<li> The multiple sequence alignment object that underlies <tt>Bio.AlignIO</tt> has been improved.</li>
<li> <tt>Bio.SeqIO</tt> can read and write EMBL nucleotide files.</li>
<li> The dictionary-like objects returned by <tt>Bio.SeqIO.index()</tt> have a new method &#8220;<tt>get_raw</tt>&#8221; that gets unparsed data from a file as a string, <a href="http://news.open-bio.org/news/2010/04/partial-seq-files-biopython/">as detailed here</a>.</li>
<li> <tt>Bio.Entrez</tt> includes some more DTD files, in particular <tt>eLink_090910.dtd</tt>, used by our NCBI Entrez Utilities XML parser.</li>
</ul>
<p>Binaries and source files for Biopython 1.54 are available from the  <a href="http://www.biopython.org/wiki/Download">downloads page</a>. The <a title="Biopython Documentation" href="http://www.biopython.org/wiki/Documentation">documentation</a> has been updated to include the changes made since our last release.</p>
<p>A big thanks to everyone who tested our beta release or submitted bugs since <a href="http://news.open-bio.org/news/2009/12/biopython-release-153/">Biopython 1.53</a>. And an especially big thanks to everyone who contributed to this release, including five first-time contributors:</p>
<ul>
<li>Anne Pajon (first contribution)</li>
<li> Brad Chapman</li>
<li> Christian Zmasek</li>
<li> Diana Jaunzeikare (first contribution) </li>
<li> Eric Talevich</li>
<li> Jose Blanca (first contribution)</li>
<li>Kevin Jacobs (first contribution)</li>
<li> Leighton Pritchard</li>
<li> Michiel de Hoon</li>
<li> Peter Cock</li>
<li> Thomas Holder (first contribution)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://news.open-bio.org/news/2010/05/biopython-release-154/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>BioPerl has moved to GitHub</title>
		<link>http://news.open-bio.org/news/2010/05/bioperl-has-moved-to-github/</link>
		<comments>http://news.open-bio.org/news/2010/05/bioperl-has-moved-to-github/#comments</comments>
		<pubDate>Fri, 14 May 2010 04:18:33 +0000</pubDate>
		<dc:creator>Chris Fields</dc:creator>
				<category><![CDATA[BioPerl]]></category>
		<category><![CDATA[Blogroll]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Community]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Documentation]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[OBF]]></category>
		<category><![CDATA[OBF Projects]]></category>

		<guid isPermaLink="false">http://news.open-bio.org/news/?p=695</guid>
		<description><![CDATA[<br/>BioPerl has migrated to git and GitHub!  We have also set up a mirror set of several key repositories at the great public git hosting site repo.or.cz. If you are a current BioPerl developer (had a previous account for direct access to our prior Subversion repository), please sign up for a GitHub account and let us [...]]]></description>
			<content:encoded><![CDATA[<br/><p>BioPerl has migrated to <a href="http://git-scm.com/">git</a> and <a href="http://github.com/bioperl">GitHub</a>!  We have also set up mirrors of several key repositories at the great public git hosting site <a href="http://repo.or.cz/w">repo.or.cz</a>.</p>
<p>If you are a current BioPerl developer (had a previous account for direct access to our prior Subversion repository), please sign up for a GitHub account and let us know your user ID.  Also, add the extra email <a href="http://news.open-bio.org/news/wp-content/uploads/2010/05/generic.jpg"><img class="alignnone size-full wp-image-703" src="http://news.open-bio.org/news/wp-content/uploads/2010/05/generic.jpg" alt="" width="137" height="15" /></a> (where &#8216;DEVNAME&#8217; is your <strong>original Subversion account ID</strong>).  This should map any previous commits from the older Subversion and CVS repositories to your new GitHub account.</p>
<p>The following are ways everyone can download the latest code.</p>
<h2>Using git</h2>
<p>Note you can replace &#8216;bioperl-live.git&#8217; with any of the repository names (bioperl-db, bioperl-run, etc).  For BioPerl developers (GitHub collaborators) you have a choice of SSH or HTTP:</p>
<pre>  git clone git@github.com:bioperl/bioperl-live.git</pre>
<pre>  git clone https://bioperl@github.com/bioperl/bioperl-live.git</pre>
<p>The open read-only link (for everyone):</p>
<pre>  git clone git://github.com/bioperl/bioperl-live.git</pre>
<p>or using the mirror site:</p>
<pre><code>  git clone git://repo.or.cz/bioperl-live.git</code></pre>
<h2>Using SVN (read-only)</h2>
<p>We will also support read-only access to GitHub with Subversion.  We may allow write support at some later point.  To use svn:</p>
<pre>  svn checkout http://svn.github.com/bioperl/bioperl-live.git</pre>
<h2>Direct downloads</h2>
<p>Tagged releases can be found here:</p>
<p><a href="http://github.com/bioperl/bioperl-live/downloads">http://github.com/bioperl/bioperl-live/downloads</a></p>
<p>The latest source code here:</p>
<p><a href="http://github.com/bioperl/bioperl-live/archives/master">http://github.com/bioperl/bioperl-live/archives/master</a></p>
<h2><strong>Forking BioPerl and Pull Requests</strong></h2>
<p>We intend to use git and GitHub to their fullest.  With that in mind, we encourage users to <a href="http://help.github.com/forking/">fork</a> the BioPerl code, make changes, commit them to their forked repository, and submit <a href="http://github.com/guides/pull-requests">pull requests</a>.</p>
<h2>Documentation</h2>
<p>We&#8217;re also in the process of updating our local developer documents to help those who haven&#8217;t used git before.  In particular, we have a <a href="http://www.bioperl.org/wiki/Using_Git">Using Git</a> page, and have added <a href="http://www.bioperl.org/wiki/Tracking_Git_commits">RSS feeds</a> for our repository commits.</p>
<p>Enjoy!</p>
<p>chris</p>
<p><strong>Update: </strong>SVN version fixed, thanks to DaveMessina++ for pointing it out.</p>
]]></content:encoded>
			<wfw:commentRss>http://news.open-bio.org/news/2010/05/bioperl-has-moved-to-github/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Illumina FASTQ files &#8211; Read Segment Quality Control Indicator</title>
		<link>http://news.open-bio.org/news/2010/04/illumina-q2-trim-fastq/</link>
		<comments>http://news.open-bio.org/news/2010/04/illumina-q2-trim-fastq/#comments</comments>
		<pubDate>Fri, 30 Apr 2010 07:49:45 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Biopython]]></category>
		<category><![CDATA[Blogroll]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Community]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Documentation]]></category>
		<category><![CDATA[HOWTO]]></category>
		<category><![CDATA[OBF]]></category>
		<category><![CDATA[OBF Projects]]></category>
		<category><![CDATA[FASTQ]]></category>

		<guid isPermaLink="false">http://news.open-bio.org/news/?p=677</guid>
		<description><![CDATA[<br/>In another quirk to the FASTQ story, recent Illumina FASTQ files don&#8217;t actually use the full range of PHRED scores &#8211; and a score of 2 has a special meaning, The Read Segment Quality Control Indicator (RSQCI, encoded as &#8216;B&#8217;). Hats off to Dr Torsten Seemann for raising awareness of this issue in his post [...]]]></description>
			<content:encoded><![CDATA[<br/><p>In another quirk to the <a href="http://news.open-bio.org/news/2009/12/nar-fastq-format/">FASTQ story</a>, recent Illumina FASTQ files don&#8217;t actually use the full range of PHRED scores &#8211; and a score of 2 has a special meaning, <i>The Read Segment Quality Control Indicator</i> (RSQCI, encoded as &#8216;B&#8217;).</p>
<p>Hats off to <i>Dr Torsten Seemann</i> for raising awareness of this issue in <a href="http://seqanswers.com/forums/showpost.php?p=17491&#038;postcount=3">his post on the seqanswers.com forum</a>, referring to a presentation by <i>Tobias Mann</i> of Illumina which says:</p>
<blockquote><p><i>The Read Segment Quality Control Indicator:</i></p>
<ul>
<li><i>At the ends of some reads, quality scores are unreliable. Illumina has an algorithm for identifying these unreliable runs of quality scores, and we use a special indicator to flag these portions of reads</i></li>
<li><i>A quality score of 2, encoded as a &#8220;B&#8221;, is used as a special indicator. A quality score of 2 does not imply a specific error rate, but rather implies that the marked region of the read should not be used for downstream analysis.</i></li>
<li><i>Some reads will end with a run of B (or Q2) basecalls, but there will never be an isolated Q2 basecall.</i></li>
</ul>
</blockquote>
<p>So, armed with this knowledge, you might want to apply a simple trimming criterion to any Illumina FASTQ files &#8211; remove everything from the first PHRED quality score of 2 (encoded as ASCII &#8216;B&#8217;) onwards.</p>
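<p>To make the encoding concrete, here is a small sketch of how a quality string decodes to PHRED scores. It assumes the ASCII offset of 64 used by Illumina 1.3+ FASTQ files (an assumption about the file variant, not something stated in the quoted presentation), and the function names are purely illustrative.</p>

```python
ILLUMINA_OFFSET = 64  # assumed ASCII offset for Illumina 1.3+ FASTQ files


def phred_scores(qual, offset=ILLUMINA_OFFSET):
    """Decode a FASTQ quality string into a list of integer PHRED scores."""
    return [ord(char) - offset for char in qual]


def rsqci_start(qual):
    """Return the index where the flagged (Q2) region starts, or None."""
    index = qual.find("B")
    return None if index == -1 else index


print(phred_scores("hhgB"))  # [40, 40, 39, 2]
print(rsqci_start("hhhhgggBBBB"))  # 7
```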
<p>We could do this with the rich, object-oriented <tt>SeqRecord</tt>-based API in Biopython, but when <a href="http://news.open-bio.org/news/2009/09/biopython-fast-fastq/">dealing with massive FASTQ files</a> the overhead of building those objects matters. Instead we&#8217;ll stick with plain Python strings:</p>
<pre>from Bio.SeqIO.QualityIO import FastqGeneralIterator

min_length = 10
with open("untrimmed.fastq") as in_handle, open("B_trimmed.fastq", "w") as out_handle:
    for title, seq, qual in FastqGeneralIterator(in_handle):
        # Find the location of the first "B" (PHRED quality 2)
        trim = qual.find("B")
        if trim == -1:
            # No "B" found, so write the read unchanged
            out_handle.write("@%s\n%s\n+\n%s\n" % (title, seq, qual))
        elif trim >= min_length:
            # Keep everything up to the first "B"
            out_handle.write("@%s\n%s\n+\n%s\n" % (title, seq[:trim], qual[:trim]))
        # Reads with fewer than min_length bases before the first "B" are dropped</pre>
<p>The above will work fine on any recent Illumina FASTQ files using the RSQCI scheme, but on older Illumina FASTQ files the &#8220;B&#8221; character is just a low quality score &#8211; and can occur even in the middle of a read. Here trimming at the first &#8220;B&#8221; might be unwise. Instead, we can trim any trailing &#8220;B&#8221; characters &#8211; which will do the same thing on RSQCI based FASTQ files where the &#8220;B&#8221; should only appear at the end:</p>
<pre>from Bio.SeqIO.QualityIO import FastqGeneralIterator

min_length = 10
with open("untrimmed.fastq") as in_handle, open("B_trimmed.fastq", "w") as out_handle:
    for title, seq, qual in FastqGeneralIterator(in_handle):
        # Remove any trailing "B" characters from the quality string
        qual = qual.rstrip("B")
        length = len(qual)
        if length >= min_length:
            # Trim the sequence to match the shortened quality string
            seq = seq[:length]
            out_handle.write("@%s\n%s\n+\n%s\n" % (title, seq, qual))</pre>
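<p>The difference between the two approaches shows up on a quality string with a &#8220;B&#8221; in the middle, as might occur in an older Illumina file. The strings and function names below are made up for illustration.</p>

```python
def trim_at_first_b(qual):
    """Cut the quality string at the first "B" (first approach)."""
    index = qual.find("B")
    return qual if index == -1 else qual[:index]


def trim_trailing_b(qual):
    """Remove only a trailing run of "B" characters (second approach)."""
    return qual.rstrip("B")


old_style = "hhhBhhhhBB"  # isolated "B" mid-read, then a trailing run
rsqci_style = "hhhhhhhBBB"  # "B" appears only as a trailing run

print(trim_at_first_b(old_style))  # hhh
print(trim_trailing_b(old_style))  # hhhBhhhh
print(trim_at_first_b(rsqci_style) == trim_trailing_b(rsqci_style))  # True
```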
<p>You could easily modify this example to read from stdin and write to stdout (see this <a href="http://www.biopython.org/wiki/Reading_from_unix_pipes">cookbook example</a>), or take filenames as command line arguments to turn this into a general purpose &#8220;FASTQ B-trimming script&#8221;.</p>
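<p>One possible shape for such a script is sketched below. To keep it free of dependencies it uses a hypothetical <tt>read_fastq</tt> helper that assumes strict four-line records; <tt>FastqGeneralIterator</tt> could be swapped in to cope with line-wrapped files.</p>

```python
import sys


def read_fastq(handle):
    """Yield (title, seq, qual) tuples, assuming four lines per record."""
    while True:
        title = handle.readline().rstrip("\n")
        if not title:
            break  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the "+" separator line
        qual = handle.readline().rstrip("\n")
        yield title[1:], seq, qual  # drop the leading "@"


def trim_stream(in_handle, out_handle, min_length=10):
    """Write reads with trailing "B" qualities removed; return the count kept."""
    kept = 0
    for title, seq, qual in read_fastq(in_handle):
        qual = qual.rstrip("B")
        if len(qual) >= min_length:
            out_handle.write("@%s\n%s\n+\n%s\n" % (title, seq[: len(qual)], qual))
            kept += 1
    return kept


def main():
    """Filter FASTQ from stdin to stdout, for use in a Unix pipeline."""
    trim_stream(sys.stdin, sys.stdout)
```

<p>Calling <tt>main()</tt> at the bottom of a script file would let it sit in a pipeline, reading from stdin and writing to stdout.</p>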
]]></content:encoded>
			<wfw:commentRss>http://news.open-bio.org/news/2010/04/illumina-q2-trim-fastq/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Partial sequence files with Biopython</title>
		<link>http://news.open-bio.org/news/2010/04/partial-seq-files-biopython/</link>
		<comments>http://news.open-bio.org/news/2010/04/partial-seq-files-biopython/#comments</comments>
		<pubDate>Tue, 27 Apr 2010 13:03:04 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Biopython]]></category>
		<category><![CDATA[Blogroll]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[HOWTO]]></category>
		<category><![CDATA[OBF]]></category>
		<category><![CDATA[OBF Projects]]></category>

		<guid isPermaLink="false">http://news.open-bio.org/news/?p=659</guid>
		<description><![CDATA[<br/>This is another blog post to highlight one of the neat tricks you&#8217;ll be able to do with Biopython 1.54 (which you can help test with the Biopython 1.54 beta release). It is often useful to be able to extract a few records from a larger sequence file &#8211; for example, some sequences of interest [...]]]></description>
			<content:encoded><![CDATA[<br/><p>This is another blog post to highlight one of the neat tricks you&#8217;ll be able to do with Biopython 1.54 (which you can help test with the <a href="http://news.open-bio.org/news/2010/04/biopython-1-54-beta-released/">Biopython 1.54 beta</a> release).</p>
<p>It is often useful to be able to extract a few records from a larger sequence file &#8211; for example, some sequences of interest from a full UniProt or GenBank dump. One obvious way to try to do this is to parse the file into an object representation (i.e. <tt>SeqRecord</tt> objects using <tt>Bio.SeqIO.parse(...)</tt>), filter to pick out the entries you want, and then write them back to disk (using <tt>Bio.SeqIO.write(...)</tt>). However, for complex file formats like GenBank this can be lossy (<tt>Bio.SeqIO</tt> does not support a 100% identical round trip), and Biopython doesn&#8217;t currently support writing out the SwissProt plain text format used by UniProt. So, that approach won&#8217;t work.</p>
<p><a href="http://news.open-bio.org/news/2009/09/biopython-release-152/">Biopython 1.52</a> introduced a new <a href="http://news.open-bio.org/news/2009/09/biopython-seqio-index/">indexing function</a>, <tt>Bio.SeqIO.index(...)</tt>, which allows a large multi-sequence file to be treated like a Python dictionary &#8211; parsing records only when they are requested. This has been enhanced for Biopython 1.54 with a method <tt>get_raw(...)</tt> to extract the raw, unparsed data for a record.</p>
<p>How is this useful? Well, take your large (UniProt) file, index it, then extract the records you want and write them to your output file unmodified:</p>
<pre>from Bio import SeqIO

uniprot = SeqIO.index("uniprot_sprot.dat", "swiss")
# Binary mode, so each raw record is copied to the output byte-for-byte
with open("selected.dat", "wb") as handle:
    for acc in ["P33487", "P19801", "P13689", "Q8JZQ5", "Q9TRC7"]:
        handle.write(uniprot.get_raw(acc))</pre>
<p>Another neat use of this functionality would be to sort entries in a sequential file format, and there is an example of that in the <a href="http://biopython.org/DIST/docs/tutorial/Tutorial.html">Biopython Tutorial</a> (<a href="http://biopython.org/DIST/docs/tutorial/Tutorial.pdf">PDF</a>).</p>
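<p>For readers curious how unparsed records can be fetched without loading a whole file, here is a minimal sketch of an offset index over a FASTA file. It illustrates the general idea only and is not Biopython&#8217;s implementation; the function names are hypothetical, and it assumes every header line carries an ID.</p>

```python
def build_offset_index(path):
    """Map each FASTA record ID to the (start, length) of its raw bytes."""
    index = {}
    current_id = None
    start = 0
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line or line.startswith(b">"):
                if current_id is not None:
                    # Close the previous record at the start of this line.
                    index[current_id] = (start, offset - start)
                if not line:
                    break
                # The ID is the first word after ">".
                current_id = line[1:].split()[0].decode()
                start = offset
    return index


def get_raw(path, index, record_id):
    """Return the unparsed bytes of one record using the offset index."""
    start, length = index[record_id]
    with open(path, "rb") as handle:
        handle.seek(start)
        return handle.read(length)
```

<p>Only the dictionary of offsets is held in memory, so looking up a record costs one seek and one read regardless of file size.</p>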
]]></content:encoded>
			<wfw:commentRss>http://news.open-bio.org/news/2010/04/partial-seq-files-biopython/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Making Biopython SeqIO and AlignIO easier</title>
		<link>http://news.open-bio.org/news/2010/04/biopython-seqio-and-alignio-easier/</link>
		<comments>http://news.open-bio.org/news/2010/04/biopython-seqio-and-alignio-easier/#comments</comments>
		<pubDate>Mon, 05 Apr 2010 16:37:11 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Biopython]]></category>
		<category><![CDATA[Blogroll]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Documentation]]></category>
		<category><![CDATA[OBF]]></category>
		<category><![CDATA[OBF Projects]]></category>

		<guid isPermaLink="false">http://news.open-bio.org/news/?p=643</guid>
		<description><![CDATA[<br/>One of the small changes coming in Biopython 1.54 (which you can try out already using the Biopython 1.54 beta) is to Bio.SeqIO and Bio.AlignIO. Previously the input and output functions had required file handles, but they will now also accept filenames. This is a case of practicality beats purity (to quote the Zen of [...]]]></description>
			<content:encoded><![CDATA[<br/><p>One of the small changes coming in Biopython 1.54 (which you can try out already using the <a href="http://news.open-bio.org/news/2010/04/biopython-1-54-beta-released/">Biopython 1.54 beta</a>) is to <a href="http://www.biopython.org/wiki/SeqIO">Bio.SeqIO</a> and <a href="http://www.biopython.org/wiki/AlignIO">Bio.AlignIO</a>. Previously the input and output functions required file <em>handles</em>, but they will now also accept <em>filenames</em>.</p>
<p>This is a case of <em>practicality beats purity</em> (to quote <a href="http://www.python.org/dev/peps/pep-0020/">the Zen of Python</a>), and is particularly handy when doing very short scripts or working at the Python prompt.</p>
<p>For example, filtering a FASTA file to take only entries with a minimum length of 100 can be done like this (with handles):</p>
<p><code>from Bio import SeqIO<br />
in_handle = open("example.fasta")<br />
out_handle = open("long.fasta", "w")<br />
records = (rec for rec in SeqIO.parse(in_handle, "fasta") if len(rec)>=100)<br />
SeqIO.write(records, out_handle, "fasta")<br />
in_handle.close()<br />
out_handle.close()</code></p>
<p>Using filenames it becomes much more concise &#8211; just three lines:</p>
<p><code>from Bio import SeqIO<br />
records = (rec for rec in SeqIO.parse("example.fasta", "fasta") if len(rec)>=100)<br />
SeqIO.write(records, "long.fasta", "fasta")</code></p>
<p>This also means Python and Biopython beginners can postpone learning about file handles a little longer, although that may not be an entirely good thing <img src='http://news.open-bio.org/news/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
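<p>One way a library function can accept either kind of argument is sketched below. This is a hypothetical helper written for illustration, not the code Biopython uses.</p>

```python
from contextlib import contextmanager


@contextmanager
def open_if_needed(handle_or_filename, mode="r"):
    """Yield a file handle, opening (and later closing) it if given a filename."""
    if isinstance(handle_or_filename, str):
        with open(handle_or_filename, mode) as handle:
            yield handle
    else:
        # Already a handle: the caller stays responsible for closing it.
        yield handle_or_filename


def count_fasta_records(source):
    """Count ">" header lines in a FASTA file given as a handle or filename."""
    with open_if_needed(source) as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

<p>With this pattern, files opened by the helper are closed by it, while handles passed in by the caller are left open.</p>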
]]></content:encoded>
			<wfw:commentRss>http://news.open-bio.org/news/2010/04/biopython-seqio-and-alignio-easier/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Biopython 1.54 beta released</title>
		<link>http://news.open-bio.org/news/2010/04/biopython-1-54-beta-released/</link>
		<comments>http://news.open-bio.org/news/2010/04/biopython-1-54-beta-released/#comments</comments>
		<pubDate>Fri, 02 Apr 2010 17:19:55 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Biopython]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[OBF]]></category>
		<category><![CDATA[OBF Projects]]></category>

		<guid isPermaLink="false">http://news.open-bio.org/news/?p=629</guid>
		<description><![CDATA[<br/>A beta release for Biopython 1.54 is now available for download and testing.
]]></description>
			<content:encoded><![CDATA[<br/><p>A <em>beta</em> release for Biopython 1.54 is now available for download and testing.</p>
<p>Since <a href="http://news.open-bio.org/news/2009/12/biopython-release-153/">Biopython 1.53</a> was released at the end of last year, several new features and more documentation have been added, plus the usual bug fixes. For full details see the <a href="http://biopython.open-bio.org/SRC/biopython/NEWS">NEWS file</a>.</p>
<p>All the new features have been tested by the dev team but it’s possible there are cases that we haven’t been able to foresee and test, especially for the updated multiple sequence alignment object (which is what you&#8217;ll now get when parsing alignments with <a href="http://biopython.org/wiki/AlignIO">Bio.AlignIO</a>), the <a href="http://biopython.org/wiki/Phylo">new Bio.Phylo module</a>, and the <a href="http://biopython.org/wiki/SeqIO">Bio.SeqIO</a> support for Standard Flowgram Format (SFF) files.</p>
<p>Source distributions and Windows installers are available from the <a href="http://biopython.org/wiki/Download">downloads page</a> on the Biopython website (<a href="http://biopython.org/">biopython.org</a>).</p>
<p>We are interested in getting feedback on the beta release as a whole, but especially on the new features and the <a href="http://biopython.org/DIST/docs/tutorial/Tutorial.html">Biopython Tutorial and Cookbook</a> (<a href="http://biopython.org/DIST/docs/tutorial/Tutorial.pdf">PDF</a>).</p>
<p>(At least) 10 people contributed to this release (so far), which includes 4 new people:</p>
<ul>
<li>Anne Pajon (first contribution)
</li>
<li>Brad Chapman
</li>
<li>Christian Zmasek
</li>
<li>Eric Talevich
</li>
<li>Jose Blanca (first contribution)
</li>
<li>Kevin Jacobs (first contribution)
</li>
<li>Leighton Pritchard
</li>
<li>Michiel de Hoon
</li>
<li>Peter Cock
</li>
<li>Thomas Holder (first contribution)
</li>
</ul>
<p>So, gather your courage, download the release, try it out and please let us know what works and what doesn’t through the <a href="http://biopython.org/wiki/Mailing_lists">mailing lists</a> (or <a href="http://bugzilla.open-bio.org/">bugzilla</a>).</p>
]]></content:encoded>
			<wfw:commentRss>http://news.open-bio.org/news/2010/04/biopython-1-54-beta-released/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>BioPerl at GMOD Meeting 2010</title>
		<link>http://news.open-bio.org/news/2010/01/bioperl-at-gmod-meeting-2010/</link>
		<comments>http://news.open-bio.org/news/2010/01/bioperl-at-gmod-meeting-2010/#comments</comments>
		<pubDate>Tue, 19 Jan 2010 03:58:58 +0000</pubDate>
		<dc:creator>Chris Fields</dc:creator>
				<category><![CDATA[BioPerl]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Community]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[OBF]]></category>
		<category><![CDATA[OBF Projects]]></category>

		<guid isPermaLink="false">http://news.open-bio.org/news/?p=603</guid>
		<description><![CDATA[<br/>BioPerl developers and users attended the BioPerl satellite meeting on January 13th, just prior to the GMOD Meeting.  Several items were covered on the agenda: In order to start addressing whole genome data with more lightweight objects, we are planning on setting up a lightweight Bio::SeqI object that has a flexible DB backend (i.e. Bio::DB::SeqFeature::Store or [...]]]></description>
			<content:encoded><![CDATA[<br/><p>BioPerl developers and users attended the <a href="http://www.bioperl.org/wiki/GMOD_2010_Meeting">BioPerl satellite meeting</a> on January 13th, just prior to the <a href="http://gmod.org/wiki/January_2010_GMOD_Meeting">GMOD Meeting</a>.  Several items were covered on the agenda:</p>
<ul>
<li>In order to start addressing whole genome data with more lightweight objects, we are planning on setting up a lightweight Bio::SeqI object that has a flexible DB backend (i.e. Bio::DB::SeqFeature::Store or similar).  We are also contemplating adding lazy parsing for some parsers, possibly using the Bio::PullParserI methods (or similar) that Sendu Bala created.</li>
<li>After a final  1.6 branch point release, we may &#8216;freeze&#8217; BioPerl in a maintenance mode, primarily so that we can reorganize core into several more easily installed subdistributions on a branch.  New modules will essentially be additional separate repos that will depend on BioPerl core.  This reorganization has been discussed for a few years now, and as we edge closer to starting this (probably this spring) we&#8217;ll announce more details.</li>
<li>We had some initial thoughts on how to handle circular genomes more efficiently.  We essentially do this already, but it isn&#8217;t foolproof.</li>
<li>We need to dedicate significant time to GFF3-based coding (reimplementing FeatureIO while allowing some flexibility).  Rob Buels had made the initial run at splitting out FeatureIO, so the next step is a true reimplementation.</li>
<li>We don&#8217;t plan on including Moose support for the immediate future, feeling that it would be better to reimplement some of the classes from scratch using Moose and similar as a BioPerl 2.0, or possibly await the impending Rakudo Perl 6 alpha and start afresh using that instead of Moose.</li>
</ul>
<p>Anything we missed?  Anything you would like to address?  Please add comments and we&#8217;ll discuss them on list.</p>
]]></content:encoded>
			<wfw:commentRss>http://news.open-bio.org/news/2010/01/bioperl-at-gmod-meeting-2010/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>BioRuby 1.4.0 released</title>
		<link>http://news.open-bio.org/news/2009/12/bioruby-1-4-0-released/</link>
		<comments>http://news.open-bio.org/news/2009/12/bioruby-1-4-0-released/#comments</comments>
		<pubDate>Tue, 29 Dec 2009 10:22:25 +0000</pubDate>
		<dc:creator>Naohisa Goto</dc:creator>
				<category><![CDATA[BioRuby]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[OBF]]></category>
		<category><![CDATA[OBF Projects]]></category>
		<category><![CDATA[FASTQ]]></category>

		<guid isPermaLink="false">http://news.open-bio.org/news/?p=588</guid>
		<description><![CDATA[<br/>We are pleased to announce the release of BioRuby 1.4.0. This new release contains many new features, in addition to bug fixes and improvements. PhyloXML support: Support for reading and writing PhyloXML file format is added, developed by Diana Jaunzeikare, mentored by Christian M Zmasek and co-mentors, supported by Google Summer of Code 2009 in [...]]]></description>
			<content:encoded><![CDATA[<br/><p>We are pleased to announce the release of BioRuby 1.4.0. This new release contains many new features, in addition to bug fixes and improvements.</p>
<p><strong>PhyloXML support:</strong> Support for reading and writing the PhyloXML file format has been added, developed by Diana Jaunzeikare, mentored by Christian M Zmasek and co-mentors, and supported by Google Summer of Code 2009 in collaboration with the National Evolutionary Synthesis Center (NESCent).</p>
<p><strong>FASTQ file format support: </strong>Support for reading and writing the FASTQ file format has been added. All three FASTQ format variants are supported. The code was written by Naohisa Goto, with the help of discussions on the open-bio-l mailing list. The prototype of the Bio::Fastq class was first developed during the BioHackathon 2009 held in Okinawa.</p>
<p><strong>DNA chromatogram support: </strong>Support for reading DNA chromatogram files has been added. The SCF and ABIF file formats are supported. The code was developed by Anthony Underwood.</p>
<p><strong>MEME (motif-based sequence analysis tools) support: </strong>Support for running MAST (Motif Alignment &amp; Search Tool, part of the MEME Suite of motif-based sequence analysis tools) and parsing its results has been added, developed by Adam Kraut.</p>
<p><strong> Improvement of KEGG parser classes: </strong>Some new methods have been added to parse new fields in some KEGG file formats. Unit tests for the KEGG parsers have also been added and improved. These were contributed by Kozo Nishida.</p>
<p><strong>Many sample scripts are added:</strong> Many sample scripts demonstrating the usage of classes have been added. They were originally primitive test code written using the &#8220;if __FILE__ == $0&#8221; convention.</p>
<p><strong>Unit tests can test installed BioRuby: </strong>The mechanisms for loading the library and finding test data in the unit tests have changed, and the target library path and test data path can now be set with environment variables.</p>
<p><strong>Incompatible change: Bio::NCBI::REST needs email address:</strong> NCBI announced that all Entrez Utilities (E-utilities) requests must contain email and tool parameters, and that requests without them will return an error after June 2010. To set the default email address and tool name in BioRuby, the following methods have been added.</p>
<ul>
<li>Bio::NCBI.default_email=(email)</li>
<li>Bio::NCBI.default_tool=(tool_name)</li>
</ul>
<p>Note that no default email address is preset in BioRuby. Library users must set their own email address, or implement some way of obtaining the user&#8217;s email address (from an input form, configuration file, etc).</p>
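<p>To show what the requirement means at the HTTP level, the sketch below builds an E-utilities query URL that carries both parameters. It is written in Python rather than Ruby purely for illustration; the function name, email address, tool name and search term are placeholders, and no request is sent.</p>

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db, term, email, tool):
    """Build an ESearch URL that includes the email and tool parameters."""
    if not email or not tool:
        raise ValueError("NCBI asks for both an email address and a tool name")
    params = [("db", db), ("term", term), ("email", email), ("tool", tool)]
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


print(esearch_url("pubmed", "bioruby", "you@example.org", "my_script"))
```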
<p>In addition, many changes have been made, including incompatible changes. For more information, see <a href="http://github.com/bioruby/bioruby/blob/1.4.0/RELEASE_NOTES.rdoc" target="_blank">RELEASE_NOTES.rdoc</a> and <a href="http://github.com/bioruby/bioruby/blob/1.4.0/ChangeLog" target="_blank">ChangeLog</a>.</p>
<p>The archive is available at: <a href="http://bioruby.org/archive/bioruby-1.4.0.tar.gz">http://bioruby.org/archive/bioruby-1.4.0.tar.gz</a></p>
<p>Gem file is also available at:  <a href="http://bioruby.org/archive/gems/bio-1.4.0.gem">http://bioruby.org/archive/gems/bio-1.4.0.gem</a></p>
<p>We have also published the RubyGems package at RubyForge and Gemcutter, so you can easily install BioRuby using RubyGems.<br />
First, check the version number using the search command:<br />
% gem search --remote bio<br />
and find &#8220;bio (1.4.0)&#8221; in the list. Then,<br />
% sudo gem install bio</p>
<p>We hope you enjoy it.</p>
]]></content:encoded>
			<wfw:commentRss>http://news.open-bio.org/news/2009/12/bioruby-1-4-0-released/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Biopython 1.53 released</title>
		<link>http://news.open-bio.org/news/2009/12/biopython-release-153/</link>
		<comments>http://news.open-bio.org/news/2009/12/biopython-release-153/#comments</comments>
		<pubDate>Tue, 15 Dec 2009 16:57:53 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Biopython]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[OBDA / BioSQL]]></category>
		<category><![CDATA[OBF Projects]]></category>

		<guid isPermaLink="false">http://news.open-bio.org/news/?p=555</guid>
		<description><![CDATA[<br/>We are pleased to announce the availability of Biopython 1.53, a new stable release of the Biopython library, three months after the release of Biopython 1.52. This is our first release since migrating from CVS to git for source code control. There have been some additions to our core objects &#8211; the Seq (and related [...]]]></description>
			<content:encoded><![CDATA[<br/><p>We are pleased to announce the availability of Biopython 1.53, a new stable release of the Biopython library, three months after the release of <a href="http://news.open-bio.org/news/2009/09/biopython-release-152/">Biopython 1.52</a>. This is our first release since <a href="http://news.open-bio.org/news/2009/09/biopython-cvs-to-git-migration/">migrating from CVS to git</a> for source code control.</p>
<p>There have been some additions to our core objects &#8211; the <a href="http://biopython.org/wiki/Seq">Seq</a> (and related UnknownSeq) objects gained upper and lower methods (like the string methods of the same name but alphabet aware) plus a new ungap method. The SeqFeature object now has an extract method to get the region of sequence it describes (useful for getting CDS nucleotide sequences from GenBank files). Also <a href="http://biopython.org/wiki/SeqRecord">SeqRecord</a> objects now support addition, giving a new SeqRecord with the combined sequence, all the SeqFeatures, and any common annotation.</p>
<p>SQLite support (built into Python 2.5+) was added to our <a href="http://biopython.org/wiki/BioSQL">BioSQL interface</a> (Brad Chapman). This is still a little experimental as we are using a draft BioSQL SQLite schema, but this should be merged into the next <a href="http://www.biosql.org">BioSQL</a> release.</p>
<p>Biopython now includes wrappers for the new NCBI BLAST C++ tools, which will be replacing the old NCBI &#8220;legacy&#8221; BLAST tools written in C. The plain text BLAST parser has been updated to cope as well. Nevertheless, we (and the NCBI) still recommend using the XML output for parsing.</p>
<p>Bio.Entrez includes the <a href="http://www.nlm.nih.gov/bsd/licensee/announce/2009.html#d09_17">new (Jan 2010) DTD files</a> from the NCBI for parsing MedLine/PubMed data.</p>
<p>The NCBI codon tables have been updated from version 3.4 to 3.9, which adds a few extra start codons, and a few new tables (Tables 16, 21, 22 and 23).</p>
<p>The restriction enzyme list in Bio.Restriction has been updated to the Nov 2009 release of <a href="http://rebase.neb.com/rebase/rebase.html">REBASE</a>.</p>
<p>The Bio.PDB parser and output code has been updated to understand the element column in ATOM and HETATM lines (based on patches contributed by Hongbo Zhu and Frederik Gwinner), and Bio.PDB.PDBList has been updated for recent changes to the PDB FTP site (Paul T. Bathen).</p>
<p>Finally, support for running Biopython under <a href="http://www.jython.org">Jython</a> (using the Java Virtual Machine) has been much improved (with input from Kyle Ellrott). Note that Jython does not support C code, and currently Jython does not parse DTD files (<a href="http://bugs.jython.org/issue1447">Jython Issue 1447</a>; needed for the Bio.Entrez XML parser). However, most of the Biopython modules seem fine from testing with Jython 2.5.0 and 2.5.1.</p>
<p>Sources and Windows Installers are available from our <a href="http://www.biopython.org/wiki/Download">downloads</a> page.</p>
<p>Thanks to the Biopython development team and to everyone who has reported bugs or contributed patches since our last release.</p>
<p>(At least) 12 people contributed to this release, including 3 first timers:</p>
<ul>
<li>Bartek Wilczynski</li>
<li>Brad Chapman</li>
<li>Chris Lasher</li>
<li>Cymon Cox</li>
<li>Frank Kauff</li>
<li>Frederik Gwinner (first contribution)</li>
<li>Hongbo Zhu (first contribution)</li>
<li>Kyle Ellrott</li>
<li>Leighton Pritchard</li>
<li>Michiel de Hoon</li>
<li>Paul Bathen (first contribution)</li>
<li>Peter Cock</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://news.open-bio.org/news/2009/12/biopython-release-153/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Interleaving paired FASTQ files with Biopython</title>
		<link>http://news.open-bio.org/news/2009/12/interleaving-paired-fastq-files-with-biopython/</link>
		<comments>http://news.open-bio.org/news/2009/12/interleaving-paired-fastq-files-with-biopython/#comments</comments>
		<pubDate>Mon, 14 Dec 2009 14:03:41 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Biopython]]></category>
		<category><![CDATA[Blogroll]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Community]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Documentation]]></category>
		<category><![CDATA[HOWTO]]></category>
		<category><![CDATA[OBF Projects]]></category>
		<category><![CDATA[FASTQ]]></category>

		<guid isPermaLink="false">http://news.open-bio.org/news/?p=513</guid>
		<description><![CDATA[<br/>This post is about paired end data (FASTA or FASTQ) and manipulating it with Biopython&#8217;s Bio.SeqIO module (see also FASTQ conversions &#38; speeding up FASTQ). There are two main ways of presenting paired end data in FASTA or FASTQ files: Paired files, with matching entries for the forward and reverse reads (probably the norm with [...]]]></description>
			<content:encoded><![CDATA[<br/><p>This post is about paired end data (FASTA or FASTQ) and manipulating it with Biopython&#8217;s <a href="http://biopython.org/wiki/SeqIO">Bio.SeqIO</a> module (see also <a href="http://news.open-bio.org/news/2009/09/biopython-convert-function/">FASTQ conversions</a> &amp; <a href="http://news.open-bio.org/news/2009/09/biopython-fast-fastq/">speeding up FASTQ</a>).<br />
<span id="more-513"></span></p>
<p>There are two main ways of presenting paired end data in FASTA or FASTQ files:</p>
<ul>
<li>Paired files, with matching entries for the forward and reverse reads (probably the norm with Illumina data)</li>
<li>Single files, with alternating entries for the forward and reverse reads (used by Velvet)</li>
</ul>
<p>Converting between these two is a relatively common operation, and is normally pretty easy. There was a <a href="http://lists.open-bio.org/pipermail/biopython/2009-September/005584.html">short example</a> of how you might do this in Biopython on a recent (September 2009) Velvet users/Biopython mailing list discussion. That script didn&#8217;t check the record IDs matched up (but neither does the Perl script shuffleSequences_fastq.pl included with Velvet for this task).</p>
<p>It would be safer to check the record IDs do match. However, there are several different naming schemes for reads, most typically suffixes of <tt>/1</tt> and <tt>/2</tt>, but also things like <tt>.f</tt> and <tt>.r</tt> get used. In the case of FASTQ files from the NCBI SRA, the reads have no suffixes, so to feed those into Velvet you may want to check they are equal and then add a suffix as shown below.</p>
<pre>#This is a Python 3 script; it needs a Biopython release that supports Python 3
from Bio import SeqIO

#Setup variables (could parse command line args instead)
file_f = "SRR001666_1.fastq"
file_r = "SRR001666_2.fastq"
file_out = "SRR001666_interleaved.fastq"
format = "fastq" #or "fastq-illumina", or "fasta", or ...

def interleave(iter1, iter2):
    #zip is lazy in Python 3 (like itertools.izip in Python 2)
    for (forward, reverse) in zip(iter1, iter2):
        assert forward.id == reverse.id
        forward.id += "/1"
        reverse.id += "/2"
        yield forward
        yield reverse

#The with statement closes all three files, even if the assert fails
with open(file_f) as handle_f, open(file_r) as handle_r, \
     open(file_out, "w") as handle:
    records_f = SeqIO.parse(handle_f, format)
    records_r = SeqIO.parse(handle_r, format)
    count = SeqIO.write(interleave(records_f, records_r), handle, format)
print("%i records written to %s" % (count, file_out))
</pre>
<p>&nbsp;</p>
<p>This example uses the <a href="http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=viewer&amp;m=data&amp;s=viewer&amp;run=SRR001666">SRR001666</a> files from the <a href="ftp://ftp.ncbi.nlm.nih.gov/sra/static/SRX000/SRX000430/">NCBI SRA FTP site</a>.</p>
<p>Now that works fine, and just by changing the filenames and the format name this could be used on FASTA data (or another supported file format). The bad news is it took 14 minutes to produce a 2GB FASTQ. However, going a little more low-level <a href="http://news.open-bio.org/news/2009/09/biopython-fast-fastq/">as discussed before</a> can really pay off. This FASTQ-only version takes just 2 minutes:</p>
<pre>#This is a Python 3 script; it needs a Biopython release that supports Python 3
from Bio.SeqIO.QualityIO import FastqGeneralIterator

#Setup variables (could parse command line args instead)
file_f = "SRR001666_1.fastq"
file_r = "SRR001666_2.fastq"
file_out = "SRR001666_interleaved.fastq"

count = 0

#The with statement closes all three files, even if the assert fails
with open(file_f) as handle_f, open(file_r) as handle_r, \
     open(file_out, "w") as handle:
    #Each iterator yields (title line, sequence, quality) string tuples
    f_iter = FastqGeneralIterator(handle_f)
    r_iter = FastqGeneralIterator(handle_r)
    for (f_id, f_seq, f_q), (r_id, r_seq, r_q) in zip(f_iter, r_iter):
        assert f_id == r_id
        count += 2
        #Write out both reads with "/1" and "/2" suffix on the title line
        #(the suffix lands after any description text on that line)
        handle.write("@%s/1\n%s\n+\n%s\n@%s/2\n%s\n+\n%s\n"
                     % (f_id, f_seq, f_q, r_id, r_seq, r_q))
print("%i records written to %s" % (count, file_out))</pre>
<p>&nbsp;</p>
<p>You can make this a little faster still by missing out most of the validation done by the Biopython FASTQ parser &#8211; but personally I wouldn&#8217;t take that risk. I&#8217;d much rather know about any errors in the data.</p>
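<p>The scripts above expect the forward and reverse names to be identical, as in the NCBI SRA files. For reads that already carry suffixes such as <tt>/1</tt> and <tt>/2</tt>, or <tt>.f</tt> and <tt>.r</tt>, a check along the following lines might be used instead. This is only a sketch in plain Python: the helper name <tt>check_pair</tt> and the <tt>SUFFIX_PAIRS</tt> table are made up for illustration and are not part of Biopython, and the read names in the demo are invented.</p>
<pre>#Hypothetical helper, not part of Biopython.
#Known (forward, reverse) suffix pairs; extend this list for your own data.
SUFFIX_PAIRS = [("/1", "/2"), (".f", ".r")]

def check_pair(f_id, r_id):
    """Return the shared base name of a read pair, or raise ValueError."""
    if f_id == r_id:
        #Identical names are treated as an unsuffixed pair (NCBI SRA style)
        return f_id
    for f_suffix, r_suffix in SUFFIX_PAIRS:
        if f_id.endswith(f_suffix) and r_id.endswith(r_suffix):
            #Strip the suffixes and compare what is left
            f_base = f_id[:-len(f_suffix)]
            r_base = r_id[:-len(r_suffix)]
            if f_base == r_base:
                return f_base
    raise ValueError("Mismatched pair: %r and %r" % (f_id, r_id))

print(check_pair("SRR001666.1", "SRR001666.1"))
print(check_pair("read42/1", "read42/2"))
print(check_pair("read42.f", "read42.r"))
</pre>
<p>Calling something like <tt>check_pair(forward.id, reverse.id)</tt> in place of the <tt>assert</tt> would raise an error naming the offending reads, and the returned base name could then be given whichever suffix the downstream tool expects.</p>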
<p style="text-align: right;">Peter</p>
<p>P.S.</p>
<p>Things get more interesting if you want to do quality filtering or trimming. If only one read of a pair passes the quality assurance step, then you may want to keep it and treat it as an unpaired read. To give such cleaned up data to Velvet, you would need one file of alternating paired end reads, and a separate file of the orphaned (effectively unpaired) reads. That deserves another post going into more detail&#8230;</p>
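<p>As a rough illustration of that split, here is a plain Python 3 sketch that sends surviving pairs to one handle and orphans to another. The names <tt>split_pairs</tt>, <tt>write_read</tt> and <tt>no_ns</tt> are invented for this example and are not Biopython functions. The filter (rejecting reads that contain an N) and the three toy read pairs are assumptions made to keep the demo short; a real pipeline would apply a quality-based test and read the pairs from files, for instance with two <tt>FastqGeneralIterator</tt> objects zipped together.</p>
<pre>#Hypothetical sketch in plain Python 3, not part of Biopython.
#Each read is a (title, sequence, quality) tuple of strings,
#the same shape that FastqGeneralIterator yields.
import io

def write_read(handle, read):
    """Write one read as a four line FASTQ record."""
    handle.write("@%s\n%s\n+\n%s\n" % read)

def split_pairs(pairs, keep, paired_handle, orphan_handle):
    """Write surviving pairs to one handle and orphans to another.

    pairs is an iterable of (forward, reverse) reads, and keep is a
    function returning True for reads that pass the filter.
    Returns (number of paired reads, number of orphan reads) written.
    """
    paired = orphans = 0
    for forward, reverse in pairs:
        f_ok, r_ok = keep(forward), keep(reverse)
        if f_ok and r_ok:
            #Both survive, so keep them interleaved
            write_read(paired_handle, forward)
            write_read(paired_handle, reverse)
            paired += 2
        elif f_ok or r_ok:
            #Exactly one survives, so it becomes an orphan
            write_read(orphan_handle, forward if f_ok else reverse)
            orphans += 1
    return paired, orphans

#Toy filter (an assumption for this demo): reject reads containing N
def no_ns(read):
    return "N" not in read[1]

pairs = [
    (("r1/1", "ACGT", "IIII"), ("r1/2", "TTGA", "IIII")),  #both pass
    (("r2/1", "ACNT", "II!I"), ("r2/2", "GGCA", "IIII")),  #forward fails
    (("r3/1", "NNNN", "!!!!"), ("r3/2", "NNNN", "!!!!")),  #both fail
]
paired_out = io.StringIO()
orphan_out = io.StringIO()
counts = split_pairs(pairs, no_ns, paired_out, orphan_out)
print("%i paired and %i orphan reads written" % counts)
</pre>
<p>Writing to handles rather than building lists keeps memory use flat on large files. In real use the two <tt>StringIO</tt> objects would be replaced by output files opened for writing.</p>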
]]></content:encoded>
			<wfw:commentRss>http://news.open-bio.org/news/2009/12/interleaving-paired-fastq-files-with-biopython/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
