Making Biopython SeqIO and AlignIO easier

One of the small changes coming in Biopython 1.54 (which you can already try out with the Biopython 1.54 beta) affects Bio.SeqIO and Bio.AlignIO. Previously their input and output functions required file handles; now they will also accept filenames.

This is a case of "practicality beats purity" (to quote the Zen of Python), and it is particularly handy when writing very short scripts or working at the Python prompt.

For example, filtering a FASTA file to keep only the entries longer than 100 letters can be done like this with handles:

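from Bio import SeqIO
# Plain "r" mode; the old "rU" universal-newline mode was removed in Python 3.11
in_handle = open("example.fasta", "r")
out_handle = open("long.fasta", "w")
# Generator expression: keep only records longer than 100 letters
records = (rec for rec in SeqIO.parse(in_handle, "fasta") if len(rec) > 100)
SeqIO.write(records, out_handle, "fasta")
# The write has consumed the generator, so both handles can now be closed
in_handle.close()
out_handle.close()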

Using filenames, it becomes much more concise, just three lines:

from Bio import SeqIO
records = (rec for rec in SeqIO.parse("example.fasta", "fasta") if len(rec) > 100)
SeqIO.write(records, "long.fasta", "fasta")
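The same convenience applies to Bio.AlignIO. Here is a minimal sketch, assuming a current Biopython install; the file names and the tiny three-sequence alignment are made up for illustration and written by the script itself:

```python
from Bio import AlignIO

# Made-up example input: a tiny FASTA-format alignment (three rows, nine columns)
with open("example_aln.fasta", "w") as handle:
    handle.write(">seq1\nACGT-ACGT\n>seq2\nACGTTACGT\n>seq3\nAC-TTACGT\n")

# Read the alignment straight from a filename, no explicit handle needed
alignment = AlignIO.read("example_aln.fasta", "fasta")
print(len(alignment), alignment.get_alignment_length())

# Write it back out in Clustal format, again passing just a filename;
# AlignIO.write returns the number of alignments written
count = AlignIO.write(alignment, "example_aln.aln", "clustal")
print(count)
```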

This also means Python and Biopython beginners can postpone learning about file handles a little longer, although that may not be an entirely good thing ;)
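Accepting either a filename or a handle is a simple pattern to support. The sketch below is not Biopython's actual code; `maybe_open` and `count_fasta_records` are hypothetical names chosen for illustration. The idea is that a filename gets opened (and closed again) by the library, while a handle passed in by the caller is used as-is and left open:

```python
import contextlib
import io
import os


@contextlib.contextmanager
def maybe_open(source, mode="r"):
    # Hypothetical helper: a filename (str or path-like) is opened here and
    # closed on exit; anything else is assumed to be an already-open handle
    # and is left open for the caller to manage.
    if isinstance(source, (str, os.PathLike)):
        with open(source, mode) as handle:
            yield handle
    else:
        yield source


def count_fasta_records(source):
    # Hypothetical example function: counts FASTA header lines, accepting
    # either a filename or a handle, just like the new SeqIO behaviour.
    with maybe_open(source) as handle:
        return sum(1 for line in handle if line.startswith(">"))


# Usage with a filename (the file is written here just for the demo)
with open("demo.fasta", "w") as out:
    out.write(">a\nACGT\n>b\nGGCC\n>c\nTTAA\n")
print(count_fasta_records("demo.fasta"))

# Usage with an in-memory handle, which stays open afterwards
demo_handle = io.StringIO(">a\nACGT\n>b\nGGCC\n")
print(count_fasta_records(demo_handle), demo_handle.closed)
```

Leaving caller-supplied handles open matters: the caller may want to keep reading from or writing to the same handle afterwards.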
