Mar-28-2019, 06:58 PM
(This post was last modified: Mar-28-2019, 06:58 PM by pianistseb.)
I am using Biopython for DNA sequences, and I am new to this Python library. I have a .fasta file containing DNA in the 4-letter code, and I want to convert it into a 2-letter binary code of purines and pyrimidines. So I merge all the segments/records of the .fasta file to get full_seq in the 4-letter alphabet. Then I have to convert it into the two-letter alphabet to get new_seq. And here is the problem: the conversion takes hours to run. The sequence length is 119750280, so it is a very long sequence. Any ideas to make my program run faster?
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# merge all the records
full_seq=Seq("")
for seq_record in SeqIO.parse("OMOK01.fasta", "fasta"):
    full_seq+=seq_record.seq

# convert the 4-letters alphabet into binary alphabet
new_seq=Seq("")
for i in range(0,len(full_seq)):
    if (full_seq[i]=="A") or (full_seq[i]=="G"):
        new_seq+=Seq("-")
    else:
        new_seq+=Seq("+")

print("Binary sequence", repr(new_seq))
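
A likely cause of the slowdown is that each += on an immutable sequence builds a new object, so the per-character loop copies the growing result over and over. Below is a minimal sketch of one possible alternative, not a tested fix for this exact file: join the record strings once and convert everything with a byte lookup table. The names TABLE and to_binary are made up for illustration, and the sketch assumes the sequence is plain ASCII. It keeps the rule from the loop above: "A" and "G" become "-", anything else becomes "+".

```python
# 256-entry lookup table: the bytes for "A" and "G" map to "-",
# every other byte maps to "+", mirroring the if/else in the original loop.
TABLE = bytes(ord("-") if b in b"AG" else ord("+") for b in range(256))


def to_binary(chunks):
    """Join the record strings once, then translate the whole sequence in one pass.

    `chunks` is any iterable of plain str pieces (hypothetical helper, ASCII assumed).
    """
    joined = "".join(chunks)  # a single join instead of repeated +=
    return joined.encode("ascii").translate(TABLE).decode("ascii")


# Hypothetical usage with the file from the post (requires Biopython):
#   from Bio import SeqIO
#   new_seq = to_binary(str(rec.seq) for rec in SeqIO.parse("OMOK01.fasta", "fasta"))

print(to_binary(["AGCT", "NNGA"]))  # prints --++++--
```

The result is a plain str; if a Seq object is needed afterwards, it can be wrapped once at the end with Seq(new_seq) rather than built up piece by piece.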