Python Forum
To make an algorithm work faster
#1
I am using Biopython to work with DNA sequences, and I am new to the library. I have a .fasta file written in the 4-letter DNA alphabet, and I want to convert it to a 2-letter binary code for purines and pyrimidines. I merge all the segments/records of the .fasta file into one full_seq in the 4-letter alphabet, and then I convert that into new_seq in the 2-letter alphabet. This is where the problem is: the conversion takes hours to run. The sequence length is 119,750,280, so it is a very long sequence. Any ideas for making my program run faster?

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# merge all the records

full_seq=Seq("")

for seq_record in SeqIO.parse("OMOK01.fasta", "fasta"):
    full_seq+=seq_record.seq

# convert the 4-letters alphabet into binary alphabet

new_seq=Seq("")

for i in range(0,len(full_seq)):
    if (full_seq[i]=="A") or (full_seq[i]=="G"):
        new_seq+=Seq("-")
    else:
        new_seq+=Seq("+")

print("Binary sequence", repr(new_seq))
#2
You can see if this helps. Your code looks up full_seq[i] by index on every pass, and, more importantly, every new_seq += Seq(...) builds a new sequence by copying everything accumulated so far, so the cost of each step grows with the position. This is not terrible for 10,000 records, but you have millions, so it adds up. Iterating over full_seq directly and joining the pieces once at the end avoids both. The other option is to break full_seq into smaller chunks and then combine the resulting lists.

parts = []  # collect the pieces in a list and join once at the end
for rec in full_seq:  # iterating a Seq yields one-letter strings
    parts.append("-" if rec.startswith(("A", "G")) else "+")
new_seq = Seq("".join(parts))
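The chunking idea mentioned above could look roughly like the sketch below. It works on a plain Python string (for example str(full_seq)) rather than a Seq object, and the helper names to_binary and convert_in_chunks as well as the chunk size are made up for illustration. The mapping follows post #1 (A and G become "-", everything else "+").

```python
def to_binary(chunk: str) -> str:
    """Map purines (A, G) to '-' and every other letter to '+', as in post #1."""
    return "".join("-" if base in "AG" else "+" for base in chunk)


def convert_in_chunks(sequence: str, chunk_size: int = 1_000_000) -> str:
    """Convert the sequence one slice at a time and join the pieces once."""
    pieces = []
    for start in range(0, len(sequence), chunk_size):
        pieces.append(to_binary(sequence[start:start + chunk_size]))
    # A single join at the end avoids re-copying the growing result.
    return "".join(pieces)


demo = "AGCTTGCA"
print(convert_in_chunks(demo, chunk_size=3))
```

Each chunk is converted independently, so the pieces could also be written to a file as they are produced instead of being kept in memory.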
#3
I finally found that a very fast way to do it is to use something like this:

for seq_record in SeqIO.parse("OMOK01.fasta", "fasta"):
    new_str = str(seq_record.seq).replace("A", "+")
    new_str = new_str.replace("G", "+")
    new_str = new_str.replace("C", "-")
    new_str = new_str.replace("T", "-")
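Note that the loop above overwrites new_str for every record, so only the last record survives unless each result is stored somewhere. A minimal sketch of collecting the per-record results and joining them once is shown below. The records list is a hypothetical stand-in for str(seq_record.seq) of each FASTA record, and purine_pyrimidine is an illustrative name. The mapping is the one from this post (A and G become "+"), which has the signs the other way round from post #1.

```python
def purine_pyrimidine(seq: str) -> str:
    """Chained str.replace calls: A, G -> '+' and C, T -> '-'."""
    return (
        seq.replace("A", "+")
        .replace("G", "+")
        .replace("C", "-")
        .replace("T", "-")
    )


# Hypothetical stand-in for the strings obtained from SeqIO.parse(...)
records = ["AGCT", "TTGA", "CCAG"]

# Convert each record on its own, then join everything in one step.
binary_parts = [purine_pyrimidine(record) for record in records]
full_binary = "".join(binary_parts)
print(full_binary)
```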
#4
You can also try
table = {ord(k): ord(v) for k, v in {'A': '+', 'G': '+', 'C': '-', 'T': '-'}.items()}
new_str = new_str.translate(table)
or
import re
table = {'A': '+', 'G': '+', 'C': '-', 'T': '-'}
new_str = re.sub(r'[AGCT]', lambda m: table[m.group()], new_str)
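To decide between these options, a rough timing comparison on synthetic data might help. The sketch below is only an illustration: the sample string is made up, the function names are hypothetical, and the timings will vary by machine and Python version. It uses str.maketrans, which builds the same kind of table as the dict comprehension above. The regex version calls a Python function for every match, so it would likely be the slowest of the three, but measuring is the only way to be sure.

```python
import re
import timeit

MAPPING = {"A": "+", "G": "+", "C": "-", "T": "-"}
TABLE = str.maketrans(MAPPING)  # translation table for str.translate


def with_replace(s: str) -> str:
    """Four chained str.replace passes."""
    for base, symbol in MAPPING.items():
        s = s.replace(base, symbol)
    return s


def with_translate(s: str) -> str:
    """Single pass using the precomputed translation table."""
    return s.translate(TABLE)


def with_regex(s: str) -> str:
    """Regex substitution with a per-match lookup callback."""
    return re.sub(r"[AGCT]", lambda m: MAPPING[m.group()], s)


sample = "AGCTTGCA" * 10_000  # synthetic data, not a real genome

# All three approaches should agree before their speed is compared.
assert with_replace(sample) == with_translate(sample) == with_regex(sample)

for func in (with_replace, with_translate, with_regex):
    seconds = timeit.timeit(lambda: func(sample), number=5)
    print(f"{func.__name__}: {seconds:.4f} s for 5 runs")
```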