Python Forum
Fasta Files
#1
Hello everybody,

I'm new to programming and this is the first time I've used Python. I'm working on code that should read a FASTA file and remove the header of each sequence.
This is my code to read the file:

def read_fasta(inputfile):
    # The with-statement closes the file automatically, so no explicit close() is needed.
    with open(inputfile, 'r') as f:
        return f.readlines()

fasta_file = read_fasta('SELEX_100_reads.txt')

print(fasta_file)
The printed contents of fasta_file look like this:
Output:
['@DBV2SVN1:110:B:7:1101:1456:2092\n', 'CTAAAAAGCGAGTGCGNCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNANNNNNNCNNNNNNNNAAACANNAAGGTAAGAAACAAGCACAGATGAGAGC\n', '\n', '+\n', '#####################################################################################################\n', '\n', '@DBV2SVN1:110:B:7:1101:2491:2141\n', 'AAGTGAGCAAACAGAAACATAGTGCGGAGTGGGAAAATGAGACTCAAAAAAAGAGTGTGGGTATTCAGTAGGGGATATTAGGCCACAATACGAAAGAGCAA\n', '\n', '+\n', '#####################################################################################################\n', '\n', '@DBV2SVN1:110:B:7:1101:2924:2130\n'......]
It's a list that includes a header for each sequence. I only want the DNA sequences (the lines beginning CTAAAA... or AAGTGAGCAA...) collected in a list.
Can anyone help me with that?
Thanks a lot

Cheers,
John
#2
So how do you think you can approach the problem?
#3
I would do it with a loop and skip every line that is not a sequence.

Something like this:

sequences = []
for line in fasta_file:
    if ...:  # the line starts with A, T, C, or G
        sequences.append(line)
I don't know how to write that condition as code.
#4
You could use the following:
https://docs.python.org/3/library/stdtyp...startswith Wrote:
str.startswith(prefix[, start[, end]])
Return True if string starts with the prefix, otherwise return False. prefix can also be a tuple of prefixes to look for. With optional start, test string beginning at that position. With optional end, stop comparing string at that position.
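As a minimal sketch of how the tuple form could be applied to the list from post #1: the helper name extract_sequences is hypothetical, and including "N" among the prefixes is an assumption (it is the usual placeholder for an unknown base, so a read could start with it). The sample list is a shortened copy of the layout shown in the question, not the real file.

```python
def extract_sequences(lines):
    """Return only the lines that look like DNA sequences, without newlines."""
    # Hypothetical helper. "N" is included on the assumption that a read
    # may begin with an unknown base.
    bases = ("A", "C", "G", "T", "N")
    # str.startswith accepts a tuple and returns True if any prefix matches.
    return [line.strip() for line in lines if line.startswith(bases)]


# Shortened sample in the same layout as the list printed in post #1.
sample = [
    "@DBV2SVN1:110:B:7:1101:1456:2092\n",
    "CTAAAAAGCGAGTGCGNCNN\n",
    "\n",
    "+\n",
    "####################\n",
    "\n",
    "@DBV2SVN1:110:B:7:1101:2491:2141\n",
    "AAGTGAGCAAACAGAAACAT\n",
]

sequences = extract_sequences(sample)
print(sequences)  # ['CTAAAAAGCGAGTGCGNCNN', 'AAGTGAGCAAACAGAAACAT']
```

With the real data you would pass fasta_file instead of sample. One caveat: this only checks the first character, so it works for the output shown (quality lines made of '#'), but a quality line that happens to begin with one of these letters would be kept as well.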
#5
Here are links to some FASTA code I wrote. I can't remember exactly what's there, but I expect I dealt with headers somewhere in the following: