Python Forum

Full Version: Fasta Files
Hello everybody,

I'm new to programming and this is the first time I've used Python. I'm working on code that should read a FASTA file and delete the header of each sequence.
My code to read the file:

def read_fasta(inputfile):
    # The with statement closes the file automatically, so no f.close() is needed.
    with open(inputfile, 'r') as f:
        return f.readlines()

fasta_file = read_fasta('SELEX_100_reads.txt')

print(fasta_file)
The output of fasta_file looks like this:
Output:
['@DBV2SVN1:110:B:7:1101:1456:2092\n', 'CTAAAAAGCGAGTGCGNCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNANNNNNNCNNNNNNNNAAACANNAAGGTAAGAAACAAGCACAGATGAGAGC\n', '\n', '+\n', '#####################################################################################################\n', '\n', '@DBV2SVN1:110:B:7:1101:2491:2141\n', 'AAGTGAGCAAACAGAAACATAGTGCGGAGTGGGAAAATGAGACTCAAAAAAAGAGTGTGGGTATTCAGTAGGGGATATTAGGCCACAATACGAAAGAGCAA\n', '\n', '+\n', '#####################################################################################################\n', '\n', '@DBV2SVN1:110:B:7:1101:2924:2130\n'......]
It's a list that includes a header for each sequence. I just want the DNA sequences (CTAAAA or AAGTAAAGCA) from each line as a list.
Can anyone help me with that?
Thanks a lot

Cheers,
John
So how do you think you can approach the problem?
I would do it with a loop and skip every line that is not a sequence.

Like this:

for i in file:
    list_=[]
    if ... ='A' or 'T' : 
      new_list=append.list_      .... ( if the line  start with an A,T,C, or G then append to my list) 
I don't know how to write that as code.
You could use the following:
https://docs.python.org/3/library/stdtyp...startswith Wrote:
str.startswith(prefix[, start[, end]])
Return True if string starts with the prefix, otherwise return False. prefix can also be a tuple of prefixes to look for. With optional start, test string beginning at that position. With optional end, stop comparing string at that position.
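As a rough sketch of how that could look for the list shown above: because str.startswith accepts a tuple of prefixes, one call can test for all four bases. The function name extract_sequences and the shortened sample reads are made up for illustration. The sketch assumes that every sequence line starts with A, C, G, or T and that no other line does. That holds for the sample output (headers start with '@', quality lines with '#'), but a read starting with N would be skipped, and a quality line that happens to start with one of those letters would be kept.

```python
def extract_sequences(lines):
    """Return the lines that start with a DNA base, without the trailing newline."""
    bases = ("A", "C", "G", "T")  # assumption: sequence lines never start with N
    sequences = []
    for line in lines:
        # startswith accepts a tuple and returns True if any prefix matches
        if line.startswith(bases):
            sequences.append(line.strip())
    return sequences


# Lines modelled on the output shown earlier in the thread (reads shortened).
sample = [
    "@DBV2SVN1:110:B:7:1101:1456:2092\n",
    "CTAAAAAGCGAGTGCG\n",
    "\n",
    "+\n",
    "#####\n",
    "\n",
    "@DBV2SVN1:110:B:7:1101:2491:2141\n",
    "AAGTGAGCAAACAG\n",
]

print(extract_sequences(sample))
```

With the list returned by read_fasta, the call would be extract_sequences(fasta_file).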
Here are links to some FASTA code I wrote. I can't remember exactly what's there, but I expect that I dealt with headers somewhere in the following: