Hello everybody,
I'm new to programming and this is the first time I've used Python. I'm working on code that should read a FASTA file and delete the header of each sequence.
My code to read the file:
def read_fasta(inputfile):
    # The with block closes the file automatically, so no f.close() is needed
    with open(inputfile, 'r') as f:
        file = f.readlines()
    return file

fasta_file = read_fasta('SELEX_100_reads.txt')
print(fasta_file)

The output of fasta_file looks like this:
Output:['@DBV2SVN1:110:B:7:1101:1456:2092\n', 'CTAAAAAGCGAGTGCGNCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNANNNNNNCNNNNNNNNAAACANNAAGGTAAGAAACAAGCACAGATGAGAGC\n', '\n', '+\n', '#####################################################################################################\n', '\n', '@DBV2SVN1:110:B:7:1101:2491:2141\n', 'AAGTGAGCAAACAGAAACATAGTGCGGAGTGGGAAAATGAGACTCAAAAAAAGAGTGTGGGTATTCAGTAGGGGATATTAGGCCACAATACGAAAGAGCAA\n', '\n', '+\n', '#####################################################################################################\n', '\n', '@DBV2SVN1:110:B:7:1101:2924:2130\n'......]
It's a list with a header for each sequence. I just want the DNA sequences (CTAAAA or AAGTAAAGCA) from each record as a list. Can anyone help me with that?
Thanks a lot
Cheers,
John
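
One possible way to get there, as a minimal sketch rather than a definitive solution: the output shown above looks like FASTQ-style records (a header starting with '@', the sequence, a '+' line, then a quality string), with empty lines in between. The helper name extract_sequences is hypothetical, and the sketch assumes that every record has exactly these four non-empty lines, with the sequence on a single line. The sample list uses shortened sequences purely for illustration.

```python
def extract_sequences(lines):
    """Return only the sequence lines from FASTQ-style records."""
    # Strip newlines and drop empty lines first.
    cleaned = [line.strip() for line in lines if line.strip()]
    # Assumption: after cleaning, each record is exactly 4 lines:
    # header ('@...'), sequence, '+', quality string.
    # The sequence is therefore every 4th line, starting at index 1.
    return cleaned[1::4]


# Shortened, illustrative sample in the same layout as the output above.
sample = [
    '@DBV2SVN1:110:B:7:1101:1456:2092\n',
    'CTAAAAAGCGAGTGCGNC\n',
    '\n',
    '+\n',
    '##################\n',
    '\n',
    '@DBV2SVN1:110:B:7:1101:2491:2141\n',
    'AAGTGAGCAAACAGAAAC\n',
    '\n',
    '+\n',
    '##################\n',
    '\n',
]

print(extract_sequences(sample))
```

With the list returned by read_fasta, the call would be extract_sequences(fasta_file). Slicing by position is used here instead of checking for a leading '@', because a quality string in this kind of file can also begin with '@'.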