(Oct-15-2016, 07:43 PM) Larz60+ wrote: Hello again,
I would like to take a closer look at exactly how this data is laid out.
Is this from the NCBI BLAST database? If so, which file?
I would rather be working with an actual file.
Larz60+
Hello. I don't actually know whether it's from the NCBI database or not. I got the file from someone else, and I don't know where they got it.
How do you attach or send a file here? I can't seem to find the option.
(Oct-15-2016, 07:35 PM) wavic wrote: Does this work? It's not what I proposed before; it goes step by step. It should work as long as the file doesn't contain anything else...
def get_data(f):
    data = f.read().split()
    ecoli = dict()
    e_name = None
    for row in data:
        if row.startswith(">"):
            e_name = row.strip(">")
            ecoli[e_name] = ""
        else:
            ecoli[e_name] = "{}{}".format(ecoli[e_name], row)
    return ecoli
I got an error on data = f.read().split(), so I changed it to data = open(f).read().split(). That gave no error, but the output was only the values, and not in a dictionary.
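The error most likely came from passing the file name: get_data() calls f.read(), so it expects an already opened file object, not a string. A minimal sketch, assuming a couple of made-up records in place of the real Ecoli.prot.fasta; io.StringIO stands in for an open file, and this version splits on lines rather than on whitespace so headers containing spaces stay intact:

```python
import io


def get_data(f):
    """Build {header: sequence} from an open FASTA file object."""
    ecoli = {}
    e_name = None
    for row in f.read().splitlines():
        row = row.strip()
        if not row:
            continue  # ignore blank lines
        if row.startswith(">"):
            e_name = row[1:]  # header text without the leading '>'
            ecoli[e_name] = ""
        elif e_name is not None:
            ecoli[e_name] += row  # append this sequence line to the current entry


    return ecoli


# Hypothetical sample records, not taken from the real file
sample = io.StringIO(">protA\nMKRIS\nTTIT\n>protB\nMRVLK\n")
print(get_data(sample))  # {'protA': 'MKRISTTIT', 'protB': 'MRVLK'}

# With the real file, open it first and pass the file object:
# with open("Ecoli.prot.fasta") as in_file:
#     print(get_data(in_file))
```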
(Oct-15-2016, 08:22 PM) Larz60+ wrote: Probably not a good idea, as size could be an issue.
If the name of the file hasn't changed, I should be able to find it.
What name?
Here is the BLAST help file location: https://blast.ncbi.nlm.nih.gov/Blast.cgi...=BlastHelp
The name of the file is "Ecoli.prot.fasta".
OK,
This should do the trick. It's not the most efficient code, but you can clean it up. It works, and that's the important thing.
def read_fasta(filename=None):
    table_dict = {}
    update_dict = False
    if filename is not None:
        name = ''
        value = ''
        with open(filename, 'r') as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue  # skip blank lines; line[0] would raise IndexError
                if line[0] == ">":
                    # New header: store the previous record before starting this one
                    if update_dict:
                        table_dict[name] = value
                        value = ''
                    name = line[1:]
                else:
                    # Sequence line: append it to the current record
                    update_dict = True
                    value += line
        # Store the last record in the file
        if len(value):
            table_dict[name] = value
    print(table_dict)


if __name__ == '__main__':
    read_fasta('Ecoli.prot.fasta')
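A variation on the same idea, sketched under the assumption that you want the dictionary returned rather than printed: a generator yields one (header, sequence) pair per record, and dict() collects them. The names iter_fasta and fasta_to_dict are hypothetical, not from any library.

```python
def iter_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)  # finish the previous record
            name, chunks = line[1:], []
        else:
            chunks.append(line)  # collect sequence pieces for the current record
    if name is not None:
        yield name, "".join(chunks)  # the last record in the input


def fasta_to_dict(filename):
    """Return {header: sequence} for every record in a FASTA file."""
    with open(filename) as f:
        return dict(iter_fasta(f))


# Usage (assumes the file is in the working directory):
# table = fasta_to_dict("Ecoli.prot.fasta")
# print(len(table), "records")
```

Collecting the pieces in a list and joining once at the end is the usual Python idiom for building long strings, and because iter_fasta accepts any iterable of lines, it works equally on an open file or a plain list.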
Larz60+
Quote: How do you attach or send a file here? I can't seem to find it in here.
I posted only the function that is supposed to do the job. Here is the complete script:
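#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
import pprint


def get_data(f):
    # Split on lines, not on whitespace, so headers with spaces stay whole
    data = f.read().splitlines()
    ecoli = dict()
    e_name = None
    for row in data:
        row = row.strip()
        if not row:
            continue  # skip blank lines
        if row.startswith(">"):
            # Header line: drop the leading '>' and start a new entry
            e_name = row[1:]
            ecoli[e_name] = ""
        elif e_name is not None:
            # Sequence line: append it to the current entry
            ecoli[e_name] += row
    return ecoli


def main():
    # get_data() expects an open file object, not a file name
    with open("ecoli.txt") as in_file:
        pprint.pprint(get_data(in_file))


if __name__ == '__main__':
    sys.exit(main())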
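Why splitting on lines rather than on whitespace matters: FASTA header lines often carry a description with spaces, and str.split() with no arguments breaks such a header into several tokens, so the later words would be treated as sequence. The header below is made up for illustration:

```python
text = ">protA putative kinase\nMKRIS\n"

# split() with no arguments splits on any whitespace, newlines included
print(text.split())       # ['>protA', 'putative', 'kinase', 'MKRIS']

# splitlines() splits only at line boundaries, keeping the header whole
print(text.splitlines())  # ['>protA putative kinase', 'MKRIS']
```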
Try @Larz60+'s solution first. He is a real programmer; I code for fun.