Python Forum
finding items/comparison in/with a dictionary
#11
By delimiter I mean whatever separates the ID from the value in the first file. It looks like the ID is everything to the left of the first space...
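To illustrate, here is a minimal sketch of splitting a line at the first run of whitespace. The sample line is made up, since the real file's contents are not shown in this post.

```python
# Hypothetical annotation line; the real file's layout is an assumption.
line = "prot_001 hypothetical kinase, putative\n"

# split(None, 1) splits only at the first run of whitespace,
# so the ID is everything to the left of the first space.
protein_id, description = line.rstrip("\n").split(None, 1)
```

With whitespace as the delimiter, `protein_id` holds the first token and `description` holds the rest of the line, internal spaces included.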
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

#12
They are attached.

Annotation.txt file is file 1

Protein_file.txt file is file 2 (I had to change the file type and just save a fraction of it, since the .faa extension was not accepted and the full file was too large).
#13
annotations = {}
with open('annotation.txt', 'r') as f:
    for line in f:
        data = line.split()
        annotations[data[0]] = line

with open('protein_file.txt', 'r') as f, open('proteinandannotation.faa', 'w', newline='') as faa:
    for line in f:
        if line.startswith('>'):
            line = line.strip()
            protein_id = line[1:]
            annotation = annotations.get(protein_id, '{}\n'.format(protein_id)) # if id is missing in annotations, fall back to the id itself
            faa.write('>{}'.format(annotation))
        else:
            faa.write(line)
I think the output is what is needed. It's simple, but it works. Have you had a look at Biopython? It may offer some tools for working with faa files...

EDIT: replaced id with protein_id in the code
#14
I just tried it and it worked! I had to take newline='' out of line 7, as Python reported that it was an invalid keyword argument for this function.

Yes, I need to check Biopython. This was my first attempt at using a dictionary, and I was pretty excited that I had actually generated one. Now I'm a bit deflated that I was so off, but I will get over it.

Thanks again!
#15
I didn't notice you are using Python 2. The newline keyword of open() was introduced in Python 3. If you are using Python 2, you need to open the faa file for writing in wb mode to avoid extra newlines on Windows. What OS do you use? By the way, it's better to work with Python 3, because official support for Python 2 will end soon.
With respect to the dict, it was a small mistake, so don't worry.
As for the algorithm, it's more a matter of practice.

Here is the updated code:

# Python 2 version
annotations = {}
with open('annotation.txt', 'r') as f:
    for line in f:
        data = line.split()
        annotations[data[0]] = line  # key: first token (the ID), value: the whole line

# 'wb' keeps Python 2 on Windows from writing extra newlines
with open('protein_file.txt', 'r') as f, open('proteinandannotation.faa', 'wb') as faa:
    for line in f:
        if line.startswith('>'):
            line = line.strip()
            protein_id = line[1:]
            annotation = annotations.get(protein_id, '{}\n'.format(protein_id)) # if id is missing in annotations it will just return the id
            faa.write('>{}'.format(annotation))
        else:
            faa.write(line)
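For comparison, the same approach in Python 3 might look like the sketch below. The function names `load_annotations` and `annotate_fasta` are made up for illustration, and the sketch assumes the ID is the first whitespace-separated token of each annotation line and that each header line is just `>` followed by the ID.

```python
def load_annotations(path):
    """Map each ID (first whitespace-separated token) to its full line."""
    annotations = {}
    with open(path, "r") as f:
        for line in f:
            fields = line.split()
            if fields:  # skip blank lines instead of failing on fields[0]
                annotations[fields[0]] = line.rstrip("\n")
    return annotations


def annotate_fasta(fasta_path, out_path, annotations):
    """Copy the FASTA file, replacing each header with its annotation."""
    # newline="" (Python 3 only) writes "\n" unchanged on every platform.
    with open(fasta_path, "r") as f, open(out_path, "w", newline="") as out:
        for line in f:
            if line.startswith(">"):
                protein_id = line[1:].strip()
                # Fall back to the bare ID when there is no annotation.
                header = annotations.get(protein_id, protein_id)
                out.write(">{}\n".format(header))
            else:
                out.write(line)
```

Stripping the trailing newline when loading and adding it back when writing means annotated and unannotated headers are handled the same way.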
#16
Thanks again!

I've studied your code and it seems like .get is the key here. That makes it so much easier than trying to loop and match, etc.

It works very well!

Thanks!
#17
dict.get(key, default) retrieves the value for key from the dict and returns default if key is not present.
You can also do just
annotation = annotations[protein_id]
but if protein_id is missing from the dict you will get a KeyError.
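A small sketch of the difference, using made-up data (the ID and annotation text here are hypothetical):

```python
annotations = {"prot_001": "prot_001 hypothetical kinase\n"}  # made-up data

# .get returns the stored value, or the default when the key is absent.
found = annotations.get("prot_001", "unknown")
missing = annotations.get("prot_999", "unknown")

# Plain indexing raises KeyError for an absent key.
try:
    annotations["prot_999"]
    raised = False
except KeyError:
    raised = True
```

Here `found` is the stored line, `missing` is the default `"unknown"`, and `raised` ends up `True`.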
#18
Nice, thanks.