Python Forum


Hey guys,

I have a huge spreadsheet that I am attempting to search through for some specific data.

On the one hand I have IDs like this:

Y00988-11
G01024-14
Z01933-13

And on the other hand I have a massive spreadsheet (CSV) in the following format:

Run,Sample,Source,Rate,
DFT,G01024-14,A,High
DFT,U04424-15,B,Low
TFF,T64673-18,A,Low
RRT,I01324-14,A,High
RRT,J01624-14,A,High
...

I'm trying to extract both the 'Sample' ID and the 'Run' for every row whose Sample matches one of my IDs.

I read the CSV spreadsheet into a list of dictionaries using the built-in csv.DictReader, but I'm having trouble extracting the elements I'm interested in.

import csv
import sys

# sequences of interest
dataset=sys.argv[1]

# CSV spreadsheet
database=sys.argv[2]

sampleIDs=[]
with open(dataset, 'r') as file:
	for line in file:
		line.strip('\n')
		sampleIDs.append(line)
file.close()

seq_Dict=[]
finalList=['init']


with open(database, 'rb') as csvfile:
	reader=csv.DictReader(csvfile, delimiter='\t')
	for line in reader:
		seq_Dict.append(line)
csvfile.close()


for element in seq_Dict:
	for key, value in element.items():
		if element['Sample'] in sampleIDs:
			finalList.pop()
			finalList.append(element['Sample']+" "+element['Run'])

for i in finalList:
	print(i)

This script returns only the info for the last ID in my sampleIDs, so it looks like each iteration of the loop is overwriting the previous one.
I did try using deepcopy, but that didn't seem to work.
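A small Python 3 sketch of just the reading step, assuming the file really is comma-separated as in the sample above. Two details in the script may be worth checking: in Python 3 the csv module needs a file opened in text mode (opening with 'rb' yields bytes, and the reader then raises csv.Error), and delimiter='\t' will not split comma-separated rows. io.StringIO stands in for the real file so the example runs on its own.

```python
import csv
import io

# First rows of the sample CSV from the post, header trailing comma included.
CSV_TEXT = """Run,Sample,Source,Rate,
DFT,G01024-14,A,High
DFT,U04424-15,B,Low
TFF,T64673-18,A,Low
"""

# A real file would be opened in text mode in Python 3:
#     with open(database, 'r', newline='') as csvfile: ...
# io.StringIO stands in for that file object here.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))  # ',' is the default delimiter

# Each row is a dict keyed by the header names.
print(rows[0]['Sample'], rows[0]['Run'])  # G01024-14 DFT

# With delimiter='\t' on comma-separated text, each whole line becomes a
# single field, so the rows have no 'Sample' key at all.
tab_rows = list(csv.DictReader(io.StringIO(CSV_TEXT), delimiter='\t'))
print('Sample' in tab_rows[0])  # False
```

If the real file is tab-separated and the post just shows it with commas, keep delimiter='\t'; only the text-mode point applies then.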

It's not overwriting; you are removing the result of the last iteration before you add a new one. The pop method removes the last item of the list. You keep removing an item and adding an item (lines 31 + 32, the finalList.pop() and finalList.append(...) calls), so you end up with one item. Remove line 31 and it should work.

You can also remove line 25 (csvfile.close()). The with statement on line 21 already closes the file when its block ends.
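A quick illustration of both points, with made-up IDs: popping before every append leaves only the last result, while appending alone keeps them all. io.StringIO stands in for a real file to show that with closes it on exit; a file object returned by open() behaves the same way.

```python
import io

ids = ['A-15', 'B-04', 'C-10']  # illustrative IDs

with_pop = ['init']   # placeholder, as in the original script
append_only = []
for sample in ids:
    with_pop.pop()              # removes whatever the previous pass added
    with_pop.append(sample)
    append_only.append(sample)  # keeps every result

print(with_pop)     # ['C-10']
print(append_only)  # ['A-15', 'B-04', 'C-10']

# The with statement closes the file when the block ends,
# so an explicit close() afterwards is redundant.
with io.StringIO("Y00988-11\n") as f:
    first = f.readline()
print(f.closed)  # True
```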

(Sep-28-2018, 12:07 PM) ichabod801 wrote: It's not overwriting, you are removing the last iteration before you add a new one. The pop method removes the last item of the list. You keep removing an item and adding an item (lines 31 + 32), so you end up with one item. Remove line 31 and it should work.

You can also remove line 25. The with statement on line 21 takes care of that.

Unfortunately, removing the pop call doesn't fix it: the script still returns only the last sample ID. The only difference is that the result is now repeated once for every key-value pair in the dictionary, e.g.:
S01933-11 r480
S01933-11 r480
S01933-11 r480
S01933-11 r480
S01933-11 r480
S01933-11 r480
S01933-11 r480
S01933-11 r480
S01933-11 r480

Okay, this bit:

for element in seq_Dict:
    for key, value in element.items():
        if element['Sample'] in sampleIDs:
            finalList.append(element['Sample']+" "+element['Run'])

The second for loop is not necessary. For every key in element, it runs the same check on element and appends the sample and run again, so a single match is appended once per key. If you get rid of the second for loop, each element is checked only once.

I think there may be only one element in the data that matches your filter, and the issue above is why it is repeated. But I can't check that without a (small) sample of the data and of what you are passing as sampleIDs.
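A minimal reproduction of the duplication, using one dict shaped like the rows discussed in this thread: the inner loop never looks at key or value, so it just repeats the same test once per key, and one match turns into three entries.

```python
element = {"Sample": "A-15", "Run": "n47", "quality": "good"}
sampleIDs = ['A-15']

nested = []
for key, value in element.items():      # runs once per key (3 times here)
    if element['Sample'] in sampleIDs:  # same test every time
        nested.append(element['Sample'] + " " + element['Run'])

single = []
if element['Sample'] in sampleIDs:      # one test per dict
    single.append(element['Sample'] + " " + element['Run'])

print(nested)  # ['A-15 n47', 'A-15 n47', 'A-15 n47']
print(single)  # ['A-15 n47']
```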

Hey, apologies for the late reply; I'm not getting email alerts. I'm going to try what you said, but I'd like to explain why I created the second loop: I don't know the syntax for accessing a particular key in a list of dictionaries.
For example, I know this syntax:

arrayofDict[0]['key']

But this only targets the first dictionary in the list and doesn't give me access to the others. I'm trying to cycle through the list of dictionaries and print the value of one particular key.

If seq_Dict is your list of dictionaries, that's what your first for loop does. Each time through the loop, element is the next dict in seq_Dict.

Yes, but my question is: if every dictionary in the list has the same keys (the same structure), can I access and retrieve the value of only the key I'm interested in?

Sure:

for each_dict in a_list:
    print(each_dict[key])

Most people would do this as a list comprehension:

[each_dict[key] for each_dict in a_list]
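Applied to the original task, a sketch with a few rows shaped like csv.DictReader output (values taken from the sample CSV; the set of wanted IDs is hypothetical): the same comprehension can filter on one key and pull two. operator.itemgetter with several keys is one alternative; it returns the values as a tuple.

```python
from operator import itemgetter

# A few rows in the shape csv.DictReader produces (values from the sample CSV).
rows = [
    {'Run': 'DFT', 'Sample': 'G01024-14', 'Source': 'A', 'Rate': 'High'},
    {'Run': 'DFT', 'Sample': 'U04424-15', 'Source': 'B', 'Rate': 'Low'},
    {'Run': 'TFF', 'Sample': 'T64673-18', 'Source': 'A', 'Rate': 'Low'},
]
sampleIDs = {'G01024-14', 'T64673-18'}  # hypothetical IDs of interest

# One key from every dict.
samples = [row['Sample'] for row in rows]
print(samples)  # ['G01024-14', 'U04424-15', 'T64673-18']

# Keep only wanted rows and pull two keys at once.
pairs = [(row['Sample'], row['Run']) for row in rows if row['Sample'] in sampleIDs]
print(pairs)  # [('G01024-14', 'DFT'), ('T64673-18', 'TFF')]

# itemgetter('Sample', 'Run') builds the same tuple for each dict.
get_pair = itemgetter('Sample', 'Run')
pairs_alt = [get_pair(row) for row in rows if row['Sample'] in sampleIDs]
print(pairs_alt == pairs)  # True
```

A set is used for the IDs because membership tests on a set stay fast even when the ID list is large.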

Hmm, let me see...

a_list=[{"Sample" : "A-15", "Run" : "n47", "quality" : "good" }, 
{"Sample" : "B-04", "Run" : "n45", "quality" : "good"}, 
{"Sample" : "C-10", "Run" : "n48", "quality" : "bad"}, 
{"Sample" : "Z-95", "Run" : "n47", "quality" : "good" },]

sampleIDs=['A-15', 'B-04', 'C-10']

for each_dict in a_list:
	if each_dict['Sample'] in sampleIDs:
		print(each_dict['Sample']+" "+each_dict['Run'])

So if I run this, I expect to get:
A-15 n47
B-04 n45
C-10 n48

but instead I get:
C-10 n48

Is this because each iteration overwrites the previous one?
If so, how can I avoid that?
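Run exactly as posted, with the hard-coded list, this loop should print all three pairs (the fixed_ids case below), so the single result probably comes from the real script. One likely cause, assuming the IDs are still read from a file as in the first post: line.strip('\n') returns a new string and the script never assigns it, so every ID except the last (when the file has no trailing newline) still ends in '\n' and fails the in test. The file contents below are invented for the demo, and matches is a hypothetical helper name.

```python
import io

a_list = [{"Sample": "A-15", "Run": "n47", "quality": "good"},
          {"Sample": "B-04", "Run": "n45", "quality": "good"},
          {"Sample": "C-10", "Run": "n48", "quality": "bad"},
          {"Sample": "Z-95", "Run": "n47", "quality": "good"}]

# Stand-in for the IDs file: no newline after the last ID.
ids_file_text = "A-15\nB-04\nC-10"

# Original pattern: strip() returns a new string that is thrown away.
buggy_ids = []
for line in io.StringIO(ids_file_text):
    line.strip('\n')        # result discarded; line still ends in '\n'
    buggy_ids.append(line)

# Fixed: keep the stripped value.
fixed_ids = [line.strip() for line in io.StringIO(ids_file_text)]

def matches(ids):
    """Hypothetical helper: 'Sample Run' for each dict whose Sample is in ids."""
    return [d['Sample'] + " " + d['Run'] for d in a_list if d['Sample'] in ids]

print(buggy_ids)           # ['A-15\n', 'B-04\n', 'C-10']
print(matches(buggy_ids))  # ['C-10 n48']
print(matches(fixed_ids))  # ['A-15 n47', 'B-04 n45', 'C-10 n48']
```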

