Posts: 7
Threads: 4
Joined: Apr 2020
Hi Feel like I am missing something simple here.
I am importing data from a csv file. I want to create a number of lists to use elsewhere.
foe row in database:
if row[0] == "2010" and row[2] == D1:
mylist1.append(row[3]) something like this works for 1 set of variables and I could copy and paste editing the if statement each time but there are too many options for that. Nesting the for loop so the row() looks at and steps through a list is half a solution but I cant work out how to create a new list for each iteration (mylist1, mylist2 etc)
For scale its about 15 unique variables in each part of the if statement and 100000 rows of data total.
Thank you
Posts: 1,583
Threads: 3
Joined: Mar 2020
Jun-11-2020, 08:54 PM
(This post was last modified: Jun-11-2020, 08:54 PM by bowlofred.)
I'm not sure what the concern is? What would be too difficult about repeating this within your loop?
It's certainly possible to do this with comprehensions and no explicit loops, but that may or may not address your concerns.
database = [
["2010", "X", "D1", "info"],
["2012", "X", "D1", "wrong year"],
["2010", "X", "E2", "wrong spec"],
["2010", "Y", "D1", "info2"],
]
mylist = [row[3] for row in database if row[0] == "2010" and row[2] == "D1"]
print(mylist)
mylist2 = [row[3] for row in database if # multiple lines if you have lots of conditionals.
row[0] == "2010" and
row[2] == "D1" and
row[1] == "X"
]
print(mylist2) Output: ['info', 'info2']
['info']
Posts: 7
Threads: 4
Joined: Apr 2020
It will work to repeat the code but I have a lot of variables so going to end up with 15 versions of my list (possibly more). That's a lot of code and a lot of copy and paste feel like there must be a way to do it better. A "for" loop and iterating through the changing variables would do part of it, but would overwrite the list rather than putting it into a new list.
Posts: 1,950
Threads: 8
Joined: Jun 2018
Some questions for clarification:
- are all the csv files similarly structured
- does csv files have header row
My preliminary feeling is that one option could be reading file with csv.DictReader into list and create utility function for filtering needed records and/or their fields.
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy
Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Posts: 7
Threads: 4
Joined: Apr 2020
Jun-12-2020, 03:01 PM
(This post was last modified: Jun-12-2020, 03:01 PM by EvanS1.)
(Jun-12-2020, 11:28 AM)perfringo Wrote: Some questions for clarification:
- are all the csv files similarly structured
- does csv files have header row
My preliminary feeling is that one option could be reading file with csv.DictReader into list and create utility function for filtering needed records and/or their fields.
Yes the CSV has a header row I'm using csv.DictReader to get it I dropped the headers out of the example to try and make it simpler for other people to read (the code is on a machine without internet access so I had to type across). It is one big csv file bowlofred got the database structure fairly close in his reply.the only difference is the final row is a time string. (there are a load more columns but I don't care about those)
database = [
["2010", "X", "D1", "01:15:41"],
["2012", "X", "D1", "00:17:51"],
["2010", "X", "D2", "00:25:41"],
["2010", "Y", "D1", "00:15:21"],
] So each list should be a list of times based on the other fields
##for 2010 and D1 output should be
mylist = ["01:15:41", "00:15:21"]
##for 2010 and D2 output should be
mylist2 = ["00:25:41"]
Posts: 1,950
Threads: 8
Joined: Jun 2018
Jun-12-2020, 05:59 PM
(This post was last modified: Jun-12-2020, 05:59 PM by perfringo.)
I probably don't grasp the whole problem but I tried based on my understanding following:
I made dummy file filein_1.csv with following data:
Output: Year,Month,Day,Amount
2010,January,20,100
2010,January,21,200
2011,January,2,300
I read it into list of dictionaries, created utility function for filtering and then tried several filtering options:
import csv
with open('filein_1.csv') as f:
data = list(csv.DictReader(f))
def filter_row(data, **kwargs):
for row in data:
for key, value in kwargs.items():
try:
if row[key] != value:
break
except KeyError: # if kwargs supplied have key not present in data
break
else:
yield row['Amount']
amounts = list(filter_row(data, Year='2010', Month='January'))
# ['100', '200']
years = [list(filter_row(data, Year=year, Month='January')) for year in ['2010', '2011']]
# [['100', '200'], ['300']]
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy
Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
|