Python Forum
Manipulating data from a CSV - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: Manipulating data from a CSV (/thread-27580.html)



Manipulating data from a CSV - EvanS1 - Jun-11-2020

Hi Feel like I am missing something simple here.

I am importing data from a csv file. I want to create a number of lists to use elsewhere.

foe row in database:
    if row[0] == "2010" and row[2] == D1:
        mylist1.append(row[3])
something like this works for 1 set of variables and I could copy and paste editing the if statement each time but there are too many options for that. Nesting the for loop so the row() looks at and steps through a list is half a solution but I cant work out how to create a new list for each iteration (mylist1, mylist2 etc)

For scale its about 15 unique variables in each part of the if statement and 100000 rows of data total.

Thank you


RE: Manipulating data from a CSV - bowlofred - Jun-11-2020

I'm not sure what the concern is? What would be too difficult about repeating this within your loop?

It's certainly possible to do this with comprehensions and no explicit loops, but that may or may not address your concerns.

database = [
        ["2010", "X", "D1", "info"],
        ["2012", "X", "D1", "wrong year"],
        ["2010", "X", "E2", "wrong spec"],
        ["2010", "Y", "D1", "info2"],
        ]


mylist = [row[3] for row in database if row[0] == "2010" and row[2] == "D1"]
print(mylist)
mylist2 = [row[3] for row in database if  # multiple lines if you have lots of conditionals.
              row[0] == "2010" and
              row[2] == "D1" and
              row[1] == "X"
              ]
print(mylist2)
Output:
['info', 'info2'] ['info']



RE: Manipulating data from a CSV - EvanS1 - Jun-12-2020

It will work to repeat the code but I have a lot of variables so going to end up with 15 versions of my list (possibly more). That's a lot of code and a lot of copy and paste feel like there must be a way to do it better. A "for" loop and iterating through the changing variables would do part of it, but would overwrite the list rather than putting it into a new list.


RE: Manipulating data from a CSV - perfringo - Jun-12-2020

Some questions for clarification:

- are all the csv files similarly structured
- does csv files have header row

My preliminary feeling is that one option could be reading file with csv.DictReader into list and create utility function for filtering needed records and/or their fields.


RE: Manipulating data from a CSV - EvanS1 - Jun-12-2020

(Jun-12-2020, 11:28 AM)perfringo Wrote: Some questions for clarification:

- are all the csv files similarly structured
- does csv files have header row

My preliminary feeling is that one option could be reading file with csv.DictReader into list and create utility function for filtering needed records and/or their fields.

Yes the CSV has a header row I'm using csv.DictReader to get it I dropped the headers out of the example to try and make it simpler for other people to read (the code is on a machine without internet access so I had to type across). It is one big csv file bowlofred got the database structure fairly close in his reply.the only difference is the final row is a time string. (there are a load more columns but I don't care about those)

database = [
        ["2010", "X", "D1", "01:15:41"],
        ["2012", "X", "D1", "00:17:51"],
        ["2010", "X", "D2", "00:25:41"],
        ["2010", "Y", "D1", "00:15:21"],
        ]
So each list should be a list of times based on the other fields
##for 2010 and D1 output should be 
mylist = ["01:15:41", "00:15:21"]
##for 2010 and D2 output should be
mylist2 = ["00:25:41"] 



RE: Manipulating data from a CSV - perfringo - Jun-12-2020

I probably don't grasp the whole problem but I tried based on my understanding following:

I made dummy file filein_1.csv with following data:

Output:
Year,Month,Day,Amount 2010,January,20,100 2010,January,21,200 2011,January,2,300
I read it into list of dictionaries, created utility function for filtering and then tried several filtering options:

import csv


with open('filein_1.csv') as f:
    data = list(csv.DictReader(f))


def filter_row(data, **kwargs):
    for row in data:
        for key, value in kwargs.items():
            try:
                if row[key] != value:
                    break
            except KeyError:       # if kwargs supplied have key not present in data
                break
        else:
            yield row['Amount']


amounts = list(filter_row(data, Year='2010', Month='January'))
# ['100', '200']

years = [list(filter_row(data, Year=year, Month='January')) for year in ['2010', '2011']]
# [['100', '200'], ['300']]