Manipulating data from a CSV

EvanS1 · Jun-11-2020, 08:25 PM

Hi, I feel like I am missing something simple here.

I am importing data from a csv file. I want to create a number of lists to use elsewhere.

mylist1 = []
for row in database:
    # keep column 3 of every row that matches the year and the code
    if row[0] == "2010" and row[2] == "D1":
        mylist1.append(row[3])

Something like this works for one combination of values. I could copy and paste it, editing the if statement each time, but there are too many combinations for that. Nesting the for loop so that the row check steps through a list of values is half a solution, but I can't work out how to create a new list for each iteration (mylist1, mylist2, etc.).

For scale, there are about 15 unique values in each part of the if statement and 100,000 rows of data in total.

Thank you

bowlofred · (This post was last modified: Jun-11-2020, 08:54 PM by bowlofred.)

I'm not sure what the concern is. What would be too difficult about repeating this within your loop?

It's certainly possible to do this with comprehensions and no explicit loops, but that may or may not address your concerns.

database = [
        ["2010", "X", "D1", "info"],
        ["2012", "X", "D1", "wrong year"],
        ["2010", "X", "E2", "wrong spec"],
        ["2010", "Y", "D1", "info2"],
        ]


mylist = [row[3] for row in database if row[0] == "2010" and row[2] == "D1"]
print(mylist)
mylist2 = [row[3] for row in database if  # multiple lines if you have lots of conditionals.
              row[0] == "2010" and
              row[2] == "D1" and
              row[1] == "X"
              ]
print(mylist2)

Output:
['info', 'info2']
['info']

EvanS1 · Jun-12-2020, 08:25 AM

Repeating the code will work, but I have a lot of variables, so I'm going to end up with 15 versions of my list (possibly more). That's a lot of code and a lot of copy and paste, and I feel like there must be a better way. A "for" loop iterating through the changing variables would do part of it, but it would overwrite the list rather than putting each result into a new list.
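One way to avoid both the copy and paste and the overwriting is to store each result in a dictionary keyed by the combination of values, instead of in separately named variables (mylist1, mylist2, ...). A minimal sketch, reusing bowlofred's sample `database`; the lists `years` and `codes` are hypothetical stand-ins for the ~15 values per field.

```python
from itertools import product

# Sample rows in the same layout as bowlofred's example: [year, X/Y, code, info]
database = [
    ["2010", "X", "D1", "info"],
    ["2012", "X", "D1", "wrong year"],
    ["2010", "X", "E2", "wrong spec"],
    ["2010", "Y", "D1", "info2"],
]

# Hypothetical value lists; in practice these would hold ~15 values each.
years = ["2010", "2012"]
codes = ["D1", "E2"]

# A new list is built for every (year, code) combination and stored under
# its own key, so later iterations never overwrite earlier results.
results = {
    (year, code): [row[3] for row in database if row[0] == year and row[2] == code]
    for year, code in product(years, codes)
}

print(results[("2010", "D1")])  # ['info', 'info2']
print(results[("2010", "E2")])  # ['wrong spec']
```

Note that this rescans `database` once per combination; with 15 × 15 combinations and 100,000 rows that is 225 full passes, which may or may not matter in practice.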

perfringo · Jun-12-2020, 11:28 AM

Some questions for clarification:

- Are all the CSV files similarly structured?
- Do the CSV files have a header row?

My preliminary feeling is that one option could be to read the file with csv.DictReader into a list and create a utility function for filtering the needed records and/or their fields.
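A minimal sketch of that suggestion, assuming a header row. The column names `Year`, `Type`, `Code` and `Time` and the helper `select` are hypothetical, and `io.StringIO` stands in for the real file so the example runs on its own; with a real file you would use `open(..., newline='')`, as the csv module documentation recommends.

```python
import csv
import io

# Hypothetical CSV contents; replace StringIO with open("yourfile.csv", newline="").
csv_text = """Year,Type,Code,Time
2010,X,D1,01:15:41
2012,X,D1,00:17:51
2010,X,D2,00:25:41
2010,Y,D1,00:15:21
"""

with io.StringIO(csv_text) as f:
    # Each row becomes a dict mapping header names to string values.
    data = list(csv.DictReader(f))


def select(rows, field, **conditions):
    """Hypothetical helper: return `field` from every row matching all conditions.

    A condition on a column that does not exist never matches, because
    row.get() returns None for a missing key.
    """
    return [
        row[field]
        for row in rows
        if all(row.get(key) == value for key, value in conditions.items())
    ]


print(select(data, "Time", Year="2010", Code="D1"))  # ['01:15:41', '00:15:21']
```

Referring to columns by header name keeps the filter readable even when the file has many more columns than the ones being filtered on.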

EvanS1 · (This post was last modified: Jun-12-2020, 03:01 PM by EvanS1.)

(Quoting perfringo, Jun-12-2020, 11:28 AM.)

Yes, the CSV has a header row, and I'm using csv.DictReader to read it. I dropped the headers from the example to make it simpler for other people to read (the code is on a machine without internet access, so I had to retype it). It is one big CSV file, and bowlofred got the data structure fairly close in his reply. The only difference is that the final column is a time string. (There are many more columns, but I don't care about those.)

database = [
        ["2010", "X", "D1", "01:15:41"],
        ["2012", "X", "D1", "00:17:51"],
        ["2010", "X", "D2", "00:25:41"],
        ["2010", "Y", "D1", "00:15:21"],
        ]

So each list should be a list of times, selected by the values in the other fields.

# For 2010 and D1 the output should be:
mylist = ["01:15:41", "00:15:21"]
# For 2010 and D2 the output should be:
mylist2 = ["00:25:41"]
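Because each row belongs to exactly one (year, code) bucket, the lists can also be built in a single pass over the data with `collections.defaultdict`, rather than rescanning 100,000 rows once per combination. A sketch using EvanS1's sample layout (year at index 0, code at index 2, time at index 3); with csv.DictReader the indices would become column names.

```python
from collections import defaultdict

database = [
    ["2010", "X", "D1", "01:15:41"],
    ["2012", "X", "D1", "00:17:51"],
    ["2010", "X", "D2", "00:25:41"],
    ["2010", "Y", "D1", "00:15:21"],
]

# Maps (year, code) -> list of time strings; a new key starts as an empty list.
times = defaultdict(list)
for row in database:
    year, code, time_str = row[0], row[2], row[3]
    times[(year, code)].append(time_str)

mylist = times[("2010", "D1")]   # ['01:15:41', '00:15:21']
mylist2 = times[("2010", "D2")]  # ['00:25:41']
print(mylist, mylist2)
```

Every combination that actually occurs in the file gets its own key, so there is no need to list the values of each field in advance. Looking up a combination that never occurs returns an empty list (and, since this is a defaultdict, adds that key).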

perfringo · (This post was last modified: Jun-12-2020, 05:59 PM by perfringo.)

I probably don't grasp the whole problem, but based on my understanding I tried the following:

I made a dummy file, filein_1.csv, with the following data:

Year,Month,Day,Amount
2010,January,20,100
2010,January,21,200
2011,January,2,300

I read it into a list of dictionaries, created a utility function for filtering, and then tried several filtering options:

import csv


with open('filein_1.csv') as f:
    data = list(csv.DictReader(f))


def filter_row(data, **kwargs):
    for row in data:
        for key, value in kwargs.items():
            try:
                if row[key] != value:
                    break
            except KeyError:       # if kwargs supplied have key not present in data
                break
        else:
            yield row['Amount']


amounts = list(filter_row(data, Year='2010', Month='January'))
# ['100', '200']

years = [list(filter_row(data, Year=year, Month='January')) for year in ['2010', '2011']]
# [['100', '200'], ['300']]
