Python Forum
Manipulating data from a CSV
#1
Hi, I feel like I am missing something simple here.

I am importing data from a csv file. I want to create a number of lists to use elsewhere.

for row in database:
    if row[0] == "2010" and row[2] == "D1":
        mylist1.append(row[3])
Something like this works for one set of values, and I could copy and paste it, editing the if statement each time, but there are too many combinations for that. Nesting the for loop so that the row comparison steps through a list of values is half a solution, but I can't work out how to create a new list for each iteration (mylist1, mylist2, etc.).

For scale, there are about 15 unique values in each part of the if statement and 100,000 rows of data in total.

Thank you
#2
I'm not sure what the concern is? What would be too difficult about repeating this within your loop?

It's certainly possible to do this with comprehensions and no explicit loops, but that may or may not address your concerns.

database = [
        ["2010", "X", "D1", "info"],
        ["2012", "X", "D1", "wrong year"],
        ["2010", "X", "E2", "wrong spec"],
        ["2010", "Y", "D1", "info2"],
        ]


mylist = [row[3] for row in database if row[0] == "2010" and row[2] == "D1"]
print(mylist)
mylist2 = [row[3] for row in database if  # multiple lines if you have lots of conditionals.
              row[0] == "2010" and
              row[2] == "D1" and
              row[1] == "X"
              ]
print(mylist2)
Output:
['info', 'info2']
['info']
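To avoid writing one comprehension per combination, the same filter can be wrapped in a dict comprehension so that each (year, spec) pair gets its own list. This is only a sketch: the names `years`, `specs`, and `results` are made up for illustration, and the values in them are just the ones that appear in the sample data above.

```python
database = [
    ["2010", "X", "D1", "info"],
    ["2012", "X", "D1", "wrong year"],
    ["2010", "X", "E2", "wrong spec"],
    ["2010", "Y", "D1", "info2"],
]

# Hypothetical lists of the values to filter on; in practice these would
# hold the ~15 values for each column.
years = ["2010", "2012"]
specs = ["D1", "E2"]

# One list per (year, spec) pair, stored under a tuple key instead of
# separate variables such as mylist1, mylist2, ...
results = {
    (year, spec): [row[3] for row in database
                   if row[0] == year and row[2] == spec]
    for year in years
    for spec in specs
}

print(results[("2010", "D1")])
print(results[("2010", "E2")])
```

Note that this rescans the whole database once per pair, which may be slow for many pairs and many rows; it is shown here because it stays closest to the comprehension above.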
#3
Repeating the code will work, but I have a lot of values, so I'm going to end up with 15 versions of my list (possibly more). That's a lot of code and a lot of copy and paste, and I feel like there must be a better way to do it. A "for" loop iterating through the changing values would do part of it, but it would overwrite the list rather than putting the results into a new list.
#4
Some questions for clarification:

- Are all the CSV files similarly structured?
- Do the CSV files have a header row?

My preliminary feeling is that one option could be to read the file with csv.DictReader into a list and create a utility function for filtering the needed records and/or their fields.
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
#5
(Jun-12-2020, 11:28 AM)perfringo Wrote: Some questions for clarification:

- Are all the CSV files similarly structured?
- Do the CSV files have a header row?

My preliminary feeling is that one option could be to read the file with csv.DictReader into a list and create a utility function for filtering the needed records and/or their fields.

Yes, the CSV has a header row, and I'm using csv.DictReader to read it. I dropped the headers from the example to try to make it simpler for other people to read (the code is on a machine without internet access, so I had to type it across). It is one big CSV file. bowlofred got the database structure fairly close in his reply; the only difference is that the final column is a time string. (There are a load more columns, but I don't care about those.)

database = [
        ["2010", "X", "D1", "01:15:41"],
        ["2012", "X", "D1", "00:17:51"],
        ["2010", "X", "D2", "00:25:41"],
        ["2010", "Y", "D1", "00:15:21"],
        ]
So each list should be a list of times based on the other fields
##for 2010 and D1 output should be 
mylist = ["01:15:41", "00:15:21"]
##for 2010 and D2 output should be
mylist2 = ["00:25:41"] 
#6
I probably don't grasp the whole problem, but based on my understanding I tried the following:

I made a dummy file filein_1.csv with the following data:

Output:
Year,Month,Day,Amount
2010,January,20,100
2010,January,21,200
2011,January,2,300
I read it into a list of dictionaries, created a utility function for filtering, and then tried several filtering options:

import csv


with open('filein_1.csv') as f:
    data = list(csv.DictReader(f))


def filter_row(data, **kwargs):
    for row in data:
        for key, value in kwargs.items():
            try:
                if row[key] != value:
                    break
            except KeyError:       # if kwargs supplied have key not present in data
                break
        else:
            yield row['Amount']


amounts = list(filter_row(data, Year='2010', Month='January'))
# ['100', '200']

years = [list(filter_row(data, Year=year, Month='January')) for year in ['2010', '2011']]
# [['100', '200'], ['300']]
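If every combination is needed, another option is to group the rows in a single pass instead of filtering once per combination. The sketch below uses collections.defaultdict and the times from post #5. The header names `Year`, `Group`, `Spec`, and `Time` are assumptions, since the real headers were not posted, and io.StringIO only stands in for the real file.

```python
import csv
import io
from collections import defaultdict

# Stand-in for the real CSV file; the header names are hypothetical.
sample = io.StringIO(
    "Year,Group,Spec,Time\n"
    "2010,X,D1,01:15:41\n"
    "2012,X,D1,00:17:51\n"
    "2010,X,D2,00:25:41\n"
    "2010,Y,D1,00:15:21\n"
)

# Each new (year, spec) key starts with an empty list automatically,
# so no list has to be created or named in advance.
times = defaultdict(list)
for row in csv.DictReader(sample):
    times[(row["Year"], row["Spec"])].append(row["Time"])

print(times[("2010", "D1")])  # ['01:15:41', '00:15:21']
print(times[("2010", "D2")])  # ['00:25:41']
```

With the real file, `sample` would be replaced by the open file object. The data is read only once, however many year and spec values there are.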