Cut .csv to pieces and save as .csv

BruderKellermeister · (This post was last modified: Feb-13-2017, 09:19 AM by snippsat.)

Hi folks!

I am quite new to programming / python - the journey is fun but can be frustrating at times...

I have a .csv file about 300 columns. The first column has the date. The following columns contain my Objects.
I want to create separate .csv files with two columns: 1. Date; 2. Object

I want to tell Python:
Always take column 1 and one object-column and generate a .csv file from those two columns.
Then take column 1 (date) again and move on to the next object-column and generate the next .csv file.
Name the files after their object column.

So far i've got this:

import csv
object = 0

f = open("/home/myComputer/Data/data.csv")

csv_f = csv.reader(f, delimiter=';', quotechar='|')

writer = csv.writer(csv_f)

for row in csv_f:
   writer("[0],[object]")

   object += 1

f.close()

Thank you very much for your help on this! Pray

(If I don't solve the problem I will have to do this by hand with excel - which gives me the creeps)

wavic · Feb-13-2017, 09:29 AM

Just get the index of the first column and a slice for the second from the list generated for each .csv row.

csv_f[0] # write it to a file
csv_f[1:] # here is the object. Dump it to another .csv

**buran** · Feb-13-2017, 09:31 AM

It's not clear if your file has headers and what are the names of the output files.
The general approach would be:
1. Open the input file for read
2. Open all output files to write. Store these file handlers in list
3. Loop over the lines in input file. Write date/column to respective.

as an alternative:

2a. Loop over the lines in input file. For each line open the respective output file for append.
3a. Write date and respective column, then close the file. This would be more inefficient in terms that open/close file each time takes time.

merlem · (This post was last modified: Feb-13-2017, 10:07 AM by merlem.)

(Feb-13-2017, 09:29 AM)wavic Wrote: Just get the index of the first column and a slice for the second from the list generated for each .csv row.

If I understood correctly, he wish to have _one_ file for the _respective_ colum, that is: not a slice, but just this one column, and the next colum in another file?

And if the data amount is not too big, maybe one could do the datasplitting on a list generated from the source file and do the writing later as 'in one go' for each file?

Edit:
Something like this? I'm not sure how efficient it is for big data amounts, but for a small set, it seems to do what it should (except that it makes a needless file with the dates two times).

import csv
 
f = open("testcsv.csv")
csv_f = csv.reader(f, delimiter=',')

firstlinedone = False
listoffiles = []
for line in csv_f:

    # for the first line, make all files
    if firstlinedone == False:
        for columnumber, column in enumerate(line):
            newfilename = "test" + str(columnumber) + ".csv"
            # one can also do more formatting here for the filename
            newfile = open(newfilename, 'wb')
            listoffiles.append(newfile)
            firstlinedone == True
    for columnumber, column in enumerate(line):
        csvwriter = csv.writer(listoffiles[columnumber], delimiter = ',')
        csvwriter.writerow([line[0], line[columnumber]])

for anyfile in listoffiles:
    anyfile.close()
f.close()

wavic · Feb-13-2017, 10:13 AM

He want to write the first column to a file and all the rest to another .csv. If I am not wrong too.

BruderKellermeister · Feb-13-2017, 10:47 AM

Thank you all very much for thinking about my problem! That is very helpful to me!

I'm sorry I didn´t get it clearer:

I have 1 big list with headers.
To make things complicated, there are 2 samples per tree with the same header. Like displayed below - treenumber 578 has two columns.

Name: HappyTrees.csv

"Date" | "578" | "578" | "579" | "579"
------------------------------------------------
2012 | 7,34 | 4,34 | 9,65 | 5,34
------------------------------------------------
2011 | 5,23 | 4,75 | 8,63 | 3,46
------------------------------------------------
2010 | 3,45 | 3,78 | 6,34 | 5,82
-----------------------------------------------
2009 | 5,83 | 6,28 | 8,24 | 5,42

What I need:

1. Name: 578a.csv

"Date" | "578" |
------------------------------------------------
2012 | 7,34 |
------------------------------------------------
2011 | 5,23 |
------------------------------------------------
2010 | 3,45 |
-----------------------------------------------
2009 | 5,83 |

2. Name: 578b.csv

"Date" | "578" |
---------------------
2012 | 4,34 |
----------------------
2011 | 4,75 |
---------------------
2010 | 3,78 |
---------------------
2009 | 6,28 |

3. Name: 579a.csv

"Date" | "579" |
----------------------
2012 | 9,65 |
----------------------
2011 | 8,63 |
-----------------------
2010 | 6,34 |
---------------------
2009 | 8,24 |

4. Name: 579b.csv

5. Name: 580a.csv

6. Name: 580b.csv

........... and so on......... and so on.........

merlem · Feb-13-2017, 11:14 AM

That would make the filenaming a bit more complex, but as far as I can see, it's really just a question of string construction.
You can take "column" instead of "columnumber" and add "a" or "b" before the ".csv". To decide which of "a" or "b", you can check whether the string with "a" is already in the list.
Are you already familiar with this kind of things?

Oh, and there's a failure in the code above Blush

. In line 17, there has to be only one "=", not two.

BruderKellermeister · (This post was last modified: Feb-13-2017, 12:25 PM by BruderKellermeister.)

thanks a lot for your help.

Since I picked up Python (actually: programming) just a few weeks ago, Im only familiar with the very basic concepts.
As for example Datatypes, very simple while / for - loops and so on.
My Material is a book (1100 pages) + internet research (stackoverflow etc.) + youtube.

I really do want to try to improve and also try to invest time in research before asking - but as soon as I start writing a program I have lots of questions and it seems as if research usually only leads to more questions. Blush

Any ideas on how to approach getting better at this?

when trying to run your code from above I get:

csvwriter.writerow([line[0], line[columnumber]])

TypeError: a bytes-like object is required, not 'str'

merlem · Feb-13-2017, 12:54 PM

(Feb-13-2017, 12:07 PM)BruderKellermeister Wrote: when trying to run your code from above I get:

csvwriter.writerow([line[0], line[columnumber]])

TypeError: a bytes-like object is required, not 'str'

That's strange. For me, the code runs without this error. I'm not sure whether that's relevant in this case, but I am using python in the version 2.7. If you have version 3, that can cause incompatibilities.
And I have to confess that I don't understand this message... the expression in the brackets should be a list, not a string. But 'a bytes-like object', I'm not sure what is included in that and what isn't.
Maybe somone else can help there?

According to the code: you may try to replace the old 'firstlinedone'-part with:

        for columnumber, column in enumerate(line):
            newfilename = "test" + str(column) + "a.csv"
            if newfilename in listoffilenames:
                newfilename = "test" + str(column) + "b.csv"
                listoffilenames.append(newfilename)
            else:
                listoffilenames.append(newfilename)
            newfile = open(newfilename, 'wb')
            listoffiles.append(newfile)
            firstlinedone = True

Quote:Any ideas on how to approach getting better at this?

Not really, except maybe: try to get stuff done that you need for something else. It keeps up motivation Wink

. But learning programming remains hard and difficult for a long time. Don't give up too easily!

**buran** · Feb-13-2017, 03:05 PM

it will be more convinient to use DictReader and DictWriter

import csv

with open("/home/myComputer/Data/testcsv.csv") as f:
   csv_f = csv.DictReader(f, delimiter='|')
   filenames = zip(csv_f.fieldnames, 'ab'*(len(csv_f.fieldnames)/2))
   for field, suffix in filenames:
       with open('/home/myComputer/Data/{}{}.csv'.format(field, suffix), 'w') as f_out:
           header = ['Date', field]
           csv_wr = csv.DictWriter(f_out, fieldnames=header, delimiter='|', lineterminator='\n')
           csv_wr.writeheader()
   for line in csv_f:
       for field, suffix in filenames:
           with open('/home/myComputer/Data/{}{}.csv'.format(field, suffix), 'a') as f_out:
               header = ['Date', field]
               csv_wr = csv.DictWriter(f_out, fieldnames=header, delimiter = '|',
                                       extrasaction='ignore', lineterminator='\n')
               csv_wr.writerow(line)

Also, if you like to venture into pandas, the script could be simpler

import pandas as pd

df = pd.read_csv('/home/myComputer/Data/testcsv.csv', sep='|')
for col in df.columns[1:]:
   if col.endswith('.1'):
       fname = '/home/myComputer/Data/{}.csv'.format(col.replace('.1', 'b'))
   else:
       fname = '/home/myComputer/Data/{}a.csv'.format(col)
   df.to_csv(fname, sep='|', columns=['Date', col], index=False,
              header=['Date', col.replace('.1', '')])

Cut .csv to pieces and save as .csv

User Panel Messages

Announcements