Cut .csv to pieces and save as .csv

merlem · Feb-13-2017, 08:15 PM

About the error: It is a question of using version 2 or 3, and other people had the problem, too. There are a couple of suggestions at http://stackoverflow.com/questions/72006...-csv-files .
Simply eliminate the 'b' from opening the newfile:

newfile = open(newfilename, 'w')

Then it should run, I hope.
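
For reference, here is a minimal sketch of the Python 3 pattern. The output path and the sample rows are made up for illustration; the `newline=''` argument is what the csv module documentation recommends for files passed to `csv.writer`.

```python
import csv
import os
import tempfile

# Hypothetical output path, used only for this illustration.
newfilename = os.path.join(tempfile.mkdtemp(), "example_out.csv")

# Python 3: open in text mode ('w', not 'wb'); newline='' lets the csv
# module control the line endings itself.
with open(newfilename, "w", newline="") as newfile:
    writer = csv.writer(newfile, delimiter=",")
    writer.writerow(["YEAR", "518"])
    writer.writerow([1950, 1.23])

# Read the file back to confirm what was written.
with open(newfilename, newline="") as check:
    rows = list(csv.reader(check))
print(rows)
```

In Python 2 the same writer needed a file opened with 'wb', which is why older snippets carry the 'b'.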

BruderKellermeister · (This post was last modified: Feb-15-2017, 04:22 PM by BruderKellermeister.)

merlem, you are my hero: this is actually starting to work!

I still have some minor problems with it but I want to try solving them first.

If that does not work out well I will happily ask for further help.

Thanks a lot!

edit: Yes, i am using Python3

**buran** · Feb-15-2017, 04:34 PM

Did you try the two snippets from my last post?

BruderKellermeister · Feb-16-2017, 10:54 AM

Hi Buran,

I've been trying to get pandas to run, since the stuff I've read about it sounds like it might be very useful for me down the road...

I have pandas installed via anaconda and "import pandas" works (gives no error) via the Linux Terminal.

However, if I try running a .py file via PyCharm, it gives me "ImportError: No module named 'pandas'", so at the moment I'm a bit stuck...

**buran** · Feb-16-2017, 11:03 AM

Check the PyCharm settings: the project must be set up to use the correct interpreter (the one with pandas installed). Linux comes with Python pre-installed, and if you also installed Anaconda, I would assume you have at least 2 (and most probably 3) Python installations.

Also, my other snippet does not use pandas, just the csv module.
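
One way to see which of those installations is actually running a script is to print the interpreter path from inside it. This is only a diagnostic sketch; the variable names are made up, and it reports `None` for pandas if the current interpreter cannot import it.

```python
import sys

# Path of the interpreter that is executing this script.
interpreter_path = sys.executable
print(interpreter_path)
print(sys.version)

# Check whether this particular interpreter can see pandas.
try:
    import pandas
    pandas_location = pandas.__file__
except ImportError:
    pandas_location = None
print("pandas:", pandas_location)
```

Running it once from the terminal and once from PyCharm shows whether both use the same Python.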

BruderKellermeister · (This post was last modified: May-04-2017, 10:45 AM by Larz60+.)

Hi Merlem and all the other folks!

Ok, so this is as far as I am.

Your script is cutting the list to the right pieces. Only the naming of the files is not quite right yet. I sort of need "518a", "518b", "519a", "519b".....

Since I do realize that this might be a bit too challenging for a newbie, I would really like to ask some Questions:

import csv

f = open("/home/herbert/PycharmProjects/2017-02-10 CutList/Kastanie_Jahrringe.csv")
csv_f = csv.reader(f, delimiter=';')

firstlinedone = False <= why this? Probably starting parameter, right?
listoffiles = [] <= make a empty list of files, right?
for line in csv_f: <= cycle through every value of the csv

   # for the first line, make all files <= how to get 1a, 1b, 2a, 2b, 3a, 3b etc.  maybe a List with two objects a and b?
   if firstlinedone == False: <= as long as the first line goes on keep going

       for columnumber, column in enumerate(line): <= now you're losing me: columnumber AND column? And enumerate = count lines?

           newfilename = "Baumprobe" + str(columnumber) + ".csv" <= now im on the road again... I like this stuff
           # one can also do more formatting here for the filename

           newfile = open(newfilename, "w") <= open new files in mode write which means create files

           listoffiles.append(newfile) <= always append the next new file to listoffiles

           firstlinedone = True <= if however, you are done with the first line, then:

   for columnumber, column in enumerate(line): <= still not getting this line

       csvwriter = csv.writer(listoffiles[columnumber], delimiter=',') <= i "sort of" get this, but not in detail: define csvwriter as a method which takes the columns and cuts them by delimiter?

       csvwriter.writerow([line[0], line[columnumber]]) <= why now write again?

for anyfile in listoffiles: <= go through all the objects in listoffiles and close them
   anyfile.close()
f.close() <= close all functions

(Feb-16-2017, 11:03 AM)buran Wrote: Check PyCharm settings - that the project is set up to use the correct interpreter (the one with pandas installed). Linux comes with Python pre-installed and if you installed also Anaconda, I would assume you have at least 2 (and most probably 3) Python installations

Also, my other snippet does not use pandas, just the csv module.

thanks - I will check on that.

Via the PyCharm settings it only shows me Python 2.7 (not selected) and Python 3.5 (selected), but I'm sure I'll make it work if I keep trying!

(Feb-16-2017, 11:03 AM)buran Wrote: Also, my other snippet does not use pandas, just the csv module.

Thank you - I will test the script and give you feedback on how it's working!

PyCharm seems to be working with Anaconda (python 3.6) now.

On your script it gives me:

Error:
Traceback (most recent call last):
  File "/home/herbert/Pycharm_Projekte/2017-02-10 CutList/CutList_Buran.py", line 5, in <module>
    filenames = zip(csv_f.fieldnames, 'ab' * (len(csv_f.fieldnames) / 2))
TypeError: can't multiply sequence by non-int of type 'float'

I'm guessing Python doesn't like the float that results from len/2...?

**buran** · Feb-16-2017, 12:11 PM

Sorry, my code is Python 2, where that would be an int.
Change that to

filenames = zip(csv_f.fieldnames, 'ab' * int(len(csv_f.fieldnames) / 2))

Also you may want to check this
https://docs.continuum.io/anaconda/ide_i...on#pycharm
to setup conda with PyCharm
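
For anyone hitting the same TypeError: in Python 3, `/` always returns a float, while `//` is floor division and returns an int for int operands. A small sketch, with a made-up list standing in for `csv_f.fieldnames`:

```python
# Hypothetical stand-in for csv_f.fieldnames (illustration only).
fieldnames = ["518", "518", "519", "519"]

print(len(fieldnames) / 2)    # true division: a float in Python 3
print(len(fieldnames) // 2)   # floor division: an int

# Repeating a string needs an int, so use // (or wrap / in int()).
suffixes = "ab" * (len(fieldnames) // 2)

# Pair every column name with its 'a' or 'b' suffix.
filenames = [name + suffix for name, suffix in zip(fieldnames, suffixes)]
print(filenames)
```

Using `//` gives the same result as `int(len(...) / 2)` here and avoids the detour through a float.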

merlem · (This post was last modified: Feb-16-2017, 12:25 PM by merlem.)

Well, let's see:

firstlinedone: it's a flag that records whether the files have been created, which happens while the first line is processed. For all later lines, file creation is skipped. As the variable is never set back to False, the state of firstlinedone is fixed after the first pass through the loop. One could leave out this line, check whether the variable exists at all, and create it once the file preparation is done, but I prefer to have it explicit.

The next three lines and the loop in principle are okay.

First, in the firstline block, I use enumerate. For every item it yields a tuple, here a pair of two values. With for columnumber, column in enumerate(line) I tell Python to bind the first entry of this tuple to the name columnumber and the second one to the name column. The first value that enumerate gives is simply the position of the item column in the list named line. So columnumber is equivalent to line.index(column), as long as no value occurs twice in the line.
In this case, enumerate does not count the lines; for one line, it counts the items that are stored in it. And here in particular: those in the first line.
Then, for every columnumber, a file is created.
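
A short sketch of what enumerate yields, using a made-up first row (the values are assumptions, not taken from the real file):

```python
# Hypothetical first row of the csv (illustration only).
line = ["YEAR", "518", "518", "519"]

# enumerate yields (position, item) pairs; the for statement unpacks
# each pair into the two names columnumber and column.
for columnumber, column in enumerate(line):
    print(columnumber, column)

pairs = list(enumerate(line))

# line.index() returns the position of the first match only, so it
# differs from the enumerate position when a value is repeated.
first_match = line.index("518")
print(pairs, first_match)
```

Because the column titles in this thread come in pairs, the position from enumerate is the safer of the two.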

Now, the filename. I had a replacement for the 'firstline' block in this post. There, the name of the file is derived from what is found in the variable column, and if a file ending in "a" is already found in listoffilenames, the name is set to end with "b". (Oh... how embarrassing... just now I see that I should have mentioned: it's necessary to create the list listoffilenames first. It's supposed to be done next to the creation of the listoffiles list.)
It would be more 'pythonic' to use str.format() here, but I find it more difficult to read, so for the beginning I hope that this construction is okay. The same name as you are creating now would result from newfilename = "Baumprobe{0}.csv".format(columnumber) (untested, however; I hope that I made no mistake).
Or, if 'a' and 'b' together with the column title shall be the name of the new file:
newfilename = "Baumprobe{0}a.csv".format(column) or newfilename = "Baumprobe{0}b.csv".format(column).
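
The a/b naming described above could look roughly like this. It is a sketch under assumptions: the header row is made up, and skipping the first (year) column is my guess at what is wanted.

```python
# Hypothetical header row; the first entry is assumed to be the year column.
header = ["YEAR", "518", "518", "519", "519"]

listoffilenames = []
for column in header[1:]:
    name_a = "Baumprobe{0}a.csv".format(column)
    if name_a in listoffilenames:
        # The 'a' file for this title already exists, so this is the 'b' one.
        newfilename = "Baumprobe{0}b.csv".format(column)
    else:
        newfilename = name_a
    listoffilenames.append(newfilename)

print(listoffilenames)
```

Note that format() takes the value directly; wrapping it in a list would put brackets and quotes into the file name.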

The next lines are okay until an instance of csvwriter is created. I am sorry that I can't give an explanation for this; I don't understand this process well enough myself. I just accepted that this is necessary. I hope that someone else can fill this gap.

However, the line csvwriter = csv.writer(listoffiles[columnumber], delimiter=',') does not write anything itself. It just prepares the csvwriter to do something according to the settings given here. And this 'something' in our case is the writing in the next line.
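
To make that visible, here is a small sketch that uses an in-memory buffer in place of a real file (the buffer and the sample values are assumptions for illustration). csv.writer only wraps the file object; nothing is written until writerow is called.

```python
import csv
import io

# In-memory stand-in for one of the open files in listoffiles.
buffer = io.StringIO()

# Creating the writer stores the target and the formatting settings only.
csvwriter = csv.writer(buffer, delimiter=",")
after_creation = buffer.getvalue()

# writerow formats the list as one delimited line and writes it.
csvwriter.writerow(["1950", "1.23"])
after_writerow = buffer.getvalue()

print(repr(after_creation), repr(after_writerow))
```

So the writer is a thin formatting layer: it turns a list of values into one correctly delimited and quoted line of text for the file it was given.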

When the loop over all lines is done, the output files can be closed.
And then it is not the functions that are closed, but f, which is the file opened in the very first line. Until now, it had to stay open for reading.

BruderKellermeister · May-01-2017, 04:49 PM

Ok I just have to add one more time how thankful I am that you guys helped me so well on this. I finally went with the pandas solution and i like it a lot.

Saved me days of copy pasting like a retarded chimp.

I'll be happy to come back to this source for more advice down the road!

So guys: keep up the good work - and thanks again!


BruderKellermeister · (This post was last modified: May-04-2017, 10:13 AM by buran.)

So, this is my code so far, and it's working pretty well (thanks to buran!).

One problem still remains:

My original .csv has Numbers and NA's at the positions without values - but only the numbers get exported to the cut up files...

Unfortunately I need both numbers AND NA's in my cut up files...

Any suggestions?

import pandas as pd

df = pd.read_csv('/home/Cutting Excel/Kastanie_Jahrringe.csv', sep=';')
for col in df.columns[1:]:
   if col.endswith('.1'):
       fname = '/home/Cutting Excel/Cut Pieces/{}.csv'.format(col.replace('.1', 'b'))
   else:
       fname = '/home/Cutting Excel/Cut Pieces/{}a.csv'.format(col)
   df.to_csv(fname, sep='\t', columns=['YEAR', col], index=False,
             header=['YEAR', col.replace('.1', '')])

I think the answer might be here:

http://pandas.pydata.org/pandas-docs/sta...d_csv.html

I've tried adding the argument na_values='NA' but that doesn't seem to work...

ok, now it works.

For the record, and for others who have the same problem:

http://pandas.pydata.org/pandas-docs/sta...o_csv.html

in df.to_csv I added: na_rep='NA'
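
A small reproduction of the effect, with made-up data in place of the real file: read_csv turns 'NA' into missing values by default, to_csv writes missing values as empty strings by default, and na_rep='NA' puts the marker back.

```python
import io

import pandas as pd

# Hypothetical data: two columns share the title "518", and some cells are NA.
raw = "YEAR;518;518\n1950;1.5;NA\n1951;NA;2.5\n"
df = pd.read_csv(io.StringIO(raw), sep=";")

# read_csv renames the duplicate title to "518.1" and parses "NA" as NaN.
print(df.columns.tolist())

# Default: missing values are written as empty strings.
default_out = df.to_csv(sep="\t", columns=["YEAR", "518.1"], index=False)

# With na_rep, missing values are written as the given text.
with_na = df.to_csv(sep="\t", columns=["YEAR", "518.1"], index=False,
                    na_rep="NA")

print(default_out)
print(with_na)
```

This also explains why na_values='NA' in read_csv made no difference: 'NA' is already treated as missing when reading, and the text is lost only on the way out.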
