Posts: 404
Threads: 94
Joined: Dec 2017
#! python3
# removecsvheader.py - Removes the header from all CSV files in the current working directory
import csv, os
os.makedirs('headerRemoved', exist_ok=True)
# loop through every file in the current working directory.
for csvFilename in os.listdir('.'):
    if not csvFilename.endswith('.csv'):
        continue # skip non-csv files
    print('Removing header from ' + csvFilename + '...')
# Read the CSV file in ( skipping first row )
csvRows = []
csvFileObj = open(csvFilename)
readerObj = csv.reader(csvFileObj)
for row in readerObj:
    if readerObj.line_num == 1:
        continue # skip first row
    csvRows.append(row)
csvFileObj.close()
# Write out the CSV file
csvFileObj = open(os.path.join('headerRemoved', csvFilename), 'w', newline='')
csvWriter = csv.writer(csvFileObj)
for row in csvRows:
    csvWriter.writerow(row)
csvFileObj.close()
Error:
Traceback (most recent call last):
  File "C:\Python36\kodovi\removecsvheader.py", line 16, in <module>
    csvFileObj = open(csvFilename)
PermissionError: [Errno 13] Permission denied: '__pycache__'
Does anyone understand this error? Why is permission denied when accessing __pycache__? (I know what __pycache__ is; I found a good explanation on Stack Overflow.)
Posts: 4,780
Threads: 76
Joined: Jan 2018
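I don't understand the issue, but csvFilename should not be __pycache__. It seems to me that lines 14-29 of your script (everything from the "# Read the CSV file in" comment down to the final csvFileObj.close()) should be indented inside the for loop.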
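One plausible explanation, assuming the read/write code really sits outside the loop: it then runs only once, after the loop has finished, with csvFilename still holding the last name returned by os.listdir('.'). If that name is the __pycache__ directory, open() fails, and on Windows opening a directory raises PermissionError. A minimal sketch of the script with the whole body inside the loop is shown here; the function name remove_headers and its two parameters are invented for illustration so that it can be tried on any folder.

```python
import csv
import os


def remove_headers(source_dir='.', target_dir='headerRemoved'):
    """Copy every CSV file in source_dir to target_dir without its first row.

    Hypothetical helper: the name and parameters are not from the original
    script, which works on the current working directory.
    """
    os.makedirs(target_dir, exist_ok=True)
    written = []
    for csv_filename in os.listdir(source_dir):
        if not csv_filename.endswith('.csv'):
            continue  # skip non-CSV entries such as __pycache__

        # Everything below is inside the loop, so it runs once per CSV file.
        source_path = os.path.join(source_dir, csv_filename)
        with open(source_path, newline='') as csv_file:
            rows = list(csv.reader(csv_file))[1:]  # drop the header row

        target_path = os.path.join(target_dir, csv_filename)
        with open(target_path, 'w', newline='') as csv_file:
            csv.writer(csv_file).writerows(rows)
        written.append(target_path)
    return written
```

Calling remove_headers() with no arguments mirrors the original behaviour: it reads from the current directory and writes into headerRemoved.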
That said, I would rather try a shorter way using shutil.copyfileobj:
#! python3
# removecsvheader.py - Removes the header from all CSV files in the current working directory
import csv, os
import shutil
os.makedirs('headerRemoved', exist_ok=True)
# loop through every file in the current working directory.
for csvFilename in os.listdir('.'):
    if not csvFilename.endswith('.csv'):
        continue # skip non-csv files
    print('Removing header from ' + csvFilename + '...')
    targetFilename = os.path.join('headerRemoved', csvFilename)
    with open(csvFilename) as ifo, open(targetFilename, "w") as ofo:
        ifo.readline()
        shutil.copyfileobj(ifo, ofo)
Posts: 12,022
Threads: 484
Joined: Sep 2016
Dec-25-2018, 11:08 AM
(This post was last modified: Dec-25-2018, 11:49 PM by Larz60+.)
Headers are used when you are using csv.DictReader (which can be very handy).
It's easy to simply bypass the first record.
You can use csv.Sniffer to detect the dialect and then pass the result to the reader's dialect argument (csv.Sniffer also has a has_header() method that guesses whether a header row is present):
import csv

with open('filename', 'r') as fp:
    sample = fp.read(1024)
    sdialect = csv.Sniffer().sniff(sample)
    fp.seek(0)
    reader = csv.reader(fp, dialect=sdialect)
    # skip header
    for n, row in enumerate(reader):
        if n == 0:
            continue
        ...
edited, needed clarification
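To make the idea concrete, here is a minimal sketch that sniffs the dialect, falls back to the default excel dialect if sniffing fails, and skips the first row with next(). The helper name rows_without_header and the sample_size parameter are assumptions for illustration. csv.Sniffer works by heuristics, so its guess is worth checking against real data.

```python
import csv


def rows_without_header(path, sample_size=1024):
    """Return all rows of a CSV file except the first one.

    Hypothetical helper, not part of the csv module.
    """
    with open(path, newline='') as fp:
        sample = fp.read(sample_size)
        try:
            dialect = csv.Sniffer().sniff(sample)
        except csv.Error:
            dialect = csv.excel  # sniffing failed; assume plain comma-separated
        fp.seek(0)
        reader = csv.reader(fp, dialect=dialect)
        next(reader, None)  # skip the header; the default copes with an empty file
        return list(reader)
```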
Posts: 404
Threads: 94
Joined: Dec 2017
Dec-26-2018, 12:30 AM
(This post was last modified: Dec-26-2018, 12:30 AM by Truman.)
Gribouillis, with the indentation fixed the script creates the headerRemoved folder, but it's empty. I will now check your solution and Larz's.
And again, I get an empty folder with shutil. And your code doesn't really skip the first line, right?
Posts: 4,780
Threads: 76
Joined: Jan 2018
Quote: indentation opens a folder headerRemoved but it's empty
It can be empty if there is no CSV file in the current directory. Can you print the list os.listdir('.') and see if it contains filenames that end with .csv?
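A quick way to run that check is sketched here; the helper name list_csv_files is made up for the example.

```python
import os


def list_csv_files(directory='.'):
    """Return the sorted names in directory that end with '.csv'."""
    return sorted(name for name in os.listdir(directory) if name.endswith('.csv'))


print(os.getcwd())        # the directory the script is actually running in
print(os.listdir('.'))    # everything in it, including folders such as __pycache__
print(list_csv_files())   # only the names the script would process
```

If the last list is empty, the script has nothing to copy and headerRemoved stays empty.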
Posts: 404
Threads: 94
Joined: Dec 2017
Dec-26-2018, 10:55 PM
(This post was last modified: Dec-26-2018, 10:55 PM by Truman.)
My bad, I used another laptop yesterday, to which I hadn't copied the files. Checking now on this one, which has the files: yes, it did copy them to the folder without the first line.
Now let me check your code. It gives the same result. What I don't understand is: where is the condition that eliminates the first line in targetFilename?
Posts: 7,310
Threads: 123
Joined: Sep 2016
Dec-26-2018, 11:16 PM
(This post was last modified: Dec-26-2018, 11:16 PM by snippsat.)
(Dec-26-2018, 10:55 PM)Truman Wrote: What I don't understand is: where is the condition that eliminates the first line in targetFilename?
It's the ifo.readline() call, just before shutil.copyfileobj().
import io
# Simulate a file
ifo = io.StringIO('''\
header line
1,2,3
4,5,6''')
Test it:
>>> ifo.readline()
'header line\n'
>>> # After this, the ifo file object yields only the remaining lines
>>> print(ifo.read())
1,2,3
4,5,6
I would have used next(ifo), which I like better than ifo.readline().
Posts: 4,780
Threads: 76
Joined: Jan 2018
Dec-26-2018, 11:19 PM
(This post was last modified: Dec-26-2018, 11:19 PM by Gribouillis.)
Truman Wrote: What I don't understand is: where is the condition that eliminates the first line in targetFilename?
The statement ifo.readline() reads the first line of the input file, advancing the file position to just after the first newline. Then copyfileobj() copies the rest of the file.
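This can be checked without touching the disk. The sketch below uses io.StringIO objects as stand-ins for the input and output files (an assumption made for the demonstration; the script itself uses real files):

```python
import io
import shutil

ifo = io.StringIO('header line\n1,2,3\n4,5,6\n')  # stand-in for the input file
ofo = io.StringIO()                                # stand-in for the output file

skipped = ifo.readline()       # consumes 'header line\n' and moves the file position
shutil.copyfileobj(ifo, ofo)   # copies from the current position to the end

print(repr(skipped))           # 'header line\n'
print(repr(ofo.getvalue()))    # '1,2,3\n4,5,6\n'
```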
@snippsat Why do you think next(ifo) is better than ifo.readline()?
Posts: 7,310
Threads: 123
Joined: Sep 2016
(Dec-26-2018, 11:19 PM)Gribouillis Wrote: @snippsat Why do you think next(ifo) is better than ifo.readline()?
Not sure. I did use readline() before to skip the header, but after I started using next() I just think it reads and looks better.
In Python 2 I also sometimes used ifo.next(), but I think Python 3 made it nicer with next(ifo).
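Beyond style, there is one practical difference between the two: on an empty file ifo.readline() returns an empty string, while next(ifo) raises StopIteration unless a default is supplied. A small sketch, using io.StringIO as a stand-in for a file:

```python
import io

empty = io.StringIO('')  # stand-in for an empty file

print(repr(empty.readline()))   # '' : readline() signals end of file with an empty string

try:
    next(empty)
except StopIteration:
    print('next() raised StopIteration')

print(next(empty, None))        # None : the optional default suppresses the exception
```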
Posts: 4,780
Threads: 76
Joined: Jan 2018
Ideally, I would like to avoid this line completely and write copyfileobj(islice(ifo, 1, None), ofo). Unfortunately, it doesn't work because an iterable is not necessarily a file object. So, considering that we are in the context of file objects and not of iterables, I prefer ifo.readline(); it is less distracting :-)
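The reason is that shutil.copyfileobj() calls read() on its source, and an islice object has no such method. For anyone who prefers to stay with iterables, one alternative (not what the script above does) is to hand the slice to the output file's writelines() method, which accepts any iterable of strings. A sketch, again with io.StringIO stand-ins for the real files:

```python
import io
from itertools import islice

ifo = io.StringIO('header line\n1,2,3\n4,5,6\n')  # stand-in for the input file
ofo = io.StringIO()                                # stand-in for the output file

# Iterating over a text file yields its lines; islice(ifo, 1, None) skips the first.
ofo.writelines(islice(ifo, 1, None))

print(repr(ofo.getvalue()))   # '1,2,3\n4,5,6\n'
```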