Remove CSV header

Truman · Dec-25-2018, 12:07 AM

#! python3
# removecsvheader.py - Removes the header from all CSV files in the current working directory

import csv, os

os.makedirs('headerRemoved', exist_ok=True)

# loop through every file in the current working directory.
for csvFilename in os.listdir('.'):
    if not csvFilename.endswith('.csv'):
        continue # skip non-csv files
    print('Removing header from ' + csvFilename + '...')
	
# Read the CSV file in ( skipping first row )
csvRows = []
csvFileObj = open(csvFilename)
readerObj = csv.reader(csvFileObj)
for row in readerObj:
	if readerObj.line_num == 1:
		continue  # skip first row
	csvRows.append(row)
csvFileObj.close()

# Write out the CSV file
csvFileObj = open(os.path.join('headerRemoved', csvFilename), 'w', newline='')
csvWriter = csv.writer(csvFileObj)
for row in csvRows:
	csvWriter.writerow(row)
csvFileObj.close()

Traceback

Traceback (most recent call last):
  File "C:\Python36\kodovi\removecsvheader.py", line 16, in <module>
    csvFileObj = open(csvFilename)
PermissionError: [Errno 13] Permission denied: '__pycache__'

Does anyone understand this error? Why is permission denied to access __pycache__? (I know what __pycache__ is; I found a good explanation on Stack Overflow.)

**Gribouillis** · Dec-25-2018, 07:54 AM

I don't understand the issue, but csvFilename should not be __pycache__. It seems to me that lines 14-29 should be indented in the for loop.
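What seems to happen: lines 14-29 sit after the loop, so they run only once, after the loop has finished, while csvFilename still holds the last name that os.listdir('.') returned. The traceback shows that name was __pycache__ (os.listdir() returns entries in arbitrary order), and on Windows calling open() on a directory fails with PermissionError. A minimal sketch of the script with that block moved into the loop; wrapping it in a function named remove_headers() with src_dir and out_dir parameters is an assumption added for this sketch, so it can be pointed at any folder:

```python
#! python3
# removecsvheader.py - Removes the header from all CSV files in a directory
import csv
import os


def remove_headers(src_dir='.', out_dir='headerRemoved'):
    """Copy every *.csv file in src_dir into out_dir without its first row.

    remove_headers, src_dir and out_dir are illustrative names, not part
    of the original script. Returns the list of files processed.
    """
    os.makedirs(out_dir, exist_ok=True)
    processed = []
    # Loop through every entry in src_dir.
    for csvFilename in os.listdir(src_dir):
        if not csvFilename.endswith('.csv'):
            continue  # skip non-csv entries, including the __pycache__ directory
        print('Removing header from ' + csvFilename + '...')

        # Read the CSV file in (skipping the first row). This block is now
        # inside the loop, so it runs once per CSV file.
        csvRows = []
        with open(os.path.join(src_dir, csvFilename), newline='') as csvFileObj:
            readerObj = csv.reader(csvFileObj)
            for row in readerObj:
                if readerObj.line_num == 1:
                    continue  # skip first row
                csvRows.append(row)

        # Write out the CSV file, also inside the loop.
        with open(os.path.join(out_dir, csvFilename), 'w', newline='') as csvFileObj:
            csvWriter = csv.writer(csvFileObj)
            for row in csvRows:
                csvWriter.writerow(row)
        processed.append(csvFilename)
    return processed
```

Calling remove_headers() from the folder that holds the CSV files does what the original script intended.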

That said, I would rather try a shorter way, using shutil.copyfileobj:

#! python3
# removecsvheader.py - Removes the header from all CSV files in the current working directory
 
import csv, os
import shutil
 
os.makedirs('headerRemoved', exist_ok=True)
 
# loop through every file in the current working directory.
for csvFilename in os.listdir('.'):
    if not csvFilename.endswith('.csv'):
        continue # skip non-csv files
    print('Removing header from ' + csvFilename + '...')
    targetFilename = os.path.join('headerRemoved', csvFilename)
    with open(csvFilename) as ifo, open(targetFilename, "w") as ofo:
        ifo.readline()
        shutil.copyfileobj(ifo, ofo)

**Larz60+** · (This post was last modified: Dec-25-2018, 11:49 PM by Larz60+.)

Headers are used when you are using csv.DictReader (which can be very handy), since it takes its field names from the first row.
Otherwise, it's easy to simply bypass the first record.
You can tell whether a header is present (or not) with csv.Sniffer().has_header(), and csv.Sniffer().sniff() detects the dialect, which you then pass to the reader's dialect argument:

import csv

with open('filename', 'r', newline='') as fp:
    sample = fp.read(1024)
    sdialect = csv.Sniffer().sniff(sample)  # guess delimiter, quoting, etc.
    fp.seek(0)                              # rewind before reading rows
    reader = csv.reader(fp, dialect=sdialect)
    # skip header
    for n, row in enumerate(reader):
        if n == 0:
            continue
        ...  # process row here
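A minimal sketch combining both Sniffer methods, so the first row is only skipped when the sniffer thinks it is a header. The function name read_rows and the 1024-character sample size are assumptions; has_header() is a heuristic (it compares the first row's types and lengths with the rows below it) and can guess wrong, for example on files where every column is text of similar length:

```python
import csv


def read_rows(path, sample_size=1024):
    """Return the data rows of a CSV file, dropping the first row only if
    csv.Sniffer guesses that it is a header (a heuristic, not a guarantee)."""
    with open(path, newline='') as fp:
        sample = fp.read(sample_size)
        sniffer = csv.Sniffer()
        dialect = sniffer.sniff(sample)              # guess delimiter, quoting, ...
        header_present = sniffer.has_header(sample)  # guess whether row 1 is a header
        fp.seek(0)                                   # rewind before real parsing
        reader = csv.reader(fp, dialect=dialect)
        if header_present:
            next(reader, None)                       # skip the header row
        return list(reader)
```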

edited, needed clarification

Truman · (This post was last modified: Dec-26-2018, 12:30 AM by Truman.)

Gribouillis, with the indentation fixed, the script creates the headerRemoved folder, but it's empty. Now I will check your solution and Larz's.
Again, I get an empty folder with shutil. And your code doesn't really skip the first line, right?

**Gribouillis** · Dec-26-2018, 06:31 AM

Quote: indentation opens a folder headerRemoved but it's empty

It can be empty if there is no CSV file in the current directory. Can you print the list os.listdir('.') and see whether it contains filenames that end with .csv?
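A quick way to do that check, assuming the script is run from the same directory: print the working directory (it is not necessarily the folder the script lives in), all its entries, and the subset that the script's endswith('.csv') filter would pick up. The helper name csv_files is hypothetical:

```python
import os


def csv_files(directory='.'):
    """Return the entries of directory that the script would treat as CSV files."""
    return [name for name in os.listdir(directory) if name.endswith('.csv')]


if __name__ == '__main__':
    print('Working directory:', os.getcwd())
    print('All entries:', os.listdir('.'))
    print('CSV files:', csv_files())
```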

Truman · (This post was last modified: Dec-26-2018, 10:55 PM by Truman.)

My bad: yesterday I was using another laptop, where I hadn't copied the files. Checking on this one, which has the files: yes, it did copy those files to the folder without the first line.
Now let me check your code. It gives the same result. What I don't understand: where is the condition that eliminates the first line in targetFilename?

***snippsat*** · (This post was last modified: Dec-26-2018, 11:16 PM by snippsat.)

(Dec-26-2018, 10:55 PM) Truman Wrote: What I don't understand where is a condition that eliminates the first line in targetFilename?

It's line 16, ifo.readline().

import io

# Simulate a file
ifo = io.StringIO('''\
header line
1,2,3
4,5,6''')

Test it:

>>> ifo.readline()
'header line\n'
>>> # After this, the ifo file object contains only the remaining lines
>>> print(ifo.read())
1,2,3
4,5,6

I would have used next(ifo), which I like better than ifo.readline().
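Both calls consume exactly one line, so either works before copyfileobj(). A sketch of the next() variant on in-memory files (io.StringIO stands in for the real files; copy_without_header is a hypothetical name). One behavioural difference worth knowing: on an empty input next(ifo) raises StopIteration, while ifo.readline() just returns '', so the sketch passes a default of None to next():

```python
import io
import shutil


def copy_without_header(ifo, ofo):
    """Drop the first line of ifo and copy the remainder into ofo."""
    next(ifo, None)               # skip the header; returns None instead of raising on empty input
    shutil.copyfileobj(ifo, ofo)  # copy everything after the header


src, dst = io.StringIO('header line\n1,2,3\n4,5,6'), io.StringIO()
copy_without_header(src, dst)
print(dst.getvalue())

empty_dst = io.StringIO()
copy_without_header(io.StringIO(''), empty_dst)  # no exception for an empty file
print(repr(empty_dst.getvalue()))
```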

**Gribouillis** · (This post was last modified: Dec-26-2018, 11:19 PM by Gribouillis.)

Truman Wrote: What I don't understand where is a condition that eliminates the first line in targetFilename?

The statement ifo.readline() reads the first line of the input file, advancing the file position past the first newline. Then copyfileobj() copies the rest of the file, starting from that position.
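The mechanism can be seen on in-memory files (io.StringIO stands in for the real input and output files, an assumption of this sketch): tell() shows that after readline() the position sits just past the first newline, and copyfileobj() only sees what follows it:

```python
import io
import shutil

# In-memory stand-ins for the real input (ifo) and output (ofo) files.
ifo = io.StringIO('header line\n1,2,3\n4,5,6\n')
ofo = io.StringIO()

ifo.readline()                # reads and discards 'header line\n'
pos = ifo.tell()              # position is now just past the first newline
shutil.copyfileobj(ifo, ofo)  # copies from the current position to the end
print(pos)
print(repr(ofo.getvalue()))
```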

@snippsat Why do you think next(ifo) is better than ifo.readline()?

***snippsat*** · Dec-26-2018, 11:28 PM

(Dec-26-2018, 11:19 PM) Gribouillis Wrote: @snippsat Why do you think next(ifo) is better than ifo.readline() ?

Not sure. I did use readline() before to skip the header, but since I started using next() I just think it reads and looks better.
In Python 2 I also sometimes did ifo.next(), but I think Python 3 made it nicer with next(ifo).

**Gribouillis** · Dec-26-2018, 11:39 PM

Ideally, I would like to avoid this line completely and write copyfileobj(islice(ifo, 1, None), ofo). Unfortunately it doesn't work, because copyfileobj() needs a file object with a read() method, and the iterator returned by islice() is not a file object. So, considering that we are in the context of file objects and not in the context of iterables, I prefer ifo.readline(); it is less distracting :-)
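If you do want the islice() form, the iterable has to go to something that accepts an iterable rather than to copyfileobj(). A sketch of that alternative (a swap of technique, not Gribouillis's code): a text file's writelines() method takes any iterable of strings, so islice(ifo, 1, None), which yields every line except the first, can be passed to it directly. io.StringIO again stands in for the real files:

```python
import io
from itertools import islice

# In-memory stand-ins for the real input and output files.
ifo = io.StringIO('header line\n1,2,3\n4,5,6\n')
ofo = io.StringIO()

# islice(ifo, 1, None) iterates over the lines of ifo, skipping the first one;
# writelines() consumes any iterable of strings, unlike shutil.copyfileobj(),
# which needs an object with a read() method.
ofo.writelines(islice(ifo, 1, None))
print(ofo.getvalue())
```

The trade-off is that this copies line by line, whereas copyfileobj() copies in fixed-size blocks.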
