Python Forum

I work with large CSV files that are often too large to open in Excel. I'm trying to write a program that extracts a dozen or so columns from the 1460 columns in the file. I know the header names I need, but I cannot get my parsing program to work.

import csv

datafile = open("C:\\Users\\Administrator\\Desktop\\test.csv",'r')
reader = csv.reader(datafile,delimiter=',')
  
outfile= open("C:\\Users\\Administrator\\Desktop\\csv.csv",'w')
writer = csv.writer(outfile, delimiter=',')

header=[]
parameter=['Rec #','AirSpd_ADC-1','AirSpd_ADC-2','AirTemp-1','AirTemp-2']

i=0
for row in reader: # get index of header names for iterating
    if i == 8:
        for name in row:
            if name in parameter:
                header.append(row.index(name))
    i+=1
    
datafile.seek(0)  # reset to use csv reader again

for row in reader:
    for col in row:
        for indx in header:
            outfile.write(row[indx])

Error:
>>IndexError: string index out of range

You need to format your post with the code tags to make it readable: https://python-forum.io/misc.php?action=help&hid=25

Also, does your IndexError provide a stack trace that says which line is blowing up?

Without proper code formatting, the most I can see is that you are using the reader iterable in two for loops. After the first loop, it is already exhausted.
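To illustrate the exhaustion point, here is a minimal sketch. It uses an in-memory io.StringIO with made-up sample data as a stand-in for a real file, so the names and values are assumptions for illustration only. It shows that a second pass over the same reader yields nothing unless the underlying file object is rewound first.

```python
import csv
import io

# In-memory stand-in for a CSV file (hypothetical sample data).
sample = io.StringIO("a,b,c\n1,2,3\n4,5,6\n")
reader = csv.reader(sample)

first_pass = list(reader)    # consumes every row
second_pass = list(reader)   # empty: the underlying file is at end-of-file

sample.seek(0)               # rewind the underlying file object
third_pass = list(reader)    # the same reader yields the rows again

print(len(first_pass), len(second_pass), len(third_pass))
```

Rewinding works because csv.reader simply pulls lines from the file object each time it is asked for a row. Reading the file once and doing all the work in that single pass avoids the issue entirely.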

My apologies for the formatting issue, guys. This is my first post. Mpd, the traceback points to this line: outfile.write(row[indx]). Wavic, you are correct that the csv reader is exhausted in the first for loop, but after hours of searching online I found a thread about the .seek(0) method. As best I can tell, it resets the file to the top so that the reader can be iterated over again. And it works.

After drawing a diagram of the code last night, I am certain that there are too many "for" loops in the last for statement.
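For reference, here is a rough sketch of what the index-based approach could look like with a single loop per row. The function name extract_columns and the header_row parameter are made up for illustration; the value 8 comes from the "if i == 8" check in the code above. It assumes every row from the header onward has at least as many columns as the header. A shorter row would still raise IndexError, and a name missing from the header would raise ValueError.

```python
import csv
import io

def extract_columns(in_f, out_f, wanted, header_row=8):
    """Copy only the `wanted` columns, treating row index `header_row` as the header."""
    reader = csv.reader(in_f)
    writer = csv.writer(out_f)
    indexes = None
    for i, row in enumerate(reader):
        if i < header_row:
            continue  # skip the rows above the header
        if i == header_row:
            # Look up the column position of each wanted header name
            indexes = [row.index(name) for name in wanted]
        # One writerow call per row: no inner loop over individual cells
        writer.writerow([row[k] for k in indexes])

# Usage with in-memory data (hypothetical sample values)
src = io.StringIO("meta\n" * 8 + "Rec #,Other,AirTemp-1\n1,x,20\n2,y,21\n")
dst = io.StringIO()
extract_columns(src, dst, ['Rec #', 'AirTemp-1'])
print(dst.getvalue())
```

Using writer.writerow with a list of selected cells also puts the delimiters and line endings back, which raw outfile.write calls on each cell do not.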

You are over-complicating something as simple as this:

import csv

infile = 'infile.csv' # file with a large number of columns; dummy field names field1, field2, ..., field1460
outfile = 'outfile.csv' # you need just 3 of the 1460 columns: field2, field10, field25, in that order

fieldnames = ['field2', 'field10', 'field25']

with open(infile, 'r') as in_f, open(outfile, 'w', newline='') as out_f:
    rdr = csv.DictReader(in_f)
    wrtr = csv.DictWriter(out_f, fieldnames=fieldnames, extrasaction='ignore')
    wrtr.writeheader()
    for row in rdr:
        wrtr.writerow(row)
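One caveat: csv.DictReader takes the first line it reads as the header, while the original code looks for the header at row index 8. A possible adaptation, sketched below, discards the leading lines before handing the file to DictReader. The function name extract_named_columns and the skip_lines parameter are hypothetical, and the sketch assumes each row above the header occupies exactly one physical line.

```python
import csv
import io
from itertools import islice

def extract_named_columns(in_f, out_f, fieldnames, skip_lines=8):
    """Write only `fieldnames` from in_f to out_f, skipping lines above the header."""
    # Consume the lines above the header so DictReader sees the header first
    for _ in islice(in_f, skip_lines):
        pass
    rdr = csv.DictReader(in_f)
    # extrasaction='ignore' drops every column not listed in fieldnames
    wrtr = csv.DictWriter(out_f, fieldnames=fieldnames, extrasaction='ignore')
    wrtr.writeheader()
    for row in rdr:
        wrtr.writerow(row)

# Usage with in-memory data (hypothetical sample values)
src = io.StringIO("meta\n" * 8 + "Rec #,AirSpd_ADC-1,Other\n1,250,x\n2,251,y\n")
dst = io.StringIO()
extract_named_columns(src, dst, ['Rec #', 'AirSpd_ADC-1'])
print(dst.getvalue())
```

With real files, the same function can be called inside the with block shown above, passing in_f and out_f in place of the StringIO objects.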

mathisp64

mpd

wavic

mathisp64

buran