Feb-11-2020, 09:15 PM
I have a lot of CSV files with data and about 30 possible known headers, Header1 to Header30. The number of rows can be practically anything, from 1 to thousands. Not all the headers are present in each file. The separator is a semicolon.
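Because each file only has some of the 30 headers, it can help to see what the standard `csv` module gives you. A minimal sketch, assuming made-up sample data: `csv.DictReader` with `delimiter=";"` takes the first line as the header and exposes it as `fieldnames`. Columns that a file does not have are simply absent from each row dict, so lookups should use `row.get(name, "")` rather than `row[name]`.

```python
import csv
import io

# Made-up sample in the same shape as the files described above.
sample = "Header1;Header3;Header8\na;b;c\nd;;f\n"

reader = csv.DictReader(io.StringIO(sample), delimiter=";")
# fieldnames holds only the headers this particular file contains.
present = reader.fieldnames
rows = list(reader)

print(present)  # ['Header1', 'Header3', 'Header8']
# Header2 is not in this file, so .get() with a default avoids a KeyError.
print(repr(rows[0].get("Header2", "")))  # ''
# An empty field between two semicolons comes back as an empty string.
print(repr(rows[1]["Header3"]))  # ''
```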
An example header line is: Header1;Header3;Header8;Header10;Header11;Header17;Header18
Some of the headers have to be combined into one column with a new header name.
The columns Header1 to Header3, whichever of them exist, should always be combined into column NewHeaderA, separated by spaces.
The column with Header4 becomes column NewHeaderB
The column with Header5 becomes column NewHeaderC
The columns Header6 to Header14, whichever of them exist, should always be combined into column NewHeaderD, separated by spaces.
And so on, with more combinations for the rest of the headers.
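One way to express these rules is a dict from each new header to the list of old headers it is built from, plus a function that turns one input row into one output row. A minimal sketch, assuming that missing columns and empty cells are skipped (so there are no doubled spaces) and that every new column is always written, even if it ends up empty. `headers`, `MAPPING` and `combine_row` are hypothetical names, and only the four rules spelled out above are filled in; the "and so on" rules would be added to the dict.

```python
def headers(first, last):
    """Return ["HeaderN", ...] for N from first to last inclusive."""
    return [f"Header{i}" for i in range(first, last + 1)]


# New header -> old headers, in the order their values should be joined.
MAPPING = {
    "NewHeaderA": headers(1, 3),
    "NewHeaderB": ["Header4"],
    "NewHeaderC": ["Header5"],
    "NewHeaderD": headers(6, 14),
    # ... the remaining combinations would go here
}


def combine_row(row, mapping=MAPPING):
    """Build one output row from one input row (a dict as from csv.DictReader)."""
    out = {}
    for new_name, old_names in mapping.items():
        # Skip headers missing from this file (None) and empty cells ("").
        parts = [row[name] for name in old_names if row.get(name)]
        out[new_name] = " ".join(parts)
    return out


example = {"Header1": "John", "Header3": "Smith", "Header8": "x", "Header10": "y"}
print(combine_row(example))
# {'NewHeaderA': 'John Smith', 'NewHeaderB': '', 'NewHeaderC': '', 'NewHeaderD': 'x y'}
```

Keeping the rules in one dict means the output header order is simply the order of the dict's keys, which is preserved in Python 3.7.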
So far I've made a script that reads a CSV file and writes out another file with either all the same headers or a subset of the existing headers in a different order. After that I'm stuck. Some help would be appreciated.
Due to some system limitations, pandas and numpy are not installed; just plain Python 3.7.6.
```python
import csv
import sys

inFile = sys.argv[1]
outFile = sys.argv[2]

# open input and output files; newline="" is what the csv module expects
with open(inFile, "r", newline="") as csvfile, open(outFile, "w", newline="") as outfile:
    reader = csv.DictReader(csvfile, delimiter=';')
    # fieldnames = reader.fieldnames  --> uses all columns
    # uses the specific columns in the order provided; "test" becomes an empty column
    fieldnames = ["Header1", "Header4", "Header5", "Header8", "Header19", "test"]
    # extrasaction='ignore' drops input columns that are not listed in fieldnames
    writer = csv.DictWriter(outfile, delimiter=';', fieldnames=fieldnames, extrasaction='ignore')
    # write the header line
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
```
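The script above writes each `row` unchanged, which is why it can only select and reorder columns. A possible next step is to build a new dict per row and give `DictWriter` the new header names instead. A minimal sketch, assuming the same `in.csv out.csv` command-line usage; `MAPPING` and `convert` are hypothetical names, only the rules spelled out in the question are included, and empty or missing values are skipped when joining.

```python
import csv
import sys

# Hypothetical mapping: new header -> old headers joined with spaces.
MAPPING = {
    "NewHeaderA": [f"Header{i}" for i in range(1, 4)],
    "NewHeaderB": ["Header4"],
    "NewHeaderC": ["Header5"],
    "NewHeaderD": [f"Header{i}" for i in range(6, 15)],
}


def convert(in_path, out_path, mapping=MAPPING):
    """Read a ';'-separated file and write it with the combined new headers."""
    with open(in_path, "r", newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src, delimiter=";")
        writer = csv.DictWriter(dst, delimiter=";", fieldnames=list(mapping))
        writer.writeheader()
        for row in reader:
            # Join the non-empty values of the old columns this file has.
            writer.writerow({
                new: " ".join(row[old] for old in olds if row.get(old))
                for new, olds in mapping.items()
            })


# Only run when called as: python script.py in.csv out.csv
if __name__ == "__main__" and len(sys.argv) == 3:
    convert(sys.argv[1], sys.argv[2])
```

Since the conversion is a function, handling many files is just a loop that calls `convert` once per input/output pair.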