Merged multiple csv to one

SamLiu · Sep-17-2022, 02:19 PM

Hi all,
I try to merge multiple csv to one, but header with all CSVs,how to config only write once,other thing is how to ignore content of snapshot,thank in advance!!

import pandas as pd
import glob
import os

path="C:\\Users\\Sam\\pythonProject\\mainpulated_file\\"
all_files=glob.glob(os.path.join(path,"*Tester*.csv"))

all_csv=(pd.read_csv(f,sep=',') for f in all_files)
df_merged=pd.concat(all_csv,ignore_index=True)

df_merged.to_csv('merged.csv')

**deanhystad** · (This post was last modified: Sep-17-2022, 10:42 PM by deanhystad.)

The header does not have a row index. The thing that looks like a header is actually a row. That's why it shows up again and again. This wouldn't happen if it was a header.

Your files don't look like CSV files. The first 4 lines are very different from the remaining lines. Line1 looks like it could be column headers, but it contains duplicates. I have no idea what to make of lines 0, 2 and 3. Straightening out the header is the first thing you need to do.

I decided to let pandas try to sort out the header, and then throw away any garbage that immediately followed.

import pandas as pd
import glob

dfs = []
for path in glob.glob("Tester*.csv"):
    with open(path, "r") as file:
        file.readline()  # Throw away line 0
        df = pd.read_csv(file)[2:]  # Throw away lines 2 and 3
        dfs.append(df)

    all_csv = pd.concat(dfs, ignore_index=True)
 
print(all_csv)

Output:   TesterID  Batch Number  Site Number  Retest  ... TT_Settling_LM DUT_MVT_NM1_corr_PAT_Lo  DUT_MVT_NM1_corr_PAT_Hi  DUT_MVT_LM1_corr_PAT_Lo
0   Tester1           1.0          0.0     0.0  ...       0.142555              -39.195243               -37.393705               -39.187843
1   Tester1           1.0          1.0     0.0  ...       0.143227              -39.219153               -37.379417               -39.222282
2   Tester1           1.0          2.0     0.0  ...       0.141303              -39.219336               -37.405629               -39.211728
3   Tester1           1.0          3.0     0.0  ...       0.140354              -39.196658               -37.366870               -39.175577
4   Tester1           1.0          4.0     0.0  ...       0.157857              -39.188032               -37.394963               -39.177588
5   Tester1           1.0          5.0     0.0  ...       0.172997              -39.211672               -37.381087               -39.202590
6   Tester1           1.0          6.0     0.0  ...       0.156858              -39.202400               -37.394763               -39.190315
7   Tester1           1.0          7.0     0.0  ...       0.145466                     NaN                      NaN                      NaN 
8   Tester1           1.0          8.0     0.0  ...       0.155768              -39.253510               -37.334465               -39.225693 
9   Tester1           1.0          9.0     0.0  ...       0.168768              -39.215654               -37.372970               -39.222424 
10  Tester1           1.0         10.0     0.0  ...       0.157497              -39.242851               -37.356496               -39.240183 
11  Tester1           1.0         11.0     0.0  ...       0.156704              -39.211828               -37.380295               -39.190747 
12  Tester1           1.0         13.0     0.0  ...       0.157244              -39.172185               -37.421204               -39.156919 
13  Tester1           1.0         14.0     0.0  ...       0.170749              -39.188823               -37.369761               -39.177097 
14  Tester1           1.0         15.0     0.0  ...       0.142479              -39.238777               -37.359947               -39.222324 
15  Tester2           1.0          0.0     0.0  ...       0.142555              -39.195243               -37.393705               -39.187843 
16  Tester2           1.0          1.0     0.0  ...       0.143227              -39.219153               -37.379417               -39.222282 
17  Tester2           1.0          2.0     0.0  ...       0.141303              -39.219336               -37.405629               -39.211728 
18  Tester2           1.0          3.0     0.0  ...       0.140354              -39.196658               -37.366870               -39.175577 
19  Tester2           1.0          4.0     0.0  ...       0.157857              -39.188032               -37.394963               -39.177588 
20  Tester2           1.0          5.0     0.0  ...       0.172997              -39.211672               -37.381087               -39.202590 
21  Tester2           1.0          6.0     0.0  ...       0.156858              -39.202400               -37.394763               -39.190315 
22  Tester2           1.0          7.0     0.0  ...       0.145466                     NaN                      NaN                      NaN 
23  Tester2           1.0          8.0     0.0  ...       0.155768              -39.253510               -37.334465               -39.225693 
24  Tester2           1.0          9.0     0.0  ...       0.168768              -39.215654               -37.372970               -39.222424 
25  Tester2           1.0         10.0     0.0  ...       0.157497              -39.242851               -37.356496               -39.240183 
26  Tester2           1.0         11.0     0.0  ...       0.156704              -39.211828               -37.380295               -39.190747 
27  Tester2           1.0         13.0     0.0  ...       0.157244              -39.172185               -37.421204               -39.156919 
28  Tester2           1.0         14.0     0.0  ...       0.170749              -39.188823               -37.369761               -39.177097 
29  Tester2           1.0         15.0     0.0  ...       0.142479              -39.238777               -37.359947               -39.222324 
30  Tester2           1.0          0.0     0.0  ...       0.142555              -39.195243               -37.393705               -39.187843 
31  Tester2           1.0          1.0     0.0  ...       0.143227              -39.219153               -37.379417               -39.222282 
32  Tester2           1.0          2.0     0.0  ...       0.141303              -39.219336               -37.405629               -39.211728 
33  Tester2           1.0          3.0     0.0  ...       0.140354              -39.196658               -37.366870               -39.175577 
34  Tester2           1.0          4.0     0.0  ...       0.157857              -39.188032               -37.394963               -39.177588 
35  Tester2           1.0          5.0     0.0  ...       0.172997              -39.211672               -37.381087               -39.202590 
36  Tester2           1.0          6.0     0.0  ...       0.156858              -39.202400               -37.394763               -39.190315 
37  Tester2           1.0          7.0     0.0  ...       0.145466                     NaN                      NaN                      NaN 
38  Tester2           1.0          8.0     0.0  ...       0.155768              -39.253510               -37.334465               -39.225693 
39  Tester2           1.0          9.0     0.0  ...       0.168768              -39.215654               -37.372970               -39.222424 
40  Tester2           1.0         10.0     0.0  ...       0.157497              -39.242851               -37.356496               -39.240183 
41  Tester2           1.0         11.0     0.0  ...       0.156704              -39.211828               -37.380295               -39.190747 
42  Tester2           1.0         13.0     0.0  ...       0.157244              -39.172185               -37.421204               -39.156919 
43  Tester2           1.0         14.0     0.0  ...       0.170749              -39.188823               -37.369761               -39.177097 
44  Tester2           1.0         15.0     0.0  ...       0.142479              -39.238777               -37.359947               -39.222324 

[45 rows x 308 columns]

Pedroski55 · Sep-17-2022, 11:10 PM

@deanhystad

The csv looks very similar to the ones in a thread just a little while back.

Can't remember what line 1 was now, but lines 2 and 3, when present, were upper and lower limits. If a test exceeded the limits, it got FAILED

In that thread the job was find all the FAILED rows and write to csv.

SamLiu · Sep-17-2022, 11:40 PM

(Sep-17-2022, 10:41 PM)deanhystad Wrote: The header does not have a row index. The thing that looks like a header is actually a row. That's why it shows up again and again. This wouldn't happen if it was a header.

Your files don't look like CSV files. The first 4 lines are very different from the remaining lines. Line1 looks like it could be column headers, but it contains duplicates. I have no idea what to make of lines 0, 2 and 3. Straightening out the header is the first thing you need to do.

I decided to let pandas try to sort out the header, and then throw away any garbage that immediately followed.

import pandas as pd
import glob

dfs = []
for path in glob.glob("Tester*.csv"):
    with open(path, "r") as file:
        file.readline()  # Throw away line 0
        df = pd.read_csv(file)[2:]  # Throw away lines 2 and 3
        dfs.append(df)

    all_csv = pd.concat(dfs, ignore_index=True)
 
print(all_csv)

Output:   TesterID  Batch Number  Site Number  Retest  ... TT_Settling_LM DUT_MVT_NM1_corr_PAT_Lo  DUT_MVT_NM1_corr_PAT_Hi  DUT_MVT_LM1_corr_PAT_Lo
0   Tester1           1.0          0.0     0.0  ...       0.142555              -39.195243               -37.393705               -39.187843
1   Tester1           1.0          1.0     0.0  ...       0.143227              -39.219153               -37.379417               -39.222282
2   Tester1           1.0          2.0     0.0  ...       0.141303              -39.219336               -37.405629               -39.211728
3   Tester1           1.0          3.0     0.0  ...       0.140354              -39.196658               -37.366870               -39.175577
4   Tester1           1.0          4.0     0.0  ...       0.157857              -39.188032               -37.394963               -39.177588
5   Tester1           1.0          5.0     0.0  ...       0.172997              -39.211672               -37.381087               -39.202590
6   Tester1           1.0          6.0     0.0  ...       0.156858              -39.202400               -37.394763               -39.190315
7   Tester1           1.0          7.0     0.0  ...       0.145466                     NaN                      NaN                      NaN 
8   Tester1           1.0          8.0     0.0  ...       0.155768              -39.253510               -37.334465               -39.225693 
9   Tester1           1.0          9.0     0.0  ...       0.168768              -39.215654               -37.372970               -39.222424 
10  Tester1           1.0         10.0     0.0  ...       0.157497              -39.242851               -37.356496               -39.240183 
11  Tester1           1.0         11.0     0.0  ...       0.156704              -39.211828               -37.380295               -39.190747 
12  Tester1           1.0         13.0     0.0  ...       0.157244              -39.172185               -37.421204               -39.156919 
13  Tester1           1.0         14.0     0.0  ...       0.170749              -39.188823               -37.369761               -39.177097 
14  Tester1           1.0         15.0     0.0  ...       0.142479              -39.238777               -37.359947               -39.222324 
15  Tester2           1.0          0.0     0.0  ...       0.142555              -39.195243               -37.393705               -39.187843 
16  Tester2           1.0          1.0     0.0  ...       0.143227              -39.219153               -37.379417               -39.222282 
17  Tester2           1.0          2.0     0.0  ...       0.141303              -39.219336               -37.405629               -39.211728 
18  Tester2           1.0          3.0     0.0  ...       0.140354              -39.196658               -37.366870               -39.175577 
19  Tester2           1.0          4.0     0.0  ...       0.157857              -39.188032               -37.394963               -39.177588 
20  Tester2           1.0          5.0     0.0  ...       0.172997              -39.211672               -37.381087               -39.202590 
21  Tester2           1.0          6.0     0.0  ...       0.156858              -39.202400               -37.394763               -39.190315 
22  Tester2           1.0          7.0     0.0  ...       0.145466                     NaN                      NaN                      NaN 
23  Tester2           1.0          8.0     0.0  ...       0.155768              -39.253510               -37.334465               -39.225693 
24  Tester2           1.0          9.0     0.0  ...       0.168768              -39.215654               -37.372970               -39.222424 
25  Tester2           1.0         10.0     0.0  ...       0.157497              -39.242851               -37.356496               -39.240183 
26  Tester2           1.0         11.0     0.0  ...       0.156704              -39.211828               -37.380295               -39.190747 
27  Tester2           1.0         13.0     0.0  ...       0.157244              -39.172185               -37.421204               -39.156919 
28  Tester2           1.0         14.0     0.0  ...       0.170749              -39.188823               -37.369761               -39.177097 
29  Tester2           1.0         15.0     0.0  ...       0.142479              -39.238777               -37.359947               -39.222324 
30  Tester2           1.0          0.0     0.0  ...       0.142555              -39.195243               -37.393705               -39.187843 
31  Tester2           1.0          1.0     0.0  ...       0.143227              -39.219153               -37.379417               -39.222282 
32  Tester2           1.0          2.0     0.0  ...       0.141303              -39.219336               -37.405629               -39.211728 
33  Tester2           1.0          3.0     0.0  ...       0.140354              -39.196658               -37.366870               -39.175577 
34  Tester2           1.0          4.0     0.0  ...       0.157857              -39.188032               -37.394963               -39.177588 
35  Tester2           1.0          5.0     0.0  ...       0.172997              -39.211672               -37.381087               -39.202590 
36  Tester2           1.0          6.0     0.0  ...       0.156858              -39.202400               -37.394763               -39.190315 
37  Tester2           1.0          7.0     0.0  ...       0.145466                     NaN                      NaN                      NaN 
38  Tester2           1.0          8.0     0.0  ...       0.155768              -39.253510               -37.334465               -39.225693 
39  Tester2           1.0          9.0     0.0  ...       0.168768              -39.215654               -37.372970               -39.222424 
40  Tester2           1.0         10.0     0.0  ...       0.157497              -39.242851               -37.356496               -39.240183 
41  Tester2           1.0         11.0     0.0  ...       0.156704              -39.211828               -37.380295               -39.190747 
42  Tester2           1.0         13.0     0.0  ...       0.157244              -39.172185               -37.421204               -39.156919 
43  Tester2           1.0         14.0     0.0  ...       0.170749              -39.188823               -37.369761               -39.177097 
44  Tester2           1.0         15.0     0.0  ...       0.142479              -39.238777               -37.359947               -39.222324 

[45 rows x 308 columns]

Much appreciate, this is a way to handle it, each csv file contain 4 row such Row1 is TestId,Row2 is TestName,Row 3 is Limit_up,Row 4 is Limit_down,need to save one time.

Pedroski55 · Sep-18-2022, 02:41 AM

Without pandas:

import csv
import glob

path = '/home/pedro/myPython/csv/csv/'
csvs = glob.glob(path + '*.csv')
for file in csvs:
    print('The csv files are:', file)

while True:    
    myfile = input('Copy and paste the whole path to the file you want here, enter q to quit ... ')
    if myfile == 'q':
        break
    files.append(file)

def getData(acsv):
    with open(acsv) as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        data = []
        for row in csv_reader:
            data.append(row)
        return data

data1 = getData(files[0])
data2 = getData(files[1])
data3 = getData(files[2])

# check if the first 4 rows are the same in all files
# otherwise could be a problem
for i in range(4):
    if not data1[i] == data2[i]:
        print('This line', i, 'is not the same in data1 and data2')
# data1 and data2 are the same in rows 1 to 4
for i in range(4):
    if not data2[i] == data3[i]:
        print('This line', i, 'is not the same in data1 and data2')
# the first 4 rows seem to be the same
# write the all rows of data1, start at row 5 for data2 and data3
with open(path + 'merge_me.csv', 'w') as file:    
    for i in range(0, len(data1)):
        data1_string = ','.join(data1[i])
        file.write(data1_string + '\n')
    for i in range(5, len(data2)):
        data2_string = ','.join(data2[i])
        file.write(data2_string + '\n')
    for i in range(5, len(data3)):
        data3_string = ','.join(data3[i])
        file.write(data3_string + '\n')   

# test it
with open(path + 'merge_me.csv') as file: 
    csv_reader = csv.reader(file, delimiter=',')
    for row in csv_reader:
        print(row)
print('Loooks OK!')

SamLiu · Sep-18-2022, 07:48 AM

(Sep-18-2022, 02:41 AM)Pedroski55 Wrote: Without pandas:

import csv
import glob

path = '/home/pedro/myPython/csv/csv/'
csvs = glob.glob(path + '*.csv')
for file in csvs:
    print('The csv files are:', file)

while True:    
    myfile = input('Copy and paste the whole path to the file you want here, enter q to quit ... ')
    if myfile == 'q':
        break
    files.append(file)

def getData(acsv):
    with open(acsv) as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        data = []
        for row in csv_reader:
            data.append(row)
        return data

data1 = getData(files[0])
data2 = getData(files[1])
data3 = getData(files[2])

# check if the first 4 rows are the same in all files
# otherwise could be a problem
for i in range(4):
    if not data1[i] == data2[i]:
        print('This line', i, 'is not the same in data1 and data2')
# data1 and data2 are the same in rows 1 to 4
for i in range(4):
    if not data2[i] == data3[i]:
        print('This line', i, 'is not the same in data1 and data2')
# the first 4 rows seem to be the same
# write the all rows of data1, start at row 5 for data2 and data3
with open(path + 'merge_me.csv', 'w') as file:    
    for i in range(0, len(data1)):
        data1_string = ','.join(data1[i])
        file.write(data1_string + '\n')
    for i in range(5, len(data2)):
        data2_string = ','.join(data2[i])
        file.write(data2_string + '\n')
    for i in range(5, len(data3)):
        data3_string = ','.join(data3[i])
        file.write(data3_string + '\n')   

# test it
with open(path + 'merge_me.csv') as file: 
    csv_reader = csv.reader(file, delimiter=',')
    for row in csv_reader:
        print(row)
print('Loooks OK!')

so cool, help fixed it, move files array variable into read CSVs, it is worked, thanks!!

import csv
import glob

files=[]

path = '/home/pedro/myPython/csv/csv/'
csvs = glob.glob(path + '*.csv')
for file in csvs:
    files.append(file)
    print('The csv files are:', file)

while True:    
    myfile = input('Copy and paste the whole path to the file you want here, enter q to quit ... ')
    if myfile == 'q':
        break


def getData(acsv):
    with open(acsv) as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        data = []
        for row in csv_reader:
            data.append(row)
        return data

data1 = getData(files[0])
data2 = getData(files[1])
data3 = getData(files[2])

# check if the first 4 rows are the same in all files
# otherwise could be a problem
for i in range(4):
    if not data1[i] == data2[i]:
        print('This line', i, 'is not the same in data1 and data2')
# data1 and data2 are the same in rows 1 to 4
for i in range(4):
    if not data2[i] == data3[i]:
        print('This line', i, 'is not the same in data1 and data2')
# the first 4 rows seem to be the same
# write the all rows of data1, start at row 5 for data2 and data3
with open(path + 'merge_me.csv', 'w') as file:    
    for i in range(0, len(data1)):
        data1_string = ','.join(data1[i])
        file.write(data1_string + '\n')
    for i in range(5, len(data2)):
        data2_string = ','.join(data2[i])
        file.write(data2_string + '\n')
    for i in range(5, len(data3)):
        data3_string = ','.join(data3[i])
        file.write(data3_string + '\n')   

# test it
with open(path + 'merge_me.csv') as file: 
    csv_reader = csv.reader(file, delimiter=',')
    for row in csv_reader:
        print(row)
print('Loooks OK!')

[/quote]

Pedroski55 · Sep-18-2022, 11:04 PM

Sorry, I forgot the files list when copying and pasting:

import csv
import glob

path = '/home/pedro/myPython/csv/csv/'
csvs = glob.glob(path + '*.csv')
for file in csvs:
    print('The csv files are:', file)

files = [] # forgot this
while True:    
    myfile = input('Copy and paste the whole path to the file you want here, enter q to quit ... ')
    if myfile == 'q':
        break
    files.append(file)

def getData(acsv):
    with open(acsv) as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        data = []
        for row in csv_reader:
            data.append(row)
        return data

data1 = getData(files[0])
data2 = getData(files[1])
data3 = getData(files[2])

# check if the first 4 rows are the same in all files
# otherwise could be a problem
for i in range(4):
    if not data1[i] == data2[i]:
        print('This line', i, 'is not the same in data1 and data2')
# data1 and data2 are the same in rows 1 to 4
for i in range(4):
    if not data2[i] == data3[i]:
        print('This line', i, 'is not the same in data2 and data3')

# the first 4 rows seem to be the same
# write the all rows of data1, start at row 5 for data2 and data3
##with open(path + 'merge_me.csv', 'w') as file:    
##    for i in range(0, len(data1)):
##        data1_string = ','.join(data1[i])
##        file.write(data1_string + '\n')
##    for i in range(5, len(data2)):
##        data2_string = ','.join(data2[i])
##        file.write(data2_string + '\n')
##    for i in range(5, len(data3)):
##        data3_string = ','.join(data3[i])
##        file.write(data3_string + '\n')   

# also works!!
with open(path + 'merge_me.csv', 'w') as csvfile:
    csv_writer = csv.writer(csvfile)
    for i in range(0, len(data1)):
        csv_writer.writerow(data1[i])        
    for i in range(5, len(data2)):
        csv_writer.writerow(data2[i])
    for i in range(5, len(data3)):
        csv_writer.writerow(data3[i])

# test it
with open(path + 'merge_me.csv') as file: 
    csv_reader = csv.reader(file, delimiter=',')
    for row in csv_reader:
        print(row)
print('Looks OK!')

SamLiu · Sep-19-2022, 02:20 PM

(Sep-18-2022, 11:04 PM)Pedroski55 Wrote: Sorry, I forgot the files list when copying and pasting:

import csv
import glob

path = '/home/pedro/myPython/csv/csv/'
csvs = glob.glob(path + '*.csv')
for file in csvs:
    print('The csv files are:', file)

files = [] # forgot this
while True:    
    myfile = input('Copy and paste the whole path to the file you want here, enter q to quit ... ')
    if myfile == 'q':
        break
    files.append(file)

def getData(acsv):
    with open(acsv) as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        data = []
        for row in csv_reader:
            data.append(row)
        return data

data1 = getData(files[0])
data2 = getData(files[1])
data3 = getData(files[2])

# check if the first 4 rows are the same in all files
# otherwise could be a problem
for i in range(4):
    if not data1[i] == data2[i]:
        print('This line', i, 'is not the same in data1 and data2')
# data1 and data2 are the same in rows 1 to 4
for i in range(4):
    if not data2[i] == data3[i]:
        print('This line', i, 'is not the same in data2 and data3')

# the first 4 rows seem to be the same
# write the all rows of data1, start at row 5 for data2 and data3
##with open(path + 'merge_me.csv', 'w') as file:    
##    for i in range(0, len(data1)):
##        data1_string = ','.join(data1[i])
##        file.write(data1_string + '\n')
##    for i in range(5, len(data2)):
##        data2_string = ','.join(data2[i])
##        file.write(data2_string + '\n')
##    for i in range(5, len(data3)):
##        data3_string = ','.join(data3[i])
##        file.write(data3_string + '\n')   

# also works!!
with open(path + 'merge_me.csv', 'w') as csvfile:
    csv_writer = csv.writer(csvfile)
    for i in range(0, len(data1)):
        csv_writer.writerow(data1[i])        
    for i in range(5, len(data2)):
        csv_writer.writerow(data2[i])
    for i in range(5, len(data3)):
        csv_writer.writerow(data3[i])

# test it
with open(path + 'merge_me.csv') as file: 
    csv_reader = csv.reader(file, delimiter=',')
    for row in csv_reader:
        print(row)
print('Looks OK!')

Thank you for your kind help,merge_me.csv format wrong by this code as snapshot.

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	What is the best way to merged csv files and commit them to a database?	SuchUmami	2	1,424	Jul-10-2023, 02:31 AM Last Post: deanhystad

Merged multiple csv to one

User Panel Messages

Announcements