Python Forum
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Merged multiple csv to one
#1
Hi all,
I try to merge multiple csv to one, but header with all CSVs,how to config only write once,other thing is how to ignore content of snapshot,thank in advance!!

import pandas as pd
import glob
import os

path="C:\\Users\\Sam\\pythonProject\\mainpulated_file\\"
all_files=glob.glob(os.path.join(path,"*Tester*.csv"))

all_csv=(pd.read_csv(f,sep=',') for f in all_files)
df_merged=pd.concat(all_csv,ignore_index=True)

df_merged.to_csv('merged.csv')
Reply
#2
The header does not have a row index. The thing that looks like a header is actually a row. That's why it shows up again and again. This wouldn't happen if it was a header.

Your files don't look like CSV files. The first 4 lines are very different from the remaining lines. Line1 looks like it could be column headers, but it contains duplicates. I have no idea what to make of lines 0, 2 and 3. Straightening out the header is the first thing you need to do.

I decided to let pandas try to sort out the header, and then throw away any garbage that immediately followed.
import pandas as pd
import glob

dfs = []
for path in glob.glob("Tester*.csv"):
    with open(path, "r") as file:
        file.readline()  # Throw away line 0
        df = pd.read_csv(file)[2:]  # Throw away lines 2 and 3
        dfs.append(df)

    all_csv = pd.concat(dfs, ignore_index=True)
 
print(all_csv)
Output:
TesterID Batch Number Site Number Retest ... TT_Settling_LM DUT_MVT_NM1_corr_PAT_Lo DUT_MVT_NM1_corr_PAT_Hi DUT_MVT_LM1_corr_PAT_Lo 0 Tester1 1.0 0.0 0.0 ... 0.142555 -39.195243 -37.393705 -39.187843 1 Tester1 1.0 1.0 0.0 ... 0.143227 -39.219153 -37.379417 -39.222282 2 Tester1 1.0 2.0 0.0 ... 0.141303 -39.219336 -37.405629 -39.211728 3 Tester1 1.0 3.0 0.0 ... 0.140354 -39.196658 -37.366870 -39.175577 4 Tester1 1.0 4.0 0.0 ... 0.157857 -39.188032 -37.394963 -39.177588 5 Tester1 1.0 5.0 0.0 ... 0.172997 -39.211672 -37.381087 -39.202590 6 Tester1 1.0 6.0 0.0 ... 0.156858 -39.202400 -37.394763 -39.190315 7 Tester1 1.0 7.0 0.0 ... 0.145466 NaN NaN NaN 8 Tester1 1.0 8.0 0.0 ... 0.155768 -39.253510 -37.334465 -39.225693 9 Tester1 1.0 9.0 0.0 ... 0.168768 -39.215654 -37.372970 -39.222424 10 Tester1 1.0 10.0 0.0 ... 0.157497 -39.242851 -37.356496 -39.240183 11 Tester1 1.0 11.0 0.0 ... 0.156704 -39.211828 -37.380295 -39.190747 12 Tester1 1.0 13.0 0.0 ... 0.157244 -39.172185 -37.421204 -39.156919 13 Tester1 1.0 14.0 0.0 ... 0.170749 -39.188823 -37.369761 -39.177097 14 Tester1 1.0 15.0 0.0 ... 0.142479 -39.238777 -37.359947 -39.222324 15 Tester2 1.0 0.0 0.0 ... 0.142555 -39.195243 -37.393705 -39.187843 16 Tester2 1.0 1.0 0.0 ... 0.143227 -39.219153 -37.379417 -39.222282 17 Tester2 1.0 2.0 0.0 ... 0.141303 -39.219336 -37.405629 -39.211728 18 Tester2 1.0 3.0 0.0 ... 0.140354 -39.196658 -37.366870 -39.175577 19 Tester2 1.0 4.0 0.0 ... 0.157857 -39.188032 -37.394963 -39.177588 20 Tester2 1.0 5.0 0.0 ... 0.172997 -39.211672 -37.381087 -39.202590 21 Tester2 1.0 6.0 0.0 ... 0.156858 -39.202400 -37.394763 -39.190315 22 Tester2 1.0 7.0 0.0 ... 0.145466 NaN NaN NaN 23 Tester2 1.0 8.0 0.0 ... 0.155768 -39.253510 -37.334465 -39.225693 24 Tester2 1.0 9.0 0.0 ... 0.168768 -39.215654 -37.372970 -39.222424 25 Tester2 1.0 10.0 0.0 ... 0.157497 -39.242851 -37.356496 -39.240183 26 Tester2 1.0 11.0 0.0 ... 0.156704 -39.211828 -37.380295 -39.190747 27 Tester2 1.0 13.0 0.0 ... 0.157244 -39.172185 -37.421204 -39.156919 28 Tester2 1.0 14.0 0.0 ... 0.170749 -39.188823 -37.369761 -39.177097 29 Tester2 1.0 15.0 0.0 ... 0.142479 -39.238777 -37.359947 -39.222324 30 Tester2 1.0 0.0 0.0 ... 0.142555 -39.195243 -37.393705 -39.187843 31 Tester2 1.0 1.0 0.0 ... 0.143227 -39.219153 -37.379417 -39.222282 32 Tester2 1.0 2.0 0.0 ... 0.141303 -39.219336 -37.405629 -39.211728 33 Tester2 1.0 3.0 0.0 ... 0.140354 -39.196658 -37.366870 -39.175577 34 Tester2 1.0 4.0 0.0 ... 0.157857 -39.188032 -37.394963 -39.177588 35 Tester2 1.0 5.0 0.0 ... 0.172997 -39.211672 -37.381087 -39.202590 36 Tester2 1.0 6.0 0.0 ... 0.156858 -39.202400 -37.394763 -39.190315 37 Tester2 1.0 7.0 0.0 ... 0.145466 NaN NaN NaN 38 Tester2 1.0 8.0 0.0 ... 0.155768 -39.253510 -37.334465 -39.225693 39 Tester2 1.0 9.0 0.0 ... 0.168768 -39.215654 -37.372970 -39.222424 40 Tester2 1.0 10.0 0.0 ... 0.157497 -39.242851 -37.356496 -39.240183 41 Tester2 1.0 11.0 0.0 ... 0.156704 -39.211828 -37.380295 -39.190747 42 Tester2 1.0 13.0 0.0 ... 0.157244 -39.172185 -37.421204 -39.156919 43 Tester2 1.0 14.0 0.0 ... 0.170749 -39.188823 -37.369761 -39.177097 44 Tester2 1.0 15.0 0.0 ... 0.142479 -39.238777 -37.359947 -39.222324 [45 rows x 308 columns]
Reply
#3
@deanhystad

The csv looks very similar to the ones in a thread just a little while back.

Can't remember what line 1 was now, but lines 2 and 3, when present, were upper and lower limits. If a test exceeded the limits, it got FAILED

In that thread the job was find all the FAILED rows and write to csv.
Reply
#4
(Sep-17-2022, 10:41 PM)deanhystad Wrote: The header does not have a row index. The thing that looks like a header is actually a row. That's why it shows up again and again. This wouldn't happen if it was a header.

Your files don't look like CSV files. The first 4 lines are very different from the remaining lines. Line1 looks like it could be column headers, but it contains duplicates. I have no idea what to make of lines 0, 2 and 3. Straightening out the header is the first thing you need to do.

I decided to let pandas try to sort out the header, and then throw away any garbage that immediately followed.
import pandas as pd
import glob

dfs = []
for path in glob.glob("Tester*.csv"):
    with open(path, "r") as file:
        file.readline()  # Throw away line 0
        df = pd.read_csv(file)[2:]  # Throw away lines 2 and 3
        dfs.append(df)

    all_csv = pd.concat(dfs, ignore_index=True)
 
print(all_csv)
Output:
TesterID Batch Number Site Number Retest ... TT_Settling_LM DUT_MVT_NM1_corr_PAT_Lo DUT_MVT_NM1_corr_PAT_Hi DUT_MVT_LM1_corr_PAT_Lo 0 Tester1 1.0 0.0 0.0 ... 0.142555 -39.195243 -37.393705 -39.187843 1 Tester1 1.0 1.0 0.0 ... 0.143227 -39.219153 -37.379417 -39.222282 2 Tester1 1.0 2.0 0.0 ... 0.141303 -39.219336 -37.405629 -39.211728 3 Tester1 1.0 3.0 0.0 ... 0.140354 -39.196658 -37.366870 -39.175577 4 Tester1 1.0 4.0 0.0 ... 0.157857 -39.188032 -37.394963 -39.177588 5 Tester1 1.0 5.0 0.0 ... 0.172997 -39.211672 -37.381087 -39.202590 6 Tester1 1.0 6.0 0.0 ... 0.156858 -39.202400 -37.394763 -39.190315 7 Tester1 1.0 7.0 0.0 ... 0.145466 NaN NaN NaN 8 Tester1 1.0 8.0 0.0 ... 0.155768 -39.253510 -37.334465 -39.225693 9 Tester1 1.0 9.0 0.0 ... 0.168768 -39.215654 -37.372970 -39.222424 10 Tester1 1.0 10.0 0.0 ... 0.157497 -39.242851 -37.356496 -39.240183 11 Tester1 1.0 11.0 0.0 ... 0.156704 -39.211828 -37.380295 -39.190747 12 Tester1 1.0 13.0 0.0 ... 0.157244 -39.172185 -37.421204 -39.156919 13 Tester1 1.0 14.0 0.0 ... 0.170749 -39.188823 -37.369761 -39.177097 14 Tester1 1.0 15.0 0.0 ... 0.142479 -39.238777 -37.359947 -39.222324 15 Tester2 1.0 0.0 0.0 ... 0.142555 -39.195243 -37.393705 -39.187843 16 Tester2 1.0 1.0 0.0 ... 0.143227 -39.219153 -37.379417 -39.222282 17 Tester2 1.0 2.0 0.0 ... 0.141303 -39.219336 -37.405629 -39.211728 18 Tester2 1.0 3.0 0.0 ... 0.140354 -39.196658 -37.366870 -39.175577 19 Tester2 1.0 4.0 0.0 ... 0.157857 -39.188032 -37.394963 -39.177588 20 Tester2 1.0 5.0 0.0 ... 0.172997 -39.211672 -37.381087 -39.202590 21 Tester2 1.0 6.0 0.0 ... 0.156858 -39.202400 -37.394763 -39.190315 22 Tester2 1.0 7.0 0.0 ... 0.145466 NaN NaN NaN 23 Tester2 1.0 8.0 0.0 ... 0.155768 -39.253510 -37.334465 -39.225693 24 Tester2 1.0 9.0 0.0 ... 0.168768 -39.215654 -37.372970 -39.222424 25 Tester2 1.0 10.0 0.0 ... 0.157497 -39.242851 -37.356496 -39.240183 26 Tester2 1.0 11.0 0.0 ... 0.156704 -39.211828 -37.380295 -39.190747 27 Tester2 1.0 13.0 0.0 ... 0.157244 -39.172185 -37.421204 -39.156919 28 Tester2 1.0 14.0 0.0 ... 0.170749 -39.188823 -37.369761 -39.177097 29 Tester2 1.0 15.0 0.0 ... 0.142479 -39.238777 -37.359947 -39.222324 30 Tester2 1.0 0.0 0.0 ... 0.142555 -39.195243 -37.393705 -39.187843 31 Tester2 1.0 1.0 0.0 ... 0.143227 -39.219153 -37.379417 -39.222282 32 Tester2 1.0 2.0 0.0 ... 0.141303 -39.219336 -37.405629 -39.211728 33 Tester2 1.0 3.0 0.0 ... 0.140354 -39.196658 -37.366870 -39.175577 34 Tester2 1.0 4.0 0.0 ... 0.157857 -39.188032 -37.394963 -39.177588 35 Tester2 1.0 5.0 0.0 ... 0.172997 -39.211672 -37.381087 -39.202590 36 Tester2 1.0 6.0 0.0 ... 0.156858 -39.202400 -37.394763 -39.190315 37 Tester2 1.0 7.0 0.0 ... 0.145466 NaN NaN NaN 38 Tester2 1.0 8.0 0.0 ... 0.155768 -39.253510 -37.334465 -39.225693 39 Tester2 1.0 9.0 0.0 ... 0.168768 -39.215654 -37.372970 -39.222424 40 Tester2 1.0 10.0 0.0 ... 0.157497 -39.242851 -37.356496 -39.240183 41 Tester2 1.0 11.0 0.0 ... 0.156704 -39.211828 -37.380295 -39.190747 42 Tester2 1.0 13.0 0.0 ... 0.157244 -39.172185 -37.421204 -39.156919 43 Tester2 1.0 14.0 0.0 ... 0.170749 -39.188823 -37.369761 -39.177097 44 Tester2 1.0 15.0 0.0 ... 0.142479 -39.238777 -37.359947 -39.222324 [45 rows x 308 columns]


Much appreciate, this is a way to handle it, each csv file contain 4 row such Row1 is TestId,Row2 is TestName,Row 3 is Limit_up,Row 4 is Limit_down,need to save one time.
Reply
#5
Without pandas:

import csv
import glob

path = '/home/pedro/myPython/csv/csv/'
csvs = glob.glob(path + '*.csv')
for file in csvs:
    print('The csv files are:', file)

while True:    
    myfile = input('Copy and paste the whole path to the file you want here, enter q to quit ... ')
    if myfile == 'q':
        break
    files.append(file)

def getData(acsv):
    with open(acsv) as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        data = []
        for row in csv_reader:
            data.append(row)
        return data

data1 = getData(files[0])
data2 = getData(files[1])
data3 = getData(files[2])

# check if the first 4 rows are the same in all files
# otherwise could be a problem
for i in range(4):
    if not data1[i] == data2[i]:
        print('This line', i, 'is not the same in data1 and data2')
# data1 and data2 are the same in rows 1 to 4
for i in range(4):
    if not data2[i] == data3[i]:
        print('This line', i, 'is not the same in data1 and data2')
# the first 4 rows seem to be the same
# write the all rows of data1, start at row 5 for data2 and data3
with open(path + 'merge_me.csv', 'w') as file:    
    for i in range(0, len(data1)):
        data1_string = ','.join(data1[i])
        file.write(data1_string + '\n')
    for i in range(5, len(data2)):
        data2_string = ','.join(data2[i])
        file.write(data2_string + '\n')
    for i in range(5, len(data3)):
        data3_string = ','.join(data3[i])
        file.write(data3_string + '\n')   

# test it
with open(path + 'merge_me.csv') as file: 
    csv_reader = csv.reader(file, delimiter=',')
    for row in csv_reader:
        print(row)
print('Loooks OK!')
Reply
#6
(Sep-18-2022, 02:41 AM)Pedroski55 Wrote: Without pandas:

import csv
import glob

path = '/home/pedro/myPython/csv/csv/'
csvs = glob.glob(path + '*.csv')
for file in csvs:
    print('The csv files are:', file)

while True:    
    myfile = input('Copy and paste the whole path to the file you want here, enter q to quit ... ')
    if myfile == 'q':
        break
    files.append(file)

def getData(acsv):
    with open(acsv) as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        data = []
        for row in csv_reader:
            data.append(row)
        return data

data1 = getData(files[0])
data2 = getData(files[1])
data3 = getData(files[2])

# check if the first 4 rows are the same in all files
# otherwise could be a problem
for i in range(4):
    if not data1[i] == data2[i]:
        print('This line', i, 'is not the same in data1 and data2')
# data1 and data2 are the same in rows 1 to 4
for i in range(4):
    if not data2[i] == data3[i]:
        print('This line', i, 'is not the same in data1 and data2')
# the first 4 rows seem to be the same
# write the all rows of data1, start at row 5 for data2 and data3
with open(path + 'merge_me.csv', 'w') as file:    
    for i in range(0, len(data1)):
        data1_string = ','.join(data1[i])
        file.write(data1_string + '\n')
    for i in range(5, len(data2)):
        data2_string = ','.join(data2[i])
        file.write(data2_string + '\n')
    for i in range(5, len(data3)):
        data3_string = ','.join(data3[i])
        file.write(data3_string + '\n')   

# test it
with open(path + 'merge_me.csv') as file: 
    csv_reader = csv.reader(file, delimiter=',')
    for row in csv_reader:
        print(row)
print('Loooks OK!')




so cool, help fixed it, move files array variable into read CSVs, it is worked, thanks!!

import csv
import glob

files=[]

path = '/home/pedro/myPython/csv/csv/'
csvs = glob.glob(path + '*.csv')
for file in csvs:
    files.append(file)
    print('The csv files are:', file)

while True:    
    myfile = input('Copy and paste the whole path to the file you want here, enter q to quit ... ')
    if myfile == 'q':
        break


def getData(acsv):
    with open(acsv) as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        data = []
        for row in csv_reader:
            data.append(row)
        return data

data1 = getData(files[0])
data2 = getData(files[1])
data3 = getData(files[2])

# check if the first 4 rows are the same in all files
# otherwise could be a problem
for i in range(4):
    if not data1[i] == data2[i]:
        print('This line', i, 'is not the same in data1 and data2')
# data1 and data2 are the same in rows 1 to 4
for i in range(4):
    if not data2[i] == data3[i]:
        print('This line', i, 'is not the same in data1 and data2')
# the first 4 rows seem to be the same
# write the all rows of data1, start at row 5 for data2 and data3
with open(path + 'merge_me.csv', 'w') as file:    
    for i in range(0, len(data1)):
        data1_string = ','.join(data1[i])
        file.write(data1_string + '\n')
    for i in range(5, len(data2)):
        data2_string = ','.join(data2[i])
        file.write(data2_string + '\n')
    for i in range(5, len(data3)):
        data3_string = ','.join(data3[i])
        file.write(data3_string + '\n')   

# test it
with open(path + 'merge_me.csv') as file: 
    csv_reader = csv.reader(file, delimiter=',')
    for row in csv_reader:
        print(row)
print('Loooks OK!')
[/quote]
Reply
#7
Sorry, I forgot the files list when copying and pasting:

import csv
import glob

path = '/home/pedro/myPython/csv/csv/'
csvs = glob.glob(path + '*.csv')
for file in csvs:
    print('The csv files are:', file)

files = [] # forgot this
while True:    
    myfile = input('Copy and paste the whole path to the file you want here, enter q to quit ... ')
    if myfile == 'q':
        break
    files.append(file)

def getData(acsv):
    with open(acsv) as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        data = []
        for row in csv_reader:
            data.append(row)
        return data

data1 = getData(files[0])
data2 = getData(files[1])
data3 = getData(files[2])

# check if the first 4 rows are the same in all files
# otherwise could be a problem
for i in range(4):
    if not data1[i] == data2[i]:
        print('This line', i, 'is not the same in data1 and data2')
# data1 and data2 are the same in rows 1 to 4
for i in range(4):
    if not data2[i] == data3[i]:
        print('This line', i, 'is not the same in data2 and data3')

# the first 4 rows seem to be the same
# write the all rows of data1, start at row 5 for data2 and data3
##with open(path + 'merge_me.csv', 'w') as file:    
##    for i in range(0, len(data1)):
##        data1_string = ','.join(data1[i])
##        file.write(data1_string + '\n')
##    for i in range(5, len(data2)):
##        data2_string = ','.join(data2[i])
##        file.write(data2_string + '\n')
##    for i in range(5, len(data3)):
##        data3_string = ','.join(data3[i])
##        file.write(data3_string + '\n')   

# also works!!
with open(path + 'merge_me.csv', 'w') as csvfile:
    csv_writer = csv.writer(csvfile)
    for i in range(0, len(data1)):
        csv_writer.writerow(data1[i])        
    for i in range(5, len(data2)):
        csv_writer.writerow(data2[i])
    for i in range(5, len(data3)):
        csv_writer.writerow(data3[i])

# test it
with open(path + 'merge_me.csv') as file: 
    csv_reader = csv.reader(file, delimiter=',')
    for row in csv_reader:
        print(row)
print('Looks OK!')
Reply
#8
(Sep-18-2022, 11:04 PM)Pedroski55 Wrote: Sorry, I forgot the files list when copying and pasting:

import csv
import glob

path = '/home/pedro/myPython/csv/csv/'
csvs = glob.glob(path + '*.csv')
for file in csvs:
    print('The csv files are:', file)

files = [] # forgot this
while True:    
    myfile = input('Copy and paste the whole path to the file you want here, enter q to quit ... ')
    if myfile == 'q':
        break
    files.append(file)

def getData(acsv):
    with open(acsv) as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        data = []
        for row in csv_reader:
            data.append(row)
        return data

data1 = getData(files[0])
data2 = getData(files[1])
data3 = getData(files[2])

# check if the first 4 rows are the same in all files
# otherwise could be a problem
for i in range(4):
    if not data1[i] == data2[i]:
        print('This line', i, 'is not the same in data1 and data2')
# data1 and data2 are the same in rows 1 to 4
for i in range(4):
    if not data2[i] == data3[i]:
        print('This line', i, 'is not the same in data2 and data3')

# the first 4 rows seem to be the same
# write the all rows of data1, start at row 5 for data2 and data3
##with open(path + 'merge_me.csv', 'w') as file:    
##    for i in range(0, len(data1)):
##        data1_string = ','.join(data1[i])
##        file.write(data1_string + '\n')
##    for i in range(5, len(data2)):
##        data2_string = ','.join(data2[i])
##        file.write(data2_string + '\n')
##    for i in range(5, len(data3)):
##        data3_string = ','.join(data3[i])
##        file.write(data3_string + '\n')   

# also works!!
with open(path + 'merge_me.csv', 'w') as csvfile:
    csv_writer = csv.writer(csvfile)
    for i in range(0, len(data1)):
        csv_writer.writerow(data1[i])        
    for i in range(5, len(data2)):
        csv_writer.writerow(data2[i])
    for i in range(5, len(data3)):
        csv_writer.writerow(data3[i])

# test it
with open(path + 'merge_me.csv') as file: 
    csv_reader = csv.reader(file, delimiter=',')
    for row in csv_reader:
        print(row)
print('Looks OK!')



Thank you for your kind help,merge_me.csv format wrong by this code as snapshot.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  What is the best way to merged csv files and commit them to a database? SuchUmami 2 620 Jul-10-2023, 02:31 AM
Last Post: deanhystad

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020