Posts: 41
Threads: 15
Joined: Aug 2022
Hi all,
I try to merge multiple csv to one, but header with all CSVs,how to config only write once,other thing is how to ignore content of snapshot,thank in advance!!
import pandas as pd
import glob
import os
path="C:\\Users\\Sam\\pythonProject\\mainpulated_file\\"
all_files=glob.glob(os.path.join(path,"*Tester*.csv"))
all_csv=(pd.read_csv(f,sep=',') for f in all_files)
df_merged=pd.concat(all_csv,ignore_index=True)
df_merged.to_csv('merged.csv')
Posts: 6,778
Threads: 20
Joined: Feb 2020
Sep-17-2022, 10:41 PM
(This post was last modified: Sep-17-2022, 10:42 PM by deanhystad.)
The header does not have a row index. The thing that looks like a header is actually a row. That's why it shows up again and again. This wouldn't happen if it was a header.
Your files don't look like CSV files. The first 4 lines are very different from the remaining lines. Line1 looks like it could be column headers, but it contains duplicates. I have no idea what to make of lines 0, 2 and 3. Straightening out the header is the first thing you need to do.
I decided to let pandas try to sort out the header, and then throw away any garbage that immediately followed.
import pandas as pd
import glob
dfs = []
for path in glob.glob("Tester*.csv"):
with open(path, "r") as file:
file.readline() # Throw away line 0
df = pd.read_csv(file)[2:] # Throw away lines 2 and 3
dfs.append(df)
all_csv = pd.concat(dfs, ignore_index=True)
print(all_csv) Output: TesterID Batch Number Site Number Retest ... TT_Settling_LM DUT_MVT_NM1_corr_PAT_Lo DUT_MVT_NM1_corr_PAT_Hi DUT_MVT_LM1_corr_PAT_Lo
0 Tester1 1.0 0.0 0.0 ... 0.142555 -39.195243 -37.393705 -39.187843
1 Tester1 1.0 1.0 0.0 ... 0.143227 -39.219153 -37.379417 -39.222282
2 Tester1 1.0 2.0 0.0 ... 0.141303 -39.219336 -37.405629 -39.211728
3 Tester1 1.0 3.0 0.0 ... 0.140354 -39.196658 -37.366870 -39.175577
4 Tester1 1.0 4.0 0.0 ... 0.157857 -39.188032 -37.394963 -39.177588
5 Tester1 1.0 5.0 0.0 ... 0.172997 -39.211672 -37.381087 -39.202590
6 Tester1 1.0 6.0 0.0 ... 0.156858 -39.202400 -37.394763 -39.190315
7 Tester1 1.0 7.0 0.0 ... 0.145466 NaN NaN NaN
8 Tester1 1.0 8.0 0.0 ... 0.155768 -39.253510 -37.334465 -39.225693
9 Tester1 1.0 9.0 0.0 ... 0.168768 -39.215654 -37.372970 -39.222424
10 Tester1 1.0 10.0 0.0 ... 0.157497 -39.242851 -37.356496 -39.240183
11 Tester1 1.0 11.0 0.0 ... 0.156704 -39.211828 -37.380295 -39.190747
12 Tester1 1.0 13.0 0.0 ... 0.157244 -39.172185 -37.421204 -39.156919
13 Tester1 1.0 14.0 0.0 ... 0.170749 -39.188823 -37.369761 -39.177097
14 Tester1 1.0 15.0 0.0 ... 0.142479 -39.238777 -37.359947 -39.222324
15 Tester2 1.0 0.0 0.0 ... 0.142555 -39.195243 -37.393705 -39.187843
16 Tester2 1.0 1.0 0.0 ... 0.143227 -39.219153 -37.379417 -39.222282
17 Tester2 1.0 2.0 0.0 ... 0.141303 -39.219336 -37.405629 -39.211728
18 Tester2 1.0 3.0 0.0 ... 0.140354 -39.196658 -37.366870 -39.175577
19 Tester2 1.0 4.0 0.0 ... 0.157857 -39.188032 -37.394963 -39.177588
20 Tester2 1.0 5.0 0.0 ... 0.172997 -39.211672 -37.381087 -39.202590
21 Tester2 1.0 6.0 0.0 ... 0.156858 -39.202400 -37.394763 -39.190315
22 Tester2 1.0 7.0 0.0 ... 0.145466 NaN NaN NaN
23 Tester2 1.0 8.0 0.0 ... 0.155768 -39.253510 -37.334465 -39.225693
24 Tester2 1.0 9.0 0.0 ... 0.168768 -39.215654 -37.372970 -39.222424
25 Tester2 1.0 10.0 0.0 ... 0.157497 -39.242851 -37.356496 -39.240183
26 Tester2 1.0 11.0 0.0 ... 0.156704 -39.211828 -37.380295 -39.190747
27 Tester2 1.0 13.0 0.0 ... 0.157244 -39.172185 -37.421204 -39.156919
28 Tester2 1.0 14.0 0.0 ... 0.170749 -39.188823 -37.369761 -39.177097
29 Tester2 1.0 15.0 0.0 ... 0.142479 -39.238777 -37.359947 -39.222324
30 Tester2 1.0 0.0 0.0 ... 0.142555 -39.195243 -37.393705 -39.187843
31 Tester2 1.0 1.0 0.0 ... 0.143227 -39.219153 -37.379417 -39.222282
32 Tester2 1.0 2.0 0.0 ... 0.141303 -39.219336 -37.405629 -39.211728
33 Tester2 1.0 3.0 0.0 ... 0.140354 -39.196658 -37.366870 -39.175577
34 Tester2 1.0 4.0 0.0 ... 0.157857 -39.188032 -37.394963 -39.177588
35 Tester2 1.0 5.0 0.0 ... 0.172997 -39.211672 -37.381087 -39.202590
36 Tester2 1.0 6.0 0.0 ... 0.156858 -39.202400 -37.394763 -39.190315
37 Tester2 1.0 7.0 0.0 ... 0.145466 NaN NaN NaN
38 Tester2 1.0 8.0 0.0 ... 0.155768 -39.253510 -37.334465 -39.225693
39 Tester2 1.0 9.0 0.0 ... 0.168768 -39.215654 -37.372970 -39.222424
40 Tester2 1.0 10.0 0.0 ... 0.157497 -39.242851 -37.356496 -39.240183
41 Tester2 1.0 11.0 0.0 ... 0.156704 -39.211828 -37.380295 -39.190747
42 Tester2 1.0 13.0 0.0 ... 0.157244 -39.172185 -37.421204 -39.156919
43 Tester2 1.0 14.0 0.0 ... 0.170749 -39.188823 -37.369761 -39.177097
44 Tester2 1.0 15.0 0.0 ... 0.142479 -39.238777 -37.359947 -39.222324
[45 rows x 308 columns]
Posts: 1,089
Threads: 143
Joined: Jul 2017
@ deanhystad
The csv looks very similar to the ones in a thread just a little while back.
Can't remember what line 1 was now, but lines 2 and 3, when present, were upper and lower limits. If a test exceeded the limits, it got FAILED
In that thread the job was find all the FAILED rows and write to csv.
Posts: 41
Threads: 15
Joined: Aug 2022
(Sep-17-2022, 10:41 PM)deanhystad Wrote: The header does not have a row index. The thing that looks like a header is actually a row. That's why it shows up again and again. This wouldn't happen if it was a header.
Your files don't look like CSV files. The first 4 lines are very different from the remaining lines. Line1 looks like it could be column headers, but it contains duplicates. I have no idea what to make of lines 0, 2 and 3. Straightening out the header is the first thing you need to do.
I decided to let pandas try to sort out the header, and then throw away any garbage that immediately followed.
import pandas as pd
import glob
dfs = []
for path in glob.glob("Tester*.csv"):
with open(path, "r") as file:
file.readline() # Throw away line 0
df = pd.read_csv(file)[2:] # Throw away lines 2 and 3
dfs.append(df)
all_csv = pd.concat(dfs, ignore_index=True)
print(all_csv) Output: TesterID Batch Number Site Number Retest ... TT_Settling_LM DUT_MVT_NM1_corr_PAT_Lo DUT_MVT_NM1_corr_PAT_Hi DUT_MVT_LM1_corr_PAT_Lo
0 Tester1 1.0 0.0 0.0 ... 0.142555 -39.195243 -37.393705 -39.187843
1 Tester1 1.0 1.0 0.0 ... 0.143227 -39.219153 -37.379417 -39.222282
2 Tester1 1.0 2.0 0.0 ... 0.141303 -39.219336 -37.405629 -39.211728
3 Tester1 1.0 3.0 0.0 ... 0.140354 -39.196658 -37.366870 -39.175577
4 Tester1 1.0 4.0 0.0 ... 0.157857 -39.188032 -37.394963 -39.177588
5 Tester1 1.0 5.0 0.0 ... 0.172997 -39.211672 -37.381087 -39.202590
6 Tester1 1.0 6.0 0.0 ... 0.156858 -39.202400 -37.394763 -39.190315
7 Tester1 1.0 7.0 0.0 ... 0.145466 NaN NaN NaN
8 Tester1 1.0 8.0 0.0 ... 0.155768 -39.253510 -37.334465 -39.225693
9 Tester1 1.0 9.0 0.0 ... 0.168768 -39.215654 -37.372970 -39.222424
10 Tester1 1.0 10.0 0.0 ... 0.157497 -39.242851 -37.356496 -39.240183
11 Tester1 1.0 11.0 0.0 ... 0.156704 -39.211828 -37.380295 -39.190747
12 Tester1 1.0 13.0 0.0 ... 0.157244 -39.172185 -37.421204 -39.156919
13 Tester1 1.0 14.0 0.0 ... 0.170749 -39.188823 -37.369761 -39.177097
14 Tester1 1.0 15.0 0.0 ... 0.142479 -39.238777 -37.359947 -39.222324
15 Tester2 1.0 0.0 0.0 ... 0.142555 -39.195243 -37.393705 -39.187843
16 Tester2 1.0 1.0 0.0 ... 0.143227 -39.219153 -37.379417 -39.222282
17 Tester2 1.0 2.0 0.0 ... 0.141303 -39.219336 -37.405629 -39.211728
18 Tester2 1.0 3.0 0.0 ... 0.140354 -39.196658 -37.366870 -39.175577
19 Tester2 1.0 4.0 0.0 ... 0.157857 -39.188032 -37.394963 -39.177588
20 Tester2 1.0 5.0 0.0 ... 0.172997 -39.211672 -37.381087 -39.202590
21 Tester2 1.0 6.0 0.0 ... 0.156858 -39.202400 -37.394763 -39.190315
22 Tester2 1.0 7.0 0.0 ... 0.145466 NaN NaN NaN
23 Tester2 1.0 8.0 0.0 ... 0.155768 -39.253510 -37.334465 -39.225693
24 Tester2 1.0 9.0 0.0 ... 0.168768 -39.215654 -37.372970 -39.222424
25 Tester2 1.0 10.0 0.0 ... 0.157497 -39.242851 -37.356496 -39.240183
26 Tester2 1.0 11.0 0.0 ... 0.156704 -39.211828 -37.380295 -39.190747
27 Tester2 1.0 13.0 0.0 ... 0.157244 -39.172185 -37.421204 -39.156919
28 Tester2 1.0 14.0 0.0 ... 0.170749 -39.188823 -37.369761 -39.177097
29 Tester2 1.0 15.0 0.0 ... 0.142479 -39.238777 -37.359947 -39.222324
30 Tester2 1.0 0.0 0.0 ... 0.142555 -39.195243 -37.393705 -39.187843
31 Tester2 1.0 1.0 0.0 ... 0.143227 -39.219153 -37.379417 -39.222282
32 Tester2 1.0 2.0 0.0 ... 0.141303 -39.219336 -37.405629 -39.211728
33 Tester2 1.0 3.0 0.0 ... 0.140354 -39.196658 -37.366870 -39.175577
34 Tester2 1.0 4.0 0.0 ... 0.157857 -39.188032 -37.394963 -39.177588
35 Tester2 1.0 5.0 0.0 ... 0.172997 -39.211672 -37.381087 -39.202590
36 Tester2 1.0 6.0 0.0 ... 0.156858 -39.202400 -37.394763 -39.190315
37 Tester2 1.0 7.0 0.0 ... 0.145466 NaN NaN NaN
38 Tester2 1.0 8.0 0.0 ... 0.155768 -39.253510 -37.334465 -39.225693
39 Tester2 1.0 9.0 0.0 ... 0.168768 -39.215654 -37.372970 -39.222424
40 Tester2 1.0 10.0 0.0 ... 0.157497 -39.242851 -37.356496 -39.240183
41 Tester2 1.0 11.0 0.0 ... 0.156704 -39.211828 -37.380295 -39.190747
42 Tester2 1.0 13.0 0.0 ... 0.157244 -39.172185 -37.421204 -39.156919
43 Tester2 1.0 14.0 0.0 ... 0.170749 -39.188823 -37.369761 -39.177097
44 Tester2 1.0 15.0 0.0 ... 0.142479 -39.238777 -37.359947 -39.222324
[45 rows x 308 columns]
Much appreciate, this is a way to handle it, each csv file contain 4 row such Row1 is TestId,Row2 is TestName,Row 3 is Limit_up,Row 4 is Limit_down,need to save one time.
Posts: 1,089
Threads: 143
Joined: Jul 2017
Without pandas:
import csv
import glob
path = '/home/pedro/myPython/csv/csv/'
csvs = glob.glob(path + '*.csv')
for file in csvs:
print('The csv files are:', file)
while True:
myfile = input('Copy and paste the whole path to the file you want here, enter q to quit ... ')
if myfile == 'q':
break
files.append(file)
def getData(acsv):
with open(acsv) as csv_file:
csv_reader = csv.reader(csv_file, delimiter=',')
data = []
for row in csv_reader:
data.append(row)
return data
data1 = getData(files[0])
data2 = getData(files[1])
data3 = getData(files[2])
# check if the first 4 rows are the same in all files
# otherwise could be a problem
for i in range(4):
if not data1[i] == data2[i]:
print('This line', i, 'is not the same in data1 and data2')
# data1 and data2 are the same in rows 1 to 4
for i in range(4):
if not data2[i] == data3[i]:
print('This line', i, 'is not the same in data1 and data2')
# the first 4 rows seem to be the same
# write the all rows of data1, start at row 5 for data2 and data3
with open(path + 'merge_me.csv', 'w') as file:
for i in range(0, len(data1)):
data1_string = ','.join(data1[i])
file.write(data1_string + '\n')
for i in range(5, len(data2)):
data2_string = ','.join(data2[i])
file.write(data2_string + '\n')
for i in range(5, len(data3)):
data3_string = ','.join(data3[i])
file.write(data3_string + '\n')
# test it
with open(path + 'merge_me.csv') as file:
csv_reader = csv.reader(file, delimiter=',')
for row in csv_reader:
print(row)
print('Loooks OK!')
Posts: 41
Threads: 15
Joined: Aug 2022
(Sep-18-2022, 02:41 AM)Pedroski55 Wrote: Without pandas:
import csv
import glob
path = '/home/pedro/myPython/csv/csv/'
csvs = glob.glob(path + '*.csv')
for file in csvs:
print('The csv files are:', file)
while True:
myfile = input('Copy and paste the whole path to the file you want here, enter q to quit ... ')
if myfile == 'q':
break
files.append(file)
def getData(acsv):
with open(acsv) as csv_file:
csv_reader = csv.reader(csv_file, delimiter=',')
data = []
for row in csv_reader:
data.append(row)
return data
data1 = getData(files[0])
data2 = getData(files[1])
data3 = getData(files[2])
# check if the first 4 rows are the same in all files
# otherwise could be a problem
for i in range(4):
if not data1[i] == data2[i]:
print('This line', i, 'is not the same in data1 and data2')
# data1 and data2 are the same in rows 1 to 4
for i in range(4):
if not data2[i] == data3[i]:
print('This line', i, 'is not the same in data1 and data2')
# the first 4 rows seem to be the same
# write the all rows of data1, start at row 5 for data2 and data3
with open(path + 'merge_me.csv', 'w') as file:
for i in range(0, len(data1)):
data1_string = ','.join(data1[i])
file.write(data1_string + '\n')
for i in range(5, len(data2)):
data2_string = ','.join(data2[i])
file.write(data2_string + '\n')
for i in range(5, len(data3)):
data3_string = ','.join(data3[i])
file.write(data3_string + '\n')
# test it
with open(path + 'merge_me.csv') as file:
csv_reader = csv.reader(file, delimiter=',')
for row in csv_reader:
print(row)
print('Loooks OK!')
so cool, help fixed it, move files array variable into read CSVs, it is worked, thanks!!
import csv
import glob
files=[]
path = '/home/pedro/myPython/csv/csv/'
csvs = glob.glob(path + '*.csv')
for file in csvs:
files.append(file)
print('The csv files are:', file)
while True:
myfile = input('Copy and paste the whole path to the file you want here, enter q to quit ... ')
if myfile == 'q':
break
def getData(acsv):
with open(acsv) as csv_file:
csv_reader = csv.reader(csv_file, delimiter=',')
data = []
for row in csv_reader:
data.append(row)
return data
data1 = getData(files[0])
data2 = getData(files[1])
data3 = getData(files[2])
# check if the first 4 rows are the same in all files
# otherwise could be a problem
for i in range(4):
if not data1[i] == data2[i]:
print('This line', i, 'is not the same in data1 and data2')
# data1 and data2 are the same in rows 1 to 4
for i in range(4):
if not data2[i] == data3[i]:
print('This line', i, 'is not the same in data1 and data2')
# the first 4 rows seem to be the same
# write the all rows of data1, start at row 5 for data2 and data3
with open(path + 'merge_me.csv', 'w') as file:
for i in range(0, len(data1)):
data1_string = ','.join(data1[i])
file.write(data1_string + '\n')
for i in range(5, len(data2)):
data2_string = ','.join(data2[i])
file.write(data2_string + '\n')
for i in range(5, len(data3)):
data3_string = ','.join(data3[i])
file.write(data3_string + '\n')
# test it
with open(path + 'merge_me.csv') as file:
csv_reader = csv.reader(file, delimiter=',')
for row in csv_reader:
print(row)
print('Loooks OK!') [/quote]
Posts: 1,089
Threads: 143
Joined: Jul 2017
Sorry, I forgot the files list when copying and pasting:
import csv
import glob
path = '/home/pedro/myPython/csv/csv/'
csvs = glob.glob(path + '*.csv')
for file in csvs:
print('The csv files are:', file)
files = [] # forgot this
while True:
myfile = input('Copy and paste the whole path to the file you want here, enter q to quit ... ')
if myfile == 'q':
break
files.append(file)
def getData(acsv):
with open(acsv) as csv_file:
csv_reader = csv.reader(csv_file, delimiter=',')
data = []
for row in csv_reader:
data.append(row)
return data
data1 = getData(files[0])
data2 = getData(files[1])
data3 = getData(files[2])
# check if the first 4 rows are the same in all files
# otherwise could be a problem
for i in range(4):
if not data1[i] == data2[i]:
print('This line', i, 'is not the same in data1 and data2')
# data1 and data2 are the same in rows 1 to 4
for i in range(4):
if not data2[i] == data3[i]:
print('This line', i, 'is not the same in data2 and data3')
# the first 4 rows seem to be the same
# write the all rows of data1, start at row 5 for data2 and data3
##with open(path + 'merge_me.csv', 'w') as file:
## for i in range(0, len(data1)):
## data1_string = ','.join(data1[i])
## file.write(data1_string + '\n')
## for i in range(5, len(data2)):
## data2_string = ','.join(data2[i])
## file.write(data2_string + '\n')
## for i in range(5, len(data3)):
## data3_string = ','.join(data3[i])
## file.write(data3_string + '\n')
# also works!!
with open(path + 'merge_me.csv', 'w') as csvfile:
csv_writer = csv.writer(csvfile)
for i in range(0, len(data1)):
csv_writer.writerow(data1[i])
for i in range(5, len(data2)):
csv_writer.writerow(data2[i])
for i in range(5, len(data3)):
csv_writer.writerow(data3[i])
# test it
with open(path + 'merge_me.csv') as file:
csv_reader = csv.reader(file, delimiter=',')
for row in csv_reader:
print(row)
print('Looks OK!')
Posts: 41
Threads: 15
Joined: Aug 2022
(Sep-18-2022, 11:04 PM)Pedroski55 Wrote: Sorry, I forgot the files list when copying and pasting:
import csv
import glob
path = '/home/pedro/myPython/csv/csv/'
csvs = glob.glob(path + '*.csv')
for file in csvs:
print('The csv files are:', file)
files = [] # forgot this
while True:
myfile = input('Copy and paste the whole path to the file you want here, enter q to quit ... ')
if myfile == 'q':
break
files.append(file)
def getData(acsv):
with open(acsv) as csv_file:
csv_reader = csv.reader(csv_file, delimiter=',')
data = []
for row in csv_reader:
data.append(row)
return data
data1 = getData(files[0])
data2 = getData(files[1])
data3 = getData(files[2])
# check if the first 4 rows are the same in all files
# otherwise could be a problem
for i in range(4):
if not data1[i] == data2[i]:
print('This line', i, 'is not the same in data1 and data2')
# data1 and data2 are the same in rows 1 to 4
for i in range(4):
if not data2[i] == data3[i]:
print('This line', i, 'is not the same in data2 and data3')
# the first 4 rows seem to be the same
# write the all rows of data1, start at row 5 for data2 and data3
##with open(path + 'merge_me.csv', 'w') as file:
## for i in range(0, len(data1)):
## data1_string = ','.join(data1[i])
## file.write(data1_string + '\n')
## for i in range(5, len(data2)):
## data2_string = ','.join(data2[i])
## file.write(data2_string + '\n')
## for i in range(5, len(data3)):
## data3_string = ','.join(data3[i])
## file.write(data3_string + '\n')
# also works!!
with open(path + 'merge_me.csv', 'w') as csvfile:
csv_writer = csv.writer(csvfile)
for i in range(0, len(data1)):
csv_writer.writerow(data1[i])
for i in range(5, len(data2)):
csv_writer.writerow(data2[i])
for i in range(5, len(data3)):
csv_writer.writerow(data3[i])
# test it
with open(path + 'merge_me.csv') as file:
csv_reader = csv.reader(file, delimiter=',')
for row in csv_reader:
print(row)
print('Looks OK!')
Thank you for your kind help,merge_me.csv format wrong by this code as snapshot.
|