Posts: 119
Threads: 66
Joined: Sep 2022
Hi Team,
I want to merge two CSV files, data1.csv and data2.csv, and create output.csv.
Via the command line, this command works:
copy /b D:\output\data\data1.csv+D:\output\data\data2.csv D:\output\data\output.csv
How can I achieve the same via Python? Below is my attempted code.
import subprocess
from pathlib import Path
import shlex

src1 = str(Path(r"D:\output\data\data1.csv"))
src2 = str(Path(r"D:\output\data\data2.csv"))
output = str(Path(r"D:\output\data\output.csv"))

CMD = r"copy /b D:\output\data\header.csv+D:\output\data\data.csv D:\output\data\output.csv"
args = shlex.split(CMD)
print(args)

subprocess.run(["scp", "copy", "/b", src1 + src2, output])
subprocess.run(["scp", "copy", "/b", r"D:\output\data\data1.csv" + r"D:\output\data\data2.csv", r"D:\output\data\output.csv"])
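For reference, a minimal sketch of how that same copy command could be invoked from Python on Windows (paths taken from the question; this is an untested sketch, not a confirmed fix). Note that copy is a cmd.exe built-in, not a standalone executable, so it has to be run through cmd /c, and the two source files must be joined with +:

import subprocess
from pathlib import Path

src1 = Path(r"D:\output\data\data1.csv")
src2 = Path(r"D:\output\data\data2.csv")
output = Path(r"D:\output\data\output.csv")

# copy is a cmd.exe built-in, so it must be launched via cmd /c;
# the two source files are joined with + for binary concatenation
subprocess.run(["cmd", "/c", "copy", "/b", f"{src1}+{src2}", str(output)], check=True)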
Posts: 2,120
Threads: 10
Joined: May 2017
Dec-07-2022, 09:53 AM
(This post was last modified: Dec-07-2022, 09:53 AM by DeaD_EyE.)
Use Python instead of depending on OS infrastructure. This program would not run on Linux/Mac, for example.
Here is an example function to merge files:
import os
from pathlib import Path


def merge_files(input_files: list[str | Path], output_file: str | Path) -> None:
    input_files = [Path(file) for file in input_files]
    output_file = Path(output_file)
    with output_file.open("wb") as fd_out:
        for input_file in input_files:
            last_chunk = b""
            with input_file.open("rb") as fd_in:
                # copy the file in 4 KiB chunks
                while chunk := fd_in.read(4 * 1024):
                    fd_out.write(chunk)
                    last_chunk = chunk
            # ensure each file's content ends with a line separator,
            # so the next file's first row starts on a new line
            if not last_chunk.endswith(os.linesep.encode("ascii")):
                fd_out.write(os.linesep.encode("ascii"))
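A hypothetical call, using the paths from the question (not part of the original post):

merge_files(
    [r"D:\output\data\data1.csv", r"D:\output\data\data2.csv"],
    r"D:\output\data\output.csv",
)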
Posts: 1,088
Threads: 143
Joined: Jul 2017
Dec-07-2022, 11:01 PM
(This post was last modified: Dec-09-2022, 07:12 AM by Yoriz. Edit Reason: Added error tags)
I'm always interested in learning new stuff from the experts here. Your function looks complicated!
Using the IDLE shell I did:
from pathlib import Path

in_files = ['/home/pedro/myPython/csv/csv/time.csv', '/home/pedro/myPython/csv/csv/randomdups.csv']
out_file = '/home/pedro/myPython/csv/csv/merged_output.csv'
Then I copied and pasted your function into the IDLE shell. I get:
Error: Traceback (most recent call last):
File "/usr/lib/python3.8/idlelib/run.py", line 559, in runcode
exec(code, self.locals)
File "<pyshell#4>", line 1, in <module>
TypeError: unsupported operand type(s) for |: 'type' and 'type'
Can you please tell me how to correct this error?
Posts: 7,310
Threads: 123
Joined: Sep 2016
Dec-08-2022, 01:02 AM
(This post was last modified: Dec-08-2022, 01:02 AM by snippsat.)
Pandas can often make this kind of task simpler.
import pandas as pd

df1 = pd.read_csv("1.csv")
df2 = pd.read_csv("2.csv")
df = pd.concat([df1, df2])
df.to_csv("output.csv", index=False)
Output: 1.csv
Name,Age,Gender
Jane Doe,32,Female
John Smith,45,Male

2.csv
Name,Age,Gender
Kent smith,40,Male
Tom Olsen,45,Male

Output: output.csv
Name,Age,Gender
Jane Doe,32,Female
John Smith,45,Male
Kent smith,40,Male
Tom Olsen,45,Male
Posts: 119
Threads: 66
Joined: Sep 2022
Hi DeaD_EyE and snippsat,
Thanks again for your help. I was looking for a BCP or SCP approach, because with the code below I have extracted big SQL table data directly into a CSV file. I wanted to combine both files, hence I was looking for an answer in the BCP/SCP style.
Pandas can't handle 20 GB of SQL or CSV data.
copy /b d:\bcp_testing\tablewithheaders.csv+d:\bcp_testing\tablewithoutheaders.csv d:\bcp_testing\TableWithout.csv
set bcp_export_server=
set bcp_export_DB=
set bcp_export_Schema=
set bcp_export_Table=

BCP "Declare @colnames Varchar(max);
select @colnames = COALESCE(@colnames + '|', '') + column_name from %bcp_export_DB%.information_Schema.COLUMNS where Table_schema = '%BCP_EXPORT_HSBC_SCHEMA%' and Table_name = '%bcp_export_Table%' ORDER BY ORDINAL_POSITION; select @colnames;"
queryout d:\bcp_testing\tablewithheaders.csv -c -t, -T -S %bcp_export_server%

BCP "Select * from %bcp_export_DB%.%bcp_export_Schema%.%bcp_export_Table%"
queryout d:\bcp_testing\tablewithoutheaders.csv -c -t"|" -T -S %bcp_export_server%

copy /b d:\bcp_testing\tablewithheaders.csv+d:\bcp_testing\tablewithoutheaders.csv d:\bcp_testing\TableWithout.csv
Posts: 2,120
Threads: 10
Joined: May 2017
Dec-08-2022, 10:12 AM
(This post was last modified: Dec-08-2022, 10:13 AM by DeaD_EyE.)
(Dec-07-2022, 11:01 PM)Pedroski55 Wrote: TypeError: unsupported operand type(s) for |: 'type' and 'type'
This is not your fault. It comes from the fancy annotations (the union operator |), which are not supported by your Python version.
The | operator for types was introduced with Python 3.10: https://peps.python.org/pep-0604/
Before that, Union was used:
from typing import Union

my_type: Union[int, float] = 10.0  # before Python 3.10
my_type: int | float = 10.0        # Python 3.10 and later
To suppress this TypeError, just add this as the first import:
from __future__ import annotations
The exception comes from this line:
def merge_files(input_files: list[str | Path], output_file: str | Path) -> None:
This should tell the programmer that input_files can be a list with str or Path objects inside, and that output_file can be a str or a Path object.
You must understand this as a hint: there is no checking of type hints at runtime. You can even write wrong type hints, and Python won't stop you. For beginners, it's not so relevant at first. Type hints are used more often in libraries, to give developers a hint about which types can be passed as arguments to functions and methods. In user code, type hints are not used as often.
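For illustration, a small example (not from the original post) showing that a wrong annotation is silently ignored at runtime:

def double(value: int) -> int:
    # the annotation promises an int, but nothing enforces it
    return value * 2

print(double("ab"))  # prints 'abab' -- Python never checks the hint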
What I dislike about pandas is the magic behind it. It's not black magic, but it feels like magic. Do not try to learn pandas first. Learn the basics of Python first (which do not include type hints).
Posts: 1,088
Threads: 143
Joined: Jul 2017
@DeaD_EyE: after adding from __future__ import annotations it worked perfectly! Now all I need to do is try to understand what's happening!
@mg24: if the database table is so big, why not export smaller parts? Dump 10,000 rows instead of all rows, or write a bit of Python to get 10,000 rows at a time from each table and write them to CSV, as sketched below. That way, you would have nice manageable chunks!
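For illustration, a minimal sketch of that chunked idea using pandas with SQLAlchemy; the connection string, table name, and chunk size here are placeholders, not details from this thread:

import pandas as pd
from sqlalchemy import create_engine

# hypothetical connection string -- adjust for your own server
engine = create_engine("mssql+pyodbc://user:password@server/db?driver=ODBC+Driver+17+for+SQL+Server")

first = True
# stream the table 10,000 rows at a time instead of loading it all at once
for chunk in pd.read_sql("SELECT * FROM my_table", engine, chunksize=10_000):
    # write the header only for the first chunk, then append
    chunk.to_csv("big_table.csv", mode="w" if first else "a", header=first, index=False)
    first = False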
Posts: 7,310
Threads: 123
Joined: Sep 2016
Dec-09-2022, 12:55 AM
(This post was last modified: Dec-09-2022, 12:57 AM by snippsat.)
(Dec-08-2022, 05:37 AM)mg24 Wrote: pandas cant handle 20gb of sql data or csv data.
Use Dask; the Dask DataFrame copies the pandas DataFrame API.
Quote:Dask DataFrame is used in situations where pandas is commonly needed,
usually when pandas fails due to data size or computation speed:
Why and How to Use Dask with Big Data
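For illustration, a minimal Dask version of the earlier pandas example (file names are placeholders); Dask reads and writes partition by partition, so the full 20 GB never has to fit in memory:

import dask.dataframe as dd

# read both CSV files lazily; nothing is loaded into memory yet
df1 = dd.read_csv("1.csv")
df2 = dd.read_csv("2.csv")

# concatenate and write out, partition by partition
df = dd.concat([df1, df2])
df.to_csv("output.csv", single_file=True, index=False)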
Posts: 119
Threads: 66
Joined: Sep 2022
Hi snippsat,
Regarding "pandas can't handle 20 GB of SQL or CSV data": can we read the data in small chunks and write each chunk to CSV? Can't we achieve it that way?
Thanks,
mg
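For illustration, a minimal sketch of that idea (file names and chunk size are placeholders): pd.read_csv with chunksize streams each file instead of loading it whole:

import pandas as pd

first = True
for name in ["data1.csv", "data2.csv"]:
    # read each source file 100,000 rows at a time
    for chunk in pd.read_csv(name, chunksize=100_000):
        # write the header only once, then keep appending
        chunk.to_csv("output.csv", mode="w" if first else "a", header=first, index=False)
        first = False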
Posts: 12,022
Threads: 484
Joined: Sep 2016
mg24 ... snippsat gave you a link to Dask. You should take a look.