Python Forum
merge two csv files into output.csv using Subprocess
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
merge two csv files into output.csv using Subprocess
#1
Hi Team,

I want to merge two csv files, data1.csv and data2.csv , and create Output.csv


via command line this script works.
copy /b D:\output\data\data1.csv+D:\output\data\data2.csv D:\output\data\output.csv

how to achieve above code via python.
below is attempted code.

import subprocess
from pathlib import Path
import shlex

src1 = str(Path("D:\output\data\data1.csv"))
src2 = str(Path("D:\output\data\data2.csv"))
output = str(Path("D:\output\data\output.csv"))

CMD = "copy /b D:\\output\\data\\header.csv+D:\\output\\data\\data.csv D:\\output\\data\\output.csv"

args = shlex.split(CMD)
print(args)


subprocess.run(["scp", "copy", "/b",src1+src2, output])
subprocess.run(["scp", "copy","/b", "D:\\output\\data\\data1.csv"+"D:\\output\\data\\data2.csv", "D:\\output\\data\\output.csv"])
Reply
#2
Use Python instead of depending on OS infrastructure. This program could not run on Linux/Mac for example.
Here an example function to merge files:
from pathlib import Path


def merge_files(input_files: list[str | Path], output_file: str | Path) -> None:
    # user of code could supply a list with paths as str
    # or the paths as Path objetcs
    # but later Path objects are required, so all objects
    # are converted here
    input_files = [Path(file) for file in input_files]
    output_file = Path(output_file)
    
    # opening file in binary mode, which prevents encoding errors
    # if all input_files are encoded with the same encoding
    # Path objects do have the open method, which supports context managers
    with output_file.open("wb") as fd_out:
        # iterating over all input files
        for input_file in input_files:
            # same here, opening the input file in binary mode
            with input_file.open("rb") as fd_in:
                # https://realpython.com/lessons/assignment-expressions/
                # read 4KiB chunks
                while chunk := fd_in.read(4 * 1024 ** 1):
                    fd_out.write(chunk)
                
                # if the last line of input_file has no
                # lineseperator at the end, adding one to it
                # keep in mind, that everything is in binary mode
                
                # os.linesep is OS depended
                if not chunk.endswith(os.linesep.encode("ascii")):
                    fd_out.write(os.linesep.encode("ascii"))
Pedroski55 likes this post
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
Reply
#3
I'm always interested in learning new stuff from the experts here. Your function looks complicated!

Using the Idle shell I did:

from pathlib import Path
 
in_files = ['/home/pedro/myPython/csv/csv/time.csv', '/home/pedro/myPython/csv/csv/randomdups.csv']
out_file = '/home/pedro/myPython/csv/csv/merged_output.csv'
Then I copied and pasted your function in the Idle shell. I get:

Error:
Traceback (most recent call last): File "/usr/lib/python3.8/idlelib/run.py", line 559, in runcode exec(code, self.locals) File "<pyshell#4>", line 1, in <module> TypeError: unsupported operand type(s) for |: 'type' and 'type'
Can you please tell me how to correct this error?
Reply
#4
Pandas can many times make stuff simpler.
import pandas as pd

df1 = pd.read_csv("1.csv")
df2 = pd.read_csv("2.csv")
df = pd.concat([df1, df2])
df.to_csv("output.csv", index=False)
Output:
Name,Age,Gender Jane Doe,32,Female John Smith,45,Male Kent smith,40,Male Tom Olsen,45,Male

Output:
1.csv Name,Age,Gender Jane Doe,32,Female John Smith,45,Male 2.csv Name,Age,Gender Kent smith,40,Male Tom Olsen,45,Male
Reply
#5
Hi Dead_Eye and Snippsat,

Thanks again for you help, I was looking for BCP or SCP Approach,

because with the below code, I have extracted big sql table data directly into csv .
I wanted combine both, hence answer I was looking in BCP \SCP Style.

pandas cant handle 20gb of sql data or csv data.

copy /b d:\bcp_testing\tablewithheaders.csv+d:\bcp_testing\tablewithoutheaders.csv d:\bcp_testing\TableWithout.csv

set bcp_export_server = 
set bcp_export_DB = 
set bcp_export_Schema = 
set bcp_export_Table = 

BCP"Declare @colnames Varchar(max);
select @colnames = COALESCE((@colnames+'|',")+
column_name from %bcp_export_DB%.information_Schema.COLUMNS where Table_schema='%BCP_EXPORT_HSBC_SCHEMA% and
Table_name='%bcp_export_Table%' ORDER BY ORDINAL_POSITION;select @colnames;"
queryout d:\bcp_testing\tablewithheaders.csv -c-t,-T-S%bcp_export_server%


BCP "Select * from %bcp_export_DB%.%bcp_export_Schema%.%bcp_export_Table%"
queryout d:\bcp_testing\tablewithoutheaders.csv -c-t"|",-T-S%bcp_export_server%

copy /b d:\bcp_testing\tablewithheaders.csv+d:\bcp_testing\tablewithoutheaders.csv d:\bcp_testing\TableWithout.csv
Reply
#6
(Dec-07-2022, 11:01 PM)Pedroski55 Wrote: TypeError: unsupported operand type(s) for |: 'type' and 'type'

This is not your fault. It comes from the fancy annotations (Union: |), which are not supported with your Python Version.
It's the | operator for Types which was introduced with Python 3.10: https://peps.python.org/pep-0604/
Before it Union was used:
# Python 3.9
from typing import Union

my_type: Union[int | float] = 10.0

# same with Python 3.10
my_type: int | float = 10.0
To suppress this TypeError just do as first import:
from __future__ import annotations
# this will convert all annotations to str, and mypy is used to do the type checking
# in addition, the IDE gets the TypeHints.

# other imports

# code
The Exception comes from this line:
def merge_files(input_files: list[str | Path], output_file: str | Path) -> None:
This should tell the programmer that input_files could be a list with str or Path objects inside.
output_file could be str or Path object.

You must understand this as a hint. There is no checking of type hints during runtime. You can also use a wrong type hints, and Python won't stop you doing this. For beginners, it's not so relevant in the first place. This is more often used in libraries, to give the developer a hint, which types could be used for arguments of functions and methods. In user code, type hints are not used so often.

What I dislike of pandas is the magic behind. It's not black-magic, but feels like magic. Do not try to learn pandas first. Learn first the basics of Python (which do not include type hints).
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
Reply
#7
@DeaD_EyE after importing from __future__ import annotations it worked perfectly! Now all I need to do is try to understand what's happening!! Big Grin Big Grin Big Grin

@mg24: if the database table is so big, why not export smaller parts? Dump 10 000 rows instead of all rows. Or make a bit of Python to get 10 000 rows at a time from each table and write to csv? That way, you would have nice manageable chunks!
Reply
#8
(Dec-08-2022, 05:37 AM)mg24 Wrote: pandas cant handle 20gb of sql data or csv data.
Use Dask | Dask DataFrame copies the pandas DataFrame API
Quote:Dask DataFrame is used in situations where pandas is commonly needed,
usually when pandas fails due to data size or computation speed:
Why and How to Use Dask with Big Data
Pedroski55 likes this post
Reply
#9
Hi snippsat,


Regarding ----> pandas cant handle 20gb of sql data or csv data.

Can we read data in small chunk and write small chunksize data to csv. this way cant we achieve.



Thanks
mg
Reply
#10
mg24 ... snippsat gave you link to Dask. You should take a look.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  merge all xlsb files into csv mg24 0 303 Nov-13-2023, 08:25 AM
Last Post: mg24
  Merge all json files in folder after filtering deneme2 10 2,253 Sep-18-2022, 10:32 AM
Last Post: deneme2
  Merge htm files with shutil library (TypeError: 'module' object is not callable) Melcu54 5 1,493 Aug-28-2022, 07:11 AM
Last Post: Melcu54
  Increment text files output and limit contains Kaminsky 1 3,135 Jan-30-2021, 06:58 PM
Last Post: bowlofred
  How to merge all the files in a directory in to one file sutra 3 2,592 Dec-10-2020, 12:09 AM
Last Post: sutra
  opening files and output of parsing leodavinci1990 4 2,456 Oct-12-2020, 06:52 AM
Last Post: bowlofred
  Merge JSON Files Ugo 4 4,568 Aug-20-2020, 06:25 AM
Last Post: ndc85430
  How to get program output from subprocess.Popen? glestwid 1 2,333 Aug-19-2020, 05:44 AM
Last Post: buran
  How to read multiple csv files and merge data rajeshE 0 1,917 Mar-28-2020, 04:01 PM
Last Post: rajeshE
  error merge text files ledgreve 3 2,627 Nov-18-2019, 12:41 PM
Last Post: DeaD_EyE

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020