merge two csv files into output.csv using Subprocess - Printable Version

Python Forum (https://python-forum.io)
Forum: Python Coding (https://python-forum.io/forum-7.html)
Forum: General Coding Help (https://python-forum.io/forum-8.html)
Thread: merge two csv files into output.csv using Subprocess (/thread-38889.html)
merge two csv files into output.csv using Subprocess - mg24 - Dec-07-2022

Hi Team,

I want to merge two csv files, data1.csv and data2.csv, and create output.csv. Via the command line this script works:

copy /b D:\output\data\data1.csv+D:\output\data\data2.csv D:\output\data\output.csv

How do I achieve the above via Python? Below is my attempted code.

import subprocess
from pathlib import Path
import shlex

src1 = str(Path(r"D:\output\data\data1.csv"))
src2 = str(Path(r"D:\output\data\data2.csv"))
output = str(Path(r"D:\output\data\output.csv"))

CMD = "copy /b D:\\output\\data\\header.csv+D:\\output\\data\\data.csv D:\\output\\data\\output.csv"
args = shlex.split(CMD)
print(args)

subprocess.run(["scp", "copy", "/b", src1 + src2, output])
subprocess.run(["scp", "copy", "/b", "D:\\output\\data\\data1.csv" + "D:\\output\\data\\data2.csv", "D:\\output\\data\\output.csv"])

RE: merge two csv files into output.csv using Subprocess - DeaD_EyE - Dec-07-2022

Use Python instead of depending on OS infrastructure. This program could not run on Linux/Mac, for example.
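For reference, if you do want to drive the Windows copy command from Python: copy is a cmd.exe built-in, not an executable on PATH, so it must run through the shell, and scp is an unrelated program for copying files over SSH. A minimal sketch (the paths are placeholders from the question, the helper name is my own):

```python
import subprocess
import sys


def copy_b_command(src1: str, src2: str, dest: str) -> str:
    # build the cmd.exe command line: copy /b FILE1+FILE2 DEST
    return f'copy /b "{src1}"+"{src2}" "{dest}"'


cmd = copy_b_command(r"D:\output\data\data1.csv",
                     r"D:\output\data\data2.csv",
                     r"D:\output\data\output.csv")
print(cmd)

if sys.platform == "win32":
    # "copy" is a cmd.exe built-in, not a standalone executable,
    # so it has to run through the shell
    subprocess.run(cmd, shell=True, check=True)
```

Note this only works on Windows; the pure-Python merge below is portable.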
Here is an example function to merge files:

import os
from pathlib import Path


def merge_files(input_files: list[str | Path], output_file: str | Path) -> None:
    # the caller may supply a list with paths as str
    # or as Path objects, but Path objects are required
    # later, so all objects are converted here
    input_files = [Path(file) for file in input_files]
    output_file = Path(output_file)
    # opening the file in binary mode prevents encoding errors
    # if all input_files are encoded with the same encoding;
    # Path objects have an open method, which supports context managers
    with output_file.open("wb") as fd_out:
        # iterating over all input files
        for input_file in input_files:
            # same here, opening the input file in binary mode
            with input_file.open("rb") as fd_in:
                # https://realpython.com/lessons/assignment-expressions/
                # read 4 KiB chunks; remember the last non-empty chunk,
                # because the walrus loop leaves chunk empty on exit
                last_chunk = b""
                while chunk := fd_in.read(4 * 1024):
                    fd_out.write(chunk)
                    last_chunk = chunk
                # if the last line of input_file has no line separator
                # at the end, add one; keep in mind that everything is
                # in binary mode and os.linesep is OS dependent
                if last_chunk and not last_chunk.endswith(os.linesep.encode("ascii")):
                    fd_out.write(os.linesep.encode("ascii"))

RE: merge two csv files into output.csv using Subprocess - Pedroski55 - Dec-07-2022

I'm always interested in learning new stuff from the experts here. Your function looks complicated! Using the IDLE shell I did:

from pathlib import Path
in_files = ['/home/pedro/myPython/csv/csv/time.csv', '/home/pedro/myPython/csv/csv/randomdups.csv']
out_file = '/home/pedro/myPython/csv/csv/merged_output.csv'

Then I copied and pasted your function into the IDLE shell. I get:

TypeError: unsupported operand type(s) for |: 'type' and 'type'

Can you please tell me how to correct this error?
RE: merge two csv files into output.csv using Subprocess - snippsat - Dec-08-2022

Pandas can often make this kind of task simpler.

import pandas as pd

df1 = pd.read_csv("1.csv")
df2 = pd.read_csv("2.csv")
df = pd.concat([df1, df2])
df.to_csv("output.csv", index=False)
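For files too large to fit in memory, pandas can also stream the inputs with the chunksize parameter of read_csv instead of loading whole DataFrames. A minimal sketch (the function name and chunk size are my own choices, not from the thread):

```python
import pandas as pd


def merge_csv_chunked(paths, out_path, chunksize=100_000):
    # stream each input in bounded-size chunks so memory use stays flat
    first = True
    for path in paths:
        # read_csv with chunksize yields DataFrames of at most chunksize rows
        for chunk in pd.read_csv(path, chunksize=chunksize):
            # write the header only for the very first chunk, then append
            chunk.to_csv(out_path, mode="w" if first else "a",
                         header=first, index=False)
            first = False
```

This trades a single in-memory concat for many small appends, which is the usual way to keep pandas usable on inputs in the tens of gigabytes.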
RE: merge two csv files into output.csv using Subprocess - mg24 - Dec-08-2022

Hi DeaD_EyE and snippsat,

Thanks again for your help. I was looking for a BCP or SCP approach, because with the code below I have extracted a big SQL table directly into csv. I wanted to combine both files, hence I was looking for an answer in BCP/SCP style. Pandas can't handle 20 GB of SQL data or csv data.

copy /b d:\bcp_testing\tablewithheaders.csv+d:\bcp_testing\tablewithoutheaders.csv d:\bcp_testing\TableWithout.csv

set bcp_export_server =
set bcp_export_DB =
set bcp_export_Schema =
set bcp_export_Table =

BCP "Declare @colnames Varchar(max); select @colnames = COALESCE((@colnames+'|',")+ column_name from %bcp_export_DB%.information_Schema.COLUMNS where Table_schema='%BCP_EXPORT_HSBC_SCHEMA% and Table_name='%bcp_export_Table%' ORDER BY ORDINAL_POSITION; select @colnames;" queryout d:\bcp_testing\tablewithheaders.csv -c-t,-T-S%bcp_export_server%

BCP "Select * from %bcp_export_DB%.%bcp_export_Schema%.%bcp_export_Table%" queryout d:\bcp_testing\tablewithoutheaders.csv -c-t"|",-T-S%bcp_export_server%

copy /b d:\bcp_testing\tablewithheaders.csv+d:\bcp_testing\tablewithoutheaders.csv d:\bcp_testing\TableWithout.csv

RE: merge two csv files into output.csv using Subprocess - DeaD_EyE - Dec-08-2022

(Dec-07-2022, 11:01 PM) Pedroski55 Wrote: TypeError: unsupported operand type(s) for |: 'type' and 'type'

This is not your fault. It comes from the fancy annotations (the union operator |), which are not supported by your Python version. The | operator for types was introduced with Python 3.10: https://peps.python.org/pep-0604/

Before it, Union was used:

# Python 3.9
from typing import Union
my_type: Union[int, float] = 10.0

# same with Python 3.10
my_type: int | float = 10.0

To suppress this TypeError, just do as the first import:

from __future__ import annotations
# this will convert all annotations to str, and mypy is used to do the type checking
# in addition, the IDE gets the type hints
# other imports
# code

The exception comes from this line:

def merge_files(input_files: list[str | Path], output_file: str | Path) -> None:

This should tell the programmer that input_files could be a list with str or Path objects inside, and output_file could be a str or a Path object.

You must understand this as a hint. There is no checking of type hints during runtime. You can also use wrong type hints, and Python won't stop you from doing so. For beginners, it's not so relevant in the first place. Type hints are more often used in libraries, to give the developer a hint which types could be used for the arguments of functions and methods. In user code, type hints are not used so often.

What I dislike about pandas is the magic behind it. It's not black magic, but it feels like magic. Do not try to learn pandas first. Learn the basics of Python first (which do not include type hints).

RE: merge two csv files into output.csv using Subprocess - Pedroski55 - Dec-08-2022

@DeaD_EyE: after importing from __future__ import annotations it worked perfectly! Now all I need to do is try to understand what's happening!!

@mg24: if the database table is so big, why not export smaller parts? Dump 10 000 rows instead of all rows. Or write a bit of Python to get 10 000 rows at a time from each table and write them to csv. That way, you would have nice manageable chunks!

RE: merge two csv files into output.csv using Subprocess - snippsat - Dec-09-2022

(Dec-08-2022, 05:37 AM) mg24 Wrote: pandas cant handle 20gb of sql data or csv data.

Use Dask. Dask DataFrame copies the pandas DataFrame API.

Quote: Dask DataFrame is used in situations where pandas is commonly needed.

Why and How to Use Dask with Big Data

RE: merge two csv files into output.csv using Subprocess - mg24 - Dec-11-2022

Hi snippsat,

Regarding "pandas can't handle 20 GB of SQL data or csv data": can we read the data in small chunks and write each small chunk to csv? Can't we achieve it that way?
Thanks,
mg

RE: merge two csv files into output.csv using Subprocess - Larz60+ - Dec-11-2022

mg24 ... snippsat gave you a link to Dask. You should take a look.
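One last practical note on the byte-stream merges discussed above: if each input csv carries its own header row (as in the BCP exports), a raw concatenation duplicates the headers in the output. A minimal sketch that keeps only the first file's header (function name and chunk size are my own, not from the thread):

```python
from pathlib import Path


def merge_csv_skip_headers(input_files, output_file, chunk_size=4 * 1024):
    # concatenate csv files byte-wise, keeping only the first file's header row
    with Path(output_file).open("wb") as fd_out:
        for index, input_file in enumerate(input_files):
            with Path(input_file).open("rb") as fd_in:
                if index > 0:
                    fd_in.readline()  # drop this file's header row
                while chunk := fd_in.read(chunk_size):
                    fd_out.write(chunk)
```

Because it never holds more than one small chunk in memory, this works the same for 20 GB inputs as for tiny ones.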