merge two csv files into output.csv using Subprocess - Printable Version

Python Forum (https://python-forum.io)
Forum: Python Coding (https://python-forum.io/forum-7.html)
Forum: General Coding Help (https://python-forum.io/forum-8.html)
Thread: merge two csv files into output.csv using Subprocess (/thread-38889.html)
merge two csv files into output.csv using Subprocess - mg24 - Dec-07-2022

Hi Team,

I want to merge two csv files, data1.csv and data2.csv, and create output.csv. Via the command line this script works:

copy /b D:\output\data\data1.csv+D:\output\data\data2.csv D:\output\data\output.csv

How do I achieve the above via Python? Below is my attempted code.

import subprocess
from pathlib import Path
import shlex

src1 = str(Path(r"D:\output\data\data1.csv"))
src2 = str(Path(r"D:\output\data\data2.csv"))
output = str(Path(r"D:\output\data\output.csv"))

CMD = "copy /b D:\\output\\data\\header.csv+D:\\output\\data\\data.csv D:\\output\\data\\output.csv"
args = shlex.split(CMD)
print(args)

subprocess.run(["scp", "copy", "/b", src1 + src2, output])
subprocess.run(["scp", "copy", "/b", "D:\\output\\data\\data1.csv" + "D:\\output\\data\\data2.csv", "D:\\output\\data\\output.csv"])

RE: merge two csv files into output.csv using Subprocess - DeaD_EyE - Dec-07-2022

Use Python instead of depending on OS infrastructure. This program could not run on Linux/Mac, for example.
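For reference, if you do want to drive the Windows copy command from Python: copy is a cmd.exe built-in, not an executable on PATH, so it must run through the shell, and scp is an unrelated program for copying files over SSH. A minimal sketch (the paths are placeholders from the question, the helper name is my own):

```python
import subprocess
import sys


def copy_b_command(src1: str, src2: str, dest: str) -> str:
    # build the cmd.exe command line: copy /b FILE1+FILE2 DEST
    return f'copy /b "{src1}"+"{src2}" "{dest}"'


cmd = copy_b_command(r"D:\output\data\data1.csv",
                     r"D:\output\data\data2.csv",
                     r"D:\output\data\output.csv")
print(cmd)

if sys.platform == "win32":
    # "copy" is a cmd.exe built-in, not a standalone executable,
    # so it has to run through the shell
    subprocess.run(cmd, shell=True, check=True)
```

Note this only works on Windows; the pure-Python merge below is portable.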
Here is an example function to merge files:

import os
from pathlib import Path


def merge_files(input_files: list[str | Path], output_file: str | Path) -> None:
    # the caller may supply a list with paths as str
    # or as Path objects, but Path objects are required
    # later, so all objects are converted here
    input_files = [Path(file) for file in input_files]
    output_file = Path(output_file)
    # opening the file in binary mode prevents encoding errors
    # if all input_files are encoded with the same encoding;
    # Path objects have an open method, which supports context managers
    with output_file.open("wb") as fd_out:
        # iterating over all input files
        for input_file in input_files:
            # same here, opening the input file in binary mode
            with input_file.open("rb") as fd_in:
                # https://realpython.com/lessons/assignment-expressions/
                # read 4 KiB chunks; remember the last non-empty chunk,
                # because the walrus loop leaves chunk empty on exit
                last_chunk = b""
                while chunk := fd_in.read(4 * 1024):
                    fd_out.write(chunk)
                    last_chunk = chunk
                # if the last line of input_file has no line separator
                # at the end, add one; keep in mind that everything is
                # in binary mode and os.linesep is OS dependent
                if last_chunk and not last_chunk.endswith(os.linesep.encode("ascii")):
                    fd_out.write(os.linesep.encode("ascii"))

RE: merge two csv files into output.csv using Subprocess - Pedroski55 - Dec-07-2022

I'm always interested in learning new stuff from the experts here. Your function looks complicated! Using the IDLE shell I did:

from pathlib import Path
in_files = ['/home/pedro/myPython/csv/csv/time.csv', '/home/pedro/myPython/csv/csv/randomdups.csv']
out_file = '/home/pedro/myPython/csv/csv/merged_output.csv'

Then I copied and pasted your function into the IDLE shell. I get:

TypeError: unsupported operand type(s) for |: 'type' and 'type'

Can you please tell me how to correct this error?
RE: merge two csv files into output.csv using Subprocess - snippsat - Dec-08-2022

Pandas can often make this kind of task simpler.

import pandas as pd

df1 = pd.read_csv("1.csv")
df2 = pd.read_csv("2.csv")
df = pd.concat([df1, df2])
df.to_csv("output.csv", index=False)
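For files too large to fit in memory, pandas can also stream the inputs with the chunksize parameter of read_csv instead of loading whole DataFrames. A minimal sketch (the function name and chunk size are my own choices, not from the thread):

```python
import pandas as pd


def merge_csv_chunked(paths, out_path, chunksize=100_000):
    # stream each input in bounded-size chunks so memory use stays flat
    first = True
    for path in paths:
        # read_csv with chunksize yields DataFrames of at most chunksize rows
        for chunk in pd.read_csv(path, chunksize=chunksize):
            # write the header only for the very first chunk, then append
            chunk.to_csv(out_path, mode="w" if first else "a",
                         header=first, index=False)
            first = False
```

This trades a single in-memory concat for many small appends, which is the usual way to keep pandas usable on inputs in the tens of gigabytes.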
RE: merge two csv files into output.csv using Subprocess - mg24 - Dec-08-2022

Hi DeaD_EyE and snippsat,

Thanks again for your help. I was looking for a BCP or SCP approach, because with the code below I have extracted a big SQL table directly into csv. I wanted to combine both files, hence I was looking for an answer in BCP/SCP style. Pandas can't handle 20 GB of SQL data or csv data.

copy /b d:\bcp_testing\tablewithheaders.csv+d:\bcp_testing\tablewithoutheaders.csv d:\bcp_testing\TableWithout.csv

set bcp_export_server =
set bcp_export_DB =
set bcp_export_Schema =
set bcp_export_Table =

BCP "Declare @colnames Varchar(max); select @colnames = COALESCE((@colnames+'|',")+ column_name from %bcp_export_DB%.information_Schema.COLUMNS where Table_schema='%BCP_EXPORT_HSBC_SCHEMA% and Table_name='%bcp_export_Table%' ORDER BY ORDINAL_POSITION; select @colnames;" queryout d:\bcp_testing\tablewithheaders.csv -c-t,-T-S%bcp_export_server%

BCP "Select * from %bcp_export_DB%.%bcp_export_Schema%.%bcp_export_Table%" queryout d:\bcp_testing\tablewithoutheaders.csv -c-t"|",-T-S%bcp_export_server%

copy /b d:\bcp_testing\tablewithheaders.csv+d:\bcp_testing\tablewithoutheaders.csv d:\bcp_testing\TableWithout.csv

RE: merge two csv files into output.csv using Subprocess - DeaD_EyE - Dec-08-2022

(Dec-07-2022, 11:01 PM) Pedroski55 Wrote: TypeError: unsupported operand type(s) for |: 'type' and 'type'

This is not your fault. It comes from the fancy annotations (the union operator |), which are not supported by your Python version. The | operator for types was introduced with Python 3.10: https://peps.python.org/pep-0604/

Before it, Union was used:

# Python 3.9
from typing import Union
my_type: Union[int, float] = 10.0

# same with Python 3.10
my_type: int | float = 10.0

To suppress this TypeError, just do as the first import:

from __future__ import annotations
# this will convert all annotations to str, and mypy is used to do the type checking
# in addition, the IDE gets the type hints
# other imports
# code

The exception comes from this line:

def merge_files(input_files: list[str | Path], output_file: str | Path) -> None:

This should tell the programmer that input_files could be a list with str or Path objects inside, and output_file could be a str or a Path object.

You must understand this as a hint. There is no checking of type hints during runtime. You can also use wrong type hints, and Python won't stop you from doing so. For beginners, it's not so relevant in the first place. Type hints are more often used in libraries, to give the developer a hint which types could be used for the arguments of functions and methods. In user code, type hints are not used so often.

What I dislike about pandas is the magic behind it. It's not black magic, but it feels like magic. Do not try to learn pandas first. Learn the basics of Python first (which do not include type hints).

RE: merge two csv files into output.csv using Subprocess - Pedroski55 - Dec-08-2022

@DeaD_EyE: after importing from __future__ import annotations it worked perfectly! Now all I need to do is try to understand what's happening!!

@mg24: if the database table is so big, why not export smaller parts? Dump 10 000 rows instead of all rows. Or write a bit of Python to get 10 000 rows at a time from each table and write them to csv. That way, you would have nice manageable chunks!

RE: merge two csv files into output.csv using Subprocess - snippsat - Dec-09-2022

(Dec-08-2022, 05:37 AM) mg24 Wrote: pandas cant handle 20gb of sql data or csv data.

Use Dask. Dask DataFrame copies the pandas DataFrame API.

Quote: Dask DataFrame is used in situations where pandas is commonly needed.

Why and How to Use Dask with Big Data

RE: merge two csv files into output.csv using Subprocess - mg24 - Dec-11-2022

Hi snippsat,

Regarding "pandas can't handle 20 GB of SQL data or csv data": can we read the data in small chunks and write each small chunk to csv? Can't we achieve it that way?
Thanks,
mg

RE: merge two csv files into output.csv using Subprocess - Larz60+ - Dec-11-2022

mg24 ... snippsat gave you a link to Dask. You should take a look.
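One last practical note on the byte-stream merges discussed above: if each input csv carries its own header row (as in the BCP exports), a raw concatenation duplicates the headers in the output. A minimal sketch that keeps only the first file's header (function name and chunk size are my own, not from the thread):

```python
from pathlib import Path


def merge_csv_skip_headers(input_files, output_file, chunk_size=4 * 1024):
    # concatenate csv files byte-wise, keeping only the first file's header row
    with Path(output_file).open("wb") as fd_out:
        for index, input_file in enumerate(input_files):
            with Path(input_file).open("rb") as fd_in:
                if index > 0:
                    fd_in.readline()  # drop this file's header row
                while chunk := fd_in.read(chunk_size):
                    fd_out.write(chunk)
```

Because it never holds more than one small chunk in memory, this works the same for 20 GB inputs as for tiny ones.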