What is the best way to merge csv files and commit them to a database? - Printable Version

Python Forum (https://python-forum.io)
Forum: Python Coding (https://python-forum.io/forum-7.html)
Forum: General Coding Help (https://python-forum.io/forum-8.html)
Thread: What is the best way to merge csv files and commit them to a database? (/thread-40314.html)
What is the best way to merge csv files and commit them to a database? - SuchUmami - Jul-09-2023

I have several csv files, each containing a different kind of data. Currently I make a dataframe for each one and then merge the dataframes. I also have to process the data (calculating total scores) before committing it to a database. This works, but I find it hard to add support for new kinds of csv files with different data. What do other people do to make this process easy and scalable?

RE: What is the best way to merge csv files and commit them to a database? - Pedroski55 - Jul-09-2023

Do the csv files have the same headers? Does the data overlap? What database are you writing the output to? MySQL?

Below is a way of merging any number of csv files, but it will include the header rows. That could be fixed if we knew the structure of the csv files.

from pathlib import Path
import os

# Python 3.10 uses | as a union in annotations; this version, 3.8, doesn't.
# Below is a workaround:
# from __future__ import annotations
# This converts all annotations to str; mypy is then used to do the type
# checking, and in addition the IDE still gets the type hints.

# could build the in_files list with glob if there are a lot of files
in_files = ['/home/pedro/myPython/csv/csv/time.csv',
            '/home/pedro/myPython/csv/csv/randomdups.csv']
out_file = '/home/pedro/myPython/csv/csv/merged_output.csv'


def merge_files(input_files: list[str | Path], output_file: str | Path) -> None:
    # the caller may supply the paths as str or as Path objects,
    # but Path objects are required later, so all are converted here
    input_files = [Path(file) for file in input_files]
    output_file = Path(output_file)
    # opening the file in binary mode prevents encoding errors
    # if the input files are not all encoded with the same encoding.
    # Path objects have an open method, which supports context managers
    with output_file.open("wb") as fd_out:
        # iterate over all input files
        for input_file in input_files:
            # same here: open the input file in binary mode
            with input_file.open("rb") as fd_in:
                # https://realpython.com/lessons/assignment-expressions/
                # read 4 KiB chunks; remember the last non-empty chunk,
                # because the walrus loop exits with chunk == b""
                last_chunk = b""
                while chunk := fd_in.read(4 * 1024):
                    fd_out.write(chunk)
                    last_chunk = chunk
                # if the last line of input_file has no line separator
                # at the end, add one. Keep in mind that everything is
                # in binary mode; os.linesep is OS dependent
                if last_chunk and not last_chunk.endswith(os.linesep.encode("ascii")):
                    fd_out.write(os.linesep.encode("ascii"))


if __name__ == "__main__":
    merge_files(in_files, out_file)

RE: What is the best way to merge csv files and commit them to a database? - deanhystad - Jul-10-2023

If you need to do processing on the data, probably not. That implies the CSV files cannot be treated generically. Just opening CSV files, combining them, and writing them as a table in a database could easily be written as a script that knows very little about the contents of the CSV files.
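Pedroski55's byte-level merge deliberately keeps every header row, since the structure of the files is unknown. If the files do all share one header, a sketch using the stdlib csv module can write the header once and skip it in the remaining files (the function name and the assumption of identical headers are mine, not from the thread):

```python
import csv
from pathlib import Path


def merge_csv_skip_headers(input_files, output_file):
    """Merge CSV files that share the same header row, writing it only once.

    Assumes every non-empty input file starts with an identical header.
    """
    output_file = Path(output_file)
    with output_file.open("w", newline="", encoding="utf-8") as fd_out:
        writer = csv.writer(fd_out)
        header_written = False
        for input_file in map(Path, input_files):
            with input_file.open("r", newline="", encoding="utf-8") as fd_in:
                reader = csv.reader(fd_in)
                header = next(reader, None)  # first row is the header
                if header is None:
                    continue  # skip empty files
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                # copy the remaining data rows as-is
                writer.writerows(reader)
```

Using `newline=""` on both files lets the csv module handle line endings itself, as the csv docs recommend.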
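For the workflow the original question describes (one dataframe per file, merge, compute a total score, commit to a database), here is a minimal pandas sketch. The column names (`math`, `english`), the `total` calculation, the table name, and the use of SQLite are all illustrative assumptions; the thread never states the actual schema or database:

```python
import sqlite3

import pandas as pd


def load_and_store(csv_paths, db_path, table="scores"):
    """Read several CSV files, stack them, add a total, write to SQLite.

    Assumes all files share the hypothetical columns 'name', 'math', 'english'.
    """
    # one DataFrame per file, then stack them into one
    frames = [pd.read_csv(p) for p in csv_paths]
    df = pd.concat(frames, ignore_index=True)
    # the per-row processing step: a hypothetical total score
    df["total"] = df["math"] + df["english"]
    # commit everything to the database in one call
    with sqlite3.connect(db_path) as conn:
        df.to_sql(table, conn, if_exists="replace", index=False)
    return df
```

Adding a new kind of csv file then mostly means adjusting the processing step, not the loading or database code, which is one way to keep the pipeline scalable.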