Python Forum
My python code is running very slow on millions of records
I want to process data in Python that has 2 million rows and more than 100 columns. My code takes 20 minutes to create the output file. I don't know whether there is another approach that would be faster, or whether I can change something in my code to speed it up. Any help would be greatly appreciated!

df2 = pd.DataFrame()
for fn in csv_files:  # Looping Over CSV Files
    all_dfs = pd.read_csv(fn, header=None)

    # Finding non-null columns
    non_null_columns = [col for col in all_dfs.columns if all_dfs.loc[:, col].notna().any()]

    # print(non_null_columns)
    for i in range(0, len(all_dfs)):  # Row Loop
        SourceFile = ""
        RowNumber = ""
        ColumnNumber = ""
        Value = ""
        for j in range(0, len(non_null_columns)):  # Column Loop
            SourceFile = Path(fn.name)
            RowNumber = i+1
            ColumnNumber = j+1
            Value = all_dfs.iloc[i, j]
            df2 = df2.append(pd.DataFrame({
                "SourceFile": [SourceFile],
                "RowNumber": [RowNumber],
                "ColumnNumber": [ColumnNumber],
                "Value": [Value]
            }), ignore_index=True)
            # print(df2)
df2['Value'].replace('', np.nan, inplace=True)  # Removing Null Value
df2.dropna(subset=['Value'], inplace=True)
df2.to_csv(os.path.join(path_save, f"Compiled.csv"), index=False)
print("Output: Compiled.csv")
I have attached the Python code.

Attached Files

.py   NormalizedCSV.py (Size: 2.2 KB / Downloads: 302)
.csv   Test.csv (Size: 708 bytes / Downloads: 334)
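
A likely cause of the slowdown is the call to df2.append inside the innermost loop: each call copies the whole accumulated DataFrame, so the total cost grows roughly quadratically with the number of cells. (DataFrame.append was also deprecated in pandas 1.4 and removed in pandas 2.0.) Below is a minimal sketch of a vectorized alternative, not a drop-in replacement. It assumes ColumnNumber is meant to be the 1-based position of the column in the original file, and that each file fits in memory as an object array. The function names normalize_csv and compile_files are hypothetical and do not come from the attached script.

```python
import io
from pathlib import Path

import numpy as np
import pandas as pd


def normalize_csv(source, source_name):
    """Reshape one headerless CSV into long form, one row per non-empty cell."""
    wide = pd.read_csv(source, header=None)
    values = wide.to_numpy(dtype=object)
    n_rows, n_cols = values.shape

    # Build every (row, column, value) triple at once, in row-major order,
    # which is the same order the nested row/column loops would visit them.
    long = pd.DataFrame({
        "SourceFile": source_name,
        "RowNumber": np.repeat(np.arange(1, n_rows + 1), n_cols),
        "ColumnNumber": np.tile(np.arange(1, n_cols + 1), n_rows),  # assumption: 1-based file position
        "Value": values.ravel(),
    })

    # Drop missing cells and empty strings, as the original post-processing does.
    keep = long["Value"].notna() & (long["Value"] != "")
    return long[keep].reset_index(drop=True)


def compile_files(csv_files, out_path):
    """Normalize each file once, then concatenate a single time at the end."""
    frames = [normalize_csv(fn, Path(fn).name) for fn in csv_files]
    pd.concat(frames, ignore_index=True).to_csv(out_path, index=False)


if __name__ == "__main__":
    # Tiny in-memory example standing in for a real CSV file.
    demo = normalize_csv(io.StringIO("a,,c\n,,f\n"), "Test.csv")
    print(demo)
```

The key design choice is to collect one DataFrame per file and call pd.concat once, instead of growing df2 cell by cell. With 2 million rows and more than 100 columns the long table has on the order of 200 million cells before filtering, so memory may become the limit; reading each file in pieces with the chunksize parameter of pd.read_csv is one option if that happens.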


Messages In This Thread
My python code is running very slow on millions of records - by shantanu97 - Dec-27-2021, 11:02 AM
