Python Forum
Different code execution times
#1
Hi,
I have a problem with the following code: it takes an extremely long time to run to completion under Python 3.9 - 3.11, but under 3.6 - 3.8 it works without any big problem. Can someone explain to me why this is and how to make it faster on Python 3.9 and higher? The time difference ranges from a few seconds (on 3.8) to about an hour (on 3.11).

I have tested it under windows 11 and Ubuntu 22.04 with the same results.

import pandas as pd
import numpy as np

fileName = "Data"
inputRange = 20

df = pd.read_csv(fileName + ".csv", delimiter=";")

x_data = df[["Close","High","Low","Volumen"]]

y_data = df["Signal"]

x_train = []

for i in range(0,len(x_data)-inputRange):
    x_train.append(x_data[(i+1):(i+inputRange+1)])
#2
How big is the CSV file? I just ran your code using Python 3.10.7 with Data.csv having 100,000 rows and it took 3 seconds. Using x_train.append(x_data.iloc[(i + 1) : (i + inputRange + 1)]) was about 1 second faster.
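To illustrate the difference (a minimal sketch with made-up data, not the forum's CSV): plain [] slicing on a DataFrame with integer bounds is also positional, but .iloc states that intent explicitly and skips some of the label-resolution machinery, which is why it tends to be a bit faster in a tight loop.

```python
import numpy as np
import pandas as pd

# Small made-up frame standing in for x_data.
df = pd.DataFrame(np.arange(20.0).reshape(10, 2), columns=["Close", "High"])

# Both select rows 2, 3 and 4 by position and produce the same frame,
# but .iloc is the explicit (and usually faster) positional indexer.
w_plain = df[2:5]
w_iloc = df.iloc[2:5]
print(w_plain.equals(w_iloc))  # True
```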

Out of curiosity I updated my pandas from 2.0.0 to 2.1.1. Now the code takes much longer to execute, about 80 times longer. Appears to be a pandas version issue, not a python version issue.

Pandas article on improving performance.

https://pandas.pydata.org/pandas-docs/st...gperf.html

Does x_train need to be a list of pandas dataframes? Would your code work if x_train was an array of numpy arrays (a 3D numpy array)? In the example below I use numpy.lib.stride_tricks.sliding_window_view(). This creates a new array that contains a series of sliding windows, 20 rows long, from x_data.
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

input_range = 20
df = pd.read_csv("data.csv")
x_data = df[["Close", "High", "Low", "Volumen"]].to_numpy()
x_train = sliding_window_view(x_data, (input_range, 4)).reshape(-1, input_range, 4)
When I tried using pandas 2.1.1 and your code on a CSV file with 100,000 rows, it took 4 minutes, 15 seconds. Using the code above it took 0.05 seconds. That's 5000 times faster. For your data it should reduce processing time from an hour to under a second.

I lack some understanding about sliding_window_view(). x_data.shape = (100000, 4), but when I call sliding_window_view(x_data, (input_range, 4)), x_train's shape is (99981, 1, 20, 4). I don't know why the extra axis (1) is created. The fix for now is to reshape the array to remove it.
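The extra axis appears because a 2D window slides along both axes: a (20, 4) window fits in 100000 - 20 + 1 = 99981 positions along axis 0 but only 4 - 4 + 1 = 1 position along axis 1, hence the size-1 dimension. A small sketch with made-up numbers (not the forum's data) showing the effect, and an alternative that slides along axis 0 only:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(12.0).reshape(6, 2)  # 6 rows, 2 columns

# A 2D window slides along BOTH axes: a (3, 2) window has
# 6-3+1 = 4 positions along axis 0 and 2-2+1 = 1 position along
# axis 1, so the result has shape (4, 1, 3, 2) - the extra axis.
w2d = sliding_window_view(x, (3, 2))
print(w2d.shape)  # (4, 1, 3, 2)

# Sliding along axis 0 only avoids the size-1 axis; the window
# dimension is appended last, so move it back into the middle.
w1d = sliding_window_view(x, 3, axis=0)  # shape (4, 2, 3)
w1d = w1d.transpose(0, 2, 1)             # shape (4, 3, 2)
print(np.array_equal(w1d, w2d.reshape(-1, 3, 2)))  # True
```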
#3
Hi,
thanks for the quick reply.

I have now noticed that the slowdown comes from the change from pandas 2.0.3 to 2.1.0.
I had assumed that I was using the same pandas version everywhere, but that was not the case.
If I install pandas 2.0.3 everywhere, it also works up to Python 3.10 (I have not tested it on 3.11, and on 3.12 I got an error message).

I will take a closer look at the 3D numpy array approach in the next few days.

Thanks for your help.
#4
Normally you don't specify a version when installing a package, so you get the newest stable version.
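When reproducibility matters, though, you can pin the version explicitly; a sketch (pandas 2.0.3 is the release this thread found to be fast, substitute whatever your project needs):

```shell
# Install an exact pandas release instead of the newest stable one.
pip install "pandas==2.0.3"

# Or pin it in requirements.txt so every environment gets the same version:
#   pandas==2.0.3
```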
#5
(Oct-04-2023, 06:06 PM)Wirbelwind94 Wrote:
for i in range(0,len(x_data)-inputRange):
When you use a standard Python loop like this with Pandas, there is usually a much faster way.
Also, as deanhystad mentioned, the run time can blow up between versions when using code like this,
so loops like this should be avoided in Pandas.

Here is a test with a generated Data.csv of 1,000,000 rows.
It reads the data into a numpy.ndarray and back into a DataFrame for easier access to the data.
This takes about 3.3 seconds.
import pandas as pd
import numpy as np

fileName = "Data"
inputRange = 20
df = pd.read_csv(fileName + ".csv", delimiter=";")
x_data = df[["Close","High","Low","Volumen"]].values
y_data = df["Signal"].values

# Create sequences of x_data
num_sequences = len(x_data) - inputRange
x_train = np.zeros((num_sequences, inputRange, x_data.shape[1]))

for i in range(num_sequences):
    x_train[i] = x_data[i:(i + inputRange)]

# Adjust y_data to align with the end of each sequence
y_train = y_data[inputRange:]

# Convert back to a DataFrame
features = ["Close", "High", "Low", "Volumen"]
# Create multi-level columns
multi_columns = pd.MultiIndex.from_product([features, range(inputRange)], names=['Feature', 'Timestep'])
# Reshape the 3D numpy array to 2D
reshaped_data = x_train.reshape((num_sequences, -1))
df_converted = pd.DataFrame(reshaped_data, columns=multi_columns)
print(df_converted.head())
print(df_converted.tail())
Feature        Close                          ...    Volumen                   
Timestep          0           1           2   ...         17         18      19
0         100.496714  100.708180  100.096421  ...  97.350607  95.925274   983.0
1         100.358450  101.260176   99.535949  ...  98.185528  97.624068  1036.0
2         101.006138  101.566533  100.399888  ...  98.023315  97.650150  1048.0
3         102.529168  103.378286  102.401322  ...  98.283334  97.853196  1002.0
4         102.295015  103.010577  101.300510  ...  96.642765  95.833808  1012.0

[5 rows x 80 columns]
Feature         Close               ...      Volumen        
Timestep           0            1   ...           18      19
999975   -1506.391447 -1506.368737  ... -1502.330789   951.0
999976   -1506.438789 -1505.710167  ... -1502.366849  1029.0
999977   -1506.919674 -1506.542284  ... -1502.014728  1029.0
999978   -1508.160130 -1507.298449  ... -1502.327236   979.0
999979   -1509.262004 -1508.807126  ... -1500.596051  1010.0
Example look at max for High column.
>>> df_converted['High'].max()
Timestep
0      472.933948
1      473.926661
2      472.407345
3     1049.000000
4      472.933948
5      473.926661
6      472.407345
7     1049.000000
8      472.933948
9      473.926661
10     472.407345
11    1049.000000
12     472.933948
13     473.926661
14     472.407345
15    1049.000000
16     472.933948
17     473.926661
18     472.407345
19    1049.000000 
There are also several tools that are great for better speed and memory usage, e.g. Dask or Polars.