Oct-01-2021, 03:02 PM
I modified the code to this:
#goal here is to sum number of rows by DTE
import time
start_time = time.time()
df = pd.read_csv("C:/Users/Mark/Desktop/SPX_2021_copy.csv")
my_dict = dict(df['DTE'].value_counts())
for num,key in enumerate(my_dict):
    if num%10 == 0:
        check_time = time.time()
        elap_time = check_time - start_time
        print(f'Row is {num}, elapsed time is {elap_time:.2f}, and projected time is {717/(num+1)*elap_time:.2f}.')
    if key < 251:
        plt.bar(my_dict.keys(),my_dict.values())

The .csv file is 307,910 rows by 16 columns. Here are a couple observations.
First, it takes a long time to complete the first if statement: about 70 seconds with the last print saying "Row is 710..." I can't explain the time. It seems to go slowly to a point (e.g. Row 200-400) and then all the rest print at once, showing the same elap_time.
Then, it takes a very long time for the graph to display and when it does, it shows the full range of x-values up to about 1100-1200. It's the same graph as I saw previously when I printed out the dictionary without any limitations (here I tried to restrict only to key < 251). Previously without limitations, it took less than 1 second. Now, it takes an additional 1-2 minutes after the first if block is complete. I can't explain why it shows 251-1200 or why it takes so long.
At no point should Python be looping through the entire .csv multiple times, should it?
Mark
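
One possible reading of the behaviour described above, assuming the `if key < 251:` test sits inside the `for` loop: `plt.bar(my_dict.keys(), my_dict.values())` is handed the whole dictionary every time the test passes, so the condition controls how many times the full chart is drawn, not which bars are drawn. That would account for both the full x-range and the long wait, and the .csv itself would still only be read once by `pd.read_csv`. Below is a minimal sketch of filtering first and plotting once. The helper `counts_below` and the sample values are made up for illustration and stand in for `df['DTE']`; the plotting part is skipped if matplotlib is not installed.

```python
from collections import Counter


def counts_below(values, limit):
    """Count each distinct value, keeping only values below `limit`.

    Hypothetical helper: it plays the role of value_counts() plus the
    `key < 251` filter from the post.
    """
    counts = Counter(values)  # a single pass over the data
    return {k: n for k, n in sorted(counts.items()) if k < limit}


# Made-up DTE values standing in for df['DTE']
sample_dte = [5, 5, 30, 30, 30, 250, 251, 400, 1100]

filtered = counts_below(sample_dte, 251)
print(filtered)  # {5: 2, 30: 3, 250: 1}

# Draw the chart once, after filtering, instead of once per matching key.
try:
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend so this runs anywhere
    import matplotlib.pyplot as plt

    plt.bar(list(filtered.keys()), list(filtered.values()))
except ImportError:
    pass  # matplotlib not available; the filtering above still applies
```

With pandas, the same filter could probably be written directly on the counts, for example `counts = df['DTE'].value_counts()` followed by `counts[counts.index < 251]`, and the result passed to a single `plt.bar` call.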