Python Forum
Alternative approach to iterate numerous linear regressions with xlsx data?
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Alternative approach to iterate numerous linear regressions with xlsx data?
#1
I'm a self-trained, novice programmer, and need guidance to determine a more time-efficient way to analyze my data. I have a large dataset (200+ column sets, and each column set has two columns of x and y data, with roughly 166 rows), and want to complete linear regression tests on the x and y data within each set. The linear regression is an 11-row moving average, thus there are linear regression statistics (slope) for each row. This means a linear regression slope must be calculated for each day, and the adjacent 10 rows, for up to 166 linear regression per column set. When the difference between consecutive linear regression slopes is greater than or equal to a threshold (0.3), I'd like to identify and record the first occurrence. After recording/presenting the occurrence, I'd like to exit the analysis, and move on to the next column set to repeat the process.

The code below identifies the difference in linear regression slopes between rows for just one column set, and took roughly 7 minutes to run (exiting in row 235). At this rate, I'll be running code for 24+ hours. I'm sure there's a more efficient way, perhaps outside of openpyxl to complete this many linear regressions and test for a threshold difference - I just have no idea what it is. I only started using openpyxl because the data were in .xlsx files when I received them, and Python because I used it during an undergraduate programming course.

Any recommendations on code structure, alternative analysis methods or data format conversions would be helpful. If it's not already clear, my knowledge of coding is quite low. I greatly appreciate your patience in writing out what might otherwise be 'obvious' things everyone already knows.

from __future__ import print_function
from scipy import stats
from openpyxl import load_workbook

#########################################################################

wb2 = load_workbook(filename='data1.xlsx',read_only=True)
print(wb2.sheetnames)
ws2 = wb2['summary']

mean = 15 # x data column reference
stddev = mean + 36 # y data column reference

for j in range(164, 330):
    i=j
    x = [ws2.cell(row=i,column=mean).value for i in range(i,i+11)] #11 row x data
    y = [ws2.cell(row=i,column=stddev).value for i in range(i,i+11)] #11 row y data
    x2 = [ws2.cell(row=i,column=mean).value for i in range(i+1,i+12)] #11 row x data (starting at next row down)
    y2 = [ws2.cell(row=i,column=stddev).value for i in range(i+1,i+12)] #11 row y data (starting at next row down)
    slope, intercept, r_value, p_value, std_err = stats.linregress(x,y)
    slope2, intercept2, r_value2, p_value2, std_err2 = stats.linregress(x2,y2)
    thresholdslope=abs(slope2-slope)

    
    if thresholdslope>=0.3:
        threshold_value=ws2.cell(row=i+6,column=mean)
        threshold_date=ws2.cell(row=i+6,column=3)
        print("threshold slope=",thresholdslope)
        print("i=",i)
        print("i+6=",i+6)
        print("threshold value (i+6)",threshold_value.value)
        print("threshold date=",threshold_date.value)

        break
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Thoughts on how to approach a project birdieman 13 13,814 Feb-28-2023, 11:42 AM
Last Post: get2sid
  What is the best approach to training a final model after cross validation? amjass12 0 1,848 Jul-21-2021, 10:15 AM
Last Post: amjass12
  extract tabe from nested dictionary, find generic approach,clever way tonycat 4 2,708 Aug-31-2020, 09:22 AM
Last Post: tonycat
  How to build linear regression by implementing Gradient Descent using only linear alg PythonSpeaker 1 2,157 Dec-01-2019, 05:35 PM
Last Post: Larz60+
Music Iterate over data and sum Madame32 8 3,566 Oct-14-2019, 05:54 PM
Last Post: Madame32
  Looking for Eval alternative scionsamurai 2 2,893 Jul-16-2019, 02:41 AM
Last Post: scionsamurai
  [pandas]How to liner fit time series data and get linear fit equation and r square Sri 5 3,755 Apr-04-2019, 12:00 PM
Last Post: Sri
  Write data into existing Excel (xlsx) file with multiple sheets BNB 1 15,306 Jun-01-2017, 04:22 PM
Last Post: Larz60+

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020