Alternative approach to iterate numerous linear regressions with xlsx data?

john_538 · Apr-07-2018, 10:15 PM

I'm a self-trained, novice programmer, and need guidance to determine a more time-efficient way to analyze my data. I have a large dataset (200+ column sets, and each column set has two columns of x and y data, with roughly 166 rows), and want to complete linear regression tests on the x and y data within each set. The linear regression is an 11-row moving average, thus there are linear regression statistics (slope) for each row. This means a linear regression slope must be calculated for each day, and the adjacent 10 rows, for up to 166 linear regression per column set. When the difference between consecutive linear regression slopes is greater than or equal to a threshold (0.3), I'd like to identify and record the first occurrence. After recording/presenting the occurrence, I'd like to exit the analysis, and move on to the next column set to repeat the process.

The code below identifies the difference in linear regression slopes between rows for just one column set, and took roughly 7 minutes to run (exiting in row 235). At this rate, I'll be running code for 24+ hours. I'm sure there's a more efficient way, perhaps outside of openpyxl to complete this many linear regressions and test for a threshold difference - I just have no idea what it is. I only started using openpyxl because the data were in .xlsx files when I received them, and Python because I used it during an undergraduate programming course.

Any recommendations on code structure, alternative analysis methods or data format conversions would be helpful. If it's not already clear, my knowledge of coding is quite low. I greatly appreciate your patience in writing out what might otherwise be 'obvious' things everyone already knows.

from __future__ import print_function
from scipy import stats
from openpyxl import load_workbook

#########################################################################

wb2 = load_workbook(filename='data1.xlsx',read_only=True)
print(wb2.sheetnames)
ws2 = wb2['summary']

mean = 15 # x data column reference
stddev = mean + 36 # y data column reference

for j in range(164, 330):
    i=j
    x = [ws2.cell(row=i,column=mean).value for i in range(i,i+11)] #11 row x data
    y = [ws2.cell(row=i,column=stddev).value for i in range(i,i+11)] #11 row y data
    x2 = [ws2.cell(row=i,column=mean).value for i in range(i+1,i+12)] #11 row x data (starting at next row down)
    y2 = [ws2.cell(row=i,column=stddev).value for i in range(i+1,i+12)] #11 row y data (starting at next row down)
    slope, intercept, r_value, p_value, std_err = stats.linregress(x,y)
    slope2, intercept2, r_value2, p_value2, std_err2 = stats.linregress(x2,y2)
    thresholdslope=abs(slope2-slope)

    
    if thresholdslope>=0.3:
        threshold_value=ws2.cell(row=i+6,column=mean)
        threshold_date=ws2.cell(row=i+6,column=3)
        print("threshold slope=",thresholdslope)
        print("i=",i)
        print("i+6=",i+6)
        print("threshold value (i+6)",threshold_value.value)
        print("threshold date=",threshold_date.value)

        break

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Thoughts on how to approach a project	birdieman	13	13,994	Feb-28-2023, 11:42 AM Last Post: get2sid
	What is the best approach to training a final model after cross validation?	amjass12	0	1,875	Jul-21-2021, 10:15 AM Last Post: amjass12
	extract tabe from nested dictionary, find generic approach,clever way	tonycat	4	2,752	Aug-31-2020, 09:22 AM Last Post: tonycat
	How to build linear regression by implementing Gradient Descent using only linear alg	PythonSpeaker	1	2,199	Dec-01-2019, 05:35 PM Last Post: Larz60+
	Iterate over data and sum	Madame32	8	3,647	Oct-14-2019, 05:54 PM Last Post: Madame32
	Looking for Eval alternative	scionsamurai	2	2,957	Jul-16-2019, 02:41 AM Last Post: scionsamurai
	[pandas]How to liner fit time series data and get linear fit equation and r square	Sri	5	3,835	Apr-04-2019, 12:00 PM Last Post: Sri
	Write data into existing Excel (xlsx) file with multiple sheets	BNB	1	15,359	Jun-01-2017, 04:22 PM Last Post: Larz60+

Alternative approach to iterate numerous linear regressions with xlsx data?

User Panel Messages

Announcements