Bottom Page

• 0 Vote(s) - 0 Average
• 1
• 2
• 3
• 4
• 5
 Alternative approach to iterate numerous linear regressions with xlsx data? john_538 Unladen Swallow Posts: 1 Threads: 1 Joined: Apr 2018 Reputation: 0 Likes received: 0 #1 Apr-07-2018, 10:15 PM I'm a self-trained, novice programmer, and need guidance to determine a more time-efficient way to analyze my data. I have a large dataset (200+ column sets, and each column set has two columns of x and y data, with roughly 166 rows), and want to complete linear regression tests on the x and y data within each set. The linear regression is an 11-row moving average, thus there are linear regression statistics (slope) for each row. This means a linear regression slope must be calculated for each day, and the adjacent 10 rows, for up to 166 linear regression per column set. When the difference between consecutive linear regression slopes is greater than or equal to a threshold (0.3), I'd like to identify and record the first occurrence. After recording/presenting the occurrence, I'd like to exit the analysis, and move on to the next column set to repeat the process. The code below identifies the difference in linear regression slopes between rows for just one column set, and took roughly 7 minutes to run (exiting in row 235). At this rate, I'll be running code for 24+ hours. I'm sure there's a more efficient way, perhaps outside of openpyxl to complete this many linear regressions and test for a threshold difference - I just have no idea what it is. I only started using openpyxl because the data were in .xlsx files when I received them, and Python because I used it during an undergraduate programming course. Any recommendations on code structure, alternative analysis methods or data format conversions would be helpful. If it's not already clear, my knowledge of coding is quite low. I greatly appreciate your patience in writing out what might otherwise be 'obvious' things everyone already knows. ```from __future__ import print_function from scipy import stats from openpyxl import load_workbook ######################################################################### wb2 = load_workbook(filename='data1.xlsx',read_only=True) print(wb2.sheetnames) ws2 = wb2['summary'] mean = 15 # x data column reference stddev = mean + 36 # y data column reference for j in range(164, 330): i=j x = [ws2.cell(row=i,column=mean).value for i in range(i,i+11)] #11 row x data y = [ws2.cell(row=i,column=stddev).value for i in range(i,i+11)] #11 row y data x2 = [ws2.cell(row=i,column=mean).value for i in range(i+1,i+12)] #11 row x data (starting at next row down) y2 = [ws2.cell(row=i,column=stddev).value for i in range(i+1,i+12)] #11 row y data (starting at next row down) slope, intercept, r_value, p_value, std_err = stats.linregress(x,y) slope2, intercept2, r_value2, p_value2, std_err2 = stats.linregress(x2,y2) thresholdslope=abs(slope2-slope) if thresholdslope>=0.3: threshold_value=ws2.cell(row=i+6,column=mean) threshold_date=ws2.cell(row=i+6,column=3) print("threshold slope=",thresholdslope) print("i=",i) print("i+6=",i+6) print("threshold value (i+6)",threshold_value.value) print("threshold date=",threshold_date.value) break ``` « Next Oldest | Next Newest »

Top Page

 Possibly Related Threads... Thread Author Replies Views Last Post How to build linear regression by implementing Gradient Descent using only linear alg PythonSpeaker 1 172 Dec-01-2019, 05:35 PM Last Post: Larz60+ Iterate over data and sum Madame32 8 785 Oct-14-2019, 05:54 PM Last Post: Madame32 Looking for Eval alternative scionsamurai 2 391 Jul-16-2019, 02:41 AM Last Post: scionsamurai [pandas]How to liner fit time series data and get linear fit equation and r square Sri 5 746 Apr-04-2019, 12:00 PM Last Post: Sri Thoughts on how to approach a project birdieman 10 4,084 Jul-09-2018, 09:02 AM Last Post: alexblack Write data into existing Excel (xlsx) file with multiple sheets BNB 1 10,673 Jun-01-2017, 04:22 PM Last Post: Larz60+

Forum Jump:

Users browsing this thread: 1 Guest(s)