Thread: Python Kolmogorov Test (Python Forum, https://python-forum.io, Forum: Data Science)

Python Kolmogorov Test - asahdkhaled - Nov-23-2017

Important: I also asked this question on Cross Validated, but they refused to answer because it didn't seem to be mathematical (although I think it is). So if there is a problem with my question, feel free to ask and I will gladly give more details :)

Can someone tell me how to format the code here? The T symbol isn't working for me.

I have a column with continuous values. I want to find out which distribution describes my column best, for example whether my column is normally distributed. I see 4 different approaches, all shown in the code. Let's assume that output.values() holds the values I want to use for the kstest.

Which of the possible approaches do you think is correct (also for other distributions, such as uniform)? Or what is your preferred way to do it?

My general question: how should I determine the distribution of my column, by sampling from a candidate distribution and running "ks_2samp", or by testing the column directly against a specific distribution with "kstest"?

Here is the formatted code:

    var, std, mean, length = np.var(output.values()), np.std(output.values()), np.mean(output.values()), len(output.values())
    mini, maxi = min(output.values()), max(output.values())
    a, b = (min(output.values()) - mean) / std, (max(output.values()) - mean) / std
    uniform, norm2 = np.random.uniform(mini, maxi, length), np.random.normal(mean, std, length)
    loc, scale = n.fit(output.values())
    n = norm(loc=loc, scale=scale)
    #possibility 1: ks_2samp
    print ks_2samp(output.values(),norm2)
    #possibility 2: kstest vs n.cdf()
    print kstest(output.values(), n.cdf)
    #possibility 3:kstest vs. 'norm'
    print kstest(output.values(), 'norm')
    #possibility 4: kstest vs. 'norm' with parameters
    print kstest(output.values(),'norm', (mean,std))

RE: Python Kolmogorov Test - heiner55 - Nov-25-2017

It is difficult to help you, because we cannot test your program without sample data.

RE: Python Kolmogorov Test - asahdkhaled - Nov-25-2017

OK, here is the same code, just with example data :)

    data= [36, 22, 24, 21, 22, 18, 14, 24, 28, 8, 22, 16, 16, 26, 17, 24, 24, 14, 15, 24, 21, 20, 19, 17, 13, 13, 17, 30, 17, 11, 45, 15, 19, 21, 15, 13, 14, 16, 25, 21]
    var, std, mean, length = np.var(data), np.std(data), np.mean(data), len(data)
    mini, maxi = min(data), max(data)
    a, b = (min(data) - mean) / std, (max(data) - mean) / std
    uniform, norm2 = np.random.uniform(mini, maxi, length), np.random.normal(mean, std, length)
    loc, scale = n.fit(data)
    n = norm(loc=loc, scale=scale)
    #possibility 1: ks_2samp
    print ks_2samp(data,norm2)
    #possibility 2: kstest vs n.cdf()
    print kstest(data, n.cdf)
    #possibility 3:kstest vs. 'norm'
    print kstest(data, 'norm')
    #possibility 4: kstest vs. 'norm' with parameters
    print kstest(data,'norm', (mean,std))

RE: Python Kolmogorov Test - Larz60+ - Nov-25-2017

    loc, scale = n.fit(data)
    n = norm(loc=loc, scale=scale)

No value is specified for n in the first step.

RE: Python Kolmogorov Test - asahdkhaled - Nov-25-2017

    from scipy.stats import norm as n
    import numpy as np
    from scipy.stats import *
    data= [36, 22, 24, 21, 22, 18, 14, 24, 28, 8, 22, 16, 16, 26, 17, 24, 24, 14, 15, 24, 21, 20, 19, 17, 13, 13, 17, 30, 17, 11, 45, 15, 19, 21, 15, 13, 14, 16, 25, 21]
    var, std, mean, length = np.var(data), np.std(data), np.mean(data), len(data)
    mini, maxi = min(data), max(data)
    a, b = (min(data) - mean) / std, (max(data) - mean) / std
    uniform, norm2 = np.random.uniform(mini, maxi, length), np.random.normal(mean, std, length)
    loc, scale = n.fit(data)
    n_array = norm(loc=loc, scale=scale)
    #possibility 1: ks_2samp
    print ks_2samp(data,norm2)
    #possibility 2: kstest vs n.cdf()
    print kstest(data, n_array.cdf)
    #possibility 3:kstest vs. 'norm'
    print kstest(data, 'norm')
    #possibility 4: kstest vs. 'norm' with parameters
    print kstest(data,'norm', (mean,std))

Ah yeah, sorry. The first n is just the norm package from scipy. The second n is just a variable. I renamed it to n_array.

RE: Python Kolmogorov Test - Larz60+ - Nov-25-2017

It still must be defined and have an initial value.

RE: Python Kolmogorov Test - asahdkhaled - Nov-25-2017

I don't get what you mean. The code is working; n_array gets filled by the new norm distribution. My question is just how best to apply the KS test. There are 4 possibilities; which one is the best?

RE: Python Kolmogorov Test - Larz60+ - Nov-25-2017

It fails when I try to run it, on:

    loc, scale = n.fit(data)

n is not defined (at least not in the snippet you provide) at this point.

RE: Python Kolmogorov Test - asahdkhaled - Nov-26-2017

OK, strange. For me it's working like that. What Python version do you have?

RE: Python Kolmogorov Test - heiner55 - Nov-26-2017

Your program runs well on my PC:

    #!/usr/bin/python3
    from scipy.stats import norm as n
    import numpy as np
    from scipy.stats import *
    data= [36, 22, 24, 21, 22, 18, 14, 24, 28, 8, 22, 16, 16, 26, 17, 24, 24, 14, 15, 24, 21, 20, 19, 17, 13, 13, 17, 30, 17, 11, 45, 15, 19, 21, 15, 13, 14, 16, 25, 21]
    var, std, mean, length = np.var(data), np.std(data), np.mean(data), len(data)
    mini, maxi = min(data), max(data)
    a, b = (min(data) - mean) / std, (max(data) - mean) / std
    uniform, norm2 = np.random.uniform(mini, maxi, length), np.random.normal(mean, std, length)
    loc, scale = n.fit(data)
    n_array = norm(loc=loc, scale=scale)
    #possibility 1: ks_2samp
    print(ks_2samp(data,norm2))
    #possibility 2: kstest vs n.cdf()
    print(kstest(data, n_array.cdf))
    #possibility 3:kstest vs. 'norm'
    print(kstest(data, 'norm'))
    #possibility 4: kstest vs. 'norm' with parameters
    print(kstest(data,'norm', (mean,std)))
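
The question of how the four possibilities relate to each other is still open at this point in the thread. The following is a minimal Python 3 sketch, not a definitive answer, that reruns them on the example data with explicit imports. It relies on two properties of SciPy: `kstest(data, 'norm')` without extra arguments compares against the standard normal N(0, 1), and `norm.fit` returns the maximum-likelihood estimates, which coincide with `np.mean` and `np.std` (default `ddof=0`). The seed value and the result names (`res_two_sample`, `res_frozen`, `res_standard`, `res_args`) are assumptions made for illustration and do not come from the thread.

```python
import numpy as np
from scipy.stats import norm, kstest, ks_2samp

data = [36, 22, 24, 21, 22, 18, 14, 24, 28, 8, 22, 16, 16, 26, 17, 24, 24, 14,
        15, 24, 21, 20, 19, 17, 13, 13, 17, 30, 17, 11, 45, 15, 19, 21, 15, 13,
        14, 16, 25, 21]

mean, std = np.mean(data), np.std(data)   # np.std uses ddof=0 by default
loc, scale = norm.fit(data)               # maximum-likelihood fit of a normal

# Possibility 1: two-sample test against one random draw from N(mean, std).
# The result changes with the draw; the seed here is an arbitrary assumption.
rng = np.random.default_rng(0)
res_two_sample = ks_2samp(data, rng.normal(mean, std, len(data)))

# Possibility 2: one-sample test against the CDF of the fitted (frozen) normal.
res_frozen = kstest(data, norm(loc=loc, scale=scale).cdf)

# Possibility 3: one-sample test against the standard normal N(0, 1),
# because no location or scale is passed.
res_standard = kstest(data, 'norm')

# Possibility 4: one-sample test against N(mean, std) via the args tuple.
res_args = kstest(data, 'norm', args=(mean, std))

for label, res in [("1: ks_2samp vs. random draw", res_two_sample),
                   ("2: kstest vs. fitted cdf", res_frozen),
                   ("3: kstest vs. N(0, 1)", res_standard),
                   ("4: kstest vs. N(mean, std)", res_args)]:
    print(f"{label}: D={res.statistic:.4f}, p={res.pvalue:.4g}")
```

Under these assumptions, possibilities 2 and 4 test against the same distribution and return the same statistic, possibility 3 rejects any data that is not centred near 0 with unit spread, and possibility 1 varies from run to run with the random draw. The KS p-value assumes a fully specified reference distribution, so it should be read as approximate when the parameters are estimated from the same data being tested.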