Hey guys,
I am facing the following problem:
I am analysing a fairly large dataset consisting of 100*16*3 data points. I want to know its probability distribution and have therefore performed a kernel density estimation (KDE). The estimated density appears to be bimodal, i.e. the PDF looks like the sum of two Gaussians. I have plotted my estimated PDF and fitted a bimodal distribution to it, and the two curves appear to be identical!
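A minimal sketch of this kind of setup, for readers who want something to run. Everything except the sample count is an assumption made for illustration: the synthetic data, the seed, the equal mixture weights, the grid limits, and the use of `scipy.stats.gaussian_kde` with its default bandwidth. The names `pdf` and `x_grid` are chosen to match the snippet further down.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(0)
n = 100 * 16 * 3  # sample count from the post

# Assumption: synthetic stand-in data, an equal-weight mixture of
# N(0, 0.2) and N(1, 0.2). The real data set is not shown in the post.
from_second = rng.random(n) < 0.5
samples = np.where(from_second,
                   rng.normal(1.0, 0.2, n),
                   rng.normal(0.0, 0.2, n))

# Evaluate the kernel density estimate on a regular grid.
x_grid = np.linspace(-1.0, 2.0, 601)
pdf = gaussian_kde(samples)(x_grid)

# Two-Gaussian mixture evaluated on the same grid, for visual comparison.
mixture_pdf = (0.5 * norm(loc=0, scale=0.2).pdf(x_grid)
               + 0.5 * norm(loc=1, scale=0.2).pdf(x_grid))

# Sanity check: both curves should integrate to roughly 1 over the grid.
dx = x_grid[1] - x_grid[0]
area_kde = pdf.sum() * dx
area_mix = mixture_pdf.sum() * dx
print(area_kde, area_mix)
```

Note that the area check multiplies by the grid spacing `dx`, which matters again when building a CDF from a gridded PDF.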
Now I want to test this hypothesis. I want to perform a Kolmogorov-Smirnov test to support my hypothesis that the estimated PDF follows a bimodal distribution. To do this, I have written the following code (please be gentle here, my coding skills still need practice):
    print("\nInformation about Hypothesis test:")
    print("performed test: Kolmogorov-Smirnov")
    print("null hypothesis: the estimated pdf is bimodal distributed with mean1=0, sigma1=0.2, mean2=1, sigma2=0.2")
    alpha = 0.05  # 5% significance level
    cdf = np.cumsum(pdf)/len(x_grid)
    hypoDist = 0.6*norm(loc=0, scale=0.2).pdf(x_grid) + 0.3*norm(loc=1, scale=0.2).pdf(x_grid)
    hypoCdf = np.cumsum(hypoDist)/len(x_grid)
    d_u = abs(cdf - hypoCdf)  # upper limit
    d_l = np.zeros(len(cdf))
    for idx in range(0, len(cdf)):
        if idx == 0:
            d_l[idx] = 0
        else:
            d_l[idx] = abs(cdf[idx-1] - hypoCdf[idx])  # lower limit
    # 100*16*3 is a magic number that equals the number of samples
    # from which we generated the estimated pdf
    d_crit = np.sqrt(-0.5*np.log(alpha/2))/np.sqrt(100*16*3)
    d_u_max = np.max(d_u)
    d_l_max = np.max(d_l)
    if d_u_max > d_crit or d_l_max > d_crit:
        print("result: null hypothesis rejected")
    else:
        print("result: failed to reject null hypothesis")

Unfortunately, I am pretty sure there is an error in this code, since it rejects my null hypothesis although the plots really seem identical! I am wondering: is my computation of the cumulative distribution function correct?
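For comparison, here is a hedged sketch of how such a check might be set up. Two things stand out in the snippet above. First, the mixture weights 0.6 and 0.3 sum to 0.9, so the hypothesised CDF cannot reach 1. Second, `np.cumsum(pdf)/len(x_grid)` only approximates the integral if the grid happens to span a length of about 1; in general the cumulative sum has to be multiplied by the grid spacing. The sketch also applies the test to the raw samples, since the Kolmogorov-Smirnov statistic is defined on the empirical CDF rather than on a smoothed KDE curve. The synthetic samples, the seed, the equal weights, the grid limits, and the helper name `mixture_cdf` are all assumptions for illustration only.

```python
import numpy as np
from scipy.stats import kstest, norm


def mixture_cdf(x, w1=0.5, mu1=0.0, s1=0.2, mu2=1.0, s2=0.2):
    """CDF of a two-Gaussian mixture; the weights w1 and (1 - w1) sum to 1."""
    return (w1 * norm.cdf(x, loc=mu1, scale=s1)
            + (1.0 - w1) * norm.cdf(x, loc=mu2, scale=s2))


rng = np.random.default_rng(0)
n = 100 * 16 * 3  # sample count from the post

# Assumption: synthetic stand-in for the real samples.
from_first = rng.random(n) < 0.5
samples = np.where(from_first,
                   rng.normal(0.0, 0.2, n),
                   rng.normal(1.0, 0.2, n))

# One-sample KS test of the raw samples against the hypothesised CDF.
result = kstest(samples, mixture_cdf)

# Same asymptotic critical value as in the snippet above (alpha = 0.05).
alpha = 0.05
d_crit = np.sqrt(-0.5 * np.log(alpha / 2)) / np.sqrt(n)
print("D =", result.statistic, "p =", result.pvalue, "D_crit =", d_crit)

# If a CDF on a grid is still needed, scale the cumulative sum by the spacing.
x_grid = np.linspace(-1.0, 2.0, 601)
dx = x_grid[1] - x_grid[0]
hypo_pdf = 0.5 * norm.pdf(x_grid, 0.0, 0.2) + 0.5 * norm.pdf(x_grid, 1.0, 0.2)
hypo_cdf = np.cumsum(hypo_pdf) * dx

# With weights 0.6 and 0.3 the gridded CDF levels off near 0.9 instead of 1.
bad_pdf = 0.6 * norm.pdf(x_grid, 0.0, 0.2) + 0.3 * norm.pdf(x_grid, 1.0, 0.2)
bad_cdf = np.cumsum(bad_pdf) * dx
print("last CDF values:", hypo_cdf[-1], bad_cdf[-1])
```

One caveat: if the mixture parameters were fitted to the same data that is being tested, the standard KS critical value and p-value are no longer strictly valid and tend to be too lenient. With several thousand samples the test is also sensitive to deviations that are invisible in a plot.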
I would be so grateful for any help in this matter!