Python Forum
Determining Beta distribution parameters (alpha, beta) using CDF
#1
Hi all,
I am trying to find the Beta distribution parameters (alpha, beta) by fitting a CDF curve that goes through two points. Let's say the points are (x1, p1) and (x2, p2), where x1, x2 are points on the x-axis and p1, p2 are the corresponding probabilities on the y-axis. Instead of the standard 0-1 x-axis scale, I am using a scale of 1-100.

My goal is to input the two points and get the values of the alpha and beta parameters.

Your help is much appreciated.
#2
Hi,
Note: if you want to fit distribution parameters to data, the rv_continuous base class provides a helper method, .fit, that finds maximum likelihood estimates of the distribution parameters.
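For comparison, here is a minimal sketch of the .fit route, which applies when you have raw observations rather than two CDF points. The sample is synthetic (drawn from Beta(2, 5) stretched to [0, 100]) and exists only for illustration; passing floc=0 and fscale=100 fixes the support, so only a and b are estimated.

```python
import numpy as np
from scipy.stats import beta

# Synthetic, illustrative data: 5,000 draws from Beta(2, 5) stretched to [0, 100].
rng = np.random.default_rng(0)
sample = beta.rvs(2, 5, scale=100, size=5000, random_state=rng)

# Maximum likelihood fit. floc=0 and fscale=100 hold loc and scale fixed,
# so .fit estimates only the shape parameters a and b.
a_hat, b_hat, loc, scale = beta.fit(sample, floc=0, fscale=100)
print(a_hat, b_hat, loc, scale)
```

The estimates should land close to 2 and 5. With only two target points, as in the question, there is no sample to feed to .fit.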
Your case is slightly different: it is a classical curve-fitting problem, which can easily be solved with SciPy.

from scipy.stats import beta
import numpy as np
from scipy.optimize import minimize


def loss_func(x, point1=(20, 0.2), point2=(50, 0.5)):
    a, b = x[0], x[1]
    # scale=100 stretches the beta support from [0, 1] to [0, 100]; tweak it for your needs.
    return (beta.cdf(point1[0], a, b, scale=100) - point1[-1])**2 + (beta.cdf(point2[0], a, b, scale=100) - point2[-1])**2

# We want the fitted CDF to pass through point1 and point2.
result = minimize(loss_func, x0=np.array([0.5, 0.5]))
print(result.x)  # the fitted [a, b]
I got the following result:

a, b = 1.00000141, 1.00000565
beta.cdf(50, a, b, scale=100)
Output:
0.5000014694684272
beta.cdf(20, a, b, scale=100)
Output:
0.20000055474589082
OK, that is quite a good estimate.
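As a quick sanity check on the numbers above, this sketch evaluates the CDF at both x-values in one vectorized call. It also shows why the optimizer landed near a = b = 1: Beta(1, 1) stretched to [0, 100] is the uniform distribution, whose CDF is x / 100, so it passes exactly through (20, 0.2) and (50, 0.5).

```python
import numpy as np
from scipy.stats import beta

xs = np.array([20, 50])         # x-coordinates of point1 and point2
targets = np.array([0.2, 0.5])  # CDF values we want at those x-coordinates

# Parameters reported by the minimization above.
fitted = beta.cdf(xs, 1.00000141, 1.00000565, scale=100)

# Beta(1, 1) on [0, 100] is uniform with CDF x / 100: the exact solution here.
exact = beta.cdf(xs, 1, 1, scale=100)

print(fitted - targets)  # residuals of the fitted parameters
print(exact - targets)   # residuals of the exact solution (essentially zero)
```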
#3
Hello Scidam,
I can't express my gratitude enough to you. Thank you so much for your help.
I am sorry for the late response.

I have a question about line 12: minimize(loss_func, x0=np.array([0.5, 0.5]))
How does this line of code fit the CDF so that it passes through the two points, and why are the array values [0.5, 0.5]?

Thank you so much. Please help me.
#4
Hello Scidam,
I also have some questions about the return statement. If you could explain them, that would be very helpful.
Line 9: return (beta.cdf(point1[0], a, b, scale=100) - point1[-1])**2 + (beta.cdf(point2[0], a, b, scale=100) - point2[-1])**2

Q1: Why have you subtracted the point1[-1] / point2[-1] values?
Q2: Why do we have to square the values?
Q3: What is the purpose of adding the two values?

Thank you so much for your help.
Take care
#5
(Apr-24-2019, 11:21 PM) fr2019 Wrote: Why have you subtracted the point1[-1] / point2[-1] values?

We are minimizing loss_func, which is a sum of squares, so it is always greater than or equal to 0; it can never be negative.
The best case is loss_func = 0. If loss_func = 0, then (beta.cdf(point1[0], a, b, scale=100) - point1[-1])**2 = 0 and (beta.cdf(point2[0], a, b, scale=100) - point2[-1])**2 = 0. That means
beta.cdf(point1[0], a, b, scale=100) = point1[-1] and beta.cdf(point2[0], a, b, scale=100) = point2[-1], i.e. the beta CDF passes through both points, point1 and point2. The index [-1] denotes the last element of a sequence (a list or tuple in Python). Since len(point1) = 2, point1[-1] is the same as point1[1] (the same is true for point2): the last element of a length-2 sequence is its second element. You can replace [-1] with [1] (remember that Python uses 0-based indexing, so [1] is the second element) and nothing will change.
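Both claims can be checked directly: the loss evaluates to (numerically) zero at the exact solution a = b = 1 and is positive for other parameters, and [-1] and [1] pick the same element of a pair. This small sketch restates loss_func from the code above (split into two error terms, same result); the trial parameters (2, 5) are an arbitrary choice for illustration.

```python
import numpy as np
from scipy.stats import beta


def loss_func(x, point1=(20, 0.2), point2=(50, 0.5)):
    a, b = x[0], x[1]
    err1 = beta.cdf(point1[0], a, b, scale=100) - point1[-1]  # miss at point1
    err2 = beta.cdf(point2[0], a, b, scale=100) - point2[-1]  # miss at point2
    return err1**2 + err2**2


point1 = (20, 0.2)
# For a length-2 tuple, the last element ([-1]) is the second element ([1]).
print(point1[-1] == point1[1])  # True

# At a = b = 1 the beta CDF passes through both points, so the loss is (numerically) zero.
loss_at_solution = loss_func(np.array([1.0, 1.0]))
# Other parameters, e.g. a = 2, b = 5, miss the points and give a positive loss.
loss_elsewhere = loss_func(np.array([2.0, 5.0]))
print(loss_at_solution, loss_elsewhere)
```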

(Apr-24-2019, 11:21 PM) fr2019 Wrote: Q2: Why do we have to square the values?
Squaring makes each term non-negative, so the sum can be zero only if both conditions, beta.cdf(point1[0], a, b, scale=100) = point1[-1]
and beta.cdf(point2[0], a, b, scale=100) = point2[-1], are met at the same time. Without squaring, a positive error at one point could cancel a negative error at the other. This turns the problem
of solving two nonlinear equations into an optimization (minimization) problem.
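Because there are exactly two equations and two unknowns, the system can also be handed to a root finder instead of being minimized. The sketch below swaps in scipy.optimize.fsolve (a different technique from the minimize call above) and wraps it in a helper. The name fit_beta_two_points is hypothetical, the log-parametrization that keeps a and b positive is an assumption of this sketch, and the test targets are generated from Beta(2, 5) only so the answer can be checked.

```python
import numpy as np
from scipy.optimize import fsolve
from scipy.stats import beta


def fit_beta_two_points(point1, point2, scale=100, x0=(1.0, 2.0)):
    """Hypothetical helper: solve for (a, b) so that the beta CDF on [0, scale]
    passes through point1 and point2, each given as (x, p)."""
    xs = np.array([point1[0], point2[0]], dtype=float)
    ps = np.array([point1[1], point2[1]], dtype=float)

    def residuals(log_ab):
        # Search over log(a), log(b) so that a and b always stay positive.
        a, b = np.exp(log_ab)
        return beta.cdf(xs, a, b, scale=scale) - ps

    log_solution = fsolve(residuals, x0=np.log(x0))
    return tuple(np.exp(log_solution))


# Targets generated from a known Beta(2, 5) on [0, 100], so the answer can be checked.
p20, p50 = beta.cdf([20, 50], 2, 5, scale=100)
a, b = fit_beta_two_points((20, p20), (50, p50))
print(a, b)  # should be close to 2 and 5
```

Calling fit_beta_two_points((20, 0.2), (50, 0.5)) on the original points should recover a = b = 1.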

(Apr-24-2019, 07:23 PM) fr2019 Wrote: How does this line of code fit the CDF so that it passes through the two points, and why are the array values [0.5, 0.5]?
(Apr-24-2019, 11:21 PM) fr2019 Wrote: Q3: What is the purpose of adding the two values?

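The array passed as x0 is the starting point for the iterative process that searches for the minimum of loss_func: minimize repeatedly adjusts a and b, starting from x0, until it converges. These values can be chosen fairly arbitrarily, but it is better if they are close to the exact (unknown) a, b values.
I chose (0.5, 0.5); you could try (1.0, 2.0), for example.
As for adding the two terms: the sum combines both conditions into a single number to minimize, and that number is zero only when both conditions hold (see the answer to Q2 above).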
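To see how little the starting point matters here, this sketch runs the same minimization from (0.5, 0.5), the suggested (1.0, 2.0), and one more arbitrary start. It differs from the original code in two assumptions: bounds keep a and b positive (which makes SciPy use the L-BFGS-B method), because beta.cdf returns nan for non-positive shape parameters, and a tight tol is passed because the loss is tiny near the solution.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import beta


def loss_func(x, point1=(20, 0.2), point2=(50, 0.5)):
    a, b = x[0], x[1]
    err1 = beta.cdf(point1[0], a, b, scale=100) - point1[-1]
    err2 = beta.cdf(point2[0], a, b, scale=100) - point2[-1]
    return err1**2 + err2**2


# Lower bounds keep a and b strictly positive during the search (an assumption
# of this sketch; the original call has no bounds). With bounds, minimize uses L-BFGS-B.
bounds = [(1e-3, None), (1e-3, None)]

results = {}
for start in [(0.5, 0.5), (1.0, 2.0), (2.0, 1.0)]:
    res = minimize(loss_func, x0=np.array(start), bounds=bounds, tol=1e-12)
    results[start] = res.x
    print(start, "->", res.x, "loss:", res.fun)
```

If the runs agree (here they should all end near a = b = 1), the starting point is not critical; if they disagree, starting closer to a plausible answer is the usual remedy.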
#6
Hello Scidam,
Thank you so much for your great explanation. This helped me a lot. Please keep in touch.

Thanks again. Take care.

