Python Forum
How to draw the Probability Density Function (PDF) plot regardless of sample size?
#1
Hello, I just joined this community to learn Python. I'm not a programmer or developer. So, I'm learning Python from the beginning.

Today my question is how to draw a consistent Probability Density Function (PDF) plot regardless of the sample size.

This is my code.

# Library
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as stats

# Data frame
x = np.random.normal(45, 9, 1000)
source = {"Genotype": ["CV1"]*1000, "AGW": x}
df=pd.DataFrame(source)

# Calculating PDF
df_mean = np.mean(df["AGW"])
df_std = np.std(df["AGW"])
pdf = stats.norm.pdf(df["AGW"].sort_values(), df_mean, df_std)

# Graph
plt.plot(df["AGW"].sort_values(), pdf, color="black")
plt.xlim([0,90])
plt.xlabel("Grain weight (mg)", size=12)
plt.ylabel("Frequency", size=12)
plt.grid(True, alpha=0.3, linestyle="--")
plt.show()
[Image: 0yTHB.png]

This is the resulting graph. However, when I change the sample size from 1000 to 100, e.g. x = np.random.normal(45, 9, 100), the shape of the graph changes.

[Image: l8ocw.png]

This is because a small sample cannot represent the full normal distribution. If we draw a normal distribution graph in Excel with a limited sample size, we find the same problem. Could you let me know how I can get a consistent normal distribution graph in Python? Regardless of sample size, I'd like to obtain the same shape of normal distribution graph for a given mean and standard deviation.
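
To see why the shape changes, it may help to look at which x values the curve is actually evaluated at. The sketch below is not part of the original post. It assumes the same mean (45) and standard deviation (9), uses an arbitrarily seeded generator, and reports how far the sorted sample reaches and how large the gaps between neighbouring points are. The name `spans` is only illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, chosen only for repeatability

spans = {}
for n in (100, 1000):
    # The posted code evaluates the PDF only at the sorted sample values,
    # so these values decide where the plotted line starts, ends, and bends.
    x = np.sort(rng.normal(45, 9, n))
    gaps = np.diff(x)
    spans[n] = (float(x[0]), float(x[-1]), float(gaps.max()))
    print(f"n={n}: curve drawn from {x[0]:.1f} to {x[-1]:.1f}, "
          f"largest gap between points {gaps.max():.2f}")
```

A smaller sample usually covers a narrower range and leaves wider gaps in the tails, so the line plot ends early and shows straight segments there.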

Could you provide some code for that?

Many thanks!!
#2
It's been a while since I've used this math (1980's), but I believe that you must apply normalization to the vertical axis.
This post seems to do that: https://stackoverflow.com/a/24920327
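
One common reading of "normalization of the vertical axis" is to scale histogram bars so that their total area is 1, which puts them on the same scale as a PDF whatever the sample size. The sketch below illustrates that idea with `np.histogram(..., density=True)`. It is not taken from the linked answer, and the bin edges and seed are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
bins = np.linspace(0, 90, 19)   # 18 bins of width 5 (illustrative choice)

areas = {}
for n in (100, 1000):
    sample = rng.normal(45, 9, n)
    # Raw counts grow with the sample size ...
    counts, edges = np.histogram(sample, bins=bins)
    # ... while density=True rescales the bars so that their area sums to 1.
    density, _ = np.histogram(sample, bins=bins, density=True)
    areas[n] = float(np.sum(density * np.diff(edges)))
    print(f"n={n}: tallest bar {counts.max()} as a count, "
          f"{density.max():.4f} as a density, total area {areas[n]:.3f}")
```

With matplotlib, `plt.hist(sample, bins=bins, density=True)` draws the same normalized bars.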
#3
(Mar-02-2022, 09:56 AM)Larz60+ Wrote: It's been a while since I've used this math (1980's), but I believe that you must apply normalization to the vertical axis.
This post seems to do that: https://stackoverflow.com/a/24920327


Thank you so much for the link. I looked through the code, and I think that if the sample size is very small, it's not possible to draw a PDF curve. This approach is more correct.

In R, I can draw the PDF curve regardless of sample size. In the R code below, even when the sample size is less than 10, the PDF curve is the same, because R draws the full PDF curve for the given mean and standard deviation. So I was looking for the same function in Python.
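
A rough Python counterpart is to evaluate `stats.norm.pdf` on a fixed grid of x values instead of at the sample values themselves. The sketch below assumes the 0 to 90 axis range used in this thread. The helper names `normal_pdf_curve` and `plot_pdf_curve` are made up for the example, and `ddof=1` is chosen so that the standard deviation matches R's `sd()`.

```python
import numpy as np
from scipy import stats


def normal_pdf_curve(sample, x_min=0.0, x_max=90.0, n_points=500):
    """Return (x, pdf) for a normal curve fitted to `sample`, on a fixed grid."""
    sample = np.asarray(sample, dtype=float)
    mean = sample.mean()
    sd = sample.std(ddof=1)  # ddof=1 matches R's sd(); np.std defaults to ddof=0
    # The grid does not depend on the sample, so the curve always spans
    # the whole axis, however few observations there are.
    x = np.linspace(x_min, x_max, n_points)
    return x, stats.norm.pdf(x, loc=mean, scale=sd)


def plot_pdf_curve(x, pdf):
    """Draw the curve with the axis settings used in the thread."""
    import matplotlib.pyplot as plt  # imported here so the computation runs without it

    plt.plot(x, pdf, color="black")
    plt.xlim([0, 90])
    plt.xlabel("Grain weight (mg)", size=12)
    plt.ylabel("Density", size=12)
    plt.grid(True, alpha=0.3, linestyle="--")
    plt.show()


rng = np.random.default_rng(0)  # arbitrary seed
curves = {n: normal_pdf_curve(rng.normal(45, 9, n)) for n in (10, 100, 1000)}
for n, (x, pdf) in curves.items():
    print(f"n={n}: x from {x[0]:.0f} to {x[-1]:.0f}, peak density {pdf.max():.4f}")
```

Calling `plot_pdf_curve(*curves[10])` draws a smooth curve even for 10 observations. The estimated mean and standard deviation still vary from sample to sample, so the peak moves slightly. Passing the known values (45 and 9) to `stats.norm.pdf` directly would give exactly the same curve every time.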

I simply wanted to draw a consistent PDF curve to show the full curve for a given mean and standard deviation, but it is only an estimate. If the sample size is very small, I think I can't present the PDF curve. In that respect, I think the Python code for the PDF is more correct. Anyhow, if the sample size is more than 30, the data seem to follow a normal distribution, so I won't worry about a limited sample size.

Thank you so much!! I learned a lot!!

library(ggplot2)

AGW<-rnorm(100, mean=45, sd=9)
Genotype<-c(rep("CV1",100))

df<- data.frame (Genotype, AGW)

ggplot () +
  stat_function(data=df, aes(x=AGW), color="Black", size=1, fun = dnorm, 
                args = c(mean = mean(df$AGW), sd = sd(df$AGW))) + 
  scale_x_continuous(breaks = seq(0,90,10),limits = c(0,90)) + 
  scale_y_continuous(breaks = seq(0,0.05,0.01), limits = c(0,0.05)) +
  labs(x="Grain weight (mg)", y="Frequency") +
  theme_grey(base_size=15, base_family="serif")+
  theme(axis.line= element_line(size=0.5, colour="black")) +
  windows(width=6, height=5)
[Image: 4d8vW.jpg]
#4
Glad to hear it.