Python Forum

Hello, I just joined this community to learn Python. I'm not a programmer or developer, so I'm learning Python from the beginning.

My question today is how to draw a consistent probability density function (PDF) plot regardless of sample size.

This is my code.

# Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as stats

# Data frame: 1000 simulated grain weights (mean 45, SD 9)
x = np.random.normal(45, 9, 1000)
source = {"Genotype": ["CV1"] * 1000, "AGW": x}
df = pd.DataFrame(source)

# Calculating the PDF from the sample mean and SD,
# evaluated only at the sorted sample values
agw_sorted = df["AGW"].sort_values()
df_mean = np.mean(df["AGW"])
df_std = np.std(df["AGW"])
pdf = stats.norm.pdf(agw_sorted, df_mean, df_std)

# Graph
plt.plot(agw_sorted, pdf, color="black")
plt.xlim([0, 90])
plt.xlabel("Grain weight (mg)", size=12)
plt.ylabel("Probability density", size=12)
plt.grid(True, alpha=0.3, linestyle="--")
plt.show()

And this is the resulting graph. However, when I change the sample size from 1000 to 100, e.g. x = np.random.normal(45, 9, 100), the shape of the graph changes.

[Image: l8ocw.png]

This is because a small sample cannot represent the full normal distribution. The same problem appears if you draw a normal distribution graph in Excel from a limited sample. Could you let me know how I can get a consistent normal distribution graph in Python? Regardless of sample size, I'd like to obtain the same shape of normal distribution curve for a given mean and standard deviation.

Could you provide some code for that?

Many thanks!!
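A minimal sketch of one common approach, assuming the goal is a smooth curve that depends only on the estimated mean and standard deviation: evaluate stats.norm.pdf on a fixed, evenly spaced grid of x values (here 500 points from 0 to 90, matching the plot's x-limits) instead of on the sorted sample values. The helper pdf_on_grid and its parameters are hypothetical names chosen for illustration.

```python
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

def pdf_on_grid(sample, x_min=0.0, x_max=90.0, n_points=500):
    """Evaluate a normal PDF fitted to `sample` on a fixed, evenly spaced grid.

    The grid does not depend on the sample, so the curve is always smooth
    and spans the whole plotting range; only the mean and SD come from the data.
    """
    grid = np.linspace(x_min, x_max, n_points)
    mean = np.mean(sample)
    std = np.std(sample, ddof=1)  # sample SD with n - 1 denominator, like R's sd()
    return grid, stats.norm.pdf(grid, mean, std)

rng = np.random.default_rng(0)
for n in (100, 1000):
    x = rng.normal(45, 9, n)
    grid, pdf = pdf_on_grid(x)
    plt.plot(grid, pdf, label=f"n = {n}")

plt.xlim([0, 90])
plt.xlabel("Grain weight (mg)", size=12)
plt.ylabel("Probability density", size=12)
plt.legend()
plt.grid(True, alpha=0.3, linestyle="--")
plt.show()
```

Both curves are smooth and span the full 0 to 90 range; they still differ slightly, because the mean and SD estimated from 100 points are noisier than those estimated from 1000.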

It's been a while since I've used this math (1980s), but I believe that you must apply normalization to the vertical axis.
This post seems to do that: https://stackoverflow.com/a/24920327
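As an illustration of normalizing the vertical axis (not necessarily what the linked answer does), Matplotlib's plt.hist(..., density=True) rescales the bars so that the histogram's total area is 1, which puts it on the same vertical scale as a fitted PDF curve. The bin count of 15 is an arbitrary choice.

```python
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.normal(45, 9, 100)

# density=True divides each count by (total count * bin width),
# so the bars have total area 1 and share the PDF's vertical scale.
heights, edges, _ = plt.hist(x, bins=15, density=True, alpha=0.5, color="grey")

# Overlay the fitted normal PDF, evaluated on an evenly spaced grid.
grid = np.linspace(0, 90, 500)
plt.plot(grid, stats.norm.pdf(grid, x.mean(), x.std(ddof=1)), color="black")

plt.xlim([0, 90])
plt.xlabel("Grain weight (mg)", size=12)
plt.ylabel("Probability density", size=12)
plt.show()
```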

In reply to Larz60+ (Mar-02-2022, 09:56 AM):

Thank you so much for the link. I looked through the code, and I think that if the sample size is very small, it's not possible to draw a meaningful PDF curve. This approach is more correct.

In R, I can draw the PDF curve regardless of sample size. In the R code below, even when the sample size is less than 10, the result is still a complete, smooth bell curve, because stat_function() evaluates dnorm across the whole x-axis range using the sample mean and SD. So I was looking for the same kind of function in Python.

I simply wanted a consistent PDF curve so I could show the full curve for a given mean and SD, but it's only an estimate. If the sample size is very small, I don't think I can present the PDF curve. In that sense, I think the Python approach is more correct. Anyhow, once the sample size is above about 30, the data seem to follow a normal distribution, so I'm not worried about the limited sample size.
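To see why a small sample gives only a rough estimate, a quick simulation can measure how much the estimated mean and SD vary between repeated samples of the same size. This is an illustrative sketch; the 2000 repetitions and the sample sizes tried are arbitrary choices. The spread of the estimated mean should come out close to 9 / sqrt(n), roughly 2.8 for n = 10 and 0.28 for n = 1000.

```python
import numpy as np

rng = np.random.default_rng(2)
true_mean, true_sd = 45, 9
n_repeats = 2000  # arbitrary number of simulated experiments per sample size

mean_spread, sd_spread = {}, {}
for n in (10, 30, 100, 1000):
    # Each row is one simulated sample of size n
    samples = rng.normal(true_mean, true_sd, size=(n_repeats, n))
    # Estimate the mean and SD from every sample, then measure how much
    # those estimates vary from one sample to the next
    mean_spread[n] = samples.mean(axis=1).std()
    sd_spread[n] = samples.std(axis=1, ddof=1).std()
    print(f"n = {n:4d}: spread of estimated mean = {mean_spread[n]:.2f}, "
          f"spread of estimated SD = {sd_spread[n]:.2f}")
```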

Thank you so much!! I learned a lot!!

library(ggplot2)

AGW <- rnorm(100, mean = 45, sd = 9)
Genotype <- rep("CV1", 100)

df <- data.frame(Genotype, AGW)

windows(width = 6, height = 5)  # open a 6 x 5 inch plot window (Windows only)

# stat_function() evaluates dnorm across the whole x-axis range,
# using the sample mean and SD as its parameters
ggplot() +
  stat_function(data = df, aes(x = AGW), color = "black", size = 1, fun = dnorm,
                args = list(mean = mean(df$AGW), sd = sd(df$AGW))) +
  scale_x_continuous(breaks = seq(0, 90, 10), limits = c(0, 90)) +
  scale_y_continuous(breaks = seq(0, 0.05, 0.01), limits = c(0, 0.05)) +
  labs(x = "Grain weight (mg)", y = "Probability density") +
  theme_grey(base_size = 15, base_family = "serif") +
  theme(axis.line = element_line(size = 0.5, colour = "black"))
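For comparison, a rough Matplotlib counterpart to the R snippet above might look like the following sketch. ggplot2's stat_function() evaluates the function at evenly spaced points (101 by default) across the x-axis range, so the sketch does the same with np.linspace(0, 90, 101). pandas' Series.std() uses the n - 1 denominator like R's sd(), whereas np.std defaults to ddof=0.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt

# Same simulated data as the R example: 100 values, mean 45, SD 9
agw = np.random.normal(45, 9, 100)
df = pd.DataFrame({"Genotype": ["CV1"] * 100, "AGW": agw})

# Like stat_function(): evaluate the normal density on evenly spaced points
# across the x-axis limits (0 to 90), not at the sample values themselves.
grid = np.linspace(0, 90, 101)
density = stats.norm.pdf(grid, df["AGW"].mean(), df["AGW"].std())  # ddof=1, like sd()

fig, ax = plt.subplots(figsize=(6, 5))  # 6 x 5 inches, as in windows()
ax.plot(grid, density, color="black", linewidth=1)
ax.set_xlim(0, 90)
ax.set_xticks(np.arange(0, 91, 10))
ax.set_ylim(0, 0.05)
ax.set_yticks(np.arange(0, 0.051, 0.01))
ax.set_xlabel("Grain weight (mg)")
ax.set_ylabel("Probability density")
plt.show()
```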

Glad to hear it.
