Central Limit Theorem
The Central Limit Theorem (CLT) is a fundamental result in probability and statistics that describes the behavior of the average of a large number of independent, identically distributed random variables.
Definition
The CLT states that for a population with a distribution of any shape, as long as the population has a finite mean $\mu$ and a finite standard deviation $\sigma$, the sampling distribution of the sample mean approaches a normal distribution with mean $\mu$ and variance $\sigma^2/n$ as the sample size $n$ increases.
Mathematical Expression
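The theorem can be expressed mathematically as follows:
$$\bar{X}_n \approx \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{for large } n$$
or, equivalently, in standardized form:
$$\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1) \quad \text{as } n \to \infty$$
where:
- $\bar{X}_n$ is the sample mean of the $n$ samples.
- $\mu$ is the population mean.
- $\sigma^2$ is the population variance.
- $n$ is the sample size.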
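As a worked instance of the normal approximation for the sample mean, the sketch below computes the approximate probability that a sample mean falls within 2 units of the population mean. The values of `mu`, `sigma`, and `n` are illustrative assumptions, not data from any real study, and the result is only as good as the approximation itself.

```python
from math import sqrt
from statistics import NormalDist

# Illustrative values (assumptions, not taken from a real dataset)
mu = 50.0      # population mean
sigma = 10.0   # population standard deviation
n = 25         # sample size

# Under the CLT approximation, the sample mean is roughly N(mu, sigma^2 / n),
# so its standard deviation (the standard error) is sigma / sqrt(n).
standard_error = sigma / sqrt(n)
sample_mean_dist = NormalDist(mu=mu, sigma=standard_error)

# Approximate probability that the sample mean lands within 2 units of mu
prob_within_2 = sample_mean_dist.cdf(mu + 2) - sample_mean_dist.cdf(mu - 2)

print(f"standard error: {standard_error:.3f}")
print(f"P(|sample mean - mu| <= 2) is approximately {prob_within_2:.4f}")
```

With these numbers the standard error is 2, so the interval is one standard error on each side of the mean and the probability is the familiar value of about 0.68.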
Significance
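- Normalization: Regardless of the shape of the population distribution (provided its variance is finite), the distribution of the sample mean will be approximately normal if the sample size is sufficiently large.
- Predictability: It justifies normal-based confidence intervals and hypothesis tests for population means, even when the population distribution itself is unknown.
- Error Reduction: As the sample size increases, the standard error (SE), equal to $\sigma/\sqrt{n}$, decreases, leading to more precise estimates. Because of the square root, halving the SE requires quadrupling the sample size.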
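The error-reduction point can be checked numerically. The following is a minimal sketch, assuming a normal population with mean 50 and standard deviation 10 (illustrative values); `empirical_se` is a helper introduced here for the illustration. It compares the observed spread of simulated sample means with the standard deviation divided by the square root of the sample size.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

sigma = 10.0         # assumed population standard deviation
num_samples = 5000   # number of simulated samples per sample size

def empirical_se(n):
    """Standard deviation of the sample means over many simulated samples of size n."""
    samples = rng.normal(loc=50.0, scale=sigma, size=(num_samples, n))
    return samples.mean(axis=1).std(ddof=1)

# Quadrupling n should roughly halve the standard error
for n in (25, 100, 400):
    theoretical = sigma / np.sqrt(n)
    print(f"n={n:4d}  empirical SE={empirical_se(n):.3f}  sigma/sqrt(n)={theoretical:.3f}")
```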
Applications
- Polling and Surveys: Estimating population parameters such as voting intentions or consumer preferences from samples.
- Quality Control: Monitoring manufacturing processes where parameters like weight or volume are measured and controlled.
- Finance: Estimating the mean returns of different financial instruments to optimize investment portfolios.
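To make the polling application concrete, here is a hedged sketch of a normal-approximation confidence interval for a proportion. The true support level `true_support` and the number of respondents `n` are hypothetical, and the responses are simulated rather than real survey data.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the run is repeatable

true_support = 0.52  # hypothetical true share of voters backing a candidate
n = 1000             # hypothetical number of respondents

# Simulate yes/no answers: True with probability true_support
responses = rng.random(n) < true_support

# The sample proportion is a sample mean of 0/1 values, so the CLT applies to it
p_hat = responses.mean()
se = np.sqrt(p_hat * (1 - p_hat) / n)

# Approximate 95% confidence interval using the normal quantile 1.96
z = 1.96
ci_low, ci_high = p_hat - z * se, p_hat + z * se

print(f"estimated support: {p_hat:.3f}")
print(f"approximate 95% CI: ({ci_low:.3f}, {ci_high:.3f})")
```

With about 1,000 respondents the margin of error comes out near 3 percentage points, which matches the figure commonly quoted for polls of that size.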
Limitations
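- Small Samples: The CLT is an asymptotic result, so the normal approximation can be poor for small samples, especially if the population distribution is heavily skewed or heavy-tailed.
- Dependent Observations: The classical theorem assumes that the observations are independent. When this assumption fails, for example with strongly autocorrelated measurements, the classical CLT does not apply directly.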
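The small-sample limitation can be seen in a simulation. The sketch below assumes an exponential population (chosen here only because it is strongly right-skewed) and measures how skewed the distribution of the sample mean remains at different sample sizes; `skewness` and `sample_mean_skewness` are helpers introduced for this illustration. For this particular population the skewness of the sample mean is known to be $2/\sqrt{n}$, so it fades slowly.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed so the run is repeatable
num_samples = 20000             # simulated samples per sample size

def skewness(x):
    """Sample skewness: third central moment divided by the cubed standard deviation."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5

def sample_mean_skewness(n):
    """Skewness of the sample mean for samples of size n from an exponential population."""
    samples = rng.exponential(scale=1.0, size=(num_samples, n))
    return skewness(samples.mean(axis=1))

# A normal distribution has skewness 0; small n leaves the sample mean visibly skewed
for n in (5, 30, 200):
    print(f"n={n:3d}  skewness of sample means={sample_mean_skewness(n):.3f}  2/sqrt(n)={2 / np.sqrt(n):.3f}")
```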
Example Code in Python
To illustrate the CLT, consider the following Python code that simulates the distribution of the sample mean:
import numpy as np
import matplotlib.pyplot as plt
# Define population parameters
population_mean = 50
population_std_dev = 10
sample_size = 30
num_samples = 1000
# Generate sample means
sample_means = [np.mean(np.random.normal(population_mean, population_std_dev, sample_size)) for _ in range(num_samples)]
# Plot distribution of sample means
plt.hist(sample_means, bins=30, color='blue', edgecolor='black', alpha=0.7)
plt.title('Distribution of Sample Means')
plt.xlabel('Sample Mean')
plt.ylabel('Frequency')
plt.show()
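This code draws 1,000 samples of size 30 from a normal distribution, computes their means, and plots a histogram of those means. The histogram is bell-shaped and centered near 50, with a spread close to the standard error 10 / sqrt(30) ≈ 1.83. Because the population here is itself normal, the sample mean is exactly normally distributed for any sample size; the substance of the Central Limit Theorem is that the same bell shape also emerges for non-normal populations once the sample size is large enough.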
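A variation on the simulation draws from a clearly non-normal population, which exercises the theorem more directly. The following is a minimal sketch, assuming an exponential population with mean 10 (and therefore standard deviation 10); the variable names and parameter values are illustrative choices. It skips the plot and checks numerically that the sample means behave like a normal variable with the predicted mean and standard error.

```python
import numpy as np

rng = np.random.default_rng(3)  # fixed seed so the run is repeatable

# Exponential population: strongly right-skewed, mean = std dev = scale
population_mean = 10.0
population_std_dev = 10.0
sample_size = 30
num_samples = 10000

# Each row is one sample; take the mean of every row
samples = rng.exponential(scale=population_mean, size=(num_samples, sample_size))
sample_means = samples.mean(axis=1)

# Compare with what the CLT predicts for the sample mean
predicted_se = population_std_dev / np.sqrt(sample_size)
observed_mean = sample_means.mean()
observed_se = sample_means.std(ddof=1)

# For a normal distribution, about 95% of values lie within 1.96 standard errors
within = np.abs(sample_means - population_mean) <= 1.96 * predicted_se
coverage = within.mean()

print(f"mean of sample means: {observed_mean:.3f} (population mean {population_mean})")
print(f"std of sample means:  {observed_se:.3f} (predicted {predicted_se:.3f})")
print(f"share within 1.96 SE: {coverage:.3f} (normal theory: about 0.95)")
```

The `sample_means` array can be passed to the same `plt.hist` call used above to view the shape directly.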