Summary statistics for grayling length (mm) by lake:

| lake | mean_length | sd_length | se_length | count |
|---|---|---|---|---|
| I3 | 265.6061 | 28.30378 | 3.483954 | 66 |
| I8 | 362.5980 | 52.33901 | 5.182334 | 102 |
Lecture 05: Probability and Statistical Inference
Lecture 4: Review
- Introduction to histograms or frequency distributions
- Probability Density Functions (PDFs)
- Descriptive Statistics
  - Center: mean, median, mode
  - Spread: range, variance, standard deviation
Lecture 5: Probability and Statistical Inference
The goals for today:
- Statistical inference fundamentals
- Hypothesis testing principles
- t distributions
- One-sample t-tests
- Two-sample t-tests
Lecture 5: Confidence intervals
In the more typical case we DON’T know the population standard deviation (σ):
- we estimate it from the sample
- and when the sample size is small (n < ~30)
- we can’t use the standard normal (z) distribution
Instead, we use Student’s t distribution.
Lecture 5: Understanding the t-distribution
When sample sizes are small, the t-distribution is more appropriate than the normal distribution.
- Similar to the normal distribution (where 1.96 cuts off the 2.5% tails) but with heavier tails
- Shape depends on degrees of freedom (df = n - 1)
- With large df (> 30), it approaches the normal distribution
- Used for:
  - Small sample sizes
  - When the population standard deviation is unknown
  - Calculating confidence intervals
  - Conducting t-tests
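As a quick visual check (a minimal base-R sketch, not from the original slides), we can overlay t densities on the standard normal and watch the heavy tails shrink toward the normal as df grows:
# Overlay t densities on the standard normal to compare tail weight
curve(dnorm(x), from = -4, to = 4, ylab = "Density", xlab = "value")
curve(dt(x, df = 4), from = -4, to = 4, add = TRUE, lty = 2)   # heavy tails
curve(dt(x, df = 30), from = -4, to = 4, add = TRUE, lty = 3)  # nearly normal
legend("topright", legend = c("Normal", "t, df = 4", "t, df = 30"), lty = 1:3)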
Practice Exercise 4: Using the t-distribution
Let’s compare confidence intervals using the normal approximation (z) versus the t-distribution for our fish data.
# Calculate CI using both z and t distributions for a smaller subset
small_sample <- grayling_df %>%
  filter(lake == "I3") %>%
  slice_sample(n = 10)

# Calculate statistics
sample_mean <- mean(small_sample$length_mm)
sample_sd   <- sd(small_sample$length_mm)
sample_n    <- nrow(small_sample)
sample_se   <- sample_sd / sqrt(sample_n)

# Calculate confidence intervals
z_ci_lower <- sample_mean - 1.96 * sample_se
z_ci_upper <- sample_mean + 1.96 * sample_se

# For t-distribution, get critical value for 95% CI with df = n - 1
t_crit <- qt(0.975, df = sample_n - 1)
t_ci_lower <- sample_mean - t_crit * sample_se
t_ci_upper <- sample_mean + t_crit * sample_se
# Display results
cat("Mean:", round(sample_mean, 1), "mm\n")
Mean: 279.2 mm
cat("Standard deviation:", round(sample_sd, 2), "mm\n")
Standard deviation: 20.03 mm
cat("Standard error:", round(sample_se, 2), "mm\n")
Standard error: 6.33 mm
cat("95% CI using z:", round(z_ci_lower, 1), "to", round(z_ci_upper, 1), "mm\n")
95% CI using z: 266.8 to 291.6 mm
cat("95% CI using t:", round(t_ci_lower, 1), "to", round(t_ci_upper, 1), "mm\n")
95% CI using t: 264.9 to 293.5 mm
cat("t critical value:", round(t_crit, 3), "vs z critical value: 1.96\n")
t critical value: 2.262 vs z critical value: 1.96
Student’s t-distribution Formula
To calculate a CI for a sample from an “unknown” population:
\(\text{CI} = \bar{y} \pm t \cdot \frac{s}{\sqrt{n}}\)
Where:
- ȳ is the sample mean
- n is the sample size
- s is the sample standard deviation
- t is the t-value corresponding to the probability (confidence level) of the CI
- values of t are found in a t-table for different degrees of freedom (df = n - 1)
Lecture 5: Student’s t-distribution Table
Here is a t-table:
- Values of t that correspond to probabilities
- Probabilities are listed along the top
- Sample dfs are listed in the left-most column
- Probabilities are given for one-tailed and two-tailed “questions”
Lecture 5: One-tailed Questions
One-tailed questions concern the area of the distribution to the left (or right) of a certain value.
- With n = 20 (df = 19), 90% of the observations are found to the left of
- t = 1.328 (10% are outside, in the right tail)
Lecture 5: Two-tailed Questions
Two-tailed questions refer to the area between certain values.
- With n = 20 (df = 19), 90% of the observations are between
- t = -1.729 and t = 1.729 (10% are outside, 5% in each tail); see the qt() check below
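Rather than reading a printed table, R’s qt() returns the same critical values; here is a quick check of the one- and two-tailed examples above:
# One-tailed: the t value below which 90% of the distribution lies (df = 19)
qt(0.90, df = 19)   # 1.328
# Two-tailed: cut 5% from each tail so 90% lies between -t and +t
qt(0.95, df = 19)   # 1.729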
Lecture 5: Calculating CI Example
Let’s calculate CIs again:
Use a two-sided value of t.
- \(\text{CI} = \bar{y} \pm t \cdot \frac{s}{\sqrt{n}}\)
- 95% CI for Sample A: 272.8 ± 2.262 × (37.81 / √9) = 272.8 ± 28.5
- The 95% CI is between 244.3 and 301.3
- “The 95% CI for the population mean from sample A is 272.8 ± 28.5”
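As a quick sketch, the same arithmetic in R (all values taken from the slide above):
# Reproduce the Sample A confidence interval by hand
y_bar  <- 272.8
s      <- 37.81
n      <- 9
t_val  <- 2.262
margin <- t_val * (s / sqrt(n))                    # about 28.5
c(lower = y_bar - margin, upper = y_bar + margin)  # about 244.3 and 301.3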
Lecture 5: Applications of t-distribution
So:
- We can assess our confidence that the population mean is within a certain range
- We can use the t distribution to ask questions like:
  - “What is the probability of getting a sample with mean ȳ from a population with mean µ?” (one-sample t-test)
  - “What is the probability that two samples came from the same population?” (two-sample t-test)
Lecture 5: One-Sample t-Test
We want to test if the mean fish length in I3 differs from 240 mm.
Activity: Define hypotheses and identify assumptions
H₀: μ = 240 (The mean fish length in I3 is 240mm)
H₁: μ ≠ 240 (The mean fish length in I3 is not 240mm)
Assumptions for t-test:
- Data is normally distributed
- Observations are independent
- No significant outliers
Assumptions in R - qqplots from car
# Filter for just the I3 lake fish
i3_df <- grayling_df %>% filter(lake == "I3")
# YOUR TASK: Test normality of the I3 grayling lengths
# QQ Plot (qqPlot() is from the car package)
library(car)
qqPlot(i3_df$length_mm,
       main = "QQ Plot for length of Grayling",
       ylab = "Sample Quantiles")
[1] 53 35
Statistical Test of Normality
Shapiro-Wilk test
# Shapiro-Wilk test
shapiro_test <- shapiro.test(i3_df$length_mm)
print(shapiro_test)
Shapiro-Wilk normality test
data: i3_df$length_mm
W = 0.91051, p-value = 0.0001623
Note that p < 0.05 here, so the Shapiro-Wilk test suggests the I3 lengths deviate from normality; with n = 66 the t-test is fairly robust to moderate non-normality, but keep this in mind.
Checking for Outliers
# Check for outliers using boxplot
# YOUR CODE HERE
i3_df %>% ggplot(aes(lake, length_mm)) + geom_boxplot()
Practice Exercise 1: One-Sample t-Test
Let’s perform a one-sample t-test to determine if the mean fish length in I3 Lake differs from 240 mm:
# what is the mean
i3_mean <- mean(i3_df$length_mm, na.rm = TRUE)
cat("Mean:", round(i3_mean, 1), "mm\n")
Mean: 265.6 mm
# Perform a one-sample t-test
t_test_result <- t.test(i3_df$length_mm, mu = 240)
# View the test results
t_test_result
One Sample t-test
data: i3_df$length_mm
t = 7.3497, df = 65, p-value = 4.17e-10
alternative hypothesis: true mean is not equal to 240
95 percent confidence interval:
258.6481 272.5640
sample estimates:
mean of x
265.6061
Interpret this test result by answering these questions:
- What was the null hypothesis?
- What was the alternative hypothesis?
- What does the p-value tell us?
- Should we reject or fail to reject the null hypothesis at α = 0.05?
- What is the practical interpretation of this result for fish biologists?
Lecture 5: Hypothesis Testing Framework
Hypothesis testing is a systematic way to evaluate research questions using data.
Key components:
- Null hypothesis (H₀): Typically assumes “no effect” or “no difference”
- Alternative hypothesis (Hₐ): The claim we’re trying to support
- Statistical test: Method for evaluating evidence against H₀
- P-value: Probability of observing our results (or more extreme) if H₀ is true
- Significance level (α): Threshold for rejecting H₀, typically 0.05
Decision rule: Reject H₀ if p-value < α (i.e., with α = 0.05, reject when p < 0.05)
Lecture 5: Interpreting One-Sample T-Test Results
Activity: Interpret the t-test results
- What does the p-value tell us?
- Should we reject or fail to reject the null hypothesis?
How to report this result in a scientific paper:
“A two-tailed, one-sample t-test at α = 0.05 showed that the mean fish length in I3 (… mm, SD = …) [was/was not] significantly different from the expected 240 mm, t(…) = …, p = …”
Lecture 5: Two-Sample t-Tests Introduction
For example:
- What is the probability that population X is the same as population Y?
How would you assess this question using what we learned?
This is what we will do with the pine needles…
Lecture 5: Comparing Two Samples
# Create a boxplot to visualize the difference in needle lengths between the two sides:
pine_df <- read_csv("data/pine_needles.csv")

# Create a boxplot comparing the two sides
pine_wind_plot <- pine_df %>%
  ggplot(aes(x = wind, y = length_mm, fill = wind)) +
  geom_boxplot() +
  labs(title = "Pine Needle Lengths by Wind Exposure",
       x = "Position",
       y = "Length (mm)",
       fill = "Wind Position") +
  scale_fill_manual(values = c("lee" = "forestgreen", "wind" = "skyblue"),
                    labels = c("lee" = "Leeward", "wind" = "Windward"))
pine_wind_plot
# Based on the t-test results and the boxplot,
# what can you conclude about needle lengths on the two sides of the trees?
Practice Exercise 2: Formulating Hypotheses
For the following research question about pine needles, write the null and alternative hypotheses:
- Are needles on the lee side longer than needles on the windward side?
What are the hypotheses?
Ho =
Ha =
Lecture 5: Two-Sample T-Test Framework
Now, let’s compare pine needle lengths between windward and leeward sides of trees.
Question: Is there a significant difference in needle length between the windward and leeward sides?
This requires a two-sample t-test.
Two-sample t-test compares means from two independent groups.
\(t = \frac{\bar{x}_1 - \bar{x}_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\)
where:
- x̄₁ and x̄₂ are the sample means of the two groups being compared
- s²ₚ is the pooled variance: s²ₚ = [(n₁ - 1)s₁² + (n₂ - 1)s₂²] / (n₁ + n₂ - 2), where s₁² and s₂² are the sample variances of the two groups (see the R sketch below)
- n₁ and n₂ are the sample sizes of the two groups
- Sₚ√(1/n₁ + 1/n₂) is the pooled standard error of the difference in means
\(t = \frac{SIGNAL}{NOISE}\)
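A minimal R sketch of this formula (the helper name pooled_t is illustrative; the plugged-in numbers are the rounded group summaries from the next exercise):
# Pooled two-sample t-statistic: signal / noise
pooled_t <- function(m1, m2, s1, s2, n1, n2) {
  sp2 <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)  # pooled variance
  (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))
}
pooled_t(20.4, 14.9, 2.45, 1.91, 24, 24)  # roughly 8.7, matching t.test() later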
Practice Exercise 3: Summary Statistics
Before conducting the test, we need to understand the data for each group.
You need these summaries, together with the graph, to see what is going on.
group_summary <- pine_df %>%
  group_by(wind) %>%
  summarize(
    mean_length = mean(length_mm),
    sd_length = sd(length_mm),
    n = n(),
    se_length = sd_length / sqrt(n)
  )
print(group_summary)
# A tibble: 2 × 5
  wind  mean_length sd_length     n se_length
  <chr>       <dbl>     <dbl> <int>     <dbl>
1 lee          20.4      2.45    24     0.500
2 wind         14.9      1.91    24     0.390
Visualizing Group Differences
# Create a boxplot comparing the two sides
pine_wind_plot
Practice Exercise 4: Effect Size
We could also look at the difference in means, computed directly from the summary table:
# Compute the difference in group means from group_summary
group_summary %>%
  summarize(difference = mean_length[wind == "wind"] - mean_length[wind == "lee"])
# A tibble: 1 × 1
difference
<dbl>
1 -5.5
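If a standardized effect size is wanted, Cohen’s d (not part of the original exercise; shown here as an illustrative extra) divides the mean difference by the pooled standard deviation:
# Cohen's d from the rounded group summaries above
sp <- sqrt(((24 - 1) * 2.45^2 + (24 - 1) * 1.91^2) / (24 + 24 - 2))  # pooled SD
(20.4 - 14.9) / sp  # roughly 2.5, a very large effect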
Practice Exercise 5: ggplot Summary Statistics
ggplot2 also has functions to make the mean and standard error plots we are interested in, along with many others.
# Plot group means with standard error bars using stat_summary()
pine_mean_se_plot <- ggplot(pine_df, aes(x = wind, y = length_mm, color = wind)) +
  stat_summary(fun = mean, geom = "point") +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
  labs(title = "Mean Pine Needle Length by Wind Exposure",
       x = "Wind Exposure",
       y = "Mean Length (mm)") +
  coord_cartesian(ylim = c(0, 25)) +
  scale_color_manual(values = c("lee" = "forestgreen", "wind" = "skyblue"),
                     labels = c("lee" = "Leeward", "wind" = "Windward")) +
  theme_classic()
pine_mean_se_plot
Lecture 5: Testing Assumptions for Two-Sample T-Test
For a two-sample t-test, we need to check:
- Normality within each group
- Equal variances between groups (for standard t-test)
- Independent observations
If assumptions are violated:
- Welch’s t-test (unequal variances)
- Non-parametric alternatives (Mann-Whitney U test); see the sketch below
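For reference, a minimal sketch of what each alternative looks like in R (using pine_df as loaded earlier):
# Welch's t-test: drop the equal-variance assumption
t.test(length_mm ~ wind, data = pine_df, var.equal = FALSE)
# Mann-Whitney U test (called the Wilcoxon rank-sum test in R)
wilcox.test(length_mm ~ wind, data = pine_df)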
Practice Exercise 6: Creating Group Data
qqplots
Note you need to test each group separately…
# Re-display the group mean/SE plot for reference
pine_mean_se_plot
Practice Exercise 7: Separate Group Data
qqplots
Note you need to test each group separately…
# How do you make separate dataframes for this?
# Separate data by groups
windward_data <- pine_df %>% filter(wind == "wind")
leeward_data <- pine_df %>% filter(wind == "lee")
head(leeward_data)
# A tibble: 6 × 6
date group n_s wind tree_no length_mm
<chr> <chr> <chr> <chr> <dbl> <dbl>
1 3/20/25 cephalopods n lee 1 20
2 3/20/25 cephalopods n lee 1 21
3 3/20/25 cephalopods n lee 1 23
4 3/20/25 cephalopods n lee 1 25
5 3/20/25 cephalopods n lee 1 21
6 3/20/25 cephalopods n lee 1 16
Practice Exercise 8: QQ Plot for Windward Data
qqplots
Note you need to test each group separately…
# QQ Plot for windward group
qqPlot(windward_data$length_mm,
main = "QQ Plot for Windward Pine Needles",
ylab = "Sample Quantiles")
[1] 21 22
Practice Exercise 9: Shapiro-Wilk Test
Shapiro-Wilk test
Note you need to test each group separately…
# Shapiro-Wilk test for windward group
shapiro_windward <- shapiro.test(windward_data$length_mm)
print("Shapiro-Wilk test for windward data:")
[1] "Shapiro-Wilk test for windward data:"
print(shapiro_windward)
Shapiro-Wilk normality test
data: windward_data$length_mm
W = 0.96062, p-value = 0.451
Practice Exercise 10: QQ Plot for Leeward Data
qqplots
Note you need to test each group separately…
# You can also test the leeward group
# QQ Plot for leeward group
qqPlot(leeward_data$length_mm,
main = "QQ Plot for Leeward Pine Needles",
ylab = "Sample Quantiles")
[1] 4 16
Practice Exercise 11: Shapiro-Wilk for Leeward
Shapiro-Wilk test
Note you need to test each group separately…
# Shapiro-Wilk test for leeward group
shapiro_lee <- shapiro.test(leeward_data$length_mm)
print("Shapiro-Wilk test for leeward data:")
[1] "Shapiro-Wilk test for leeward data:"
print(shapiro_lee)
Shapiro-Wilk normality test
data: leeward_data$length_mm
W = 0.95477, p-value = 0.3425
Practice Exercise 12: Combined Normality Test
There are always a lot of ways to do this in R
# there are always two ways
# Test for normality using Shapiro-Wilk test for each wind group
# All in one pipeline using tidyverse approach
normality_results <- pine_df %>%
  group_by(wind) %>%
  summarize(
    shapiro_stat = shapiro.test(length_mm)$statistic,
    shapiro_p_value = shapiro.test(length_mm)$p.value,
    normal_distribution = if_else(shapiro_p_value > 0.05, "Normal", "Non-normal")
  )
# Print the results
print(normality_results)
# A tibble: 2 × 4
wind shapiro_stat shapiro_p_value normal_distribution
<chr> <dbl> <dbl> <chr>
1 lee 0.955 0.343 Normal
2 wind 0.961 0.451 Normal
Practice Exercise 13: Test Equal Variances
Levene’s test can be done on the original dataframe (leveneTest() is from the car package).
# Method 1: Using car package's leveneTest
# This is often preferred as it's more robust to departures from normality
levene_result <- leveneTest(length_mm ~ wind, data = pine_df)
print("Levene's Test for Homogeneity of Variance:")
[1] "Levene's Test for Homogeneity of Variance:"
print(levene_result)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 1.2004 0.2789
46
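The comment above says “Method 1”, so as an illustrative second option (not shown in the original exercise), base R’s var.test() runs the classical F test of equal variances; note it is more sensitive to non-normality than Levene’s test:
# Method 2: classical F test of equal variances (base R)
var.test(length_mm ~ wind, data = pine_df)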
Lecture 5: Conducting the Two-Sample T-Test
Now we can compare the mean pine needle lengths between windward and leeward sides.
H₀: μ₁ = μ₂ (The mean needle lengths are equal)
Hₐ: μ₁ ≠ μ₂ (The mean needle lengths are different)
Deciding between:
- Standard t-test (equal variances)
- Welch’s t-test (unequal variances)
Note that Levene’s test should be NOT SIGNIFICANT. What is its null hypothesis?
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 1.2004 0.2789
46
Lecture 5: Running the Two-Sample T-Test
Now we can run a two-sample t-test.
Calculate the t-statistic manually (optional):
YOUR CODE HERE:
t = (mean1 - mean2) / sqrt((s1^2/n1) + (s2^2/n2))
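One possible fill-in for the optional task above: a sketch that pulls the pieces from group_summary (computed in Practice Exercise 3); note this is the unpooled, Welch-style form of the statistic:
# Manual t-statistic from the group summaries
mean1 <- group_summary$mean_length[group_summary$wind == "lee"]
mean2 <- group_summary$mean_length[group_summary$wind == "wind"]
s1 <- group_summary$sd_length[group_summary$wind == "lee"]
s2 <- group_summary$sd_length[group_summary$wind == "wind"]
n1 <- group_summary$n[group_summary$wind == "lee"]
n2 <- group_summary$n[group_summary$wind == "wind"]
(mean1 - mean2) / sqrt((s1^2 / n1) + (s2^2 / n2))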
# YOUR TASK: Conduct a two-sample t-test
# Use var.equal=TRUE for standard t-test or var.equal=FALSE for Welch's t-test
# Standard t-test (if variances are equal)
t_test_result <- t.test(length_mm ~ wind, data = pine_df, var.equal = TRUE)
print("Standard two-sample t-test:")
[1] "Standard two-sample t-test:"
print(t_test_result)
Two Sample t-test
data: length_mm by wind
t = 8.6792, df = 46, p-value = 3.01e-11
alternative hypothesis: true difference in means between group lee and group wind is not equal to 0
95 percent confidence interval:
4.224437 6.775563
sample estimates:
mean in group lee mean in group wind
20.41667 14.91667
# Welch's t-test (if variances are unequal)
# YOUR CODE HERE
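A minimal way to complete this placeholder; with equal group sizes and similar variances the result will be close to the standard test:
# Welch's t-test: same call, but var.equal = FALSE (the default in t.test)
welch_result <- t.test(length_mm ~ wind, data = pine_df, var.equal = FALSE)
print(welch_result)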
Lecture 5: Interpreting Two-Sample T-Test Results
Interpret the results of the two-sample t-test
What can we conclude about the needle lengths on windward vs. leeward sides?
How to report this result in a scientific paper:
“A two-tailed, two-sample t-test at α=0.05 showed [a significant/no significant] difference in needle length between windward (M = …, SD = …) and leeward (M = …, SD = …) sides of pine trees, t(…) = …, p = ….”
Lecture 5: Assumptions of Parametric Tests
Common assumptions for t-tests:
- Normality: Data comes from normally distributed populations
- Equal variances (for two-sample tests)
- Independence: Observations are independent
- No outliers: Extreme values can influence results
What can we do if our data violates these assumptions?
Alternatives when assumptions are violated:
- Data transformation (log, square root, etc.); see the sketch below
- Non-parametric tests
- Robust statistical methods
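As a sketch of the transformation route (illustrative only; the pine needle data already met the t-test assumptions), one could log-transform and re-check normality:
# Log-transform the lengths and re-test normality within each group
pine_log <- pine_df %>% mutate(log_length = log(length_mm))
shapiro.test(pine_log$log_length[pine_log$wind == "wind"])
shapiro.test(pine_log$log_length[pine_log$wind == "lee"])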
Lecture 5: Summary and Conclusions
In this activity, we’ve:
- Formulated hypotheses about pine needle length
- Tested assumptions for parametric tests
- Conducted one-sample and two-sample t-tests
- Visualized data using appropriate methods
- Learned how to interpret and report t-test results
Key takeaways:
- Always check assumptions before conducting tests
- Visualize your data to understand patterns
- Report results comprehensively
- Consider alternatives when assumptions are violated