Lecture 05: Probability and Statistical Inference
Lecture 4: Review
- Introduction to histograms or frequency distributions
- Probability Distribution Functions (PDF)
- Z scores and T scores
- Tests of means using T-Tests
- one sample - is the sample mean different from a hypothesized mean?
- two sample - are the sample means from two samples the same or different?
- for two-sample T-tests, df = n1 + n2 - 2 = 8 + 8 - 2 = 14
Lecture 5: Probability and Statistical Inference
The goals for today
- Statistical inference fundamentals
- Hypothesis testing principles
- T Distributions
- One sample T Test
- Two sample T Test
- Paired T Test
- Assumption tests
Lecture 5: One-tailed Questions
One-tailed questions ask about the area of the distribution to the left (or right) of a certain value, for a one-sample test
- n = 8 (df = 7): 95% of the observations are found to the left of
- t = 1.895 (5% are outside)
Lecture 5: Two-tailed Questions
Two-tailed questions refer to the area between two values
- n = 8 (df = 7): 95% of the observations are between
- t = -2.365 and t = 2.365 (2.5% are outside on each side)
- One-tailed was t = 1.895 (5% are outside)
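These critical values come straight from the t quantile function; a minimal sketch in base R:
qt(0.95, df = 7)    # one-tailed critical value: 1.895
qt(0.975, df = 7)   # two-tailed critical value: 2.365
qt(0.975, df = 14)  # two-sample case from the review (df = 14): 2.145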
Lecture 5: Calculating CI Example
Let’s calculate CIs again, using a two-sided test:
- \(\text{CI} = \bar{y} \pm t \cdot \frac{s}{\sqrt{n}}\)
- 95% CI for Sample A: 17.6 ± 2.365 × (2.51/√8) = 17.6 ± 2.0987
- The 95% CI is between 15.50 and 19.70
- “The 95% CI for the population mean from sample A is 17.6 ± 2.1 mm.”
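The same calculation in R, a minimal sketch using the rounded summary values (so the endpoints differ slightly from what t.test() reports):
y_bar <- 17.6
s <- 2.51
n <- 8
t_crit <- qt(0.975, df = n - 1)  # 2.365
y_bar + c(-1, 1) * t_crit * s / sqrt(n)  # roughly 15.50 and 19.70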
Lecture 5: Applications of t-distribution
So:
- Can assess confidence that the population mean is within a certain range
- Can use the t distribution to ask questions like:
- “What is the probability of getting a sample with mean = ȳ from a population with mean = µ?” (1-sample t-test)
- “What is the probability that two samples came from the same population?” (2-sample t-test)
Lecture 5: One Sample T-Test
We want to test if the mean needle length on one side differs from 15mm.
Activity: Define hypotheses and identify assumptions
H₀: μ = 15 (The mean needle length on the shady side is 15 mm)
H₁: μ ≠ 15 (The mean needle length on the shady side is not 15 mm)
Assumptions for t-test:
- Data is normally distributed
- Observations are independent
- No significant outliers
Assumptions in R - qqplots from car
# YOUR TASK: Test normality of all pine needle lengths
# QQ plot (qqPlot comes from the car package)
library(car)
qqPlot(ps_df$length_mm,
       main = "QQ Plot for length of pine needles",
       ylab = "Sample Quantiles")
[1] 8 11
(qqPlot also prints the row numbers of the most extreme observations, here 8 and 11.)
Statistical Test of Normality
Shapiro-Wilk test
# Shapiro-Wilk test
shapiro.test(ps_df$length_mm)
Shapiro-Wilk normality test
data: ps_df$length_mm
W = 0.92754, p-value = 0.2228
Checking for Outliers
# Check for outliers using boxplot
# YOUR CODE HERE
# Create a boxplot comparing the two sides
shady_sunny_plot <- ps_df %>%
  ggplot(aes(x = side, y = length_mm, fill = side)) +
  geom_boxplot() +
  labs(
    x = "side",
    y = "Length (mm)",
    fill = "side")
shady_sunny_plot
Practice Exercise 1: One-Sample t-Test
Let’s perform a one-sample t-test to determine if the mean needle length on the shady side differs from 15 mm:
# what is the mean?
ps_shade_mean <- mean(ps_shady_df$length_mm, na.rm = TRUE)
cat("Mean:", round(ps_shade_mean, 1), "mm\n")
Mean: 17.6 mm
# Perform a one-sample t-test
t_test_result <- t.test(ps_shady_df$length_mm, mu = 15)
t_test_result
One Sample t-test
data: ps_shady_df$length_mm
t = 2.9414, df = 7, p-value = 0.02167
alternative hypothesis: true mean is not equal to 15
95 percent confidence interval:
15.51092 19.70030
sample estimates:
mean of x
17.60561
Interpret this test result by answering these questions:
- What was the null hypothesis?
- What was the alternative hypothesis?
- What does the p-value tell us?
- Should we reject or fail to reject the null hypothesis at α = 0.05?
- What is the practical interpretation of this result for botanists?
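As a quick check on the output above, the t-statistic can also be computed by hand; a sketch using the rounded mean and SD (rounding makes it differ slightly from t.test()'s t = 2.9414):
# t = (sample mean - hypothesized mean) / (s / sqrt(n))
(17.6 - 15) / (2.51 / sqrt(8))  # roughly 2.93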
Lecture 5: Hypothesis Testing Framework
Hypothesis testing is a systematic way to evaluate research questions using data.
Key components:
- Null hypothesis (H₀): Typically assumes “no effect” or “no difference”
- Alternative hypothesis (Hₐ): The claim we’re trying to support
- Statistical test: Method for evaluating evidence against H₀
- P-value: Probability of observing our results (or more extreme) if H₀ is true
- Significance level (α): Threshold for rejecting H₀, typically 0.05
Decision rule: Reject H₀ if p-value < α (shorthand: p < 0.05)
Lecture 5: Interpreting One-Sample T-Test Results
Activity: Interpret the t-test results
- What does the p-value tell us?
- Should we reject or fail to reject the null hypothesis?
How to report this result in a scientific paper:
“A one-sample t-test at α=0.05 showed that the mean needle length (… mm, SD = …) [was/was not] significantly different from the expected 15 mm, t(…) = …, p = …”
Lecture 5: Two Sample T-Tests Introduction
For example:
- What is the probability that population X is the same as population Y?
- How would you assess this question using what we learned?
- This is what we will do with the needle lengths again…
Lecture 5: Comparing Two Samples
For example:
- What is the probability that population X is the same as population Y?
How would you assess this question using what we learned?
shady_sunny_plot
# Based on the t-test results and the boxplot,
# what can you conclude about the needle length on the two sides?
Practice Exercise 2: Formulating Hypotheses
For the following research question about needle lengths, write the null and alternative hypotheses:
- Are needle lengths on shady and sunny sides different?
What are the hypotheses?
H₀ =
Hₐ =
Lecture 5: Two-Sample T-Test Framework
Now, let’s compare needle lengths from the two sides
Question: Is there a significant difference in needle length between the sides?
This requires a two-sample t-test.
Two-sample t-test compares means from two independent groups.
\(t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\)
where:
- x̄₁ and x̄₂: the sample means of the two groups being compared
- s²ₚ: the pooled variance, s²ₚ = [(n₁ - 1)s₁² + (n₂ - 1)s₂²] / (n₁ + n₂ - 2), where s₁² and s₂² are the sample variances of the two groups
- n₁ and n₂: the sample sizes of the two groups
- sₚ√(1/n₁ + 1/n₂): the pooled standard error of the difference in means
\(t = \frac{SIGNAL}{NOISE}\)
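To see the formula in action, here is a minimal sketch using the rounded group summary statistics from the next exercise (rounding makes the result differ slightly from t.test()'s t = 1.1279):
m1 <- 17.6; s1 <- 2.51; n1 <- 8  # shady
m2 <- 16.2; s2 <- 2.64; n2 <- 8  # sunny
sp <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))  # pooled SD
(m1 - m2) / (sp * sqrt(1/n1 + 1/n2))  # SIGNAL / NOISE, roughly 1.09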
Practice Exercise 3: Summary Statistics
Before conducting the test, we need to understand the data for each group.
You need this, together with the graph, to see what is going on:
group_summary <- ps_df %>%
  group_by(side) %>%
  summarize(
    mean_length = mean(length_mm),
    sd_length = sd(length_mm),
    n = n(),
    se_length = sd_length / sqrt(n)
  )
group_summary
# A tibble: 2 × 5
  side  mean_length sd_length     n se_length
  <chr>       <dbl>     <dbl> <int>     <dbl>
1 shady        17.6      2.51     8     0.886
2 sunny        16.2      2.64     8     0.934
Practice Exercise 4: Effect Size
We could also look at the difference in means:
# using the group_summary dataframe created above
group_summary %>%
  summarize(difference = mean_length[side == "shady"] - mean_length[side == "sunny"])
# A tibble: 1 × 1
difference
<dbl>
1 1.45
Practice Exercise 5: ggplot Summary Statistics
ggplot2 also has stat_summary(), which can make the mean and standard error plots we are interested in (along with many others):
# Plot the mean ± SE of needle length for each side
needle_mean_se_plot <- ggplot(ps_df, aes(x = side, y = length_mm, color = side)) +
  stat_summary(fun = mean, geom = "point") +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
  labs(
    x = "side",
    y = "Mean Length (mm)") +
  theme_classic()
needle_mean_se_plot
Lecture 5: Testing Assumptions for Two-Sample T-Test
For a two-sample t-test, we need to check:
- Normality within each group
- Equal variances between groups (for standard t-test)
- Independent observations
If assumptions are violated:
- Welch’s t-test (unequal variances)
- Non-parametric alternatives (Mann-Whitney U test)
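If normality is the problem, the Mann-Whitney U test is a drop-in non-parametric alternative; a minimal sketch in base R:
# Non-parametric comparison of the two sides (Wilcoxon rank-sum / Mann-Whitney U)
wilcox.test(length_mm ~ side, data = ps_df)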
Practice Exercise 6: Separate Group Data
Note you need to test each group separately…
# how do you make separate dataframes to do this on?
# Separate data by groups
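# One way (a sketch, assuming dplyr is loaded):
ps_shady_df <- ps_df %>% filter(side == "shady")
ps_sunny_df <- ps_df %>% filter(side == "sunny")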
head(ps_shady_df)
# A tibble: 6 × 5
# Groups: group, tree_no, tree_char [6]
group tree_no tree_char side length_mm
<chr> <dbl> <chr> <chr> <dbl>
1 big_fat_fecund_female_fish 2 tree_2 shady 15.4
2 bill 3 tree_3 shady 16.7
3 ciabatta 5 tree_5 shady 19.1
4 fake_data 8 tree_8 shady 17.4
5 five 1 tree_1 shady 20.3
6 moose_walkin 7 tree_7 shady 20.7
head(ps_sunny_df)
# A tibble: 6 × 5
# Groups: group, tree_no, tree_char [6]
group tree_no tree_char side length_mm
<chr> <dbl> <chr> <chr> <dbl>
1 big_fat_fecund_female_fish 2 tree_2 sunny 13.2
2 bill 3 tree_3 sunny 16.0
3 ciabatta 5 tree_5 sunny 17.7
4 fake_data 8 tree_8 sunny 13.0
5 five 1 tree_1 sunny 19.9
6 moose_walkin 7 tree_7 sunny 18.4
Practice Exercise 7: Combined Normality Test
There are always a lot of ways to do this in R
# there are always two ways
# Test for normality using the Shapiro-Wilk test for each side
# All in one pipeline using the tidyverse approach
normality_results <- ps_df %>%
  group_by(side) %>%
  summarize(
    shapiro_stat = shapiro.test(length_mm)$statistic,
    shapiro_p_value = shapiro.test(length_mm)$p.value,
    normal_distribution = if_else(shapiro_p_value > 0.05, "Normal", "Non-normal"))
normality_results
# A tibble: 2 × 4
side shapiro_stat shapiro_p_value normal_distribution
<chr> <dbl> <dbl> <chr>
1 shady 0.966 0.868 Normal
2 sunny 0.900 0.289 Normal
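The other of the two ways is simply to run shapiro.test() on each of the separate dataframes made earlier, for example:
shapiro.test(ps_shady_df$length_mm)
shapiro.test(ps_sunny_df$length_mm)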
Practice Exercise 8: Test Equal Variances
Levene’s test can be done on the original dataframe.
Note: for the standard (pooled) t-test, Levene’s test should be NOT significant. What is the null hypothesis?
# Method 1: Using the car package's leveneTest
# This is often preferred as it's more robust to departures from normality
levene_result <- leveneTest(length_mm ~ side, data = ps_df)
print("Levene's Test for Homogeneity of Variance:")
[1] "Levene's Test for Homogeneity of Variance:"
print(levene_result)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 0.2062 0.6567
14
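A second method, sketched here with base R's var.test(), is the F test of equal variances; note it is more sensitive to non-normality than Levene's test:
# Method 2: F test for equal variances (assumes normality within groups)
var.test(length_mm ~ side, data = ps_df)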
Lecture 5: Conducting the Two-Sample T-Test
Now we can compare the mean needle lengths between shady and sunny sides.
H₀: μ₁ = μ₂ (The mean needle lengths do not differ)
Hₐ: μ₁ ≠ μ₂ (The mean needle lengths differ - direction is not specified)
Calculate the t-statistic manually (optional) - YOUR CODE HERE:
t = (mean1 - mean2) / sqrt((s1^2/n1) + (s2^2/n2))   # unpooled (Welch) form
Deciding between:
- Standard t-test (equal variances)
- Welch’s t-test (unequal variances)
# YOUR TASK: Conduct a two-sample t-test
# Use var.equal=TRUE for standard t-test or var.equal=FALSE for Welch's t-test
# Standard t-test (if variances are equal)
t_test_result <- t.test(length_mm ~ side, data = ps_df, var.equal = TRUE)
print("Standard two-sample t-test:")
[1] "Standard two-sample t-test:"
print(t_test_result)
Two Sample t-test
data: length_mm by side
t = 1.1279, df = 14, p-value = 0.2783
alternative hypothesis: true difference in means between group shady and group sunny is not equal to 0
95 percent confidence interval:
-1.309330 4.214005
sample estimates:
mean in group shady mean in group sunny
17.60561 16.15328
Lecture 5: Conducting the Two-Sample T-Test (Welch’s)
The same comparison, this time using Welch’s t-test, which does not assume equal variances:
# Welch's t-test (if variances are unequal)
t_test_result <- t.test(length_mm ~ side, data = ps_df, var.equal = FALSE)
print("Welch's two-sample t-test:")
[1] "Welch's two-sample t-test:"
print(t_test_result)
Welch Two Sample t-test
data: length_mm by side
t = 1.1279, df = 13.96, p-value = 0.2784
alternative hypothesis: true difference in means between group shady and group sunny is not equal to 0
95 percent confidence interval:
-1.310069 4.214743
sample estimates:
mean in group shady mean in group sunny
17.60561 16.15328
Lecture 5: Difference between a Two-Sample T and Welch’s T Test
Standard t-test (Student’s t-test)
- Assumes equal variances between the two groups being compared
- Uses a pooled variance estimate that combines data from both groups
- Has higher statistical power when the equal variance assumption is met
- Degrees of freedom = n₁ + n₂ - 2
Welch’s t-test
- Does not assume equal variances between groups (also called the “unequal variances t-test”)
- Uses separate variance estimates for each group
- More robust when group variances are different
- Uses a more complex degrees of freedom calculation (the Welch-Satterthwaite equation, shown below)
- Degrees of freedom are typically non-integer (decimal!) and usually smaller than for the standard t-test
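For reference, the Welch-Satterthwaite approximation is
\(df \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}\)
which is why the Welch output above reports df = 13.96 rather than 14.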
Lecture 5: Interpreting Two-Sample T-Test Results
Interpret the results of the two-sample t-test
What can we conclude about the needle lengths on sunny vs shady sides?
How to report this result in a scientific paper:
“A two-tailed, two-sample t-test at α=0.05 showed [a significant/no significant] difference in needle length between sunny (M = …, SD = …) and shady (M = …, SD = …) sides of pine trees, t(…) = …, p = ….”
Lecture 5: What does a paired t-test tell us?
Paired t-test:
- Compares two measurements from the same subjects or matched pairs
- Tests whether the mean difference between paired observations equals zero
- Examples: before/after measurements on the same people, left vs right measurements, matched case-control studies
- Uses the differences between pairs as the data points
- Generally more powerful because it controls for individual variation
# YOUR TASK: Conduct a paired t-test
# First pivot the data wider so each tree's two sides share a row
ps_wide_df <- ps_df %>%
  pivot_wider(
    names_from = "side",
    values_from = length_mm
  )
# Paired t-test on the sunny and shady columns
paired_t_test_result <- t.test(ps_wide_df$sunny, ps_wide_df$shady, paired = TRUE)
print("Paired t-test:")
[1] "Paired t-test:"
print(paired_t_test_result)
Paired t-test
data: ps_wide_df$sunny and ps_wide_df$shady
t = -2.7818, df = 7, p-value = 0.02723
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-2.6868652 -0.2178092
sample estimates:
mean difference
-1.452337
Lecture 5: What is going on??
Note that there is a lot of variation within trees, but the trend is the same:
ps_plot
Lecture 5: Assumptions of Parametric Tests
Common assumptions for t-tests:
- Normality: Data comes from normally distributed populations
- Equal variances (for two-sample tests)
- Independence: Observations are independent
- No outliers: Extreme values can influence results
What can we do if our data violates these assumptions?
Alternatives when assumptions are violated:
- Data transformation (log, square root, etc.) - see the sketch after this list
- Non-parametric tests
- Robust statistical methods
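For example, here is a minimal sketch of the first option, a log transform followed by a normality re-check (the names ps_log_df and log_length are just illustrative):
# Transform, then re-test normality (assumes dplyr is loaded)
ps_log_df <- ps_df %>% mutate(log_length = log(length_mm))
shapiro.test(ps_log_df$log_length)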
Lecture 5: Summary and Conclusions
In this activity, we’ve:
- Formulated hypotheses about pine needle length
- Tested assumptions for parametric tests
- Conducted one-sample and two-sample t-tests
- Visualized data using appropriate methods
- Learned how to interpret and report t-test results
Key takeaways:
- Always check assumptions before conducting tests
- Visualize your data to understand patterns
- Report results comprehensively
- Consider alternatives when assumptions are violated - non-parametric tests…