Useful hypotheses rely on specifying both a null and an alternative hypothesis:
- H₀ (null hypothesis): the hypothesis of "no effect"
- Hₐ (research or alternative hypothesis): the effect the study is designed to detect
Together, H₀ and Hₐ encompass all possible outcomes:
- H₀: µ = 0, Hₐ: µ ≠ 0
- H₀: µ = 35, Hₐ: µ ≠ 35
- H₀: µ₁ = µ₂, Hₐ: µ₁ ≠ µ₂
- H₀: µ ≤ 0, Hₐ: µ > 0
A hypothesis can be directional: the alternative predicts that the mean is greater than 0, while the null covers everything else (the mean is equal to or less than 0).
This becomes a one-sided test because it predicts an effect in only one direction (see the sketch below).
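As a minimal sketch of the two forms, using simulated data (an assumption for illustration, not the lecture's dataset), R's `t.test()` selects the direction through its `alternative` argument:

```r
# Simulated data for illustration only (not the lecture's dataset)
set.seed(42)
x <- rnorm(20, mean = 0.5, sd = 1)

# Two-sided test: H0: mu = 0 vs Ha: mu != 0
t.test(x, mu = 0, alternative = "two.sided")

# One-sided test: H0: mu <= 0 vs Ha: mu > 0
t.test(x, mu = 0, alternative = "greater")
```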
Statistical tests assess how likely the observed data would be if the null hypothesis were true.
Hypothesis tests
Statistical test results:
- p = 0.3 means: if H₀ were true and I repeated the study 100 times, I would expect this (or a more extreme) result about 30 times by chance
- p = 0.03 means: if H₀ were true and I repeated the study 100 times, I would expect this (or a more extreme) result about 3 times by chance
Which p-value suggests H₀ is likely false?
At what point do we reject H₀?
p < 0.05 is the conventional "significance threshold" (α, the significance level).
p < 0.05 means: if H₀ is true and we repeated the study 100 times, we would expect a result this extreme fewer than 5 times.
Traditionally, α = 0.05 is used as the cutoff for rejecting the null hypothesis.
There is nothing magical about 0.05: actual p-values should always be reported, and the threshold must be chosen before the study begins.
| p-value range | Interpretation |
|---|---|
| p > 0.10 | No evidence against H₀; data appear consistent with H₀ |
| 0.05 < p < 0.10 | Weak evidence against H₀ in favor of Hₐ |
| 0.01 < p < 0.05 | Moderate evidence against H₀ in favor of Hₐ |
| 0.001 < p < 0.01 | Strong evidence against H₀ in favor of Hₐ |
| p < 0.001 | Very strong evidence against H₀ in favor of Hₐ |
A p-value is the probability of observing the sample result (or something more extreme) if the null hypothesis is true.
p-value is NOT the probability that H₀ is true
p-value is NOT the probability that results occurred by chance
Statistical significance ≠ practical significance
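One way to see why the p-value is not the probability that H₀ is true: when H₀ actually holds, p-values are uniformly distributed, so about 5% of them fall below 0.05 purely by chance. A minimal simulation sketch (not from the lecture):

```r
# Simulate many studies where H0 is true (true mean = 0)
set.seed(1)
p_vals <- replicate(10000, t.test(rnorm(20, mean = 0), mu = 0)$p.value)

# Fraction of "significant" results when there is no real effect:
mean(p_vals < 0.05)  # close to 0.05, the Type I error rate
```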
One-sample t-test
Two-sample t-test
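A brief sketch of both tests in R, using simulated fish lengths (the values below are illustrative assumptions, not the grayling data analyzed later):

```r
set.seed(7)
lake_a <- rnorm(30, mean = 300, sd = 25)  # hypothetical lengths (mm)
lake_b <- rnorm(30, mean = 315, sd = 25)

# One-sample t-test: H0: mu = 300
t.test(lake_a, mu = 300)

# Two-sample t-test: H0: mu1 = mu2
t.test(lake_a, lake_b)
```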
Fisher viewed the p-value as an informal measure of the discrepancy between the data and H₀:
“If p is between 0.1 and 0.9 there is certainly no reason to suspect the hypothesis tested. If it is below 0.02 it is strongly indicated that the hypothesis fails to account for the whole of the facts. We shall not often be astray if we draw a conventional line at .05 …”
In the accompanying figure, the dotted line marks α = 0.05.
When making decisions based on hypothesis tests, two types of errors can occur:
Type I Error (False Positive)
- Rejecting H₀ when it's actually true
- Probability = α (the significance level)
- "Finding an effect that isn't real"

Type II Error (False Negative)
- Failing to reject H₀ when it's actually false
- Probability = β
- "Missing an effect that is real"

Statistical Power = 1 − β
- Probability of correctly rejecting a false H₀
- Increases with: larger sample size, larger effect size, lower variability, and a higher α level
In the accompanying figure, the red area represents the power of the test.
The farther apart the two means, the smaller the Type II error (β) and the higher the power (see the sketch below).
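A quick sketch with base R's `power.t.test()` showing how the factors listed above move power (the values are illustrative; the grayling analysis follows later):

```r
# Baseline: two-sample test, n = 20 per group, medium effect (d = 0.5)
power.t.test(n = 20, delta = 0.5, sd = 1, sig.level = 0.05)$power  # ~0.34

# Larger sample size -> higher power
power.t.test(n = 80, delta = 0.5, sd = 1, sig.level = 0.05)$power  # ~0.88

# Larger effect size -> higher power
power.t.test(n = 20, delta = 1.0, sd = 1, sig.level = 0.05)$power  # ~0.87
```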
Practice Exercise 6: Interpreting P-values and Errors
Given the following scenarios, identify whether a Type I or Type II error might have occurred:
Pooled standard deviation:

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

Cohen's d, the standardized effect size (reported as `delta = 0.6741298` in the output below):

$$d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}$$

```r
library(car)
library(patchwork)
library(tidyverse)

grayling_df <- read_csv("data/gray_I3_I8.csv")
i3_df <- grayling_df %>% filter(lake == "I3")
i8_df <- grayling_df %>% filter(lake == "I8")
```
```r
# Pooled standard deviation of the two samples
n1 <- nrow(i3_df)
n2 <- nrow(i8_df)
sd_pooled <- sqrt((var(i3_df$length_mm) * (n1 - 1) +
                   var(i8_df$length_mm) * (n2 - 1)) /
                  (n1 + n2 - 2))

# Calculate power for detecting a 30 mm difference
effect_size <- 30 / sd_pooled  # Cohen's d
df <- n1 + n2 - 2
alpha <- 0.05
power <- power.t.test(n = min(n1, n2),
                      delta = effect_size,
                      sd = 1,  # using the standardized effect size
                      sig.level = alpha,
                      type = "two.sample",
                      alternative = "two.sided")

# Display results
power
```
```
     Two-sample t test power calculation 

              n = 66
          delta = 0.6741298
             sd = 1
      sig.level = 0.05
          power = 0.9702076
    alternative = two.sided

NOTE: n is number in *each* group
```
Statistical power represents the probability of detecting a true effect (rejecting the null hypothesis when it is false). In this case, with a power of 97%, there's a 97% chance of detecting a true difference of 30 mm between the mean lengths of the two groups, if such a difference actually exists.
A power analysis like this is typically done for one of these purposes:
- Before a study (a priori), to determine the sample size needed to detect a given effect
- After a study, to judge whether a non-significant result might reflect low power rather than no effect
- To find the smallest effect size detectable with the sample sizes available
With 97% power, this test has excellent ability to detect the specified effect size. Generally, 80% power is considered acceptable, so 97% indicates a very well-powered study for detecting a difference of 30 mm between the groups.
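The same function can also be run "in reverse" to plan a study: fix the desired power and solve for the sample size. A sketch reusing `effect_size` from the chunk above:

```r
# How many fish per lake would 80% power require for the same 30 mm difference?
power.t.test(delta = effect_size,  # standardized effect from above
             sd = 1,
             sig.level = 0.05,
             power = 0.80,
             type = "two.sample",
             alternative = "two.sided")
```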
Error bars are graphical representations of the variability of data. They show how far observations or estimates spread around a summary value (usually the mean).

Common types of error bars:
- Standard deviation (SD): describes the spread of the raw data
- Standard error (SE): describes the precision of the estimated mean
- Confidence interval (CI): gives a range of plausible values for the parameter

When interpreting graphs (a plotting sketch follows below):
- Always check which type of error bar is shown; SD, SE, and CI bars can differ greatly in width
- Overlapping error bars do not necessarily mean "no significant difference," and non-overlapping SE bars do not guarantee significance
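A sketch of a mean-with-SE plot using the `grayling_df` loaded earlier (ggplot2 comes with tidyverse; the column names follow the earlier chunk):

```r
grayling_df %>%
  group_by(lake) %>%
  summarize(mean_len = mean(length_mm),
            se = sd(length_mm) / sqrt(n())) %>%
  ggplot(aes(x = lake, y = mean_len)) +
  geom_col(fill = "grey80") +
  geom_errorbar(aes(ymin = mean_len - se, ymax = mean_len + se),
                width = 0.2) +
  labs(x = "Lake", y = "Mean length (mm) ± 1 SE")
```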
Pseudoreplication occurs when measurements that are not independent are analyzed as if they were independent.
Examples of pseudoreplication:
- Treating multiple fish from the same tank or lake as independent replicates when the treatment was applied to the tank or lake
- Treating repeated measurements of the same individual as independent observations

How to avoid pseudoreplication (a sketch follows below):
- Identify the true experimental unit: the unit to which the treatment is independently applied
- Average subsamples so each experimental unit contributes a single value
- Use models that account for grouping structure (e.g., mixed-effects models)
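A sketch of the "average the subsamples" fix. The `site` column here is hypothetical (the real grayling data may not include one); it stands in for whatever identifies the true experimental unit:

```r
# Hypothetical: several fish measured per site, sites nested within lakes.
# Collapse to one mean per site so each experimental unit appears once.
site_means <- grayling_df %>%
  group_by(lake, site) %>%   # `site` is an assumed, hypothetical column
  summarize(mean_len = mean(length_mm), .groups = "drop")

# Test on site means, not on individual fish
t.test(mean_len ~ lake, data = site_means)
```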
The statistical concepts we’ve covered today are essential for fisheries biologists and ecologists:
Real-world applications:
- Comparing fish size or abundance between lakes, years, or treatments (as with the grayling data above)
- Designing monitoring programs with enough power to detect biologically meaningful change
- Reporting results with appropriate error bars and honest p-values

Key concepts covered:
- Null and alternative hypotheses
- p-values, α, and significance thresholds
- Type I and Type II errors
- Statistical power and effect size (Cohen's d)
- Error bars and their interpretation
- Pseudoreplication and how to avoid it