Understanding standard normal distributions and z-scores
Calculating and interpreting standard error
Creating confidence intervals
Working with the Student’s t-distribution
Today’s focus:
Review more r code
understand α alpha and β beta errors
do more
1 sample t tests
2 sample t tests
# Install packages if needed (uncomment if necessary)# install.packages("readr")# install.packages("tidyverse")# install.packages("car")# install.packages("here")# Load librarieslibrary(patchwork)library(car) # For diagnostic testslibrary(tidyverse) # For data manipulation and visualizationlibrary(readxl)# Load the pine needle data# Use here() function to specify the pathpine_switch_df <-read_excel("data/class_pine needle length switched.xlsx")# Examine the first few rowshead(pine_switch_df)
# A tibble: 6 × 5
group tree_no tree_char side length_mm
<chr> <dbl> <chr> <chr> <dbl>
1 five 1 tree_1 sunny 22.7
2 five 1 tree_1 sunny 21.1
3 five 1 tree_1 sunny 18.6
4 five 1 tree_1 sunny 18.6
5 five 1 tree_1 sunny 21.0
6 five 1 tree_1 sunny 18.9
Before conducting hypothesis tests, we should always explore our data to understand its characteristics.
Let’s calculate summary statistics and create visualizations.
Activity: Calculate basic summary statistics for pine needle length
# YOUR TASK: Calculate summary statistics for pine needle length# Hint: Use summarize() function to calculate mean, sd, n, etc.# Create a summary table for all pine needlespine_summary <- ps_df %>%group_by(side) %>%summarize(mean_length =mean(length_mm, na.rm=TRUE),sd_length =sd(length_mm, na.rm=TRUE),n =sum(!is.na(length_mm)),se_length = sd_length / (n^0.5),t_critical =qt(0.975, df = n -1), # 95% CI uses 0.975 (two-tailed)ci_lower = mean_length - t_critical * se_length,ci_upper = mean_length + t_critical * se_length )print(pine_summary)
# Now calculate summary statistics by wind exposure# YOUR CODE HERE
Part 1: Visualizing the Data
Activity: Create visualizations of pine needle length
Create a histogram and a boxplot to visualize the distribution of pine needle length values.
Effective data visualization helps us understand:
The central tendency
The spread of the data
Potential outliers
Shape of distribution
Your Task
# YOUR TASK: Create a histogram of pine needle length# Hint: Use ggplot() and geom_histogram()# Histogram of all pine needle lengthsggplot(ps_df, aes(x = length_mm, fill = side)) +geom_histogram(binwidth =2) +labs(title ="Distribution of Pine Needle Lengths",x ="Length (mm)",y ="Frequency") +theme_minimal()+facet_wrap("side", ncol =1)
# how can you do this by side to see both plots
What is the Effect size or difference in means?
Practice Exercise: Calculate Effect size
We could also look at the difference in means… some cool code here
# Assuming your dataframe is called dfpine_summary %>%summarize(difference = mean_length[side =="sunny"] -mean_length[side =="shady"])
# A tibble: 1 × 1
difference
<dbl>
1 -1.45
Part 2: Conducting the Two-Sample T-Test
Activity: Conduct a two-sample t-test
Now we can compare the mean pine needle lengths between sunny and shady sides.
H₀: μ₁ = μ₂ (The mean needle lengths are equal)
H₁: μ₁ ≠ μ₂ (The mean needle lengths are different)
Deciding between:
Standard t-test (equal variances)
Welch’s t-test (unequal variances)
Based on our Levene’s test result.
# YOUR TASK: Conduct a two-sample t-test# Use var.equal=TRUE for standard t-test or var.equal=FALSE for Welch's t-test# Standard t-test (if variances are equal)t_test_result <-t.test(length_mm ~ side, data = ps_df, var.equal =TRUE)print("Standard two-sample t-test:")
[1] "Standard two-sample t-test:"
print(t_test_result)
Two Sample t-test
data: length_mm by side
t = 1.1279, df = 14, p-value = 0.2783
alternative hypothesis: true difference in means between group shady and group sunny is not equal to 0
95 percent confidence interval:
-1.309330 4.214005
sample estimates:
mean in group shady mean in group sunny
17.60561 16.15328
What is Power
Statistical power represents the probability of detecting a true effect (rejecting the null hypothesis when it is false). In this case, with a power of 97%, there’s a 97% chance of detecting a true difference of 30 units between the means of the two groups if such a difference actually exists.
A power analysis like this is typically done for one of these purposes:
Before data collection to determine required sample size
After a study to evaluate if the sample size was adequate
To determine the minimum detectable effect size with the given sample
With 97% power, this test has excellent ability to detect the specified effect size. Generally, 80% power is considered acceptable, so 97% indicates a very well-powered study for detecting a difference of 30mm between the groups.
Two-sample t test power calculation
n = 8
delta = 1
sd = 2.575237
sig.level = 0.05
power = 0.1083557
alternative = two.sided
NOTE: n is number in *each* group
Now to make a final plot
Typically we will make a plot that has the mean and standard error on it to represent the data
Your task is to make a more publication quality final plot
This is how you can customize a plot with your own theme
Run this block or ctrl/command + enter on the very start of the code as running inside the function fails.
# Load your custom theme -----theme_class <-function(base_size =14, base_family ="Sans"){theme(# REMOVE PLOT FILL AND GRIDSpanel.background=element_rect(fill ="transparent", colour ="transparent"), plot.background=element_rect(fill="transparent", colour=NA),# removes the grid linespanel.grid.major =element_line(linetype ="blank"),panel.grid.minor =element_line(linetype ="blank"), # X and Y LABELS APPEARANCEaxis.text =element_text(colour ="black"), # this is for the text on ticks and overrides all below if not setaxis.title.x=element_text(size=18, face="bold"), # this is for the x titleaxis.title.y=element_text(size=18, face="bold"), # this is for the y titleaxis.text.x =element_text(size=16, face="bold", angle=0, vjust = .5, hjust=.5), # for X tick labelsaxis.text.y =element_text(size=16, face="bold"),# for Y tick labels# plot.title = element_text(hjust = 0.5, colour="black", size=22, face="bold"),# ADD AXES LINES AND SIZEaxis.ticks =element_line(colour ="black"),axis.line.x =element_line(color="black", linewidth =0.5,linetype ="solid" ),axis.line.y =element_line(color="black", linewidth =0.5, linetype ="solid"),# LEGEND# LEGEND TEXTlegend.text =element_text(colour="black", size =16, face ="bold"),# LEGEND TITLElegend.title =element_text(colour="black", size=18, face="bold"),# LEGEND POSITION AND JUSTIFICATION # legend.justification=c(0.1,1),legend.position="right",# REMOVE BOX BEHIND LEGEND SYMBOLSlegend.key =element_rect(fill ="transparent", colour ="transparent"),# REMOVE LEGEND BOXlegend.background =element_rect(fill ="transparent", colour ="transparent"),# ADD PLOT BOX# panel.border = element_rect(color = "black", fill = NA, size=2),# turn off an element element_blank()panel.border =element_blank(),plot.title =element_text(hjust =0, vjust=2.12),plot.caption =element_text(hjust =0, vjust=1.12))}
Now apply your theme to the plot above
We can add things to an existing plot as we go.. and it is not over writing the plot as we are not using the <-