Lecture 18 - Multivariate Community Analysis

Author

Bill Perry

Introduction: Why NMDS?

The Problem with Ecological Community Data

  • Ecological community data presents unique challenges for traditional statistical methods
  • When sampling species abundances across sites
  • encounter data that is sparse (most species are absent from most sites) - many zeros
  • exhibits non-linear relationships between species and environmental gradients
  • has no clear distributional form
  • Principal Components Analysis (PCA) works beautifully for continuous environmental variables
    • assumes linear relationships
    • can leverage correlations between variables
    • for species abundance data - assumptions often fail spectacularly
  • NMDS offers a fundamentally different approach:
    • instead of trying to preserve exact distances or maximize variance explained

    • focuses on preserving the rank order of similarities between samples

    • makes it remarkably robust for ecological data.

Lecture 18: NMDS - Multivariate Community Analysis

Data - built in R and other places

Code Description of the variable
das Distance from the source [km]
alt Altitude [m a.s.l.]
pen Slope [per thousand]
deb Mean minimum discharge [m3s-1]
pH pH of water
dur Calcium concentration (hardness) [mgL-1]
pho Phosphate concentration [mgL-1]
nit Nitrate concentration [mgL-1]
amn Ammonium concentration [mgL-1]
oxy Dissolved oxygen [mgL-1]
dbo Biological oxygen demand [mgL-1]
Code Common Name (English) Family
Chb Bullhead Cottidae
Tru Brown trout Salmonidae
Vai Minnow Cyprinidae
Loc Stone loach Balitoridae
Omb Grayling Salmonidae
Bla Souffia Cyprinidae
Hot Common nase Cyprinidae
Tox Sofie Cyprinidae
Van Dace Cyprinidae
Che Chub Cyprinidae
Bar Barbel Cyprinidae
Lot Burbot Lotidae
Spi Spirlin Cyprinidae
Gou Gudgeon Cyprinidae
Bro Pike Esocidae
Per Perch Percidae
Tan Tench Cyprinidae
Gar Roach Cyprinidae
Lam Brook lamprey petromizonidae

Today’s Data: The Doubs River

About the Doubs River Dataset

  • Study System:
    • Doubs River, France
    • 29 sites from upstream to downstream
    • Collected by Verneaux (1973)
    • Classic community ecology dataset
  • Two Datasets:
    1. Environmental: 11 water quality variables
    2. Species: 27 fish species abundances (0-5 scale)
  • Research Questions:
    • How do fish communities change along the river?
    • Which environmental factors drive community composition?
    • Are there distinct community types?

Non-metric Multidimensional Scaling (NMDS)

  • What is NMDS?
    • Overview of NMDS:
      • Purpose: Visualize dissimilarity between objects in 2D/3D space
      • Goal: Preserve rank order of distances, not exact distances
      • Method: Iterative re-positioning to minimize stress
      • Output: Ordination plot showing relationships between samples
    • Key Differences from PCA:
      • Uses dissimilarity matrices (not covariance)

      • Non-parametric (rank-based)

      • Better for non-linear ecological relationships

      • No assumption about data distribution

  • Before running NMDS - analyzing dissimilarity matrices not raw abundance data
    • dissimilarity matrix contains values for every pair of samples = “how different”

    • typically range from 0 (identical) to 1 (completely different).

Why Bray-Curtis for Community Data?

  • Bray-Curtis dissimilarity is standard choice for community data because it handles zeros appropriately (joint absences don’t contribute to similarity)
    • bounded between 0 and 1

    • weights by abundance (not just presence/absence) - ignores joint absences entirely

    • Similarity is based only on what’s actually present and shared between sites

    • performs well with sparse data

The formula is:

\[BC_{jk} = \frac{\sum_i |y_{ij} - y_{ik}|}{\sum_i (y_{ij} + y_{ik})}\]

Where \(y_{ij}\) is the abundance of species \(i\) at site \(j\)

  • numerator sums up the absolute differences in abundance for each species between two sites
  • denominator sums up the total abundance at both sites
  • Bray-Curtis asks: “Of all the individuals at these two sites, what proportion would need to be moved to make the sites identical?”

The NMDS Algorithm

  • What NMDS Actually Does
    • NMDS tries to place samples in a low-dimensional space (usually 2D)
    • so rank order of distances in ordination matches rank order of dissimilarities in original matrix
  • Not your PCA which preserves exact distances (or variance)
  • NMDS only cares about relative ordering:
    • if site A is more similar to site B than to site C in the original data
    • then A should be closer to B than to C in the ordination

How NMDS Works step by step

  • The NMDS Algorithm:
    1. Calculate dissimilarity matrix between all pairs of sites
    2. Start with random configuration of points in 2D space
    3. Calculate stress = difference between original distances and ordination distances
    4. Move points to reduce stress
    5. Repeat until stress cannot be reduced further
    6. Try multiple random starts to avoid local minima
  • Stress Values:
    • < 0.1: Good representation
    • 0.1-0.2: Acceptable
    • > 0.2: Poor representation

Understanding Stress

  • What is Stress?
    • Stress is key diagnostic for NMDS quality
    • measures disagreement between original dissimilarities and distances in ordination
    • stress is calculated by first ranking all pairwise dissimilarities
    • then calculating corresponding distances in the ordination
    • fitting a monotonic regression line to the relationship between these
    • finally computing sum of squared deviations from this line.
  • Lower stress means ordination better represents original relationships
  • The Shepard Diagram: Visualizing Stress
    • Shepard diagram (or stress plot) is primary diagnostic tool for NMDS
    • shows relationship between original dissimilarities (x-axis) and ordination distances (y-axis) with a fitted monotonic line
    • If NMDS is working well - points should closely follow the line
    • Scatter indicates loss of information.

Interpreting the Shepard Diagram

  • Shepard diagram
    • each point represents one pair of sites
    • x-axis shows their original Bray-Curtis dissimilarity
    • y-axis shows their distance in the 2D ordination
    • stepped line is a monotonic regression, = can only go up/stay flat not down
    • because NMDS preserves rank order - “stress” = scatter around line
  • Insights from our Shepard diagram:
    • minimal scatter
  • Stress Guidelines
    • Clarke (1993) widely-used guidelines for interpreting stress
      • <0.05 excellent representation w/no misinterpretation
      • 0.05 - 0.10 good ordination - no real risk of false inferences
      • 0.10 - 0.20 usable ordination but details at lower end interpreted cautiously
      • 0.20 and 0.30 used with caution/potentially higher dimensions considered
      • >0.30 = random placement.

Choosing the Number of Dimensions

  • How do we know 2 dimensions is enough?
  • create a scree plot showing stress for different numbers of dimensions:

NMDS Assumptions

  • NMDS Assumptions:
  • Few assumptions:
    • Samples are independent
    • Dissimilarity measure is appropriate
    • Sufficient data for stable solution
  • No assumptions about:
    • Data distribution
    • Linear relationships
    • Homoscedasticity
    • Normality
  • This makes NMDS very robust for ecological data!

NMDS in Practice

Running NMDS on Fish Communities

# Prepare species data (remove site column)
spe_matrix <- doubs_spe %>% select(-site, -reach) %>% as.matrix()
# Add small constant to avoid issues with zeros
spe_matrix <- spe_matrix + 0.1
# Run NMDS with multiple random starts
set.seed(123)
fish_nmds <- metaMDS(spe_matrix, distance = "bray",  # Bray-Curtis dissimilarity
                     k = 2,   trymax = 100)            # 2 dimensions + Maximum tries
## Run 0 stress 0.07449349 
## Run 1 stress 0.1201175 
## Run 2 stress 0.1201749 
## Run 3 stress 0.1195174 
## Run 4 stress 0.1201175 
## Run 5 stress 0.1468615 
## Run 6 stress 0.1395204 
## Run 7 stress 0.09450578 
## Run 8 stress 0.07460885 
## ... Procrustes: rmse 0.02069815  max resid 0.09861179 
## Run 9 stress 0.1273318 
## Run 10 stress 0.1200159 
## Run 11 stress 0.1273318 
## Run 12 stress 0.07449349 
## ... Procrustes: rmse 4.373442e-06  max resid 1.595028e-05 
## ... Similar to previous best
## Run 13 stress 0.09450578 
## Run 14 stress 0.140665 
## Run 15 stress 0.07459993 
## ... Procrustes: rmse 0.02014078  max resid 0.09796964 
## Run 16 stress 0.1262615 
## Run 17 stress 0.07460885 
## ... Procrustes: rmse 0.02069823  max resid 0.09861196 
## Run 18 stress 0.09134568 
## Run 19 stress 0.07460885 
## ... Procrustes: rmse 0.02069807  max resid 0.09861147 
## Run 20 stress 0.07460885 
## ... Procrustes: rmse 0.02069863  max resid 0.09861444 
## *** Best solution repeated 1 times
cat("Final stress:", round(fish_nmds$stress, 3))
## Final stress: 0.074

NMDS in Practice

  • Code Explanation:
  • fish_nmds <- metaMDS(spe_matrix, distance = "bray", # Bray-Curtis dissimilarity
    • k = 2, trymax = 100) # 2 dimensions + Maximum tries
  • metaMDS(): Main NMDS function from vegan package
    • distance = "bray": Bray-Curtis dissimilarity (best for abundances)
    • k = 2: Two dimensions for plotting
    • trymax = 100: Try 100 random starting configurations
    • Small constant added to avoid zero-distance issues

Interpreting NMDS Output

  • Understanding the Output:

    1. Stress = 0.074 : This is “excellent”
    2. Our 2D representation preserves the original distances well
    3. Convergent solutions: Found stable solutions from multiple tries
    4. Two dimensions: Axis 1 and Axis 2 have no inherent meaning (unlike PCA components)
    5. No eigenvalues: NMDS doesn’t calculate variance explained per axis

Creating Enhanced NMDS Plots

  • What the NMDS Shows:
    • Clear separation between river reaches

    • Gradient pattern from upper to lower reaches

    • Sites within each reach are more similar to each other than to other reaches

Different plot using the actual hull definitions from vegan

PERMANOVA: Testing Multivariate Differences

  • What is PERMANOVA?
    • PERMANOVA (Permutational Multivariate ANOVA):
      • Purpose: Test whether groups have different multivariate centroids
      • Method: ANOVA using distance matrices instead of raw data
      • Advantage: No distributional assumptions
      • Permutation: Creates null distribution by randomly reassigning group labels
    • Think of it as:
      • Multivariate version of ANOVA

      • Uses distances between samples instead of means

      • Tests: “Are the centers of these groups different in multivariate space?” .

How PERMANOVA Works

  • PERMANOVA Algorithm:
    1. Calculate distance matrix between all pairs of samples
    2. Calculate F-statistic based on distances:
      • Between-group sum of squares
      • Within-group sum of squares
    3. Permute group labels randomly (e.g., 999 times)
    4. Recalculate F-statistic for each permutation
    5. Compare observed F to permutation distribution
    6. P-value = proportion of permuted F ≥ observed F
  • Why permutation?
    • No assumptions about data distribution

    • Creates empirical null distribution

    • Accounts for complex dependency structures

PERMANOVA Hypotheses

  • Research Question: “Do fish communities differ significantly between river reaches?”
  • Statistical Hypotheses:
    • H₀: The centroids of fish communities are the same across all river reaches (Upper = Middle = Lower)
    • H₁: At least one river reach has a different community centroid
  • In practical terms: -
    • H₀: River position doesn’t affect community composition

    • H₁: River position significantly affects community composition

PERMANOVA Assumptions

  • PERMANOVA Assumptions:
  • Required:
    1. Independence: Samples are independent
    2. Exchangeability: Under H₀, observations are exchangeable between groups
    3. Homogeneity of dispersion: Groups have similar multivariate spread
  • Not required:
    • Normality or other distribution
    • Linearity
  • Checking Assumptions:
    • Use betadisper() to test homogeneity of dispersion
    • If violated, PERMANOVA is testing dispersion differences and not location differences
## Homogeneity of dispersion test p-value: 0.8037
  • vegan::betadisper() converts bray-curtis to euclidean distances to calculate distance from sample to center
  • Principal Coordinates Analysis (PCoA) - ordination for distances between samples in a low-dimensional, Euclidean space

Running PERMANOVA

  • PERMANOVA on Fish Communities
    • Line-by-line interpretation:
      1. reach: The factor being tested (river reach)
      2. Df = 2: Degrees of freedom (3 groups - 1)
      3. SumOfSqs: Between-group sum of squares
      4. R2: Proportion of variance explained by reach
      5. F: F-statistic (ratio of between/within group variation)
      6. Pr(>F): P-value from permutation test
## Permutation test for adonis under reduced model
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = spe_matrix ~ reach, data = doubs_env, permutations = 999, distance = "bray")
##          Df SumOfSqs      R2      F Pr(>F)    
## Model     2   2.3631 0.42634 9.6614  0.001 ***
## Residual 26   3.1798 0.57366                  
## Total    28   5.5429 1.00000                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • What this means:
    • Significant result (p < 0.001): We reject H₀

    • River reach explains substantial variation in fish communities

    • Fish communities differ significantly between river reaches

    • Very few permutations gave F ≥ observed F

Pairwise PERMANOVA Tests

  • Interpretation of Pairwise Results:
    • All pairwise comparisons are statistically significant even after Bonferroni correction

    • Upper vs Lower shows the strongest difference (highest F-statistic)

    • Downstream comparisons to up and middle explains a substantial portion of variance (R² > 0.3)

    • Biological interpretation: Fish communities change progressively down the river

##                     pairs Df SumsOfSqs   F.Model        R2 p.value p.adjusted
## 1   Upstream vs Midstream  1 0.4068962  3.434528 0.1680747   0.021      0.063
## 2  Upstream vs Downstream  1 1.8495110 15.490360 0.4767679   0.001      0.003
## 3 Midstream vs Downstream  1 1.2829784  9.972485 0.3565105   0.001      0.003
##   sig
## 1    
## 2   *
## 3   *

Visualizing PERMANOVA Results

Results:

ANOSIM: Analysis of Similarities

  • ANOSIM (Analysis of Similarities):
    • Purpose: are samples w/in groups are more similar than samples between groups
    • Method: Based on rank dissimilarities
    • Statistic: R-statistic ranging from -1 to +1
    • Interpretation:
      • R ≈ 1: Groups are completely separated

      • R ≈ 0: Groups are indistinguishable

      • R < 0: More dissimilarity within groups than between

  • Differences from PERMANOVA:
    • ANOSIM uses ranks of distances

    • PERMANOVA uses actual distances

    • ANOSIM is more robust but less powerful

How ANOSIM Works

  • ANOSIM Algorithm:
    1. Calculate dissimilarity matrix between all samples
    2. Rank all dissimilarities from smallest to largest
    3. Calculate mean rank of within-group dissimilarities (r̄w)
    4. Calculate mean rank of between-group dissimilarities (r̄b)
    5. Compute R-statistic: R = (r̄b - r̄w) / (N(N-1)/4) where N = total number of samples
    6. Permute group labels and recalculate R many times
    7. P-value = proportion of permuted R ≥ observed R
  • R-statistic interpretation:
    • R = 1: Perfect separation

    • R = 0: No separation

    • R = -1: More similar between groups than within

# Run ANOSIM
anosim_result <- anosim(spe_matrix, doubs_env$reach, 
                       distance = "bray", permutations = 999)

# Display results
anosim_result
## 
## Call:
## anosim(x = spe_matrix, grouping = doubs_env$reach, permutations = 999,      distance = "bray") 
## Dissimilarity: bray 
## 
## ANOSIM statistic R: 0.5082 
##       Significance: 0.001 
## 
## Permutation: free
## Number of permutations: 999

Running ANOSIM

  • ANOSIM Results Interpretation:
    • R = 0.5082 : This indicates “good” separation between groups
    • **p-value = 0.001 - significant result
    • Biological meaning: Fish communities well-separated between river reaches, with communities within each reach being much more similar to each other than to communities in other reaches
  • “Within” box is clearly lower than the “Between” box
    • sites within same river reach tend to be more similar to each other (lower dissimilarity ranks) than sites in different reaches (higher dissimilarity ranks)

    • R statistic of 0.51 quantifies this separation

      • there’s clear differentiation, but with some overlap.
  • Interpreting R:

    • R = 1 - Complete separation
      • all within-group dissimilarities smaller than between-group dissimilarities
    • R > 0.75 Well separated
    • R > 0.5 Separated, but overlapping
    • R > 0.25 Barely separated
    • R ≈ 0 No difference between groups

ANOSIM vs PERMANOVA Comparison

## # A tibble: 2 × 6
##   Method    `Test Statistic` `P-value` Interpretation   `What it tests` Approach
##   <chr>     <chr>                <dbl> <chr>            <chr>           <chr>   
## 1 PERMANOVA F = 9.66             0.001 Groups have dif… Differences in… Uses ac…
## 2 ANOSIM    R = 0.508            0.001 Excellent group… Overlap betwee… Uses ra…

ANOSIM vs PERMANOVA Comparison

  • When to use which:
    • PERMANOVA:
      • More powerful for detecting differences

      • Better for complex experimental designs

      • Can handle interactions and covariates

      • Preferred for most applications

    • ANOSIM:
      • More robust to outliers
      • Simpler interpretation
      • Good for initial exploratory analysis
      • Useful when distributions are very non-normal

PERMANOVA vs ANOSIM

  • What’s the Difference? Three Scenarios
  • Both methods test whether groups differ, but they ask different questions:
    • PERMANOVA asks: Do the group centroids differ in location?
      • Uses actual squared distances and partitions variance like ANOVA
    • ANOSIM asks: Are within-group dissimilarities smaller than between-group?
      • Uses ranked distances to compares rank distributions

Scenario 2 is Key!

  • The Problem:
    • Groups A and B have the same centroid (same average community)
    • But Group B is much more variable (spread out)
    • PERMANOVA will detect this as “significant”!
  • Why?
    • PERMANOVA tests distances from points to centroids
    • More spread = larger distances = detected as “different”

Dispersion test for PERMANOVA

  • Solution: Always run betadisper() before interpreting PERMANOVA!
    • What PERMANOVA “Sees”

What ANOSIM “Sees”

  • This is different

Side-by-Side Comparison of PCA and ANOSIM

Comparison

Environmental Drivers

  • Environmental Vector Results:
    • Which Environmental Variables Matter?
    • Significant variables (p < 0.05):
  • What this means:
    • variables significantly correlate w/ comm. comp
    • explain spatial arrangement of sites in NMDS
  • arrows show direction of max change for env. variables
    • Longer arrows = stronger correlations with ordination
    • Arrow direction = sites have high vs. low values of variable
  • distance from source (das) and altitude (alt) point opp. directions
    • (expected - altitude decreases as distance increases)
  • Oxygen (oxy) decreases downstream
  • BOD (dbo) increases downstream = more org. pollution
## 
## ***VECTORS
## 
##        NMDS1    NMDS2     r2 Pr(>r)    
## das  0.98098  0.19411 0.7284  0.001 ***
## alt -1.00000 -0.00057 0.5650  0.001 ***
## pen -0.61902  0.78538 0.2546  0.026 *  
## deb  0.99744 -0.07146 0.5701  0.001 ***
## pH  -0.09760 -0.99523 0.0746  0.374    
## dur  0.99967 -0.02575 0.2960  0.008 ** 
## pho  0.22860  0.97352 0.5228  0.001 ***
## nit  0.66665  0.74537 0.5200  0.001 ***
## amm  0.19902  0.98000 0.5471  0.002 ** 
## oxy -0.46402 -0.88583 0.7826  0.001 ***
## dbo  0.20211  0.97936 0.6883  0.001 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation: free
## Number of permutations: 999

Environmental Gradient Analysis

  • Key Environmental Patterns:
    • Distance from source (das): Strong correlate with community change
    • Oxygen (oxy): Decreases downstream, affects fish communities
    • Biological oxygen demand (dbo): Increases downstream (pollution indicator)
    • Altitude (alt): Decreases downstream, associated with temperature changes

Summary and Conclusions

  • Key Findings
  • NMDS Results:
    • Stress =: Excellent representation
    • Clear gradient from upper to lower river reaches
    • Within-reach similarity greater between-reach similarity
  • PERMANOVA Results:
    • Highly significant differences between reaches (p < 0.001)
    • River reach explains substantial variance in community composition
    • All pairwise comparisons significant after correction
  • ANOSIM Results:
    • R = 0.508 p = 0.001 : Excellent separation
    • Confirms PERMANOVA findings with different approach
    • Strong within-group coherence
  • Environmental Drivers:
    • Distance from source: Primary gradient
    • Dissolved oxygen: Decreases downstream
    • Biological oxygen demand: Increases downstream
    • Multiple correlated factors drive community change
  • Biological Interpretation:
    • River continuum concept supported
    • Progressive community change from source to mouth
    • Environmental filtering shapes community assembly
    • Pollution gradient evident in lower reaches

Take-Home

  • NMDS (Non-metric Multidimensional Scaling):
    • Visualizes community dissimilarity in 2D/3D
    • Preserves rank order of distances (non-metric)
    • No distributional assumptions
    • Stress < 0.2 for acceptable solutions
    • Great for exploring ecological gradients
  • PERMANOVA (Permutational MANOVA):
    • Tests for differences in group centroids
    • Uses distance matrices, not raw data
    • No distributional assumptions
    • Can handle complex designs
    • Check dispersion homogeneity assumption
  • ANOSIM (Analysis of Similarities):
    • Tests group separation using rank distances
    • R-statistic: -1 (no separation) to +1 (complete)
    • More robust to outliers than PERMANOVA
    • Simpler but less powerful
    • Good for exploratory analysis
  • Environmental Drivers:
    • Use envfit() to correlate environment with ordination
    • Identifies which variables explain community patterns
    • Helps understand ecological mechanisms
    • Essential for management implications
Back to top