# load the libraries each time you restart R
library(tidyverse)
library(lubridate)
library(readxl)
library(scales)
library(skimr)
library(janitor)
library(patchwork)
# Read in file using tidyverse code-----
mm.df <- read_csv("data/mms.csv")
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## center = col_character(),
## color = col_character(),
## diameter = col_double(),
## mass = col_double()
## )
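If you already know the column types, you can spell them out with the col_types argument instead of letting read_csv() guess. This is a minimal sketch, assuming readr 2.x. To run without data/mms.csv, it writes the first four rows from the glimpse() output below to a temporary file. It then reads them back with an explicit specification that matches the one printed above, which also stops the Column specification message from appearing. The names tmp_csv and mm_sample.df are illustrative only.

```r
library(readr)

# Write the first four rows shown by glimpse() to a temporary CSV
# so this example does not depend on data/mms.csv
tmp_csv <- tempfile(fileext = ".csv")
writeLines(c(
  "center,color,diameter,mass",
  "peanut butter,blue,16.20,2.18",
  "peanut butter,brown,16.50,2.01",
  "peanut butter,orange,15.48,1.78",
  "peanut butter,brown,16.32,1.98"
), tmp_csv)

# Supplying col_types reproduces the guessed specification and
# suppresses the "Column specification" message
mm_sample.df <- read_csv(
  tmp_csv,
  col_types = cols(
    center   = col_character(),
    color    = col_character(),
    diameter = col_double(),
    mass     = col_double()
  )
)
mm_sample.df
```

The compact string col_types = "ccdd" is equivalent shorthand (c for character, d for double).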
Excel files can be read in a similar way with read_excel() from the readxl package.
# Note you can read in Excel files just as easily
mm_excel.df <- read_excel("data/mms.xlsx")
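By default, read_excel() reads the first sheet of a workbook. If a workbook has several sheets, you can list them with excel_sheets() and pick one with the sheet argument. So that it runs anywhere, this sketch uses one of the example workbooks bundled with readxl (found with readxl_example()) rather than data/mms.xlsx.

```r
library(readxl)

# readxl ships small example workbooks; readxl_example() returns a path
xlsx_path <- readxl_example("datasets.xlsx")

# List the sheet names, then choose one by name
excel_sheets(xlsx_path)
mtcars_sheet.df <- read_excel(xlsx_path, sheet = "mtcars")
mtcars_sheet.df
```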
One way to inspect the data is to click the blue triangle next to the dataset in the Environment tab in the upper right of RStudio.
You can also use code to inspect the structure of the dataset.
# Data structure
str(mm.df)
## spec_tbl_df [816 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ center : chr [1:816] "peanut butter" "peanut butter" "peanut butter" "peanut butter" ...
## $ color : chr [1:816] "blue" "brown" "orange" "brown" ...
## $ diameter: num [1:816] 16.2 16.5 15.5 16.3 15.6 ...
## $ mass : num [1:816] 2.18 2.01 1.78 1.98 1.62 2.59 1.9 2.55 2.07 2.26 ...
## - attr(*, "spec")=
## .. cols(
## .. center = col_character(),
## .. color = col_character(),
## .. diameter = col_double(),
## .. mass = col_double()
## .. )
# or
glimpse(mm.df)
## Rows: 816
## Columns: 4
## $ center <chr> "peanut butter", "peanut butter", "peanut butter", "peanut bu…
## $ color <chr> "blue", "brown", "orange", "brown", "yellow", "brown", "yello…
## $ diameter <dbl> 16.20, 16.50, 15.48, 16.32, 15.59, 17.43, 15.45, 17.30, 16.37…
## $ mass <dbl> 2.18, 2.01, 1.78, 1.98, 1.62, 2.59, 1.90, 2.55, 2.07, 2.26, 1…
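The skimr package loaded at the top of the script provides another summary view. skim() returns one row per column, with missing-value counts and summary statistics. The sketch below runs it on the first four rows from the glimpse() output, typed in directly so that no file is needed. On the full data you would just call skim(mm.df).

```r
library(tibble)
library(skimr)

# First four rows from the glimpse() output above
mm_sample.df <- tibble(
  center   = rep("peanut butter", 4),
  color    = c("blue", "brown", "orange", "brown"),
  diameter = c(16.20, 16.50, 15.48, 16.32),
  mass     = c(2.18, 2.01, 1.78, 1.98)
)

# One summary row per column, grouped by column type when printed
mm_skim <- skim(mm_sample.df)
mm_skim
```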
Before going too far, it is often important to save the modified data.
The readr package (loaded as part of the tidyverse) does this with write_csv().
# Saving dataframes -----
# Let's say you have made a lot of changes and it's now time to save the dataframe
# (the finalized_data folder must already exist)
write_csv(mm.df, "finalized_data/mm_output.csv")
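write_csv() will not create a missing folder for you, so it can help to create the folder first and then read the file back to check it. This is a minimal sketch, assuming readr 2.x. To avoid touching your project folders, it writes to a finalized_data folder inside the session's temporary directory. The data are the first four rows from the glimpse() output above, and the names out_dir, out_path, and mm_check.df are illustrative.

```r
library(readr)

# Create the output folder if it is missing (no warning if it exists)
out_dir <- file.path(tempdir(), "finalized_data")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# First four rows from the glimpse() output above
mm_sample.df <- data.frame(
  center   = rep("peanut butter", 4),
  color    = c("blue", "brown", "orange", "brown"),
  diameter = c(16.20, 16.50, 15.48, 16.32),
  mass     = c(2.18, 2.01, 1.78, 1.98)
)

out_path <- file.path(out_dir, "mm_output.csv")
write_csv(mm_sample.df, out_path)

# Read the file back to confirm it round-trips
mm_check.df <- read_csv(out_path, show_col_types = FALSE)
mm_check.df
```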
This script goes over many of the basics of creating graphs with ggplot2, and later on we will go over how to do more specialized things. It is by no means a complete guide to ggplot2, but it covers most of what you will need to do. Any suggestions or recommendations of things to add are welcome.
I feel that graphing is the key to all data analysis. If you can look at your data, you can begin to see patterns that you may have predicted and want to test statistically. You will also spot outliers that might affect results faster than you would by looking at summary statistics.
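Strictly written ggplot2 code names every argument: data =, x = and y =.
I have found that these names are not necessary most of the time, because data is the first argument of ggplot() and x and y are the first two arguments of aes(). We can talk about this more later.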
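To see that the argument names are optional, you can build the same plot both ways and compare the data that each layer computes. This sketch uses the first four rows from the glimpse() output above. layer_data() is a ggplot2 function that returns the computed data for one layer.

```r
library(ggplot2)

# First four rows from the glimpse() output above
mm_sample.df <- data.frame(
  center   = rep("peanut butter", 4),
  color    = c("blue", "brown", "orange", "brown"),
  diameter = c(16.20, 16.50, 15.48, 16.32),
  mass     = c(2.18, 2.01, 1.78, 1.98)
)

# Every argument named
p_named <- ggplot(data = mm_sample.df, mapping = aes(x = color, y = diameter)) +
  geom_point()

# Positional: data is ggplot()'s first argument; x and y are aes()'s first two
p_positional <- ggplot(mm_sample.df, aes(color, diameter)) +
  geom_point()

# TRUE when both versions produce the same layer data
all.equal(layer_data(p_named), layer_data(p_positional))
```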
# ggplot2 uses layers to build a graph
ggplot(data=mm.df, aes(x=color, y=diameter)) + # this sets up data
geom_point() # this adds a geometry to present the data from above
Because ggplot2 builds plots in layers, you can add other geoms to the same plot. Try the code below and see what happens when you put a + after geom_point() and then add geom_boxplot().
# Add geom_boxplot() -----
# Add a boxplot to the graph below by putting + after geom_point() and then geom_boxplot()
ggplot(mm.df, aes(x=color, y=diameter)) +
geom_point()
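One possible answer to the exercise above follows, again using the first four rows from the glimpse() output. Layers are drawn in the order you add them. Putting geom_boxplot() before geom_point() therefore keeps the points on top of the boxes. Setting outlier.shape = NA stops the boxplot from drawing its own outlier points, which would otherwise appear twice once geom_point() plots every observation.

```r
library(ggplot2)

# First four rows from the glimpse() output above
mm_sample.df <- data.frame(
  center   = rep("peanut butter", 4),
  color    = c("blue", "brown", "orange", "brown"),
  diameter = c(16.20, 16.50, 15.48, 16.32),
  mass     = c(2.18, 2.01, 1.78, 1.98)
)

# Boxes first (layer 1), then points drawn on top (layer 2)
p_layers <- ggplot(mm_sample.df, aes(x = color, y = diameter)) +
  geom_boxplot(outlier.shape = NA) +
  geom_point()
p_layers
```

Swapping the two geoms draws the boxes over the points, which can hide them.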
You can add simple, unformatted axis labels with labs(x = "", y = ""). To add a line break inside a label, put \n in the label string.
# Adding axes labels ----
ggplot(mm.df, aes(x=color, y=diameter)) +
geom_boxplot() +
geom_point() +
labs(x = "Color", y = "Diameter")
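Here is a small sketch of the \n line break, using the first four rows from the glimpse() output. The label wording is only an illustration. ggplotGrob() converts the plot to a gtable, which is a quick way to check that it renders without errors.

```r
library(ggplot2)

# First four rows from the glimpse() output above
mm_sample.df <- data.frame(
  center   = rep("peanut butter", 4),
  color    = c("blue", "brown", "orange", "brown"),
  diameter = c(16.20, 16.50, 15.48, 16.32),
  mass     = c(2.18, 2.01, 1.78, 1.98)
)

# "\n" inside a label string starts a new line in the rendered label
p_breaks <- ggplot(mm_sample.df, aes(x = color, y = diameter)) +
  geom_boxplot() +
  geom_point() +
  labs(x = "Candy\ncolor", y = "Candy\ndiameter")

# Build the plot into a gtable to confirm it renders
p_grob <- ggplotGrob(p_breaks)
p_breaks
```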
What I find really nice is being able to create formatted axis labels. You can do this in a few ways, but I have found that expression() works best for my needs. Inside expression(), a ~ adds a space between symbols and a * connects them without a space.
# Label expressions -----
# Adding special formatting to labels
ggplot(mm.df, aes(x=color, y=diameter)) +
geom_boxplot() +
geom_point() +
labs(x = "color", y = expression(bold("Diameter ("*mu*"*1000)")))
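To see the difference between ~ and * directly, here is a minimal sketch on the first four rows from the glimpse() output. The label text is illustrative: the italic d is a hypothetical symbol and is not meant as a unit. In the x label, ~ puts a space between "Candy" and "color". In the y label, * joins "Diameter" and the comma with no space, and ~ then adds a space before the italic d.

```r
library(ggplot2)

# First four rows from the glimpse() output above
mm_sample.df <- data.frame(
  center   = rep("peanut butter", 4),
  color    = c("blue", "brown", "orange", "brown"),
  diameter = c(16.20, 16.50, 15.48, 16.32),
  mass     = c(2.18, 2.01, 1.78, 1.98)
)

# ~ inserts a space: renders as "Candy color"
x_lab <- expression(Candy ~ color)
# * joins with no space, ~ adds one: renders as "Diameter, d" with d in italics
y_lab <- expression(Diameter * "," ~ italic(d))

p_math <- ggplot(mm_sample.df, aes(x = color, y = diameter)) +
  geom_boxplot() +
  geom_point() +
  labs(x = x_lab, y = y_lab)

# Build the plot into a gtable to confirm the expressions render
p_grob <- ggplotGrob(p_math)
p_math
```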