# Load necessary libraries
library(janitor)
library(readxl)
library(tidyverse)
Reading and Writing Data
Learn to read data from various file types and save your processed data.
Objective
Learn how to read data from different file types, process it, and then save your results to an output directory. We’ll cover:
- CSV files
- Excel files
- Tab-delimited files
- Space-delimited files
For more sample data files, see the Dataframes page.
Load Required Libraries
We’ll use tidyverse for CSV and delimited files, readxl for Excel files, and janitor for cleaning column names.
Reading Data Files
- CSV Files
# Read a CSV file
mm_df <- read_csv("data/mms.csv")
- Excel Files
# Read an Excel file
mm_excel_df <- read_excel("data/mms.xlsx")
- Tab-Delimited Files
# Read a tab-delimited file (alternatively, use read_tsv)
mm_tab_df <- read_delim("data/mms_tab.txt", delim = "\t")
- Space-Delimited Files
# Read a space-delimited file
mm_space_df <- read_delim("data/mms_space.txt", delim = " ")
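The delimited-file readers above can be tried without the sample data. The following is a minimal sketch, assuming readr 2.0 or later (for `show_col_types`); the rows written to the temporary files are made-up values for illustration, not the contents of the mms files.

```r
library(readr)  # part of the tidyverse

# Hypothetical rows written to a temporary tab-delimited file
tab_path <- tempfile(fileext = ".txt")
writeLines(c("center\tcolor\tdiameter\tmass",
             "plain\tred\t13.2\t0.9",
             "peanut\tblue\t15.1\t2.1"), tab_path)

# read_delim() with delim = "\t" and read_tsv() parse the same file
tab_a <- read_delim(tab_path, delim = "\t", show_col_types = FALSE)
tab_b <- read_tsv(tab_path, show_col_types = FALSE)

# The same hypothetical rows, separated by single spaces
space_path <- tempfile(fileext = ".txt")
writeLines(c("center color diameter mass",
             "plain red 13.2 0.9",
             "peanut blue 15.1 2.1"), space_path)
space_df <- read_delim(space_path, delim = " ", show_col_types = FALSE)
```

With `delim = " "`, every single space counts as a separator. If a file pads its columns with a variable number of spaces, readr's `read_table()` may be a better fit.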
Inspecting the Data
After reading in a file, check its structure using:
# Quickly inspect the data
glimpse(mm_df)
Rows: 816
Columns: 4
$ center <chr> "peanut butter", "peanut butter", "peanut butter", "peanut bu…
$ color <chr> "blue", "brown", "orange", "brown", "yellow", "brown", "yello…
$ diameter <dbl> 16.20, 16.50, 15.48, 16.32, 15.59, 17.43, 15.45, 17.30, 16.37…
$ mass <dbl> 2.18, 2.01, 1.78, 1.98, 1.62, 2.59, 1.90, 2.55, 2.07, 2.26, 1…
# or
head(mm_df)
# A tibble: 6 × 4
  center        color  diameter  mass
  <chr>         <chr>     <dbl> <dbl>
1 peanut butter blue       16.2  2.18
2 peanut butter brown      16.5  2.01
3 peanut butter orange     15.5  1.78
4 peanut butter brown      16.3  1.98
5 peanut butter yellow     15.6  1.62
6 peanut butter brown      17.4  2.59
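Beyond glimpse() and head(), a few other quick checks can be useful after reading a file. This is a minimal sketch on a small hypothetical tibble (`mm_small` and its values are invented for illustration); the same calls should work on `mm_df`.

```r
library(tibble)
library(dplyr)

# Hypothetical stand-in for a freshly read data frame
mm_small <- tibble(
  center   = c("plain", "peanut", "plain"),
  color    = c("red", "blue", "red"),
  diameter = c(13.2, 15.1, 13.4),
  mass     = c(0.90, 2.10, 0.92)
)

dim(mm_small)      # number of rows and columns
names(mm_small)    # column names

# How many rows fall in each category of a character column
center_counts <- count(mm_small, center)
center_counts

# Basic numeric summary of one column
summary(mm_small$mass)
```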
Saving Processed Data
Before saving your results, ensure the output directory exists. You can create it if needed:
# Create the output directory if it doesn't exist
if (!dir.exists("output")) {
  dir.create("output")
}
Then, save your data frame as a CSV file:
# Save the processed data to the output directory
write_csv(mm_df, "output/mm_output.csv")
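One way to confirm a save worked is to read the file back and compare it with what was written. The sketch below is an illustration under assumptions: it writes to a temporary directory rather than the project's output folder, and `results` is a hypothetical data frame standing in for your processed data.

```r
library(readr)
library(tibble)

# Use a temporary location so the sketch does not touch your project files
out_dir <- file.path(tempdir(), "output")

# showWarnings = FALSE makes this safe to run when the directory already exists
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Hypothetical processed data
results <- tibble(color = c("red", "blue"), mass = c(0.9, 2.1))

out_file <- file.path(out_dir, "mm_output.csv")
write_csv(results, out_file)

# Read the file back to check the round trip
check <- read_csv(out_file, show_col_types = FALSE)
```

Building paths with `file.path()` avoids hand-typed separators, which helps when the same script runs on different operating systems.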
Cleaning up messy or poorly formatted variable names
To do this, we’ll use the janitor package, whose clean_names() function converts column names to a consistent format (snake_case by default).
# Read an Excel file and clean its column names
mm_excel_df <- read_excel("data/mms.xlsx") %>%
  clean_names()
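To see what clean_names() does without needing the Excel file, here is a minimal sketch on a hypothetical tibble. The messy column names are invented for illustration and are not taken from the mms data.

```r
library(janitor)
library(tibble)

# Hypothetical data frame with inconsistent column names
messy <- tibble(
  `Center Type`   = c("plain", "peanut"),
  `Diameter (mm)` = c(13.2, 15.1),
  MASS            = c(0.9, 2.1)
)

# clean_names() rewrites the names in snake_case by default
tidy <- clean_names(messy)

names(messy)
names(tidy)
```

Only the column names change; the values in each column are left as they were.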