Reading and Writing Data

Learn to read data from various file types and save your processed data.

Objective

Learn how to read data from different file types, process it, and then save your results to an output directory. We’ll cover:

  • CSV files
  • Excel files
  • Tab-delimited files
  • Space-delimited files

For more sample data files, see the Dataframes page.


Load Required Libraries

We’ll use tidyverse (which includes readr) for CSV and other delimited files, readxl for Excel files, and janitor for cleaning up column names.

# Load necessary libraries
library(janitor)
library(readxl)
library(tidyverse)

Reading Data Files

  1. CSV Files
# Read a CSV file
mm_df <- read_csv("data/mms.csv")
  2. Excel Files
# Read an Excel file
mm_excel_df <- read_excel("data/mms.xlsx")
  3. Tab-Delimited Files
# Read a tab-delimited file (alternatively, use read_tsv)
mm_tab_df <- read_delim("data/mms_tab.txt", delim = "\t")
  4. Space-Delimited Files
# Read a space-delimited file
mm_space_df <- read_delim("data/mms_space.txt", delim = " ")
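The comment in the tab-delimited example mentions read_tsv as an alternative. The sketch below illustrates that idea on a tiny made-up data set: it writes the same rows to a temporary CSV file and a temporary tab-delimited file, then reads each with the matching readr function. The file names and values here are hypothetical and are not part of the course data.

```r
library(readr)

# Hypothetical sample rows, written to temporary files so the example runs anywhere
tmp_dir  <- tempdir()
csv_path <- file.path(tmp_dir, "mms_demo.csv")
tsv_path <- file.path(tmp_dir, "mms_demo_tab.txt")

writeLines(c("center,color,mass", "plain,red,0.85", "peanut,blue,2.18"), csv_path)
writeLines(c("center\tcolor\tmass", "plain\tred\t0.85", "peanut\tblue\t2.18"), tsv_path)

# show_col_types = FALSE silences the column-specification message
demo_csv <- read_csv(csv_path, show_col_types = FALSE)
demo_tsv <- read_tsv(tsv_path, show_col_types = FALSE)  # shorthand for delim = "\t"

# Both readers should recover the same column names and row count
names(demo_csv)
names(demo_tsv)
nrow(demo_csv) == nrow(demo_tsv)
```

If the two results disagree, the delimiter is the first thing to check.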

Inspecting the Data

After reading in a file, check its structure using:

# Quickly inspect the data
glimpse(mm_df)
Rows: 816
Columns: 4
$ center   <chr> "peanut butter", "peanut butter", "peanut butter", "peanut bu…
$ color    <chr> "blue", "brown", "orange", "brown", "yellow", "brown", "yello…
$ diameter <dbl> 16.20, 16.50, 15.48, 16.32, 15.59, 17.43, 15.45, 17.30, 16.37…
$ mass     <dbl> 2.18, 2.01, 1.78, 1.98, 1.62, 2.59, 1.90, 2.55, 2.07, 2.26, 1…
# or 
head(mm_df)
# A tibble: 6 × 4
  center        color  diameter  mass
  <chr>         <chr>     <dbl> <dbl>
1 peanut butter blue       16.2  2.18
2 peanut butter brown      16.5  2.01
3 peanut butter orange     15.5  1.78
4 peanut butter brown      16.3  1.98
5 peanut butter yellow     15.6  1.62
6 peanut butter brown      17.4  2.59
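Beyond glimpse() and head(), a few base R functions give a quick overview of a data frame. The sketch below uses a small hypothetical stand-in for mm_df (the values are made up for illustration) so it runs without the course data files.

```r
library(tibble)

# Hypothetical stand-in for mm_df with the same four columns
demo_df <- tibble(
  center   = c("peanut butter", "peanut butter", "plain"),
  color    = c("blue", "brown", "red"),
  diameter = c(16.20, 16.50, 13.28),
  mass     = c(2.18, 2.01, 0.85)
)

dim(demo_df)      # number of rows, then number of columns
names(demo_df)    # column names
summary(demo_df)  # per-column summaries (min, median, mean, ... for numeric columns)
```

These checks are a reasonable first step after reading any file, since they confirm that the row count and column names match expectations.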

Saving Processed Data

Before saving your results, ensure the output directory exists. You can create it if needed:

# Create the output directory if it doesn't exist
if (!dir.exists("output")) {
  dir.create("output")
}

Then, save your data frame as a CSV file:

# Save the processed data to the output directory
write_csv(mm_df, "output/mm_output.csv")
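One way to confirm that a save worked is to read the file back and compare it with the original. The following is a minimal sketch of that round trip, assuming a throwaway directory under tempdir() and a small made-up data frame; the names out_dir, out_path, and demo_df are hypothetical.

```r
library(readr)
library(tibble)

# Use a temporary location so the example does not touch your project folder
out_dir <- file.path(tempdir(), "output")
# recursive = TRUE creates parent folders; showWarnings = FALSE keeps it quiet
# if the directory already exists
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Hypothetical data to save
demo_df  <- tibble(color = c("blue", "brown"), mass = c(2.18, 2.01))
out_path <- file.path(out_dir, "mm_output_demo.csv")
write_csv(demo_df, out_path)

# Read the file back and check that nothing was lost
check_df <- read_csv(out_path, show_col_types = FALSE)
file.exists(out_path)
nrow(check_df) == nrow(demo_df)
```

Building paths with file.path() rather than pasting strings together keeps the code portable across operating systems.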

Cleaning Up Messy or Poorly Formatted Variable Names

To do this, we’ll use the janitor package. Its clean_names() function converts column names to a consistent format (snake_case by default), so you don’t have to rename them by hand.

# Read an Excel file and clean its column names
mm_excel_df <- read_excel("data/mms.xlsx") %>%
  clean_names()
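To see what clean_names() does, it helps to try it on a data frame with deliberately messy names. The sketch below uses a hypothetical tibble (the column names and values are invented for illustration) and prints the names before and after cleaning.

```r
library(janitor)
library(tibble)

# Hypothetical data frame with spaces, punctuation, and mixed case in its names
messy_df <- tibble(
  `Candy Center`  = c("plain", "peanut"),
  `Diameter (mm)` = c(13.28, 16.20),
  MASS            = c(0.85, 2.18)
)

names(messy_df)

# clean_names() lowercases names and replaces spaces and punctuation with underscores
clean_df <- clean_names(messy_df)
names(clean_df)
# "candy_center" "diameter_mm" "mass"
```

Cleaned names can be typed without backticks, which makes later steps such as select() or mutate() easier to write.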