In this assignment, you’ll practice some of the R skills you’ll need for the final project. The assignment is broken into 3 parts:
- Part 1: Reading Package Docs (20 points)
- Part 2: Getting and Cleaning Data (40 points)
- Part 3: Reading Remote Files (40 points)
In the final project, you’ll be asked to use some external packages to create an R processing pipeline that gets data, cleans it, analyzes it, and presents results. In this assignment, we’re going to work with some of those packages to get familiar with them.
You can access R function documentation using syntax like
?function. This documentation typically contains a
description of the function’s purpose, a list of parameters, and the
function signature. In addition, many package authors include examples,
small blocks of code using the package’s functionality that you can
copy, paste, and run directly.
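A minimal sketch of a few ways to reach this documentation from the console, using the built-in stats::cor() function only because it ships with every R installation. `?` opens the help page, args() prints just the signature, and utils::example() either runs the help page's Examples section or, with give.lines = TRUE, returns that code as a character vector without running it.

```r
# Open the full help page (only meaningful in an interactive session).
if (interactive()) {
  ?cor
}

# Print only the function signature: parameter names and their defaults.
args(cor)

# Return the code from the "Examples" section of ?cor as a character
# vector, without running it.
cor_examples <- example("cor", package = "stats", give.lines = TRUE)
cat(head(cor_examples, 10), sep = "\n")

# Or run those examples directly; ask = FALSE stops R from pausing
# before each new plot.
example("cor", package = "stats", ask = FALSE)
```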
In this part of the assignment, I’d like you to practice reading and running these examples. Please copy one example of each of the following functions from its documentation. Try running these yourself, but you do not need to include the output in your submission.
- ymd_hms() from the {lubridate} package
- str_extract() from the {stringr} package
- setnames() from the {data.table} package
- map_dbl() from the {purrr} package
NOTE: Please just COPY examples from these packages’ documentation. Do not write your own custom examples.
Upload a file to the Programming Assignment 2
dropbox on D2L with a name like
{Firstname}_{Lastname}_Assignment2pt1.R (example:
James_Lamb_Assignment2pt1.R). This file should contain 4
sections, one for each of the functions above. Please copy one example
per function into your script, and use the format given in the example
below:
#==============================================================================#
# package: data.table
# function: setnames
# Example:
DT <- data.table(a=1:2,b=3:4,c=5:6) # compare to data.table
try(tracemem(DT)) # by reference, no deep or shallow copies
setnames(DT,"b","B") # by name, no match() needed (warning if "b" is missing)
#==============================================================================#
#==============================================================================#
# package: stats
# function: cor
# Example:
cor(1:10, 2:11) # == 1
#==============================================================================#
As you go further into the world of data science, you’ll hear this said often: most of a data scientist’s time is spent getting and cleaning data. Let’s practice that crucial activity.
In the second part of this assignment, we’ll be working with data from FRED (Federal Reserve Economic Data), a database maintained by the Federal Reserve Bank of St. Louis.
Using the resources I’ve provided below, your goal is to download two series: Japanese inflation (YOY percent change) and U.S. inflation (YOY percent change).
Then you’ll join them into a single data.frame and plot
them against each other on a line plot using R’s base plotting
system.
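As a minimal sketch of the join-then-plot pattern, the code below uses two small made-up data.frames (the values and the column names A_VALUE and B_VALUE are invented for illustration and are not FRED data). By default, merge() keeps only the dates that appear in both inputs. plot() draws the first series and lines() adds the second one to the same axes.

```r
# Two small, made-up yearly series (illustrative values, not FRED data).
seriesA <- data.frame(
  Date = as.Date(c("2018-01-01", "2019-01-01", "2020-01-01", "2021-01-01")),
  A_VALUE = c(1.0, 0.5, 0.0, -0.2)
)
seriesB <- data.frame(
  Date = as.Date(c("2019-01-01", "2020-01-01", "2021-01-01", "2022-01-01")),
  B_VALUE = c(1.8, 1.2, 4.7, 8.0)
)

# merge() performs an inner join on "Date" by default, so only the three
# dates present in both data.frames survive (sorted by Date).
combined <- merge(x = seriesA, y = seriesB, by = "Date")
print(combined)

# Draw the first series, then add the second with lines().
# ylim is computed from both series so neither line is cut off.
plot(
  x = combined$Date,
  y = combined$A_VALUE,
  type = "l",
  ylim = range(c(combined$A_VALUE, combined$B_VALUE)),
  main = "Two illustrative series",
  xlab = "Date",
  ylab = "Value"
)
lines(x = combined$Date, y = combined$B_VALUE, type = "l", col = "red")
legend(
  x = "topleft",
  col = c("black", "red"),
  legend = c("A", "B"),
  lty = c(1, 1)
)
```

Setting ylim is a design choice rather than a requirement: without it, plot() sizes the y-axis from the first series only, and a second series added with lines() can run off the chart.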
{quantmod} + FRED tutorial: https://www.quantmod.com/documentation/getSymbols.FRED.html
Upload a file to the Programming Assignment 2
dropbox on D2L with a name like
{Firstname}_{Lastname}_Assignment2pt2.R (example:
James_Lamb_Assignment2pt2.R). In the example script below,
I’ve filled in the pieces needed to pull the U.S. data. Your task is to
fill out the rest of this script so that it produces a plot comparing
Japan and the U.S.
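The line usaDF["Date"] <- as.Date(row.names(usaDF)) in the example script below exists because getSymbols(..., auto.assign = FALSE) returns an xts time series, and as.data.frame() stores that series' dates as row names rather than as a column. The sketch below imitates that shape with a plain data.frame so it runs without {quantmod}. The values and the column name VALUE are made up.

```r
# A plain data.frame shaped like as.data.frame() output from an xts object:
# the dates are character row names, not a column (values are made up).
exampleDF <- data.frame(
  VALUE = c(0.1, 1.3, 2.4),
  row.names = c("2016-01-01", "2017-01-01", "2018-01-01")
)

# Copy the row names into a real Date column so merge() and plot() can use it.
exampleDF["Date"] <- as.Date(row.names(exampleDF))
str(exampleDF)
```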
You need to complete all of the following steps, indicated by the
phrase ### FILL THIS IN ### in the example script
below:
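- Use quantmod::getSymbols() to pull the Japanese inflation series
- Rename the columns of jpnDF
- Use the merge() command to create a single dataset for plotting (see ?merge)
- Plot the Japanese series, filling in type, main, xlab, and ylab (see ?graphics::plot)
- Add the U.S. series as a second line with lines(type = "l")
- Use the col argument within lines() to set the color of this line to "red"
# Load dependencies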
library(quantmod)
# Get Data
usaDF <- as.data.frame(
quantmod::getSymbols(
Symbols = "FPCPITOTLZGUSA"
, src = "FRED"
, auto.assign = FALSE
)
)
usaDF["Date"] <- as.Date(row.names(usaDF))
jpnDF <- as.data.frame(
quantmod::getSymbols(
### FILL THIS IN ###
)
)
jpnDF["Date"] <- as.Date(row.names(jpnDF))
# rename columns
names(usaDF) <- c("USA_INF_YOY", "Date")
names(jpnDF) <- ### FILL THIS IN ###
# Combine the two data.frames with merge()
mergedDF <- merge(
x = usaDF,
y = ### FILL THIS IN ###,
by = "Date"
)
# Plot the two series
plot(
x = mergedDF$Date,
y = mergedDF$JPN_INF_YOY,
type = ### FILL THIS IN ###,
main = ### FILL THIS IN ###,
xlab = ### FILL THIS IN ###,
ylab = ### FILL THIS IN ###
)
lines(
x = ### FILL THIS IN ###,
y = ### FILL THIS IN ###,
type = "l",
col = ### FILL THIS IN ###
)
legend(
x = "topright"
, col = c("black", "red")
, legend = c("JPN", "USA")
, lty = c(1, 1)
)
If this works correctly, you should see a plot similar to this:
In the final project, you’ll be asked to submit code that reads in data from remote storage.
In this part of the assignment, you’ll practice this important skill.
hint: Read https://jameslamb.github.io/intro-to-r/code/programming-supplement.html#Working_with_Files for some information about reading data from files.
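As a minimal sketch, read.csv() accepts a URL as its file argument, so a remote CSV can be read straight into memory without download.file(). To keep the example runnable offline, the code below parses an inline string through the text argument instead. The commented-out line shows the same function pointed at the sample URL used in the test snippet further down. The small CSV contents are made up.

```r
# read.csv() can read directly from a URL (requires an internet connection):
# irisDF <- read.csv("https://raw.githubusercontent.com/jameslamb/intro-to-r/refs/heads/main/sample-data/iris.csv")

# The same parser works on an in-memory string via the text argument.
csvText <- "id,species,petal_length\n1,setosa,1.4\n2,versicolor,4.7\n3,virginica,6.0"
smallDF <- read.csv(text = csvText)
print(smallDF)
print(dim(smallDF))
```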
Upload a file to the Programming Assignment 2
dropbox on D2L with a name like
{Firstname}_{Lastname}_Assignment2pt3.R (example:
James_Lamb_Assignment2pt3.R).
In the file, implement 2 functions:
get_remote_data()
- arguments:
  - url: string containing a URL pointing to a CSV file on the internet
- prints the message "Downloaded data from '<url>'. Full dataset has <m> rows and <n> columns.", where:
  - <url> is replaced with the URL passed via url
  - <m> is replaced with the number of rows in the data
  - <n> is replaced with the number of columns in the data
- returns a data.frame representation of the file’s contents
- does not use download.file() or otherwise leave files on the filesystem
get_dataset()
- arguments:
  - url: same interpretation as above
  - num_rows: an integer. If >= 1, the function returns the first num_rows rows of the data. If <= 0, the function returns all rows.
- calls get_remote_data() and uses its result
- returns a data.frame with the first num_rows rows of the data indicated by url
- does not use download.file() or otherwise leave files on the filesystem
Your submitted script should only contain library()
calls (if you choose to use external packages) and your implementations
of those 2 functions.
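A few base R building blocks may help here. The sketch below applies them to the built-in mtcars data rather than to a downloaded file. sprintf() fills the placeholders in a template string, nrow() and ncol() return a data.frame's dimensions, and head() returns its first n rows. Watch out: head() treats a negative n as "all but the last |n| rows", so a value like num_rows = -1 needs its own branch. The helper name take_rows() is hypothetical and exists only for illustration.

```r
# Build a status message from a data.frame's dimensions.
carsDF <- mtcars
msg <- sprintf(
  "This data.frame has %d rows and %d columns.",
  nrow(carsDF),
  ncol(carsDF)
)
print(msg)

# head() with a negative n drops rows from the end instead of keeping all rows.
nrow(head(carsDF, -1))   # 31, not 32

# Hypothetical helper: first n rows when n >= 1, otherwise every row.
take_rows <- function(df, n) {
  if (n >= 1) {
    return(head(df, n))
  }
  return(df)
}
nrow(take_rows(carsDF, 5))    # 5
nrow(take_rows(carsDF, -1))   # 32
```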
Follow the template below, only adding code between the
### comments.
Any code you add outside those ### comments will
be ignored.
### BEGIN CODE ###
# (optional - add library() calls here)
### END CODE ###
get_remote_data <- function(url){
### BEGIN CODE ###
# (add code here)
### END CODE ###
}
get_dataset <- function(url, num_rows){
### BEGIN CODE ###
# (add code here)
### END CODE ###
}
Test your code using the following snippet.
Do not include this test code in your submission.
#--- test 1: should print and return the expected data ---#
fullDF <- get_remote_data(
url = "https://raw.githubusercontent.com/jameslamb/intro-to-r/refs/heads/main/sample-data/iris.csv"
)
# "Downloaded data from 'https://raw.githubusercontent.com/jameslamb/intro-to-r/refs/heads/main/sample-data/iris.csv'. Full dataset has 150 rows and 6 columns."
#--- test 2: should subset rows ---#
subsetDF <- get_dataset(
url = "https://raw.githubusercontent.com/jameslamb/intro-to-r/refs/heads/main/sample-data/iris.csv"
, num_rows = 27
)
stopifnot(dim(subsetDF) == c(27, 6))
#--- test 3: should return all rows if num_rows == -1 ---#
subsetDF <- get_dataset(
url = "https://raw.githubusercontent.com/jameslamb/intro-to-r/refs/heads/main/sample-data/iris.csv"
, num_rows = -1
)
stopifnot(dim(subsetDF) == c(150, 6))