In this extra credit assignment, your task is to write an R function
that shows your understanding of modifying data frames in place using
the {data.table} package.
In most R code, when you pass data into a function, R creates a copy of it. Consider this example.
some_number <- 11
addSix <- function(some_number) {
some_number <- some_number + 6
return(some_number)
}
# this should return 17
addSix(some_number)
## [1] 17
# but some_number is still 11
print(some_number)
## [1] 11
In the example code, the value some_number is unchanged
by the code inside the function addSix(). This is because
when the function is called, a copy of that object is
created. Changing a copy doesn’t change the original value.
This is good because it makes your code safe and easy to reason about. But that ease-of-use comes at a cost. Creating copies consumes time and memory. If R functions just modified their arguments directly, they could run more quickly and use less memory.
Some packages in R do support this pattern of just modifying inputs
directly, which is called “modifying in place” or “pass-by-reference”.
One such package, {data.table}, allows you to modify data
frames in R without copying them.
See the example below.
library(data.table)
## Warning: package 'data.table' was built under R version 4.4.3
randomDT <- data.table::data.table(
x = rnorm(10)
, y = rnorm(10)
)
print(randomDT)
## x y
## <num> <num>
## 1: -0.3635938 -0.93689125
## 2: 0.8000444 1.15348849
## 3: 0.2545262 -0.49272222
## 4: 1.1050339 0.08861516
## 5: 0.2239490 0.04470975
## 6: 0.3043927 -0.07872333
## 7: 0.5863823 1.18891894
## 8: -1.4412849 -2.17314240
## 9: 1.0383001 1.53658612
## 10: 0.2348604 0.13554731
# add "_col" to the end of each column name
changeNames <- function(dataset) {
data.table::setnames(
x = dataset
, old = names(dataset)
, new = paste0(names(dataset), "_old")
)
return(invisible(NULL))
}
ret <- changeNames(randomDT)
# this function returns nothing
print(ret)
## NULL
# but the data.table has been changed!
print(randomDT)
## x_old y_old
## <num> <num>
## 1: -0.3635938 -0.93689125
## 2: 0.8000444 1.15348849
## 3: 0.2545262 -0.49272222
## 4: 1.1050339 0.08861516
## 5: 0.2239490 0.04470975
## 6: 0.3043927 -0.07872333
## 7: 0.5863823 1.18891894
## 8: -1.4412849 -2.17314240
## 9: 1.0383001 1.53658612
## 10: 0.2348604 0.13554731
Your task in this assignment is to write an R function that modifies
a data.table in place.
The {data.table} package is not covered in this class.
You can learn about it from the following resources.
?data.table::setnamesTo complete this assignment, fill out the body of the function
defined below. Do not remove or change any of the code or comments. The
function assumes that the input is the built-in attitude
dataset.
library(data.table)
data(attitude)
attitudeDT <- data.table::as.data.table(attitude)
The function below contains comments that describe what the function
should do. Add exactly one statement under each comment block. The
[must-use] section describes the functions or operators
from the {data.table} package that you must use in that
statement. The [description] block describes what the code
should do.
library(data.table)
changeDataset <- function(inDT) {
# [must-use]
# :=, .N
#
# [description]
# add a new column "random_garbage"
# [must-use]
# setorderv()
#
# [description]
# sort inDT by "rating" and "privileges"
# [must-use]
# :=
#
# [description]
# remove the column "advance"
# [must-use]
# setcolorder()
#
# [description]
# change the column order to
# 1. "complaints"
# 2. "rating"
# 3. "privileges"
# 4. "random_garbage"
# 5. "critical"
# 6. "raises"
# 7. "learning"
# [must-use]
# :=
#
# [description]
# multiply every value in "raises" by 20
# [must-use]
# setnames()
#
# [description]
# add the length of each column name to the column name
# so e.g. "rating" should become "rating_6" because "rating" has 6
# characters. You must use nchar() to get the lengths.
return(invisible(NULL))
}
The code below can be used to test your solution. Define your
function changeDataset(), then run this code. If it does
not throw any errors, you’ve completed the assignment successfully.
# tests
library(data.table)
local_file <- "solution.rds"
if (!file.exists(local_file)) {
solution_url <- "https://github.com/jameslamb/intro-to-r/releases/download/mu_rprog/extra-credit-solution.rds"
download.file(
url = solution_url
, destfile = local_file
)
}
solutionDT <- readRDS(local_file)
data("attitude")
attitudeDT <- data.table::as.data.table(attitude)
ret <- changeDataset(attitudeDT)
if (!is.null(ret)) {
stop("changeDataset() should not return anything")
}
errors <- data.table:::all.equal.data.table(
attitudeDT
, solutionDT
)
if (!isTRUE(errors)) {
print(errors)
stop("Errors were found!")
} else {
print("your code is great and so are you")
}
When you feel your solution is working, submit an R script with your
solution to the “Extra Credit” dropbox on D2L, under
Assessments --> Dropbox.
This assignment is worth up to 3 percentage points added to your final grade for the class. 0.5 points will be awarded for each statement that you implement correctly.