relabel {git2rdata} | R Documentation |
Relabel Factor Levels by Updating the Metadata
Description
Imagine the situation where we have a dataframe with a factor variable and we
have stored it with write_vc(optimize = TRUE)
. The raw data file contains
the factor indices and the metadata contains the link between the factor
index and the corresponding label. See
vignette("version_control", package = "git2rdata")
. In such a case,
relabelling a factor can be fast and lightweight by updating the metadata.
Usage
relabel(file, root = ".", change)
Arguments
file |
the name of the git2rdata object. Git2rdata objects cannot
have dots in their name. The name may include a relative path. |
root |
The root of a project. Can be a file path or a |
change |
either a |
Value
invisible NULL
.
See Also
Other storage:
list_data()
,
prune_meta()
,
read_vc()
,
rename_variable()
,
rm_data()
,
verify_vc()
,
write_vc()
Examples
# initialise a git repo using git2r
repo_path <- tempfile("git2rdata-repo-")
dir.create(repo_path)
repo <- git2r::init(repo_path)
git2r::config(repo, user.name = "Alice", user.email = "alice@example.org")
# Create a dataframe and store it as an optimized git2rdata object.
# Note that write_vc() uses optimization by default.
# Stage and commit the git2rdata object.
ds <- data.frame(
a = c("a1", "a2"),
b = c("b2", "b1"),
stringsAsFactors = TRUE
)
junk <- write_vc(ds, "relabel", repo, sorting = "b", stage = TRUE)
cm <- commit(repo, "initial commit")
# check that the workspace is clean
status(repo)
# Define new labels as a list and apply them to the git2rdata object.
new_labels <- list(
a = list(a2 = "a3")
)
relabel("relabel", repo, new_labels)
# check the changes
read_vc("relabel", repo)
# relabel() changed the metadata, not the raw data
status(repo)
git2r::add(repo, "relabel.*")
cm <- commit(repo, "relabel using a list")
# Define new labels as a dataframe and apply them to the git2rdata object
change <- data.frame(
factor = c("a", "a", "b"),
old = c("a3", "a1", "b2"),
new = c("c2", "c1", "b3"),
stringsAsFactors = TRUE
)
relabel("relabel", repo, change)
# check the changes
read_vc("relabel", repo)
# relabel() changed the metadata, not the raw data
status(repo)