write_vc {git2rdata} | R Documentation |
Store a Data.Frame as a Git2rdata Object on Disk
Description
A git2rdata object consists of two files.
The ".tsv" file contains the raw data as a plain text tab separated file.
The ".yml" contains the metadata on the columns in plain text YAML format.
See vignette("plain text", package = "git2rdata") for more details on the
implementation.
Usage
write_vc(
  x,
  file,
  root = ".",
  sorting,
  strict = TRUE,
  optimize = TRUE,
  na = "NA",
  ...,
  split_by
)
## S3 method for class 'character'
write_vc(
  x,
  file,
  root = ".",
  sorting,
  strict = TRUE,
  optimize = TRUE,
  na = "NA",
  ...,
  split_by = character(0)
)
## S3 method for class 'git_repository'
write_vc(
  x,
  file,
  root,
  sorting,
  strict = TRUE,
  optimize = TRUE,
  na = "NA",
  ...,
  stage = FALSE,
  force = FALSE
)
Arguments
x |
the data.frame to store |
file |
the name of the git2rdata object. Git2rdata objects cannot
have dots in their name. The name may include a relative path. |
root |
The root of a project. Can be a file path or a git_repository object. |
sorting |
an optional vector of column names defining which columns to
use for sorting |
strict |
What to do when the metadata changes. |
optimize |
If |
na |
the string to use for missing values in the data. |
... |
parameters used in some methods |
split_by |
An optional vector of variable names to split the text files.
This creates a separate file for every combination.
We prepend these variables to the vector of |
stage |
Logical value indicating whether to stage the changes after
writing the data. Defaults to FALSE. |
force |
Add ignored files. Default is FALSE. |
Value
a named vector with the file paths relative to root. The names
contain the hashes of the files.
Note
..generic is a reserved name for the metadata and is a forbidden
column name in a data.frame.
See Also
Other storage: list_data(), prune_meta(), read_vc(), relabel(),
rename_variable(), rm_data(), verify_vc()
Examples
## on file system
# create a directory
root <- tempfile("git2rdata-")
dir.create(root)
# write a dataframe to the directory
write_vc(iris[1:6, ], file = "iris", root = root, sorting = "Sepal.Length")
# check that a data file (.tsv) and a metadata file (.yml) exist.
list.files(root, recursive = TRUE)
# read the git2rdata object from the directory
read_vc("iris", root)
# store a new version with different observations but the same metadata
write_vc(iris[1:5, ], "iris", root)
list.files(root, recursive = TRUE)
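# A hedged sketch, not part of the original examples: per the Value section,
# write_vc() returns a named vector of file paths relative to root.
# The object name "iris_paths" and the variable `paths` are illustrative
# assumptions; `root` is the temporary directory created above.
paths <- write_vc(iris[1:6, ], "iris_paths", root, sorting = "Sepal.Length")
# the file paths, relative to root
unname(paths)
# the names of the vector hold the hashes of the files
names(paths)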
# Removing a column requires new metadata.
# Add strict = FALSE to override the existing metadata.
write_vc(
  iris[1:6, -2], "iris", root, sorting = "Sepal.Length", strict = FALSE
)
list.files(root, recursive = TRUE)
# storing the original version again requires another update of the metadata
write_vc(iris[1:6, ], "iris", root, sorting = "Sepal.Width", strict = FALSE)
list.files(root, recursive = TRUE)
# optimize = FALSE stores the data in a more verbose format, which results in
# larger files.
write_vc(
  iris[1:6, ], "iris2", root, sorting = "Sepal.Width", optimize = FALSE
)
list.files(root, recursive = TRUE)
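# A hedged sketch of the split_by argument, not part of the original examples.
# Per the Arguments section, split_by creates a separate file for every
# combination of the listed variables; here that means one per species.
# The object name "iris_split" is an illustrative assumption and `root` is the
# temporary directory created above.
write_vc(
  iris[c(1:3, 51:53), ], "iris_split", root,
  sorting = "Sepal.Length", split_by = "Species"
)
list.files(root, recursive = TRUE)
# reading the object back should combine the split files into one data.frame
read_vc("iris_split", root)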
## on git repo using a git2r::git-repository
# initialise a git repo using the git2r package
repo_path <- tempfile("git2rdata-repo-")
dir.create(repo_path)
repo <- git2r::init(repo_path)
git2r::config(repo, user.name = "Alice", user.email = "alice@example.org")
# store a dataframe in git repo.
write_vc(iris[1:6, ], file = "iris", root = repo, sorting = "Sepal.Length")
# This git2rdata object is not staged by default.
status(repo)
# read a dataframe from a git repo
read_vc("iris", repo)
# store a new version in the git repo and stage it in one go
write_vc(iris[1:5, ], "iris", repo, stage = TRUE)
status(repo)
# store a verbose version in a different git2rdata object
write_vc(
  iris[1:6, ], "iris2", repo, sorting = "Sepal.Width", optimize = FALSE
)
status(repo)