R: Store a Data.Frame as a Git2rdata Object on Disk

write_vc {git2rdata}

R Documentation

Store a Data.Frame as a Git2rdata Object on Disk

Description

A git2rdata object consists of two files. The ".tsv" file contains the raw data as a plain text tab separated file. The ".yml" file contains the metadata on the columns in plain text YAML format. See vignette("plain text", package = "git2rdata") for more details on the implementation.

Usage

write_vc(
  x,
  file,
  root = ".",
  sorting,
  strict = TRUE,
  optimize = TRUE,
  na = "NA",
  ...,
  split_by
)

## S3 method for class 'character'
write_vc(
  x,
  file,
  root = ".",
  sorting,
  strict = TRUE,
  optimize = TRUE,
  na = "NA",
  ...,
  split_by = character(0)
)

## S3 method for class 'git_repository'
write_vc(
  x,
  file,
  root,
  sorting,
  strict = TRUE,
  optimize = TRUE,
  na = "NA",
  ...,
  stage = FALSE,
  force = FALSE
)

Arguments

`x`	the `data.frame`.
`file`	the name of the git2rdata object. Git2rdata objects cannot have dots in their name. The name may include a relative path. `file` is a path relative to the `root`. Note that `file` must point to a location within `root`.
`root`	The root of a project. Can be a file path or a `git-repository`. Defaults to the current working directory (`"."`).
`sorting`	an optional vector of column names defining which columns to use for sorting `x` and in what order to use them. The default empty `sorting` yields a warning. Add `sorting` to avoid this warning. Strongly recommended in combination with version control. See `vignette("efficiency", package = "git2rdata")` for an illustration of the importance of sorting.
`strict`	What to do when the metadata changes. `strict = FALSE` overwrites the data and the metadata with a warning listing the changes, `strict = TRUE` returns an error and leaves the data and metadata as is. Defaults to `TRUE`.
`optimize`	If `TRUE`, recode the data to get smaller text files. If `FALSE`, `meta()` converts the data to character. Defaults to `TRUE`.
`na`	the string to use for missing values in the data.
`...`	parameters used in some methods
`split_by`	An optional vector of variable names used to split the text files. This creates a separate file for every combination of their values. We prepend these variables to the vector of `sorting` variables.
`stage`	Logical value indicating whether to stage the changes after writing the data. Defaults to `FALSE`.
`force`	Logical value indicating whether to add ignored files. Defaults to `FALSE`.
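
The sketch below illustrates two of the arguments described above, `strict` and `split_by`, on a temporary directory. It assumes the git2rdata package is installed; the directory prefix and the object names (`iris`, `iris_split`) are chosen for illustration only. The exact layout of the files written for a split object is not described on this page, so the code only lists them.

```r
library(git2rdata)

# temporary project root (illustrative name)
root <- tempfile("git2rdata-args-")
dir.create(root)

## strict: a metadata change is refused by default
write_vc(iris[1:6, ], file = "iris", root = root, sorting = "Sepal.Length")
# Dropping a column changes the metadata. With the default strict = TRUE
# the documentation states that this returns an error and leaves the
# stored data and metadata as is.
res <- tryCatch(
  write_vc(iris[1:6, -2], file = "iris", root = root, sorting = "Sepal.Length"),
  error = function(e) e
)
inherits(res, "error")
# the stored object should still have all columns of the first version
n_col <- ncol(read_vc("iris", root))
n_col

## split_by: one text file per combination of the splitting variables
# three rows per species, chosen so that sorting yields no ties
small <- iris[c(1:3, 51:53, 101:103), ]
write_vc(
  small, file = "iris_split", root = root,
  sorting = "Sepal.Length", split_by = "Species"
)
split_files <- list.files(root, pattern = "iris_split", recursive = TRUE,
                          full.names = FALSE)
split_files
# reading works the same way as for an object stored without split_by
restored <- read_vc("iris_split", root)
nrow(restored)
```

Splitting is mainly of interest for large data sets where changes tend to touch only a few groups, because untouched groups then leave their files unchanged under version control.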

Value

A named vector with the file paths relative to `root`. The names contain the hashes of the files.
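
A minimal sketch of inspecting the returned vector, assuming the git2rdata package is installed. The variable names `root` and `out` are illustrative.

```r
library(git2rdata)

root <- tempfile("git2rdata-value-")
dir.create(root)

out <- write_vc(
  iris[1:6, ], file = "iris", root = root, sorting = "Sepal.Length"
)
# the values are the file paths relative to root
unname(out)
# the names hold the hashes of the files
names(out)
# combine the values with root to get usable paths
paths <- file.path(root, out)
file.exists(paths)
```

Because the paths are relative to `root`, they stay valid when the project directory is moved or cloned elsewhere.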

Note

..generic is a reserved name for the metadata and is a forbidden column name in a data.frame.
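
The following sketch shows how the restriction above could be checked. It assumes that git2rdata enforces the reserved name when writing, which the note implies but does not state explicitly; the data frame `bad` is made up for illustration.

```r
library(git2rdata)

root <- tempfile("git2rdata-note-")
dir.create(root)

# hypothetical data.frame using the reserved column name
bad <- data.frame(id = 1:3, `..generic` = c("a", "b", "c"),
                  check.names = FALSE)
res <- tryCatch(
  write_vc(bad, file = "bad", root = root, sorting = "id"),
  error = function(e) e
)
# TRUE when write_vc refused the reserved column name
refused <- inherits(res, "error")
refused
```

Renaming the column before calling `write_vc()` avoids the conflict.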

Examples

## on file system

# create a directory
root <- tempfile("git2rdata-")
dir.create(root)

# write a dataframe to the directory
write_vc(iris[1:6, ], file = "iris", root = root, sorting = "Sepal.Length")
# check that a data file (.tsv) and a metadata file (.yml) exist.
list.files(root, recursive = TRUE)
# read the git2rdata object from the directory
read_vc("iris", root)

# store a new version with different observations but the same metadata
write_vc(iris[1:5, ], "iris", root)
list.files(root, recursive = TRUE)
# Removing a column requires new metadata.
# Add strict = FALSE to override the existing metadata.
write_vc(
  iris[1:6, -2], "iris", root, sorting = "Sepal.Length", strict = FALSE
)
list.files(root, recursive = TRUE)
# storing the original version again requires another update of the metadata
write_vc(iris[1:6, ], "iris", root, sorting = "Sepal.Width", strict = FALSE)
list.files(root, recursive = TRUE)
# optimize = FALSE stores the data more verbosely. This results in larger files.
write_vc(
  iris[1:6, ], "iris2", root, sorting = "Sepal.Width", optimize = FALSE
)
list.files(root, recursive = TRUE)



## on git repo using a git2r::git-repository

# initialise a git repo using the git2r package
repo_path <- tempfile("git2rdata-repo-")
dir.create(repo_path)
repo <- git2r::init(repo_path)
git2r::config(repo, user.name = "Alice", user.email = "alice@example.org")

# store a dataframe in git repo.
write_vc(iris[1:6, ], file = "iris", root = repo, sorting = "Sepal.Length")
# This git2rdata object is not staged by default.
status(repo)
# read a dataframe from a git repo
read_vc("iris", repo)

# store a new version in the git repo and stage it in one go
write_vc(iris[1:5, ], "iris", repo, stage = TRUE)
status(repo)

# store a verbose version in a different git2rdata object
write_vc(
  iris[1:6, ], "iris2", repo, sorting = "Sepal.Width", optimize = FALSE
)
status(repo)

[Package git2rdata version 0.4.0 Index]