R: Create a data frame from a Workbook

wb_to_df {openxlsx2}

R Documentation

Create a data frame from a Workbook

Description

Simple function to create a data.frame from a sheet in workbook. Simple as in it was simply written down. read_xlsx() and wb_read() are just internal wrappers of wb_to_df() intended for people coming from other packages.

Usage

wb_to_df(
  file,
  sheet,
  start_row = 1,
  start_col = NULL,
  row_names = FALSE,
  col_names = TRUE,
  skip_empty_rows = FALSE,
  skip_empty_cols = FALSE,
  skip_hidden_rows = FALSE,
  skip_hidden_cols = FALSE,
  rows = NULL,
  cols = NULL,
  detect_dates = TRUE,
  na.strings = "#N/A",
  na.numbers = NA,
  fill_merged_cells = FALSE,
  dims,
  show_formula = FALSE,
  convert = TRUE,
  types,
  named_region,
  keep_attributes = FALSE,
  check_names = FALSE,
  ...
)

read_xlsx(
  file,
  sheet,
  start_row = 1,
  start_col = NULL,
  row_names = FALSE,
  col_names = TRUE,
  skip_empty_rows = FALSE,
  skip_empty_cols = FALSE,
  rows = NULL,
  cols = NULL,
  detect_dates = TRUE,
  named_region,
  na.strings = "#N/A",
  na.numbers = NA,
  fill_merged_cells = FALSE,
  check_names = FALSE,
  ...
)

wb_read(
  file,
  sheet = 1,
  start_row = 1,
  start_col = NULL,
  row_names = FALSE,
  col_names = TRUE,
  skip_empty_rows = FALSE,
  skip_empty_cols = FALSE,
  rows = NULL,
  cols = NULL,
  detect_dates = TRUE,
  named_region,
  na.strings = "NA",
  na.numbers = NA,
  check_names = FALSE,
  ...
)

Arguments

`file`	An xlsx file, wbWorkbook object or URL to xlsx file.
`sheet`	Either sheet name or index. When missing the first sheet in the workbook is selected.
`start_row`	first row to begin looking for data.
`start_col`	first column to begin looking for data.
`row_names`	If `TRUE`, the first col of data will be used as row names.
`col_names`	If `TRUE`, the first row of data will be used as column names.
`skip_empty_rows`	If `TRUE`, empty rows are skipped.
`skip_empty_cols`	If `TRUE`, empty columns are skipped.
`skip_hidden_rows`	If `TRUE`, hidden rows are skipped.
`skip_hidden_cols`	If `TRUE`, hidden columns are skipped.
`rows`	A numeric vector specifying which rows in the xlsx file to read. If `NULL`, all rows are read.
`cols`	A numeric vector specifying which columns in the xlsx file to read. If `NULL`, all columns are read.
`detect_dates`	If `TRUE`, attempt to recognize dates and perform conversion.
`na.strings`	A character vector of strings which are to be interpreted as `NA`. Blank cells will be returned as `NA`.
`na.numbers`	A numeric vector of digits which are to be interpreted as `NA`. Blank cells will be returned as `NA`.
`fill_merged_cells`	If `TRUE`, the value in a merged cell is given to all cells within the merge.
`dims`	Character string of type "A1:B2" as optional dimensions to be imported.
`show_formula`	If `TRUE`, the underlying Excel formulas are shown.
`convert`	If `TRUE`, a conversion to dates and numerics is attempted.
`types`	A named numeric indicating, the type of the data. Names must match the returned data. See Details for more.
`named_region`	Character string with a `named_region` (defined name or table). If no sheet is selected, the first appearance will be selected. See `wb_get_named_regions()`
`keep_attributes`	If `TRUE` additional attributes are returned. (These are used internally to define a cell type.)
`check_names`	If `TRUE` then the names of the variables in the data frame are checked to ensure that they are syntactically valid variable names.
`...`	additional arguments

Details

The returned data frame will have named rows matching the rows of the worksheet. With col_names = FALSE the returned data frame will have column names matching the columns of the worksheet. Otherwise the first row is selected as column name.

Depending if the R package hms is loaded, wb_to_df() returns hms variables or string variables in the hh:mm:ss format.

The types argument must be a named numeric.

0: character
1: numeric
2: date
3: posixt (datetime)
4: logical

wb_to_df() will not pick up formulas added to a workbook object via wb_add_formula(). This is because only the formula is written and left to be evaluated when the file is opened in a spreadsheet software. Opening, saving and closing the file in a spreadsheet software will resolve this.

Examples

###########################################################################
# numerics, dates, missings, bool and string
example_file <- system.file("extdata", "openxlsx2_example.xlsx", package = "openxlsx2")
wb1 <- wb_load(example_file)

# import workbook
wb_to_df(wb1)

# do not convert first row to column names
wb_to_df(wb1, col_names = FALSE)

# do not try to identify dates in the data
wb_to_df(wb1, detect_dates = FALSE)

# return the underlying Excel formula instead of their values
wb_to_df(wb1, show_formula = TRUE)

# read dimension without colNames
wb_to_df(wb1, dims = "A2:C5", col_names = FALSE)

# read selected cols
wb_to_df(wb1, cols = c("A:B", "G"))

# read selected rows
wb_to_df(wb1, rows = c(2, 4, 6))

# convert characters to numerics and date (logical too?)
wb_to_df(wb1, convert = FALSE)

# erase empty rows from dataset
wb_to_df(wb1, skip_empty_rows = TRUE)

# erase empty columns from dataset
wb_to_df(wb1, skip_empty_cols = TRUE)

# convert first row to rownames
wb_to_df(wb1, sheet = 2, dims = "C6:G9", row_names = TRUE)

# define type of the data.frame
wb_to_df(wb1, cols = c(2, 5), types = c("Var1" = 0, "Var3" = 1))

# start in row 5
wb_to_df(wb1, start_row = 5, col_names = FALSE)

# na string
wb_to_df(wb1, na.strings = "a")

###########################################################################
# Named regions
file_named_region <- system.file("extdata", "namedRegions3.xlsx", package = "openxlsx2")
wb2 <- wb_load(file_named_region)

# read dataset with named_region (returns global first)
wb_to_df(wb2, named_region = "MyRange", col_names = FALSE)

# read named_region from sheet
wb_to_df(wb2, named_region = "MyRange", sheet = 4, col_names = FALSE)

# read_xlsx() and wb_read()
example_file <- system.file("extdata", "openxlsx2_example.xlsx", package = "openxlsx2")
read_xlsx(file = example_file)
df1 <- wb_read(file = example_file, sheet = 1)
df2 <- wb_read(file = example_file, sheet = 1, rows = c(1, 3, 5), cols = 1:3)