readWorksheet-methods {XLConnect} | R Documentation |
Reading data from worksheets
Description
Reads data from worksheets of a workbook
.
Usage
## S4 method for signature 'workbook,numeric'
readWorksheet(object,sheet,startRow,startCol,endRow,endCol,autofitRow,autofitCol,
region,header,rownames,colTypes,forceConversion,dateTimeFormat,check.names,
useCachedValues,keep,drop, simplify, readStrategy)
## S4 method for signature 'workbook,character'
readWorksheet(object,sheet,startRow,startCol,endRow,endCol,autofitRow,autofitCol,
region,header,rownames,colTypes,forceConversion,dateTimeFormat,check.names,
useCachedValues,keep,drop, simplify, readStrategy)
Arguments
object |
The |
sheet |
The name or index of the worksheet to read from |
startRow |
The index of the first row to read from. Defaults to |
startCol |
The index of the first column to read from. Defaults to |
endRow |
The index of the last row to read from. Defaults to |
endCol |
The index of the last column to read from. Defaults to |
autofitRow |
|
autofitCol |
|
region |
A range specifier in the form 'A10:B18'. This provides an alternative way to specify
|
header |
Interpret the first row of the specified area as column
headers. The default is |
rownames |
Index ( |
colTypes |
Column types to use when reading in the data. Specified as a |
forceConversion |
|
dateTimeFormat |
Date/time format used when doing date/time conversions. Defaults to |
check.names |
|
useCachedValues |
|
keep |
Vector of column names or indices to be kept in the output data frame.
It is possible to specify either |
drop |
Vector of column names or indices to be dropped in the output data frame.
It is possible to specify either |
simplify |
|
readStrategy |
|
Details
Reads data from the worksheet specified by sheet
. Data is read
starting at the top left corner specified by startRow
and
startCol
down to the bottom right corner specified by
endRow
and endCol
. If header = TRUE
, the first row
is interpreted as column names of the resulting data.frame
.
If
startRow <= 0
then the first available row in the sheet is assumed.
If endRow = 0
then the last available row in the sheet is assumed.
For endRow = -n
with n > 0, the 'last row' - n rows is assumed. This
is useful in cases where you want to skip the last n rows.
If startCol <= 0
then the minimum column between startRow
and
endRow
is assumed. If endCol = 0
then the maximum column between
startRow
and endRow
is assumed. If endCol = -n
with n > 0,
the maximum column between startRow
and endRow
except for the last n
columns is assumed.
In other words, if no boundaries are specified readWorksheet
assumes the "bounding box" of the data as the corresponding boundaries.
The arguments autofitRow
and autofitCol
(both defaulting to TRUE
) can be used to skip leading and trailing empty rows even in case startRow
, endRow
, startCol
and endCol
are specified to values > 0
. This can be useful if data is expected within certain given boundaries but the exact location is not available.
If all four coordinate arguments are missing this behaves as above with
startRow = 0
, startCol = 0
, endRow = 0
and
endCol = 0
. In this case readWorksheet
assumes the
"bounding box" of the data as the corresponding boundaries.
All arguments (except object
) are vectorized. As such, multiple worksheets (and also multiple data regions
from the same worksheet) can be read with one method call. If only one single data region is read, the return value
is a data.frame
. If multiple data regions are specified, the return value is a list
of data.frame
's
returned in the order they have been specified. If worksheets have been specified by name, the list
will be a
named list
named by the corresponding worksheets.
Note
If no specific column types (see argument colTypes
) are specified,
readWorksheet
tries to determine the resulting column types based on the read cell types. If different cell
types are found in a specific column, the most general of those is used and mapped to the corresponding R data type.
The order of data types from least to most general is Boolean (logical
) < DateTime (POSIXct
) <
Numeric (numeric
) < String (character
). E.g. if a column is read that contains cells of type Boolean,
Numeric and String then the resulting column in R would be character
since character
is the most general
type.
Some additional information with respect to forcing data type conversion
using forceConversion = TRUE
:
Forcing conversion from String to Boolean:
TRUE
is returned if and only if the target string is "true" (ignoring any capitalization). Any other string will returnFALSE
.Forcing conversion from Numeric to DateTime: since Excel understands Dates/Times as Numerics with some additional formatting, a conversion from a Numeric to a DateTime is actually possible. Numerics in this case represent the number of days since 1900-01-00 (yes, day 00! - see http://www.cpearson.com/excel/datetime.htm). Note that in R 0 is represented as 1899-12-31 since there is no 1900-01-00. Fractional days represent hours, minutes, and seconds.
Author(s)
Martin Studer
Thomas Themel
Nicola Lambiase
Mirai Solutions GmbH https://mirai-solutions.ch
See Also
workbook
,
writeWorksheet
,
readNamedRegion
,
writeNamedRegion
,
readWorksheetFromFile
,
readTable
,
onErrorCell
Examples
## Not run:
## Example 1:
# mtcars xlsx file from demoFiles subfolder of package XLConnect
demoExcelFile <- system.file("demoFiles/mtcars.xlsx", package = "XLConnect")
# Load workbook
wb <- loadWorkbook(demoExcelFile)
# Read worksheet 'mtcars' (providing no specific area bounds;
# with default header = TRUE)
data <- readWorksheet(wb, sheet = "mtcars")
## Example 2:
# mtcars xlsx file from demoFiles subfolder of package XLConnect
demoExcelFile <- system.file("demoFiles/mtcars.xlsx", package = "XLConnect")
# Load workbook
wb <- loadWorkbook(demoExcelFile)
# Read worksheet 'mtcars' (providing area bounds; with default header = TRUE)
data <- readWorksheet(wb, sheet = "mtcars", startRow = 1, startCol = 3,
endRow = 15, endCol = 8)
## Example 3:
# mtcars xlsx file from demoFiles subfolder of package XLConnect
demoExcelFile <- system.file("demoFiles/mtcars.xlsx", package = "XLConnect")
# Load workbook
wb <- loadWorkbook(demoExcelFile)
# Read worksheet 'mtcars' (providing area bounds using the region argument;
# with default header = TRUE)
data <- readWorksheet(wb, sheet = "mtcars", region = "C1:H15")
## Example 4:
# conversion xlsx file from demoFiles subfolder of package XLConnect
excelFile <- system.file("demoFiles/conversion.xlsx", package = "XLConnect")
# Load workbook
wb <- loadWorkbook(excelFile)
# Read worksheet 'Conversion' with pre-specified column types
# Note: in the worksheet all data was entered as strings!
# forceConversion = TRUE is used to force conversion from String
# into the less generic data types Numeric, DateTime & Boolean
df <- readWorksheet(wb, sheet = "Conversion", header = TRUE,
colTypes = c(XLC$DATA_TYPE.NUMERIC,
XLC$DATA_TYPE.DATETIME,
XLC$DATA_TYPE.BOOLEAN),
forceConversion = TRUE,
dateTimeFormat = "%Y-%m-%d %H:%M:%S")
## Example 5:
# mtcars xlsx file from demoFiles subfolder of package XLConnect
demoExcelFile <- system.file("demoFiles/mtcars.xlsx", package = "XLConnect")
# Load workbook
wb <- loadWorkbook(demoExcelFile)
# Read the columns 1, 3 and 5 from the sheet 'mtcars' (with default header = TRUE)
data <- readWorksheet(wb, sheet = "mtcars", keep=c(1,3,5))
## End(Not run)