labels {papeR} | R Documentation |
Extract labels from and set labels for data frames
Description
Labels can be stored as an attribute "variable.label" for each variable in a data set using the assignment function. With the extractor function one can access these labels.
Usage
## S3 method for class 'data.frame'
labels(object, which = NULL, abbreviate = FALSE, ...)
## assign labels
labels(data, which = NULL) <- value
## check if data.frame is a special labeled data.frame ('ldf')
is.ldf(object)
## convert object to labeled data.frame ('ldf')
convert.labels(object)
as.ldf(object, ...)
## special plotting function for labeled data.frames ('ldf')
## S3 method for class 'ldf'
plot(x, variables = names(x),
labels = TRUE, by = NULL, with = NULL,
regression.line = TRUE, line.col = "red", ...)
Arguments
object |
a data.frame. |
data |
a data.frame. |
which |
either a number indicating the label to extract or a character
string with the variable name for which the label should be
extracted. One can also use a vector of numerics or character
strings to extract multiple labels. If |
value |
a vector containing the labels (in the order of the variables). If which is given, only the corresponding subset is labeled. Note that all other labels contain the variable name as label afterwards. |
abbreviate |
logical (default: FALSE). |
... |
further options passed to function In In |
x |
a labeled data.frame ('ldf'). |
variables |
character vector or numeric vector defining (continuous) variables
that should be included in the plot. Per default, all numeric and
factor variables of |
labels |
labels for the variables. If |
by |
a character or numeric value specifying a variable in the data set.
This variable can be either a grouping |
with |
a character or numeric value specifying a numeric variable
|
regression.line |
a logical argument specifying if a regression line should be added
to scatter plots (which are plotted if both |
line.col |
the color of the regression line. |
Details
All labels are stored as attributes of the columns of the data frame, i.e., each variable has (up to) one attribute which contains the variable label.
One can set or extract labels from data.frame objects.
If no labels are specified, labels(data) returns the column names of the data frame.
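The storage scheme described above can be illustrated with base R alone. The following is only a sketch, not the package's implementation: the data frame df and the helper get_labels() are hypothetical names introduced here, and papeR is not needed to run it.

```r
## Sketch in base R only; df and get_labels() are hypothetical names,
## not part of papeR.
df <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))

## attach a label to column "a" as a column attribute
attr(df$a, "variable.label") <- "Age in years"

## the label lives on the column, not on the data frame
attr(df$a, "variable.label")
attr(df$b, "variable.label")   # NULL, as no label was set for "b"

## collect labels and fall back to the column name if none is set
get_labels <- function(d) {
  vapply(names(d), function(n) {
    lab <- attr(d[[n]], "variable.label")
    if (is.null(lab)) n else lab
  }, character(1))
}
get_labels(df)
```

The helper returns a named character vector whose names are the variable names, which mirrors the behaviour documented for labels(data).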
Using abbreviate = TRUE, all labels are abbreviated to (at least) 4 characters such that they are unique. Other minimal lengths can be specified by setting minlength (see examples below).
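The kind of shortening described here can be tried out with the base R function abbreviate(), which also uses a default minlength of 4. That labels() relies on this function internally is an assumption made for this sketch; the vector name long is hypothetical.

```r
## Sketch in base R only; that papeR calls abbreviate() internally is an
## assumption. "long" is a hypothetical vector of labels.
long <- c("This is a long label", "This is another long label", "This also")

## default minimal length (minlength = 4)
abbreviate(long)

## ask for a larger minimal length
short <- abbreviate(long, minlength = 10)
short

## the abbreviations are still distinct
anyDuplicated(short) == 0
```

The result is named by the original strings, so each abbreviation can be traced back to its full label.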
Univariate plots can be easily obtained for all numeric and factor variables in a data set data by using plot(data).
Bivariate plots can be obtained by specifying by. In case of a factor variable, grouped boxplots or spineplots are plotted depending on the class of the variable specified in variables. In case of a numeric variable, grouped boxplots or scatter plots are plotted depending on the class of the variable specified in variables. Note that one cannot specify by and with at the same time (as they are internally identical). Note that missing values are excluded plot-wise (also for bivariate plots).
Value
labels(data) returns a named vector of variable labels, where the names match the variable names and the values represent the labels.
Note
If a data set is generated by read.spss in package foreign, labels are stored in a single attribute of the data set. Assigning new labels, e.g. via labels(data) <- labels(data), removes this attribute and stores all labels as attributes of the variables. Alternatively one can use data <- convert.labels(data).
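The difference between the two storage layouts can be sketched in base R without reading an SPSS file. The example assumes that the single data-set attribute is called "variable.labels" (the name used by read.spss when returning a data frame); the object spss_like is a hypothetical stand-in for imported data, and the loop only mimics the idea of the conversion, not the actual code of convert.labels().

```r
## Sketch in base R only; spss_like is a hypothetical stand-in for a data
## set returned by read.spss(), assumed to carry "variable.labels".
spss_like <- data.frame(a = 1:3, b = c("x", "y", "z"))
attr(spss_like, "variable.labels") <- c(a = "Age in years", b = "Group")

## move each label from the data-set attribute to the matching column
labs <- attr(spss_like, "variable.labels")
for (v in names(labs)) {
  attr(spss_like[[v]], "variable.label") <- labs[[v]]
}

## drop the data-set level attribute
attr(spss_like, "variable.labels") <- NULL

## labels are now stored per variable
attr(spss_like$a, "variable.label")
attr(spss_like$b, "variable.label")
```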
Author(s)
Benjamin Hofner
See Also
read.spss in package foreign
Examples
############################################################
### Basic labels manipulations
data <- data.frame(a = 1:10, b = 10:1, c = rep(1:2, 5))
labels(data) ## only the variable names
is.ldf(data) ## not yet
## now set labels
labels(data) <- c("my_a", "my_b", "my_c")
## one gets a named character vector of labels
labels(data)
## data is now an ldf:
is.ldf(data)
## Alternatively one could use as.ldf(data) or convert.labels(data);
## This would keep the default labels but set the class
## correctly.
## set labels for a and b only
## Note that which represents the variable names!
labels(data, which = c("a", "b")) <- c("x", "y")
labels(data)
## reset labels (to variable names):
labels(data) <- NULL
labels(data)
## set label for a only and use default for other labels:
labels(data, which = "a") <- "x"
labels(data)
## attach label for new variable:
data2 <- data
data2$z <- as.factor(rep(2:3, each = 5))
labels(data2) ## no real label for z, only variable name
labels(data2, which = "z") <- "new_label"
labels(data2)
############################################################
### Abbreviate labels
## attach long labels to data
labels(data) <- c("This is a long label", "This is another long label",
"This also")
labels(data)
labels(data, abbreviate = TRUE, minlength = 10)
############################################################
### Data manipulations
## reorder dataset:
tmp <- data2[, c(1, 4, 3, 2)]
labels(tmp)
## labels are kept and order is updated
## subsetting to single variables:
labels(tmp[, 2]) ## not working as tmp[, 2] drops to vector
## note that the label still exists but cannot be extracted
## using labels.default()
str(tmp[, 2])
labels(tmp[, 2, drop = FALSE]) ## prevent dropping
## one can also cbind labeled data.frame objects:
labels(cbind(data, tmp[, 2]))
## or better:
labels(cbind(data, tmp[, 2, drop = FALSE]))
## or rbind labeled.data.set objects:
labels(rbind(data, tmp[, -2]))
############################################################
### Plotting data sets
## plot the data auto"magically"; numerics as boxplot, factors as barplots
par(mfrow = c(2,2))
plot(data2)
## a single plot
plot(data2, variables = "a")
## grouped plot
plot(data2, variables = "a", by = "z")
## make "c" a factor and plot "c" vs. "z"
data2$c <- as.factor(data2$c)
plot(data2, variables = "c", by = "z")
## the same
plot(data2, variables = 3, by = 4)
## plot everything against "b"
## (grouped boxplots, stacked barplots or scatterplots)
plot(data2, with = "b")