data.set {memisc} | R Documentation |
Data Set Objects
Description
"data.set" objects are collections of "item" objects, with semantics similar to those of data frames. They are distinguished from data frames so that coercion by as.data.frame leads to a data frame that contains only vectors and factors. Nevertheless, most methods for data frames are inherited by data sets, except for the method for the within generic function. For the within method for data sets, see the Details section.
Thus data preparation using data sets retains all information about item annotations, labels, missing values, etc., while (mostly automatic) conversion of data sets into data frames makes the data amenable to R's statistical functions.
dsView is a function that displays data sets in a manner similar to the way View displays data frames. (View works with data sets as well, but first converts them into data frames.)
Usage
data.set(...,row.names = NULL, check.rows = FALSE, check.names = TRUE,
stringsAsFactors = FALSE, document = NULL)
as.data.set(x, row.names=NULL, ...)
## S4 method for signature 'list'
as.data.set(x,row.names=NULL,...)
is.data.set(x)
## S3 method for class 'data.set'
as.data.frame(x, row.names = NULL, optional = FALSE, ...)
## S4 method for signature 'data.set'
within(data, expr, ...)
dsView(x)
## S4 method for signature 'data.set'
head(x,n=20,...)
## S4 method for signature 'data.set'
tail(x,n=20,...)
Arguments
... |
For the |
row.names , check.rows , check.names , stringsAsFactors , optional |
arguments
as in |
document |
NULL or an optional character vector that contains documentation of the data. |
x |
for |
data |
a data set, that is, an object of class "data.set". |
expr |
an expression, or several expressions enclosed in curly braces. |
n |
integer; the number of rows to be shown by |
Details
The as.data.frame method for data sets is just a copy of the method for list. Consequently, all items in the data set are coerced in accordance with their measurement setting; see as.vector,item-method and measurement.
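The coercion described above can be illustrated with a minimal sketch. The toy values and the variable names ds and df are hypothetical; only functions that appear elsewhere on this page are used, and the comments state expected rather than guaranteed behaviour.

```r
library(memisc)

# Hypothetical toy data: 1 and 2 are substantive codes, 9 marks a refusal
ds <- data.set(
  vote   = c(1, 2, 2, 9, 1, 9),
  income = c(1800, 2500, 3100, 2200, 4000, 1500)
)

ds <- within(ds, {
  measurement(vote)   <- "nominal"
  measurement(income) <- "ratio"
  labels(vote) <- c(Conservatives = 1, Labour = 2, "Answer refused" = 9)
  missing.values(vote) <- c(9)
})

# Coercion follows the measurement setting of each item
df <- as.data.frame(ds)
class(df$vote)    # a nominal item is expected to become a factor
class(df$income)  # a ratio item is expected to become a numeric vector
```

Codes declared as missing values are expected to turn into NA in the resulting data frame, which is why the annotated data set, not the data frame, is the better place for data preparation.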
The within method for data sets has the same effect as the within method for data frames, apart from two differences: all results of the computations are coerced into items if they have the appropriate length; otherwise, they are automatically dropped.
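The length rule of the within method can be seen in a small sketch. The names ds, ds2, flag and scratch are hypothetical and chosen for illustration only.

```r
library(memisc)

# Hypothetical data set with six rows
ds <- data.set(
  x = 1:6,
  y = c(2.5, 3.1, 4.0, 1.2, 5.5, 0.7)
)

ds2 <- within(ds, {
  flag    <- rep(c(0, 1), times = 3)  # length 6 matches the number of rows: kept
  scratch <- 1:2                      # wrong length: dropped from the result
})

# 'flag' is expected to appear among the variables, 'scratch' is not
names(ds2)
```

Because results of the wrong length are dropped rather than recycled or rejected, temporary helper variables can be created inside the expression without cleaning them up afterwards.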
Currently only one method for the generic function as.data.set
is defined: a method for "importer" objects.
Value
data.set and the within method for data sets return a "data.set" object, is.data.set returns a logical value, and as.data.frame returns a data frame.
Examples
Data <- data.set(
  vote = sample(c(1,2,3,8,9,97,99),size=300,replace=TRUE),
  region = sample(c(rep(1,3),rep(2,2),3,99),size=300,replace=TRUE),
  income = exp(rnorm(300,sd=.7))*2000
  )
Data <- within(Data,{
  description(vote) <- "Vote intention"
  description(region) <- "Region of residence"
  description(income) <- "Household income"
  wording(vote) <- "If a general election would take place next Tuesday,
                    the candidate of which party would you vote for?"
  wording(income) <- "All things taken into account, how much do all
                    household members earn in sum?"
  foreach(x=c(vote,region),{
    measurement(x) <- "nominal"
    })
  measurement(income) <- "ratio"
  labels(vote) <- c(
                    Conservatives         =  1,
                    Labour                =  2,
                    "Liberal Democrats"   =  3,
                    "Don't know"          =  8,
                    "Answer refused"      =  9,
                    "Not applicable"      = 97,
                    "Not asked in survey" = 99)
  labels(region) <- c(
                    England               =  1,
                    Scotland              =  2,
                    Wales                 =  3,
                    "Not applicable"      = 97,
                    "Not asked in survey" = 99)
  foreach(x=c(vote,region,income),{
    annotation(x)["Remark"] <- "This is not a real survey item, of course ..."
    })
  missing.values(vote) <- c(8,9,97,99)
  missing.values(region) <- c(97,99)
  # These two variables do not appear in the
  # resulting data set, since they have the wrong length.
  junk1 <- 1:5
  junk2 <- matrix(5,4,4)
})
# Since data sets may be huge, only a
# part of them is 'show'n
Data
## Not run:
# If we insist on seeing all, we can use 'print' instead
print(Data)
## End(Not run)
str(Data)
summary(Data)
## Not run:
# If we want to 'View' a data set we can use 'dsView'
dsView(Data)
# Works also, but changes the data set into a data frame first:
View(Data)
## End(Not run)
Data[[1]]
Data[1,]
head(as.data.frame(Data))
EnglandData <- subset(Data,region == "England")
EnglandData
xtabs(~vote+region,data=Data)
xtabs(~vote+region,data=within(Data, vote <- include.missings(vote)))