svyby {survey} R Documentation

## Survey statistics on subsets

### Description

Compute survey statistics on subsets of a survey defined by factors.

### Usage

```
svyby(formula, by, design, ...)
## Default S3 method:
svyby(formula, by, design, FUN, ..., deff=FALSE,keep.var = TRUE,
keep.names = TRUE,verbose=FALSE, vartype=c("se","ci","ci","cv","cvpct","var"),
drop.empty.groups=TRUE, covmat=FALSE, return.replicates=FALSE,
na.rm.by=FALSE, na.rm.all=FALSE,
multicore=getOption("survey.multicore"))
## S3 method for class 'survey.design2'
svyby(formula, by, design, FUN, ..., deff=FALSE,keep.var = TRUE,
keep.names = TRUE,verbose=FALSE, vartype=c("se","ci","ci","cv","cvpct","var"),
drop.empty.groups=TRUE, covmat=FALSE, influence=covmat,
na.rm.by=FALSE, na.rm.all=FALSE, multicore=getOption("survey.multicore"))

## S3 method for class 'svyby'
SE(object,...)
## S3 method for class 'svyby'
deff(object,...)
## S3 method for class 'svyby'
coef(object,...)
## S3 method for class 'svyby'
confint(object,  parm, level = 0.95,df =Inf,...)
unwtd.count(x, design, ...)
svybys(formula,  bys,  design, FUN, ...)
```

### Arguments

| Argument | Description |
|---|---|
| `formula,x` | A formula specifying the variables to pass to `FUN` (or a matrix, data frame, or vector). |
| `by` | A formula specifying factors that define subsets, or a list of factors. |
| `design` | A `svydesign` or `svrepdesign` object. |
| `FUN` | A function taking a formula and survey design object as its first two arguments. |
| `...` | Other arguments to `FUN`. NOTE: if any of the names of these are partial matches to `formula`, `by`, or `design`, you must specify the `formula`, `by`, or `design` argument by name, not just by position. |
| `deff` | Request a design effect from `FUN`. |
| `keep.var` | If `FUN` returns a `svystat` object, extract standard errors from it. |
| `keep.names` | Define row names based on the subsets. |
| `verbose` | If `TRUE`, print a label for each subset as it is processed. |
| `vartype` | Report variability as one or more of standard error, confidence interval, coefficient of variation, percent coefficient of variation, or variance. |
| `drop.empty.groups` | If `FALSE`, report `NA` for empty groups; if `TRUE`, drop them from the output. |
| `na.rm.by` | If `TRUE`, omit groups defined by `NA` values of the `by` variables. |
| `na.rm.all` | If `TRUE`, check for groups with no non-missing observations for the variables defined by `formula` and treat these groups as empty. |
| `covmat` | If `TRUE`, compute covariances between estimates for different subsets. Allows `svycontrast` to be used on the output. Requires that `FUN` supports either `return.replicates=TRUE` or `influence=TRUE`. |
| `return.replicates` | Only for replicate-weight designs. If `TRUE`, return all the replicates as the "replicates" attribute of the result. |
| `influence` | Return the influence functions of the result. |
| `multicore` | Use the `multicore` package to distribute subsets over multiple processors? |
| `parm` | A specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. |
| `level` | The confidence level required. |
| `df` | Degrees of freedom for the t-distribution in the confidence interval; use `degf(design)` for the number of PSUs minus the number of strata. |
| `object` | An object of class `"svyby"`. |
| `bys` | A one-sided formula with each term specifying a grouping (rather than the terms being combined to give a single grouping). |

### Details

The variance type "ci" asks for confidence intervals, which are produced by `confint`. In some cases additional options to `FUN` will be needed to produce confidence intervals, for example, `svyquantile` needs `ci=TRUE` or `keep.var=FALSE`.
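As a minimal sketch of the two routes to confidence intervals, the code below requests interval columns directly with `vartype="ci"` and, alternatively, calls `confint` on a result that carries standard errors. It assumes the `api` data shipped with the package; the object names `by_ci`, `by_se`, `ci`, and `ci_t` are illustrative only.

```r
library(survey)
data(api)

# One-stage cluster sample of school districts from the api data
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Route 1: ask svyby to report interval limits instead of standard errors
by_ci <- svyby(~api99, ~stype, dclus1, svymean, vartype = "ci")
by_ci

# Route 2: keep the default standard errors and compute intervals afterwards
by_se <- svyby(~api99, ~stype, dclus1, svymean)
ci   <- confint(by_se)                       # default df = Inf (normal-based)
ci_t <- confint(by_se, df = degf(dclus1))    # t-based, using design degrees of freedom
ci
ci_t
```

With few PSUs, the t-based intervals from `df=degf(design)` are wider than the default ones, which may be the more cautious choice.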

`unwtd.count` is designed to be passed to `svyby` to report the number of non-missing observations in each subset. Observations with exactly zero weight will also be counted as missing, since that's how subsets are implemented for some designs.
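The following sketch shows `unwtd.count` used both through `svyby` and directly on a design. It assumes the `api` data shipped with the package; `n_by` and `n_all` are illustrative names.

```r
library(survey)
data(api)

dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Unweighted number of non-missing api99 values within each school type;
# keep.var = FALSE because a raw count has no standard error of interest
n_by <- svyby(~api99, ~stype, dclus1, unwtd.count, keep.var = FALSE)
n_by

# The same kind of count for the whole design, called directly
n_all <- unwtd.count(~api99, dclus1)
n_all
```

The per-group counts should add up to the overall count, which is a quick sanity check on the grouping.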

Parallel processing with `multicore=TRUE` is useful only for fairly large problems and on computers with sufficient memory. The `multicore` package is incompatible with some GUIs, although the Mac Aqua GUI appears to be safe.

The variant `svybys` creates a separate table for each term in `bys` rather than creating a joint table.

### Value

An object of class `"svyby"`: a data frame showing the factors and the results of `FUN`.

For `unwtd.count`, the unweighted number of non-missing observations in the data matrix specified by `x` for the design.

### Note

The function works by making many calls of the form `FUN(formula, subset(design, by==i))`, where `formula` is re-evaluated in each subset, so it is unwise to use data-dependent terms in `formula`. In particular, `svyby(~factor(a), ~b, design=d, svymean)` will create factor variables whose levels are only those values of `a` present in each subset. Either use `update.survey.design` to add variables to the design object instead, or specify the levels explicitly in the call to `factor`.
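The two workarounds can be sketched as follows, using the `awards` variable from the `api` data as a stand-in for `a`. The names `dclus1b`, `awards_f`, `res1`, and `res2` are hypothetical and chosen only for this illustration.

```r
library(survey)
data(api)

dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Workaround 1: create the factor once on the full design, so that every
# subset sees the same set of levels
dclus1b <- update(dclus1, awards_f = factor(awards))
res1 <- svyby(~awards_f, ~stype, dclus1b, svymean)
res1

# Workaround 2: keep the call to factor() in the formula, but state the
# levels explicitly so they do not depend on the data in each subset
res2 <- svyby(~factor(awards, levels = c("No", "Yes")), ~stype, dclus1, svymean)
res2
```

The first form also gives shorter column names in the output, since the derived variable has a plain name rather than a full `factor(...)` expression.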

### Note

Asking for a design effect (`deff=TRUE`) from a function that does not produce one will cause an error or incorrect formatting of the output. The same will occur with `keep.var=TRUE` if the function does not compute a standard error.

### See Also

`svytable` and `ftable.svystat` for contingency tables, `ftable.svyby` for pretty-printing of `svyby` output.

### Examples

```
data(api)
dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)

svyby(~api99, ~stype, dclus1, svymean)
svyby(~api99, ~stype, dclus1, svyquantile, quantiles=0.5,ci=TRUE,vartype="ci")
## without ci=TRUE svyquantile does not compute standard errors
svyby(~api99, ~stype, dclus1, svyquantile, quantiles=0.5, keep.var=FALSE)
svyby(~api99, list(school.type=apiclus1$stype), dclus1, svymean)
svyby(~api99+api00, ~stype, dclus1, svymean, deff=TRUE,vartype="ci")
svyby(~api99+api00, ~stype+sch.wide, dclus1, svymean, keep.var=FALSE)
## report raw number of observations
svyby(~api99+api00, ~stype+sch.wide, dclus1, unwtd.count, keep.var=FALSE)

rclus1<-as.svrepdesign(dclus1)

svyby(~api99, ~stype, rclus1, svymean)
svyby(~api99, ~stype, rclus1, svyquantile, quantiles=0.5)
svyby(~api99, list(school.type=apiclus1$stype), rclus1, svymean, vartype="cv")
svyby(~enroll,~stype, rclus1,svytotal, deff=TRUE)
svyby(~api99+api00, ~stype+sch.wide, rclus1, svymean, keep.var=FALSE)
## report raw number of observations
svyby(~api99+api00, ~stype+sch.wide, rclus1, unwtd.count, keep.var=FALSE)

## comparing subgroups using covmat=TRUE
mns<-svyby(~api99, ~stype, rclus1, svymean,covmat=TRUE)
vcov(mns)
svycontrast(mns, c(E = 1, M = -1))

str(svyby(~api99, ~stype, rclus1, svymean,return.replicates=TRUE))

tots<-svyby(~enroll, ~stype, dclus1, svytotal,covmat=TRUE)
vcov(tots)
svycontrast(tots, quote(E/H))

## comparing subgroups uses the delta method unless replicates are present
meanlogs<-svyby(~log(enroll),~stype,svymean, design=rclus1,covmat=TRUE)
svycontrast(meanlogs, quote(exp(E-H)))
meanlogs<-svyby(~log(enroll),~stype,svymean, design=rclus1,covmat=TRUE,return.replicates=TRUE)
svycontrast(meanlogs, quote(exp(E-H)))

## extractor functions
(a<-svyby(~enroll, ~stype, rclus1, svytotal, deff=TRUE, verbose=TRUE,
vartype=c("se","cv","cvpct","var")))
deff(a)
SE(a)
cv(a)
coef(a)
confint(a, df=degf(rclus1))

## ratio estimates
svyby(~api.stu, by=~stype, denominator=~enroll, design=dclus1, svyratio)

ratios<-svyby(~api.stu, by=~stype, denominator=~enroll, design=dclus1, svyratio,covmat=TRUE)
vcov(ratios)

## empty groups
svyby(~api00,~comp.imp+sch.wide,design=dclus1,svymean)
svyby(~api00,~comp.imp+sch.wide,design=dclus1,svymean,drop.empty.groups=FALSE)

## Multiple tables
svybys(~api00,~comp.imp+sch.wide,design=dclus1,svymean)

```

