wtd.stats {Hmisc} | R Documentation |

## Weighted Statistical Estimates

### Description

These functions compute various weighted versions of standard
estimators. In most cases the `weights`

vector is a vector the same
length of `x`

, containing frequency counts that in effect expand `x`

by these counts. `weights`

can also be sampling weights, in which
setting `normwt`

to `TRUE`

will often be appropriate. This results in
making `weights`

sum to the length of the non-missing elements in
`x`

. `normwt=TRUE`

thus reflects the fact that the true sample size is
the length of the `x`

vector and not the sum of the original values of
`weights`

(which would be appropriate had `normwt=FALSE`

). When `weights`

is all ones, the estimates are all identical to unweighted estimates
(unless one of the non-default quantile estimation options is
specified to `wtd.quantile`

). When missing data have already been
deleted for, `x`

, `weights`

, and (in the case of `wtd.loess.noiter`

) `y`

,
specifying `na.rm=FALSE`

will save computation time. Omitting the
`weights`

argument or specifying `NULL`

or a zero-length vector will
result in the usual unweighted estimates.

`wtd.mean`

, `wtd.var`

, and `wtd.quantile`

compute
weighted means, variances, and quantiles, respectively. `wtd.Ecdf`

computes a weighted empirical distribution function. `wtd.table`

computes a weighted frequency table (although only one stratification
variable is supported at present). `wtd.rank`

computes weighted
ranks, using mid–ranks for ties. This can be used to obtain Wilcoxon
tests and rank correlation coefficients. `wtd.loess.noiter`

is a
weighted version of `loess.smooth`

when no iterations for outlier
rejection are desired. This results in especially good smoothing when
`y`

is binary. `wtd.quantile`

removes any observations with
zero weight at the beginning. Previously, these were changing the
quantile estimates.

`num.denom.setup`

is a utility function that allows one to deal with
observations containing numbers of events and numbers of trials, by
outputting two observations when the number of events and non-events
(trials - events) exceed zero. A vector of subscripts is generated
that will do the proper duplications of observations, and a new binary
variable `y`

is created along with usual cell frequencies (`weights`

)
for each of the `y=0`

, `y=1`

cells per observation.

### Usage

```
wtd.mean(x, weights=NULL, normwt="ignored", na.rm=TRUE)
wtd.var(x, weights=NULL, normwt=FALSE, na.rm=TRUE,
method=c('unbiased', 'ML'))
wtd.quantile(x, weights=NULL, probs=c(0, .25, .5, .75, 1),
type=c('quantile','(i-1)/(n-1)','i/(n+1)','i/n'),
normwt=FALSE, na.rm=TRUE)
wtd.Ecdf(x, weights=NULL,
type=c('i/n','(i-1)/(n-1)','i/(n+1)'),
normwt=FALSE, na.rm=TRUE)
wtd.table(x, weights=NULL, type=c('list','table'),
normwt=FALSE, na.rm=TRUE)
wtd.rank(x, weights=NULL, normwt=FALSE, na.rm=TRUE)
wtd.loess.noiter(x, y, weights=rep(1,n),
span=2/3, degree=1, cell=.13333,
type=c('all','ordered all','evaluate'),
evaluation=100, na.rm=TRUE)
num.denom.setup(num, denom)
```

### Arguments

`x` |
a numeric vector (may be a character or |

`num` |
vector of numerator frequencies |

`denom` |
vector of denominators (numbers of trials) |

`weights` |
a numeric vector of weights |

`normwt` |
specify |

`na.rm` |
set to |

`method` |
determines the estimator type; if |

`probs` |
a vector of quantiles to compute. Default is 0 (min), .25, .5, .75, 1 (max). |

`type` |
For |

`y` |
a numeric vector the same length as |

`span` , `degree` , `cell` , `evaluation` |
see |

### Details

The functions correctly combine weights of observations having
duplicate values of `x`

before computing estimates.

When `normwt=FALSE`

the weighted variance will not equal the
unweighted variance even if the weights are identical. That is because
of the subtraction of 1 from the sum of the weights in the denominator
of the variance formula. If you want the weighted variance to equal the
unweighted variance when weights do not vary, use `normwt=TRUE`

.
The articles by Gatz and Smith discuss alternative approaches, to arrive
at estimators of the standard error of a weighted mean.

`wtd.rank`

does not handle NAs as elegantly as `rank`

if
`weights`

is specified.

### Value

`wtd.mean`

and `wtd.var`

return scalars. `wtd.quantile`

returns a
vector the same length as `probs`

. `wtd.Ecdf`

returns a list whose
elements `x`

and `Ecdf`

correspond to unique sorted values of `x`

.
If the first CDF estimate is greater than zero, a point (min(x),0) is
placed at the beginning of the estimates.
See above for `wtd.table`

. `wtd.rank`

returns a vector the same
length as `x`

(after removal of NAs, depending on `na.rm`

). See above
for `wtd.loess.noiter`

.

### Author(s)

Frank Harrell

Department of Biostatistics

Vanderbilt University School of Medicine

fh@fharrell.com

Benjamin Tyner

btyner@gmail.com

### References

Research Triangle Institute (1995): SUDAAN User's Manual, Release 6.40, pp. 8-16 to 8-17.

Gatz DF, Smith L (1995): The standard error of a weighted mean concentration–I. Bootstrapping vs other methods. Atmospheric Env 11:1185-1193.

Gatz DF, Smith L (1995): The standard error of a weighted mean concentration–II. Estimating confidence intervals. Atmospheric Env 29:1195-1200.

https://en.wikipedia.org/wiki/Weighted_arithmetic_mean

### See Also

`mean`

, `var`

, `quantile`

, `table`

, `rank`

, `loess.smooth`

, `lowess`

,
`plsmo`

, `Ecdf`

, `somers2`

, `describe`

### Examples

```
set.seed(1)
x <- runif(500)
wts <- sample(1:6, 500, TRUE)
std.dev <- sqrt(wtd.var(x, wts))
wtd.quantile(x, wts)
death <- sample(0:1, 500, TRUE)
plot(wtd.loess.noiter(x, death, wts, type='evaluate'))
describe(~x, weights=wts)
# describe uses wtd.mean, wtd.quantile, wtd.table
xg <- cut2(x,g=4)
table(xg)
wtd.table(xg, wts, type='table')
# Here is a method for getting stratified weighted means
y <- runif(500)
g <- function(y) wtd.mean(y[,1],y[,2])
summarize(cbind(y, wts), llist(xg), g, stat.name='y')
# Empirically determine how methods used by wtd.quantile match with
# methods used by quantile, when all weights are unity
set.seed(1)
u <- eval(formals(wtd.quantile)$type)
v <- as.character(1:9)
r <- matrix(0, nrow=length(u), ncol=9, dimnames=list(u,v))
for(n in c(8, 13, 22, 29))
{
x <- rnorm(n)
for(i in 1:5) {
probs <- sort( runif(9))
for(wtype in u) {
wq <- wtd.quantile(x, type=wtype, weights=rep(1,length(x)), probs=probs)
for(qtype in 1:9) {
rq <- quantile(x, type=qtype, probs=probs)
r[wtype, qtype] <- max(r[wtype,qtype], max(abs(wq-rq)))
}
}
}
}
r
# Restructure data to generate a dichotomous response variable
# from records containing numbers of events and numbers of trials
num <- c(10,NA,20,0,15) # data are 10/12 NA/999 20/20 0/25 15/35
denom <- c(12,999,20,25,35)
w <- num.denom.setup(num, denom)
w
# attach(my.data.frame[w$subs,])
```

*Hmisc*version 5.1-3 Index]