tweights {tboot} | R Documentation |
Function tweights
Description
Returns a vector p of resampling probabilities such that the column means of tboot(dataset = dataset, p = p) equal target on average.
Usage
tweights(dataset, target = apply(dataset, 2, mean), distance = "klqp",
maxit = 1000, tol = 1e-08, warningcut = 0.05, silent = FALSE,
Nindependent = 0)
Arguments
dataset |
Data frame or matrix to use to find row weights. |
target |
Numeric vector of target column means. If the 'target' is named, then all elements of names(target) should be in the dataset. |
distance |
The distance to minimize. Must be 'euclidean', 'klqp' or 'klpq' (the latter two are Kullback-Leibler distances). 'klqp', which corresponds to exponential tilting, is recommended. |
maxit |
Defines the maximum number of iterations for optimizing 'kl' distance. |
tol |
Tolerance. If the achieved mean is too far from the target (as defined by tol), an error will be thrown. |
warningcut |
Sets the cutoff for determining when a large weight will trigger a warning. |
silent |
Allows silencing of some messages. |
Nindependent |
Assumes the input also includes 'Nindependent' samples with independent columns. See details. |
Details
Let p_i = 1/n be the probability of sampling subject i from a dataset with n individuals (i.e. rows of the dataset) in the classic resampling with replacement scheme. Also, let q_i be the probability of sampling subject i from a dataset with n individuals in our new resampling scheme. Let d(q,p) represent a distance between the two resampling schemes. The tweights function seeks to solve the problem:
q = argmin_q d(q,p)
Subject to the constraint that:
\sum_i q_i = 1
and
dataset' q = target
where dataset is an n x K matrix of variables input to the function.
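To make the mean constraint concrete, here is a minimal base R sketch. The matrix X and the probabilities q are made-up values for illustration, not output of tweights. It shows that dataset' q is the vector of column means one should expect when rows are resampled with probabilities q.

```r
set.seed(1)

# Toy n x K dataset (n = 4, K = 2) and hypothetical resampling probabilities.
X <- cbind(a = c(1, 2, 3, 4), b = c(10, 0, 10, 0))
q <- c(0.4, 0.3, 0.2, 0.1)

# Expected column means under q: this is dataset' q from the constraint.
expected <- drop(crossprod(X, q))

# Empirical column means from resampling rows with probabilities q.
idx <- sample(nrow(X), size = 1e5, replace = TRUE, prob = q)
simulated <- colMeans(X[idx, ])

print(rbind(expected, simulated))
```

The two rows should agree up to simulation noise, which is the sense in which the column means equal the target "on average".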
d_{euclidean}(q,p) = sqrt( \sum_i (p_i-q_i)^2 )
d_{kl}(q,p) = \sum_i (log(p_i) - log(q_i))
Optimization for Euclidean distance is a quadratic program and uses the ipop function in kernlab. Optimization for the other distances uses a Newton-Raphson type iterative algorithm.
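As a rough illustration of the Newton-Raphson approach for exponential tilting, the following base R sketch solves the tilting problem directly. It is not the package's implementation: tilt_weights() and its helper functions are hypothetical names, and the step-halving safeguard is an assumption added here for stability. The weights take the form q_i proportional to exp(lambda' (x_i - target)), and lambda is updated until the weighted column means match the target.

```r
# Hypothetical helper (not part of tboot): exponential tilting by Newton's method.
tilt_weights <- function(X, target, maxit = 100, tol = 1e-10) {
  Z <- sweep(as.matrix(X), 2, target)   # center each column at its target

  # Tilted weights q_i proportional to exp(lambda' z_i); lambda = 0 gives 1/n.
  weights_at <- function(lambda) {
    eta <- drop(Z %*% lambda)
    w <- exp(eta - max(eta))            # subtract the max for numerical stability
    w / sum(w)
  }

  # Dual objective log(sum_i exp(lambda' z_i)); its gradient is zero exactly
  # when the tilted column means equal the target.
  objective <- function(lambda) {
    eta <- drop(Z %*% lambda)
    max(eta) + log(sum(exp(eta - max(eta))))
  }

  lambda <- rep(0, ncol(Z))
  for (it in seq_len(maxit)) {
    q <- weights_at(lambda)
    grad <- drop(crossprod(Z, q))                # tilted mean minus target
    if (max(abs(grad)) < tol) break
    H <- crossprod(Z * q, Z) - tcrossprod(grad)  # tilted covariance matrix
    step <- solve(H, grad)
    t <- 1                                       # halve the step while the objective increases
    while (objective(lambda - t * step) > objective(lambda) && t > 1e-4) t <- t / 2
    lambda <- lambda - t * step
  }
  weights_at(lambda)
}

X <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length")]
target <- c(Sepal.Length = 5.5, Sepal.Width = 2.9, Petal.Length = 3.4)
q <- tilt_weights(X, target)
print(colSums(as.matrix(X) * q))   # weighted column means
```

The printed weighted column means should match the target closely. The sketch does not try to reproduce the warnings, target adjustment, or other safeguards that tweights provides.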
If the original target cannot be achieved, something close to the original target will be selected. A warning will be produced and the new target displayed.
The 'Nindependent' option augments the dataset by assuming some additional specified number of patients. These patients are assumed to be made up of a random bootstrapped sample from the dataset for each variable marginally, leading to independent variables.
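The idea behind 'Nindependent' can be pictured with a short base R sketch. This only illustrates the description above and is not how the package handles the option internally; the value 50 and the variable names are assumptions. Each column is resampled separately, so the columns of the added rows are independent of one another.

```r
set.seed(1)

X <- as.matrix(iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length")])
n_extra <- 50   # hypothetical value playing the role of 'Nindependent'

# Bootstrap each column on its own, which breaks the dependence between columns.
extra <- apply(X, 2, function(column) sample(column, n_extra, replace = TRUE))

# Augmented dataset: the original rows followed by the independent-column rows.
augmented <- rbind(X, extra)
print(dim(augmented))
```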
Value
An object of type tweights. This object contains the following components:
- weights
Tilted weights for resampling
- originalTarget
Will be null if target was not changed.
- target
Actual target that was attempted.
- achievedMean
Achieved mean from tilting.
- dataset
Input dataset.
- X
Reformatted dataset.
- Nindependent
Input 'Nindependent' option.
See Also
Examples
target=c(Sepal.Length=5.5, Sepal.Width=2.9, Petal.Length=3.4)
w = tweights(dataset = iris, target = target, silent = TRUE)
simulated_data = tboot(nrow = 1000, weights = w)