gtreg.mp {binGroup}R Documentation

Fitting Group Testing Models in Matrix Pooling Setting

Description

gtreg.mp is a function to fit the group testing regression model in the matrix pooling setting specified through a symbolic description of the linear predictor and descriptions of the group testing setting.

Usage

gtreg.mp(formula, data, coln, rown, arrayn, retest = NULL,
 sens = 1, spec = 1,  linkf = c("logit", "probit", "cloglog"),
 sens.ind = NULL, spec.ind = NULL,  start = NULL,
 control = gt.control(...), ...)

EM.mp(col.resp, row.resp, X, coln, rown, sqn, ret, sens, spec,
 linkf, sens.ind, spec.ind, start = NULL, control = gt.control())

Arguments

formula

an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under 'Details'.

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which gtreg.mp is called.

coln

a vector, list or data frame that specifies column group number for each sample

rown

a vector, list or data frame that specifies row group number for each sample

arrayn

a vector, list or data frame that specifies array number for each sample

retest

a vector, list or data frame of individual retest results. A 0 denotes negative and 1 denotes positive. A NA denotes that no retest is performed for that individual. Default value is NULL for no retests.

sens

sensitivity of the group tests, set to be 1 by default.

spec

specificity of the group tests, set to be 1 by default.

sens.ind

sensitivity of the individual retests, set to be equal to sens if not specified otherwise.

spec.ind

specificity of the individual retests, set to be equal to spec if not specified otherwise.

linkf

a character string specifying one of the three link functions for a binomial model: "logit" (default) or "probit" or "cloglog".

start

starting values for the parameters in the linear predictor.

control

a list of parameters for controlling the fitting process. See the documentation for gt.control for details.

col.resp

For EM.mp: vector of group responses of column pools for all samples. 0 denotes negative and 1 denotes positive.

row.resp

For EM.mp: vector of group responses of row pools for all samples. 0 denotes negative and 1 denotes positive.

X

For EM.mp: the design matrix of the covariates.

sqn

For EM.mp: a vector that specifies array number

ret

For EM.mp: a vector containing individual retest results

...

arguments to be passed to gt.control: see argument control

Details

With matrix pooling, individual samples are placed in a matrix-like grid where samples are pooled within each row and within each column. This leads to two kinds of group responses: row and column group responses. Thus, a typical predictor has the form cbind(col.resp, row.resp) ~ covariates where col.resp is the (numeric) column group response vector and row.resp is the (numeric) row group response vector. The covariates term is a series of terms which specifies a linear predictor for individual responses. Note that it is actually the unobserved individual responses, not the observed group responses, which are modeled by the covariates. In col.resp and row.resp, a 0 denotes a negative response and 1 denotes a positive response, where the probability of an individual positive response is being modeled directly. A terms specification of the form first + second indicates all the terms in first together with all the terms in second with duplicates removed. The terms in the formula will be re-ordered so that main effects come first, followed by the interactions, all second-order, all third-order and so on; to avoid this pass a terms object as the formula.

A specification of the form first:second indicates the set of terms obtained by taking the interactions of all terms in first with all terms in second. The specification first*second indicates the cross of first and second. This is the same as first + second + first:second.

EM.mp is the workhorse function. It applies Xie's EM algorithm to the likelihood function written in terms of the unobserved individual responses. In each E step, the Gibbs sampling technique is used to estimate the conditional probabilities. Because of the large number of Gibbs samples needed to achieve convergence, the model fitting process could be quite slow, especially when multiple positive rows and columns are observed. In this case, we can either increase the Gibbs sample size to help achieve convergence or loosen the convergence criteria by increasing tol at the expense of perhaps poorer estimates. If follow-up retests are performed, the retest results going into the model will help achieve convergence faster with the same Gibbs sample size and convergence criteria. In each M step, we use glm.fit to update the parameter estimates

Value

gtreg.mp returns an object of class "gt.mp" which inherits from the class "gt". See later in this section. The function summary (i.e., summary.gt.mp) can be used to obtain or print a summary of the results. The group testing function predict (i.e., predict.gt) can be used to make predictions on "gt.mp" objects. An object of class "gt.mp" is a list containing at least the following components:

coefficients

a named vector of coefficients.

hessian

estimated Hessian matrix of the negative log likelihood function, serves as an estimate of the information matrix

counts

the number of iterations performed in the EM algorithm.

Gibbs.sample.size

the number of Gibbs samples generated in each E step.

call

the matched call.

formula

the formula supplied.

terms

the terms object used.

link

the link function used in the model.

Author(s)

Boan Zhang

References

Xie, M. (2001), Regression analysis of group testing samples, Statistics in Medicine, 20, 1957-1969.

See Also

summary.gt.mp and predict.gt for gt.mp methods. gtreg for the group testing regression model in the simple pooling setting.

Examples

## --- Continuing the Example from  '?sim.mp':
# 5*6 and 4*5 matrix
set.seed(9128)
sa1a<-sim.mp(par=c(-7,0.1), n.row=c(5,4), n.col=c(6,5),
 sens=0.95, spec=0.95)
sa1<-sa1a$dframe


## Not run: 
fit1 <- gtreg.mp(formula = cbind(col.resp, row.resp) ~ x, data = sa1, 
                 coln = coln, rown = rown, arrayn = arrayn, 
                 sens = 0.95, spec = 0.95, tol = 0.005, n.gibbs = 2000, trace = TRUE)
fit1
summary(fit1)


## End(Not run)

## Here is an example of how long this fitting process may take. For the 
## following simulated data, it takes a computer with 2.2GHZ processor and 
## 3GB RAM about 6 minutes to achieve convergence.
set.seed(9012)
sa2a<-sim.mp(par=c(-7,0.1), n.row=c(10,10,10,10), n.col=c(10,10,10,10), 
             sens=0.95, spec=0.95)
sa2<-sa2a$dframe

## Not run: 
fit2 <- gtreg.mp(formula = cbind(col.resp, row.resp) ~ x, data = sa2, 
                 coln = coln, rown = rown, arrayn = arrayn, retest = retest,
                 sens = 0.95, spec = 0.95, start = c(-7, 0.1), tol = 0.005)

fit2
summary(fit2)


## End(Not run)


[Package binGroup version 2.2-1 Index]