gtreg.mp {binGroup}    R Documentation

Fitting Group Testing Models in Matrix Pooling Setting

Description

gtreg.mp is a function to fit the group testing regression model in the matrix pooling setting. The model is specified through a symbolic description of the linear predictor and descriptions of the group testing setting.

Usage

gtreg.mp(formula, data, coln, rown, arrayn, retest = NULL,
 sens = 1, spec = 1, linkf = c("logit", "probit", "cloglog"),
 sens.ind = NULL, spec.ind = NULL, start = NULL,
 control = gt.control(...), ...)

EM.mp(col.resp, row.resp, X, coln, rown, sqn, ret, sens, spec,
 linkf, sens.ind, spec.ind, start = NULL, control = gt.control())



Arguments

formula
an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under 'Details'.

data
an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which gtreg.mp is called.

coln
a vector, list or data frame that specifies the column group number for each sample.

rown
a vector, list or data frame that specifies the row group number for each sample.

arrayn
a vector, list or data frame that specifies the array number for each sample.

retest
a vector, list or data frame of individual retest results. A 0 denotes negative and a 1 denotes positive. An NA denotes that no retest was performed for that individual. The default value is NULL for no retests.

sens
sensitivity of the group tests, set to 1 by default.

spec
specificity of the group tests, set to 1 by default.

sens.ind
sensitivity of the individual retests, set equal to sens if not specified otherwise.

spec.ind
specificity of the individual retests, set equal to spec if not specified otherwise.

linkf
a character string specifying one of the three link functions for a binomial model: "logit" (default), "probit" or "cloglog".

start
starting values for the parameters in the linear predictor.

control
a list of parameters for controlling the fitting process. See the documentation for gt.control for details.

col.resp
For EM.mp: a vector of group responses of column pools for all samples. A 0 denotes negative and a 1 denotes positive.

row.resp
For EM.mp: a vector of group responses of row pools for all samples. A 0 denotes negative and a 1 denotes positive.

X
For EM.mp: the design matrix of the covariates.

sqn
For EM.mp: a vector that specifies the array number for each sample.

ret
For EM.mp: a vector containing individual retest results.

...
arguments to be passed to gt.control: see argument control.
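As a rough illustration of how sens, spec and the link function enter a group testing model, the sketch below computes the probability that a single pool tests positive from the individual probabilities. It uses the standard identity for an imperfect assay and assumes that individuals in a pool are independent. The names pool_positive_prob, beta and x are hypothetical and are not part of binGroup.

```r
# Hypothetical helper (not a binGroup function): probability that one pool
# tests positive, given individual positive probabilities p and an assay
# with sensitivity sens and specificity spec. Assumes independent individuals.
pool_positive_prob <- function(p, sens = 1, spec = 1) {
  all_negative <- prod(1 - p)  # P(no individual in the pool is positive)
  sens * (1 - all_negative) + (1 - spec) * all_negative
}

# Individual probabilities from a linear predictor via an inverse link.
beta <- c(-7, 0.1)            # illustrative coefficients
x <- c(20, 35, 50, 65, 80)    # illustrative covariate values
eta <- beta[1] + beta[2] * x
p_logit <- make.link("logit")$linkinv(eta)

# Probability that a pool made of these five samples tests positive.
pool_positive_prob(p_logit, sens = 0.95, spec = 0.95)
```

Swapping "logit" for "probit" or "cloglog" in make.link changes only how the linear predictor is mapped to an individual probability; the pooling step is unchanged.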


Details

With matrix pooling, individual samples are placed in a matrix-like grid where samples are pooled within each row and within each column. This leads to two kinds of group responses: row and column group responses. Thus, a typical predictor has the form cbind(col.resp, row.resp) ~ covariates where col.resp is the (numeric) column group response vector and row.resp is the (numeric) row group response vector. The covariates term is a series of terms which specifies a linear predictor for individual responses. Note that it is actually the unobserved individual responses, not the observed group responses, which are modeled by the covariates. In col.resp and row.resp, a 0 denotes a negative response and a 1 denotes a positive response, where the probability of an individual positive response is being modeled directly.

A terms specification of the form first + second indicates all the terms in first together with all the terms in second with duplicates removed. The terms in the formula will be re-ordered so that main effects come first, followed by the interactions, all second-order, all third-order and so on; to avoid this, pass a terms object as the formula.
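The handling of duplicated terms and the re-ordering of terms described above can be checked with base R's terms() function. This is a small sketch on symbolic formulas only; the variable names y, a and b are placeholders and no data are needed.

```r
# Duplicated terms are dropped when terms are combined with `+`.
attr(terms(y ~ a + b + a), "term.labels")

# Main effects are moved ahead of interactions by default.
attr(terms(y ~ a:b + a + b), "term.labels")

# A two-column response on the left-hand side, as in the matrix pooling model.
tt <- terms(cbind(col.resp, row.resp) ~ x)
attr(tt, "term.labels")
attr(tt, "response")
```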

A specification of the form first:second indicates the set of terms obtained by taking the interactions of all terms in first with all terms in second. The specification first*second indicates the cross of first and second. This is the same as first + second + first:second.

EM.mp is the workhorse function. It applies Xie's EM algorithm to the likelihood function written in terms of the unobserved individual responses. In each E step, the Gibbs sampling technique is used to estimate the conditional probabilities. Because a large number of Gibbs samples is needed to achieve convergence, the model fitting process can be quite slow, especially when multiple positive rows and columns are observed. In this case, we can either increase the Gibbs sample size to help achieve convergence or loosen the convergence criterion by increasing tol, at the expense of perhaps poorer estimates. If follow-up retests are performed, the retest results going into the model will help achieve convergence faster with the same Gibbs sample size and convergence criterion. In each M step, the parameter estimates are updated.
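To make the matrix pooling layout described in this section concrete, the following sketch builds per-sample row and column group responses from a small grid of individual statuses. It assumes perfect tests (sens = spec = 1) and a single array; the object names ind, row.pool, col.pool and dat are illustrative and do not come from binGroup.

```r
# Illustrative 5 x 6 array of true individual statuses (1 = positive).
ind <- matrix(0L, nrow = 5, ncol = 6)
ind[2, 3] <- 1L
ind[4, 3] <- 1L

# A pool is positive when at least one of its members is positive
# (assuming sens = spec = 1 for simplicity).
row.pool <- as.integer(rowSums(ind) > 0)
col.pool <- as.integer(colSums(ind) > 0)

# One record per sample, carrying the response of its row and column pool.
dat <- data.frame(
  rown = as.vector(row(ind)),
  coln = as.vector(col(ind)),
  arrayn = 1L
)
dat$row.resp <- row.pool[dat$rown]
dat$col.resp <- col.pool[dat$coln]

# Samples at the intersection of a positive row and a positive column
# are the natural candidates for individual retests.
subset(dat, row.resp == 1 & col.resp == 1)
```

With two positive rows and one positive column, only the two intersecting samples are flagged. With several positive rows and columns the intersections become ambiguous, which is the situation where fitting is reported to be slow.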

Value

gtreg.mp returns an object of class "gt.mp" which inherits from the class "gt". See later in this section. The function summary (i.e., summary.gt.mp) can be used to obtain or print a summary of the results. The group testing function predict (i.e., predict.gt) can be used to make predictions on "gt.mp" objects. An object of class "gt.mp" is a list containing at least the following components:


coefficients
a named vector of coefficients.

hessian
estimated Hessian matrix of the negative log likelihood function, which serves as an estimate of the information matrix.

counts
the number of iterations performed in the EM algorithm.

Gibbs.sample.size
the number of Gibbs samples generated in each E step.

call
the matched call.

formula
the formula supplied.

terms
the terms object used.

link
the link function used in the model.
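Because the Hessian of the negative log likelihood estimates the information matrix, its inverse gives an approximate covariance matrix for the coefficients. The sketch below shows that arithmetic on a hand-made list. It assumes the coefficient vector and the Hessian are stored under the names coefficients and hessian, and the numbers are invented for illustration; they are not output from any fit.

```r
# Hypothetical stand-in for a fitted object; the values are made up
# purely to show the arithmetic.
fit <- list(
  coefficients = c("(Intercept)" = -7, x = 0.1),
  hessian = matrix(c(4, 150, 150, 8000), nrow = 2)
)

# Invert the estimated information matrix to approximate the covariance.
vcov_est <- solve(fit$hessian)
se <- sqrt(diag(vcov_est))

# Wald-type z statistics for each coefficient.
z <- fit$coefficients / se
cbind(estimate = fit$coefficients, std.error = se, z.value = z)
```

In practice the summary method is the intended route to these quantities; the sketch only shows what such a calculation involves.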


Author(s)

Boan Zhang


References

Xie, M. (2001), Regression analysis of group testing samples, Statistics in Medicine, 20, 1957-1969.

See Also

summary.gt.mp and predict.gt for gt.mp methods. gtreg for the group testing regression model in the simple pooling setting.


Examples

## --- Continuing the Example from '?sim.mp':
# 5*6 and 4*5 matrix
sa1a <- sim.mp(par = c(-7, 0.1), n.row = c(5, 4), n.col = c(6, 5),
 sens = 0.95, spec = 0.95)
sa1 <- sa1a$dframe

## Not run: 
fit1 <- gtreg.mp(formula = cbind(col.resp, row.resp) ~ x, data = sa1, 
                 coln = coln, rown = rown, arrayn = arrayn, 
                 sens = 0.95, spec = 0.95, tol = 0.005, n.gibbs = 2000, trace = TRUE)

## End(Not run)

## Here is an example of how long this fitting process may take. For the
## following simulated data, a computer with a 2.2 GHz processor and
## 3 GB of RAM takes about 6 minutes to achieve convergence.
sa2a <- sim.mp(par = c(-7, 0.1), n.row = c(10, 10, 10, 10), n.col = c(10, 10, 10, 10),
             sens = 0.95, spec = 0.95)
sa2 <- sa2a$dframe

## Not run: 
fit2 <- gtreg.mp(formula = cbind(col.resp, row.resp) ~ x, data = sa2, 
                 coln = coln, rown = rown, arrayn = arrayn, retest = retest,
                 sens = 0.95, spec = 0.95, start = c(-7, 0.1), tol = 0.005)


## End(Not run)

[Package binGroup version 2.2-1 Index]