optMonteCarlo {AlgDesign} | R Documentation |
Optimal design via Monte Carlo
Description
Finds a design using the specified criterion via Federov's algorithm applied to a random subset of all possible candidate points.
Usage
optMonteCarlo(frml,data,nTrials,approximate=FALSE,criterion="D",evaluateI=FALSE,
space=NULL,mixtureSum=1,constraints=NULL,RandomStart=TRUE,nRepeats=5,nCand,
nCandNull,DFrac=1,CFrac=1,args=FALSE)
Arguments
frml |
Required: A formula starting with ~ which will be used with model.matrix() to create a model matrix. If there are mixture variables, the constant term is suppressed. |
data |
Required: A data frame with 7 or 8 columns. See Details below for specifics. |
nTrials |
The number of trials in the design. It must be greater than the number of terms in the model. If missing, it will be set to the number of terms in the model plus five. |
approximate |
When FALSE, an exact design in nTrials will be calculated. When TRUE, the proportions for an approximate theory design will be calculated. If nTrials is set, any proportion less than 1/(2*maxIteration) will be discarded before the proportions are efficiently rounded; otherwise all non-zero proportions will be shown: these are the support points. |
criterion |
"D", "A", or "I" |
evaluateI |
TRUE if I is to be evaluated in addition to the other criteria – slower because of calculations for I |
space |
If the criterion is "I" or evaluateI is TRUE, the space over which the I criterion is to be evaluated may be input. It should be a matrix with the same column types and names as in data. If space is not input, the evaluation will be done over the space described by data. |
constraints |
A function taking a vector argument with length equal to the number of variables, and returning TRUE if the vector is inside the constrained region. |
mixtureSum |
The mixture variables, if any, will sum to this value. |
RandomStart |
When TRUE, the starting design will be chosen at random; otherwise nullification will be used. Note: the nullification used here is different from, and much slower than, that in optFederov(). |
nRepeats |
number of times to retry the entire process |
nCand |
Number of candidate points to generate. If missing, it will be 10 times the number of terms. |
nCandNull |
Number of candidate points to use for nullification. If missing it will be set to nCand |
DFrac |
Fraction of design used in search: 1 uses all of them, 0 only the one with the smallest variance |
CFrac |
Fraction of candidate set searched: 1 uses all of them, 0 only the one with the largest variance |
args |
If TRUE, the actual arguments to the function including the starting random number seed will be output. |
Details
The columns of the input data frame are as follows. The columns need not be named. It is probably best to avoid naming the variables with single letters, especially "I"; use paste(), as in the examples. For each variable, nLevels levels are randomly generated between low and high, inclusive, and then rounded with round. For integer levels, round should be set to 0.
- var:
The names of the variables.
- low:
The lower limit of the range for each variable. Ignored for mixtures.
- high:
The upper limit of the range for each variable. Ignored for mixtures.
- center:
The centering value for each variable. Ignored for mixtures.
- nLevels:
The number of levels for each variable. Ignored for mixture variables.
- round:
The number of decimal digits for the levels. The levels are randomly and uniformly chosen between low and high, and this parameter controls the number of trailing digits. The max value for mixture variables in this vector is used to round all mixture variables.
- factor:
TRUE or FALSE, depending on whether the variable is a factor. Note: other columns will be reset to conform to an nLevels factor.
- mix:
TRUE if the variable is a mixture variable. This column may be omitted if there are no mixture variables.
Candidate lists required by optFederov() increase with the number of variables, and can easily exceed storage capacity and require excessive amounts of time to process. To overcome this problem, optMonteCarlo() generates at random nCand points from a putative candidate list.
For non-mixture variables, optMonteCarlo() samples from the putative candidate list by choosing random levels inside the limits given by low and high in data. These are rounded to the number of levels given by nLevels in data and to the number of decimal digits given by round in data.
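As a rough illustration of this sampling step, the following base R sketch draws candidate values for one non-mixture variable. The helper name sampleLevels() is hypothetical, and the sketch assumes that the nLevels admissible values are equally spaced between low and high; the code used inside optMonteCarlo() may differ.

```r
# Hypothetical helper, not part of AlgDesign: draw n candidate values
# for a single non-mixture variable.
sampleLevels <- function(n, low, high, nLevels, digits) {
  # Assumption: the admissible levels are equally spaced on [low, high]
  # and rounded to 'digits' decimal places.
  grid <- round(seq(low, high, length.out = nLevels), digits)
  # Pick one level index uniformly at random for each draw.
  grid[sample.int(nLevels, n, replace = TRUE)]
}

set.seed(1)
x <- sampleLevels(10, low = 1, high = 3, nLevels = 3, digits = 1)
x  # each value is one of the three levels 1, 2, 3
```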
For mixture variables, optMonteCarlo() samples from the putative candidate list by choosing random levels between 0 and 1, rounded to the maximum in the round column of data, and such that the sum over all variables is equal to mixtureSum.
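One way to picture the mixture case is the base R sketch below. It is not the package's own code: the helper sampleMixture() is hypothetical, the use of normalized exponential draws is an assumed sampling scheme, and scaling the proportions by mixtureSum at the end is likewise an assumption.

```r
# Hypothetical helper, not part of AlgDesign: n random mixture points
# in k components whose rows sum to mixtureSum.
sampleMixture <- function(n, k, mixtureSum = 1, digits = 2) {
  # Assumed scheme: normalized exponential draws give proportions in [0, 1].
  e <- matrix(rexp(n * k), nrow = n)
  p <- round(e / rowSums(e), digits)
  # Rounding can break the sum constraint, so the discrepancy is
  # absorbed by the largest component of each row.
  for (i in seq_len(n)) {
    j <- which.max(p[i, ])
    p[i, j] <- round(p[i, j] + 1 - sum(p[i, ]), digits)
  }
  p * mixtureSum
}

set.seed(1)
mix <- sampleMixture(5, k = 3)
rowSums(mix)  # every row sums to mixtureSum, up to floating point error
```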
If a constraint function is supplied in constraints, it is applied, and results which do not meet the constraint are discarded. The constraint function should be written to process uncentered variables.
The above procedures are repeated until nCand candidate points are found.
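The discard-and-repeat logic amounts to rejection sampling, which might be sketched in base R as follows. The helper sampleFeasible() and the constraint inBox() are hypothetical names used only for illustration, and all variables are assumed to share the same limits and rounding.

```r
# Hypothetical helper, not part of AlgDesign: sample until nCand points
# satisfying the constraint have been collected.
sampleFeasible <- function(nCand, nVar, low, high, digits, constraint) {
  kept <- matrix(numeric(0), nrow = 0, ncol = nVar)
  # Note: this loops forever if the constraint can never be met.
  while (nrow(kept) < nCand) {
    batch <- matrix(round(runif(nCand * nVar, low, high), digits), ncol = nVar)
    ok <- apply(batch, 1, constraint)          # TRUE for feasible rows
    kept <- rbind(kept, batch[ok, , drop = FALSE])
  }
  kept[seq_len(nCand), , drop = FALSE]
}

# Constraint excluding one corner of the region.
inBox <- function(x) !(x[1] > 0.75 && x[2] > 0.75)

set.seed(42)
cand <- sampleFeasible(50, nVar = 4, low = -1, high = 1, digits = 1,
                       constraint = inBox)
```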
Nullification successively adds points to a design until n points are found. This is the same procedure that is in optFederov(), except that each new point is selected from a new sampling of the putative candidate points. In general, this will produce better designs than those from a random start.
The entire process is repeated nRepeats times, and the best result is reported. The methodology compares favorably with an exhaustive search where the entire candidate list is searched by optFederov().
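The repeat-and-keep-the-best structure can be sketched in base R as below. This is only an outline under stated assumptions: the function bestOfRepeats() is hypothetical, the model is fixed at ~X1+X2, and a plain random choice of rows stands in for the Federov exchange search, which is not reproduced here.

```r
# Hypothetical outline, not part of AlgDesign: resample the candidates
# nRepeats times and keep the design with the largest D value.
bestOfRepeats <- function(nRepeats, nTrials, nCand) {
  best <- list(D = -Inf, design = NULL)
  for (r in seq_len(nRepeats)) {
    # A fresh random sample of candidate points on [-1, 1]^2.
    cand <- data.frame(X1 = round(runif(nCand, -1, 1), 1),
                       X2 = round(runif(nCand, -1, 1), 1))
    X <- model.matrix(~ X1 + X2, cand)
    # Stand-in for the exchange search: a random subset of the candidates.
    rows <- sample(nCand, nTrials)
    Z <- X[rows, , drop = FALSE]
    # D value of this design; max() guards against a tiny negative determinant.
    D <- max(det(crossprod(Z) / nTrials), 0)^(1 / ncol(Z))
    if (D > best$D) best <- list(D = D, design = cand[rows, ])
  }
  best
}

set.seed(7)
res <- bestOfRepeats(nRepeats = 5, nTrials = 6, nCand = 30)
res$D
```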
The random numbers used in these calculations are controlled by the usual R random number mechanism.
A vignette giving further details is available. To access it, type
vignette("AlgDesign")
Value
The output is the same list as from optFederov(), but the criteria values are relative to the randomly chosen subsets of the putative candidate space. In general, they should not differ greatly from those obtained by an exhaustive search.
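D |
The kth root of the generalized variance: det(M)^(1/k), where det(M) is the determinant of the normalized dispersion matrix M, i.e. M=Z'Z/n, where Z=X[rows,]. |
A |
The average coefficient variance: trace(Mi)/k, where Mi is the inverse of M. |
I |
The average prediction variance over X, which can be shown to be trace((X'X*Mi)/N). |
Ge |
The minimax normalized variance over X, expressed as an efficiency with respect to the optimal approximate theory design. It is defined as k/max(d), where max(d) is the maximum normalized variance over X, i.e. the max of x'(Mi)x over all rows x' of X. |
Dea |
A lower bound on D efficiency for approximate theory designs. It is equal to exp(1-1/Ge). |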
Design |
The design. |
args |
A list of the actual arguments used in this call. |
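For orientation, the criteria can be computed by hand in base R for a small design. The sketch below assumes the definitions used for optFederov(): M=Z'Z/n is the normalized dispersion matrix of the n design rows Z, Mi is its inverse, X is the N-row candidate model matrix, and k is the number of model terms. The candidate list and the choice of rows are illustrative only.

```r
# Candidate list: a 3^2 factorial coded -1, 0, 1 (X1 varies fastest).
cand <- expand.grid(X1 = c(-1, 0, 1), X2 = c(-1, 0, 1))
X <- model.matrix(~ X1 + X2, cand)   # N x k candidate model matrix
rows <- c(1, 3, 7, 9)                # design: the four corner points
Z <- X[rows, , drop = FALSE]
n <- nrow(Z); N <- nrow(X); k <- ncol(X)

M   <- crossprod(Z) / n              # normalized dispersion matrix Z'Z/n
Mi  <- solve(M)                      # its inverse
D   <- det(M)^(1 / k)                # kth root of the generalized variance
A   <- sum(diag(Mi)) / k             # average coefficient variance
I   <- sum(diag(crossprod(X) %*% Mi)) / N  # average prediction variance
d   <- rowSums((X %*% Mi) * X)       # x' Mi x for every candidate row
Ge  <- k / max(d)                    # G efficiency
Dea <- exp(1 - 1 / Ge)               # lower bound on D efficiency
c(D = D, A = A, I = I, Ge = Ge, Dea = Dea)
```

For this orthogonal four-point design M is the identity matrix, so D, A, Ge and Dea all equal 1.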
Author(s)
Bob Wheeler bwheelerg@gmail.com
Please cite this program as follows:
Wheeler, R.E. (2004). optMonteCarlo. AlgDesign. The R project for statistical computing https://www.r-project.org/
Examples
# Example 1
# The data.frame in data might look like the following:
data<-data.frame(var=paste("X",1:6,sep=""),low=c(1,1,1,0,0,0),
high=c(3,3,3,1,1,1),center=c(2,2,2,0,0,0),nLevels=3,
round=1,factor=0,mix=c(FALSE,FALSE,FALSE,TRUE,TRUE,TRUE))
data
# and the design:
optMonteCarlo(~(X1+X2+X3)^2+X4+X5+X6,data)
# Example 2
# Standard designs will often be produced, just as
# they will with optFederov(). For example,
# a half fraction of a 2^4:
data<-data.frame(paste("X",1:4,sep=""),-1,1,0,2,0,0)
data
optMonteCarlo(~.,data,nTrials=8)
# Example 3
# optMonteCarlo() can treat much larger problems than can
# optFederov(). For example, optFederov()
# requires a candidate list of 3^20 points for
# a 20 variable, 3 level candidate list -- about
# 25 gigabytes. If the model is quadratic, this must
# be multiplied by about 12. There are other storage
# requirements internal to optFederov() which easily
# double this value. optMonteCarlo() since it only samples
# from the putative candidate list, has no difficulty
# with a problem of this size. The criterion values
# appearing in the output of optMonteCarlo() are based on
# these samples, but their values seem to be reasonably
# correct, as the following shows: (These are commented
# out for those who have a slow machine.)
dat<-gen.factorial(levels=3,nVar=8)
#desF<-optFederov(~quad(.),dat,eval=TRUE)
#desF[1:5]
data<-data.frame(paste("X",1:8,sep=""),-1,1,0,3,0,0)
#desH<-optMonteCarlo(~quad(.),data,Rand=FALSE,eval=TRUE)
#desH[1:5]
# The following is a 20 variable quadratic. Uncomment
# and wait a while, even if you have a fast machine.
# Note: nRepeats has been changed from its default.
# Note: criterion values for exact designs are often
# far from approximate theory optima; hence, Ge and Dea
# will be small.
data<-data.frame(paste("X",1:20,sep=""),-1,1,0,3,0,0)
#desBig<-optMonteCarlo(~quad(.),data,nRepeats=1)
# The following will produce improved criterion values
#desNBig<-optMonteCarlo(~quad(.),data,Rand=FALSE,nRepeats=1)
# Example 4
# Practically infeasible combinations of variables are
# common. Designs may be produced which avoid such
# combinations by using a constraint function. Suppose,
# for example that one corner of a cubic box is not
# feasible, then the following will produce a design
# that makes no use of this corner.
Constraints<-function(x){!(x[1]>0.75 && x[2]>0.75)}
data<-data.frame(paste("X",1:4,sep=""),-1,1,0,3,0,0)
desC<-optMonteCarlo(~.,data,con=Constraints)
# The above just removes a corner. Increasing the
# number of levels will remove points along the
# boundary.
data<-data.frame(paste("X",1:4,sep=""),-1,1,0,11,3,0)
desC2<-optMonteCarlo(~.,data,con=Constraints)