optilevels {opticut} | R Documentation |
Optimal Number of Factor Levels
Description
Finds the optimal number of factor levels given the data and a model using a likelihood-based agglomerative algorithm.
Usage
optilevels(y, x, z = NULL, alpha = 0, dist = "gaussian", ...)
## S3 method for class 'optilevels'
bestmodel(object, ...)
Arguments
y |
vector of observations. |
x |
a factor or a matrix of proportions (i.e. the values 0 and 1 should have
consistent meaning across the columns, often through a unit sum constraint).
It is the user's responsibility to ensure that values supplied
for |
z |
a design matrix with predictor variables besides the one(s) defined
via the argument |
alpha |
numeric [0-1], weighting factor for calculating information criteria for model selection (i.e. IC = (1-alpha)*AIC + alpha*BIC, also referred to as CAIC: consistent AIC). |
dist |
character, distribution argument passed to underlying functions,
see listed on the help page of |
object |
fitted object. |
... |
other arguments passed to the underlying functions, see |
Value
An object of class 'optilevels' that is a list with the following elements:
"delta"
delta IC values along the selection path considering best models.
"ic"
IC values along the selection path considering best models.
"coef"
matrix of coefficients (linear predictor scale) corresponding to argument
x
along the selection path considering best models."zcoef"
matrix of coefficients (linear predictor scale) corresponding to argument
z
when notNULL
along the selection path considering best models, orNULL
."rank"
matrix ranks based on the coefficients along the selection path considering best models. Ranking uses the default
ties.method = "average"
inrank
."deltalist"
delta IC values along the selection path considering all competing models.
"iclist"
IC values along the selection path considering all competing models.
"coeflist"
matrix of coefficients (linear predictor scale) corresponding to argument
x
along the selection path considering all competing models."zcoeflist"
matrix of coefficients (linear predictor scale) corresponding to argument
z
when notNULL
along the selection path considering all competing models, orNULL
."ranklist"
matrix ranks based on the coefficients along the selection path considering all competing models.
"levels"
list of (merged) factor levels along the selection path considering best models.
"Y"
vector of observations (argument
y
)."X"
design matrix component corresponding to argument
x
."Z"
design matrix component corresponding to argument
z
."alpha"
weighting argument.
"dist"
distribution argument.
"factor"
logical, indicating if argument
x
is a factor (TRUE
) or a matrix (FALSE
).
bestmodel
returns the best supported model for further
manipulation (e.g. prediction).
Author(s)
Peter Solymos <psolymos@gmail.com>
See Also
opticut
and multicut
for fitting best binary and multi-level response models.
Examples
## --- Factor levels with Gaussian distribution
## simple example from Legendre 2013
## Indicator Species: Computation, in
## Encyclopedia of Biodiversity, Volume 4
## https://dx.doi.org/10.1016/B978-0-12-384719-5.00430-5
gr <- as.factor(paste0("X", rep(1:5, each=5)))
spp <- cbind(Species1=rep(c(4,6,5,3,2), each=5),
Species2=c(rep(c(8,4,6), each=5), 4,4,2, rep(0,7)),
Species3=rep(c(18,2,0,0,0), each=5))
rownames(spp) <- gr
## must add some noise to avoid perfect fit
spp[6, "Species1"] <- 7
spp[1, "Species3"] <- 17
spp
ol <- optilevels(spp[,"Species3"], gr)
ol[c("delta", "coef", "rank", "levels")]
## get the final factor level
gr1 <- gr
levels(gr1) <- ol$level[[length(ol$level)]]
table(gr, gr1)
## compare the models
o0 <- lm(spp[,"Species3"] ~ gr - 1)
o1 <- lm(spp[,"Species3"] ~ gr1 - 1)
data.frame(AIC(o0, o1), delta=AIC(o0, o1)$AIC - AIC(o0))
ol$delta # should be identical
## --- Proportions with Poisson distribution
## simulation
set.seed(123)
n <- 500 # number of observations
k <- 5 # number of habitat types
b <- c(-1, -0.2, -0.2, 0.5, 1)
names(b) <- LETTERS[1:k]
x <- replicate(k, exp(rnorm(n)))
x <- x / rowSums(x) # proportions
X <- model.matrix(~.-1, data=data.frame(x))
lam <- exp(drop(crossprod(t(X), b)))
y <- rpois(n, lam)
z <- optilevels(y, x, dist="poisson")
## best model refit
bestmodel(z)
## estimates
plogis(z$coef)
plogis(b)
## optimal classification
z$rank
## get the final matrix
x1 <- mefa4::groupSums(x, 2, z$levels[[length(z$levels)]])
head(x)
head(x1)
## compare the models
m0 <- glm(y ~ x - 1, family="poisson")
m1 <- glm(y ~ x1 - 1, family="poisson")
data.frame(AIC(m0, m1), delta=AIC(m0, m1)$AIC - AIC(m0))
z$delta # should be identical
## Not run:
## dolina example with factor
data(dolina)
dolina$samp$stratum <- as.integer(dolina$samp$stratum)
y <- dolina$xtab[dolina$samp$method == "Q", "ppyg"]
x <- dolina$samp$mhab[dolina$samp$method == "Q"]
z <- scale(model.matrix(~ stratum + lmoist - 1,
dolina$samp[dolina$samp$method == "Q",]))
## without additional covariates
dol1 <- optilevels(y, x, z=NULL, dist="poisson")
dol1$rank
summary(bestmodel(dol1))
## with additional covariates
dol2 <- optilevels(y, x, z, dist="poisson")
dol2$rank
summary(bestmodel(dol2))
## compare the two models
AIC(bestmodel(dol1), bestmodel(dol2))
## End(Not run)