mixcov {CAMAN}R Documentation

Fitting mixture models with covariates

Description

The function mixcov can be used to estimate the parameters of covariate adjusted mixture models. This is done using the EM algorithm. The function first performs the necessary data augmentation and then applies the EM-algorithm. Covariates may be included as fixed and random effects into the model.

Thus, the EM algorithm for covariate adjusted mixture models implies to perform first the necessary data augmentation, and then based on starting values for p[j] and beta[j] the computation of posterior probabilities e[ij]. This is the E-step. In the M-step new mixing weights p[j] and regression coefficients beta[j] are computed.

Then the algorithm performs as follows:
\ Step 0: Let P be any vector of starting values
Step 1: Compute the E-step, that is estimate the probability of component membership for each observation
Step 2: Compute the M-step, that is compute new mixing weights and model coefficients
Proceed until convergence is met.

Usage

mixcov(dep, fixed, random="", data, k, weight=NULL, pop.at.risk=NULL, 
       var.lnOR=NULL, family="gaussian", maxiter=50, 
       acc=10^-7, returnHomogeneousModel = FALSE) 
    

Arguments

dep

dependent variable. Vector or colname of data. Must be specified!

fixed

fixed effects. Vector or colname of data. Must be specified!

random

random effects. Vector or colname of data. Must be specified!

k

number of components.

weight

weights of the data. Vector or colname of data. Default is NULL.

family

the underlying type density function as a character ("gaussian" or "poisson")!

data

an optional data frame. obs, weights, pop.at.risk and var.lnOR can be specified as column name of the data frame.

pop.at.risk

population at risk: An offset that could be used to determine a mixture model for Poisson data from unequally large populations at risk. Vector or colname of data. Default isNULL.

var.lnOR

variances of the data: These variances might be given when working with meta analyses! Vector or colname of data. Default is NULL.

maxiter

parameter to control the maximal number of iterations in the VEM and EM loops. Default is 5000.

acc

convergence criterion. VEM and EM loops stop when deltaLL<acc. Default is 10^(-7).

returnHomogeneousModel

boolean to indicate whether the homogeneous model (simple glm) should be returned too. Default is FALSE.

Value

The function returns a CAMAN.glm.object (S4-class), describing a mixture model with covariates. The main information about the mixture model is printed by just typing the <object>. Additional information is given in summary(object). Single attributes can be accessed using the @, e.g. mix@LL.

dat

(input) data


family

underlying type density function

LL

Likelihood of the final (best) iteration

BIC

Likelihood of the final (best) iteration

num.k

number of components obtained

p

probability of each component

t

parameter of distribution (normal distr. -> mean, poisson distr. -> lambda, binomial distr. -> prob)

coefMatrix

complete coefficient matrix

prob

probabilies, belonging to each component

steps

number of steps performed (EM).

commonEffect

common effects

hetvar

heterogeneity variance

commonEffect

common effects

classification

classification labels for each observation (which.max of prob).

cl

the matched call.

fittedObs

predictions of the fitted model, given the observations

Author(s)

Peter Schlattmann and Johannes Hoehne

Examples

### Toy data: simulate subjects with a different relationship between age and salariy
  grps = sample(1:3,70, replace=TRUE) #assign each person to one group 
  salary=NULL
  age = round(runif(70) * 47 + 18)
  #random effects: age has a different influence (slope) on the salary
  salary[grps == 1] = 2000 + 12 * age[grps==1]
  salary[grps == 2] = 4000 + 4 * age[grps==2]
  salary[grps == 3] = 3200 + (-15) * age[grps==3]
  salary = salary + rnorm(70)*30  #some noise
  sex =sample(c("m","w"), 70, replace=TRUE)
  salary[sex=="m"] = salary[sex=="m"] * 1.2 #men earn 20 percent more than women
  salaryData = data.frame(salary=salary, age=age, sex=sex)
  tstSalary <- mixcov(dep="salary", fixed="sex", random="age" ,data=salaryData,
                      k=3,family="gaussian", acc=10^-3)
 

### POISSON data:
data(NoP)
ames3 <- mixcov(dep="count",fixed=c("dose", "logd"),random="",data=NoP,
                k=3,family="poisson")

### Gaussian data
data(betaplasma)
beta4 <- mixcov(dep="betacaro", fixed=c("chol","sex","bmi"), random="betadiet",
                data=betaplasma, k=4, family="gaussian")

[Package CAMAN version 0.78 Index]