R: Starting values for parameters

start_em {mult.latent.reg}

R Documentation

Starting values for parameters

Description

Generates starting values for the parameters of the EM algorithm used in the functions mult.em_1level, mult.em_2level, mult.reg_1level and mult.reg_2level.

Arguments

`data`	A data set object; its dimension (number of variables) is denoted by `m`.
`v`	Covariate(s); their dimension is denoted by `r`.
`K`	Number of mixture components; the default is `K = 2`.
`steps`	Number of iterations. This is only used with `option = 2` (for both the 1-level and the 2-level model), and with `option = 3` or `option = 4` for the 1-level model when `var_fun` is set to 3 or 4. The default is `steps = 20`.
`option`	Selects one of four methods for choosing the starting values of the parameters; the default is `option = 1`. When `option = 1`: `\pi_k` = `\frac{1}{K}`, `z_k` ~ rnorm(`K`, mean = 0, sd = 1), `\alpha` = column means, `\beta` = a random row minus `\alpha`, `\Gamma` = coefficient estimates from separate linear models, and `\Sigma` is a diagonal matrix whose diagonal elements are the column standard deviations divided by `K`. When `option = 2`: a short run (`steps = 5`) of the EM function, started from `option = 1` with `var_fun = 1`, is performed and its estimates are used as the starting values for all parameters. When `option = 3`: the starting value of `\beta` is the first principal component, and the starting values of the remaining parameters are as described for `option = 1`. When `option = 4`: the scores of the first principal component of the data are clustered by `K`-means; `\pi_k` is set to the proportions of the cluster assignments and `z_k` to the `K`-means centers, and the starting values of the remaining parameters are as described for `option = 1`.
`var_fun`	The variance specification, one of four. `var_fun = 1`: the same diagonal variance matrix for all `K` components of the mixture; `var_fun = 2`: different diagonal variance matrices for different components; `var_fun = 3`: the same full (unrestricted) variance matrix for all components; `var_fun = 4`: different full (unrestricted) variance matrices for different components. If unspecified, `var_fun = 2`. Note that, for application purposes, in two-level models `var_fun` can only take the values 1 or 2.
`p`	Optional; starting values for `\pi_k`, supplied as a `K`-dimensional vector.
`z`	Optional; starting values for `z_k`, supplied as a `K`-dimensional vector.
`beta`	Optional; starting values for `\beta`, supplied as an `m`-dimensional vector.
`alpha`	Optional; starting values for `\alpha`, supplied as an `m`-dimensional vector.
`sigma`	Optional; starting values for `\Sigma_k` (`\Sigma` when `var_fun = 1` or `var_fun = 3`). When `var_fun = 1`, supplied as an `m`-dimensional vector; when `var_fun = 2`, as a list (of length `K`) of `m`-dimensional vectors; when `var_fun = 3`, as an `m \times m` matrix; when `var_fun = 4`, as a list (of length `K`) of `m \times m` matrices.
`gamma`	Optional; starting values for `\Gamma`, the coefficients for the covariates, supplied as an `m \times r` matrix.
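
As a rough illustration of the `option = 1` recipe described for the `option` argument, the following base-R sketch computes comparable quantities for the faithful data. It is not the package's own code: the object names (`start_sketch` and so on) are hypothetical, "column standard deviations over `K`" is read here as division by `K`, and `\Gamma` is omitted because faithful has no covariates.

```r
# Base-R sketch of the option = 1 recipe; an illustration, not the package source.
set.seed(1)                        # the recipe involves random draws
x <- as.matrix(faithful)           # n x m data matrix (here m = 2)
K <- 2                             # number of mixture components
n <- nrow(x)
m <- ncol(x)

p     <- rep(1 / K, K)                 # pi_k = 1/K
z     <- rnorm(K, mean = 0, sd = 1)    # z_k drawn from a standard normal
alpha <- colMeans(x)                   # alpha = column means
beta  <- x[sample(n, 1), ] - alpha     # beta = a randomly chosen row minus alpha
sigma <- apply(x, 2, sd) / K           # assumed reading: column sd divided by K

# Collect the values in a list, mirroring the element names listed under Value.
start_sketch <- list(p = p, alpha = alpha, z = z, beta = beta, sigma = sigma)
str(start_sketch)
```

Because `z` and `beta` are drawn at random, repeated runs give different starting values unless a seed is set.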

Value

A list of starting values for the parameters of the models `x_{i} = \alpha + \beta z_k + \Gamma v_i + \varepsilon_i` (Zhang and Einbeck, 2024) and `x_{ij} = \alpha + \beta z_k + \Gamma v_{ij} + \varepsilon_{ij}` (Zhang et al., 2023), as used in the four functions mult.em_1level, mult.em_2level, mult.reg_1level and mult.reg_2level.

`p`	The starting value for the parameter `\pi_k`, which is a vector of length `K`.
`alpha`	The starting value for the parameter `\alpha`, which is a vector of length `m`.
`z`	The starting value for the parameter `z_k`, which is a vector of length `K`.
`beta`	The starting value for the parameter `\beta`, which is a vector of length `m`.
`gamma`	The starting value for the parameter `\Gamma`, which is an `m \times r` matrix.
`sigma`	The starting value for the parameter `\Sigma_k`. When `var_fun = 1`, `\Sigma_k` is a diagonal matrix with `\Sigma_k = \Sigma`, and a vector of the diagonal elements is returned; when `var_fun = 2`, `\Sigma_k` is a diagonal matrix, and `K` vectors of the diagonal elements are returned; when `var_fun = 3`, `\Sigma_k` is a full variance-covariance matrix with `\Sigma_k = \Sigma`, and a single matrix `\Sigma` is returned; when `var_fun = 4`, `\Sigma_k` is a full variance-covariance matrix, and `K` different matrices `\Sigma_k` are returned.
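
The four shapes of `sigma` can be pictured with a small base-R sketch for `m = 2` variables and `K = 2` components. The object names (`sigma_vf1` to `sigma_vf4`) are hypothetical, the numeric values (sample variances and covariances of faithful) are chosen only for illustration, and storing the `K` vectors or matrices in a list follows the description of the `sigma` argument rather than inspection of the returned object.

```r
# Illustrative shapes of `sigma` under the four variance specifications.
x <- as.matrix(faithful)
m <- ncol(x)   # number of variables
K <- 2         # number of mixture components

# var_fun = 1: one vector holding the m diagonal elements of the common Sigma
sigma_vf1 <- apply(x, 2, var)

# var_fun = 2: K vectors of diagonal elements, one per component
sigma_vf2 <- replicate(K, apply(x, 2, var), simplify = FALSE)

# var_fun = 3: one full m x m variance-covariance matrix shared by all components
sigma_vf3 <- cov(x)

# var_fun = 4: K full m x m variance-covariance matrices, one per component
sigma_vf4 <- replicate(K, cov(x), simplify = FALSE)

str(list(sigma_vf1, sigma_vf2, sigma_vf3, sigma_vf4))
```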

References

Zhang, Y., Einbeck, J. and Drikvandi, R. (2023). A multilevel multivariate response model for data with latent structures. In: Proceedings of the 37th International Workshop on Statistical Modelling, pages 343-348. Link on RG: https://www.researchgate.net/publication/375641972_A_multilevel_multivariate_response_model_for_data_with_latent_structures.

Zhang, Y. and Einbeck, J. (2024). A Versatile Model for Clustered and Highly Correlated Multivariate Data. J Stat Theory Pract 18(5). doi:10.1007/s42519-023-00357-0.

Examples

## Example for the faithful data.
data(faithful)
start <- start_em(faithful, option = 1)
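
The call above uses `option = 1`. For comparison, the principal-component ideas behind `option = 3` and `option = 4` can be sketched in base R on the same data. This is an approximation under stated assumptions, not the package's implementation: the data are centred but not scaled before the principal component analysis, "the first principal component" for `\beta` is taken to mean the first loading vector, and the names `pc`, `km` and `beta_pc` are hypothetical.

```r
# Base-R sketch of the PCA / K-means starting values (options 3 and 4).
set.seed(1)                                    # kmeans() uses random initial centres
x <- as.matrix(faithful)
K <- 2

pc     <- prcomp(x)                            # PCA on centred, unscaled data (assumption)
scores <- pc$x[, 1]                            # scores on the first principal component
km     <- kmeans(scores, centers = K)          # K-means on the one-dimensional scores

p <- as.numeric(table(km$cluster)) / nrow(x)   # pi_k = share of observations per cluster
z <- as.numeric(km$centers)                    # z_k = K-means centres

beta_pc <- pc$rotation[, 1]                    # option = 3 idea: beta from the first PC

print(list(p = p, z = z, beta = beta_pc))
```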

[Package mult.latent.reg version 0.1.7 Index]