R: Adaptive Fence model selection (Restricted Fence)

RF {fence}

R Documentation

Adaptive Fence model selection (Restricted Fence)

Description

Adaptive Fence model selection (Restricted Fence)

Usage

RF(full, data, groups, B = 100, grid = 101, bandwidth = NA,
  plot = FALSE, method = c("marginal", "conditional"), id = "id",
  cpus = parallel::detectCores())

Arguments

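`full`	Formula of the full model
`data`	Data set containing the variables used in the formulas
`groups`	A list of formulas, one per bin (group) of variables, each giving the full model within that bin
`B`	Number of bootstrap samples (parametric bootstrap for lmer)
`grid`	Grid for the tuning constant c
`bandwidth`	Bandwidth for the kernel smoothing function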
`plot`	Plot object
`method`	Approach to use: either marginal (GEE) or conditional
`id`	Subject or cluster id variable
`cpus`	Number of CPU cores used for parallel computation
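
As an illustration of the `full` and `groups` arguments, the formulas can be assembled with base R's `reformulate()` rather than by pasting strings. This is only a sketch: the variable names mirror those in the Examples section, and the split into two bins is an assumption made for illustration.

```r
# Variable names follow the Examples section; the binning is illustrative
fixed_terms <- c("V.1", "I.1", "I.2a", "I.2b")

# Full model: fixed terms plus V6..V16
full <- reformulate(c(fixed_terms, paste0("V", 6:16)), response = "y")

# One formula per bin (group) of variables
bin1 <- reformulate(c(fixed_terms, paste0("V", 6:8)), response = "y")
bin2 <- reformulate(paste0("V", 9:16), response = "y")
groups <- list(bin1, bin2)
```

The resulting `full` and `groups` objects have the same form as those passed to `RF()` in the Examples section.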

Details

In Jiang et al. (2008), the adaptive c value is chosen at the highest peak in the plot of p* against c. In Jiang et al. (2009), a 95% confidence interval is taken into account when making this adaptive choice of c. In Nguyen et al. (2014), the adaptive c value is chosen at the first peak. This approach works better with moderate sample sizes or weak signals. Empirically, the first peak becomes the highest peak as the sample size increases or the signals become stronger.
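
The difference between the first-peak and highest-peak rules can be sketched on a toy p* curve. The numbers below are made up, and `find_peaks()` is a hypothetical helper written for this illustration; it is not necessarily how the package locates peaks internally.

```r
# Hypothetical smoothed p* values on a grid of c values (illustrative only)
cs     <- seq(0, 10, length.out = 11)
p_star <- c(0.30, 0.55, 0.80, 0.60, 0.50, 0.70, 0.95, 0.90, 0.85, 0.80, 0.75)

# Hypothetical helper: indices of interior local maxima of p
find_peaks <- function(p) {
  idx <- 2:(length(p) - 1)
  idx[p[idx] > p[idx - 1] & p[idx] >= p[idx + 1]]
}

peaks <- find_peaks(p_star)

# First-peak rule (Nguyen et al., 2014)
c_first <- cs[peaks[1]]

# Highest-peak rule (Jiang et al., 2008)
c_highest <- cs[peaks[which.max(p_star[peaks])]]
```

In this toy curve the two rules disagree: the first peak occurs at a smaller c than the highest peak.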

Value

`models`	All model candidates in the model space
`B`	Number of bootstrap samples used
`lack_of_fit_matrix`	Matrix of Qs for all model candidates (in columns); each row corresponds to one bootstrap sample
`Qd_matrix`	Matrix of QM - QM.tilde for all model candidates; each row corresponds to one bootstrap sample
`bandwidth`	Value of the bandwidth
`model_mat`	Matrix of the models selected at each c value in the grid (in columns); each row corresponds to one bootstrap sample
`freq_mat`	Matrix of coverage probabilities (frequency/smooth_frequency) of the selected model at each c value (index)
`c`	Adaptive choice of the c value from which the parsimonious model is selected
`sel_model`	Selected (parsimonious) model at the adaptive c value
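
To make the relationship between `model_mat` and the selection frequencies concrete, the sketch below computes, for each c value, the most frequently selected model and its empirical frequency across bootstrap samples. The matrix is invented for illustration, and the exact layout of `freq_mat` returned by `RF()` is not documented here, so this is only an assumption about the underlying idea.

```r
# Hypothetical model_mat: 6 bootstrap samples (rows) x 3 c values (columns)
model_mat <- matrix(c("M1", "M1", "M2", "M1", "M3", "M1",
                      "M2", "M2", "M2", "M2", "M1", "M2",
                      "M3", "M2", "M3", "M3", "M3", "M3"),
                    nrow = 6)

# Most frequently selected model at each c value
top_model <- apply(model_mat, 2, function(col) names(which.max(table(col))))

# Its empirical selection frequency (p*) at each c value
p_star <- apply(model_mat, 2, function(col) max(table(col)) / length(col))
```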

Note

bandwidth = (cs[2] - cs[1]) * 3, that is, three times the spacing between two adjacent c values in the grid.
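
A minimal sketch of this bandwidth rule follows. The range of the grid, the simulated frequencies, and the use of `stats::ksmooth()` with a normal kernel are all assumptions made for illustration; the package may use a different smoother.

```r
# Hypothetical grid of 101 c values; the range 0 to 10 is an assumption
cs <- seq(0, 10, length.out = 101)

# Bandwidth rule from the note: three times the grid spacing
bandwidth <- (cs[2] - cs[1]) * 3

# Hypothetical raw selection frequencies, clipped to [0, 1]
set.seed(1)
freq <- pmin(1, pmax(0, 0.5 + 0.3 * sin(cs) + rnorm(length(cs), sd = 0.05)))

# Kernel smoothing of the frequencies over the grid (assumed smoother)
smoothed <- stats::ksmooth(cs, freq, kernel = "normal",
                           bandwidth = bandwidth, x.points = cs)
```

With this grid the spacing is 0.1, so the rule gives a bandwidth of 0.3.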

References

Jiang J., Rao J.S., Gu Z., Nguyen T. (2008), Fence Methods for Mixed Model Selection. The Annals of Statistics, 36(4), 1669-1692
Jiang J., Nguyen T., Rao J.S. (2009), A Simplified Adaptive Fence Procedure. Statistics and Probability Letters, 79, 625-629
Nguyen T., Jiang J. (2012), Restricted Fence Method for Covariate Selection in Longitudinal Data Analysis. Biostatistics, 13(2), 303-314
Nguyen T., Peng J., Jiang J. (2014), Fence Methods for Backcross Experiments. Journal of Statistical Computation and Simulation, 84(3), 644-662

Examples

## Not run: 
r =1234; set.seed(r)
n = 100; p=15; rho = 0.6
beta = c(1,1,1,0,1,1,0,1,0,0,1,0,0,0,0)  # non-zero coefficients: V.1, I.1, I.2a, V6, V7, V9, V12
id = rep(1:n,each=3)
V.1 = rep(1,n*3)
I.1 = rep(c(1,-1),each=150)
I.2a = rep(c(0,1,-1),n)
I.2b = rep(c(0,-1,1),n)
x = matrix(rnorm(n*3*11), nrow=n*3, ncol=11)
x = cbind(id,V.1,I.1,I.2a,I.2b,x)
R = diag(3)
for(i in 1:3){
 for(j in 1:3){
   R[i,j] = rho^(abs(i-j))
 }
} 
e = as.vector(t(MASS::mvrnorm(n, rep(0, 3), R)))  # mvrnorm() comes from the MASS package
y = as.vector(x[,-1]%*%beta) + e
data = data.frame(x,y)
raw = "y ~ V.1 + I.1 + I.2a +I.2b"
for (i in 6:16) { raw = paste0(raw, "+V", i)}; full = as.formula(raw)
bin1="y ~ V.1 + I.1 + I.2a +I.2b"
for (i in 6:8) { bin1 = paste0(bin1, "+V", i)}; bin1 = as.formula(bin1)
bin2="y ~ V9"
for (i in 10:16){ bin2 = paste0(bin2, "+V", i)}; bin2 = as.formula(bin2)
# May take longer than 30 min since there are two stages in this RF procedure
obj1.RF = RF(full = full, data = data, groups = list(bin1,bin2), method="conditional")
obj1.RF$sel_model
obj2.RF = RF(full = full, data = data, groups = list(bin1,bin2), B=100, method="marginal")
obj2.RF$sel_model

## End(Not run)

[Package fence version 1.0 Index]