preprocess {IPCAPS} | R Documentation |
(Internal function) Perform the pre-processing step of IPCAPS
Description
(Internal function) Perform the pre-processing step of IPCAPS
Usage
preprocess(
files,
label.file,
lab.col,
rdata.infile,
bed.infile,
cate.list,
result.dir,
threshold,
min.fst,
max.thread = NA,
reanalysis = FALSE,
method = "mix",
min.in.group = 20,
datatype = "snp",
nonlinear = FALSE,
missing.char = NA,
regression.file = NA,
regression.col.first = NA,
regression.col.last = NA,
reg.method = "linear",
plot.as.pdf = NA,
no.plot = NA
)
Arguments
files |
IPCAPS supports SNPs encoded as 0, 1 and 2 (dosage encoding). Rows represent SNPs and columns represent subjects. Each column needs to be separated by a space or a tab. A big text file should be divided into smaller files to load faster. For instance, to input 3 files, use as: files=c( 'input1.txt', 'input2.txt', 'input3.txt'). |
label.file |
An additional useful information (called "labels" in IPCAPS) related subject, for example, geographic location or disease phenotype. These labels (one at a time) are used in displaying the clustering outcome of IPCAPS. A label file must contain at least one column. However, it may contain more than one column in which case each column need to be separated by a space or a tab. |
lab.col |
The label in the label file to be used in the tree-like display of IPCAPS clustering results. |
rdata.infile |
In case of re-analysis, it is convenient to run IPCAPS using the file rawdata.RData generated by IPCAPS. This file contains a matrix of SNPs (raw.data) and a vector of labels (label). |
bed.infile |
A PLINK binary format consists of 3 files; bed, bim, and fam. To generate these files from PLINK, use option –make-bed. See more details at: http://zzz.bwh.harvard.edu/plink/data.shtml. |
cate.list |
(Unimplemented) A list of categorical input file (text). For instance, to input 3 files, use as: files=c('input1.txt', 'input2.txt', 'input3.txt'). |
result.dir |
To set an absolute path for IPCAPS output. If the specified output directory already exists, result files are saved in sub-directories cluster_out, cluster_out1, cluster_out2, etc. |
threshold |
Cutoff value for EigenFit. |
min.fst |
Minimum Fst between a pair of subgroups. |
max.thread |
(Require the parallelization patch) Maximum number of concurrent threads. |
reanalysis |
(Unimplemented) To specify whether it is re-analysis or not. If TRUE, it is re-analysis, otherwise it is not. Default = FALSE. |
method |
The internal clustering method. It can be set to "mix" (rubikclust & mixmod), "mixmod" (Lebret et al., 2015), "clara" (R: Clustering Large Applications), "pam" (R: Partitioning Around Medoids (PAM) Object), "meanshift" (Wang, 2016), "apcluster" (Bodenhofer et al., 2016), and "hclust" (R: Hierarchical Clustering). Default = "mix". |
min.in.group |
Minimum number of individuals to constitute a cluster or subgroup. |
datatype |
To specify whether the input data are 'snp' or other type. Defalut = 'snp'. |
nonlinear |
(Unimplemented) To specify whether linear or non-linear method is used for IPCAPS analysis. If TRUE, non-linear method is used, otherwise linear method is used. Default = FALSE. |
missing.char |
Symbol used for missing genotypes. Default = NA. |
regression.file |
A file of covariates; one covariate per column. SNPs can be adjusted for these covariates via regression modeling and residual computation. |
regression.col.first |
Refer to a covariate file, the first covariate to be considered as confounding variable. |
regression.col.last |
Refer to a covariate file, the last covariate to be considered as confounding variable. All the variables in between the cov.col.first and cov.col.last will be considered in the adjustment process. |
reg.method |
(Fixed) Specify a method for regression analysis. Default = 'linear'. |
plot.as.pdf |
To export plots as PDF. When omitted, plots are saved as PNG. |
no.plot |
No plot is generated if this option is TRUE. This option is useful when the system does not support X Windows in the unix based system. Default = FALSE. |
Value
A data frame of input data.