haplinSlide {Haplin} | R Documentation |
Run haplin analysis in a series of sliding windows over a sequence of markers/SNPs
Description
Produces a list, each element of which is an object of class haplin, which is the result of fitting the log-linear haplin models to the data one "window" at a time.
Usage
haplinSlide( data, markers = "ALL", winlength = 1,
strata = NULL, table.output = TRUE, cpus = 1, para.env = NULL, slaveOutfile = "",
printout = FALSE, verbose = FALSE, ...)
Arguments
data |
R-object of class "haplin.ready", which is e.g., output from |
markers |
Default is "ALL", which means haplinSlide uses all available markers in the data set in the analysis. Alternatively, the relevant markers can be specified by giving a vector of numbers (e.g., |
winlength |
Length of the sliding, overlapping windows to be run along the markers. See details. |
strata |
A single numeric value specifying which data column contains the stratification variable. |
table.output |
If |
cpus |
|
para.env |
The user can choose which parallel environment to use: "parallel" (default) or "Rmpi" (for use on clusters); this option is used only when |
slaveOutfile |
Character. To be used when |
printout |
Default is FALSE. If TRUE, provides a full summary of each |
verbose |
Same as for |
... |
Remaining arguments to be used by |
Details
haplinSlide runs haplin on a series of overlapping windows of the chosen markers. Except for the markers and winlength arguments, all arguments are used exactly as in haplin itself. For instance, if markers = c(1, 3, 4, 5, 7, 8) and winlength = 4, haplinSlide will run haplin first on the markers c(1, 3, 4, 5), then on c(3, 4, 5, 7), and finally on c(4, 5, 7, 8). The results are returned in a list. The elements are named "1-3-4-5" etc., and can be extracted with, say, summary(res[["1-3-4-5"]]), where res is the saved result. Alternatively, the output can be examined using, for instance, lapply(res, summary) and lapply(res, plot).
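The window layout described above can be reproduced in base R. This is only an illustrative sketch: sliding_windows is a hypothetical helper, not a Haplin function, and it assumes that the element names are formed by joining the marker numbers of each window with "-", as in the example names given above.

```r
# Hypothetical helper (not part of Haplin): list the overlapping windows
# of length `winlength` over the chosen markers, named as in the text above.
sliding_windows <- function(markers, winlength) {
  stopifnot(winlength >= 1, winlength <= length(markers))
  # One window starts at each position that leaves room for a full window
  starts <- seq_len(length(markers) - winlength + 1)
  windows <- lapply(starts, function(i) markers[i:(i + winlength - 1)])
  # Assumed naming scheme: marker numbers joined by "-"
  names(windows) <- vapply(windows, paste, character(1), collapse = "-")
  windows
}

windows <- sliding_windows(c(1, 3, 4, 5, 7, 8), winlength = 4)
names(windows)
# "1-3-4-5" "3-4-5-7" "4-5-7-8"
```

With six markers and winlength = 4 there are 6 - 4 + 1 = 3 windows, matching the three haplin runs listed above.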
When running haplinSlide on a large number of markers, the output can become prohibitively large. In that case table.output should be set to TRUE, and haplinSlide will return a list of summary "haptables". This list can then be stacked into a single data frame using toDataFrame. To avoid excessive memory use, the default is table.output = TRUE.
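A minimal sketch of this workflow is given below. The wrapper stack_window_tables is a hypothetical name introduced only for illustration; it assumes that the Haplin package is installed and that its data argument is a preprocessed "haplin.ready" object. Defining the function does not run any analysis.

```r
# Hypothetical wrapper (not part of Haplin): run haplinSlide with
# table.output = TRUE and stack the resulting haptables into one data frame.
stack_window_tables <- function(data, winlength = 1, ...) {
  # With table.output = TRUE, each list element is a summary haptable
  res <- Haplin::haplinSlide(data, winlength = winlength,
                             table.output = TRUE, ...)
  # toDataFrame stacks the list of haptables into a single data frame
  Haplin::toDataFrame(res)
}

# Assumed usage, with haplin.data.prep being preprocessed data:
# all.windows <- stack_window_tables(haplin.data.prep, winlength = 2)
```

Keeping only the stacked table, rather than the full haplin objects, is what limits memory use when many windows are analyzed.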
When multiple cores are available, set the cpus argument to the number of cores that should be used. This will run haplinSlide in parallel on the chosen number of cores. Note that feedback is provided by each of the cores separately, and some cores may start working on markers far out in the sequence.
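One possible way to choose a value for cpus is sketched below. It relies on detectCores from the base R parallel package; leaving one core free is an assumption made for this illustration, not a Haplin recommendation, and the haplinSlide call is shown commented out because it needs preprocessed data.

```r
# detectCores() can return NA on some platforms, so fall back to one core
n.detected <- parallel::detectCores()
# Leave one core free for the rest of the system (an illustrative choice)
n.cpus <- if (is.na(n.detected)) 1L else max(1L, n.detected - 1L)

# Assumed usage, with haplin.data.prep being preprocessed data:
# res <- haplinSlide(haplin.data.prep, winlength = 2, cpus = n.cpus)
```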
Value
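A list with one element per window is returned. The elements are objects of class haplin or, when table.output = TRUE (the default), summary haptables.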
Note
Further information is found on the Haplin web page, https://haplin.bitbucket.io.
Author(s)
Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
hakon.gjessing@uib.no
References
Gjessing HK and Lie RT. Case-parent triads: Estimating single- and double-dose effects of fetal and maternal disease gene haplotypes. Annals of Human Genetics (2006) 70, pp. 382-396.
Web Site: https://haplin.bitbucket.io
See Also
haplin, summary.haplin, plot.haplin, haptable, toDataFrame
Examples
## Not run:
# (Almost) all standard haplin runs can be done with haplinSlide.
# Below is an illustration. See the haplin help page for more
# examples.
#
# 1. Read the data:
my.haplin.data <- genDataRead( file.in = "HAPLIN.trialdata.txt",
file.out = "trial_data1", dir.out = tempdir( check = TRUE ),
format = "haplin", n.vars = 0 )
# 2. Run pre-processing:
haplin.data.prep <- genDataPreprocess( data.in = my.haplin.data,
format = "haplin", design = "triad", file.out = "trial_data1_prep",
dir.out = tempdir( check = TRUE ) )
# 3. Analyze:
# Analyzing the effect of fetal genes, including triads with missing data,
# using a multiplicative response model. When winlength = 1, separate
# markers are used. To make longer windows, winlength can be increased
# correspondingly:
# table.output = FALSE keeps the full haplin objects, which summary and plot need
result.1 <- haplinSlide( haplin.data.prep, use.missing = TRUE, response = "mult",
reference = "ref.cat", winlength = 1, table.output = FALSE )
# Provide summary of separate results:
lapply(result.1, summary)
# Plot results:
par(ask = TRUE)
lapply(result.1, plot)
## End(Not run)