haplinSlide {Haplin}

R Documentation

Run haplin analysis in a series of sliding windows over a sequence of markers/SNPs

Description

Produces a list, each element of which is an object of class haplin, which is the result of fitting the log-linear haplin models to the data one "window" at a time.

Usage

haplinSlide( data, markers = "ALL", winlength = 1, 
strata = NULL, table.output = TRUE, cpus = 1, para.env = NULL, slaveOutfile = "", 
printout = FALSE, verbose = FALSE, ...)

Arguments

`data`	R object of class "haplin.ready", e.g., the output of `genDataPreprocess` or `genDataLoad`, containing covariate and genetic data.
`markers`	Default is "ALL", which means haplinSlide uses all available markers in the data set. Alternatively, the relevant markers can be specified by giving a vector of numbers (e.g., `markers = c(1, 3:10)` will use the first 10 markers except marker 2) or characters (e.g., `markers = c("m1", "m3", "rs35971")`). `haplinSlide` will then run haplin on a series of windows selected from the supplied `markers`. The `winlength` argument decides the length of the windows. See details.
`winlength`	Length of the sliding, overlapping windows to be run along the markers. See details.
`strata`	A single numeric value specifying which data column contains the stratification variable.
`table.output`	If `TRUE`, the `haptable` function will be applied to each result after estimation, greatly reducing the size of the output. If `FALSE`, each element of the output list is a standard `haplin` object. To preserve memory, default is set to TRUE.
`cpus`	`haplinSlide` allows parallel processing of its analyses. The `cpus` argument should preferably be set to the number of available CPUs. If set lower, some capacity is left for other processes to run. Setting it too high should not cause any serious problems.
`para.env`	The parallel environment to use: "parallel" (default) or "Rmpi" (for use on clusters). This option is used only when the `cpus` argument is larger than 1.
`slaveOutfile`	Character. To be used when `cpus > 1`. If `slaveOutfile = ""` (default), output from all running cores will be printed in the standard R session window. Alternatively, the output can be saved to a file by specifying the file path and name.
`printout`	Default is FALSE. If TRUE, provides a full summary of each `haplin` result during the run of `haplinSlide`.
`verbose`	Same as for `haplin`, but defaults to FALSE to reduce output size.
`...`	Remaining arguments to be used by `haplin` in each run.

Details

haplinSlide runs haplin on a series of overlapping windows of the chosen markers. Except for the markers and winlength arguments, all arguments are used exactly as in haplin itself. For instance, if markers = c(1, 3, 4, 5, 7, 8) and winlength = 4, haplinSlide will first run haplin on the markers c(1, 3, 4, 5), then on c(3, 4, 5, 7), and finally on c(4, 5, 7, 8). The results are returned in a list whose elements are named after the markers in each window ("1-3-4-5", "3-4-5-7", and so on). A single result can be extracted with, say, summary(res[["1-3-4-5"]]), where res is the saved result. Alternatively, all results can be examined at once with, for instance, lapply(res, summary) and lapply(res, plot).
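The window scheme described above can be sketched in base R. This is only an illustration of how the windows and their names follow from markers and winlength; sliding_windows is a hypothetical helper, not a Haplin function, and it does not reflect how haplinSlide is implemented internally.

```r
# Hypothetical helper (not part of Haplin): list the overlapping windows
# of length `winlength` taken along the `markers` vector.
sliding_windows <- function(markers, winlength) {
  stopifnot(winlength >= 1, winlength <= length(markers))
  # One window starts at each position that leaves room for a full window
  starts <- seq_len(length(markers) - winlength + 1)
  windows <- lapply(starts, function(i) markers[i:(i + winlength - 1)])
  # Name each window by its markers joined with "-", e.g. "1-3-4-5"
  names(windows) <- vapply(windows, paste, character(1), collapse = "-")
  windows
}

windows <- sliding_windows(c(1, 3, 4, 5, 7, 8), winlength = 4)
names(windows)
# "1-3-4-5" "3-4-5-7" "4-5-7-8"
```

With winlength = 1 the same helper returns one single-marker window per marker, which corresponds to analyzing each marker separately.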
When running haplinSlide on a large number of markers, the output can become prohibitively large. In that case table.output should be set to TRUE, and haplinSlide will return a list of summary "haptables". This list can then be stacked into a single dataframe using toDataFrame. To avoid excessive memory use, the default is table.output = TRUE.
When multiple cores are available, set the cpus argument to the number of cores that should be used. This will run haplinSlide in parallel on the chosen number of cores. Note that feedback is provided by each of the cores separately, and some cores may start working on markers far out in the sequence.
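A minimal sketch of a parallel run with tabular output, combining the last two paragraphs. It assumes that the Haplin package is installed and that an object named haplin.data.prep has been created with genDataPreprocess as in the Examples section; if either is missing, the haplinSlide call is skipped. The choice to leave one core free and the log file name are illustrative assumptions, not recommendations from the package.

```r
library(parallel)  # ships with R; used here only for detectCores()

# Leave one core free for other processes; detectCores() may return NA.
n.cores <- detectCores()
if (is.na(n.cores)) n.cores <- 1
cpus <- max(1, n.cores - 1)

# Assumption: haplin.data.prep comes from genDataPreprocess (see Examples).
if (requireNamespace("Haplin", quietly = TRUE) && exists("haplin.data.prep")) {
  library(Haplin)
  # table.output = TRUE (the default) returns one haptable per window;
  # worker output is written to a log file (file name is illustrative).
  res <- haplinSlide(haplin.data.prep, winlength = 1,
    table.output = TRUE, cpus = cpus,
    slaveOutfile = file.path(tempdir(), "haplinSlide_workers.log"))
  # Stack the list of haptables into a single dataframe
  res.table <- toDataFrame(res)
  head(res.table)
}
```

Because each core reports separately, sending worker output to a file through slaveOutfile can make the feedback easier to follow than interleaved console output.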

Value

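A list with one element per window. If table.output = TRUE (the default), each element is a haptable; if table.output = FALSE, each element is an object of class haplin.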

Note

Further information can be found on the Haplin web page, https://haplin.bitbucket.io.

Author(s)

Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
hakon.gjessing@uib.no

References

Gjessing HK and Lie RT. Case-parent triads: Estimating single- and double-dose effects of fetal and maternal disease gene haplotypes. Annals of Human Genetics (2006) 70, pp. 382-396.

Web Site: https://haplin.bitbucket.io

Examples


## Not run: 
# (Almost) all standard haplin runs can be done with haplinSlide. 
# Below is an illustration. See the haplin help page for more 
# examples.
# 

# 1. Read the data:
my.haplin.data <- genDataRead( file.in = "HAPLIN.trialdata.txt", file.out =
  "trial_data1", dir.out = tempdir( check = TRUE ), format = "haplin", n.vars = 0 )

# 2. Run pre-processing:
haplin.data.prep <- genDataPreprocess( data.in = my.haplin.data,
  format = "haplin", design = "triad", file.out = "trial_data1_prep",
  dir.out = tempdir( check = TRUE ) )

# 3. Analyze:
# Analyzing the effect of fetal genes, including triads with missing data,
# using a multiplicative response model. When winlength = 1, separate
# markers are used. To make longer windows, winlength can be increased
# correspondingly:
result.1 <- haplinSlide( haplin.data.prep, use.missing = TRUE, response = "mult",
  reference = "ref.cat", winlength = 1, table.output = FALSE )
# Provide summary of separate results:
lapply(result.1, summary)
# Plot results:
par(ask = TRUE)
lapply(result.1, plot)


## End(Not run)

[Package Haplin version 7.3.1 Index]