pbat {pbatR} | R Documentation |
PBAT Graphical and Command Line Interface
Description
The following routines are for the graphical and command line pbat
interface. The command line interfaces are listed in an order of
suggested usage. Most users of the command line will only want to use
pbat.m
.
pbat
runs a GUI (Graphical User Interface) for pbat.
pbat.last
returns an object of class pbat
of the last
command file run from running pbat()
. Note this is also
returned from pbat
.
However, this command is provided because rerunning a command in
pbat can be a very time-consuming process).
pbat.last.rawResults
prints out the raw text file of the
output (particularly useful if the output of pbat cannot be parsed
properly, in the unexpected event the output could not be parsed
correctly). This should work even with the new option of not loading
the output in.
pbat.m
runs pbat according to an expression, from phe
class
(phenotype information), ped
class (pedigree information), and
various options.
pbat.obj
runs pbat with a ped
class object (pedigree
information), a ‘phe’ class object (phenotype information), and
various other options.
pbat.files
runs pbat according to a set of filenames and
commands.
pbat.create.commandfile
creates a command file for Christoph
Lange's pbat software with respect to two files on disk (.phe, .ped).
Some options are only available for the respective pbat-gee (G), pbat-pc (P), pbat-logrank (L). If a parameter is ‘R’equired for a specific version, it will be denoted, for example, by (G-R).
Usage
pbat()
pbat.last()
pbat.last.rawResults()
pbat.m(
formula, phe, ped, fbat="",
max.pheno=1, min.pheno=1,
null="no linkage, no association", alpha=0.05,
trans.pheno="none", trans.pred="none", trans.inter="none",
scan.pred="all", scan.inter="all",
scan.genetic="additive",
offset="gee",
screening="conditional power", distribution="default",
logfile="",
max.gee=1,
max.ped=14, min.info=0,
incl.ambhaplos=TRUE, infer.mis.snp=FALSE,
sub.haplos=FALSE, length.haplos=2, adj.snps=TRUE,
overall.haplo=FALSE, cutoff.haplo=FALSE,
output="normal",
max.mating.types=10000,
commandfile="",
future.expansion=NULL,
LOAD.OUTPUT=TRUE,
monte=0,
mminsnps=NULL, mmaxsnps=NULL,
mminphenos=NULL, mmaxphenos=NULL,
env.cor.adjust=FALSE,
gwa=FALSE,
snppedfile=FALSE,
extended.pedigree.snp.fix=FALSE,
new.ped.algo=FALSE,
cnv.intensity=2, cnv.intensity.num=3 )
pbat.obj(
phe, ped, file.prefix, phenos="",
offset="gee", LOAD.OUTPUT=TRUE, ...)
pbat.files(
pedfile, phefile, fbat="gee",
commandfile="",
logrank.outfile="",
preds="", preds.order="",
max.pheno=1,
LOAD.OUTPUT=TRUE,
...)
pbat.create.commandfile(
pedfile, phefile="",
snps="",
phenos="", time="", # (set only one)
preds="", preds.order="",
inters="",
groups.var="", groups="",
fbat="gee",
censor="",
max.pheno=1, min.pheno=1,
null="no linkage, no association", alpha=0.05,
trans.pheno="none", trans.pred="none", trans.inter="none",
scan.pred="all", scan.inter="all",
scan.genetic="additive",
offset="gee",
screening="conditional power", distribution="default",
logfile="",
max.gee=1,
max.ped=7, min.info=0,
haplos=NULL, incl.ambhaplos=TRUE, infer.mis.snp=FALSE,
sub.haplos=FALSE, length.haplos=2, adj.snps=TRUE,
overall.haplo=FALSE, cutoff.haplo=FALSE,
output="normal",
max.mating.types=10000,
commandfile="",
future.expansion=NULL,
LOGFILE.OVERRIDE=TRUE,
monte=0,
mminsnps=NULL, mmaxsnps=NULL,
mminphenos=NULL, mmaxphenos=NULL,
env.cor.adjust=FALSE,
gwa=FALSE,
snppedfile=FALSE,
extended.pedigree.snp.fix=FALSE,
new.ped.algo=FALSE,
cnv.intensity=2, cnv.intensity.num=3 )
Arguments
formula |
Symbolic expression describing what should be processed. See ‘examples’ for more information. |
phe |
‘phe’ object as described in |
ped |
‘ped’ object as described in |
file.prefix |
Prefix of the output datafile (phe & ped must match) |
pedfile |
Name of the pedigree file (.ped/.pped/.cped) in PBAT-format (extension ‘.ped’ is optional). |
phefile |
Name of the phenotype file (.phe) in PBAT-format.
The default assumes the same prefix as that in 'pedfile'. Leave
empty or set to the empty string "" if you do not have a phenotype
file (i.e. you are only using AffecitonStatus). In the case of no
phenotype file, one must be created; it will be in |
... |
Options in higher level functions to be passed to 'pbat.create.commandfile'. |
fbat |
Selects the fbat statistic used the data analysis.
|
max.pheno |
(G,P) The maximum number of phenotypes that will be analyzed in the FBAT-statistic. |
min.pheno |
(G,P) The minimum number of phenotypes that will be analyzed in the FBAT-statistic. |
null |
Specification of the null-hypothesis.
|
alpha |
Specification of the significance level. |
trans.pheno |
Transformation of the selected phenotypes.
The default choice is |
trans.pred |
Transformation of the selected predictor variables/covariates:
The default choice is |
trans.inter |
Transformation of the selected interaction variables
The default choice is |
scan.pred |
(G,P) Computation of all covariate sub-models:
|
scan.inter |
(G,P) Computation of all interaction sub-models:
|
scan.genetic |
Specification of the mode of inheritance:
|
offset |
Specification of the covariate/predictor variables adjustment:
|
screening |
Specification of the screening methods to handle the multiple comparison problem for multiple SNPs/haplotypes and a set of phenotypes.
|
distribution |
Screening specification of the empirical phenotypic distribution
|
logfile |
Specification of the log-file. By default, PBAT selects an unique file-name for the log-file, i.e. "pbatlog...". |
max.gee |
(G) Specification of the maximal number of iterations in the GEE-estimation procedure. |
max.ped |
Specification of the maximal number of proband in one extended pedigrees. |
min.info |
Specification of the minimum number of informative families required for the computation of the FBAT-statistics. |
incl.ambhaplos |
This command defines the handling of ambiguous haplotypes in the haplotypes analysis. Choices:
|
infer.mis.snp |
Handling of missing genotype information in the haplotypes analysis.
|
sub.haplos |
|
length.haplos |
Defines the haplotype length when subhaplos= |
adj.snps |
Takes effect when subhaplos=
|
overall.haplo |
Specification of an overall haplotypes test. When
this command is included in the batch-file, only one level of the
|
cutoff.haplo |
The minimum haplotypes frequency so that a haplotypes is included in the overall test. |
output |
|
max.mating.types |
Maximal number of mating types in the haplotype analysis. |
commandfile |
Name of the temporary command file that will be created to send to the pbat. It is suggested to leave this blank, and an appropriate name will be chosen with a time stamp. |
future.expansion |
(Only included for future expansion of pbat.) A vector of strings for extra lines to write to the batchfile for pbat. |
logrank.outfile |
(L) Name of the file to store the R source code to generate the plots for logrank analysis. |
snps |
Vector of strings for the SNPs to process. Default processes all of the SNPs. |
phenos |
(G,P) Vector of strings for the phenotypes/traits for the analysis. If none are specified, then all are analyzed. (Note: this must be left empty for logrank analysis, instead specify the time to onset with the time variable. |
time |
(L-R) Time to onset variable. ‘phenos’ cannot be specified when this is used, but it must be set for logrank. |
preds |
Vector of strings for the covariates for the test statistic. |
preds.order |
Vector of integers indicating the order of 'preds' - the order for the vector of covariates for the test statistic. |
inters |
Vector of strings for the interaction variables. |
groups.var |
String for the grouping variable. |
groups |
Vector of strings corresponding to the groups of the grouping variable (groupsVar). |
censor |
(L-R) String of the censoring variables. In the corresponding data, this variable has to be binary. |
haplos |
List of string vectors representing the haplotype
blocks for the haplotype analysis.
For example,
|
LOGFILE.OVERRIDE |
When using the 'sym' option in read.ped and read.phe, when this is set to TRUE (default), the PBAT logfile is put in the current working directory; if FALSE, then it is put in the same directory as the datafile. |
LOAD.OUTPUT |
When TRUE, loads the output into R (generally recommended). When FALSE, it leaves it in the output left from PBAT (in case output is too large to load into memory). |
monte |
When this is nonzero, monte-carlo based methods are used to compute the p-values instead, according to the number of iterations supplied. 1000 iterations is suggested. |
mminsnps |
Multi-marker multi-phenotype tests: the minimum number of snps to be tested. |
mmaxsnps |
Multi-marker multi-phenotype tests: the maximum number of snps to be tested. |
mminphenos |
Multi-marker multi-phenotype tests: the minimum number of phenotypes to be tested. |
mmaxphenos |
Multi-marker multi-phenotype tests: the maximum number of phenotypes to be tested. |
env.cor.adjust |
Whether to adjust for environmental correlation. |
gwa |
Whether to use (g)enome (w)ide (a)cceleration mode. This is faster for genome-wide association tests, and has slightly less output. |
snppedfile |
Whether the pedigree file contains just snps. When this is true, it employs a more optimal storage technique and uses much less memory. It is especially advantageous for genome-wide studies. |
extended.pedigree.snp.fix |
Set to TRUE when you are using a dataset with large extended pedigrees. This will not work with any mode but ‘single’ mode currently [see pbat.set(...)]. This is also sometimes necessary for multi-allelic markers (i.e. not binary markers). |
new.ped.algo |
Set to TRUE (default is FALSE) to use the new, 10-100 times faster and more memory efficient algorithm. Somewhat experimental with extended pedigrees, so use with caution. |
cnv.intensity |
The CNV intensity number that should be analyzed. |
cnv.intensity.num |
The number of CNV intensities per CNV in the .cped file. |
Details
IF YOU ARE HAVING PROBLEMS: Try setting extended.pedigree.snp.fix, the slowest but most robust method.
INTERPRITING THE OUTPUT:
1) I make every attempt to try to properly header the output, but sometimes this is not possible. You will often see a warning message to this regards, which is generally safe to ignore.
2) 'a'=additive, 'd'=dominant, 'r'=recessive, 'h'=heterozygous advantage
FURTHER USEFUL COMMENTS:
These commands require ‘pbatdata.txt’ to be in the working directory; if not found, the program will attempt to (1) copy the file from the directory where pbat is, (2) copy it from anywhere in the path, or (3) error and exit.
Linux warning: the file ‘pbatdata.txt’ appears not to have shipped with the current (as of writing this) linux version; to fix this just download the windows version as well and copy the file from there to the same directory as pbat.
It is recommended to set 'LOAD.OUTPUT' to 'FALSE' when dealing with very large numbers of SNPs.
These commands will also generate a lot of output files in the current working directory when interfacing with pbat. These files will be time-stamped so concurrent analysis in the same directory can be run. Race condition: if two logrank analysis finish at exactly the same time, then the plots for one might be lost and/or get linked to the wrong analysis. This should be a rather rare occurence, and is an unpreventable result of pbat always sending this output to only one filename. Workaround to race condition: create another directory and use that as your current working directory instead.
Note that multi-marker / multi-phenotype mode is not supported in
parallel at this time, so if you are having problems try running the
command pbat.setmode("single")
, or setting it to single from
the graphical interface before running these tests.
WARNING: Note the 'extended.pedigree.snp.fix' option, which is important for getting more accurate results in very extended pedigrees. It uses a slower but more accurate pedigree reconstruction method.
Value
‘pbat’, ‘pbat.last’, ‘pbat.m’, ‘pbat.obj’, and ‘pbat.files’
return an object of class pbat
. Methods supported by this include
plot(...)
, summary(...)
, and print(...)
. Follow
the first three links in the 'see also' section of this file for more
details.
References
This was taken with only slight modification to accomodate the interface from Christoph Lange's description of the commands for the pbat program, (which was available with the software at the time of this writing).
P2BAT webpage.
FBAT webpage (lists a lot of references in relation to both of these programs).
More pbat references:
Hoffmann, T. and Lange, C. (2006) P2BAT: a massive parallel implementation of PBAT for genome-wide association studies in R. Bioinformatics. Dec 15;22(24):3103-5.
Jiang, H., et al. (2006) Family-based association test for time-to-onset data with time-dependent differences between the hazard functions. Genet. Epidemiol, 30, 124-132.
Laird, N.M. and Lange, C. (2006) Family-based designs in the age of large-scale gene-association studies. Nat. Rev. Genet, 7.
Lange, C., et al. (2003) Using the noninformative families in family-based association tests: a powerful new testing strategy. Am. J. Hum. Genet, 73, 801-811.
Lange, C., et al. (2004a) A family-based association test for repeatedly measured quantitative traits adjusting for unknown environmental and/or polygenic effects. Stat. Appl. Genet. Mol. Biol, 3.
Lange, C., et al. (2004b) Family-based association tests for survival and times-to-onset analysis. Stat. Med, 23, 179-189.
Van Steen, K., et al. (2005) Genomic screening and replication using the same data set in family-based association testing. Nat. Genet, 37, 683-691.
See Also
summary.pbat
,
plot.pbat
,
print.pbat
,
Examples
##########################
## pbat.m(...) examples ##
##########################
## Not run:
## Note, when you run the example (or anything else) you will generally
## get a warning message that the column headers were guessed.
## This means they were guessed, and while I've tried to catch most
## cases, the warning stands for ones I might have missed.
## These cannot be run verbatim, and are just meant to be examples.
##############################
## Further formula examples ##
##############################
# load in the data
# Here we assume that:
# data.phe contains 'preds1', 'preds2', 'preds3', 'time',
# 'censor', 'phenos1', ... 'phenos4'
# data.ped contains 'snp1', 'snp2', 'snp3',
# 'block1snp1','block1snp2',
# 'block2snp1','block2snp2'
data.phe <- read.phe( "data" )
data.ped <- read.ped( "data" )
# This model does just the affection status (always given as
# AffectionStatus) as the phenotype, no predictor covariates, and all
# the snps for a snps analysis.
# Since affection status is dichotomous, we additionally set
# distribution='categorical'
# offset='none'
# NONE is a special keyword to indicate none, and can be only used in
# this case (note that it is _case_ _sensative_);
# otherwise one specifies values from the phenotype object, after and
# including AffectionStatus.
res <- pbat.m( AffectionStatus ~ NONE, phe, ped, fbat="gee",
distribution='categorical', offset='none', ... )
summary( res )
res # equivalent to print(res)
# basic model with one phenotype, does all snps (if none specified)
pbat.m( phenos1 ~ preds1, phe, ped, fbat="gee" )
# same model, but with more phenotypes; here we test them all at once
pbat.m( phenos1 + phenos2 + phenos3 ~ preds1, phe, ped, fbat="gee" )
# same model as just before, but now supposing that these phenotypes are
# instead from a longitudinal study
pbat.m( phenos1 + phenos2 + phenos3 ~ preds1, phe, ped, fbat="pc" )
# like our second model, but the mi() tells it should be a marker
# interaction
pbat.m( phenos1 ~ mi(preds1), phe, ped, fbat="gee" )
# logrank analysis - fbat need not be set
# uses more than one predictor variable
res <- pbat.m( time & censor ~ preds1 + preds2 + preds3, phe, ped )
plot( res )
# single snp analysis (because each snp is seperated by a vertical bar
# '|'), and stratified by group (presence of censor auto-indicates
# log-rank analysis). Note that the group is at the end of the
# expression, and _must_ be at the end of the expression
res <- pbat.m( time & censor ~ preds1^3 + preds2 | snp1 | snp2 |
snp3 / group, temp )
plot( res )
# haplotype analysis, stratified by group
res <- pbat.m( time & censor ~ preds1^2 + preds2^3 | block1snp1
+ block1snp2 | block2snp1 + block2snp2 / group, temp )
# set any of the various options
res <- pbat.m( phenos ~ preds, phe, ped, fbat="pc",
null="linkage, no association", alpha=0.1 )
## New multimarker test (as described above)
# mmaxphenos and mmaxsnps are set to the minimum if not specified
res <- pbat.m( phenos1 + phenos2 + phenos3 ~ preds | m1 | m2 | m3 | m4,
phe, ped, fbat="pc", mminphenos=2, mminsnps=2 )
## And the top markers by conditional power
top( res )
## End(Not run)
############################
## pbat.obj(...) examples ##
############################
## Not run:
# These will not function; they only serve as examples.
# ... just indicates there are various options to be put here!
res <- pbat.obj("pedfile", snps=c("snp1,snp2"), preds="pred1", ... )
summary(res)
res
# plot is only available for "logrank"
res <- pbat.obj(..., fbat="logrank")
plot( res )
## End(Not run)
##############################
## pbat.files(...) examples ##
##############################
## Not run:
# These will not function, but only serve as examples.
# Note in the following example, both "pedfile.ped" and "pedfile.phe"
# must exist. If the names differed, then you must specify the
# option 'phe="phefile.phe"', for example.
res <- pbat.files( "pedfile", phenos=c("phenos1","phenos2"),
screening="conditional power" )
summary(res)
res
## End(Not run)