AM {Eagle}R Documentation

multiple-locus Association Mapping

Description

AM performs association mapping within a multiple-locus linear mixed model framework. AM finds the best set of marker loci in strongest association with a trait while simultaneously accounting for any fixed effects and the genetic background.

Usage

AM(
  trait = NULL,
  fformula = NULL,
  geno = NULL,
  pheno = NULL,
  map = NULL,
  Zmat = NULL,
  ncpu = detectCores(),
  ngpu = 0,
  quiet = TRUE,
  maxit = 40,
  fixit = FALSE,
  lambda = 1
)

Arguments

trait

the name of the column in the phenotype data file that contains the trait data. The name is case sensitive and must match exactly the column name in the phenotype data file.

fformula

the right hand side formula for the fixed effects. See below for details. If not specified, only an overall mean will be fitted.

geno

the R object obtained from running ReadMarker. This must be specified.

pheno

the R object obtained from running ReadPheno. This must be specified.

map

the R object obtained from running ReadMap. If not specified, a generic map will be assumed.

Zmat

the R object obtained from running ReadZmat. If not specified, an identity matrix will be assumed.

ncpu

a integer value for the number of CPU that are available for distributed computing. The default is to determine the number of CPU automatically.

ngpu

a integer value for the number of gpu available for computation. The default is to assume there are no gpu available. This option has not yet been implemented.

quiet

a logical value. If set to FALSE, additional runtime output is printed. This is useful for error checking and monitoring the progress of a large analysis.

maxit

an integer value for the maximum number of forward steps to be performed. This will rarely need adjusting.

fixit

a boolean value. If TRUE, then maxit iterations are performed, regardless of the value of the model fit value extBIC. If FALSE, then the model building process is stopped when extBIC increases in value.

lambda

a value between 0 and 1 for the regularization parameter for the extBIC. Values close to 0 lead to an anti-conservative test. Values close to 1 lead to a more conservative test. If this value is left unspecified, a default value of 1 is assumed. See FPR4AM for an empirical approach for setting the lambda value.

Details

This function is used to perform genome-wide association mapping. The phenotypic and SNP data should already be read in prior to running this function (see below for examples). AM builds the linear mixed model iteratively, via forward selection. It is through this model building process that we identify the SNP-trait associations. We use the extended BIC (extBIC) to decide on the 'best' model and when to stop looking for a better model. The conservativeness of extBIC can be adjusted. If the lambda parameter is left at is default setting, then AM is run in its most conservative state (i.e. false positives are minimized but this also decreases the chance of true positives).

When interested in running AM at a certain false positive rate, use FPR4AM. This function uses permutation to find the lambda value for a desired false positive rate for AM.

Below are some examples of how to use AM for genome-wide association mapping of data.

How to perform a basic AM analysis

Suppose,

To analyse these data, we would use the following three functions (the parameters can be specified in any order, as well as the functions as long as AM is run last):

  geno_obj <-  ReadMarker(filename='geno.txt', AA=0, AB=1, BB=2, type="text", missing='X')
  
  pheno_obj <- ReadPheno(filename='pheno.txt')

 # since lambda is not specified, this will run AM conservatively (where the false positive rate is lowest).
  res <- AM(trait='y', geno=geno_obj, pheno=pheno_obj)

A table of results is printed to the screen and saved in the R object res.

How to perform a more complicated AM analysis where the false positive rate is 5%

Suppose,

To analyse these data, we would run the following:

  geno_obj <-  ReadMarker(filename='/my/dir/geno.ped', type='PLINK', availmemGb=32)
  
  pheno_obj <- ReadPheno(filename='/my/dir/pheno.txt', missing=99)

  map_obj   <- ReadMap(filename='/my/dir/map.txt')

  # FPR4AM calculates the lambda value corresponding to a desired false positive rate of 5%
  ans <- FPR4AM(falseposrate=0.05, numreps=200, trait='y2', fformula=c('cov1 + cov2 + pc1 + pc2'), 
            geno=geno_obj, pheno=pheno_obj, map=map_obj)

  # performs association mapping with a 5% false positive rate
  res <- AM(trait='y2', fformula=c('cov1 + cov2 + pc1 + pc2'), 
            geno=geno_obj, pheno=pheno_obj, map=map_obj,  lambda=ans$setlambda)

A table of results is printed to the screen and saved in the R object res.

How to perform an analysis where individuals have multiple observations

Suppose,

To analyse these data, we would run the following:

  geno_obj <-  ReadMarker(filename='/my/dir/geno.ped', type='PLINK', availmemGb=32)
  
  pheno_obj <- ReadPheno(filename='/my/dir/pheno.txt', missing=99)

  map_obj   <- ReadMap(filename='/my/dir/map.txt')

  Zmat_obj  <- ReadZmat(filename='/my/dir/Zmatrix.txt')

  res <- AM(trait='y2', fformula=c('cov1 + cov2 + pc1 + pc2'), 
            geno=geno_obj, pheno=pheno_obj, map=map_obj, Zmat=Zmat_obj )

A table of results is printed to the screen and saved in the R object res.

Dealing with missing marker data

AM can tolerate some missing marker data. However, ideally, a specialized genotype imputation program such as 'BEAGLE' or 'PHASE2', should be used to impute the missing marker data before being read into 'Eagle'.

Dealing with missing trait data

AM deals automatically with individuals with missing trait data. These individuals are removed from the analysis and a warning message is generated.

Dealing with missing explanatory variable values

AM deals automatically with individuals with missing explanatory variable values. These individuals are removed from the analysis and a warning message is generated

Error Checking

Most errors occur when reading in the data. However, as an extra precaution, if quiet=FALSE, then additional output is printed during the running of AM. If AM is failing, then this output can be useful for diagnosing the problem.

Value

A list with the following components:

trait:

column name of the trait being used by 'AM'.

fformula:

the fixed effects part of the linear mixed model.

indxNA_pheno:

a vector containing the row indexes of the phenotyic data that have been removed from the analysis.

indxNA_geno:

a vector containing the row indexes of those genotypes that have been removed from the analysis due to missing data.

Mrk:

a vector with the names of the snp in strongest and significant association with the trait.If no loci are found to be significant, then this component is NA.

Chr:

the chromosomes on which the identified snp lie.

Pos:

the map positions for the identified snp.

Indx:

the column indexes in the marker file of the identified snp.

ncpu:

number of cpu used for the calculations.

availmemGb:

amount of RAM in gigabytes that has been set by the user.

quiet:

boolean value of the parameter.

extBIC:

numeric vector with the extended BIC values for the loci found to be in significant association with the trait.

lambda

the numeric value of the parameter.

See Also

FPR4AM , ReadMarker, ReadPheno, ReadZmat, and ReadMap

Examples

  ## Not run:  
  # Since the following code takes longer than 5 seconds to run, it has been tagged as dontrun. 
  # However, the code can be run by the user. 
  #

  #-------------------------
  #  Example  
  #------------------------

  # read the map 
  #~~~~~~~~~~~~~~
  
  # File is a plain space separated text file with the first row 
  # the column headings
  complete.name <- system.file('extdata', 'map.txt', 
                                   package='Eagle')
  map_obj <- ReadMap(filename=complete.name) 

  # read marker data
  #~~~~~~~~~~~~~~~~~~~~
  # Reading in a PLINK ped file 
  # and setting the available memory on the machine for the reading of the data to 8  gigabytes
  complete.name <- system.file('extdata', 'geno.ped', 
                                     package='Eagle')
  geno_obj <- ReadMarker(filename=complete.name,  type='PLINK', availmemGb=8) 
 
  # read phenotype data
  #~~~~~~~~~~~~~~~~~~~~~~~

  # Read in a plain text file with data on a single trait and two covariates
  # The first row of the text file contains the column names y, cov1, and cov2. 
  complete.name <- system.file('extdata', 'pheno.txt', package='Eagle')
  
  pheno_obj <- ReadPheno(filename=complete.name)
           

 # Performing multiple-locus genome-wide association mapping with a model 
 #    with fixed effects cov1 and cov2 and an intercept. The intercept 
 #    need not be specified as it is assumed. 
 #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
  res <- AM(trait = 'y',
                           fformula=c('cov1+cov2'),
                           map = map_obj,
                           pheno = pheno_obj,
                           geno = geno_obj )

## End(Not run)


[Package Eagle version 2.5 Index]