lod_QC {lodGWAS}    R Documentation

Checking the coding of phenotype and outsideLOD values

Description

It is important that the phenotype and outsideLOD values in the phenotype file are coded correctly. The function lod_QC checks the quality of the phenotype file, and provides a number of descriptive statistics about the phenotype, as well as warnings about suspected problems.

Usage

lod_QC(phenofile,
      pheno_name,
      filedirectory = getwd(),
      outputfile = "lodQC",
      lower_limit = NA, upper_limit = NA,
      stop_if_error = FALSE,
      reprint_warnings = FALSE)

Arguments

phenofile

either a data frame containing the phenotype (and covariate) values, or the filename (including the extension) of a data file containing the same. For details on the required format of the phenotype file, see the section on File Format in the description of lod_GWAS.

pheno_name

the name of the column in phenofile containing the phenotype values.

filedirectory

the directory that contains the phenotype file and where the output file will be saved. Note that R uses a forward slash (/) as the path separator where Windows uses a backslash (\). The default is the current R working directory.

outputfile

the name for the output file.

lower_limit, upper_limit

two numeric values specifying the lower and upper limit of detection (LOD) of the assay. Defaults are NA, in which case lod_QC only checks for gaps between measured and censored values.

stop_if_error

logical; determines whether the function aborts or continues when it encounters errors in the dataset. Default is FALSE.

reprint_warnings

logical; this argument is used by lod_GWAS and isn't relevant for users. Setting it to TRUE will cause certain errors to be reported via warning (in addition to the console and the log file). Default is FALSE.
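As a small illustration of the path note under filedirectory, the sketch below writes the same Windows directory in two ways and converts one form into the other. The directory name is hypothetical and the path is never accessed.

```r
# Hypothetical Windows directory, written with forward slashes.
dir_forward <- "C:/projects/gwas"

# The same directory with backslashes; each one must be escaped in an R string.
dir_escaped <- "C:\\projects\\gwas"

# Convert backslashes to forward slashes (fixed = TRUE treats "\\" literally).
dir_converted <- gsub("\\", "/", dir_escaped, fixed = TRUE)

identical(dir_converted, dir_forward)
```

Either form could be passed as filedirectory; the forward-slash form is usually easier to read.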

Details

The user can specify values for the lower and upper LOD. The function will then compare the values in the column outsideLOD with the phenotype values, conditional on the user-specified LOD. For details on the required format of the phenotype file, see the section on File Format in the description of lod_GWAS.
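The kind of comparison described above can be sketched in a few lines of R. This is only an illustration, not the code lod_QC itself uses: the helper check_lod_coding is hypothetical, and it assumes that outsideLOD is coded as 0 (above the upper LOD), 1 (measured) and 2 (below the lower LOD), which is consistent with the example data in this page but should be confirmed in the File Format section of lod_GWAS.

```r
# Hypothetical helper, not part of lodGWAS.
# Assumed coding: 0 = above upper LOD, 1 = measured, 2 = below lower LOD.
# Assumes there are no missing values in pheno or outside.
check_lod_coding <- function(pheno, outside, lower_limit = NA, upper_limit = NA) {
  problem <- rep(FALSE, length(pheno))
  if (!is.na(lower_limit)) {
    problem <- problem |
      (outside == 1 & pheno < lower_limit) |  # measured, yet below the lower LOD
      (outside == 2 & pheno > lower_limit)    # censored low, yet above the lower LOD
  }
  if (!is.na(upper_limit)) {
    problem <- problem |
      (outside == 1 & pheno > upper_limit) |  # measured, yet above the upper LOD
      (outside == 0 & pheno < upper_limit)    # censored high, yet below the upper LOD
  }
  which(problem)  # row indices of suspect entries
}

# Toy data: rows 2 and 4 are coded as measured but lie outside the limits.
toy_pheno   <- c(0.3, 0.05, 2, 3)
toy_outside <- c(1, 1, 0, 1)
flagged <- check_lod_coding(toy_pheno, toy_outside,
                            lower_limit = 0.1, upper_limit = 2)
print(flagged)
```

Whether a measured value exactly equal to a limit should count as a problem is a design choice; this sketch does not flag it.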

Also, lod_QC will check whether there is a gap between the smallest/largest values within the LOD (i.e. truly measured values, where outsideLOD is 1) and the censored values. If the censored values are far smaller/larger than the smallest/largest measured value, lod_QC will print a warning. An example of such an erroneous coding of censored values is when the phenotype values below the lower LOD have been set to -9, while the measured values are >0.
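To make the -9 example concrete, the sketch below computes the gap between the censored and the measured values for a toy phenotype. The rule used to call a gap suspicious (gap larger than the spread of the measured values) is an assumption made for illustration only; the criterion lod_QC actually applies is not stated here. The coding 2 = below the lower LOD is likewise assumed.

```r
# Toy phenotype: values below the lower LOD were (wrongly) set to -9.
pheno   <- c(-9, -9, 0.4, 0.7, 1.2, 2.5)
outside <- c(2, 2, 1, 1, 1, 1)   # assumed coding: 1 = measured, 2 = below lower LOD

measured     <- pheno[outside == 1]
censored_low <- pheno[outside == 2]

# Distance between the smallest measured and the largest left-censored value.
gap_low <- min(measured) - max(censored_low)

# Spread of the measured values, used here as a yardstick for the gap.
measured_range <- diff(range(measured))

# Illustrative rule only (not the lod_QC criterion).
suspicious <- gap_low > measured_range
print(c(gap = gap_low, range = measured_range))
print(suspicious)
```

A gap several times wider than the measured range, as here, suggests that a placeholder rather than the LOD itself was stored for the censored observations.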

Value

The function lod_QC returns an invisible NULL. The real output is a log file containing information on the distribution of the phenotype, and any problems it encountered. These warnings are also displayed in the R console.

Note

The function lod_QC only checks and reports. It does not correct the phenotype or outsideLOD values.

Also, the function assumes that there is a single lower and/or upper LOD. If multiple LODs are used in the file, the function may produce warnings because censored or measured values do not match the specified limits. In that case it is up to the user to ensure the quality of the phenotype file.

See Also

lod_GWAS for the complete GWAS function.

Examples

phenos <- data.frame(FID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                     IID = c("female1", "male2", "male3", "female4", "female5",
                             "male6", "female7", "male8", "male9", "male10"),
                     sex = c(0, 1, 1, 0, 0, 1, 0, 1, 1, 1),
                     outcome1 = c(0.3, 0.5, 0.9, 0.7, 2, 2, 0.1, 1.1, 2, 0.7),
                     outsideLOD = as.integer(c(1, 1, 1, 1, 1, 0, 2, 1, 0, 1)),
                     stringsAsFactors = FALSE)

lod_QC(phenofile = phenos, pheno_name = "outcome1",
       outputfile = "Sample_QC", lower_limit = 0.1, upper_limit = 2)

[Package lodGWAS version 1.0-7 Index]