R: gene annotation counts

gene_annot_counts {GARCOM}

R Documentation

gene annotation counts

Description

The function returns a matrix with allelic counts per gene per individual for annotated SNPs.

Usage

gene_annot_counts(dt_gen, dt_snpgene, keep_indiv=NULL,
                  extract_SNP=NULL, filter_gene=NULL,
                  impute_missing=FALSE, impute_method="mean")

Arguments

`dt_gen`	a data frame of genetic data in PLINK .raw format
`dt_snpgene`	a data frame that contains SNPs and their annotated genes, with SNP and GENE as column names
`keep_indiv`	an option to specify individuals to retain. Mutation counts are provided only for the individuals in the list. Default is all individuals. Provide the list of individuals as a vector.
`extract_SNP`	an option to specify SNPs for which mutation counts are needed. Mutation counts are provided only for the SNPs in the list. By default, all SNPs are used. Provide the list of SNPs as a vector.
`filter_gene`	an option to restrict the output to a list of genes. Mutation counts are provided only for the genes specified in the list. Default is all genes. Provide the list of genes as a vector.
`impute_missing`	an option to impute missing genotypes. Default is FALSE.
`impute_method`	an option to specify the imputation method. The default is imputation to the mean; alternatively, imputation can use the median. The function accepts the method in quotes: "mean" or "median". Data are rounded to two decimal places (e.g. 0.1234 becomes 0.12).
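To illustrate what mean and median imputation with two-decimal rounding can look like, here is a minimal base R sketch. The helper `impute_column` is hypothetical and not part of GARCOM; it assumes imputation is done per SNP column, which this page does not state, and the genotype values are made up.

```r
# Hypothetical helper (not a GARCOM function): fill missing genotypes in one
# SNP column with that column's mean or median, rounded to two decimals.
# Assumption: imputation is applied per SNP column.
impute_column <- function(x, method = c("mean", "median")) {
  method <- match.arg(method)
  fill <- if (method == "mean") {
    mean(x, na.rm = TRUE)
  } else {
    median(x, na.rm = TRUE)
  }
  x[is.na(x)] <- round(fill, 2)
  x
}

geno <- c(0, 0, 1, NA, 2, 0, 0)            # invented allele counts for one SNP
imputed_mean   <- impute_column(geno, "mean")
imputed_median <- impute_column(geno, "median")
```

With skewed genotype counts like these, the two methods can fill in different values, which is worth keeping in mind when choosing `impute_method`.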

Details

Inputs needed are recoded genetic data in PLINK .raw format and SNP-gene annotation data. The first six columns of the genetic data follow the standard PLINK .raw layout, with column names FID, IID, PAT, MAT, SEX and PHENOTYPE, followed by one column per SNP as recoded by the PLINK software. The SNP-gene data has two columns: GENE and SNP. The function returns allelic counts per gene per sample: each row represents a gene, the first column contains the gene name, and each column from the second onward represents an individual.
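The input and output layouts described above can be sketched in base R with toy data. This is an illustration only, not GARCOM's implementation: all values and IDs are invented, the SNP names in the annotation table are assumed to match the genotype column names exactly, and missing genotypes are simply skipped when summing.

```r
# Toy genotype table in PLINK .raw layout (all values invented)
dt_gen <- data.frame(
  FID = c("F1", "F2", "F3"),
  IID = c("I1", "I2", "I3"),
  PAT = 0, MAT = 0, SEX = c(1, 2, 1), PHENOTYPE = c(1, 2, 1),
  SNP1 = c(0, 1, 2),
  SNP2 = c(1, 0, 0),
  SNP3 = c(0, 0, 1),
  stringsAsFactors = FALSE
)

# Toy SNP-gene annotation; SNP2 is annotated in two genes
dt_snpgene <- data.frame(
  GENE = c("GENE1", "GENE1", "GENE2", "GENE2"),
  SNP  = c("SNP1", "SNP2", "SNP2", "SNP3"),
  stringsAsFactors = FALSE
)

# For each gene, sum allele counts over its annotated SNPs per individual
genes <- unique(dt_snpgene$GENE)
counts <- sapply(genes, function(g) {
  snps <- dt_snpgene$SNP[dt_snpgene$GENE == g]
  rowSums(dt_gen[, snps, drop = FALSE], na.rm = TRUE)
})

# Reshape to one row per gene and one column per individual,
# with gene names in the first column
gene_counts <- t(counts)
colnames(gene_counts) <- dt_gen$IID
result <- data.frame(GENE = rownames(gene_counts), gene_counts,
                     row.names = NULL, check.names = FALSE,
                     stringsAsFactors = FALSE)
result
```

Because SNP2 belongs to both genes in this toy annotation, its allele counts contribute to both rows of the result.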

Value

Returns a data.table object with allelic gene counts for each sample. Each row corresponds to a gene. The first column contains gene names, and each column from the second onward corresponds to an individual ID.

Author(s)

Sanjeev Sariya

Examples


#The package provides sample data that are loaded with the package.

data(recodedgen) #PLINK raw formatted data of 10 individuals with 10 SNPs

data(snpgene) #SNPs and their respective annotated genes.
#Here 10 SNPs are annotated in five genes.
#A SNP can be annotated in multiple genes.

gene_annot_counts(recodedgen,snpgene) #run the function

#subset Genes
gene_annot_counts(recodedgen,snpgene,filter_gene=c("GENE1","GENE2"))

#Subset individuals
gene_annot_counts(recodedgen, snpgene,keep_indiv=c("IID_sample1","IID_sample8"))

#subset with genes and samples
gene_annot_counts(recodedgen,snpgene,filter_gene=c("GENE1","GENE2"),
                  keep_indiv=c("IID_sample1","IID_sample8"))

#impute missing using default method. 

gene_annot_counts(recodedgen,snpgene,impute_missing=TRUE)

#Subset on individuals and impute missing values. Default method is mean
gene_annot_counts(recodedgen,snpgene,impute_missing=TRUE,
                  keep_indiv=c("IID_sample1","IID_sample2","IID_sample10"))

#impute using median method
gene_annot_counts(recodedgen,snpgene,impute_missing=TRUE,impute_method="median")


[Package GARCOM version 1.2.2 Index]