R: Convert columns in a dataframe to genotypes or haplotypes

makeGenotypes {genetics}

R Documentation

Convert columns in a dataframe to genotypes or haplotypes

Description

Convert columns in a dataframe to genotypes or haplotypes.

Usage

makeGenotypes(data, convert, sep = "/", tol = 0.5, ..., method=as.genotype)
makeHaplotypes(data, convert, sep = "/", tol = 0.9, ...)

Arguments

`data`	Dataframe containing columns to be converted
`convert`	Vector or list of pairs specifying which columns contain genotype/haplotype data. See below for details.
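`sep`	Character separating the two alleles within a genotype/haplotype string
`tol`	Minimum fraction of records in a column that must contain `sep` for the column to be detected automatically when `convert` is missing. See below.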
`...`	Optional arguments to as.genotype function
`method`	Function used to perform the conversion.

Details

The functions makeGenotypes and makeHaplotypes allow the conversion of all of the genetic variables in a dataset to genotypes or haplotypes in a single step.

The parameter convert may be missing, a vector of column names, indexes or true/false indicators, or a list of column name or index pairs.

When the argument convert is not provided, the function will look for columns where at least tol*100% of the records contain the separator character sep ('/' by default). These columns will then be assumed to contain both of the genotype/haplotype alleles and will be converted in-place to genotype variables.
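The detection rule described above can be sketched in base R. This is only an illustration of the rule, not the package's implementation: the helper names `frac_with_sep` and `detect_genotype_cols` are hypothetical, and the sketch assumes that only character or factor columns are considered and that NA records count as not containing the separator.

```r
# Hypothetical helper: fraction of records in a column that contain `sep`.
# Assumption: NA records count as not containing the separator.
frac_with_sep <- function(x, sep = "/") {
  x <- as.character(x)
  mean(!is.na(x) & grepl(sep, x, fixed = TRUE))
}

# Hypothetical helper: names of the columns that meet the tolerance.
# Assumption: only character or factor columns are candidates.
detect_genotype_cols <- function(data, sep = "/", tol = 0.5) {
  is_text <- vapply(data, function(x) is.character(x) || is.factor(x), logical(1))
  frac <- vapply(data, frac_with_sep, numeric(1), sep = sep)
  names(data)[is_text & frac >= tol]
}

toy <- data.frame(
  G1      = c("A/T", "T/T", NA, "T/A"),          # 3 of 4 records contain "/"
  N1      = c(0.1, 0.2, 0.3, 0.4),               # numeric, never a candidate
  comment = c("Bad Data/Lab Error", "", "", ""), # 1 of 4 records contains "/"
  stringsAsFactors = FALSE
)

detect_genotype_cols(toy)             # default tol = 0.5
detect_genotype_cols(toy, tol = 0.2)  # a lower tol also picks up free-text columns
```

The second call shows why tol matters: a comment column that occasionally contains the separator is picked up once the threshold is low enough.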

When the argument convert is a vector of column names, indexes or true/false indicators, the corresponding columns will be assumed to contain both of the genotype/haplotype alleles and will be converted in-place to genotype variables.
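The three vector forms select columns in the same way as ordinary data frame indexing. The following base R sketch shows them resolving to the same columns; `resolve_cols` is a hypothetical helper written for this illustration and is not part of the package.

```r
# Hypothetical helper: resolve a `convert` vector to column names.
resolve_cols <- function(data, convert) {
  if (is.character(convert)) return(convert)
  names(data)[convert]  # handles both numeric indexes and true/false indicators
}

toy <- data.frame(Tmt = c("Control", "Trt1"),
                  G1  = c("A/T", "T/T"),
                  N1  = c(0.5, 1.5),
                  G2  = c("134/138", "140/140"))

by_name  <- resolve_cols(toy, c("G1", "G2"))
by_index <- resolve_cols(toy, c(2, 4))
by_flag  <- resolve_cols(toy, c(FALSE, TRUE, FALSE, TRUE))

# All three forms name the same two columns.
identical(by_name, by_index) && identical(by_index, by_flag)
```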

When the argument convert is a list containing column name or index pairs, the two elements of each pair will be assumed to contain the individual alleles of a genotype/haplotype. The first column specified in each pair will be replaced with the new genotype/haplotype variable named name1 + sep + name2. The second column will be removed.
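The column bookkeeping for a single pair can be sketched in base R as follows. The helper `merge_pair` is hypothetical and produces plain character strings rather than genotype/haplotype objects; treating a record with a missing allele as NA is an assumption of this sketch.

```r
# Hypothetical helper: merge one pair of allele columns into a single column.
merge_pair <- function(data, pair, sep = "/") {
  cols <- if (is.character(pair)) pair else names(data)[pair]
  a <- as.character(data[[cols[1]]])
  b <- as.character(data[[cols[2]]])
  # Assumption: a record with either allele missing becomes NA.
  merged <- ifelse(is.na(a) | is.na(b), NA_character_, paste(a, b, sep = sep))
  data[[cols[1]]] <- merged                        # replace the first column
  names(data)[names(data) == cols[1]] <-
    paste(cols[1], cols[2], sep = sep)             # name1 + sep + name2
  data[[cols[2]]] <- NULL                          # remove the second column
  data
}

toy <- data.frame(G1.1 = c("A", "T", NA),
                  G1.2 = c("T", "T", "A"),
                  N1   = c(1.2, 0.4, 2.2),
                  stringsAsFactors = FALSE)

out <- merge_pair(toy, c("G1.1", "G1.2"))
names(out)
out[["G1.1/G1.2"]]
```

Because the second column of each pair is removed, index-based pairs are easiest to reason about when they refer to positions in the original data frame, as in the haplotype example below.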

Note that the method argument may be used to supply a non-standard conversion function, such as as.genotype.allele.count, which converts numeric allele counts (0, 1 or 2 copies of an allele) to two-allele genotypes such as 'A/A', 'A/B' and 'B/B' (or the specified allele names). See the example below.
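To make the idea of an allele-count coding concrete, here is a minimal base R sketch. The helper `count_to_genotype` is hypothetical and returns plain strings. It assumes that the count is the number of copies of the first listed allele; the orientation used by as.genotype.allele.count should be checked against the package itself.

```r
# Hypothetical helper: turn allele counts (0, 1, 2) into genotype strings.
# Assumption: `count` is the number of copies of the first allele in `alleles`.
count_to_genotype <- function(count, alleles = c("A", "B"), sep = "/") {
  stopifnot(all(count %in% c(0, 1, 2, NA)))
  labels <- c(
    paste(alleles[2], alleles[2], sep = sep),  # 0 copies of the first allele
    paste(alleles[1], alleles[2], sep = sep),  # 1 copy
    paste(alleles[1], alleles[1], sep = sep)   # 2 copies
  )
  labels[count + 1]  # NA counts stay NA
}

default_names <- count_to_genotype(c(0, 1, 2, NA))
custom_names  <- count_to_genotype(c(2, 0), alleles = c("C", "T"))
default_names
custom_names
```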

Value

Dataframe containing converted genotype/haplotype variables. All other variables will be unchanged.

Author(s)

Gregory R. Warnes greg@warnes.net

Examples

## Not run: 
# common case
data <- read.csv(file="genotype_data.csv")
data <- makeGenotypes(data)

## End(Not run)

# Create a test data set where there are several genotypes in columns
# of the form "A/T".
test1 <- data.frame(Tmt=sample(c("Control","Trt1","Trt2"),20, replace=TRUE),
                G1=sample(c("A/T","T/T","T/A",NA),20, replace=TRUE),
                N1=rnorm(20),
                I1=sample(1:100,20,replace=TRUE),
                G2=paste(sample(c("134","138","140","142","146"),20,
                                replace=TRUE),
                         sample(c("134","138","140","142","146"),20,
                                replace=TRUE),
                         sep=" / "),
                G3=sample(c("A /T","T /T","T /A"),20, replace=TRUE),
                comment=sample(c("Possible Bad Data/Lab Error",""),20,
                               replace=TRUE)
                )
test1

# now automatically convert genotype columns
geno1 <- makeGenotypes(test1)
geno1

# Create a test data set where there are several haplotypes with alleles
# in adjacent columns.
test2 <- data.frame(Tmt=sample(c("Control","Trt1","Trt2"),20, replace=TRUE),
                    G1.1=sample(c("A","T",NA),20, replace=TRUE),
                    G1.2=sample(c("A","T",NA),20, replace=TRUE),
                    N1=rnorm(20),
                    I1=sample(1:100,20,replace=TRUE),
                    G2.1=sample(c("134","138","140","142","146"),20,
                                replace=TRUE),
                    G2.2=sample(c("134","138","140","142","146"),20,
                                replace=TRUE),
                    G3.1=sample(c("A ","T ","T "),20, replace=TRUE),
                    G3.2=sample(c("A ","T ","T "),20, replace=TRUE),
                    comment=sample(c("Possible Bad Data/Lab Error",""),20,
                                   replace=TRUE)
                   )
test2

# specify the locations of the columns to be paired for haplotypes
makeHaplotypes(test2, convert=list(c("G1.1","G1.2"),6:7,8:9))

# Create a test data set where the data is coded as numeric allele
# counts (0-2).
test3 <- data.frame(Tmt=sample(c("Control","Trt1","Trt2"),20, replace=TRUE),
                    G1=sample(c(0:2,NA),20, replace=TRUE),
                    N1=rnorm(20),
                    I1=sample(1:100,20,replace=TRUE),
                    G2=sample(0:2,20, replace=TRUE),
                    comment=sample(c("Possible Bad Data/Lab Error",""),20,
                                   replace=TRUE)
                   )
test3

# specify the locations of the columns, and a non-standard conversion
makeGenotypes(test3, convert=c('G1','G2'), method=as.genotype.allele.count)

[Package genetics version 1.3.8.1.3 Index]