check_IDs {canprot}    R Documentation

Check UniProt IDs

Description

Find the first ID for each protein that matches a known UniProt ID.

Usage

  check_IDs(dat, IDcol, aa_file = NULL, updates_file = NULL)

Arguments

dat

data frame, protein expression data

IDcol

character, name of column that has the UniProt IDs

aa_file

character, name of file with additional amino acid compositions

updates_file

character, name of file with old to new ID mappings

Details

check_IDs is used to check for known UniProt IDs and to update obsolete IDs. The source IDs should be provided in the IDcol column of dat; multiple IDs for one protein can be separated by a semicolon.

The function keeps the first “known” ID for each protein, which must be present in one of these groups:

Value

dat is returned with possibly changed values in the column designated by IDcol: old IDs are replaced with new ones, the first known ID for each protein is kept, and proteins with no known IDs are assigned NA.
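
The behavior described above can be approximated in base R. This is only a minimal sketch and not the code used by check_IDs: the helper first_known_id, the known vector, and the updates mapping (including the made-up IDs OLDID1 and NEWID1) are hypothetical stand-ins for the reference data the package consults.

```r
# Hypothetical helper illustrating the documented behavior;
# this is NOT canprot's implementation.
# ids:     character vector, multiple IDs per protein separated by ";"
# known:   character vector of IDs treated as "known" (assumed input)
# updates: named character vector, names = old IDs, values = new IDs (assumed format)
first_known_id <- function(ids, known, updates = character(0)) {
  vapply(strsplit(ids, ";", fixed = TRUE), function(x) {
    # Replace any obsolete IDs with their new counterparts
    is_old <- x %in% names(updates)
    x[is_old] <- updates[x[is_old]]
    # Keep the first ID present in the known set, or NA if none match
    hit <- x[x %in% known]
    if (length(hit) > 0) hit[1] else NA_character_
  }, character(1), USE.NAMES = FALSE)
}

# Made-up reference data for illustration only
known <- c("P61247", "P46777", "P60174", "NEWID1")
updates <- c(OLDID1 = "NEWID1")
ids <- c("P61247;PXXXXX", "PYYYYY;P46777;P60174", "PZZZZZ", "OLDID1")
first_known_id(ids, known, updates)
# "P61247" "P46777" NA "NEWID1"
```

In this sketch the third entry has no ID in the known set and becomes NA, and the fourth is first mapped to its new ID and then recognized as known.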

See Also

This function is used by the pdat_ functions, where it is called before cleanup.

Examples

# Make up some data for this example
ID <- c("P61247;PXXXXX", "PYYYYY;P46777;P60174", "PZZZZZ")
dat <- data.frame(ID = ID, stringsAsFactors = FALSE)
# Get the first known ID for each protein; the third one is NA
check_IDs(dat, "ID")

# Update an old ID
dat <- data.frame(Entry = "P50224", stringsAsFactors = FALSE)
check_IDs(dat, "Entry")

[Package canprot version 1.1.0 Index]