check_IDs {canprot}    R Documentation

Check UniProt IDs


Find the first ID for each protein that matches a known UniProt ID.


  check_IDs(dat, IDcol, aa_file = NULL, updates_file = NULL)



dat             data frame, protein expression data

IDcol           character, name of column that has the UniProt IDs

aa_file         character, name of file with additional amino acid compositions

updates_file    character, name of file with old to new ID mappings


check_IDs is used to check for known UniProt IDs and to update obsolete IDs. The source IDs should be provided in the IDcol column of dat; multiple IDs for one protein can be separated by a semicolon.

The function keeps the first “known” ID for each protein, which must be present in one of these groups:


The function returns dat with possibly changed values in the column designated by IDcol: old IDs are replaced with new ones, only the first known ID for each protein is kept, and proteins with no known ID are assigned NA.
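
The selection rule described above can be sketched in base R. This is only an illustration, not the package's implementation: known_ids, id_updates, and first_known_id are hypothetical names, the ID "OLD001" and its replacement "NEW001" are made up, and the order of operations (update old IDs, then keep the first known one) is assumed from the description of the return value.

```r
# Hypothetical stand-ins for the lookup tables (not data from canprot)
known_ids <- c("P61247", "P46777", "P60174", "NEW001")
id_updates <- c(OLD001 = "NEW001")  # names are old IDs, values are new IDs

# Assumed helper: return the first known ID in each semicolon-separated entry
first_known_id <- function(ids, known, updates = character(0)) {
  vapply(strsplit(ids, ";", fixed = TRUE), function(candidates) {
    candidates <- trimws(candidates)
    # Replace obsolete IDs with their current equivalents
    is_old <- candidates %in% names(updates)
    candidates[is_old] <- unname(updates[candidates[is_old]])
    # Keep the first candidate that is known, or NA if none is
    hit <- candidates[candidates %in% known]
    if (length(hit) > 0) hit[1] else NA_character_
  }, character(1), USE.NAMES = FALSE)
}

dat <- data.frame(
  ID = c("P61247;PXXXXX", "PYYYYY;P46777;P60174", "PZZZZZ", "OLD001"),
  stringsAsFactors = FALSE
)
dat$ID <- first_known_id(dat$ID, known_ids, id_updates)
```

In this sketch the third entry has no known ID and becomes NA, while the fourth is mapped to its updated ID before the lookup.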

See Also

This function is used by the pdat_ functions, where it is called before cleanup.


Examples

# Make up some data for this example
ID <- c("P61247;PXXXXX", "PYYYYY;P46777;P60174", "PZZZZZ")
dat <- data.frame(ID = ID, stringsAsFactors = FALSE)
# Get the first known ID for each protein; the third one is NA
check_IDs(dat, "ID")

# Update an old ID
dat <- data.frame(Entry = "P50224", stringsAsFactors = FALSE)
check_IDs(dat, "Entry")

