find_keycol {countries} | R Documentation |
Find a set of columns that uniquely identifies table entries
Description
This function takes a data frame as argument and returns the column names (or indices) of a set of columns that uniquely identifies the table entries (i.e. a table key). It can be used to automate the search for table keys. Since the function was designed for country data, it first searches for columns containing country names and dates/years, and gives these columns priority in the search for keys. Next, the function prioritises left-most columns in the table. For time efficiency, the function does not test all possible combinations of columns; it only tests the most likely ones. The function looks for the most common country data formats (e.g. cross-sectional, time-series, panel data, dyadic) and searches for up to 2 additional key columns beyond the country and time columns.
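To make the idea of a table key concrete, the following is a minimal base R sketch of a key search. It is not the implementation used by find_keycol: the helpers is_unique_key and naive_find_key are hypothetical names introduced here for illustration, and the sketch only tries column combinations from left to right, without the country/time detection described above.

```r
# Hypothetical helper (not part of the countries package):
# TRUE if the columns `cols` jointly identify every row of `x`.
is_unique_key <- function(x, cols) {
  anyDuplicated(x[, cols, drop = FALSE]) == 0
}

# Hypothetical brute-force search: try single columns first, then pairs,
# and so on up to `max_cols`, always in left-to-right column order.
naive_find_key <- function(x, max_cols = 2) {
  for (k in seq_len(min(max_cols, ncol(x)))) {
    for (cols in combn(names(x), k, simplify = FALSE)) {
      if (is_unique_key(x, cols)) return(cols)
    }
  }
  NULL  # no key found within the size limit
}

# Small panel: neither column is unique alone, but the pair is.
panel <- data.frame(
  nation = rep(c("FRA", "ALB", "JOR"), 3),
  year   = rep(c(2000, 2005, 2010), each = 3)
)
key <- naive_find_key(panel)
```

A naive search like this would also accept any column of distinct measurement values as a key, which is one reason for giving priority to country and time columns.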
Usage
find_keycol(
  x,
  return_index = FALSE,
  search_only = NA,
  sample_size = 1000,
  allow_NA = FALSE
)
Arguments
x |
A data frame object |
return_index |
A logical value indicating whether the function should return the indices of the key columns instead of the column names. Default is FALSE |
search_only |
This parameter can be used to restrict the search for table keys to a subset of columns. The default is NA |
sample_size |
Either |
allow_NA |
Logical value indicating whether to allow key columns to have NA values. Default is FALSE |
Value
Returns a vector of column names (or indices) that uniquely identify the entries in the table. If no key is found, the function will return NULL. The output is a named vector indicating whether the identified key columns contain country names ("country"), years and dates ("time"), or other types of information ("other").
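As a rough illustration of how such a named vector might be consumed, the sketch below builds one by hand rather than calling find_keycol. The exact shape (names giving the column type, values giving the column names) is an assumption based on the description above, and the values are made up.

```r
# Assumed shape of the returned vector: names are the column types,
# values are the column names. Built by hand here, not returned by find_keycol.
key <- c(country = "nation", time = "year")

# Guard against the NULL returned when no key is found.
if (!is.null(key)) {
  country_cols <- unname(key[names(key) == "country"])
  time_cols    <- unname(key[names(key) == "time"])
  other_cols   <- unname(key[names(key) == "other"])
}
```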
See Also
find_timecol, find_countrycol, is_keycol
Examples
# Panel data: three countries observed in three years
example <- data.frame(nation = rep(c("FRA", "ALB", "JOR"), 3),
                      year = c(rep(2000, 3), rep(2005, 3), rep(2010, 3)),
                      var = runif(9))
find_keycol(x = example)