nearest_datasets {pmlbr} | R Documentation |
Select nearest datasets given input 'x'.
Description
If 'x' is a data.frame, computes its dataset characteristics and compares them with those of the PMLB datasets. If 'x' is a character string naming a PMLB dataset, uses the precomputed dataset statistics/characteristics in 'summary_stats'.
Usage
nearest_datasets(x, ...)
## Default S3 method:
nearest_datasets(x, ...)
## S3 method for class 'character'
nearest_datasets(
x,
n_neighbors = 5,
dimensions = c("n_instances", "n_features"),
target_name = "target",
...
)
## S3 method for class 'data.frame'
nearest_datasets(
x,
y = NULL,
n_neighbors = 5,
dimensions = c("n_instances", "n_features"),
task = c("classification", "regression"),
target_name = "target",
...
)
Arguments
x |
Character string giving the name of a PMLB dataset, or a data.frame of n_samples x n_features (or n_features + 1 when it includes a target column). |
... |
Further arguments passed to each method. |
n_neighbors |
Integer. The number of dataset names to return as neighbors. |
dimensions |
Character vector specifying the dataset characteristics to include in the similarity calculation. Dimensions must correspond to numeric columns of [all_summary_stats.tsv](https://github.com/EpistasisLab/pmlb/blob/master/pmlb/all_summary_stats.tsv). If 'all', uses all numeric columns. Defaults to c("n_instances", "n_features"), as shown in Usage. |
target_name |
Character string specifying the name of the target (dependent variable) column. |
y |
Vector of target column. Required when 'x' does not contain the target column. |
task |
Character string, either "classification" or "regression", specifying the task type used when generating summary statistics. |
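The selection described above can be pictured as a nearest-neighbour search over a table of dataset characteristics. The following is only a minimal sketch of that idea in base R, not the code pmlbr runs: the table `toy_stats`, its values, and the helper `toy_nearest` are hypothetical, and both the standardisation step and the Euclidean distance are assumptions made for illustration.

```r
# Toy table standing in for PMLB summary statistics (values are made up).
toy_stats <- data.frame(
  dataset = c("a", "b", "c", "d"),
  n_instances = c(100, 150, 5000, 120),
  n_features = c(4, 5, 50, 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rank datasets by Euclidean distance to a query
# after standardising each selected dimension (an assumed choice).
toy_nearest <- function(stats, query, n_neighbors = 2,
                        dimensions = c("n_instances", "n_features")) {
  m <- as.matrix(stats[, dimensions, drop = FALSE])
  centers <- colMeans(m)
  scales <- apply(m, 2, sd)
  scaled <- scale(m, center = centers, scale = scales)
  # Put the query on the same scale as the table.
  q <- (unlist(query[dimensions]) - centers) / scales
  # Distance from every row of the table to the query.
  d <- sqrt(rowSums(sweep(scaled, 2, q)^2))
  stats$dataset[order(d)][seq_len(min(n_neighbors, nrow(stats)))]
}

# Characteristics of a hypothetical new dataset.
result <- toy_nearest(toy_stats, c(n_instances = 105, n_features = 4))
print(result)
```

Restricting `dimensions` changes which characteristics drive the ranking, which is the role the 'dimensions' argument plays in the documented function.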
Value
Character vector of the names of the datasets most similar to 'x', with the most similar dataset first.
Examples
nearest_datasets('penguins')
nearest_datasets(fetch_data('penguins'))
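The calls below sketch how the other documented arguments might be combined. They use only the arguments listed in Usage. The code assumes that pmlbr is installed, that `fetch_data()` can reach the PMLB repository, and that the fetched data frame stores its target in a column named "target" (the documented default of 'target_name'). No output is shown because the results depend on the installed summary statistics.

```r
# Run only when the package is available; fetch_data() may need network access.
if (requireNamespace("pmlbr", quietly = TRUE)) {
  library(pmlbr)

  # Character method: ask for three neighbours instead of the default five.
  nearest_datasets("penguins", n_neighbors = 3)

  # data.frame method: restrict the comparison to a single characteristic.
  penguins <- fetch_data("penguins")
  nearest_datasets(penguins, dimensions = "n_instances")

  # Supply the target separately when 'x' holds only the feature columns
  # (assumes the target column is named "target").
  features <- penguins[, setdiff(names(penguins), "target"), drop = FALSE]
  nearest_datasets(features, y = penguins$target)
}
```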