nearest_datasets {pmlbr}

R Documentation

Select nearest datasets given input 'x'.

Description

If 'x' is a data.frame, computes its dataset characteristics. If 'x' is a character string naming a PMLB dataset, uses the precomputed dataset statistics/characteristics in 'summary_stats'.
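
The lookup described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the reference table, the dataset names (`toy_a` to `toy_d`), the helper `nearest_sketch`, and the choice of Euclidean distance on standardized columns are all assumptions made for illustration.

```r
# Hypothetical summary-statistics table; names and values are illustrative
# only and are not taken from PMLB's all_summary_stats.tsv.
stats <- data.frame(
  dataset     = c("toy_a", "toy_b", "toy_c", "toy_d"),
  n_instances = c(100, 120, 5000, 90),
  n_features  = c(4, 5, 50, 4),
  stringsAsFactors = FALSE
)

# Assumed similarity measure: standardize the chosen dimensions, then rank
# the reference datasets by Euclidean distance to the query.
nearest_sketch <- function(query, stats, dimensions, n_neighbors = 2) {
  ref <- as.matrix(stats[, dimensions, drop = FALSE])
  centers <- colMeans(ref)
  scales <- apply(ref, 2, sd)
  ref_z <- scale(ref, center = centers, scale = scales)
  query_z <- (unlist(query[dimensions]) - centers) / scales
  dists <- sqrt(rowSums(sweep(ref_z, 2, query_z)^2))
  stats$dataset[order(dists)][seq_len(min(n_neighbors, nrow(stats)))]
}

# Characteristics of a user-supplied data.frame: row count and the number
# of non-target columns.
df <- as.data.frame(matrix(0, nrow = 92, ncol = 4))
df$target <- rep(c(0, 1), length.out = 92)
query <- c(n_instances = nrow(df), n_features = ncol(df) - 1)

result <- nearest_sketch(query, stats, c("n_instances", "n_features"))
print(result)  # "toy_d" "toy_a"
```

Standardizing first keeps a wide-ranging column such as `n_instances` from dominating the distance; whether pmlbr scales its dimensions the same way is not stated here.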

Usage

nearest_datasets(x, ...)

## Default S3 method:
nearest_datasets(x, ...)

## S3 method for class 'character'
nearest_datasets(
  x,
  n_neighbors = 5,
  dimensions = c("n_instances", "n_features"),
  target_name = "target",
  ...
)

## S3 method for class 'data.frame'
nearest_datasets(
  x,
  y = NULL,
  n_neighbors = 5,
  dimensions = c("n_instances", "n_features"),
  task = c("classification", "regression"),
  target_name = "target",
  ...
)

Arguments

`x`	Character string of dataset name from PMLB, or data.frame of n_samples x n_features (or n_features + 1 with a target column).
`...`	Further arguments passed to each method.
`n_neighbors`	Integer. The number of dataset names to return as neighbors.
`dimensions`	Character vector specifying dataset characteristics to include in similarity calculation. Dimensions must correspond to numeric columns of [all_summary_stats.tsv](https://github.com/EpistasisLab/pmlb/blob/master/pmlb/all_summary_stats.tsv). If 'all', uses all numeric columns. Per the Usage section, the default is c("n_instances", "n_features").
`target_name`	Character string specifying column of target/dependent variable.
`y`	Vector of target column. Required when 'x' does not contain the target column.
`task`	Character string specifying classification or regression for summary stat generation.

Value

Character vector of the names of the datasets most similar to 'x', most similar dataset first.

Examples

nearest_datasets('penguins')
nearest_datasets(fetch_data('penguins'))
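
The optional arguments from the Usage section can be exercised as in the sketch below. It assumes that pmlbr is installed, that network access to PMLB is available, and that `fetch_data('penguins')` returns a data.frame whose target column is named "target" (suggested by the `target_name` default, but not confirmed on this page).

```r
library(pmlbr)

# Ask for three neighbors instead of the default five.
nearest_datasets("penguins", n_neighbors = 3)

# Pass the features and the target separately through `y`.
# Assumption: the fetched data.frame has a column named "target".
penguins <- fetch_data("penguins")
features <- penguins[, setdiff(names(penguins), "target"), drop = FALSE]
nearest_datasets(features, y = penguins$target, task = "classification")
```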

[Package pmlbr version 0.2.1 Index]