knn_domain_score {viraldomain}    R Documentation

Calculate the K-Nearest Neighbor model domain applicability score

Description

Fits a K-Nearest Neighbor (KNN) model to the training data and computes a domain applicability score, based on PCA distances, for each observation in the test data.

Usage

knn_domain_score(
  featured,
  train_data,
  knn_hyperparameters,
  test_data,
  threshold_value
)

Arguments

featured

The name of the response variable to predict, given as a character string (for example "cd_2022").

train_data

The training dataset containing predictor variables and the response variable.

knn_hyperparameters

A list of hyperparameters for the KNN model, including:

  • neighbors: The number of neighbors to consider.

  • weight_func: The weight function to use.

  • dist_power: The distance power parameter.
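The three names above match the arguments of parsnip::nearest_neighbor(). Assuming the list is passed through to a kknn-style engine (an assumption, not confirmed by this page), dist_power would be the order p of the Minkowski distance. The sketch below builds such a list and shows how p changes the distance between two points; minkowski() is a hypothetical helper and not part of viraldomain.

```r
# Hypothetical helper (not part of viraldomain): Minkowski distance of order p
minkowski <- function(a, b, p) sum(abs(a - b)^p)^(1 / p)

knn_hyperparameters <- list(
  neighbors = 5,            # number of neighbors to consider
  weight_func = "optimal",  # kernel used to weight the neighbors
  dist_power = 2            # assumed to be the Minkowski order p
)

a <- c(0, 0)
b <- c(3, 4)

# p = 2 gives the Euclidean distance, p = 1 the Manhattan distance
d_euclid <- minkowski(a, b, p = knn_hyperparameters$dist_power)
d_manhattan <- minkowski(a, b, p = 1)
```

Values of p below 1, such as the 0.3304783 used in the Examples section, no longer define a true metric, but they can still be used to rank neighbors.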

test_data

The test dataset for making predictions.

threshold_value

The threshold value used for computing domain scores.

Value

A data frame containing the computed domain scores for each observation in the test dataset.
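The description mentions PCA distances and a threshold without spelling out the computation. One common way to build such a score, shown here as a base R sketch and not as the package's actual implementation, is to fit a PCA on the training predictors, keep enough components to reach a cumulative variance threshold, and express each test observation's distance from the training center as a percentile of the training distances. The function name pca_distance_score(), its output column names, and the reading of the threshold as a variance proportion are all assumptions.

```r
# Hypothetical helper: illustrates a PCA-distance applicability score.
# It is NOT the implementation used by knn_domain_score().
pca_distance_score <- function(train, test, threshold = 0.99) {
  # Fit PCA on centered and scaled training predictors
  pca <- prcomp(train, center = TRUE, scale. = TRUE)

  # Keep the smallest number of components whose cumulative variance
  # proportion reaches the threshold (falls back to all components)
  var_prop <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
  n_comp <- min(which(var_prop >= threshold), length(var_prop))

  # Project both datasets onto the retained components
  train_pc <- predict(pca, train)[, seq_len(n_comp), drop = FALSE]
  test_pc <- predict(pca, test)[, seq_len(n_comp), drop = FALSE]

  # Euclidean distance from the training center (the origin in PC space)
  train_dist <- sqrt(rowSums(train_pc^2))
  test_dist <- sqrt(rowSums(test_pc^2))

  # Percentile of each test distance within the training distances
  data.frame(
    distance = test_dist,
    distance_pctl = ecdf(train_dist)(test_dist) * 100
  )
}

set.seed(1)
train <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
# One point near the training center and one far outside it
test <- data.frame(x1 = c(0, 10), x2 = c(0, 10))
scores <- pca_distance_score(train, test)
```

Under this reading, a test observation with a high percentile lies farther from the training data than most training points, so predictions for it deserve less trust.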

Examples

set.seed(123)
library(dplyr)
library(viraldomain)
featured <- "cd_2022"
# Add jitter to the original features
train_data <- viral |>
  transmute(cd_2022 = jitter(cd_2022), vl_2022 = jitter(vl_2022))
test_data <- sero |>
  transmute(cd_2022 = jitter(cd_2022), vl_2022 = jitter(vl_2022))
knn_hyperparameters <- list(neighbors = 5, weight_func = "optimal", dist_power = 0.3304783)
threshold_value <- 0.99
# Compute domain applicability scores for the test observations
knn_domain_score(featured, train_data, knn_hyperparameters, test_data, threshold_value)

[Package viraldomain version 0.0.3 Index]