data_sampling {GaMaBioMD}    R Documentation

Samples data from each SampleID group if a proportion is specified; otherwise returns the final data unchanged.

Description

This function takes a data frame with 'SampleID' and 'SequenceID' columns and either returns it unchanged (if sample_proportion is NULL) or samples the specified proportion of rows from each SampleID group.
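The behaviour described above can be sketched in base R as follows. This is a minimal sketch, not the GaMaBioMD implementation: the function name sample_by_group() is hypothetical, and both the input check and the rule that each group contributes floor(sample_proportion * group size) rows are assumptions, because this page does not say how invalid or fractional values are handled.

```r
# Hypothetical stand-in for data_sampling(); NOT the GaMaBioMD implementation.
# Assumption: each group's sample size is floor(prop * group size).
sample_by_group <- function(df, prop = NULL) {
  # NULL means "no sampling": hand back the input unchanged
  if (is.null(prop)) {
    return(df)
  }
  # Assumed validation: a single proportion in [0, 1]
  stopifnot(is.numeric(prop), length(prop) == 1, prop >= 0, prop <= 1)

  # Split row indices by SampleID so each group is sampled independently
  idx_by_group <- split(seq_len(nrow(df)), df$SampleID)

  picked <- unlist(lapply(idx_by_group, function(idx) {
    k <- floor(prop * length(idx))
    # Index through sample.int() to avoid sample()'s length-1 pitfall
    idx[sample.int(length(idx), k)]
  }), use.names = FALSE)

  # Keep the sampled rows in their original order
  df[sort(as.integer(picked)), , drop = FALSE]
}
```

For example, with a toy data frame whose SampleID groups have 4 and 6 rows, sample_by_group(toy, 0.5) returns 2 and 3 rows from those groups, and sample_by_group(toy) returns toy unchanged.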

Usage

data_sampling(final_data, sample_proportion = NULL)

Arguments

final_data

A data frame with 'SampleID' and 'SequenceID' columns.

sample_proportion

Proportion of rows (between 0 and 1) to sample from each SampleID group. If NULL (the default), the original data frame is returned.
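Because a proportion multiplied by a group size is rarely a whole number, the rounding rule matters for small groups. The sketch below assumes, hypothetically, that every accession listed in the Examples reaches final_data, which gives group sizes of 5, 6, 6 and 10, and compares three rounding rules for sample_proportion = 0.1. Which rule data_sampling() actually applies is not stated on this page.

```r
# Group sizes assumed from the accession ranges in the Examples
# (SRU1: 5, STU2: 6, WPU13: 6, INU20: 4 + 6); hypothetical, not package output
group_sizes <- c(SRU1 = 5, STU2 = 6, WPU13 = 6, INU20 = 10)
sample_proportion <- 0.1

# Fractional number of rows each group would contribute
exact <- sample_proportion * group_sizes

# Three candidate rules for turning that into a whole row count;
# R's round() sends an exact half to the even digit, so round(0.5) is 0
row_counts <- data.frame(
  exact   = exact,
  floor   = floor(exact),
  round   = round(exact),
  ceiling = ceiling(exact)
)
print(row_counts)
```

Under rounding down, SRU1, STU2 and WPU13 would contribute no rows at all, so a very small proportion can drop whole groups from the sample.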

Value

A data frame containing either final_data unchanged or the rows sampled from each SampleID group according to sample_proportion.

Examples


accession_ranges <- list(
  SRU1 = "AJ240966 to AJ240970",
  STU2 = "AB015240 to AB015245",
  WPU13 = "L11934 to L11939",
  INU20 = c("AF277467 to AF277470", "AF333080 to AF333085")
)

# Use the function to expand accession ranges
sam_acc <- expand_accession_ranges(accession_ranges)
print(sam_acc)

# Retrieve sequence information for the expanded accessions
accessions_to_query <- sam_acc$accession
seq_info <- get_sequence_information(accessions_to_query, remove_dot_1 = TRUE)
print(seq_info)
# Combine the accession table and sequence information in preparation for alignment
result <- preprocess_for_alignment(sam_acc, seq_info)

# Access the resulting data frames
merged_data <- result$merged_data
main_data <- result$main_data
final_data <- result$final_data # input for data_sampling()

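# With sample_proportion = NULL (the default), final_data is returned unchanged
all_data <- data_sampling(final_data)

# Sample 10% of the rows from each SampleID group; set.seed() makes the draw
# reproducible, assuming data_sampling() uses R's random number generator
set.seed(123)
sampled_data <- data_sampling(final_data, sample_proportion = 0.1)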


[Package GaMaBioMD version 0.2.0 Index]