data_sampling {GaMaBioMD} | R Documentation
Samples a proportion of rows from each SampleID group, or returns the input data unchanged if no proportion is given.
Description
This function takes a data frame with 'SampleID' and 'SequenceID' columns. If sample_proportion is NULL, it returns the data frame unchanged; otherwise it samples the specified proportion of rows from each SampleID group.
Usage
data_sampling(final_data, sample_proportion = NULL)
Arguments
final_data | A data frame with 'SampleID' and 'SequenceID' columns.
sample_proportion | Proportion of rows to sample from each SampleID group (for example, 0.1 for 10%). If NULL (the default), the original data frame is returned.
Value
A data frame: the original data if sample_proportion is NULL, otherwise the rows sampled from each SampleID group at the specified proportion.
Examples
accession_ranges <- list(
  SRU1 = "AJ240966 to AJ240970",
  STU2 = "AB015240 to AB015245",
  WPU13 = "L11934 to L11939",
  INU20 = c("AF277467 to AF277470", "AF333080 to AF333085")
)
# Use the function to expand accession ranges
sam_acc <- expand_accession_ranges(accession_ranges)
print(sam_acc)
# Retrieve sequence information for the expanded accessions
accessions_to_query <- sam_acc$accession
seq_info <- get_sequence_information(accessions_to_query, remove_dot_1 = TRUE)
print(seq_info)
# Combine the sample and sequence information in preparation for alignment
result <- preprocess_for_alignment(sam_acc, seq_info)
# Access the resulting data frames
merged_data <- result$merged_data
main_data <- result$main_data
final_data <- result$final_data # use final_data
# If you want to sample 10% from each SampleID group:
sampled_data <- data_sampling(final_data, sample_proportion = 0.1)
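
The behaviour described above can also be illustrated without the other GaMaBioMD functions. The following is a minimal base R sketch, not the package's actual implementation: the helper name sample_by_group_sketch and the toy data are hypothetical, and the rule of rounding up so that every group keeps at least one row is an assumption that data_sampling may not share.

```r
# Hypothetical stand-in for data_sampling(); NOT the GaMaBioMD source code.
sample_by_group_sketch <- function(final_data, sample_proportion = NULL) {
  # NULL means "no sampling": return the input unchanged.
  if (is.null(sample_proportion)) {
    return(final_data)
  }
  stopifnot(is.numeric(sample_proportion), length(sample_proportion) == 1,
            sample_proportion > 0, sample_proportion <= 1)

  # Split the row indices by SampleID and draw from each group separately.
  rows_by_group <- split(seq_len(nrow(final_data)), final_data$SampleID)
  keep <- unlist(lapply(rows_by_group, function(idx) {
    # Assumption: round up, so every group keeps at least one row.
    n_keep <- max(1, ceiling(length(idx) * sample_proportion))
    # sample.int() avoids the surprise sample() has with length-one vectors.
    idx[sample.int(length(idx), n_keep)]
  }), use.names = FALSE)

  # Return the kept rows in their original order.
  final_data[sort(keep), , drop = FALSE]
}

# Toy data with made-up identifiers: group A has 10 rows, group B has 20.
toy <- data.frame(
  SampleID   = rep(c("A", "B"), times = c(10, 20)),
  SequenceID = sprintf("seq%02d", 1:30)
)

set.seed(1)  # make the random draw reproducible
sampled <- sample_by_group_sketch(toy, sample_proportion = 0.1)
table(sampled$SampleID)  # number of rows kept per group
```

Sampling within each group, rather than from the whole data frame at once, keeps every SampleID represented in the result even when group sizes differ widely.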
[Package GaMaBioMD version 0.2.0 Index]