sampling {tosca} | R Documentation
Sample Texts
Description
Sample texts from different subsets to minimize the variance of the recall estimator.
Usage
sampling(id, corporaID, label, m, randomize = FALSE, exact = FALSE)
Arguments
id |
Character: IDs of all texts in the corpus. |
corporaID |
List of Character: Each list element is a character vector that contains the IDs belonging to one subcorpus. Each ID has to be in |
label |
Named Logical: Labeling results for already labeled texts. May be empty if no labeled data exist. The algorithm sets |
m |
Integer: Number of new samples. |
randomize |
Logical: If |
exact |
Logical: If |
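Before calling sampling(), it can help to see how the existing labels are spread over the subcorpora. The following is a minimal base R sketch, not part of tosca: the helper name label_summary and the toy data are assumptions made for illustration, and it only presumes that label is a logical vector named by text ID, as in the Examples.

```r
# Hypothetical helper (not part of tosca): for each subcorpus, report its size,
# the number of already labeled texts, and the share of TRUE labels among them.
label_summary <- function(corporaID, label) {
  t(sapply(corporaID, function(ids) {
    lab <- label[names(label) %in% ids]  # labels whose ID is in this subcorpus
    c(size = length(ids),
      labeled = length(lab),
      share_true = if (length(lab) > 0) mean(lab) else NA_real_)
  }))
}

# Toy data (assumed for illustration): 10 texts, 3 overlapping subcorpora
id <- paste0("ID", 1:10)
corporaID <- list(id[1:4], id[3:8], id[9:10])
label <- c(ID1 = TRUE, ID3 = FALSE, ID4 = TRUE, ID7 = TRUE)

res <- label_summary(corporaID, label)
print(res)
```

A subcorpus without any labeled text gets NA as its share here; how sampling() itself treats that case is not covered by this sketch.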
Value
Character vector of IDs that should be labeled next.
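One way to inspect the returned vector is to count how many of the proposed IDs fall into each subcorpus. This is a small base R sketch under stated assumptions: count_by_subcorpus is a hypothetical helper, and next_ids is a hand-written stand-in for the output of sampling(), so the snippet runs without tosca.

```r
# Hypothetical helper (not part of tosca): number of proposed IDs per subcorpus.
# IDs that belong to several subcorpora are counted once in each of them.
count_by_subcorpus <- function(next_ids, corporaID) {
  vapply(corporaID, function(ids) sum(next_ids %in% ids), integer(1))
}

# Toy data (assumed for illustration)
corporaID <- list(c("ID1", "ID2", "ID3"), c("ID3", "ID4"), c("ID5"))
next_ids <- c("ID2", "ID3", "ID4")  # stand-in for the result of sampling()

counts <- count_by_subcorpus(next_ids, corporaID)
print(counts)
```

Because subcorpora may overlap, the counts can add up to more than the number of returned IDs.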
Examples
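library(tosca)
# 1000 text IDs and three (possibly overlapping) subcorpora drawn from them
id <- paste0("ID", 1:1000)
corporaID <- list(sample(id, 300), sample(id, 100), sample(id, 700))
# 150 random TRUE/FALSE labels, named by the IDs of the labeled texts
label <- sample(as.logical(0:1), 150, replace = TRUE)
names(label) <- c(sample(id, 100), sample(corporaID[[2]], 50))
# request 100 new IDs to label
m <- 100
sampling(id, corporaID, label, m)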
[Package tosca version 0.3-2 Index]