errorCorrection {genBaRcode} | R Documentation |
Error Correction
Description
Corrects a list of equally long (barcode) sequences. Using the Hamming distance (by default, see m) as a measure of similarity, highly similar sequences are clustered together, and each cluster is labelled with its member sequence that has the highest read count.
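The clustering idea can be illustrated with a simplified, greedy sketch in base R. It is an assumption for illustration only and not genBaRcode's implementation. The helper names hamming() and cluster_by_hamming() are hypothetical, only the Hamming distance is supported, and read counts are given as a named numeric vector rather than a BCdat object. The most abundant unassigned sequence becomes a cluster label and absorbs every unassigned sequence within maxDist, and the read counts of each cluster are summed.

```r
# Hypothetical helper: Hamming distance = number of mismatching positions
# between two equally long strings.
hamming <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Hypothetical greedy sketch (not genBaRcode's algorithm): the most abundant
# unassigned sequence becomes a cluster label and absorbs all unassigned
# sequences within maxDist.
cluster_by_hamming <- function(counts, maxDist) {
  counts <- sort(counts, decreasing = TRUE)   # most abundant first
  seqs <- names(counts)
  label <- setNames(rep(NA_character_, length(seqs)), seqs)
  for (s in seqs) {
    if (!is.na(label[[s]])) next              # already absorbed by a cluster
    label[[s]] <- s                           # s becomes a new cluster label
    for (t in seqs[is.na(label)]) {
      if (hamming(s, t) <= maxDist) label[[t]] <- s
    }
  }
  # Sum read counts per cluster label (labels in order of first appearance).
  sapply(unique(label), function(l) sum(counts[label == l]))
}

counts <- c(AAAA = 10, AAAT = 3, AATT = 2, CCCC = 8, CCCG = 1)
res <- cluster_by_hamming(counts, maxDist = 1)
print(res)  # AAAA = 13, CCCC = 9, AATT = 2
```

Because this sketch compares candidates only against cluster labels, it does not chain merges: AATT stays separate even though it is one mismatch away from the absorbed AAAT. The package's own strategies (see type and start_small) may handle such cases differently.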
Usage
errorCorrection(
BC_dat,
maxDist,
save_it = FALSE,
cpus = 1,
strategy = "sequential",
m = "hamming",
type = "standard",
only_EC_BCs = TRUE,
EC_analysis = FALSE,
start_small = TRUE
)
Arguments
BC_dat |
a single BCdat object or a list of BCdat objects containing the sequences to be corrected. |
maxDist |
an integer value giving the maximum distance (Hamming distance by default, see m) at which two sequences may still be clustered together. |
save_it |
a logical value. If TRUE, the data will be saved as a CSV file. |
cpus |
an integer value. If multiple BCdat objects are provided, a value greater than one enables parallel computation (one CPU per BCdat object). |
strategy |
the parallelisation strategy passed to the future package, which is used for parallelisation. The default is "sequential" (cpus = 1); "multiprocess" is used for cpus > 1. Choosing a strategy explicitly is not necessary, since it is adjusted according to the chosen number of cpus. For further information, see the R documentation of future::plan(). |
m |
a character string specifying the method for distance calculation; the default is the Hamming distance ("hamming"). Possible values are "osa", "lv", "dl", "hamming", "lcs", "qgram", "cosine", "jaccard", "jw", "soundex" (see the stringdist function of the stringdist package for more information). |
type |
a character string selecting one of the available error correction strategies ("standard", "connectivity based", "graph based", "clustering"). |
only_EC_BCs |
a logical value. If TRUE, only information about barcodes that are still present after error correction will be saved. Only meaningful if EC_analysis is set to TRUE. |
EC_analysis |
a logical value. If TRUE, additional error correction details are returned, which can be visualised with the respective "error correction" plots. |
start_small |
a logical value. If TRUE, error correction type "standard" always clusters the BC of interest with the smallest highly similar BC. If FALSE, type "standard" adapts its clustering strategy and always clusters the BC of interest with the most frequent highly similar BC. |
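The choice of m matters when reads contain insertions or deletions. The base-R sketch below uses utils::adist(), which computes the Levenshtein distance, as a stand-in for the "lv" method of stringdist; the helper name hamming() is hypothetical. It shows that a single shifted base inflates the Hamming distance while the edit distance stays small.

```r
# Hypothetical helper: Hamming distance = number of mismatching positions
# between two equally long strings.
hamming <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

bc   <- "ACGTA"
read <- "CGTAA"   # bc with its first base deleted and an extra "A" appended

d_hamming <- hamming(bc, read)             # 4 mismatching positions
d_lv      <- utils::adist(bc, read)[1, 1]  # 2: one deletion plus one insertion
```

With a threshold of 2, these two sequences would be clustered under a Levenshtein-type distance but not under the Hamming distance.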
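The start_small description can be read as a choice between two neighbours of the BC of interest. The sketch below is a hypothetical illustration of that reading only: pick_neighbour() and hamming() are not genBaRcode functions, and "smallest" is assumed to mean lowest read count. Among all barcodes within maxDist of the BC of interest, it returns either the least or the most frequent one.

```r
# Hypothetical helper: Hamming distance between two equally long strings.
hamming <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Hypothetical sketch: choose which highly similar barcode to merge with bc.
# start_small = TRUE  -> least frequent neighbour within maxDist
# start_small = FALSE -> most frequent neighbour within maxDist
pick_neighbour <- function(bc, counts, maxDist, start_small = TRUE) {
  others <- setdiff(names(counts), bc)
  d <- vapply(others, function(o) hamming(bc, o), integer(1))
  nb <- others[d <= maxDist]
  if (length(nb) == 0) return(NA_character_)  # no highly similar barcode
  if (start_small) nb[which.min(counts[nb])] else nb[which.max(counts[nb])]
}

counts <- c(AAAA = 10, AAAT = 3, AAAC = 1, CCCC = 8)
pick_neighbour("AAAA", counts, maxDist = 1, start_small = TRUE)   # "AAAC"
pick_neighbour("AAAA", counts, maxDist = 1, start_small = FALSE)  # "AAAT"
```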
Examples
library(genBaRcode)
data(BC_dat)
# cluster barcodes that differ in at most 8 positions (Hamming distance)
BC_dat_EC <- errorCorrection(BC_dat, maxDist = 8, save_it = FALSE, m = "hamming")