| twobit_seqstats {Rtwobitlib} | R Documentation |
Extract sequence lengths and letter counts from a .2bit file
Description
Extract the lengths and letter counts of the DNA sequences stored
in a .2bit file.
Usage
twobit_seqstats(filepath)
twobit_seqlengths(filepath)
Arguments
filepath
A single string (character vector of length 1) containing a path
to a .2bit file.
Details
twobit_seqlengths(filepath) is a shortcut for
twobit_seqstats(filepath)[ , "seqlengths"]. It is also a much more
efficient way to get the sequence lengths because it does not need to
load the sequence data into memory.
Value
For twobit_seqstats(): An integer matrix with one row per sequence
in the .2bit file and 6 columns. The rownames on the matrix are the
sequence names and the colnames are: seqlengths, A, C, G, T, N.
Columns A, C, G, T, and N contain the corresponding letter counts
for each sequence.
For twobit_seqlengths(): A named integer vector where the names
are the sequence names and the values are the corresponding lengths.
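As an illustration of how the matrix layout described above can be used, the following minimal sketch derives per-sequence GC and N fractions. To stay self-contained it does not read a .2bit file: the matrix `stats`, the sequence names `chrA` and `chrB`, and all the counts are made up and only mimic the documented shape (one row per sequence, columns seqlengths, A, C, G, T, N).

```r
## Hypothetical stand-in for the matrix returned by twobit_seqstats().
## The sequence names and counts below are made up for illustration.
stats <- matrix(
    c(10L, 3L, 2L, 2L, 3L, 0L,
       8L, 1L, 3L, 1L, 1L, 2L),
    nrow = 2, byrow = TRUE,
    dimnames = list(c("chrA", "chrB"),
                    c("seqlengths", "A", "C", "G", "T", "N"))
)

## GC fraction per sequence, computed over the A, C, G, T counts only.
acgt <- rowSums(stats[ , c("A", "C", "G", "T"), drop = FALSE])
gc_fraction <- rowSums(stats[ , c("C", "G"), drop = FALSE]) / acgt

## Fraction of each sequence made up of N's.
n_fraction <- stats[ , "N"] / stats[ , "seqlengths"]
```

Because the rownames are the sequence names, both results are named numeric vectors indexed by sequence name. With a real file, `stats` would presumably be replaced by the result of twobit_seqstats(filepath).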
References
A quick overview of the 2bit format: https://genome.ucsc.edu/FAQ/FAQformat.html#format7
See Also
twobit_read and twobit_write to read/write a
character vector representing DNA sequences from/to a file in 2bit
format.
Examples
filepath <- system.file(package="Rtwobitlib", "extdata", "sacCer2.2bit")
twobit_seqstats(filepath)
twobit_seqlengths(filepath)
## Sanity checks:
sacCer2_seqstats <- twobit_seqstats(filepath)
stopifnot(
    identical(sacCer2_seqstats[ , 1], twobit_seqlengths(filepath)),
    all.equal(rowSums(sacCer2_seqstats[ , -1]), sacCer2_seqstats[ , 1])
)