twobit_seqstats {Rtwobitlib} | R Documentation
Extract sequence lengths and letter counts from a .2bit file
Description
Extract the lengths and letter counts of the DNA sequences stored
in a .2bit file.
Usage
twobit_seqstats(filepath)
twobit_seqlengths(filepath)
Arguments
filepath | A single string (character vector of length 1) containing
the path to a .2bit file.
Details
twobit_seqlengths(filepath) is a shortcut for
twobit_seqstats(filepath)[ , "seqlengths"]. It is also a much more
efficient way to get the sequence lengths because it does not need to
load the sequence data into memory.
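The equivalence described above can be checked directly. The following
is a minimal sketch that assumes the Rtwobitlib package is installed and
uses the sacCer2.2bit file shipped in its extdata folder; the variable
names lens_fast and lens_slow are illustrative only.

```r
library(Rtwobitlib)

## Example file shipped with the package.
filepath <- system.file(package="Rtwobitlib", "extdata", "sacCer2.2bit")

## Lengths only: does not load the sequence data.
lens_fast <- twobit_seqlengths(filepath)

## Same lengths, taken from the full statistics matrix.
lens_slow <- twobit_seqstats(filepath)[ , "seqlengths"]

## Both routes should give the same named integer vector.
stopifnot(identical(lens_fast, lens_slow))
```

When only the lengths are needed, twobit_seqlengths() is the natural
choice; twobit_seqstats() is worth its extra cost when the letter counts
are needed as well.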
Value
For twobit_seqstats(): An integer matrix with one row per sequence
in the .2bit file and 6 columns. The rownames on the matrix are the
sequence names and the colnames are: seqlengths, A, C, G, T, N.
Columns A, C, G, T, and N contain the letter count for each sequence.
For twobit_seqlengths(): A named integer vector where the names
are the sequence names and the values are the corresponding lengths.
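As an illustration of how the matrix layout can be used, the sketch
below derives the GC fraction of each sequence from the letter counts.
The matrix stats is hypothetical data built by hand to mimic the
documented layout (it is not output from a real .2bit file); in practice
it would come from twobit_seqstats(filepath). The name gc_fraction is
likewise only illustrative.

```r
## Hypothetical matrix with the documented layout: one row per
## sequence, columns seqlengths, A, C, G, T, N.
stats <- matrix(
    c(10L, 3L, 2L, 2L, 3L, 0L,
       8L, 1L, 3L, 3L, 0L, 1L),
    nrow = 2, byrow = TRUE,
    dimnames = list(c("seqA", "seqB"),
                    c("seqlengths", "A", "C", "G", "T", "N"))
)

## The letter counts of each row add up to the sequence length.
letters_total <- rowSums(stats[ , c("A", "C", "G", "T", "N")])
stopifnot(isTRUE(all.equal(letters_total, stats[ , "seqlengths"])))

## GC fraction among the unambiguous letters (N excluded).
gc_fraction <- (stats[ , "C"] + stats[ , "G"]) /
               (stats[ , "seqlengths"] - stats[ , "N"])
gc_fraction
```

Selecting columns by name rather than by position keeps the code
readable and independent of the column order.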
References
A quick overview of the 2bit format: https://genome.ucsc.edu/FAQ/FAQformat.html#format7
See Also
twobit_read and twobit_write to read/write a character vector
representing DNA sequences from/to a file in 2bit format.
Examples
filepath <- system.file(package="Rtwobitlib", "extdata", "sacCer2.2bit")
twobit_seqstats(filepath)
twobit_seqlengths(filepath)
## Sanity checks:
sacCer2_seqstats <- twobit_seqstats(filepath)
stopifnot(
    identical(sacCer2_seqstats[ , 1], twobit_seqlengths(filepath)),
    all.equal(rowSums(sacCer2_seqstats[ , -1]), sacCer2_seqstats[ , 1])
)