twobit_roundtrip {Rtwobitlib}    R Documentation
Read/write a .2bit file
Description
Read/write a character vector representing DNA sequences from/to a file in 2bit format.
Usage
twobit_read(filepath)
twobit_write(x, filepath, use.long=FALSE, skip.dups=FALSE)
Arguments
filepath
A single string (character vector of length 1) containing the path to the file to read or write.
x
A named character vector representing DNA sequences. The names on the vector should be unique, and the sequences should contain only A's, C's, G's, T's, or N's, in uppercase or lowercase.
use.long
By default the 2bit format cannot store more than 4Gb of sequence
data in total. Set use.long to TRUE to lift this limit.
skip.dups
By default duplicate sequence names are an error. By setting
skip.dups to TRUE, sequences with duplicate names are skipped instead.
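The requirements on x described above can be checked before calling twobit_write(). The following is a minimal sketch in base R; check_twobit_input() is a hypothetical helper written for illustration and is not part of Rtwobitlib. It only checks the names and the letters, and the sample sequences are made up.

```r
# Hypothetical helper (NOT part of Rtwobitlib): checks the documented
# requirements on 'x' (named, unique names, only A/C/G/T/N letters).
check_twobit_input <- function(x) {
    stopifnot(is.character(x), !anyNA(x))
    nms <- names(x)
    if (is.null(nms) || anyNA(nms) || any(nms == ""))
        stop("'x' must have a name for every sequence")
    if (anyDuplicated(nms))
        stop("duplicated sequence names: ",
             paste(unique(nms[duplicated(nms)]), collapse=", "))
    ## Uppercase and lowercase letters are both accepted.
    bad <- !grepl("^[ACGTNacgtn]*$", x)
    if (any(bad))
        stop("invalid letters in: ", paste(nms[bad], collapse=", "))
    invisible(x)
}

## Made-up input for illustration:
x <- c(chrA="ACGTNNacgt", chrB="ttagggTTAGGG")
check_twobit_input(x)
```

Because the helper returns its input invisibly, it can be placed directly in front of a write call, for example twobit_write(check_twobit_input(x), filepath).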
Value
For twobit_read(): A named character vector containing the DNA
sequences loaded from the file.
For twobit_write(): filepath, returned invisibly.
References
A quick overview of the 2bit format: https://genome.ucsc.edu/FAQ/FAQformat.html#format7
See Also
twobit_seqstats and twobit_seqlengths to extract the sequence lengths
and letter counts from a .2bit file.
Examples
## Read:
inpath <- system.file(package="Rtwobitlib", "extdata", "sacCer2.2bit")
dna <- twobit_read(inpath)
names(dna)
nchar(dna)
## Write:
outpath <- twobit_write(dna, tempfile())
## Sanity checks:
library(tools)
stopifnot(md5sum(inpath) == md5sum(outpath))
stopifnot(identical(nchar(dna), twobit_seqlengths(inpath)))
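
The same round trip can be tried on a small hand-made vector. This is a sketch only: the sequences and their names are made up, the code runs the round trip only when Rtwobitlib is installed, and it prints the comparisons rather than asserting a particular outcome.

```r
## Made-up sequences mixing uppercase, lowercase, and N's:
x <- c(seq1="ACGTNNNNacgt", seq2="TTAGGGttaggg")

if (requireNamespace("Rtwobitlib", quietly=TRUE)) {
    ## twobit_write() returns the file path invisibly.
    path <- Rtwobitlib::twobit_write(x, tempfile(fileext=".2bit"))
    y <- Rtwobitlib::twobit_read(path)
    ## Compare what came back with what went in.
    print(identical(names(x), names(y)))
    print(all.equal(nchar(x), nchar(y)))
    unlink(path)
}
```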