twobit_roundtrip {Rtwobitlib}    R Documentation

Read/write a .2bit file

Description

Read/write a character vector representing DNA sequences from/to a file in 2bit format.

Usage

twobit_read(filepath)

twobit_write(x, filepath, use.long=FALSE, skip.dups=FALSE)

Arguments

filepath

A single string (character vector of length 1) containing a path to the file to read or write.

x

A named character vector representing DNA sequences. The names on the vector should be unique and the sequences should only contain A's, C's, G's, T's, or N's, in uppercase or lowercase.
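
The requirements on x can be checked before writing. The following is only a sketch in base R: check_twobit_input() is a hypothetical helper, not part of Rtwobitlib, and twobit_write() may perform its own (possibly different) checks.

```r
## Hypothetical helper (NOT part of Rtwobitlib): returns TRUE if 'x' meets
## the requirements described above, otherwise stops with an error.
check_twobit_input <- function(x) {
    if (!is.character(x) || is.null(names(x)))
        stop("'x' must be a named character vector")
    if (anyNA(x) || anyNA(names(x)))
        stop("'x' and its names must not contain NAs")
    if (anyDuplicated(names(x)))
        stop("the names on 'x' must be unique")
    ## Flag any sequence containing a letter other than A, C, G, T, or N
    ## (uppercase or lowercase).
    bad <- grepl("[^ACGTNacgtn]", x)
    if (any(bad))
        stop("invalid letters in sequence(s): ",
             paste(names(x)[bad], collapse=", "))
    TRUE
}

x <- c(chr1="ACGTNNacgt", chr2="ttgaC")
check_twobit_input(x)
```

The sequence names chr1 and chr2 are made up for illustration.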

use.long

By default the 2bit format cannot store more than 4Gb of sequence data in total. Set use.long to TRUE if your sequence data exceeds that limit.

skip.dups

By default duplicate sequence names are an error. If skip.dups is set to TRUE, sequences with a duplicated name are skipped with a warning instead.
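
Duplicated names can be located ahead of time with base R. The sketch below uses made-up sequence names and assumes that the first occurrence of each name is the one worth keeping, which this page does not state.

```r
## Toy input with a repeated name (names are illustrative only).
x <- c(chrI="ACGT", chrII="GGNN", chrI="TTTT")

## TRUE for every element whose name already appeared earlier in 'x'.
is_dup <- duplicated(names(x))
names(x)[is_dup]

## Keep only the first occurrence of each name (an assumption, see above).
x_unique <- x[!is_dup]
x_unique
```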

Value

For twobit_read(): A named character vector containing the DNA sequences loaded from the file.

For twobit_write(): filepath, returned invisibly.

References

A quick overview of the 2bit format: https://genome.ucsc.edu/FAQ/FAQformat.html#format7

See Also

twobit_seqstats and twobit_seqlengths to extract the sequence lengths and letter counts from a .2bit file.

Examples

## Read:
inpath <- system.file(package="Rtwobitlib", "extdata", "sacCer2.2bit")
dna <- twobit_read(inpath)
names(dna)
nchar(dna)

## Write:
outpath <- twobit_write(dna, tempfile())

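## Sanity checks:
library(tools)
## The round trip should reproduce the original file byte for byte:
stopifnot(md5sum(inpath) == md5sum(outpath))
## The in-memory sequence lengths should match those reported for the file:
stopifnot(identical(nchar(dna), twobit_seqlengths(inpath)))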

[Package Rtwobitlib version 0.3.6 Index]