field {nc} | R Documentation |
Capture a field
Description
Capture a field with a pattern of the form
list("field.name", between.pattern
,
field.name=list(...
)) – see examples.
Usage
field(field.name, between.pattern,
...)
Arguments
field.name |
Field name, used as a pattern and as a capture |
between.pattern |
Pattern to match after |
... |
Pattern(s) for matching field value. |
Value
Pattern list which can be used in capture_first_vec
,
capture_first_df
, or capture_all_str
.
Author(s)
Toby Hocking <toby.hocking@r-project.org> [aut, cre]
Examples
## Two ways to create the same pattern.
str(list("Alignment ", Alignment="[0-9]+"))
## To avoid typing Alignment twice use:
str(nc::field("Alignment", " ", "[0-9]+"))
## An example with lots of different fields.
info.txt.gz <- system.file(
"extdata", "SweeD_Info.txt.gz", package="nc")
info.vec <- readLines(info.txt.gz)
info.vec[24:40]
## For each Alignment there are many fields which have a similar
## pattern, and occur in the same order. One way to capture these
## fields is by coding a pattern that says to look for all of those
## fields in that order. Each field is coded using this helper
## function.
g <- function(name, fun=identity, suffix=list()){
list(
"\t+",
nc::field(name, ":\t+", ".*"),
fun,
suffix,
"\n+")
}
nc::capture_all_str(
info.vec,
nc::field("Alignment", " ", "[0-9]+"),
"\n+",
g("Chromosome"),
g("Sequences", as.integer),
g("Sites", as.integer),
g("Discarded sites", as.integer),
g("Processing", as.integer, " seconds"),
g("Position", as.integer),
g("Likelihood", as.numeric),
g("Alpha", as.numeric))
## Another example where field is useful.
trackDb.txt.gz <- system.file(
"extdata", "trackDb.txt.gz", package="nc")
trackDb.vec <- readLines(trackDb.txt.gz)
cat(trackDb.vec[101:115], sep="\n")
int.pattern <- list("[0-9]+", as.integer)
cell.sample.type <- list(
cellType="[^ ]*?",
"_",
sampleName=list(
"McGill",
sampleID=int.pattern),
dataType="Coverage|Peaks")
## Each block in the trackDb file begins with track, followed by a
## space, followed by the track name. That pattern is coded below,
## using field:
track.pattern <- nc::field(
"track",
" ",
cell.sample.type,
"|",
"[^\n]+")
nc::capture_all_str(trackDb.vec, track.pattern)
## Each line in a block has the same structure (field name, space,
## field value). Below we use the field function to extract the
## color line, along with columns for each of the three channels
## (red, green, blue).
any.lines.pattern <- "(?:\n[^\n]+)*"
nc::capture_all_str(
trackDb.vec,
track.pattern,
any.lines.pattern,
"\\s+",
nc::field(
"color", " ",
red=int.pattern, ",",
green=int.pattern, ",",
blue=int.pattern))
[Package nc version 2024.2.21 Index]