dfm_subset {quanteda} | R Documentation |
Extract a subset of a dfm
Description
Returns document subsets of a dfm that meet certain conditions,
including direct logical operations on docvars (document-level variables).
dfm_subset functions identically to subset.data.frame(), using non-standard
evaluation to evaluate conditions based on the docvars in the dfm.
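As a minimal sketch of what non-standard evaluation means here, the snippet below reuses the small corpus from the Examples section and compares a condition written with a bare docvar name against the same condition built as an explicit logical vector. The object names `by_name`, `keep` and `by_vector` are illustrative only and are not part of quanteda.

```r
library(quanteda)

corp <- corpus(c(d1 = "a b c d", d2 = "a a b e",
                 d3 = "b b c e", d4 = "e e f a b"),
               docvars = data.frame(grp = c(1, 1, 2, 3)))
dfmat <- dfm(tokens(corp))

# Non-standard evaluation: `grp` is looked up among the docvars of dfmat,
# so it does not need to exist as a variable in the workspace
by_name <- dfm_subset(dfmat, grp > 1)

# The same condition built explicitly as a logical vector (hypothetical name)
keep <- docvars(dfmat, "grp") > 1
by_vector <- dfm_subset(dfmat, keep)

docnames(by_name)    # "d3" "d4"
docnames(by_vector)  # "d3" "d4"
```

Both calls should keep the same documents; the first form is simply shorter to write when the condition only involves docvars.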
Usage
dfm_subset(
x,
subset,
min_ntoken = NULL,
max_ntoken = NULL,
drop_docid = TRUE,
...
)
Arguments
x | dfm object to be subsetted. |
subset | logical expression indicating the documents to keep: missing values are taken as false. |
min_ntoken, max_ntoken | minimum and maximum lengths of the documents to extract. |
drop_docid | if |
... | not used |
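The following sketch illustrates the subset, min_ntoken and max_ntoken arguments on a variant of the corpus from the Examples section in which one docvar value is missing. It assumes that the token-count bounds are inclusive and that they are combined with the subset condition (a document must satisfy both); neither point is stated on this page. The object names are illustrative only.

```r
library(quanteda)

# Same texts as in the Examples section, but d2 has a missing docvar value
corp <- corpus(c(d1 = "a b c d", d2 = "a a b e",
                 d3 = "b b c e", d4 = "e e f a b"),
               docvars = data.frame(grp = c(1, NA, 2, 3)))
dfmat <- dfm(tokens(corp))

# Document lengths in tokens: d1, d2 and d3 have 4 tokens, d4 has 5
ntoken(dfmat)

# Missing values in the condition are taken as false, so d2 is dropped
no_missing <- dfm_subset(dfmat, grp > 1)
docnames(no_missing)

# Keep documents by length only (bounds assumed to be inclusive)
long_docs <- dfm_subset(dfmat, min_ntoken = 5)
short_docs <- dfm_subset(dfmat, max_ntoken = 4)

# Condition and length bound together (assumed to be combined with AND)
long_high_grp <- dfm_subset(dfmat, grp > 1, min_ntoken = 5)
docnames(long_high_grp)
```

Checking ntoken() first, as above, is a convenient way to pick sensible bounds before filtering.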
Details
To select or subset features, see dfm_select() instead.
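To make the distinction concrete, the sketch below applies both functions to the dfm from the Examples section and compares the resulting dimensions. It assumes the default behaviour of dfm_select() (keeping the features that match pattern); the object names `feat_kept` and `docs_kept` are illustrative only.

```r
library(quanteda)

corp <- corpus(c(d1 = "a b c d", d2 = "a a b e",
                 d3 = "b b c e", d4 = "e e f a b"),
               docvars = data.frame(grp = c(1, 1, 2, 3)))
dfmat <- dfm(tokens(corp))

# dfm_select() works on features (columns): all documents are retained
feat_kept <- dfm_select(dfmat, pattern = c("a", "b"))
ndoc(feat_kept)   # 4
nfeat(feat_kept)  # 2

# dfm_subset() works on documents (rows): all features are retained
docs_kept <- dfm_subset(dfmat, grp == 1)
ndoc(docs_kept)   # 2
nfeat(docs_kept) == nfeat(dfmat)  # TRUE
```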
When select is a dfm, then the returned dfm will be equal in
document dimension and order to the dfm used for selection. This is the
document-level version of using dfm_select() where pattern is a dfm: that
function matches features, while dfm_subset will match documents.
Value
dfm object, with a subset of documents (and docvars) selected according to the arguments.
See Also
Examples
corp <- corpus(c(d1 = "a b c d", d2 = "a a b e",
                 d3 = "b b c e", d4 = "e e f a b"),
               docvars = data.frame(grp = c(1, 1, 2, 3)))
dfmat <- dfm(tokens(corp))
# selecting on a docvars condition
dfm_subset(dfmat, grp > 1)
# selecting on a supplied vector
dfm_subset(dfmat, c(TRUE, FALSE, TRUE, FALSE))