conc {mclm} | R Documentation |
Build a concordance for the matches of a regex
Description
This function builds a concordance for the matches of a regular expression. The result is a
dataset that can be written to a file with the function write_conc().
It mimics the behavior of the concordance tool in the program AntConc.
Usage
conc(
x,
pattern,
c_left = 200,
c_right = 200,
perl = TRUE,
re_drop_line = NULL,
line_glue = "\n",
re_cut_area = NULL,
file_encoding = "UTF-8",
as_text = FALSE
)
Arguments
x |
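A character vector determining which text is to be used as corpus. If as_text = TRUE, x is treated as the actual text of the corpus. If as_text = FALSE (the default), x is treated as a vector of names of the corpus files. |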
pattern |
Character string containing the regular expression that serves as search term for the concordancer. |
c_left |
Number. How many characters to the left of each match must be included in the result as left co-text of the match. |
c_right |
Number. How many characters to the right of each match must be included in the result as right co-text of the match. |
perl |
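If TRUE, pattern is treated as a PCRE flavor regular expression. Otherwise, pattern is treated as a regular expression in R's default flavor. |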
re_drop_line |
Character vector or |
line_glue |
Character vector or |
re_cut_area |
Character vector or |
file_encoding |
File encoding for reading each corpus file. Ignored if as_text = TRUE. |
as_text |
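Logical. If TRUE, x is treated as the actual text of the corpus. If FALSE, x is treated as a vector of names of the corpus files. |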
Details
To make sure that the columns left, match, and right in the output of conc()
do not contain any TAB or NEWLINE characters, whitespace in these items is 'normalized'.
More specifically, each stretch of whitespace, i.e. each uninterrupted
sequence of whitespace characters, is replaced by a single SPACE character.
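
The normalization described above can be approximated in base R. The following is only a sketch: the helper name normalize_ws() is hypothetical, and the code is not taken from the mclm sources.

```r
# Hypothetical helper, not part of mclm: collapse whitespace the way the
# Details section describes for the left, match, and right columns.
normalize_ws <- function(x) {
  # Each uninterrupted run of whitespace (spaces, TABs, NEWLINEs)
  # becomes a single SPACE character.
  gsub("\\s+", " ", x, perl = TRUE)
}

raw <- "left\tco-text\n\n  spanning   lines"
normalized <- normalize_ws(raw)
print(normalized)
```

After this step the string contains no TAB or NEWLINE characters, which is what keeps a tab-delimited export of the concordance at one match per line.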
The values in the items glob_id and id are always identical in a dataset
that is the output of the function conc(). The item glob_id only becomes
useful when later, for instance, one wants to merge two datasets.
Value
Object of class conc, a kind of data frame with as its rows
the matches and with the following columns:
- glob_id: Number indicating the position of the match in the overall list of matches.
- id: Number indicating the position of the match in the list of matches for one specific query.
- source: Either the filename of the file in which the match was found (in case of the setting as_text = FALSE), or the string '-' (in case of the setting as_text = TRUE).
- left: The left-hand side co-text of each match.
- match: The actual match.
- right: The right-hand side co-text of each match.
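
To make the column layout concrete, here is a minimal base R sketch that derives the same six columns for a text supplied directly as a string. The function sketch_conc() is a hypothetical illustration, not the mclm implementation: it handles only a single string, skips whitespace normalization, and returns a plain data frame rather than an object of class conc.

```r
# Hypothetical helper: approximates the columns listed above in base R.
# This is NOT how mclm builds its conc objects.
sketch_conc <- function(text, pattern, c_left = 200, c_right = 200) {
  m <- gregexpr(pattern, text, perl = TRUE)[[1]]
  if (m[1] == -1) {
    # No matches: return an empty data frame with the same columns.
    return(data.frame(glob_id = integer(0), id = integer(0),
                      source = character(0), left = character(0),
                      match = character(0), right = character(0),
                      stringsAsFactors = FALSE))
  }
  starts <- as.integer(m)
  ends <- starts + attr(m, "match.length") - 1L
  data.frame(
    glob_id = seq_along(starts),  # identical to id for a single query
    id      = seq_along(starts),
    source  = "-",                # text was passed directly, not read from a file
    left    = substring(text, pmax(starts - c_left, 1L), starts - 1L),
    match   = substring(text, starts, ends),
    right   = substring(text, ends + 1L, pmin(ends + c_right, nchar(text))),
    stringsAsFactors = FALSE
  )
}

toy <- sketch_conc("A very small corpus.", "\\w+", c_left = 5, c_right = 5)
print(toy)
```

With c_left and c_right set to 5, each row carries at most five characters of co-text on either side, so the effect of the two arguments is easy to see in the printed result.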
It also has additional attributes and methods such as:
- base as_data_frame() and print() methods, as well as a print_kwic() function,
- an explore() method.
An object of class conc can be merged with another by means of merge_conc().
It can be written to file with write_conc() and then read with read_conc().
It is also possible to import concordances created by means other than
write_conc() with import_conc().
Examples
(conc_data <- conc('A very small corpus.', '\\w+', as_text = TRUE))
print(conc_data)
print_kwic(conc_data)