AlcesteSource {tm.plugin.alceste}    R Documentation

Alceste Source

Description

Construct a source for an input containing a set of texts saved in the Alceste format in a single text file.

Usage

  AlcesteSource(x, encoding = "auto")

Arguments

x

Either a character string giving the path to the file, or a connection.

encoding

A character string: if non-empty declares the encoding used when reading the file, so the character data can be re-encoded. See the ‘Encoding’ section of the help for file. The default, “auto”, uses stri_enc_detect to try to guess the encoding; this may fail, in which case the native encoding is used.
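To preview what encoding detection might report for a given file, stri_enc_detect from the stringi package can be called by hand on the raw bytes. This is only a sketch: the temporary file and its contents are made up for illustration, and the way AlcesteSource itself calls stri_enc_detect internally is not shown here.

```r
library(stringi)

# Hypothetical file: a single Alceste-style text written as UTF-8
path <- tempfile(fileext = ".txt")
writeLines(enc2utf8(c("**** *lang_fr",
                      "Un texte accentu\u00e9 : \u00e9, \u00e8, \u00e0.")),
           path, useBytes = TRUE)

# Read the raw bytes and ask stringi for candidate encodings
raw_bytes <- readBin(path, what = "raw", n = file.size(path))
guess <- stri_enc_detect(raw_bytes)[[1]]

# Candidates are listed with a confidence score, best guess first
head(guess)

# If the guess looks wrong, the encoding can be declared explicitly, e.g.
# AlcesteSource(path, encoding = "UTF-8")
```

Detection is heuristic and tends to be unreliable on short inputs, so declaring the encoding explicitly is the safer choice when it is known.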

Details

Several texts are saved in a single Alceste-formatted file, separated by lines starting with “***” or digits, followed by starred variables (see links below). These variables are set as document meta-data that can be accessed via the meta function.
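The layout described above can be illustrated with a few lines of base R. This is a minimal sketch on made-up sample data, not the parsing code used by the package; in particular, splitting each starred variable into a name and a value at the first underscore is an assumption based on the usual `*name_value` convention.

```r
# Hypothetical sample in the layout described above (not shipped with the package)
lines <- c(
  "**** *author_smith *year_2010",
  "First text, first line.",
  "-*theme_intro",
  "First text, second line.",
  "0002 *author_jones *year_2012",
  "Second text."
)

# Header lines start with "***" or a digit; theme lines start with "-*"
is_header <- grepl("^(\\*{3}|[0-9])", lines)
is_theme  <- grepl("^-\\*", lines)

# Each header opens a new document; keep body lines only
doc_id <- cumsum(is_header)
keep   <- !is_header & !is_theme
texts  <- unname(vapply(split(lines[keep], doc_id[keep]),
                        paste, character(1), collapse = "\n"))

# Extract the starred variables from each header line
headers <- lines[is_header]
vars <- regmatches(headers, gregexpr("\\*[^* ]+", headers))
meta_list <- lapply(vars, function(v) {
  v <- sub("^\\*", "", v)
  # Assumption: name and value are separated by the first underscore
  setNames(sub("^[^_]*_", "", v), sub("_.*$", "", v))
})

texts
meta_list
```

Here the sample yields two texts, the theme line is dropped, and each text gets its own named vector of variables, which mirrors how the variables end up as document meta-data.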

Currently, “theme” lines starting with “-*” are ignored.

Value

An object of class AlcesteSource, which extends the class Source and represents a set of articles from Alceste.

Author(s)

Milan Bouchet-Valat

See Also

https://image-zafar.com/sites/default/files/telechargements/formatage_alceste.pdf (in French) about the Alceste format

readAlceste for the function actually parsing individual articles.

getSources to list available sources.

Examples

    library(tm)
    library(tm.plugin.alceste)
    file <- system.file("texts", "alceste_test.txt", 
                        package = "tm.plugin.alceste")
    corpus <- Corpus(AlcesteSource(file))

    # See the contents of the documents
    inspect(corpus)

    # See meta-data associated with first article
    meta(corpus[[1]])

[Package tm.plugin.alceste version 1.1.1 Index]