tm.plugin.europresse-package {tm.plugin.europresse}    R Documentation

A plug-in for the tm text mining framework to import articles from Europresse

Description

This package provides a tm Source to create corpora from articles exported from the Europresse content provider as HTML files.

Details

Typical usage is to create a corpus from an HTML file exported from Europresse (here called myEuropresseArticles.html). It is often necessary to specify the encoding of the texts via the encoding argument of EuropresseSource.

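    # Load tm and the plug-in
    library(tm)
    library(tm.plugin.europresse)

    # Import corpus (named src to avoid masking base::source)
    src <- EuropresseSource("myEuropresseArticles.html")
    corpus <- Corpus(src)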

    # See how many articles were imported
    corpus

    # See the contents of the first article and its meta-data
    inspect(corpus[1])
    meta(corpus[[1]])
  

See EuropresseSource for more details and real examples.
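
When the encoding of an export is not known in advance, one possible approach is to check whether the file is valid UTF-8 and fall back to another encoding otherwise. The sketch below is only an illustration: pick_encoding is a hypothetical helper that is not part of tm.plugin.europresse, and the "ISO-8859-1" fallback is an assumption that may not match your files. The check cannot tell single-byte encodings apart, so the result should still be verified by inspecting an article containing accented characters.

```r
# Hypothetical helper (not part of tm.plugin.europresse): returns "UTF-8" when
# every line of the file is valid UTF-8, otherwise the supplied fallback.
pick_encoding <- function(path, fallback = "ISO-8859-1") {
  lines <- readLines(path, warn = FALSE)
  if (all(validUTF8(lines))) "UTF-8" else fallback
}

# File name taken from the example above; replace it with your own export
path <- "myEuropresseArticles.html"

# Only run the import when the file and the plug-in are both available
if (file.exists(path) &&
    requireNamespace("tm.plugin.europresse", quietly = TRUE)) {
  library(tm)
  library(tm.plugin.europresse)

  # Pass the chosen encoding via the encoding argument
  src <- EuropresseSource(path, encoding = pick_encoding(path))
  corpus <- Corpus(src)

  # Check that accented characters are displayed correctly
  inspect(corpus[1])
}
```

If accented characters still appear garbled, pass the correct encoding explicitly instead of relying on the helper.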

Author(s)

Milan Bouchet-Valat <nalimilan@club.fr>

References

http://www.europresse.com/


[Package tm.plugin.europresse version 1.4 Index]