CNAE2 {deepMOU}    R Documentation
CNAE dataset on classes 4 and 9
Description
The data set CNAE2 is a subset of the original CNAE-9 data, which
comprises 1080 free-text business descriptions of Brazilian companies
categorized into 9 topics. Specifically, CNAE2 contains only the
documents belonging to topics "4" and "9".
The data set is already pre-processed and provides the bag-of-words
representation of the documents; columns with null counts are removed,
leading to a matrix of 240 documents on a vocabulary of cardinality 357.
This data set is highly sparse (98% of the entries are zero).
Class labels are stored in cl_CNAE.
Usage
data(CNAE2)
Format
A matrix for the bag-of-words representation of the CNAE2 dataset.
Source
Examples
# data() returns only the data set's name, so inspect the loaded object itself
data(CNAE2)
print(head(CNAE2))
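The Description mentions two properties that are easy to check by hand: columns with null counts have been removed, and the matrix is highly sparse. The following is a minimal sketch of both checks, not part of the package. The helpers `sparsity` and `drop_empty_terms` and the matrix `toy` are hypothetical names introduced here for illustration, and the sketch assumes that `cl_CNAE` becomes available once `CNAE2` is loaded.

```r
# Fraction of zero entries in a count matrix (1 means completely empty).
sparsity <- function(m) mean(m == 0)

# Drop columns (vocabulary terms) that never occur in any document.
drop_empty_terms <- function(m) m[, colSums(m) > 0, drop = FALSE]

# Toy 3 x 4 bag-of-words matrix, made up purely for illustration.
toy <- matrix(c(1, 0, 0, 0,
                0, 0, 2, 0,
                0, 0, 1, 0),
              nrow = 3, byrow = TRUE)
toy_clean <- drop_empty_terms(toy)
print(dim(toy_clean))
print(sparsity(toy_clean))

# The same checks on the real data, run only if deepMOU is installed.
if (requireNamespace("deepMOU", quietly = TRUE)) {
  data(CNAE2, package = "deepMOU")
  print(dim(CNAE2))       # the Description reports 240 documents, 357 terms
  print(sparsity(CNAE2))
  # Assumption: cl_CNAE is loaded alongside CNAE2 by the call above.
  if (exists("cl_CNAE")) print(table(cl_CNAE))
}
```

The toy matrix keeps the example runnable without the package; on the real data, the printed dimensions and sparsity can be compared against the figures given in the Description.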
[Package deepMOU version 0.1.1 Index]