CNAE2 {deepMOU}    R Documentation

CNAE dataset on classes 4 and 9

Description

The data set CNAE2 is a subset of the original CNAE-9 data, which comprises 1080 documents of free-text business descriptions of Brazilian companies, categorized into 9 topics.

Specifically, CNAE2 contains only the documents belonging to topics "4" and "9". The data set is already pre-processed and provides the bag-of-words representation of the documents; columns with null counts have been removed, leaving a matrix of 240 documents over a vocabulary of 357 terms. This data set is highly sparse (98%).
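
As a rough illustration of the pre-processing described above, the following sketch builds a small toy count matrix (not the real data), drops the columns with null counts, and computes the share of zero entries. The helper names drop_null_columns and sparsity are hypothetical and are not part of deepMOU.

```r
# Toy bag-of-words matrix: 4 documents (rows) by 5 terms (columns).
# The counts are illustrative only and are not taken from CNAE2.
toy <- matrix(c(1, 0, 0, 0, 0,
                0, 0, 2, 0, 0,
                0, 0, 0, 0, 1,
                0, 0, 1, 0, 0),
              nrow = 4, byrow = TRUE)

# Hypothetical helper: keep only terms that occur in at least one document.
drop_null_columns <- function(m) m[, colSums(m) > 0, drop = FALSE]

# Hypothetical helper: proportion of entries equal to zero.
sparsity <- function(m) mean(m == 0)

toy_clean <- drop_null_columns(toy)
dim(toy_clean)       # documents x remaining terms
sparsity(toy_clean)  # share of zero entries
```

With the package installed, the same hypothetical helper could presumably be applied to the real matrix via sparsity(CNAE2) after data(CNAE2).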

Class labels are stored in cl_CNAE.
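
A minimal sketch of how labels might be combined with a count matrix, assuming the labels form a vector with one entry per document (row). This layout is an assumption about cl_CNAE, and the objects toy and toy_labels below are illustrative stand-ins rather than the real data.

```r
# Toy stand-ins (illustrative only): 4 documents, 3 terms, and topic labels.
toy <- matrix(c(1, 0, 2,
                0, 3, 0,
                0, 1, 1,
                2, 0, 0),
              nrow = 4, byrow = TRUE)
toy_labels <- c("4", "9", "9", "4")

# Number of documents per topic.
table(toy_labels)

# Total count of each term within each topic (one row per topic).
term_totals <- rowsum(toy, group = toy_labels)
term_totals
```

If cl_CNAE does follow this layout, table(cl_CNAE) and rowsum(CNAE2, group = cl_CNAE) would give the corresponding summaries for the real data.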

Usage

data(CNAE2)

Format

A matrix for the bag-of-words representation of the CNAE2 dataset.

Source

Original CNAE-9 dataset

Examples

# data() loads the object into the workspace and returns only its name
data(CNAE2)
print(head(CNAE2))

[Package deepMOU version 0.1.1 Index]