R: Extract and Count Specific Parts of Speech

f7 {MadanText}

R Documentation

Extract and Count Specific Parts of Speech

Description

This function selects the rows of a data frame whose part-of-speech (POS) tag matches a specified type and counts how often each lemma occurs among them.

Usage

f7(UPIP, type)

Arguments

`UPIP`	A data frame with columns 'upos' (POS tags) and 'lemma' (lemmatized tokens).
`type`	A string giving the POS tag to keep (e.g., 'NOUN', 'VERB').

Value

A data frame with one row per unique lemma of the specified POS type and two columns: 'key' (the lemma) and 'freq' (the number of times that lemma occurs in the data). Rows are sorted by decreasing frequency, so the most common terms of that POS appear first.

Examples

library(MadanText)
# Toy input with the two required columns: 'upos' and 'lemma'
data <- data.frame(upos = c('NOUN', 'VERB'), lemma = c('house', 'run'))
noun_freq <- f7(data, 'NOUN')
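
The computation described under Value can be approximated in base R. The sketch below is only an illustration of the filter-count-sort steps, not the package's actual implementation. The function name `count_pos_sketch` and the data frame `toy` are hypothetical and exist only for this example.

```r
# Hypothetical stand-in for f7(), written in base R for illustration only.
count_pos_sketch <- function(UPIP, type) {
  # Keep only the lemmas whose POS tag matches `type`;
  # which() drops rows where 'upos' is NA.
  lemmas <- as.character(UPIP$lemma[which(UPIP$upos == type)])
  # Count occurrences of each distinct lemma, most frequent first.
  counts <- sort(table(lemmas), decreasing = TRUE)
  data.frame(
    key = as.character(names(counts)),
    freq = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# Toy input in which one noun lemma is repeated.
toy <- data.frame(
  upos  = c('NOUN', 'VERB', 'NOUN', 'NOUN'),
  lemma = c('house', 'run', 'house', 'tree')
)
count_pos_sketch(toy, 'NOUN')
# 'house' (freq 2) is listed before 'tree' (freq 1)
```

With repeated lemmas in the input, the ordering by 'freq' becomes visible, which the single-noun example above cannot show.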

[Package MadanText version 0.1.0 Index]