R: Detect Language

identify_language {labourR}

R Documentation

Detect Language

Description

This function performs language detection using Compact Language Detector 2 (CLD2), provided by the CRAN package cld2. It is vectorised and guesses the language of each string. Note that it is not designed to do well on very short text, lists of proper names, part numbers, etc. CLD2 has the highest F1 score and is an order of magnitude faster than CLD3.
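
Because very short inputs are unreliable, one option is to skip them before detection. The following is a minimal sketch, not part of labourR: the helper name detect_long_text, its detector argument, and the 20-character cutoff are illustrative assumptions.

```r
# Hypothetical helper (not part of labourR): classify only strings that are
# long enough and return NA for the rest.
# min_chars = 20 is an arbitrary illustrative cutoff, not a recommended value.
detect_long_text <- function(text,
                             detector = labourR::identify_language,
                             min_chars = 20) {
  text <- as.character(text)
  out <- rep(NA_character_, length(text))
  # Missing strings and strings shorter than the cutoff are left as NA.
  long_enough <- !is.na(text) & nchar(trimws(text)) >= min_chars
  if (any(long_enough)) {
    out[long_enough] <- detector(text[long_enough])
  }
  out
}

txt <- c("ok", "English is a West Germanic language")
# Only call the real detector when labourR is available.
if (requireNamespace("labourR", quietly = TRUE)) {
  detect_long_text(txt)
}
```

Passing the detector as an argument keeps the length check independent of the detection backend, so the same guard could wrap another detector.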

Usage

identify_language(text)

Arguments

text

A character vector of text to classify, or a connection to read from.

cld2 is a probabilistic detector (Naïve Bayes classifier) that recognises over 80 languages in plain text.

Value

A character vector of ISO 639-1 two-letter language codes, one per input string.
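
Since the result is a plain character vector aligned with the input, it can be used directly for subsetting. The sketch below assumes that alignment. The helper keep_language is hypothetical and not part of labourR. Treating a missing code as a non-match is a defensive assumption for cases where the detector cannot decide.

```r
# Hypothetical helper (not part of labourR): keep the elements of `text`
# whose detected code equals `lang`. A missing code counts as a non-match.
keep_language <- function(text, codes, lang = "en") {
  stopifnot(length(text) == length(codes))
  text[!is.na(codes) & codes == lang]
}

txt <- c("English is a West Germanic language",
         "In espaniol, le lingua castilian")
# Only call the real detector when labourR is available.
if (requireNamespace("labourR", quietly = TRUE)) {
  codes <- labourR::identify_language(txt)
  keep_language(txt, codes, lang = "en")
}
```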

Examples

# Detect the language of each element; one code is returned per string
txt <- c("English is a West Germanic language ", "In espaniol, le lingua castilian")
identify_language(txt)

[Package labourR version 1.0.0 Index]