R: Language Identification using fastText

language_identification {fastText}

R Documentation

Language Identification using fastText

Description

Identifies the language of one or more text extracts using a pre-trained fastText language identification model.

Usage

language_identification(
  input_obj,
  pre_trained_language_model_path,
  k = 1,
  th = 0,
  threads = 1,
  verbose = FALSE
)

Arguments

`input_obj`	either a character string giving a valid path to a file in which each line is a separate text extract, or a character vector of text extracts
`pre_trained_language_model_path`	a character string giving a valid path to the pre-trained language identification model. For more information see https://fasttext.cc/docs/en/language-identification.html
`k`	the number of top labels to predict for each text extract (1 by default)
`th`	probability threshold (0.0 by default)
`threads`	an integer specifying the number of threads to run in parallel. This parameter applies only if k > 1.
`verbose`	if TRUE, information is printed to the console
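
As a minimal sketch of the file-path form of `input_obj`, the snippet below writes two text extracts to a temporary file, one per line, and passes that path instead of a character vector. The temporary file and the variable names are illustrative assumptions. The model path reuses the `lid.176.ftz` file that the Examples section locates with `system.file()`, and the call is skipped when the fastText package or that model file is not available.

```r
# Two text extracts; each one must occupy a single line in the input file
txt_extracts <- c("Las estrellas se ponen nerviosas en el cielo",
                  "Stars are flustered up in the sky.")

# Write the extracts to a temporary file, one extract per line
tmp_file <- tempfile(fileext = ".txt")
writeLines(txt_extracts, con = tmp_file)

# Run the identification only if the package and the bundled model are available
if (requireNamespace("fastText", quietly = TRUE)) {
  file_pretrained <- system.file("language_identification/lid.176.ftz",
                                 package = "fastText")
  if (nzchar(file_pretrained)) {
    dtbl_file <- fastText::language_identification(
      input_obj = tmp_file,  # a file path instead of a character vector
      pre_trained_language_model_path = file_pretrained,
      k = 1,
      th = 0.0
    )
    print(dtbl_file)
  }
}
```

Because each line of the file is treated as a separate extract, a text that spans several lines would need to be collapsed onto one line before it is written.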

Value

an object of class data.table with two or more columns named 'iso_lang_N' and 'prob_N', where 'N' runs from 1 to the value of the 'k' input parameter
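
To make the naming pattern concrete, the sketch below builds the column names expected for a given `k` using base R only. The helper `expected_columns()` is hypothetical and not part of the fastText package, and the order in which it lists the names is an assumption; only the names themselves follow from the description above.

```r
# Hypothetical helper (not part of fastText): column names expected for a given k
expected_columns <- function(k = 1) {
  stopifnot(is.numeric(k), length(k) == 1, k >= 1)
  idx <- seq_len(k)
  # The ordering (all labels first, then all probabilities) is an assumption
  c(paste0("iso_lang_", idx), paste0("prob_", idx))
}

cols_k3 <- expected_columns(k = 3)
print(cols_k3)
```

Given a result from a call with `k = 3`, an expression such as `setdiff(expected_columns(3), names(result))` could be used to check whether any of these columns is absent.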

References

https://fasttext.cc/docs/en/language-identification.html

https://becominghuman.ai/a-handy-pre-trained-model-for-language-identification-cadd89db9db8

Examples


library(fastText)

# two text extracts: the first in Spanish, the second in English
vec_txt = c("Incapaz de distinguir la luna y la cara de esta chica,
             Las estrellas se ponen nerviosas en el cielo",
             "Unable to tell apart the moon and this girl's face,
             Stars are flustered up in the sky.")

# path to the pre-trained language identification model
file_pretrained = system.file("language_identification/lid.176.ftz", package = "fastText")

# return the top 3 predicted languages (and their probabilities) for each extract
dtbl_out = language_identification(input_obj = vec_txt,
                                   pre_trained_language_model_path = file_pretrained,
                                   k = 3,
                                   th = 0.0,
                                   verbose = TRUE)
dtbl_out

[Package fastText version 1.0.4 Index]