textaDetectLanguages {mscstexta4r} | R Documentation |
Detects the languages used in documents.
Description
This function returns the language detected in each sentence or document, along with a confidence score between 0 and 1. A score of 1 indicates 100% certainty.
Internally, this function invokes the Microsoft Cognitive Services Text Analytics REST API documented at https://www.microsoft.com/cognitive-services/en-us/text-analytics/documentation.
You MUST have a valid Microsoft Cognitive Services account and an API key for this function to work properly. See https://www.microsoft.com/cognitive-services/en-us/pricing for details.
Usage
textaDetectLanguages(documents, numberOfLanguagesToDetect = 1L)
Arguments
documents |
(character vector) Vector of sentences or documents on which to perform language detection. |
numberOfLanguagesToDetect |
(integer) Number of languages to detect. Set to 1 by default. Use a higher value if individual documents contain a mix of languages. |
Value
An S3 object of the class texta. The results are stored in the results dataframe inside this object. The dataframe contains the original sentences or documents, the name of the detected language, the ISO 639-1 code of the detected language, and a confidence score. If an error occurred during processing, the dataframe will also have an error column that describes the error.
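As a minimal sketch of how the results dataframe might be used after a call, the snippet below builds a mock object instead of calling the API, so it runs without an API key. The column names (text, name, iso6391Name, score) are taken from the printed output in the Examples section. The mock rows, the third low-confidence row in particular, and the 0.9 cutoff are assumptions made up for illustration.

```r
# Mock "texta" object holding only the results dataframe; a real object
# would come from textaDetectLanguages(). All values here are illustrative.
docsLanguage <- structure(
  list(
    results = data.frame(
      text = c(
        "Der Louvre ist ein Museum in Paris.",
        "El Museo del Louvre es el museo nacional de Francia.",
        "Louvre"
      ),
      name = c("German", "Spanish", "English"),
      iso6391Name = c("de", "es", "en"),
      score = c(1, 1, 0.4),  # 0.4 is an invented low-confidence value
      stringsAsFactors = FALSE
    )
  ),
  class = "texta"
)

# Pull the results dataframe out of the object
res <- docsLanguage$results

# Keep only confident detections (the 0.9 cutoff is arbitrary)
confident <- res[res$score >= 0.9, ]

# Count documents per ISO 639-1 code among the confident detections
langCounts <- table(confident$iso6391Name)

# The error column only exists if something went wrong during processing
hasErrors <- "error" %in% names(res)
```

Checking for the error column before reading it avoids referencing a column that is absent when every document was processed successfully.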
Author(s)
Phil Ferriere pferriere@hotmail.com
Examples
## Not run:
docsText <- c(
"The Louvre or the Louvre Museum is the world's largest museum.",
"Le musee du Louvre est un musee d'art et d'antiquites situe au centre de Paris.",
"El Museo del Louvre es el museo nacional de Francia.",
"Il Museo del Louvre a Parigi, in Francia, e uno dei piu celebri musei del mondo.",
"Der Louvre ist ein Museum in Paris."
)
tryCatch({
# Detect languages used in documents
docsLanguage <- textaDetectLanguages(
documents = docsText, # Input sentences or documents
numberOfLanguagesToDetect = 1L # Number of languages to detect
)
# Class and structure of docsLanguage
class(docsLanguage)
#> [1] "texta"
str(docsLanguage, max.level = 1)
#> List of 3
#> $ results:'data.frame': 5 obs. of 4 variables:
#> $ json : chr "{\"documents\":[{\"id\":\"B6e4C\",\"detectedLanguages\": __truncated__ }]}
#> $ request:List of 7
#> ..- attr(*, "class")= chr "request"
#> - attr(*, "class")= chr "texta"
# Print results
docsLanguage
#> texta [https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/lan __truncated__ ]
#>
#> -----------------------------------------------------------
#> text name iso6391Name score
#> ----------------------------- ------- ------------- -------
#> The Louvre or the Louvre English en 1
#> Museum is the world's largest
#> museum.
#>
#> Le musee du Louvre est un French fr 1
#> musee d'art et d'antiquites
#> situe au centre de Paris.
#>
#> El Museo del Louvre es el Spanish es 1
#> museo nacional de Francia.
#>
#> Il Museo del Louvre a Parigi, Italian it 1
#> in Francia, e uno dei piu
#> celebri musei del mondo.
#>
#> Der Louvre ist ein Museum in German de 1
#> Paris.
#> -----------------------------------------------------------
}, error = function(err) {
# Print the message of the error that was caught
message(conditionMessage(err))
})
## End(Not run)
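The error-handling pattern in the example above could be factored into a small wrapper. The sketch below is one possible shape, not part of the package. safeDetect and failingDetector are hypothetical names, and the detector is passed in as an argument so the snippet can be run with a stub instead of a live API call.

```r
# Hypothetical helper: runs a detector function and returns NULL on failure,
# reporting the message of the error that was actually caught.
safeDetect <- function(detector, documents, ...) {
  tryCatch(
    detector(documents, ...),
    error = function(err) {
      message("Language detection failed: ", conditionMessage(err))
      NULL
    }
  )
}

# Stub that always fails, standing in for a call made without a valid API key
failingDetector <- function(documents, ...) {
  stop("no API key configured")
}

result <- safeDetect(failingDetector, c("Der Louvre ist ein Museum in Paris."))
is.null(result)
```

With a configured account, the same wrapper would presumably be called as safeDetect(textaDetectLanguages, docsText, numberOfLanguagesToDetect = 1L), so downstream code can test for NULL instead of nesting its own tryCatch.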