weblmCalculateJointProbability {mscsweblm4r}    R Documentation

Calculates the joint probability that a sequence of words will appear together.

Description

This function calculates the joint probability that a particular sequence of words will appear together. The input string must be in ASCII format.

Internally, this function invokes the Microsoft Cognitive Services Web Language Model REST API documented at https://www.microsoft.com/cognitive-services/en-us/web-language-model-api/documentation.

You MUST have a valid Microsoft Cognitive Services account and an API key for this function to work properly. See https://www.microsoft.com/cognitive-services/en-us/pricing for details.

Usage

weblmCalculateJointProbability(inputWords, modelToUse = "body",
  orderOfNgram = 5L)

Arguments

inputWords

(character vector) Vector of character strings for which to calculate the joint probability. Must be in ASCII format.

modelToUse

(character) Which language model to use, supported values: "title", "anchor", "query", or "body" (optional, default: "body")

orderOfNgram

(integer) Which order of N-gram to use, supported values: 1L, 2L, 3L, 4L, or 5L (optional, default: 5L)
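
The argument constraints above can be checked locally before any request is sent. The following is a minimal sketch in base R. `checkWeblmArgs` is a hypothetical helper, not part of mscsweblm4r, and the package may well perform its own validation internally.

```r
# Hypothetical helper (NOT part of mscsweblm4r): checks arguments against
# the constraints documented above before the API is called.
checkWeblmArgs <- function(inputWords, modelToUse = "body", orderOfNgram = 5L) {
  stopifnot(is.character(inputWords), length(inputWords) > 0L, !anyNA(inputWords))

  # A string is ASCII if every one of its code points is below 128
  isAscii <- vapply(
    inputWords,
    function(s) all(utf8ToInt(s) < 128L),
    logical(1),
    USE.NAMES = FALSE
  )
  if (!all(isAscii)) {
    stop("Non-ASCII input: ", paste(inputWords[!isAscii], collapse = ", "))
  }

  # match.arg() raises an error if modelToUse is not a supported value
  # (note that it also accepts unambiguous partial matches such as "bod")
  modelToUse <- match.arg(modelToUse, c("title", "anchor", "query", "body"))

  if (!is.integer(orderOfNgram) || length(orderOfNgram) != 1L ||
      !(orderOfNgram %in% 1:5)) {
    stop("orderOfNgram must be one of 1L, 2L, 3L, 4L, or 5L")
  }

  invisible(TRUE)
}

# Passes silently for valid input
checkWeblmArgs(c("where is", "San Francisco"), modelToUse = "query", orderOfNgram = 4L)
```

Failing early in this way avoids spending an API call on input the service is documented not to support.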

Value

An S3 object of class weblm. The results are stored in the results data frame inside this object. The data frame contains one row per input word sequence, with the sequence and its log probability.
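
Because the values are log probabilities, they combine by addition and subtraction. The sketch below works on a mock of the results data frame built from the values shown in the Examples section, so no API call is made. The base of the logarithm is not stated on this page; base 10 is an assumption here, and the chain-rule reading of the joint probabilities is an illustration rather than something the package itself provides.

```r
# Mock of the `results` data frame, using values printed in the Examples
# section of this page (no API call is made here).
results <- data.frame(
  words = c("san", "francisco", "san francisco"),
  probability = c(-3.292, -4.051, -4.086),
  stringsAsFactors = FALSE
)

# ASSUMPTION: the log values are base 10; change `base` if the service differs.
base <- 10
results$linear <- base^results$probability

# Named vector for looking up a log probability by word sequence
logP <- setNames(results$probability, results$words)

# Chain rule in log space:
#   log P(francisco | san) = log P(san francisco) - log P(san)
logCond <- logP[["san francisco"]] - logP[["san"]]

# If the two words were independent, the joint log probability would be
#   log P(san) + log P(francisco)
logIndep <- logP[["san"]] + logP[["francisco"]]

print(results)
cat("log P(francisco | san):  ", round(logCond, 3), "\n")
cat("log P under independence:", round(logIndep, 3), "\n")
```

With these values the conditional log probability is -0.794, and the observed joint value of -4.086 is far higher than the -7.343 that independence would predict, which is what one would expect for a common two-word name.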

Author(s)

Phil Ferriere pferriere@hotmail.com

Examples

## Not run: 
 tryCatch({

   # Calculate the joint probability that a particular sequence of words will appear together
   jointProbabilities <- weblmCalculateJointProbability(
     inputWords = c("where", "is", "San", "Francisco", "where is",
                    "San Francisco", "where is San Francisco"),  # ASCII only
     modelToUse = "query",                     # "title"|"anchor"|"query"|"body"(default)
     orderOfNgram = 4L                         # 1L|2L|3L|4L|5L(default)
   )

   # Class and structure of jointProbabilities
   class(jointProbabilities)
   #> [1] "weblm"

   str(jointProbabilities, max.level = 1)
   #> List of 3
   #>  $ results:'data.frame':  7 obs. of  2 variables:
   #>  $ json   : chr "{"results":[{"words":"where","probability":-3.378}, __truncated__ ]}
   #>  $ request:List of 7
   #>   ..- attr(*, "class")= chr "request"
   #>  - attr(*, "class")= chr "weblm"

   # Print results
   pander::pandoc.table(jointProbabilities$results)  # requires the pander package
   #> ------------------------------------
   #>         words           probability
   #> ---------------------- -------------
   #>         where             -3.378
   #>
   #>           is              -2.607
   #>
   #>          san              -3.292
   #>
   #>       francisco           -4.051
   #>
   #>        where is           -3.961
   #>
   #>     san francisco         -4.086
   #>
   #> where is san francisco    -7.998
   #> ------------------------------------

 }, error = function(err) {

   # Print the message carried by the caught condition
   message(conditionMessage(err))

 })

## End(Not run)

[Package mscsweblm4r version 0.1.2 Index]