
mscsweblm4r {mscsweblm4r}

R Documentation

R Client for the Microsoft Cognitive Services Web Language Model REST API

Description

mscsweblm4r is an R client for the Microsoft Cognitive Services (MSCS) Web Language Model (Web LM) REST API. To use this package, you MUST have a valid account with https://www.microsoft.com/cognitive-services. Once you have an account, Microsoft will provide you with a (free) API key to use with this package.

The MSCS Web LM REST API

Microsoft Cognitive Services (formerly known as Project Oxford) are a set of APIs, SDKs and services that developers can use to add AI features to their apps. Those features include emotion and video detection; facial, speech and vision recognition; and speech and language understanding.

The Web Language Model REST API provides tools for natural language processing and is documented at https://www.microsoft.com/cognitive-services/en-us/web-language-model-api/documentation. Per Microsoft's website, this API uses smoothed backoff N-gram language models (supporting Markov order up to 5) that were trained on four web-scale American English corpora collected by Bing (web page body, title, anchor and query).

The MSCS Web LM REST API supports the following lookup operations:

Insert spaces into a string of words adjoined together without any spaces (hashtags, URLs, etc.).
Calculate the joint probability that a sequence of words will appear together.
Compute the conditional probability that a specific word will follow an existing sequence of words.
Get the list of words (completions) most likely to follow a given sequence of words.
Retrieve the list of supported language models.
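The joint and conditional lookups are related in the usual way for N-gram models. As a generic sketch (Microsoft's exact smoothing and backoff weights are not documented here, so this is the textbook Katz-style form, not necessarily what the service computes), write $w_a^b$ for the word sequence $w_a, \ldots, w_b$ and let $n$ be the N-gram order. The joint probability of a sequence is approximated by a product of conditional probabilities, and a backoff model falls back to a shorter history whenever an N-gram was never observed in training:

```latex
% Chain rule under an order-n Markov (N-gram) assumption;
% history indices below 1 are simply dropped at the start of the sequence.
P(w_1^m) \approx \prod_{i=1}^{m} P\left(w_i \mid w_{i-n+1}^{i-1}\right)

% Katz-style backoff: P^* is the smoothed estimate, c(.) a training count,
% and \alpha a history-dependent weight that keeps the distribution normalized.
P_{\mathrm{bo}}\left(w_i \mid w_{i-n+1}^{i-1}\right) =
\begin{cases}
  P^{*}\left(w_i \mid w_{i-n+1}^{i-1}\right) & \text{if } c\left(w_{i-n+1}^{i}\right) > 0, \\
  \alpha\left(w_{i-n+1}^{i-1}\right) \, P_{\mathrm{bo}}\left(w_i \mid w_{i-n+2}^{i-1}\right) & \text{otherwise.}
\end{cases}
```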

mscsweblm4r Functions

The following five mscsweblm4r core functions are used to wrap the MSCS Web LM REST API:

Word breaking - weblmBreakIntoWords function
Joint probability - weblmCalculateJointProbability function
Conditional probability - weblmCalculateConditionalProbability function
Sequence completions - weblmGenerateNextWords function
Models list - weblmListAvailableModels function

The weblmInit configuration function sets the REST API URL and your private API key. It must be called once, after the package is loaded; otherwise the core functions will not work properly.
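A minimal end-to-end sketch of the five core functions follows. It assumes the package is installed, that an API key is configured as described under Package Loading and Configuration, and that the network is reachable. Only the first argument(s) are passed positionally, and that argument order reflects our reading of the package help pages, so check ?weblmBreakIntoWords and the other help pages for the model and N-gram order options. The wrapper name runWeblmDemo is hypothetical.

```r
# Sketch only: needs mscsweblm4r, a valid API key, and network access.
# runWeblmDemo is a hypothetical helper name, not part of the package.
runWeblmDemo <- function() {
  if (!requireNamespace("mscsweblm4r", quietly = TRUE)) {
    message("mscsweblm4r is not installed; skipping demo")
    return(invisible(FALSE))
  }

  # Load the URL and API key (config file or environment variables)
  mscsweblm4r::weblmInit()

  # Models list
  print(mscsweblm4r::weblmListAvailableModels())
  # Word breaking
  print(mscsweblm4r::weblmBreakIntoWords("testforwordbreak"))
  # Joint probability of each word sequence
  print(mscsweblm4r::weblmCalculateJointProbability(c("where", "is", "San Francisco")))
  # Conditional probability of each continuation given the preceding words
  print(mscsweblm4r::weblmCalculateConditionalProbability("hello world wide", c("web", "range")))
  # Sequence completions
  print(mscsweblm4r::weblmGenerateNextWords("how are you"))

  invisible(TRUE)
}
```

Calling runWeblmDemo() from an interactive session prints each weblm result in turn; in practice each call would also be wrapped in tryCatch(), as recommended under Error Handling.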

Package Loading and Configuration

After loading the mscsweblm4r package with the library() function, you must call the weblmInit function before calling any of the core mscsweblm4r functions.

The weblmInit configuration function first checks whether the environment variable MSCS_WEBLANGUAGEMODEL_CONFIG_FILE is set. If it is, the package uses its value as the path to the configuration file.

If MSCS_WEBLANGUAGEMODEL_CONFIG_FILE is not set, weblmInit looks for the file .mscskeys.json in the current user's home directory (~/.mscskeys.json on Linux, and something like C:/Users/Phil/Documents/.mscskeys.json on Windows). If the file is found, the package loads the API key and URL from it.
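To see which configuration file would be picked up, the lookup order described above can be mirrored in a few lines of base R. This is a minimal sketch of that order, not the package's own code; the helper name findWeblmConfigFile is hypothetical.

```r
# Hypothetical helper (not part of mscsweblm4r): mirrors the documented lookup
# order and returns the config file path weblmInit would try, or NULL.
findWeblmConfigFile <- function() {
  # 1. An explicit path in MSCS_WEBLANGUAGEMODEL_CONFIG_FILE takes precedence
  envPath <- Sys.getenv("MSCS_WEBLANGUAGEMODEL_CONFIG_FILE", unset = "")
  if (nzchar(envPath)) {
    return(envPath)
  }
  # 2. Otherwise look for .mscskeys.json in the user's home directory
  #    (in R on Windows, "~" usually expands to the Documents folder)
  homePath <- path.expand("~/.mscskeys.json")
  if (file.exists(homePath)) {
    return(homePath)
  }
  # 3. No file found: configuration must come from the URL/key environment variables
  NULL
}

findWeblmConfigFile()
```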

If using a file, please make sure it has the following structure:

{
  "weblanguagemodelurl": "https://api.projectoxford.ai/text/weblm/v1.0/",
  "weblanguagemodelkey": "...MSCS Web Language Model API key goes here..."
}
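One way to create such a file from R is sketched below. It writes to a temporary file so that an existing ~/.mscskeys.json is not overwritten, then points MSCS_WEBLANGUAGEMODEL_CONFIG_FILE at it for the current session. The key value "YOUR_API_KEY" is a hypothetical placeholder for your own key.

```r
# Sketch: write a configuration file with the structure shown above.
# For a permanent setup you would typically use "~/.mscskeys.json" instead.
cfgPath <- tempfile(fileext = ".json")
apiKey  <- "YOUR_API_KEY"  # hypothetical placeholder, not a real key

writeLines(c(
  "{",
  '  "weblanguagemodelurl": "https://api.projectoxford.ai/text/weblm/v1.0/",',
  sprintf('  "weblanguagemodelkey": "%s"', apiKey),
  "}"
), cfgPath)

# weblmInit checks this variable first, so it will use the file written above
Sys.setenv(MSCS_WEBLANGUAGEMODEL_CONFIG_FILE = cfgPath)
```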

If no configuration file is found, weblmInit attempts to read its configuration from two environment variables instead:

MSCS_WEBLANGUAGEMODEL_URL - the URL for the Web LM REST API.

MSCS_WEBLANGUAGEMODEL_KEY - your personal Web LM REST API key.
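A minimal sketch of the environment-variable route for the current R session is shown below; "YOUR_API_KEY" is a hypothetical placeholder. Keep in mind that a ~/.mscskeys.json file, if present, is still found first. To make the settings persistent, the same NAME=value pairs can instead go in your ~/.Renviron file, which R reads at startup.

```r
# Sketch: configure mscsweblm4r through environment variables instead of a file.
# "YOUR_API_KEY" is a hypothetical placeholder for your own key.
Sys.setenv(
  MSCS_WEBLANGUAGEMODEL_URL = "https://api.projectoxford.ai/text/weblm/v1.0/",
  MSCS_WEBLANGUAGEMODEL_KEY = "YOUR_API_KEY"
)

# Clear any config-file override for this session so it cannot take precedence
Sys.unsetenv("MSCS_WEBLANGUAGEMODEL_CONFIG_FILE")

# Then load and initialize the package:
# library(mscsweblm4r)
# weblmInit()
```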

S3 Object of the Class weblm

The five core functions of the mscsweblm4r package return S3 objects of the class weblm. Those objects expose formatted results, the REST API JSON response, and the HTTP request.
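The sketch below shows one way to check for the weblm class and pull out the formatted results. It assumes the results are stored in an element named results; that name and the helper weblmResults are assumptions, so run str() on a real object before relying on them. The mock object is made up for illustration and contains no real API output.

```r
# Hypothetical helper (not part of mscsweblm4r): return the formatted results
# stored in a weblm object. The element name "results" is an assumption.
weblmResults <- function(x) {
  if (!inherits(x, "weblm")) {
    stop("expected an object of class 'weblm'")
  }
  if (!"results" %in% names(x)) {
    stop("no 'results' element found; inspect str(x)")
  }
  x$results
}

# Mock object for illustration only; real ones are returned by the core functions.
mockResult <- structure(
  list(
    results = data.frame(words = "test for word break", probability = NA_real_),
    json    = "{}",
    request = NULL
  ),
  class = "weblm"
)

weblmResults(mockResult)
```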

Error Handling

The MSCS Web LM API is a REST API, and HTTP requests sent over a network can fail because of congestion, server downtime for maintenance, firewall configuration issues, and so on.

The API can also fail if you have exhausted your call volume quota or are exceeding the API call rate limit. Unfortunately, MSCS does not expose an API you can query to check, for instance, whether you are about to exceed your quota. The only way to know for sure is to look at the error code returned after an API call has failed.
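Since failures only surface after the fact, one option for transient problems such as congestion or a brief rate-limit hit is to retry with an increasing delay. This is a minimal sketch with a hypothetical helper, withRetries; it does not inspect error codes, and retrying will not help once a call volume quota is exhausted.

```r
# Hypothetical helper (not part of mscsweblm4r): call f(), and if it signals an
# error, wait and try again with exponentially growing delays.
withRetries <- function(f, maxAttempts = 3L, baseDelay = 1) {
  stopifnot(maxAttempts >= 1L)
  for (attempt in seq_len(maxAttempts)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)  # success
    }
    if (attempt < maxAttempts) {
      # Waits baseDelay, 2 * baseDelay, 4 * baseDelay, ... seconds
      Sys.sleep(baseDelay * 2^(attempt - 1))
    }
  }
  stop(result)  # re-signal the last error once all attempts have failed
}

# Usage (requires an initialized mscsweblm4r session):
# withRetries(function() weblmBreakIntoWords("testforwordbreak"))
```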

To help with error handling, we recommend systematically wrapping calls to mscsweblm4r's core functions in tryCatch(). Its mechanism may appear a bit daunting at first, but it is well documented at http://www.inside-r.org/r-doc/base/signalCondition. We use it in many of the code examples.
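A minimal tryCatch() pattern is sketched below: the call is passed as a function, any error is reported with message(), and NULL is returned instead of aborting the script. The wrapper name safeWeblmCall is hypothetical.

```r
# Hypothetical wrapper (not part of mscsweblm4r): evaluate f() and, if it
# signals an error, report the message and return NULL instead of stopping.
safeWeblmCall <- function(f) {
  tryCatch(
    f(),
    error = function(e) {
      message("Web LM call failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Usage (requires an initialized mscsweblm4r session):
# res <- safeWeblmCall(function() weblmBreakIntoWords("testforwordbreak"))
# if (!is.null(res)) print(res)
```

Returning NULL lets downstream code test for failure with is.null() without stopping a longer batch of calls.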

Author(s)

Phil Ferriere pferriere@hotmail.com

[Package mscsweblm4r version 0.1.2 Index]