mscstexta4r {mscstexta4r} | R Documentation |
R Client for the Microsoft Cognitive Services Text Analytics REST API
Description
mscstexta4r is a client/wrapper/interface for the Microsoft Cognitive Services (MSCS) Text Analytics (Text Analytics) REST API. To use this package, you MUST have a valid account with https://www.microsoft.com/cognitive-services. Once you have an account, Microsoft will provide you with a (free) API key you can use with this package.
The MSCS Text Analytics REST API
Microsoft Cognitive Services – formerly known as Project Oxford – are a set of APIs, SDKs and services that developers can use to add AI features to their apps. Those features include emotion and video detection; facial, speech and vision recognition; as well as speech and NLP.
The Text Analytics REST API provides tools for NLP and is documented at https://www.microsoft.com/cognitive-services/en-us/text-analytics/documentation. This API supports the following operations:
Sentiment analysis - Is a sentence or document generally positive or negative?
Topic detection - What's being discussed across a list of documents/reviews/articles?
Language detection - What language is a document written in?
Key talking points extraction - What's being discussed in a single document?
mscstexta4r Functions
The following mscstexta4r core functions are used to wrap the MSCS Text Analytics REST API:
Sentiment analysis -
textaSentiment
functionTopic detection -
textaDetectTopics
andtextaDetectTopicsStatus
functionsLanguage detection -
textaDetectLanguages
functionExtraction of key talking points -
textaKeyPhrases
function
The textaInit
configuration function is used to set the REST
API URL and the private API key. It needs to be called only once,
after package load, or the core functions will not work properly.
Prerequisites
To use the mscstexta4r R package, you MUST have a valid account with Microsoft Cognitive Services (see https://www.microsoft.com/cognitive-services/en-us/pricing for details). Once you have an account, Microsoft will provide you with an API key listed under your subscriptions. After you've configured mscstexta4r with your API key (as explained in the next section), you will be able to call the Text Analytics REST API from R, up to your maximum number of transactions per month and per minute.
Package Loading and Configuration
After loading the mscstexta4r package with the library()
function,
you must call the textaInit
before you can call any of
the core mscstexta4r functions.
The textaInit
configuration function will first check to see
if the variable MSCS_TEXTANALYTICS_CONFIG_FILE
exists in the system
environment. If it does, the package will use that as the path to the
configuration file.
If MSCS_TEXTANALYTICS_CONFIG_FILE
doesn't exist, it will look for
the file .mscskeys.json
in the current user's home directory (that's
~/.mscskeys.json
on Linux, and something like C:/Users/Phil/Documents/.mscskeys.json
on Windows). If the file is found, the package will load the API key and URL
from it.
If using a file, please make sure it has the following structure:
{ "textanalyticsurl": "https://westus.api.cognitive.microsoft.com/texta/analytics/v2.0/", "textanalyticskey": "...MSCS Text Analytics API key goes here..." }
If no configuration file is found, textaInit
will attempt to
pick up its configuration information from two Sys env variables instead:
MSCS_TEXTANALYTICS_URL
- the URL for the Text Analytics REST API.
MSCS_TEXTANALYTICS_KEY
- your personal Text Analytics REST API key.
Synchronous vs Asynchronous Execution
All but ONE core text analytics functions execute exclusively in
synchronous mode: textaDetectTopics
is the only function that
can be executed either synchronously or asynchronously. Why? Because topic
detection is typically a "batch" operation meant to be performed on thousands
of related documents (product reviews, research articles, etc.).
What's the difference?
When textaDetectTopics
executes synchronously, you must wait
for it to finish before you can move on to the next task. When
textaDetectTopics
executes asynchronously, you can move on to
something else before topic detection has completed. In the latter case, you
will need to call textaDetectTopicsStatus
periodically yourself
until the Microsoft Cognitive Services server complete topic detection and
results become available.
When to run which mode?
If you're performing topic detection in batch mode (from an R script), we
recommend using the textaDetectTopics
function in synchronous
mode, in which case it will return only after topic detection has completed.
IMPORTANT NOTE: If you're calling textaDetectTopics
in
synchronous mode within the R console REPL (interactive mode), it will
appear as if the console has hanged. This is EXPECTED. The function
hasn't crashed. It is simply in "sleep mode", activating itself periodically
and then going back to sleep, until the results have become available. In
sleep mode, even though it appears "stuck", textaDetectTopics
doesn't use any CPU resources. While the function is operating in sleep
mode, you WILL NOT be able to use the console before the function
completes. If you need to operate the console while topic detection is being
performed by the Microsoft Cognitive services servers, you should call
textaDetectTopics
in asynchronous mode and then call
textaDetectTopicsStatus
yourself repeteadly afterwards, until
results are available.
S3 Objects of the Classes texta
and textatopics
The sentiment analysis, language detection, and key talking points extraction
functions of the mscstexta4r package return S3 objects of the class
texta
. The texta
object exposes results collected
in a single dataframe, the REST API JSON response, and the original HTTP
request.
The functions textaDetectTopics
returns a S3 object of the
class textatopics
. The textatopics
object exposes
formatted results using several dataframes (documents and their IDs, topics
and their IDs, which topics are assigned to which documents), the REST API
JSON response (should you care), and the HTTP request (mostly for debugging
purposes).'
Error Handling
The MSCS Text Analytics API is a REST API. HTTP requests over a network and the Internet can fail. Because of congestion, because the web site is down for maintenance, because of firewall configuration issues, etc. There are many possible points of failure.
The API can also fail if you've exhausted your call volume quota or are exceeding the API calls rate limit. Unfortunately, MSCS does not expose an API you can query to check if you're about to exceed your quota for instance. The only way you'll know for sure is by looking at the error code returned after an API call has failed.
To help with error handling, we recommend the systematic use of
tryCatch()
when calling mscstexta4r's core functions. Its
mechanism may appear a bit daunting at first, but it is well documented at http://www.inside-r.org/r-doc/base/signalCondition.
We use it in many of the code examples.
Author(s)
Phil Ferriere pferriere@hotmail.com