gl_speech {googleLanguageR}    R Documentation
Call Google Speech API
Description
Turn audio into text
Usage
gl_speech(
audio_source,
encoding = c("LINEAR16", "FLAC", "MULAW", "AMR", "AMR_WB", "OGG_OPUS",
"SPEEX_WITH_HEADER_BYTE"),
sampleRateHertz = NULL,
languageCode = "en-US",
maxAlternatives = 1L,
profanityFilter = FALSE,
speechContexts = NULL,
asynch = FALSE,
customConfig = NULL
)
Arguments
audio_source: File location of audio data, or Google Cloud Storage URI

encoding: Encoding of audio data sent

sampleRateHertz: Sample rate in Hertz of the audio data

languageCode: Language of the supplied audio as a BCP-47 language tag, e.g. "en-US"

maxAlternatives: Maximum number of recognition hypotheses to be returned

profanityFilter: If TRUE, attempts to filter out profanities

speechContexts: An optional character vector of context to assist the speech recognition

asynch: If your audio_source is longer than 60 seconds, set this to TRUE to make an asynchronous request

customConfig: [optional] A list of additional configuration parameters to send to the API
Details
Google Cloud Speech API enables developers to convert audio to text by applying powerful neural network models in an easy-to-use API. The API recognizes over 80 languages and variants, to support your global user base. You can transcribe the text of users dictating to an application's microphone, enable command-and-control through voice, or transcribe audio files, among many other use cases. Recognize audio uploaded in the request, and integrate with your audio storage on Google Cloud Storage, by using the same technology Google uses to power its own products.
Value
A list of two tibbles: $transcript, a tibble of the transcript with a confidence column; and $timings, a tibble containing startTime, endTime and word for each word. If maxAlternatives is greater than 1, the transcript will contain near-duplicate rows with alternative interpretations of the text.

If asynch is TRUE, an operation object is returned instead, which you will need to pass to gl_speech_op to get the finished result.
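When maxAlternatives is greater than 1, one way to pick a single result is to keep the row with the highest confidence. A minimal sketch, assuming $transcript has transcript and confidence columns; a mock data.frame stands in here for a real API result:

```r
# Mock of a $transcript tibble returned with maxAlternatives = 2L
mock_transcript <- data.frame(
  transcript = c("to administer medicine to animals",
                 "to administer medicine to animal"),
  confidence = c(0.98, 0.72),
  stringsAsFactors = FALSE
)

# Keep only the interpretation the API was most confident about
best <- mock_transcript[which.max(mock_transcript$confidence), "transcript"]
best
```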
AudioEncoding
Audio encoding of the data sent in the audio message. All encodings support only 1 channel (mono) audio. Only FLAC and WAV include a header that describes the bytes of audio that follow the header. The other encodings are raw audio bytes with no header. For best results, the audio source should be captured and transmitted using a lossless encoding (FLAC or LINEAR16). Recognition accuracy may be reduced if lossy codecs, which include the other codecs listed in this section, are used to capture or transmit the audio, particularly if background noise is present.
Read more on audio encodings at https://cloud.google.com/speech/docs/encoding
WordInfo
startTime
- Time offset relative to the beginning of the audio, and corresponding to the start of the spoken word.
endTime
- Time offset relative to the beginning of the audio, and corresponding to the end of the spoken word.
word
- The word corresponding to this set of information.
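The per-word fields above can be used to compute word durations. A minimal sketch, assuming startTime and endTime come back as strings like "0.700s" (mock data is used in place of a real $timings result):

```r
# Mock of a $timings tibble: one row per spoken word
timings <- data.frame(
  startTime = c("0s", "0.700s"),
  endTime   = c("0.700s", "1.100s"),
  word      = c("hello", "world"),
  stringsAsFactors = FALSE
)

# Strip the trailing "s" and convert to numeric seconds
to_secs <- function(x) as.numeric(sub("s$", "", x))

# Duration of each word in seconds
timings$duration <- to_secs(timings$endTime) - to_secs(timings$startTime)
timings
```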
See Also
https://cloud.google.com/speech/reference/rest/v1/speech/recognize
Examples
## Not run:
test_audio <- system.file("woman1_wb.wav", package = "googleLanguageR")
result <- gl_speech(test_audio)
result$transcript
result$timings
result2 <- gl_speech(test_audio, maxAlternatives = 2L)
result2$transcript
result_brit <- gl_speech(test_audio, languageCode = "en-GB")
## make an asynchronous API request (mandatory for sound files over 60 seconds)
asynch <- gl_speech(test_audio, asynch = TRUE)
## Send to gl_speech_op() for status or finished result
gl_speech_op(asynch)
## Upload to GCS bucket for long files > 60 seconds
test_gcs <- "gs://mark-edmondson-public-files/googleLanguageR/a-dream-mono.wav"
gcs <- gl_speech(test_gcs, sampleRateHertz = 44100L, asynch = TRUE)
gl_speech_op(gcs)
## Use a custom configuration
my_config <- list(encoding = "LINEAR16",
                  diarizationConfig = list(
                    enableSpeakerDiarization = TRUE,
                    minSpeakerCount = 2,
                    maxSpeakerCount = 3
                  ))
# languageCode is required, so will be added if not in your custom config
gl_speech(test_audio, languageCode = "en-US", customConfig = my_config)
## End(Not run)