gl_speech {googleLanguageR}    R Documentation
Call Google Speech API
Description
Turn audio into text
Usage
gl_speech(
audio_source,
encoding = c("LINEAR16", "FLAC", "MULAW", "AMR", "AMR_WB", "OGG_OPUS",
"SPEEX_WITH_HEADER_BYTE"),
sampleRateHertz = NULL,
languageCode = "en-US",
maxAlternatives = 1L,
profanityFilter = FALSE,
speechContexts = NULL,
asynch = FALSE,
customConfig = NULL
)
Arguments
audio_source: File location of audio data, or Google Cloud Storage URI

encoding: Encoding of audio data sent

sampleRateHertz: Sample rate in Hertz of the audio data

languageCode: Language of the supplied audio as a BCP-47 language tag, e.g. "en-US"

maxAlternatives: Maximum number of recognition hypotheses to be returned

profanityFilter: If TRUE, attempts to filter out profanities

speechContexts: An optional character vector of context to assist the speech recognition

asynch: If your audio_source is longer than 60 seconds, set this to TRUE to make an asynchronous request

customConfig: [optional] A list of additional configuration parameters to send to the API
Details
Google Cloud Speech API enables developers to convert audio to text by applying powerful neural network models in an easy-to-use API. The API recognizes over 80 languages and variants, to support your global user base. You can transcribe the text of users dictating to an application's microphone, enable command-and-control through voice, or transcribe audio files, among many other use cases. Recognize audio uploaded in the request, and integrate with your audio storage on Google Cloud Storage, by using the same technology Google uses to power its own products.
Value
A list of two tibbles: $transcript, a tibble of the transcript with a confidence column; and $timings, a tibble containing startTime, endTime and word for each word. If maxAlternatives is greater than 1, the transcript will contain near-duplicate rows with alternative interpretations of the text.

If asynch is TRUE, an operation object is returned instead, which you will need to pass to gl_speech_op to get the finished result.
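When maxAlternatives is greater than 1, one way to pick a single result is to keep the row with the highest confidence. A minimal sketch, assuming $transcript has transcript and confidence columns; a mock data.frame stands in here for a real API result:

```r
# Mock of a $transcript tibble returned with maxAlternatives = 2L
mock_transcript <- data.frame(
  transcript = c("to administer medicine to animals",
                 "to administer medicine to animal"),
  confidence = c(0.98, 0.72),
  stringsAsFactors = FALSE
)

# Keep only the interpretation the API was most confident about
best <- mock_transcript[which.max(mock_transcript$confidence), "transcript"]
best
```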
AudioEncoding
Audio encoding of the data sent in the audio message. All encodings support only 1 channel (mono) audio. Only FLAC and WAV include a header that describes the bytes of audio that follow the header. The other encodings are raw audio bytes with no header. For best results, the audio source should be captured and transmitted using a lossless encoding (FLAC or LINEAR16). Recognition accuracy may be reduced if lossy codecs, which include the other codecs listed in this section, are used to capture or transmit the audio, particularly if background noise is present.
Read more on audio encodings at https://cloud.google.com/speech/docs/encoding
WordInfo
startTime
- Time offset relative to the beginning of the audio, and corresponding to the start of the spoken word.
endTime
- Time offset relative to the beginning of the audio, and corresponding to the end of the spoken word.
word
- The word corresponding to this set of information.
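The per-word fields above can be used to compute word durations. A minimal sketch, assuming startTime and endTime come back as strings like "0.700s" (mock data is used in place of a real $timings result):

```r
# Mock of a $timings tibble: one row per spoken word
timings <- data.frame(
  startTime = c("0s", "0.700s"),
  endTime   = c("0.700s", "1.100s"),
  word      = c("hello", "world"),
  stringsAsFactors = FALSE
)

# Strip the trailing "s" and convert to numeric seconds
to_secs <- function(x) as.numeric(sub("s$", "", x))

# Duration of each word in seconds
timings$duration <- to_secs(timings$endTime) - to_secs(timings$startTime)
timings
```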
See Also
https://cloud.google.com/speech/reference/rest/v1/speech/recognize
Examples
## Not run:
test_audio <- system.file("woman1_wb.wav", package = "googleLanguageR")
result <- gl_speech(test_audio)
result$transcript
result$timings
result2 <- gl_speech(test_audio, maxAlternatives = 2L)
result2$transcript
result_brit <- gl_speech(test_audio, languageCode = "en-GB")
## make an asynchronous API request (mandatory for sound files over 60 seconds)
asynch <- gl_speech(test_audio, asynch = TRUE)
## Send to gl_speech_op() for status or finished result
gl_speech_op(asynch)
## Upload to GCS bucket for long files > 60 seconds
test_gcs <- "gs://mark-edmondson-public-files/googleLanguageR/a-dream-mono.wav"
gcs <- gl_speech(test_gcs, sampleRateHertz = 44100L, asynch = TRUE)
gl_speech_op(gcs)
## Use a custom configuration
my_config <- list(encoding = "LINEAR16",
                  diarizationConfig = list(
                    enableSpeakerDiarization = TRUE,
                    minSpeakerCount = 2,
                    maxSpeakerCount = 3
                  ))
# languageCode is required, so will be added if not in your custom config
gl_speech(test_audio, languageCode = "en-US", customConfig = my_config)
## End(Not run)