transform_mel_spectrogram {torchaudio} | R Documentation |
Mel Spectrogram
Description
Create a mel spectrogram from a raw audio signal. This is a composition of Spectrogram followed by MelScale.
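To make the MelScale half of that composition concrete, the sketch below spaces filter center frequencies evenly on the mel scale using base R only. It assumes the HTK-style formula m = 2595 * log10(1 + f / 700) and treats a NULL f_max as sample_rate / 2; hz_to_mel, mel_to_hz and mel_centers are hypothetical helper names, not torchaudio functions.

```r
# Hypothetical helpers (not part of torchaudio); HTK-style mel formula assumed.
hz_to_mel <- function(f) 2595 * log10(1 + f / 700)
mel_to_hz <- function(m) 700 * (10^(m / 2595) - 1)

mel_centers <- function(n_mels = 128, sample_rate = 16000, f_min = 0, f_max = NULL) {
  # Assumption: a NULL f_max means the Nyquist frequency.
  if (is.null(f_max)) f_max <- sample_rate / 2
  # n_mels + 2 points evenly spaced in mel; the interior points are the centers.
  m_pts <- seq(hz_to_mel(f_min), hz_to_mel(f_max), length.out = n_mels + 2)
  mel_to_hz(m_pts[2:(n_mels + 1)])
}

centers <- mel_centers()
length(centers)       # number of center frequencies (n_mels)
range(diff(centers))  # spacing in Hz widens as frequency increases
```

Because the points are equally spaced in mel rather than in Hz, low frequencies get more filters than high frequencies, which is the reason for applying a mel scale to a spectrogram.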
Usage
transform_mel_spectrogram(
sample_rate = 16000,
n_fft = 400,
win_length = NULL,
hop_length = NULL,
f_min = 0,
f_max = NULL,
pad = 0,
n_mels = 128,
window_fn = torch::torch_hann_window,
power = 2,
normalized = FALSE,
...
)
Arguments
sample_rate |
(int, optional): Sample rate of audio signal. (Default: 16000) |
n_fft |
(int, optional): Size of FFT, creates n_fft // 2 + 1 bins. (Default: 400) |
win_length |
(int or NULL, optional): Window size. (Default: n_fft) |
hop_length |
(int or NULL, optional): Length of hop between STFT windows. (Default: win_length // 2) |
f_min |
(float, optional): Minimum frequency. (Default: 0) |
f_max |
(float or NULL, optional): Maximum frequency. (Default: NULL) |
pad |
(int, optional): Two-sided padding of signal. (Default: 0) |
n_mels |
(int, optional): Number of mel filterbanks. (Default: 128) |
window_fn |
(function, optional): A function that creates the window tensor
multiplied with each frame. (Default: torch::torch_hann_window) |
power |
(float, optional): Power of the norm. (Default: 2) |
normalized |
(logical): Whether to normalize by magnitude after STFT. (Default: FALSE) |
... |
(optional): Arguments for window function. |
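The way the NULL defaults interact can be sketched in base R. The fallback rules used here (win_length falls back to n_fft, hop_length to half of win_length) follow the usual torchaudio convention and should be read as assumptions; resolve_stft_params is a hypothetical helper, not part of the package.

```r
# Hypothetical helper (not part of torchaudio) that fills in NULL defaults.
resolve_stft_params <- function(n_fft = 400, win_length = NULL, hop_length = NULL) {
  if (is.null(win_length)) win_length <- n_fft             # assumed fallback
  if (is.null(hop_length)) hop_length <- win_length %/% 2  # assumed fallback
  list(
    win_length = win_length,
    hop_length = hop_length,
    n_freq = n_fft %/% 2 + 1  # number of one-sided FFT bins
  )
}

params <- resolve_stft_params()
str(params)  # win_length 400, hop_length 200, n_freq 201
```

With the defaults, the 201 frequency bins of the intermediate spectrogram are then reduced to n_mels = 128 mel bands.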
Details
The forward method of the returned transform takes one parameter: waveform (Tensor), a tensor of audio of dimension (..., time).
Value
tensor: Mel frequency spectrogram of size (..., n_mels, time).
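As a rough guide to the output size, the sketch below estimates the shape of the result from the input shape in base R. It assumes a centered STFT, so that the number of frames is (samples + 2 * pad) %/% hop_length + 1, along with the same NULL fallbacks as above; mel_spectrogram_shape is a hypothetical helper and the real transform may differ at the edges.

```r
# Hypothetical helper (not part of torchaudio); assumes a centered STFT.
mel_spectrogram_shape <- function(input_shape, n_fft = 400, hop_length = NULL,
                                  win_length = NULL, pad = 0, n_mels = 128) {
  if (is.null(win_length)) win_length <- n_fft
  if (is.null(hop_length)) hop_length <- win_length %/% 2
  n_dims <- length(input_shape)
  n_samples <- input_shape[n_dims] + 2 * pad  # two-sided padding
  n_frames <- n_samples %/% hop_length + 1    # centered-STFT frame count
  # Leading dimensions are kept; time is replaced by (n_mels, n_frames).
  c(input_shape[-n_dims], n_mels, n_frames)
}

# One channel, one second of audio at 16 kHz.
mel_spectrogram_shape(c(1, 16000))  # 1 128 81
```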
Sources
- https://timsainb.github.io/spectrograms-mfccs-and-inversion-in-python.html
- https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html
Examples
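## Not run:
library(torchaudio)

if (torch::torch_is_installed()) {
  # Load the sample MP3 shipped with torchaudio and convert it to a tensor.
  mp3_path <- system.file("sample_audio_1.mp3", package = "torchaudio")
  sample_mp3 <- transform_to_tensor(tuneR_loader(mp3_path))
  # sample_mp3[[1]] is the waveform, sample_mp3[[2]] the sample rate.
  # The result has dimensions (channel, n_mels, time).
  mel_specgram <- transform_mel_spectrogram(sample_rate = sample_mp3[[2]])(sample_mp3[[1]])
}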
## End(Not run)