model_wavernn {torchaudio} | R Documentation
WaveRNN
Description
WaveRNN model based on the implementation from fatchord. The original architecture was introduced in "Efficient Neural Audio Synthesis". Calling the model passes the input through the WaveRNN network.
Usage
model_wavernn(
upsample_scales,
n_classes,
hop_length,
n_res_block = 10,
n_rnn = 512,
n_fc = 512,
kernel_size = 5,
n_freq = 128,
n_hidden = 128,
n_output = 128
)
Arguments
upsample_scales |
the list of upsample scales. |
n_classes |
the number of output classes. |
hop_length |
the number of samples between the starts of consecutive frames. |
n_res_block |
the number of ResBlocks in the stack. (Default: 10) |
n_rnn |
the dimension of the RNN layer. (Default: 512) |
n_fc |
the dimension of the fully connected layer. (Default: 512) |
kernel_size |
the kernel size of the first Conv1d layer. (Default: 5) |
n_freq |
the number of bins in a spectrogram. (Default: 128) |
n_hidden |
the number of hidden dimensions of the resblock. (Default: 128) |
n_output |
the number of output dimensions of melresnet. (Default: 128) |
Details
forward parameters:
waveform: the input waveform to the WaveRNN layer, with shape (n_batch, 1, (n_time - kernel_size + 1) * hop_length)
specgram: the input spectrogram to the WaveRNN layer, with shape (n_batch, 1, n_freq, n_time)
The waveform and the spectrogram must each have exactly 1 input channel. The product of upsample_scales must equal hop_length.
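The constraint above can be checked in base R before constructing the model. This is only a minimal sketch: `check_upsample_scales` is a hypothetical helper written for illustration and is not part of torchaudio.

```r
# Hypothetical helper (not part of torchaudio): returns TRUE when the product
# of upsample_scales equals hop_length, as model_wavernn() requires.
check_upsample_scales <- function(upsample_scales, hop_length) {
  prod(upsample_scales) == hop_length
}

check_upsample_scales(c(2, 2, 3), hop_length = 12)  # TRUE:  2 * 2 * 3 = 12
check_upsample_scales(c(2, 2, 2), hop_length = 12)  # FALSE: 2 * 2 * 2 = 8
```

Running such a check up front surfaces a mismatch as a simple logical value rather than as a shape error deep inside the forward pass.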
Value
Tensor shape: (n_batch, 1, (n_time - kernel_size + 1) * hop_length, n_classes)
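To make the shape formula concrete, the expected output dimensions can be computed in base R. This is a sketch under the assumption that the formula above holds as stated; `wavernn_output_shape` is a hypothetical helper and not part of torchaudio.

```r
# Hypothetical helper (not part of torchaudio): evaluates the documented
# output shape (n_batch, 1, (n_time - kernel_size + 1) * hop_length, n_classes).
wavernn_output_shape <- function(n_batch, n_time, hop_length, n_classes,
                                 kernel_size = 5) {
  c(n_batch, 1, (n_time - kernel_size + 1) * hop_length, n_classes)
}

# 3 spectrograms with 10 frames each, hop_length = 12, 5 output classes:
wavernn_output_shape(n_batch = 3, n_time = 10, hop_length = 12, n_classes = 5)
# gives c(3, 1, 72, 5), since (10 - 5 + 1) * 12 = 72
```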
Examples
if (torch::torch_is_installed()) {
  # prod(upsample_scales) = 2 * 2 * 3 = 12, which must equal hop_length
  wavernn <- model_wavernn(upsample_scales = c(2, 2, 3), n_classes = 5, hop_length = 12)
  # waveform shape: (n_batch, n_channel, (n_time - kernel_size + 1) * hop_length)
  waveform <- torch::torch_rand(3, 1, (10 - 5 + 1) * 12)
  # spectrogram shape: (n_batch, n_channel, n_freq, n_time)
  spectrogram <- torch::torch_rand(3, 1, 128, 10)
  output <- wavernn(waveform, spectrogram)
}