R: Transplant formants

transplantFormants {soundgen}

R Documentation

Transplant formants

Description

Takes the general spectral envelope of one sound (donor) and "transplants" it onto another sound (recipient). For biological sounds like speech or animal vocalizations, this has the effect of replacing the formants in the recipient sound while preserving the original intonation and (to some extent) voice quality. Note that the amount of spectral smoothing (specified with freqWindow or blur) is a crucial parameter: too little smoothing, and noise between harmonics will be amplified, creating artifacts; too much, and formants may be missed. The default is to set freqWindow to the estimated median pitch, but this is time-consuming and error-prone, so set it to a reasonable value manually if possible. Also ensure that both sounds have the same sampling rate.

Usage

transplantFormants(
  donor,
  recipient,
  samplingRate = NULL,
  freqWindow = NULL,
  blur = NULL,
  dynamicRange = 80,
  windowLength = 50,
  step = NULL,
  overlap = 90,
  wn = "gaussian",
  zp = 0
)

Arguments

`donor`	the sound that provides the formants (vector, Wave, or file) or the desired spectral filter (matrix) as returned by `getSpectralEnvelope`
`recipient`	the sound that receives the formants (vector, Wave, or file)
`samplingRate`	sampling rate of `donor` and `recipient` (only needed if they are numeric vectors rather than Wave objects or files)
`freqWindow`	the width of the smoothing window (Hz) used to flatten the recipient's spectrum per frame. Defaults to the median pitch of the donor (or of the recipient if donor is a filter matrix). If `blur` is NULL, `freqWindow` also controls the amount of smoothing applied to the donor's spectrogram
`blur`	the amount of Gaussian blur applied to the donor's spectrogram as a faster and more flexible alternative to smoothing it per bin with `freqWindow`. Provide two numbers: frequency (Hz, normally approximately equal to freqWindow), time (ms) (NA / NULL / 0 means no blurring in that dimension). See examples and `spectrogram`
`dynamicRange`	dynamic range, dB. All values more than one dynamicRange under maximum are treated as zero
`windowLength`	length of FFT window, ms
`step`	you can override `overlap` by specifying FFT step, ms (NB: because digital audio is sampled at discrete time intervals of 1/samplingRate, the actual step and thus the time stamps of STFT frames may be slightly different, e.g., 24.98866 instead of 25.0 ms)
`overlap`	overlap between successive FFT frames, %
`wn`	window type accepted by `ftwindow`, currently gaussian, hanning, hamming, bartlett, rectangular, blackman, flattop
`zp`	window length after zero padding, points
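
The relation between `windowLength`, `overlap`, and `step` can be illustrated with a little base-R arithmetic. This is a sketch under two assumptions that the text above does not spell out: that the default step is implied by the overlap as `windowLength * (1 - overlap / 100)`, and that a requested step is truncated to a whole number of samples. The variable names `step_default`, `step_requested`, `step_points`, and `step_actual` are illustrative only.

```r
windowLength <- 50     # FFT window length, ms
overlap <- 90          # overlap between successive frames, %
samplingRate <- 44100  # Hz

# Assumption: with step = NULL, the step is implied by the overlap
step_default <- windowLength * (1 - overlap / 100)  # about 5 ms

# A requested step can only be realized as a whole number of samples
# (assumption: truncation to an integer number of points)
step_requested <- 25                                # ms
step_points <- floor(step_requested / 1000 * samplingRate)
step_actual <- step_points / samplingRate * 1000    # ms
round(step_actual, 5)  # 24.98866
```

At 44100 Hz, 25 ms corresponds to 1102.5 samples, which is why the frame time stamps advance in steps of 1102 samples rather than exactly 25 ms.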

Details

Algorithm: makes spectrograms of both sounds, interpolates and smooths or blurs the donor spectrogram, flattens the recipient spectrogram, multiplies the spectrograms, and transforms the result back into the time domain with inverse STFT.
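
The core of this algorithm can be sketched for a single frame in base R. This toy example is not the soundgen implementation: it skips the STFT framing, time interpolation, and overlap-add, it uses a plain moving average as the smoother, and the floor `eps` is only a loose stand-in for a dynamic range limit. The helpers `smooth_bins` and `harmonics` are hypothetical names introduced for illustration.

```r
sr <- 16000                    # sampling rate, Hz
n <- 1024                      # frame length, points
t <- (0:(n - 1)) / sr
win <- 0.5 - 0.5 * cos(2 * pi * (0:(n - 1)) / n)  # Hann window

# Hypothetical helper: harmonic tone with amplitudes given by amp(frequency)
harmonics <- function(f0, amp) {
  h <- seq_len(floor(sr / 2 / f0) - 1)
  as.numeric(sin(2 * pi * outer(t, f0 * h)) %*% amp(f0 * h))
}
# Recipient: 200 Hz source without formants
recipient <- harmonics(200, function(f) 1 / f)
# Donor: 125 Hz source with a single "formant" near 1500 Hz
donor <- harmonics(125, function(f) exp(-((f - 1500) / 300)^2) + 0.01)

# Hypothetical helper: circular moving average along the frequency axis
smooth_bins <- function(x, width) {
  as.numeric(stats::filter(x, rep(1 / width, width), sides = 2,
                           circular = TRUE))
}
freqWindow <- 200                                  # Hz
width <- 2 * floor(freqWindow / (sr / n) / 2) + 1  # odd number of bins

spec_r <- fft(recipient * win)
spec_d <- fft(donor * win)
env_r <- smooth_bins(Mod(spec_r), width)  # recipient's own envelope
env_d <- smooth_bins(Mod(spec_d), width)  # donor envelope to transplant
eps <- max(env_r) * 1e-4                  # avoid dividing by ~0 (assumption)

# Flatten the recipient spectrum, then impose the donor envelope
spec_out <- spec_r / pmax(env_r, eps) * env_d
out <- Re(fft(spec_out, inverse = TRUE)) / n

# The smoothed spectrum of the result should now peak near the donor formant
env_out <- smooth_bins(Mod(fft(out)), width)
peak_hz <- (which.max(env_out[1:(n / 2)]) - 1) * sr / n
peak_hz
```

Because the recipient's phase and harmonic structure are left untouched and only the magnitude envelope is rescaled, the output keeps the recipient's 200 Hz pitch while its spectral peak moves toward the donor's formant region.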

Examples

## Not run: 
# Objective: take formants from the bleating of a sheep and apply them to a
# synthetic sound with any arbitrary duration, intonation, nonlinearities etc
data(sheep, package = 'seewave')  # import a recording from seewave
playme(sheep)
spectrogram(sheep, osc = TRUE)

recipient = soundgen(
  sylLen = 1200,
  pitch = c(100, 300, 250, 200),
  vibratoFreq = 9, vibratoDep = 1,
  addSilence = 180,
  samplingRate = sheep@samp.rate,  # same as donor
  invalidArgAction = 'ignore')  # force to keep the low samplingRate
playme(recipient, sheep@samp.rate)
spectrogram(recipient, sheep@samp.rate, osc = TRUE)

s1 = transplantFormants(
  donor = sheep,
  recipient = recipient,
  samplingRate = sheep@samp.rate)
playme(s1, sheep@samp.rate)
spectrogram(s1, sheep@samp.rate, osc = TRUE)

# The spectral envelope of s1 will be similar to sheep's on a frequency scale
# determined by freqWindow. Compare the spectra:
par(mfrow = c(1, 2))
seewave::meanspec(sheep, dB = 'max0', alim = c(-50, 20), main = 'Donor')
seewave::meanspec(s1, f = sheep@samp.rate, dB = 'max0',
                  alim = c(-50, 20), main = 'Processed recipient')
par(mfrow = c(1, 1))

# if needed, transplant amplitude envelopes as well:
s2 = transplantEnv(donor = sheep, samplingRateD = sheep@samp.rate,
                   recipient = s1, windowLength = 10)
playme(s2, sheep@samp.rate)
spectrogram(s2, sheep@samp.rate, osc = TRUE)

# using "blur" to apply Gaussian blur to the donor's spectrogram instead of
# smoothing per frame with "freqWindow" (~2.5 times faster)
spectrogram(sheep, blur = c(150, 0))  # preview to select the amount of blur
s1b = transplantFormants(
  donor = sheep,
  recipient = recipient,
  samplingRate = sheep@samp.rate,
  freqWindow = 150,
  blur = c(150, 0))
  # blur: 150 = SD of 150 Hz along the frequency axis,
  #      0 = no smoothing along the time axis
playme(s1b, sheep@samp.rate)
spectrogram(s1b, sheep@samp.rate, osc = TRUE)

# Now we use human formants on sheep source: the sheep asks "why?"
s3 = transplantFormants(
  donor = getSpectralEnvelope(
            nr = 512, nc = 100,  # fairly arbitrary dimensions
            formants = 'uaaai',
            samplingRate = sheep@samp.rate),
  recipient = sheep,
  samplingRate = sheep@samp.rate)
playme(s3, sheep@samp.rate)
spectrogram(s3, sheep@samp.rate, osc = TRUE)

## End(Not run)