modulationSpectrum {soundgen} | R Documentation |
Modulation spectrum
Description
Produces a modulation spectrum of waveform(s) or audio file(s), with temporal
modulation along the X axis (Hz) and spectral modulation (1/KHz) along the Y
axis. A good visual analogy is decomposing the spectrogram into a sum of
ripples of various frequencies and directions. Roughness is calculated as the
proportion of energy / amplitude of the modulation spectrum within roughRange
of temporal modulation frequencies. The frequency of amplitude modulation
(amMsFreq, Hz) is calculated as the highest peak in the smoothed AM function,
and its purity (amMsPurity, dB) as the ratio of this peak to the median AM
over amRange. For relatively short and steady sounds, set amRes = NULL and
analyze the entire sound. For longer sounds and when roughness or AM vary
over time, set amRes to get multiple measurements over time (see examples).
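As a rough illustration of the roughness measure described above, the sketch below computes the share of a toy modulation spectrum that falls within roughRange. The matrix values and the helper rough_share() are hypothetical, and counting both negative and positive temporal modulation frequencies is an assumption rather than a statement about soundgen's internals.

```r
# Toy modulation spectrum (made-up values):
# rows = spectral modulation (cycles/KHz), columns = temporal modulation (Hz)
am_freqs <- seq(-200, 200, by = 10)
fm_freqs <- seq(0, 4, by = 0.5)
ms_toy <- outer(fm_freqs, am_freqs,
                function(fm, am) exp(-fm) * exp(-abs(am) / 50))
dimnames(ms_toy) <- list(fm_freqs, am_freqs)

# Hypothetical helper: percentage of the total sum that lies within
# roughRange of temporal modulation frequencies (both signs counted)
rough_share <- function(ms, roughRange = c(30, 150)) {
  am <- abs(as.numeric(colnames(ms)))
  in_range <- am >= roughRange[1] & am <= roughRange[2]
  sum(ms[, in_range]) / sum(ms) * 100
}

roughness_toy <- rough_share(ms_toy)            # between 0 and 100 %
roughness_all <- rough_share(ms_toy, c(0, Inf)) # the whole spectrum
```

Widening roughRange can only increase this share, which is one way to sanity-check a custom range.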
Usage
modulationSpectrum(
x,
samplingRate = NULL,
scale = NULL,
from = NULL,
to = NULL,
amRes = 5,
maxDur = 5,
logSpec = FALSE,
windowLength = 15,
step = NULL,
overlap = 80,
wn = "hanning",
zp = 0,
power = 1,
normalize = TRUE,
roughRange = c(30, 150),
amRange = c(10, 200),
returnMS = TRUE,
returnComplex = FALSE,
summaryFun = c("mean", "median", "sd"),
averageMS = FALSE,
reportEvery = NULL,
cores = 1,
plot = TRUE,
savePlots = NULL,
logWarpX = NULL,
logWarpY = NULL,
quantiles = c(0.5, 0.8, 0.9),
kernelSize = 5,
kernelSD = 0.5,
colorTheme = c("bw", "seewave", "heat.colors", "...")[1],
col = NULL,
main = NULL,
xlab = "Hz",
ylab = "1/KHz",
xlim = NULL,
ylim = NULL,
width = 900,
height = 500,
units = "px",
res = NA,
...
)
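With the defaults shown above (windowLength = 15, overlap = 80), the implied step between FFT frames can be worked out as follows. This is a minimal sketch assuming the usual relation step = windowLength * (1 - overlap / 100); the helper names are hypothetical and not part of soundgen.

```r
# Hypothetical helpers: convert between overlap (%) and step (ms)
overlap_to_step <- function(windowLength, overlap) {
  windowLength * (1 - overlap / 100)
}
step_to_overlap <- function(windowLength, step) {
  100 * (1 - step / windowLength)
}

step_default  <- overlap_to_step(windowLength = 15, overlap = 80) # about 3 ms
overlap_none  <- step_to_overlap(windowLength = 50, step = 50)    # frames do not overlap
overlap_dense <- step_to_overlap(windowLength = 5, step = 1)      # about 80 %

# Window length in points at an assumed sampling rate of 16000 Hz
window_points <- round(15 / 1000 * 16000)
```

Under this relation, windowLength = 15 with step = 3 corresponds to the default overlap of 80%.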
Arguments
x |
path to a folder, one or more wav or mp3 files c('file1.wav', 'file2.mp3'), Wave object, numeric vector, or a list of Wave objects or numeric vectors |
samplingRate |
sampling rate of |
scale |
maximum possible amplitude of input used for normalization of
input vector (only needed if |
from , to |
if NULL (default), analyzes the whole sound, otherwise from...to (s) |
amRes |
target resolution of amplitude modulation, Hz. If |
maxDur |
sounds longer than |
logSpec |
if TRUE, the spectrogram is log-transformed prior to taking 2D FFT |
windowLength |
length of FFT window, ms |
step |
you can override |
overlap |
overlap between successive FFT frames, % |
wn |
window type accepted by |
zp |
window length after zero padding, points |
power |
raise modulation spectrum to this power (eg power = 2 for ^2, or "power spectrum") |
normalize |
if TRUE, the modulation spectrum of each analyzed fragment
|
roughRange |
the range of temporal modulation frequencies that constitute the "roughness" zone, Hz |
amRange |
the range of temporal modulation frequencies that we are interested in as "amplitude modulation" (AM), Hz |
returnMS |
if FALSE, only roughness is returned (much faster) |
returnComplex |
if TRUE, returns a complex modulation spectrum (without normalization and warping) |
summaryFun |
functions used to summarize each acoustic characteristic, eg "c('mean', 'sd')"; user-defined functions are fine (see examples); NAs are omitted automatically for mean/median/sd/min/max/range/sum, otherwise take care of NAs yourself |
averageMS |
if TRUE, the modulation spectra of all inputs are averaged into a single output; if FALSE, a separate MS is returned for each input |
reportEvery |
when processing multiple inputs, report estimated time left every ... iterations (NULL = default, NA = don't report) |
cores |
number of cores for parallel processing |
plot |
if TRUE, plots the modulation spectrum of each sound |
savePlots |
if a valid path is specified, a plot is saved in this folder (defaults to NULL) |
logWarpX , logWarpY |
numeric vector of length 2: c(sigma, base) of pseudolog-warping the modulation spectrum, as in function pseudo_log_trans() from the "scales" package |
quantiles |
labeled contour values, % (e.g., "50" marks regions that contain 50% of the sum total of the entire modulation spectrum) |
kernelSize |
the size of Gaussian kernel used for smoothing (1 = no smoothing) |
kernelSD |
the SD of Gaussian kernel used for smoothing, relative to its size |
colorTheme |
black and white ('bw'), as in seewave package ('seewave'),
or any palette from |
col |
actual colors, eg rev(rainbow(100)) - see ?hcl.colors for colors in base R (overrides colorTheme) |
xlab , ylab , main , xlim , ylim |
graphical parameters |
width , height , units , res |
parameters passed to
|
... |
other graphical parameters passed on to |
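To make the c(sigma, base) convention of logWarpX and logWarpY more concrete, here is a small base-R sketch of the pseudo-log transform used by pseudo_log_trans() in the "scales" package. The helper pseudo_log() is hypothetical, and how soundgen applies the transform to the axes is not shown here.

```r
# Hypothetical helper reproducing the pseudo-log transform:
# roughly linear near zero, roughly logarithmic for large |x|, and
# defined for zero and negative values (unlike a plain logarithm)
pseudo_log <- function(x, sigma = 1, base = exp(1)) {
  asinh(x / (2 * sigma)) / log(base)
}

# Temporal modulation frequencies (Hz) on both sides of zero
am <- c(-100, -10, -1, 0, 1, 10, 100)
warped <- pseudo_log(am, sigma = 2, base = 2)

# For large values the result approaches log(x / sigma) in the chosen base
far <- pseudo_log(1e6, sigma = 2, base = 2)
```

Because the transform is odd and monotonic, negative and positive modulation frequencies stay mirrored around zero after warping.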
Details
Algorithm: prepare a spectrogram, take its logarithm (if logSpec = TRUE),
center, perform a 2D Fourier transform (see also spectral::spec.fft()), take
the upper half of the resulting symmetric matrix, and raise it to power. The
result is returned as $original. For plotting purposes, the modulation matrix
can be smoothed with Gaussian blur (see gaussianSmooth2D) and log-warped (if
logWarpX or logWarpY is specified). This processed modulation spectrum is
returned as $processed. If the audio is long enough, multiple windows are
analyzed, resulting in a vector of roughness values. For multiple inputs,
such as a list of waveforms or path to a folder with audio files, the
ensemble of modulation spectra can be interpolated to the same spectral and
temporal resolution and averaged (if averageMS = TRUE).
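The core steps of the algorithm described above can be sketched in base R on a toy spectrogram. This is an illustration under stated assumptions, not the package's implementation: the ripple, the constant added before taking the logarithm, and the choice of which half of the matrix to keep are all made up for the example.

```r
# Toy "spectrogram": 64 frequency bins x 128 time frames with a single ripple
# (4 cycles along frequency, 8 cycles along time). Values are made up.
n_freq <- 64
n_time <- 128
spec <- 1 + outer(
  seq_len(n_freq), seq_len(n_time),
  function(f, tm) cos(2 * pi * (4 * f / n_freq + 8 * tm / n_time))
)

logSpec <- FALSE
power <- 1

# Optional log-transform; the small constant avoiding log(0) is an assumption
if (logSpec) spec <- log(spec + 1e-6)

# Center, then take the 2D FFT (fft() on a matrix is a 2D transform)
ms_complex <- fft(spec - mean(spec))

# The magnitude is point-symmetric for real input, so keep one half
# (non-negative spectral modulation) and raise it to the chosen power
ms_half <- abs(ms_complex)[seq_len(n_freq / 2 + 1), ]^power

# Locate the ripple: row 5 (4 cycles) and column 9 (8 cycles), 1-based
peak <- arrayInd(which.max(ms_half), dim(ms_half))
```

A single ripple in the spectrogram shows up as a single peak in the retained half, which matches the "sum of ripples" analogy from the Description.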
Value
Returns a list with the following components:
- $original: modulation spectrum prior to blurring and log-warping, but after raising to power (e.g., squaring if power = 2); a matrix of nonnegative values. Rownames are spectral modulation frequencies (cycles/KHz), and colnames are temporal modulation frequencies (Hz).
- $processed: modulation spectrum after blurring and log-warping
- $complex: untransformed complex modulation spectrum (returned only if returnComplex = TRUE)
- $roughness: proportion of energy / amplitude of the modulation spectrum within roughRange of temporal modulation frequencies, %; a vector if amRes is numeric and the sound is long enough, a single number otherwise
- $amMsFreq: frequency of the highest peak, within amRange, of the folded AM function (average AM across all FM bins for both negative and positive AM frequencies), where a peak is a local maximum over amRes Hz. Like roughness, amMsFreq and amMsPurity can be single numbers or vectors, depending on whether the sound is analyzed as a whole or in chunks
- $amMsPurity: ratio of the peak at amMsFreq to the median AM over amRange, dB
- $summary: dataframe with summaries of roughness, amMsFreq, and amMsPurity
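The definitions of amMsFreq and amMsPurity can be illustrated with a simplified base-R sketch on a made-up AM function. It skips the smoothing and the "local maximum over amRes Hz" criterion, and it assumes that the dB value is 20 * log10 of the amplitude ratio (consistent with the 10^(amMsPurity / 20) conversion used in the examples), so treat it as an approximation of the idea rather than the package's code.

```r
# Made-up AM function (already averaged across spectral modulation bins):
# a flat floor plus peaks at -25 and +25 Hz
am_freqs <- seq(-200, 200, by = 5)
am_fun <- 1 + 5 * exp(-(abs(am_freqs) - 25)^2 / 50)

# Fold: average the negative and positive sides of the AM axis
pos <- am_freqs > 0
folded_freqs <- am_freqs[pos]
folded <- (am_fun[pos] + rev(am_fun[am_freqs < 0])) / 2

# Restrict to amRange and pick the highest point
amRange <- c(10, 200)
in_range <- folded_freqs >= amRange[1] & folded_freqs <= amRange[2]
peak_idx <- which.max(folded[in_range])

amMsFreq_toy <- folded_freqs[in_range][peak_idx]   # Hz
amMsPurity_toy <- 20 * log10(
  folded[in_range][peak_idx] / median(folded[in_range])
)                                                   # dB re: median AM
```

A purity near 0 dB would mean the peak barely stands out from the median AM, so higher values indicate a cleaner amplitude modulation.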
References
Singh, N. C., & Theunissen, F. E. (2003). Modulation spectra of natural sounds and ethological theories of auditory processing. The Journal of the Acoustical Society of America, 114(6), 3394-3411.
See Also
Examples
# White noise
ms = modulationSpectrum(runif(16000), samplingRate = 16000,
  logSpec = FALSE, power = 1,
  amRes = NULL) # analyze the entire sound, giving a single roughness value
str(ms)
# Harmonic sound
s = soundgen(amFreq = 25, amDep = 50)
ms = modulationSpectrum(s, samplingRate = 16000, amRes = NULL)
ms[c('roughness', 'amMsFreq', 'amMsPurity')] # a single value for each
ms1 = modulationSpectrum(s, samplingRate = 16000, amRes = 5)
ms1[c('roughness', 'amMsFreq', 'amMsPurity')]
# measured over time (low values of amRes mean more precision, so we analyze
# longer segments and get fewer values per sound)
# Embellish
ms = modulationSpectrum(s, samplingRate = 16000,
  xlab = 'Temporal modulation, Hz', ylab = 'Spectral modulation, 1/KHz',
  colorTheme = 'heat.colors', main = 'Modulation spectrum', lty = 3)
## Not run:
# A long sound with varying AM and a bit of chaos at the end
s_long = soundgen(sylLen = 3500, pitch = c(250, 320, 280),
  amFreq = c(30, 55), amDep = c(20, 60, 40),
  jitterDep = c(0, 0, 2))
playme(s_long)
ms = modulationSpectrum(s_long, 16000)
# plot AM over time
# (the time axis spans the full syllable, sylLen = 3500 ms)
plot(x = seq(1, 3500, length.out = length(ms$amMsFreq)), y = ms$amMsFreq,
  cex = 10^(ms$amMsPurity/20) * 10, xlab = 'Time, ms', ylab = 'AM frequency, Hz')
# plot roughness over time
spectrogram(s_long, 16000, ylim = c(0, 4),
  extraContour = list(ms$roughness / max(ms$roughness) * 4000, col = 'blue'))
# As with spectrograms, there is a tradeoff in time-frequency resolution
s = soundgen(pitch = 500, amFreq = 50, amDep = 100,
  samplingRate = 44100, plot = TRUE)
# playme(s, samplingRate = 44100)
ms = modulationSpectrum(s, samplingRate = 44100,
  windowLength = 50, step = 50, amRes = NULL) # poor temporal resolution
ms = modulationSpectrum(s, samplingRate = 44100,
  windowLength = 5, step = 1, amRes = NULL) # poor frequency resolution
ms = modulationSpectrum(s, samplingRate = 44100,
  windowLength = 15, step = 3, amRes = NULL) # a reasonable compromise
# customize the plot
ms = modulationSpectrum(s, samplingRate = 44100,
  windowLength = 15, overlap = 80, amRes = NULL,
  kernelSize = 17, # more smoothing
  xlim = c(-70, 70), ylim = c(0, 4), # zoom in on the central region
  quantiles = c(.25, .5, .8), # customize contour lines
  col = rev(rainbow(100)), # alternative palette
  power = 2) # ^2
# Note the peaks at FM = 2/KHz (from "pitch = 500") and AM = 50 Hz (from
# "amFreq = 50")
# Input can be a wav/mp3 file
ms = modulationSpectrum('~/Downloads/temp/16002_Faking_It_Large_clear.wav')
# Input can be path to folder with audio files. Each file is processed
# separately, and the output can contain an MS per file...
ms1 = modulationSpectrum('~/Downloads/temp', kernelSize = 11,
  plot = FALSE, averageMS = FALSE)
ms1$summary
names(ms1$original) # a separate MS per file
# ...or a single MS can be calculated:
ms2 = modulationSpectrum('~/Downloads/temp', kernelSize = 11,
  plot = FALSE, averageMS = TRUE)
plotMS(ms2$original)
ms2$summary
# Input can also be a list of waveforms (numeric vectors)
ss = vector('list', 10)
for (i in seq_along(ss)) {
  ss[[i]] = soundgen(sylLen = runif(1, 100, 1000), temperature = .4,
    pitch = runif(3, 400, 600))
}
# lapply(ss, playme)
# MS of the first sound
ms1 = modulationSpectrum(ss[[1]], samplingRate = 16000, scale = 1)
# average MS of all 10 sounds
ms2 = modulationSpectrum(ss, samplingRate = 16000, scale = 1, averageMS = TRUE, plot = FALSE)
plotMS(ms2$original)
# A sound with ~3 syllables per second and only downsweeps in F0 contour
s = soundgen(nSyl = 8, sylLen = 200, pauseLen = 100, pitch = c(300, 200))
# playme(s)
ms = modulationSpectrum(s, samplingRate = 16000, maxDur = .5,
  xlim = c(-25, 25), colorTheme = 'seewave',
  power = 2)
# note the asymmetry b/c of downsweeps
# "power = 2" returns squared modulation spectrum - note that this affects
# the roughness measure!
ms$roughness
# compare:
modulationSpectrum(s, samplingRate = 16000, maxDur = .5,
  xlim = c(-25, 25), colorTheme = 'seewave',
  power = 1)$roughness # much higher roughness
# Plotting with or without log-warping the modulation spectrum:
ms = modulationSpectrum(soundgen(), samplingRate = 16000, plot = TRUE)
ms = modulationSpectrum(soundgen(), samplingRate = 16000,
  logWarpX = c(2, 2), plot = TRUE)
# logWarp and kernelSize have no effect on roughness
# because it is calculated before these transforms:
modulationSpectrum(s, samplingRate = 16000, logWarpX = c(1, 10))$roughness
modulationSpectrum(s, samplingRate = 16000, logWarpX = NA)$roughness
modulationSpectrum(s, samplingRate = 16000, kernelSize = 17)$roughness
# Log-transform the spectrogram prior to 2D FFT (affects roughness):
modulationSpectrum(s, samplingRate = 16000, logSpec = FALSE)$roughness
modulationSpectrum(s, samplingRate = 16000, logSpec = TRUE)$roughness
# Complex modulation spectrum with phase preserved
ms = modulationSpectrum(soundgen(), samplingRate = 16000,
  returnComplex = TRUE)
plotMS(abs(ms$complex)) # note the symmetry
# compare:
plotMS(ms$original)
## End(Not run)