
melfcc {tuneR}

R Documentation

MFCC Calculation

Description

Calculate Mel-frequency cepstral coefficients.

Usage

melfcc(samples, sr = samples@samp.rate, wintime = 0.025, 
    hoptime = 0.01, numcep = 12, lifterexp = 0.6, htklifter = FALSE,
    sumpower = TRUE, preemph = 0.97, dither = FALSE,
    minfreq = 0, maxfreq = sr/2, nbands = 40, bwidth = 1, 
    dcttype = c("t2", "t1", "t3", "t4"), 
    fbtype = c("mel", "htkmel", "fcmel", "bark"), usecmp = FALSE, 
    modelorder = NULL, spec_out = FALSE, frames_in_rows = TRUE)

Arguments

`samples`	Object of Wave-class or WaveMC-class. Only the first channel will be used.
`sr`	Sampling rate of the signal.
`wintime`	Window length in sec.
`hoptime`	Step between successive windows in sec.
`numcep`	Number of cepstra to return.
`lifterexp`	Exponent for liftering; 0 = none.
`htklifter`	Use HTK sin lifter.
`sumpower`	If `sumpower = TRUE`, the frequency scale transformation is applied to the power spectrum; if `sumpower = FALSE`, it is applied to the square root of the power spectrum (the absolute value of the spectrum) and the result is squared afterwards.
`preemph`	Apply pre-emphasis filter [1 -preemph] (0 = none).
`dither`	Add an offset to the spectrum, as if dither noise had been added to the signal.
`minfreq`	Lowest band edge of mel filters (Hz).
`maxfreq`	Highest band edge of mel filters (Hz).
`nbands`	Number of warped spectral bands to use.
`bwidth`	Width of spectral bands in Bark/Mel.
`dcttype`	Type of DCT used: `"t1"` or `"t2"` (or `"t3"` for HTK or `"t4"` for feacalc).
`fbtype`	Auditory frequency scale to use: `"mel"`, `"bark"`, `"htkmel"`, `"fcmel"`.
`usecmp`	Apply equal-loudness weighting and cube-root compression (PLP instead of LPC).
`modelorder`	If `modelorder > 0`, fit a linear prediction (autoregressive) model of this order and calculate the cepstra from `lpcas`.
`spec_out`	Should matrices of the power- and the auditory-spectrum be returned.
`frames_in_rows`	Return time frames in rows. If `FALSE`, time frames are returned in columns, as in the original Matlab code.
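
As a rough illustration of how `wintime` and `hoptime` relate to sample counts, the following base R sketch converts both to samples for an assumed sampling rate of 16000 Hz. The variable names and the use of `round()` are assumptions made for this example; the exact rounding applied inside `melfcc` is not stated on this page.

```r
# Convert window and hop durations (seconds) to sample counts.
sr      <- 16000   # sampling rate in Hz (assumed for illustration)
wintime <- 0.025   # default window length in seconds
hoptime <- 0.01    # default step between windows in seconds

win_samples <- round(wintime * sr)        # samples per analysis window
hop_samples <- round(hoptime * sr)        # samples between window starts
overlap     <- win_samples - hop_samples  # samples shared by adjacent windows
```

With the defaults, successive windows overlap by more than half their length, so each sample contributes to several frames.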

Details

Calculation of the MFCCs includes the following steps:

Preemphasis filtering
Take the absolute value of the STFT (usage of Hamming window)
Warp to auditory frequency scale (Mel/Bark)
Take the DCT of the log-auditory-spectrum
Return the first ‘numcep’ components
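
The steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the code used by `melfcc`: the function name `mfcc_sketch` is hypothetical, the mel warping uses the common formula 2595 * log10(1 + f / 700) (which may differ from the `"mel"` scale of `fbtype`), the FFT length is the next power of two, the DCT is an unnormalized type 2, the power spectrum is used (as with `sumpower = TRUE`), and liftering, dither and the LPC/PLP options are left out.

```r
# Illustrative sketch of the listed steps; NOT the tuneR implementation.
mfcc_sketch <- function(x, sr, wintime = 0.025, hoptime = 0.01,
                        numcep = 12, nbands = 40, preemph = 0.97) {
  # 1. Pre-emphasis: y[n] = x[n] - preemph * x[n - 1]
  y <- c(x[1], x[-1] - preemph * x[-length(x)])

  # 2. Frame the signal, apply a Hamming window, take the power spectrum
  n_win <- round(wintime * sr)
  n_hop <- round(hoptime * sr)
  n_fft <- 2^ceiling(log2(n_win))          # assumed FFT length
  hamming <- 0.54 - 0.46 * cos(2 * pi * (0:(n_win - 1)) / (n_win - 1))
  starts <- seq(1, length(y) - n_win + 1, by = n_hop)
  pspec <- sapply(starts, function(s) {
    frame <- c(y[s:(s + n_win - 1)] * hamming, numeric(n_fft - n_win))
    Mod(fft(frame))[1:(n_fft / 2 + 1)]^2
  })                                       # (n_fft / 2 + 1) x frames

  # 3. Triangular filters equally spaced on an (assumed) mel scale
  hz2mel <- function(f) 2595 * log10(1 + f / 700)
  mel2hz <- function(m) 700 * (10^(m / 2595) - 1)
  edges  <- mel2hz(seq(hz2mel(0), hz2mel(sr / 2), length.out = nbands + 2))
  fft_hz <- (0:(n_fft / 2)) * sr / n_fft
  fbank <- t(sapply(seq_len(nbands), function(b) {
    rise <- (fft_hz - edges[b]) / (edges[b + 1] - edges[b])
    fall <- (edges[b + 2] - fft_hz) / (edges[b + 2] - edges[b + 1])
    pmax(0, pmin(rise, fall))
  }))                                      # nbands x (n_fft / 2 + 1)
  aspec <- fbank %*% pspec                 # nbands x frames

  # 4. Unnormalized DCT-II of the log auditory spectrum
  k <- 0:(numcep - 1)
  n <- 0:(nbands - 1)
  dct <- cos(pi * outer(k, n + 0.5) / nbands)   # numcep x nbands
  cep <- dct %*% log(aspec + 1e-10)        # small offset avoids log(0)

  # 5. Keep numcep coefficients and return one time frame per row
  t(cep)
}

# Usage: one second of a 440 Hz tone at 16 kHz
sr  <- 16000
x   <- sin(2 * pi * 440 * (0:(sr - 1)) / sr)
cep <- mfcc_sketch(x, sr)
dim(cep)
```

The values will not match the output of `melfcc`, since scaling, filter shapes and liftering differ; the sketch is only meant to show the order of operations and the shape of the result.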

Value

`cepstra`	Cepstral coefficients of the input signal (one time frame per row/column)
`aspectrum`	Auditory spectrum (spectrum after transformation to Mel/Bark scale) of the signal
`pspectrum`	Power spectrum of the input signal.
`lpcas`	If `modelorder > 0`, the linear prediction coefficients (LPC/PLP).

Note

The following non-default values nearly duplicate Malcolm Slaney's mfcc (i.e.

melfcc(d, 16000, wintime=0.016, lifterexp=0, minfreq=133.33, 
       maxfreq=6855.6, sumpower=FALSE)

=~= log(10) * 2 * mfcc(d, 16000) in the Auditory toolbox for Matlab).
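
The factor `log(10) * 2` has the same form as the identity relating the natural logarithm of a power value to the base-10 logarithm of the corresponding magnitude, ln(a^2) = 2 * ln(10) * log10(a). Whether this is exactly where the factor comes from is an assumption here, not something this page states; the sketch below only checks the identity numerically for a few arbitrary values.

```r
# Identity: ln(a^2) = 2 * ln(10) * log10(a) for a > 0.
a   <- c(0.5, 1, 3, 250)            # arbitrary positive magnitudes
lhs <- log(a^2)                     # natural log of power
rhs <- log(10) * 2 * log10(a)       # scaled base-10 log of magnitude
all.equal(lhs, rhs)
```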

The following non-default values nearly duplicate HTK's MFCC (i.e.

melfcc(d, 16000, lifterexp=22, htklifter=TRUE, nbands=20, maxfreq=8000, 
    sumpower=FALSE, fbtype="htkmel", dcttype="t3")

=~= 2 * htkmelfcc(:,[13,[1:12]]) where HTK config has ‘PREEMCOEF = 0.97’, ‘NUMCHANS = 20’, ‘CEPLIFTER = 22’, ‘NUMCEPS = 12’, ‘WINDOWSIZE = 250000.0’, ‘USEHAMMING = T’, ‘TARGETKIND = MFCC_0’).
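
Two details of the HTK comparison can be checked with a small base R sketch. HTK expresses time parameters in units of 100 ns, so ‘WINDOWSIZE = 250000.0’ corresponds to the default `wintime`. The Matlab index `[13,[1:12]]` moves the 13th column to the front; the matrix `htk` below is dummy data with hypothetical column names, assuming that ‘MFCC_0’ stores the zeroth coefficient last.

```r
# HTK time parameters are given in units of 100 ns.
htk_windowsize <- 250000.0
wintime <- htk_windowsize * 1e-7     # window length in seconds

# R equivalent of the Matlab column index [13, [1:12]] on dummy data.
htk <- matrix(seq_len(2 * 13), nrow = 2, byrow = TRUE,
              dimnames = list(NULL, c(paste0("c", 1:12), "c0")))
reordered <- htk[, c(13, 1:12)]      # 13th column first, then columns 1 to 12
colnames(reordered)
```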

For more detail on reproducing other programs' outputs, see https://www.ee.columbia.edu/~dpwe/resources/matlab/rastamat/mfccs.html

Author(s)

Sebastian Krey krey@statistik.tu-dortmund.de

References

Daniel P. W. Ellis: https://www.ee.columbia.edu/~dpwe/resources/matlab/rastamat/

Examples

  library(tuneR)
  testsound <- normalize(sine(400) + sine(1000) + square(250), "16")
  m1 <- melfcc(testsound)

  #Use PLP features to calculate cepstra and output the matrices like the
  #original Matlab code (note: modelorder limits the number of cepstra)
  m2 <- melfcc(testsound, numcep=9, usecmp=TRUE, modelorder=8, 
    spec_out=TRUE, frames_in_rows=FALSE)

[Package tuneR version 1.4.7 Index]