R: Extract audio features

extract_features {voice}

R Documentation

Extract audio features

Description

Extracts features from WAV audio files.

Usage

extract_features(
  x,
  features = c("f0", "fmt", "rf", "rpf", "rcf", "rfc", "mfcc"),
  filesRange = NULL,
  sex = "u",
  windowShift = 10,
  numFormants = 8,
  numcep = 12,
  dcttype = c("t2", "t1", "t3", "t4"),
  fbtype = c("mel", "htkmel", "fcmel", "bark"),
  resolution = 40,
  usecmp = FALSE,
  mc.cores = 1,
  full.names = TRUE,
  recursive = FALSE,
  check.mono = FALSE,
  stereo2mono = FALSE,
  overwrite = FALSE,
  freq = 44100,
  round.to = NULL,
  verbose = FALSE,
  pycall = "~/miniconda3/envs/pyvoice38/bin/python3.8"
)

Arguments

`x`	A vector containing either files or directories of audio files in WAV format.
`features`	Vector of features to be extracted. (Default: `'f0','fmt','rf','rpf','rcf','rfc','mfcc'`). The `'fmt_praat'` feature may take a long time to process. The following features may contain a variable number of columns: `'cep'`, `'dft'`, `'css'` and `'lps'`.
`filesRange`	The desired range of directory files (Default: `NULL`, i.e., all files). Should only be used when all the WAV files are in the same folder.
`sex`	`= <code>` set sex specific parameters where <code> = `'f'`[emale], `'m'`[ale] or `'u'`[nknown] (Default: `'u'`). Used as 'gender' by `wrassp::ksvF0`, `wrassp::forest` and `wrassp::mhsF0`.
`windowShift`	`= <dur>` set analysis window shift to <dur>ation in ms (Default: `10`). Used by `wrassp::ksvF0`, `wrassp::forest`, `wrassp::mhsF0`, `wrassp::zcrana`, `wrassp::rfcana`, `wrassp::acfana`, `wrassp::cepstrum`, `wrassp::dftSpectrum`, `wrassp::cssSpectrum` and `wrassp::lpsSpectrum`.
`numFormants`	`= <num>` <num>ber of formants (Default: `8`). Used by `wrassp::forest`.
`numcep`	Number of Mel-frequency cepstral coefficients (cepstra) to return (Default: `12`). Used by `tuneR::melfcc`.
`dcttype`	Type of DCT used: `'t1'`, `'t2'`, `'t3'` for HTK or `'t4'` for feacalc (Default: `'t2'`). Used by `tuneR::melfcc`.
`fbtype`	Auditory frequency scale to use: `'mel'`, `'bark'`, `'htkmel'`, `'fcmel'` (Default: `'mel'`). Used by `tuneR::melfcc`.
`resolution`	`= <freq>` set FFT length to the smallest value which results in a frequency resolution of <freq> Hz or better (Default: `40.0`). Used by `wrassp::cssSpectrum`, `wrassp::dftSpectrum` and `wrassp::lpsSpectrum`.
`usecmp`	Logical. Apply equal-loudness weighting and cube-root compression (PLP instead of LPC) (Default: `FALSE`). Used by `tuneR::melfcc`.
`mc.cores`	Number of cores to be used in parallel processing. (Default: `1`)
`full.names`	Logical. If `TRUE`, the directory path is prepended to the file names to give a relative file path. If `FALSE`, the file names (rather than paths) are returned. (Default: `TRUE`) Used by `base::list.files`.
`recursive`	Logical. Should the listing recurse into directories? (Default: `FALSE`) Used by `base::list.files`.
`check.mono`	Logical. Check if the WAV file is mono. (Default: `FALSE`)
`stereo2mono`	(Experimental) Logical. Should files be converted from stereo to mono? (Default: `FALSE`)
`overwrite`	(Experimental) Logical. Should converted files be overwritten? If not, the file gets the suffix `_mono`. (Default: `FALSE`)
`freq`	Frequency in Hz to write the converted files when `stereo2mono=TRUE`. (Default: `44100`)
`round.to`	Number of decimal places to round to. (Default: `NULL`)
`verbose`	Logical. Should the running status be shown? (Default: `FALSE`)
`pycall`	Python call. See https://github.com/filipezabala/voice for details.
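
To get a feel for what `windowShift` and `resolution` imply for a given recording, the arithmetic can be sketched in base R. The helpers `approx_frames()` and `min_fft_length()` are hypothetical and are not part of voice or wrassp. The sketch assumes that frames are spaced exactly `windowShift` ms apart with edge handling ignored, and that FFT lengths are restricted to powers of two, which this page does not state.

```r
# Hypothetical helpers for illustration only; not part of voice or wrassp.

# Approximate number of analysis frames for a signal of `duration_s` seconds
# when frames are spaced `window_shift_ms` milliseconds apart.
# Assumption: edge effects at the start and end of the file are ignored.
approx_frames <- function(duration_s, window_shift_ms = 10) {
  floor(duration_s * 1000 / window_shift_ms)
}

# Smallest power-of-two FFT length whose bin spacing (sample_rate / n)
# is `resolution_hz` or finer.
# Assumption: FFT lengths are restricted to powers of two.
min_fft_length <- function(sample_rate, resolution_hz = 40) {
  2^ceiling(log2(sample_rate / resolution_hz))
}

approx_frames(2.5, window_shift_ms = 10)        # frames for a 2.5 s recording
n <- min_fft_length(44100, resolution_hz = 40)  # FFT length at 44.1 kHz
44100 / n                                       # resulting bin spacing in Hz
```

Halving `windowShift` roughly doubles the number of rows produced per file, so it is worth choosing the shift with the size of the resulting data frame in mind.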

Details

The feature 'df' corresponds to 'formant dispersion' (df2:df8) by Fitch (1997), 'pf' to 'formant position' (pf1:pf8) by Puts, Apicella & Cardenas (2012), 'rf' to 'formant removal' (rf1:rf8) by Zabala (2023), 'rcf' to 'formant cumulated removal' (rcf2:rcf8) by Zabala (2023) and 'rpf' to 'formant position removal' (rpf2:rpf8) by Zabala (2023).
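
As a rough illustration of the first of these measures, Fitch (1997) defines formant dispersion as the average spacing between adjacent formants, which for the first k formants reduces to (Fk - F1) / (k - 1). The base R sketch below applies that definition. The function name `formant_dispersion()` and the formant values are made up for illustration, and the mapping of column `dfk` to the first k formants is an assumption that this page does not confirm.

```r
# Hypothetical helper; illustrates Fitch's (1997) definition only and is
# not the code used by voice.
# `f` is a numeric vector of formant frequencies F1..Fn in Hz for one frame.
formant_dispersion <- function(f) {
  n <- length(f)
  stopifnot(n >= 2)
  # For k = 2..n: mean of the k - 1 adjacent differences among F1..Fk,
  # which telescopes to (Fk - F1) / (k - 1).
  k <- 2:n
  d <- (f[k] - f[1]) / (k - 1)
  names(d) <- paste0("df", k)  # assumed naming, mirroring df2:df8
  d
}

# Made-up formant values (Hz), for illustration only.
f <- c(500, 1500, 2500, 3500)
formant_dispersion(f)
```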

Value

A Media data frame containing the selected features.

References

Levinson N. (1946). The Wiener (root mean square) error criterion in filter design and prediction. Journal of Mathematics and Physics, 25(1-4), 261–278. (doi:10.1002/SAPM1946251261)

Durbin J. (1960). “The fitting of time-series models.” Revue de l’Institut International de Statistique, pp. 233–244. (https://www.jstor.org/stable/1401322)

Cooley J.W., Tukey J.W. (1965). “An algorithm for the machine calculation of complex Fourier series.” Mathematics of computation, 19(90), 297–301. (https://www.ams.org/journals/mcom/1965-19-090/S0025-5718-1965-0178586-1/)

Wasson D., Donaldson R. (1975). “Speech amplitude and zero crossings for automated identification of human speakers.” IEEE Transactions on Acoustics, Speech, and Signal Processing, 23(4), 390–392. (https://ieeexplore.ieee.org/document/1162690)

Allen J. (1977). “Short term spectral analysis, synthesis, and modification by discrete Fourier transform.” IEEE Transactions on Acoustics, Speech, and Signal Processing, 25(3), 235–238. (https://ieeexplore.ieee.org/document/1162950)

Schäfer-Vincent K. (1982). "Significant points: Pitch period detection as a problem of segmentation." Phonetica, 39(4-5), 241–253. (doi:10.1159/000261665)

Schäfer-Vincent K. (1983). "Pitch period detection and chaining: Method and evaluation." Phonetica, 40(3), 177–202. (doi:10.1159/000261691)

Ephraim Y., Malah D. (1984). “Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator.” IEEE Transactions on acoustics, speech, and signal processing, 32(6), 1109–1121. (https://ieeexplore.ieee.org/document/1164453)

Delsarte P., Genin Y. (1986). “The split Levinson algorithm.” IEEE transactions on acoustics, speech, and signal processing, 34(3), 470–478. (https://ieeexplore.ieee.org/document/1164830)

Jackson J.C. (1995). "The Harmonic Sieve: A Novel Application of Fourier Analysis to Machine Learning Theory and Practice." Technical report, Carnegie-Mellon University Pittsburgh PA School of Computer Science. (https://apps.dtic.mil/sti/pdfs/ADA303368.pdf)

Fitch, W.T. (1997) "Vocal tract length and formant frequency dispersion correlate with body size in rhesus macaques." J. Acoust. Soc. Am. 102, 1213 – 1222. (doi:10.1121/1.421048)

Boersma P., van Heuven V. (2001). Praat, a system for doing phonetics by computer. Glot. Int., 5(9/10), 341–347. (https://www.fon.hum.uva.nl/paul/papers/speakUnspeakPraat_glot2001.pdf)

Ellis DPW (2005). “PLP and RASTA (and MFCC, and inversion) in Matlab.” Online web resource. (https://www.ee.columbia.edu/~dpwe/resources/matlab/rastamat/)

Puts, D.A., Apicella, C.L., Cardenas, R.A. (2012) "Masculine voices signal men's threat potential in forager and industrial societies." Proc. R. Soc. B Biol. Sci. 279, 601–609. (doi:10.1098/rspb.2011.0829)

Examples

library(voice)

# get path to audio file
path2wav <- list.files(system.file('extdata', package = 'wrassp'),
                       pattern = glob2rx('*.wav'), full.names = TRUE)

# minimal usage
M1 <- extract_features(path2wav)
M2 <- extract_features(dirname(path2wav))
identical(M1,M2)
table(basename(M1$wav_path))

# limiting filesRange
M3 <- extract_features(path2wav, filesRange = 3:6)
table(basename(M3$wav_path))

[Package voice version 0.4.21 Index]