extract_features {voice} | R Documentation |
Extract audio features
Extracts features from WAV audio files.
extract_features(
  x,
  features = c("f0", "fmt", "rf", "rpf", "rcf", "rfc", "mfcc"),
  filesRange = NULL,
  sex = "u",
  windowShift = 10,
  numFormants = 8,
  numcep = 12,
  dcttype = c("t2", "t1", "t3", "t4"),
  fbtype = c("mel", "htkmel", "fcmel", "bark"),
  resolution = 40,
  usecmp = FALSE,
  mc.cores = 1,
  full.names = TRUE,
  recursive = FALSE,
  check.mono = FALSE,
  stereo2mono = FALSE,
  overwrite = FALSE,
  freq = 44100,
  round.to = NULL,
  verbose = FALSE,
  pycall = "~/miniconda3/envs/pyvoice38/bin/python3.8"
)
x |
A vector containing either files or directories of audio files in WAV format. |
features |
Vector of features to be extracted. (Default: c("f0", "fmt", "rf", "rpf", "rcf", "rfc", "mfcc")). |
filesRange |
The desired range of directory files. (Default: NULL). |
sex |
windowShift |
numFormants |
numcep |
Number of Mel-frequency cepstral coefficients (cepstra) to return. (Default: 12). |
dcttype |
Type of DCT used. |
fbtype |
Auditory frequency scale to use: "mel", "htkmel", "fcmel" or "bark". (Default: "mel"). |
resolution |
usecmp |
Logical. Apply equal-loudness weighting and cube-root compression (PLP instead of LPC). (Default: FALSE). |
mc.cores |
Number of cores to be used in parallel processing. (Default: 1). |
full.names |
Logical. If |
recursive |
Logical. Should the listing recurse into directories? (Default: FALSE). |
check.mono |
Logical. Check if the WAV file is mono. (Default: FALSE). |
stereo2mono |
(Experimental) Logical. Should files be converted from stereo to mono? (Default: FALSE). |
overwrite |
(Experimental) Logical. Should converted files be overwritten? If not, the file gets the suffix |
freq |
Frequency in Hz to write the converted files when stereo2mono = TRUE. (Default: 44100). |
round.to |
Number of decimal places to round to. (Default: NULL). |
verbose |
Logical. Should the running status be shown? (Default: FALSE). |
pycall |
Python call. See https://github.com/filipezabala/voice for details. |
The feature 'df' corresponds to 'formant dispersion' (df2:df8) by Fitch (1997), 'pf' to 'formant position' (pf1:pf8) by Puts, Apicella & Cárdenas (2012), 'rf' to 'formant removal' (rf1:rf8) by Zabala (2023), 'rcf' to 'formant cumulated removal' (rcf2:rcf8) by Zabala (2023) and 'rpf' to 'formant position removal' (rpf2:rpf8) by Zabala (2023).
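As an illustration of the 'df' feature, the sketch below computes formant dispersion in base R following the definition in Fitch (1997): the mean spacing between adjacent formants, which reduces to (F_k - F_1) / (k - 1) for the first k formants. The formant values and the helper `formant_dispersion()` are hypothetical, and reading df2:df8 as "dispersion over the first 2, ..., 8 formants" is an assumption rather than something taken from the package source.

```r
# Hypothetical formant frequencies in Hz for one analysis frame (F1..F8).
formants <- c(500, 1500, 2500, 3500, 4500, 5500, 6500, 7500)

# Formant dispersion over the first k formants (Fitch, 1997): the mean
# spacing between adjacent formants, equal to (F_k - F_1) / (k - 1).
# formant_dispersion() is an illustrative helper, not a function of 'voice'.
formant_dispersion <- function(f, k = length(f)) {
  stopifnot(k >= 2, k <= length(f))
  mean(diff(f[seq_len(k)]))
}

# One value per k = 2, ..., 8, mirroring the df2:df8 naming
# (assumed indexing, not confirmed against the package).
df <- vapply(2:8, function(k) formant_dispersion(formants, k), numeric(1))
names(df) <- paste0("df", 2:8)
df
```

Because the hypothetical formants are evenly spaced, every dispersion value equals the common spacing; with real speech the values differ across k.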
A Media data frame containing the selected features.
Levinson N. (1946). The Wiener (root mean square) error criterion in filter design and prediction. Journal of Mathematics and Physics, 25(1-4), 261–278. (doi:10.1002/SAPM1946251261)
Durbin J. (1960). “The fitting of time-series models.” Revue de l’Institut International de Statistique, pp. 233–244. (https://www.jstor.org/stable/1401322)
Cooley J.W., Tukey J.W. (1965). “An algorithm for the machine calculation of complex Fourier series.” Mathematics of computation, 19(90), 297–301. (https://www.ams.org/journals/mcom/1965-19-090/S0025-5718-1965-0178586-1/)
Wasson D., Donaldson R. (1975). “Speech amplitude and zero crossings for automated identification of human speakers.” IEEE Transactions on Acoustics, Speech, and Signal Processing, 23(4), 390–392. (https://ieeexplore.ieee.org/document/1162690)
Allen J. (1977). “Short term spectral analysis, synthesis, and modification by discrete Fourier transform.” IEEE Transactions on Acoustics, Speech, and Signal Processing, 25(3), 235–238. (https://ieeexplore.ieee.org/document/1162950)
Schäfer-Vincent K. (1982). "Significant points: Pitch period detection as a problem of segmentation." Phonetica, 39(4-5), 241–253. (doi:10.1159/000261665)
Schäfer-Vincent K. (1983). "Pitch period detection and chaining: Method and evaluation." Phonetica, 40(3), 177–202. (doi:10.1159/000261691)
Ephraim Y., Malah D. (1984). “Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator.” IEEE Transactions on acoustics, speech, and signal processing, 32(6), 1109–1121. (https://ieeexplore.ieee.org/document/1164453)
Delsarte P., Genin Y. (1986). “The split Levinson algorithm.” IEEE transactions on acoustics, speech, and signal processing, 34(3), 470–478. (https://ieeexplore.ieee.org/document/1164830)
Jackson J.C. (1995). "The Harmonic Sieve: A Novel Application of Fourier Analysis to Machine Learning Theory and Practice." Technical report, Carnegie-Mellon University Pittsburgh PA School of Computer Science. (https://apps.dtic.mil/sti/pdfs/ADA303368.pdf)
Fitch W.T. (1997). "Vocal tract length and formant frequency dispersion correlate with body size in rhesus macaques." J. Acoust. Soc. Am., 102, 1213–1222. (doi:10.1121/1.421048)
Boersma P., van Heuven V. (2001). Praat, a system for doing phonetics by computer. Glot. Int., 5(9/10), 341–347. (https://www.fon.hum.uva.nl/paul/papers/speakUnspeakPraat_glot2001.pdf)
Ellis DPW (2005). “PLP and RASTA (and MFCC, and inversion) in Matlab.” Online web resource. (https://www.ee.columbia.edu/~dpwe/resources/matlab/rastamat/)
Puts D.A., Apicella C.L., Cárdenas R.A. (2012). "Masculine voices signal men's threat potential in forager and industrial societies." Proc. R. Soc. B Biol. Sci., 279, 601–609. (doi:10.1098/rspb.2011.0829)
library(voice)

# get path to audio file
path2wav <- list.files(system.file('extdata', package = 'wrassp'),
                       pattern = glob2rx('*.wav'), full.names = TRUE)
# minimal usage
M1 <- extract_features(path2wav)
M2 <- extract_features(dirname(path2wav))
# limiting filesRange
M3 <- extract_features(path2wav, filesRange = 3:6)
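To make the effect of filesRange more concrete, the following base R sketch lists placeholder files the same way the example lists the wrassp data and then keeps positions 3 to 6. It assumes that filesRange simply indexes the sorted file listing; this is an assumption about the behaviour, not a statement of how extract_features is implemented. The directory and file names are hypothetical, and the files are empty placeholders rather than valid audio.

```r
# Create a throwaway directory with six empty placeholder files
# (hypothetical names; not valid WAV audio, used only to show the listing).
wav_dir <- file.path(tempdir(), "voice_listing_demo")
dir.create(wav_dir, showWarnings = FALSE)
invisible(file.create(file.path(wav_dir, sprintf("rec%02d.wav", 1:6))))

# List the files as in the example, using full.names and recursive.
wav_files <- list.files(wav_dir, pattern = glob2rx("*.wav"),
                        full.names = TRUE, recursive = FALSE)

# Assumed behaviour of filesRange: positional indexing into the listing.
files_range <- 3:6
selected <- wav_files[files_range]
basename(selected)
```

Since list.files() returns names in alphabetical order, the positions in filesRange refer to that sorted order rather than to file creation time.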