specgram {gsignal} | R Documentation |
Spectrogram
Description
Spectrogram using short-time Fourier transform.
Usage
specgram(
x,
n = min(256, length(x)),
fs = 2,
window = hanning(n),
overlap = ceiling(n/2)
)
## S3 method for class 'specgram'
plot(
x,
col = grDevices::gray(0:512/512),
xlab = "Time",
ylab = "Frequency",
...
)
## S3 method for class 'specgram'
print(
x,
col = grDevices::gray(0:512/512),
xlab = "Time",
ylab = "Frequency",
...
)
Arguments
x |
Input signal, specified as a vector. |
n |
Size of the FFT window. Default: 256 (or less if x is shorter) |
fs |
Sample rate in Hz. Default: 2 |
window |
Either an integer indicating the length of a Hanning window, or a vector of values representing the shape of the FFT tapering window. Default: hanning(n) |
overlap |
Overlap with previous window. Default: half the window length |
col |
Colormap to use for plotting. Default: grDevices::gray(0:512/512) |
xlab |
Label for x-axis of plot. Default: "Time" |
ylab |
Label for y-axis of plot. Default: "Frequency" |
... |
Additional arguments passed to the |
Details
Generate a spectrogram for the signal x. The signal is chopped into overlapping segments of length n, and each segment is windowed and transformed into the frequency domain using the FFT. The default segment size is 256. If fs is given, it specifies the sampling rate of the input signal. The argument window specifies an alternate window to apply rather than the default of hanning(n). The argument overlap specifies the number of samples of overlap between successive segments of the input signal. The default overlap is half the window length, ceiling(n/2).
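The segmenting, windowing and transforming steps described above can be sketched in base R. This is a minimal illustration, not the package's implementation: stft_sketch is a hypothetical helper, the taper formula is an assumed Hann-type window, and the row and column conventions of specgram itself may differ in detail.

```r
# Hypothetical helper, not part of gsignal: a bare-bones short-time FFT.
stft_sketch <- function(x, n = min(256, length(x)), fs = 2,
                        overlap = ceiling(n / 2)) {
  # Assumed Hann-type taper that goes gradually to zero at both ends
  w <- 0.5 - 0.5 * cos(2 * pi * seq_len(n) / (n + 1))
  step <- n - overlap                        # samples between segment starts
  starts <- seq(1, length(x) - n + 1, by = step)
  # One column per windowed segment (only complete segments are kept)
  segs <- vapply(starts, function(i) x[i:(i + n - 1)] * w, numeric(n))
  S <- stats::mvfft(segs)                    # FFT of every column
  keep <- seq_len(floor(n / 2) + 1)          # bins from 0 up to Nyquist
  list(S = S[keep, , drop = FALSE],
       f = (keep - 1) * fs / n,              # frequency of each row
       t = (starts - 1) / fs)                # start time of each column
}

fs <- 1000
x <- sin(2 * pi * 100 * (0:(fs - 1)) / fs)   # one second of a 100 Hz tone
sp <- stft_sketch(x, n = 200, fs = fs)
dim(sp$S)                                    # 101 frequencies by 9 segments
sp$f[which.max(Mod(sp$S[, 1]))]              # strongest bin of the first segment: 100 Hz
```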
When results of specgram are printed, a spectrogram will be plotted. As with lattice plots, automatic printing does not work inside loops and function calls, so explicit calls to print or plot are needed there.
The choice of window defines the time-frequency resolution. In speech for example, a wide window shows more harmonic detail while a narrow window averages over the harmonic detail and shows more formant structure. The shape of the window is not so critical so long as it goes gradually to zero on the ends.
Step size (which is window length minus overlap) controls the horizontal scale of the spectrogram. Decrease it to stretch, or increase it to compress. Increasing step size will reduce time resolution, but decreasing it will not improve it much beyond the limits imposed by the window size (you do gain a little bit, depending on the shape of your window, as the peak of the window slides over peaks in the signal energy). The range 1-5 msec is good for speech.
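As a worked example of the step-size arithmetic, the following converts the 1-5 msec range into samples and the corresponding overlap values. The sampling rate and window length are example values, and the slice count assumes that only complete segments are used.

```r
fs <- 8000                        # sampling rate in Hz (example value)
n <- 256                          # window length in samples (example value)
step_ms <- c(1, 5)                # step range suggested for speech
step <- round(step_ms * fs / 1000)      # step in samples: 8 and 40
overlap <- n - step                     # overlap to request: 248 and 216
# Number of spectral slices for one second of signal,
# assuming only complete segments are used
n_slices <- floor((fs - n) / step) + 1  # 969 and 194
```

The smaller step yields roughly five times as many slices, which stretches the horizontal axis without changing the frequency resolution set by n.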
FFT length controls the vertical scale. Selecting an FFT length greater than the window length does not add any information to the spectrum, but it is a good way to interpolate between frequency points which can make for prettier spectrograms.
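The interpolation effect of a longer FFT can be checked on a single windowed segment in base R. This sketch uses stats::fft directly rather than specgram, and the taper formula and test tone are assumptions chosen for illustration.

```r
n <- 64                                       # window length
nfft <- 256                                   # FFT length, 4 times the window
seg <- cos(2 * pi * 10.3 * (0:(n - 1)) / n)   # tone lying between two FFT bins
w <- 0.5 - 0.5 * cos(2 * pi * seq_len(n) / (n + 1))  # assumed Hann-type taper
plain <- Mod(stats::fft(seg * w))[1:(n / 2)]
# Zero-pad the windowed segment up to nfft samples before transforming
padded <- Mod(stats::fft(c(seg * w, numeric(nfft - n))))[1:(nfft / 2)]
# Every 4th padded bin coincides with a plain bin: no new information,
# only extra points in between
same <- all.equal(padded[seq(1, nfft / 2, by = 4)], plain)
peak_plain <- which.max(plain) - 1            # peak location in window-length bins
peak_padded <- (which.max(padded) - 1) / 4    # same units, finer grid
```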
After you have generated the spectral slices, there are a number of decisions for displaying them. First, the phase information is discarded and the energy normalized:
S <- abs(S); S <- S / max(S)
Then the dynamic range of the signal is chosen. Since information in speech is well above the noise floor, it makes sense to eliminate any dynamic range at the bottom end. This is done by taking the max of the magnitude and some minimum energy such as minE = -40dB. Similarly, there is not much information in the very top of the range, so clipping to a maximum energy such as maxE = -3dB makes sense:
S <- pmax(S, 10^(minE / 10)); S <- pmin(S, 10^(maxE / 10))
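A small worked example of the normalization and clipping steps follows. The matrix is a made-up stand-in for spectrogram output, and the dB conversion follows the formula in the text as written. Because max() and min() in R reduce their arguments to a single value, the elementwise pmax() and pmin() are used here so that the matrix shape is preserved.

```r
# Made-up 2 x 2 stand-in for the complex output of a spectrogram
S <- matrix(complex(real = c(1, 0.5, 0.01, 1e-6), imaginary = 0), nrow = 2)
minE <- -40; maxE <- -3              # dB limits from the text
S <- abs(S); S <- S / max(S)         # discard phase, normalize to a peak of 1
# Elementwise floor and ceiling; dimensions of S are kept
S <- pmin(pmax(S, 10^(minE / 10)), 10^(maxE / 10))
range(10 * log10(S))                 # values now span -40 to -3
```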
The frequency range of the FFT is from 0 to the Nyquist frequency of one half the sampling rate. If the signal of interest is band limited, you do not need to display the entire frequency range. In speech for example, most of the signal is below 4 kHz, so there is no reason to display up to the Nyquist frequency of 10 kHz for a 20 kHz sampling rate. In this case you will want to keep only the first 40% of the rows of S and f. More generally, to display the frequency range from minF to maxF, you could use the following row index:
idx <- (f >= minF & f <= maxF)
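The row index can then be applied to both the frequency vector and the spectrogram matrix. In this sketch, f and S are made-up stand-ins for the elements of a specgram result, with frequencies in rows as described under Value.

```r
fs <- 20000
f <- seq(0, fs / 2, length.out = 129)      # hypothetical frequency axis in Hz
S <- matrix(runif(129 * 10), nrow = 129)   # stand-in magnitudes, 10 time slices
minF <- 0; maxF <- 4000
idx <- (f >= minF & f <= maxF)
S_band <- S[idx, , drop = FALSE]           # keep only rows within the band
f_band <- f[idx]
dim(S_band)                                # 52 rows remain, all 10 columns kept
```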
Then there is the choice of colormap. A brightness varying colormap such as copper or bone gives good shape to the ridges and valleys. A hue varying colormap such as jet or hsv gives an indication of the steepness of the slopes. In the field that I am working in (neuroscience / electrophysiology) rainbow color palettes such as jet are very often used. This is an unfortunate choice mainly because (a) colors do not have a natural order, and (b) rainbow palettes are not perceptually linear. It would be better to use a grayscale palette or the 'cool-to-warm' scheme. The examples show how to do this in R.
The final spectrogram is displayed in log energy scale and by convention has low frequencies on the bottom of the image.
Value
A list of class specgram consisting of the following elements:
- S
the complex output of the FFT, one column per slice
- f
the frequency indices corresponding to the rows of S
- t
the time indices corresponding to the columns of S
Author(s)
Paul Kienzle, pkienzle@users.sf.net.
Conversion to R by Tom Short
This conversion to R by Geert van Boxtel, G.J.M.vanBoxtel@gmail.com.
Examples
sp <- specgram(chirp(seq(-2, 15, by = 0.001), 400, 10, 100, 'quadratic'))
specgram(chirp(seq(0, 5, by = 1/8000), 200, 2, 500,
"logarithmic"), fs = 8000)
# use other color palettes than grayscale
jet <- grDevices::colorRampPalette(
c("#00007F", "blue", "#007FFF", "cyan", "#7FFF7F",
"yellow", "#FF7F00", "red", "#7F0000"))
plot(specgram(chirp(seq(0, 5, by = 1/8000), 200, 2, 500, "logarithmic"),
fs = 8000), col = jet(20))
c2w <- grDevices::colorRampPalette(colors = c("red", "white", "blue"))
plot(specgram(chirp(seq(0, 5, by = 1/8000), 200, 2, 500, "logarithmic"),
fs = 8000), col = c2w(50))