Minimum mean-square error (MMSE) approaches to speech enhancement are widely used in the literature. The quality of enhanced speech produced by an MMSE approach is directly impacted by the accuracy of the employed a priori signal-to-noise ratio (SNR) estimator. In this paper, the spectral distortion (SD) level of the a priori SNR estimate that results in a just-noticeable difference (JND) in the perceived quality of MMSE-enhanced speech is found. The JND SD level indicates the accuracy that an a priori SNR estimator must exceed to have no impact on the perceived quality of MMSE-enhanced speech. To measure the JND SD level, listening tests are conducted across five SNR levels, five noise sources, and two MMSE approaches [the MMSE short-time spectral amplitude (MMSE-STSA) estimator and the Wiener filter]. A statistical analysis of the results indicates that the JND SD level increases with the SNR level, is higher for the MMSE-STSA estimator, and is not impacted by the type of background noise. According to the literature, a significant improvement in a priori SNR estimation accuracy is required to reach the JND SD level.

Studies supporting learning-induced reductions in listening-related cognitive load have lacked procedural learning controls, making it difficult to determine the extent to which effects arise from perceptual or procedural learning. Here, listeners were trained in the coordinate response measure (CRM) task under unfiltered (UT) or degraded low-pass filtered (FT) conditions. Improvements in low-pass filtered CRM performance were larger for FT. Both conditions showed training-related reductions in cognitive load as indexed by a secondary working memory task.
However, only the FT condition showed a correlation between CRM improvement and secondary task performance, suggesting that effects can be driven by perceptual and procedural learning.

The Reflections series takes a look back on historical articles from The Journal of the Acoustical Society of America that have had a significant impact on the science and practice of acoustics.

In exterior sound field reproduction using loudspeaker arrays, such as a single circular array, there is a trade-off between the reproduction accuracy and the filter gain of the loudspeaker array. With the aim of reproducing complex sound fields with a lower filter gain, an asymmetrical array geometry with reflections between two or more rigid arrays is introduced. This paper proposes a sound field reproduction method using two rigid circular loudspeaker arrays in a circular harmonic domain. Transfer functions that consider the multiple scattering between two rigid baffles can be represented in the circular harmonic domain. By repeatedly transforming the expansion coefficient between two coordinate systems, the circular harmonic expansion was applied to the reproduced sound field in a mixed coordinate system. Then, the driving function of the loudspeaker arrays was derived through a mode expansion. Numerical simulations were conducted to verify the accuracy of the reproduced sound field.

Oscillating electric currents through a wire under tension can excite transverse vibrational modes of the wire when a perpendicular static magnetic field is present and the frequency of the current is close to the natural frequency of the mode of interest. The excitation of the mode is associated with temporally oscillating Maxwell stresses on the wire, often also known as oscillating Lorentz forces. That excitation process is sometimes demonstrated in educational contexts.
The investigation here concerns situations where a temporally oscillating magnetic field generated by oscillating electric currents in a cylindrical coil replaces the imposed perpendicular static magnetic field. The frequencies of the currents in the wire and in the coil are related to the frequency of the oscillating stress. In this experiment, this effect is documented for sum-frequency excitation (with input frequencies in the range of half that of the excited lowest vibrational mode of the wire) and difference-frequency excitation (with input frequencies an order of magnitude larger than the mode frequency). This coupling may be useful when it is desirable to use only high-frequency currents. The experiment uses tone-burst stress excitation and a differential photodiode for detecting transverse low-amplitude wire oscillations. Signal envelopes decayed exponentially after the tone-burst.

Although the first two or three formant frequencies are considered essential cues for vowel identification, certain limitations of this approach have been noted. Alternative explanations have suggested listeners rely on other aspects of the gross spectral shape. A study conducted by Ito, Tsuchida, and Yano [(2001). J. Acoust. Soc. Am. 110, 1141-1149] offered strong support for the latter, as attenuation of individual formant peaks left vowel identification largely unaffected. In the present study, these experiments are replicated in two dialects of English. Although the results were similar to those of Ito, Tsuchida, and Yano [(2001). J. Acoust. Soc. Am. 110, 1141-1149], quantitative analyses showed that when a formant is suppressed, participant response entropy increases due to increased listener uncertainty. In a subsequent experiment, using synthesized vowels with changing formant frequencies, suppressing individual formant peaks led to reliable changes in identification of certain vowels but not in others.
These findings indicate that listeners can identify vowels with missing formant peaks. However, such formant-peak suppression may lead to decreased certainty in identification of steady-state vowels or even changes in vowel identification in certain dynamically specified vowels.

The effect of age on release from masking (RFM) was examined using cortical auditory evoked potentials (CAEPs). Two speech-in-noise paradigms [i.e., fixed speech with varying signal-to-noise ratios (SNRs) and fixed noise with varying speech levels], similar to those used in behavioral measures of RFM, were employed with competing continuous and interrupted noises. Young and older normal-hearing adults participated (N = 36). Cortical responses were evoked in the fixed speech paradigm at SNRs of -10, 0, and 10 dB. In the fixed noise paradigm, the CAEP SNR threshold was determined in both noises as the lowest SNR that yielded a measurable response. RFM was demonstrated in the fixed speech paradigm with a significant number of missing responses, longer P1 and N1 latencies, and smaller N1 response amplitudes in continuous noise at the poorest SNR of -10 dB. In the fixed noise paradigm, RFM was demonstrated with significantly lower CAEP SNR thresholds in interrupted noise. Older participants demonstrated significantly longer P2 latencies and reduced P1 and N1 amplitudes.
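
The threshold rule in the fixed noise paradigm (the lowest SNR that yields a measurable response) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the function names, the SNR grid, and the response data are hypothetical and not taken from the study, and expressing RFM as the continuous-noise threshold minus the interrupted-noise threshold is one plausible convention rather than the authors' stated analysis.

```python
from typing import Mapping, Optional


def caep_snr_threshold(responses: Mapping[float, bool]) -> Optional[float]:
    """Return the lowest SNR (dB) with a measurable response, or None if none."""
    present = [snr for snr, measurable in responses.items() if measurable]
    return min(present) if present else None


def release_from_masking(
    continuous: Mapping[float, bool], interrupted: Mapping[float, bool]
) -> Optional[float]:
    """Assumed convention: RFM (dB) = continuous threshold - interrupted threshold."""
    t_continuous = caep_snr_threshold(continuous)
    t_interrupted = caep_snr_threshold(interrupted)
    if t_continuous is None or t_interrupted is None:
        return None  # no threshold could be determined in at least one noise
    return t_continuous - t_interrupted


# Hypothetical data: SNR in dB -> whether a cortical response was measurable.
continuous = {-10.0: False, -5.0: False, 0.0: True, 5.0: True}
interrupted = {-10.0: True, -5.0: True, 0.0: True, 5.0: True}

rfm_db = release_from_masking(continuous, interrupted)
print(rfm_db)
```

Under this convention, a positive value means the threshold was lower in interrupted noise, which is the direction of the effect reported above.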