Parksvoss2113

From Iurium Wiki

Experimental results on the WSJ0-SI84 corpus indicated that the proposed DeepFBE with only 4-ms latency achieved much better performance than traditional low-latency speech enhancement algorithms across several objective metrics. Listening test results further confirmed that our approach achieved higher speech quality than other methods.

Substantial evidence suggests that sensitivity to the difference between the major vs. minor musical scales may be bimodally distributed. Much of this evidence comes from experiments using the "3-task." On each trial in the 3-task, the listener hears a rapid, random sequence of tones containing equal numbers of notes of either a G major or G minor triad and strives (with feedback) to judge which type of "tone-scramble" it was. This study asks whether the bimodal distribution in 3-task performance is due to variation (across listeners) in sensitivity to differences in pitch. On each trial in a "pitch-difference task," the listener hears two tones and judges whether the second tone is higher or lower than the first. When the first tone is roved (rather than fixed throughout the task), performance varies dramatically across listeners with median threshold approximately equal to a quarter-tone. Strikingly, nearly all listeners with thresholds higher than a quarter-tone performed near chance in the 3-task. Across listeners with thresholds below a quarter-tone, 3-task performance was uniformly distributed from chance to ceiling; thus, the large, lower mode of the distribution in 3-task performance is produced mainly by listeners with roved pitch-difference thresholds greater than a quarter-tone.

Lexical bias is the tendency to perceive an ambiguous speech sound as a phoneme completing a word; more ambiguity typically causes greater reliance on lexical knowledge. A speech sound ambiguous between /g/ and /k/ is more likely to be perceived as /g/ before /ɪft/ and as /k/ before /ɪs/.
The magnitude of this difference, the Ganong shift, increases when high cognitive load limits available processing resources. The effects of stimulus naturalness and informational masking on Ganong shifts and reaction times were explored. Tokens between /gɪ/ and /kɪ/ were generated using morphing software, from which two continua were created ("giss"-"kiss" and "gift"-"kift"). In experiment 1, Ganong shifts were considerably larger for sine- than noise-vocoded versions of these continua, presumably because the spectral sparsity and unnatural timbre of the former increased cognitive load. In experiment 2, noise-vocoded stimuli were presented alone or accompanied by contralateral interferers with constant within-band amplitude envelope, or within-band envelope variation that was the same or different across bands. The latter, with its implied spectro-temporal variation, was predicted to cause the greatest cognitive load. Reaction-time measures matched this prediction; Ganong shifts showed some evidence of greater lexical bias for frequency-varying interferers, but were influenced by context effects and diminished over time.

Noise in healthcare settings, such as hospitals, often exceeds levels recommended by health organizations. Although researchers and medical professionals have raised concerns about the effect of these noise levels on spoken communication, objective measures of behavioral intelligibility in hospital noise are lacking. Further, no studies of intelligibility in hospital noise have used medically relevant terminology, which may differentially impact intelligibility compared to standard terminology in speech perception research and is essential for ensuring ecological validity. Here, intelligibility was measured using online testing for 69 young adult listeners in three listening conditions (i.e., quiet, speech-shaped noise, and hospital noise; 23 listeners per condition) for four sentence types.
Three sentence types included medical terminology with varied lexical frequency and familiarity characteristics. A final sentence set included non-medically related sentences. Results showed that intelligibility was negatively impacted by both noise types with no significant difference between the hospital and speech-shaped noise. Medically related sentences were not less intelligible overall, but word recognition accuracy was significantly positively correlated with both lexical frequency and familiarity. These results support the need for continued research on how noise levels in healthcare settings, in concert with less familiar medical terminology, impact communications and ultimately health outcomes.

Current best-practice aircraft noise calculation models usually apply a so-called lateral attenuation term, i.e., an empirical formula to account for sound propagation phenomena in situations with grazing sound incidence. The recently developed aircraft noise model sonAIR features a physically based sound propagation core that claims to implicitly account for the phenomena condensed in this correction. The current study compares calculations for situations with grazing sound incidence of sonAIR and two best-practice models, AEDT and FLULA2, with measurements. The validation dataset includes on the one hand a large number of commercial aircraft during final approach and on the other hand departures of a jet fighter aircraft, with measurement distances up to 2.8 km. The comparisons show that a lateral attenuation term is justified for best-practice models, resulting in a better agreement with measurements. However, sonAIR yields better results than the two other models, with deviations on the order of only ±1 dB at all measurement locations.
A further advantage of a physically based modeling approach, as used in sonAIR, is its ability to account for varying conditions affecting lateral attenuation, like systematic differences in the temperature stratification between day and night or ground cover other than grassland.

Direction-of-arrival (DOA) estimation is widely used in underwater detection and localization. To address the high-resolution DOA estimation problem, a DenseBlock-based U-net structure is proposed in this paper. U-net is a U-shaped fully convolutional neural network, which yields a two-dimensional image. DenseBlock is a more efficient structure than typical convolutional layers. The proposed network replaces the concatenated convolutional layers in the original U-net with DenseBlocks. Through training, the network can remove the interference of sidelobes and noise in a conventional beamforming bearing-time record (BTR) and produce a clean BTR; hence, this method has narrow beam width and few sidelobes. In addition, the network can be trained on simulated data and applied to real data when the simulated and real data are similar in BTR features, so the method generalizes well. For a multi-target problem, the network does not need to be trained on all cases with different target quantities and therefore can reduce the training set size. As a data-driven method, it does not rely on prior assumptions of the array model and possesses better robustness to array imperfections than typical model-based DOA algorithms. Simulations and experiments verify the advantages of the proposed method.

In an effort to mitigate the 2019 novel coronavirus disease pandemic, mask wearing and social distancing have become standard practices. While effective in fighting the spread of the virus, these protective measures have been shown to deteriorate speech perception and sound intensity, which necessitates speaking louder to compensate.
The goal of this paper is to investigate via numerical simulations how compensating for mask wearing and social distancing affects measures associated with vocal health. A three-mass body-cover model of the vocal folds (VFs) coupled with the sub- and supraglottal acoustic tracts is modified to incorporate mask- and distance-dependent acoustic pressure models. The results indicate that sustaining target levels of intelligibility and/or sound intensity while using these protective measures may necessitate increased subglottal pressure, leading to higher VF collision and, thus, potentially inducing a state of vocal hyperfunction, a progenitor to voice pathologies.

High frequency is a solution to high data-rate underwater acoustic communications. Extensive studies have been conducted on high-frequency (>40 kHz) acoustic channels, which are strongly susceptible to surface waves. The corresponding channel statistics related to acoustic communications, however, still deserve systematic investigation. Here, an efficient channel modeling method based on statistical analysis is proposed. Three wind-associated environmental models are integrated into this hybrid model. The Texel-Marsen-Arsloe spectral model is adopted to generate a three-dimensional shallow-water surface, which affects the Doppler shifts of large-scale paths. Small-scale micropaths are statistically analyzed and modeled according to the measured channels. The Hall-Novarini model is adopted to simulate the refraction and attenuation caused by wind-generated bubbles. An existing wind-generated noise model is applied to calculate the noise spectrum. The proposed model has been validated by the at-sea measurements collected in the Gulf of Mexico in 2016 and 2017.
This model can be used to further analyze the channels at different carrier frequencies, bandwidths, and wind speeds for certain transmission conditions.

An analysis of the plane wave reflection coefficient of the seabed, R, is developed for two upward-refracting sediment sound speed profiles: the two-parameter linear and the three-parameter inverse-square, both extending to infinite depth. For the linear profile, it turns out that |R| = 1, representing total reflection for all grazing angles and all frequencies, signifying that in this special case, |R| is insensitive to the gradient. The implication is that if |R| is to return information about the shape of a profile, the gradient must change with depth, either smoothly through the presence of second- and/or higher-order depth derivatives or discontinuously at, say, an interface between sediment layers. The inverse-square is an example of a profile with a smoothly varying gradient, for which a general, closed-form expression for R is derived, valid for all grazing angles and all frequencies. When the sound speed ratio is less than unity, representative of a fine-grained sediment (mud), |R| exhibits two frequency regimes, designated high and low, separated by a transition frequency, fT. In each of these regimes, |R| exhibits a frequency-dependent angle of intromission, which exhibits high- and low-frequency limiting values, differing by approximately 3.5°, depending on the geo-acoustic parameters of the sediment.

This work presents three-dimensional (3D) numerical analysis of acoustic radiation force on an elastic microsphere suspended in a viscous fluid. Acoustophoresis of finite-sized, neutrally buoyant, nearly incompressible soft particles may improve by orders of magnitude and change directions when going through resonant vibrations. These findings offer the potential to manipulate and separate microparticles based on their resonance frequency.
This concept has profound implications in cell and microparticle handling, 3D printing, and enrichment in lab-on-chip applications. The existing analytical body of work can predict spheroidal harmonics of an elastic sphere and acoustic radiation force based on monopole and dipole scatter in an ideal fluid. However, little attention is given to the complex interplay of resonant fluid and solid bodies that generate acoustic radiation. The finite element method is used to find resonant modes, damping factors, and acoustic forces of an elastic sphere subject to a standing acoustic wave.
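For context, the monopole-and-dipole result in an ideal fluid mentioned above can be sketched in a few lines. This is the standard small-particle (radius much smaller than the wavelength) expression for a one-dimensional standing wave, not the finite element model of the abstract, and it does not capture the resonance effects the abstract describes. The function names, parameter names, and the polystyrene-in-water values in the usage note are illustrative assumptions, not taken from the source.

```python
import math


def contrast_factor(rho_p, kappa_p, rho_f, kappa_f):
    """Acoustic contrast factor from monopole (f1) and dipole (f2) scattering.

    Assumes a small compressible sphere in an inviscid fluid.
    rho_*: density [kg/m^3]; kappa_*: compressibility [1/Pa].
    """
    f1 = 1.0 - kappa_p / kappa_f                              # monopole coefficient
    rho_ratio = rho_p / rho_f
    f2 = 2.0 * (rho_ratio - 1.0) / (2.0 * rho_ratio + 1.0)    # dipole coefficient
    return f1 / 3.0 + f2 / 2.0


def radiation_force(z, radius, freq, p_amp, rho_p, kappa_p, rho_f, c_f):
    """Radiation force [N] along z for a standing wave p = p_amp*cos(k z)*cos(w t).

    Small-particle, ideal-fluid approximation only (hypothetical helper).
    """
    kappa_f = 1.0 / (rho_f * c_f ** 2)        # fluid compressibility
    k = 2.0 * math.pi * freq / c_f            # wavenumber
    e_ac = 0.25 * kappa_f * p_amp ** 2        # acoustic energy density
    phi = contrast_factor(rho_p, kappa_p, rho_f, kappa_f)
    return 4.0 * math.pi * phi * k * radius ** 3 * e_ac * math.sin(2.0 * k * z)
```

As a usage example, calling `radiation_force` with assumed values for a 5 µm polystyrene bead in water (`rho_p=1050.0`, `kappa_p=2.49e-10`, `rho_f=997.0`, `c_f=1497.0`) gives a positive contrast factor, so the force points toward the nearest pressure node. A particle that matches the fluid in both density and compressibility feels no force in this approximation, which is one reason a full numerical treatment is needed for neutrally buoyant particles near resonance.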

Article authors: Parksvoss2113 (Laursen Kristiansen)