A general strategy underlying hearing aid design is the selective amplification of portions of the input sound spectrum to compensate for a loss of hearing sensitivity. Major advances in engineering and fitting procedures have resulted in more successful hearing aid use in recent years (Kochkin, 2005). Nevertheless, the effortless understanding of speech enjoyed by people with normal hearing is not realized by many individuals with sensorineural hearing loss—even with amplification—because the effects of the loss are not limited to a reduction in sensitivity.
Reduced auditory sensitivity is an important determinant of the degree of handicap experienced by some individuals with hearing impairment, and both clinical assessments of hearing and the design and fitting of hearing aids are heavily weighted toward that factor. Amplification of speech brings some sounds into the region of audibility, but it may also make the more intense portions of a sound too loud, thereby introducing a different kind of distortion. Many improvements in hearing aids are meant to correct for this problem by frequency shaping of amplification and by including graduated compression in one or more frequency bands. Technological advances such as multichannel compression, directional microphones, and noise reduction circuits have addressed some vexing problems of abnormal loudness perception and susceptibility to background noise in people with hearing loss.
However, other important aspects of hearing, including abnormal frequency and temporal analysis, are not typically evaluated when considering a hearing aid for a patient, nor have they proven amenable to compensation by signal processing. The most common forms of sensorineural hearing loss involve damage to outer hair cells, which reduces audibility and diminishes or abolishes cochlear nonlinearity. Effects of abnormally linear processing by a damaged cochlea include reduced sensitivity to low-intensity sound, a loss of the level-dependent gain that normally compresses the range of input levels, impaired frequency selectivity, and deficits in temporal processing. The contributions of these and of more central auditory processing factors to impaired speech recognition in people with hearing loss are not completely understood, and this gap in knowledge may account for some of the barriers to successful rehabilitation.
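The loss of cochlear nonlinearity can be pictured with a deliberately simplified input/output function. The sketch below assumes a hypothetical "broken-stick" rule with 40 dB of active gain below a 30 dB knee and 0.2 dB/dB growth above it; these are illustrative values, not measurements. Removing the active gain, as outer-hair-cell damage is assumed to do here, lowers the response to soft sounds while leaving intense sounds nearly unchanged.

```python
# Illustrative cochlear input/output functions (levels in dB).
# All parameter values are hypothetical, chosen only to show the shape.

def normal_io(input_db, gain_db=40.0, knee_db=30.0, slope=0.2):
    """Compressive response of a healthy cochlea (simplified)."""
    if input_db <= knee_db:
        return input_db + gain_db  # full active gain for soft sounds
    compressed = knee_db + gain_db + slope * (input_db - knee_db)
    # Once the active gain is used up, growth becomes linear again.
    return max(compressed, input_db)


def impaired_io(input_db, residual_gain_db=0.0):
    """Outer-hair-cell loss modeled as a purely linear response."""
    return input_db + residual_gain_db


for level in (20, 50, 90):
    gain_normal = normal_io(level) - level
    gain_impaired = impaired_io(level) - level
    print(f"{level} dB in: normal gain {gain_normal:.0f} dB, impaired gain {gain_impaired:.0f} dB")
```

In this toy model the 40 dB gain difference at a 20 dB input appears as a loss of sensitivity, while the convergence of the two curves at high levels mirrors the abnormally steep growth of response with level that amplification must then contend with.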
Plomp (1978) formalized a multifactor view of hearing loss, calling the sensitivity component "attenuation" and the remaining factor(s) "distortion." He defined the distortion component as the degree of speech understanding difficulty that remains even after adequate amplification has compensated for the attenuation component. According to Plomp, "Undoubtedly the best way to improve intelligibility would be to neutralize the ear's hearing loss of class D [distortion] by some kind of processing of the speech signal in the hearing aid." Our knowledge of impaired frequency and temporal resolution and of their effects on the cochlear analysis of speech signals is as yet too limited to guide hearing aid design; nevertheless, we can attempt to assess their separate and combined contributions to the problem of speech recognition, particularly in background noise.
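Plomp's distinction can be made concrete with the general form of his speech reception threshold (SRT) model: attenuation A raises the threshold only when speech must overcome the listener's own threshold in quiet, whereas distortion D raises it in quiet and in noise alike. The sketch below assumes illustrative reference values (a normal SRT in quiet of 20 dB and a normal SRT of -5 dB signal-to-noise ratio in noise); treat these numbers as placeholders rather than Plomp's fitted parameters.

```python
import math


def plomp_srt_db(noise_db, attenuation_db=0.0, distortion_db=0.0,
                 srt_quiet_normal_db=20.0, snr_normal_db=-5.0):
    """Speech reception threshold (dB) in the general form of Plomp's (1978) model.

    srt_quiet_normal_db and snr_normal_db are placeholder reference values.
    """
    # Threshold limited by the listener's own hearing (affected by A and D)
    quiet_term = 10 ** ((srt_quiet_normal_db + attenuation_db + distortion_db) / 10)
    # Threshold limited by the masking noise (affected by D only)
    noise_term = 10 ** ((noise_db + snr_normal_db + distortion_db) / 10)
    return 10 * math.log10(quiet_term + noise_term)


# In loud (80 dB) noise, the SNR needed at threshold depends on D but not on A.
for a, d in ((0, 0), (30, 0), (30, 6)):
    snr = plomp_srt_db(80, a, d) - 80
    print(f"A={a} dB, D={d} dB -> SNR at threshold {snr:.1f} dB")
```

Because a hearing aid amplifies speech and noise together, it can offset A in quiet but leaves the noise-limited term, and therefore D, untouched.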
Our approach to this goal is to analyze the known effects of hearing impairment on the neural representations that serve as the input to higher auditory processing centers for speech recognition. The internal representation of an input sound is a complex interaction of the external temporal waveform and frequency spectrum with the cochlear spectral and temporal characteristics of a given auditory system.
Individual speech sounds may be distinguished by the distribution of acoustic energy over time and across a wide frequency range. The sounds that enter the ear are temporal waveforms, reflecting changes in acoustic energy occurring over time. This temporal information is of critical importance in understanding speech, and is the source for the frequency analysis carried out in the cochlea.
Figure 1 illustrates the temporal waveform of a sentence spoken by a female talker, along with an expanded portion of the word "shore." Note the slow amplitude modulations over the length of the sentence, as well as the temporal fine structure represented by more rapid modulations. Slow modulations of speech envelopes mark the locations of syllables and words in the speech stream, and generally occur at rates less than about 50 Hz (Rosen, 1992). This temporal speech information may not be difficult for listeners with hearing impairment to perceive in quiet, as it typically includes relatively high amplitude peaks separated by pauses or lower amplitude portions.
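One simple way to separate the slow envelope from the rapid fine structure is full-wave rectification followed by low-pass filtering. The sketch below assumes a one-pole low-pass filter with a 50 Hz cutoff, matching the upper limit of envelope rates cited above; the Hilbert transform is a common alternative. The test signal is a hypothetical 1 kHz tone amplitude-modulated at 4 Hz, roughly a syllable rate.

```python
import math


def extract_envelope(signal, fs, cutoff_hz=50.0):
    """Slow temporal envelope: full-wave rectify, then one-pole low-pass."""
    a = math.exp(-2 * math.pi * cutoff_hz / fs)  # smoothing coefficient
    envelope, y = [], 0.0
    for x in signal:
        y = (1 - a) * abs(x) + a * y
        # The mean of a rectified sinusoid is 2/pi of its peak, so rescale.
        envelope.append(y * math.pi / 2)
    return envelope


fs = 16000
t = [n / fs for n in range(fs)]  # 1 s of samples
modulator = [1 + 0.5 * math.sin(2 * math.pi * 4 * ti) for ti in t]  # 4 Hz, syllable-like
signal = [m * math.sin(2 * math.pi * 1000 * ti) for m, ti in zip(modulator, t)]

env = extract_envelope(signal, fs)
steady = env[fs // 10:]  # skip the filter's start-up transient
print(f"envelope range: {min(steady):.2f} to {max(steady):.2f}")  # roughly 0.5 to 1.5
```

The recovered envelope follows the 4 Hz modulator, while the 1 kHz carrier (the fine structure) is largely removed by the 50 Hz smoothing.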
However, in background noise, listeners with hearing impairment may have more difficulty following the stream of speech. The noise may fill in the low-amplitude portions of the envelope so that words and syllables are less clearly separated in time. When the background noise fluctuates, people with normal hearing can often hear speech during brief intervals of lower noise level, allowing them to recognize speech at less favorable signal-to-noise ratios. In contrast, people with hearing loss are often much less able to take advantage of these dips in the background noise, and thus have difficulty understanding speech in noise whether it is steady or fluctuating (for an excellent review of speech perception in noise, see Assmann & Summerfield, 2004).
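A rough way to quantify the advantage of listening in the dips is to count the short time frames in which the speech level exceeds the noise level by some criterion. The frame levels and the 3 dB criterion below are hypothetical; the point is only that a fluctuating masker with the same average level (averaged in dB) can leave more usable frames than a steady one.

```python
def glimpse_proportion(speech_db, noise_db, criterion_db=3.0):
    """Fraction of time frames in which speech exceeds the noise by criterion_db."""
    if not speech_db or len(speech_db) != len(noise_db):
        raise ValueError("need two non-empty level sequences of equal length")
    glimpses = sum(1 for s, n in zip(speech_db, noise_db) if s - n > criterion_db)
    return glimpses / len(speech_db)


speech = [60, 65, 50, 62, 58, 66, 52, 64]             # hypothetical frame levels (dB)
steady_noise = [60] * 8                                # constant 60 dB
fluctuating_noise = [66, 54, 66, 54, 66, 54, 66, 54]   # same 60 dB average in dB
print(glimpse_proportion(speech, steady_noise))        # 0.375
print(glimpse_proportion(speech, fluctuating_noise))   # 0.5
```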
The more rapid modulations of speech, shown clearly in the highlighted word "shore" in Figure 1, carry information about the voice fundamental frequency and encode the higher frequencies necessary for consonant and vowel identification. The detection of these faster modulations and the ability to discriminate different rates of modulation are often impaired in people with hearing loss (Grant et al., 1998; Formby, 1987). A number of recent studies (reviewed in Moore, 2008) report evidence that listeners with hearing loss cannot make adequate use of temporal fine structure information (the higher modulation rates), and that this failure underlies some of their difficulty understanding speech in background noise.
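Modulation detection is usually measured by finding the smallest modulation depth a listener can detect. A minimal sketch, assuming a sinusoidally amplitude-modulated envelope, of how the depth m is computed and expressed in dB; a listener whose threshold rises from, say, -20 dB to -10 dB needs roughly three times as much modulation (0.1 vs. about 0.32) before detecting it. The envelope and threshold values are hypothetical.

```python
import math


def modulation_depth(envelope):
    """Depth m = (max - min) / (max + min) of a sinusoidally modulated envelope."""
    hi, lo = max(envelope), min(envelope)
    return (hi - lo) / (hi + lo)


def depth_db(m):
    """Depth in dB, 20*log10(m): the scale on which AM detection thresholds are often reported."""
    return 20 * math.log10(m)


# Hypothetical envelope: 50% modulation, sampled over one cycle
envelope = [1 + 0.5 * math.sin(2 * math.pi * k / 100) for k in range(100)]
m = modulation_depth(envelope)
print(f"m = {m:.2f}, {depth_db(m):.1f} dB")  # m = 0.50, -6.0 dB
```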
The Aging Population
Deficits in temporal processing may be especially devastating in the aging population with hearing loss. Pichora-Fuller and Souza (2003) suggested that declines in speech understanding in quiet are about the same for elderly and young patients with hearing impairment, and they ascribed those difficulties to spectral processing impairments in the cochlea. However, elderly individuals with and without clinically significant hearing loss experience considerably greater difficulty than their younger counterparts in challenging acoustic environments such as background noise. Pichora-Fuller and Souza attributed these deficits to a reduction in neural synchrony or other temporal processes that add to the spectral processing difficulties typically associated with sensorineural hearing loss.
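Neural synchrony refers to how consistently auditory nerve fibers fire at a particular phase of the stimulus waveform. A standard way to quantify it, not specified in the article but widely used, is vector strength: each spike is treated as a unit vector at its stimulus phase, and the length of the average vector is reported. The spike trains below are hypothetical, chosen to show the two extremes; a loss of synchrony would appear as a value falling from near 1 toward 0.

```python
import cmath
import math


def vector_strength(spike_times_s, freq_hz):
    """Phase-locking index: 1 = all spikes at one stimulus phase, 0 = no preferred phase."""
    if not spike_times_s:
        raise ValueError("need at least one spike time")
    phasors = (cmath.exp(2j * math.pi * freq_hz * t) for t in spike_times_s)
    return abs(sum(phasors)) / len(spike_times_s)


f = 500.0  # hypothetical stimulus frequency (Hz)
# Hypothetical spike trains, one spike per stimulus cycle
locked = [k / f + 0.0002 for k in range(400)]               # always the same phase
smeared = [k / f + (k % 4) / (4 * f) for k in range(400)]   # phase cycles through the period
print(round(vector_strength(locked, f), 3))   # 1.0
print(round(vector_strength(smeared, f), 3))  # 0.0
```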
The speech signal enters the auditory system in the form of a temporal stream of modulated energy. Early in cochlear analysis this signal is separated into frequency channels by a mechanism often characterized as a bank of overlapping bandpass filters (for a review see Moore, 2003). In unimpaired ears, the bandwidths of these "auditory filters" increase with center frequency, and each filter passes the part of the total spectrum that falls within its passband, so its output depends on the characteristics of both the input sound and the filter. Fibers of the auditory (VIIIth cranial) nerve are stimulated selectively by the outputs of the filters at the cochlear places they innervate, and the total array of neural stimulation over time is transmitted to the brain for further processing. The internal representation of a sound after cochlear processing may then be thought of as the input acoustic signal convolved with the response of each cochlear filter.
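The filterbank description can be sketched with a gammatone filter, a widely used approximation to auditory filter shape that is swapped in here for illustration (the article does not commit to a particular filter model). Bandwidths follow the Glasberg and Moore (1990) ERB formula; the fourth-order shape and the 1.019 bandwidth factor are conventional choices, and the test tones are hypothetical. Each channel's output is the convolution of the input with that channel's impulse response.

```python
import math


def erb_hz(fc_hz):
    """Normal auditory-filter ERB in Hz (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def gammatone_ir(fc_hz, fs, duration_s=0.025, order=4, b=1.019):
    """Sampled gammatone impulse response for one cochlear channel."""
    bw = b * erb_hz(fc_hz)
    return [(k / fs) ** (order - 1)
            * math.exp(-2 * math.pi * bw * k / fs)
            * math.cos(2 * math.pi * fc_hz * k / fs)
            for k in range(int(duration_s * fs))]


def convolve(x, h):
    """Direct convolution: the channel output for input x."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


fs = 16000


def tone(freq_hz, n_samples=800):
    """Hypothetical 50 ms test tone at fs = 16 kHz."""
    return [math.sin(2 * math.pi * freq_hz * n / fs) for n in range(n_samples)]


h = gammatone_ir(1000, fs)  # channel centered at 1 kHz
# Compare only the steady-state part, after the onset transient has decayed
on = rms(convolve(tone(1000), h)[len(h):800])
off = rms(convolve(tone(2000), h)[len(h):800])
print(f"ERB at 1 kHz: {erb_hz(1000):.1f} Hz; "
      f"2 kHz tone attenuated by {20 * math.log10(on / off):.0f} dB")
```

A tone at the channel's center frequency passes freely, while a tone an octave above is strongly attenuated; because the ERB grows with frequency, higher channels pass wider slices of the spectrum.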
In a cochlea with sensorineural damage, the mechanism that divides the sound into frequency channels for further processing is altered: the auditory filters are broader and, in many cases, asymmetrical (Glasberg & Moore, 1986; Leek & Summers, 1993). This alteration produces an abnormal internal representation of an input sound, so an altered pattern of stimulation is transmitted to higher auditory-processing centers. One of the major changes in the internal representation is a reduction in the amplitude difference between peaks and valleys in the spectrum, which makes it difficult to locate the concentrations of energy that cue different speech sounds. Because the frequency locations of spectral peaks (such as formants) are crucial cues to the identity of some speech sounds, extreme spectral flattening may reduce speech perception ability (Bacon & Brandt, 1982; Turner et al., 1999; Henry et al., 2005).
Figure 2 schematically illustrates how this might occur. The top row shows a moderate-level vowel, /ɛ/, as in the word "bet," processed through a normal auditory filterbank. This processing results in a smeared internal frequency representation at the output of the cochlea, called an excitation pattern [see Glasberg and Moore (1990) for a discussion of excitation patterns]. The normal excitation pattern has clear peaks and valleys that show the three formants that define this sound as /ɛ/ (note that the first three narrow peaks are resolved harmonics, which together make up the first formant). The bottom row shows the same process for a listener with hearing loss. The input sound is amplified to overcome a sensitivity loss, as by a hearing aid, and is then processed by broader, more overlapping auditory filters. The resulting excitation pattern still shows a strong representation of the first formant, but the higher formants are not well defined. The listener might correctly perceive /ɛ/, but with only this information, the vowel might just as likely be heard as /ɪ/ or /ʊ/.
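The smearing in Figure 2 can be approximated by computing an excitation pattern: each filter's output is the power-weighted sum of the spectral components, here using the rounded-exponential (roex) filter shape often used in excitation-pattern models. The sketch assumes roex(p) filters with p = 4·fc/ERB, a hypothetical vowel-like spectrum, and a threefold broadening to stand in for an impaired ear; separate lower and upper widening factors allow the asymmetry noted above to be explored.

```python
import math


def erb_hz(fc_hz):
    """Normal auditory-filter ERB in Hz (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def roex_weight(f_hz, fc_hz, widen_lower=1.0, widen_upper=1.0):
    """roex(p) weight of a component at f_hz in the filter centered at fc_hz.

    p = 4*fc/ERB gives a normal filter; dividing p by a widening factor
    broadens that side of the filter (a hypothetical stand-in for damage).
    """
    widen = widen_lower if f_hz < fc_hz else widen_upper
    p = 4.0 * fc_hz / (erb_hz(fc_hz) * widen)
    g = abs(f_hz - fc_hz) / fc_hz  # deviation from fc, normalized to fc
    return (1.0 + p * g) * math.exp(-p * g)


def excitation_db(components, fc_hz, **widening):
    """Excitation level (dB) at one filter: power-weighted sum over components."""
    power = sum(10 ** (level / 10) * roex_weight(f, fc_hz, **widening)
                for f, level in components)
    return 10 * math.log10(power)


# Hypothetical vowel-like line spectrum: 100 Hz harmonics at 40 dB,
# raised to 60 dB within 100 Hz of three "formants". Illustrative only.
formants = (500, 1500, 2500)
spectrum = [(f, 60.0 if any(abs(f - F) <= 100 for F in formants) else 40.0)
            for f in range(100, 4001, 100)]


def peak_to_valley(peak_hz=2500, valley_hz=2000, **widening):
    """Excitation difference between a formant peak and the valley below it."""
    return (excitation_db(spectrum, peak_hz, **widening)
            - excitation_db(spectrum, valley_hz, **widening))


normal = peak_to_valley()
broad = peak_to_valley(widen_lower=3.0, widen_upper=3.0)
print(f"peak-to-valley: normal {normal:.1f} dB, broadened filters {broad:.1f} dB")
```

In this linear sketch, adding the same gain to every component, as simple amplification would, raises both excitation levels equally and so cannot restore the lost contrast.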
A number of studies have evaluated how reducing the contrast between spectral peaks and valleys affects sound identification (e.g., Summers & Leek, 1994; Henry et al., 2005). Leek, Dorman, and Summerfield (1987) reported that, to identify spectral patterns accurately, listeners with hearing impairment required a peak-to-valley amplitude difference about three times as large as that required by listeners with normal hearing. These results are consistent with the hypothesis that the abnormal frequency analysis of listeners with hearing impairment degrades the spectral contrast of the stimulus in their internal representations.
Given the evidence that reductions in speech spectral contrast may be implicated in some deficits in speech understanding, especially in background noise, some attempts have been made to modify vowel spectra to compensate for the impoverished internal representation by changing the bandwidths of the formants (e.g., Summerfield et al., 1985; Boers, 1980; van Veen & Houtgast, 1985). However, simple spectral alteration schemes generally have not been successful in improving speech recognition in listeners with hearing impairment.
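The idea behind such schemes can be illustrated with a generic contrast-expansion rule that exaggerates peaks and deepens valleys. This is not one of the cited formant-bandwidth methods; the rule, the factor of 2, and the channel levels are hypothetical. It simply shows the kind of pre-distortion intended to leave more contrast after the smoothing imposed by broadened auditory filters.

```python
def spectral_contrast_db(levels_db):
    """Peak-to-valley difference of a spectrum given as levels in dB."""
    return max(levels_db) - min(levels_db)


def expand_contrast(levels_db, factor=2.0):
    """Scale every level's deviation from the mean level (in dB) by `factor`.

    A generic contrast-expansion rule, used here only for illustration.
    """
    mean = sum(levels_db) / len(levels_db)
    return [mean + factor * (level - mean) for level in levels_db]


# Hypothetical channel levels (dB) across frequency, with two spectral peaks
levels = [40, 60, 40, 55, 40]
enhanced = expand_contrast(levels)
print(spectral_contrast_db(levels), spectral_contrast_db(enhanced))  # 20 40.0
```

The mean level is unchanged; only the spread of levels around it grows.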
Consequences of Functional Effects
The functional effects described above, including impairments of temporal analysis, loss of frequency resolution, and loss of sensitivity, occur primarily because of damage to cochlear outer (and, for more severe losses, inner) hair cells. The deficits in speech understanding experienced by many listeners with hearing impairment may be attributed in part to this combination of effects. Consonant sounds tend to be high in frequency and low in amplitude, sometimes rendering those critical elements of speech inaudible to people with high-frequency hearing loss. A hearing aid may bring some of those sounds back into an audible range, but compression circuitry is then needed to limit the amplification of the more intense vowel sounds of speech. Unfortunately, multichannel compression hearing aids may abnormally flatten speech spectra, reducing peak-to-valley differences and impairing speech identification (Bor et al., 2008). The reductions in spectral contrast that compression amplification can produce, combined with the impaired frequency resolution characteristic of hearing loss, would distort vowels and other voiced sounds. Formant peaks and transitions become spread across frequency regions, interfering with the clear analysis of these speech features. These factors suggest that listeners with hearing impairment get a much more ambiguous "look" at formant transitions and steady-state formants.
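The flattening effect of multichannel compression follows directly from the static compression rule. The sketch assumes each channel sees a steady level and applies a 3:1 compression ratio above a 50 dB knee (hypothetical fitting parameters), and it ignores attack and release times. A 20 dB peak-to-valley difference that falls entirely above the knee, in two separate channels, shrinks to 20/3, or about 6.7 dB.

```python
def channel_output_db(input_db, gain_db=20.0, knee_db=50.0, ratio=3.0):
    """Static input/output rule for one compression channel.

    Linear gain below the knee; above it, output grows 1/ratio dB per dB.
    All parameter values are hypothetical.
    """
    if input_db <= knee_db:
        return input_db + gain_db
    return knee_db + gain_db + (input_db - knee_db) / ratio


# Hypothetical levels in two channels: one at a formant peak, one in a valley
peak_in, valley_in = 75.0, 55.0
peak_out, valley_out = channel_output_db(peak_in), channel_output_db(valley_in)
print(f"input contrast {peak_in - valley_in:.1f} dB -> "
      f"output contrast {peak_out - valley_out:.1f} dB")
```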
Individuals with hearing impairment wear hearing aids to support the successful analysis of speech as well as the awareness of sounds and their localization in the environment. The combination of an amplification device and the impaired auditory system of the wearer creates a number of potential sources of degradation or loss of temporal and spectral information in the input speech waveform. Paradoxically, hearing aid features meant to improve speech understanding, or to provide the binaural auditory system with information for localizing sound, may actually contribute to a loss of spectral contrast or of precise temporal information. Once the impaired cochlea has contributed to the processing, distortions may also occur at higher auditory centers in the brain. Reductions in neural synchrony, whether due to cochlear damage or to processes associated with aging, may interfere with the preservation of sound localization information, with the ability to extract speech from background noise, and with the coding of voice pitch.
A vast amount of processing occurs between the cochlea and the cortex, and one question is whether the long-term effects of cochlear damage degrade that processing, or whether central auditory processing is intact but must act on a degraded cochlear output. Impaired central auditory processing, which might occur as a natural consequence of aging, would serve to further impede speech understanding.
Much research remains to be accomplished to determine compensation strategies for these speech-processing deficits that might lead to advancements in signal processing or other hearing-aid technology. Although properly fit amplification can often solve the problem of reduced audibility and impaired loudness perception, it does not yet address the impaired frequency resolution or losses of temporal processing exhibited by many people with moderate sensorineural hearing loss.
To produce more normal auditory percepts and clear understanding of speech, future improvements in the quality and intelligibility of hearing aid-processed speech will need either to compensate for these deficits or to restore the lost functions.
Support for this work was provided by National Institutes of Health grant R01DC00626 and the Department of Veterans Affairs.