There are several measures of speech intelligibility, but STI has become the most common. The Speech Transmission Index (STI) is a measure developed by Drs. T. Houtgast and H. Steeneken at the Institute for Perception in the Netherlands in the early 1970s to determine how well speech can be understood. STI has been validated in a number of studies and is now an international standard (IEC 60268-16, formerly IEC 268-16). STI values range from 0.0 to 1.0, and a value of 0.7 or above indicates excellent intelligibility. The STI method can be used for both amplified and unamplified speech.
For amplified speech to be intelligible, the STI measure requires that speech pass through the sound system with its modulation characteristics unchanged, so that the speech heard by a listener sounds the same as the speech entering the sound system. Speech sounds are created over a range of frequencies by the larynx, and STI considers those from 125 to 8000 Hz. That basic range is amplitude modulated at 0 to 12.5 Hz by the tongue, lips, etc. to form the speech sounds needed for communication. These modulations must reach the listener undiminished, whether directly or through the sound system. If any amount of modulation in the original speech is lost as the speech travels towards the listener, the intelligibility, and thus the STI, is reduced.
The principal causes of loss of intelligibility are noise, reverberation (itself a form of noise) and the frequency response of the speech transmission system. In simple terms, it is the signal to noise ratio between the speech energy and the total room noise that determines speech intelligibility.
Essentially, speech starts as noise, as displayed below.
Then the speech organs (mouth, teeth, etc.) modulate the
signal, varying its amplitude and timbre.
This creates understandable speech.
1) Frequency response of the system.
This affects speech intelligibility because the high frequencies of a voice carry much of the
information needed to understand speech. If there is a significant high frequency loss, there is little
signal left in those bands from which to detect any modulation.
2) The ratio of the background noise in the room to the received speech sound.
This affects the STI because the noise in the room determines the lower limit of the modulation that can be
heard. The noise in the room masks the quieter part of the modulation, thus reducing the
total modulation received. Because speech must be at a comfortable level whether amplified or unamplified,
the background noise in the room is critical to good intelligibility: typical
noise levels are often only 10 dB below speech levels. A speech level of 50 dBA can be quite
typical, as is a noise level of 40 dBA, giving a signal to noise ratio of only 10 dB. Usually a signal to
noise ratio of 15 dB is required for good intelligibility.
3) The reverberation time.
This affects the sound remaining in the room between speech modulations. After each peak in the
speech modulation, the reverberation decays slowly, so the speech sounds heard in the
room are a combination of the original speech and the reverberation (plus noise). The longer the
reverberation time, the worse the STI result will be.
4) Discrete reflections or echoes.
Each major sound reflection arriving more than 40 ms after the direct sound, which corresponds to
roughly 45 feet (about 14 m) of extra travel distance, begins to reduce the STI result. The reflected
sound adds to the original sound and fills in the quietest parts of the modulation, reducing the
modulation depth. Multiple discrete reflections or echoes can be caused by
either sound reflecting surfaces or multiple loudspeakers. Early reflections can be beneficial, while
strong late discrete arrivals are particularly harmful to speech intelligibility.
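The combined effect of background noise (item 2) and reverberation (item 3) on a single modulation frequency can be sketched with the classical modulation transfer expression for a diffuse reverberant field with steady noise. This is a simplified illustration, not the full measurement procedure: the function name `modulation_transfer` and the example reverberation times are assumptions made here, and discrete echoes and frequency response are ignored.

```python
import math

def modulation_transfer(mod_freq_hz, rt60_s, snr_db):
    """Modulation reduction factor m (0..1) for one modulation frequency.

    Assumes an ideal exponential (diffuse) reverberant decay and steady,
    unmodulated background noise. 1.0 means the modulation is fully preserved.
    """
    # Reverberation term: a longer decay smears fast modulations more.
    reverb = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * mod_freq_hz * rt60_s / 13.8) ** 2)
    # Noise term: steady noise fills in the quiet parts of the modulation.
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb * noise

# 50 dBA speech over 40 dBA noise is a 10 dB ratio; 15 dB is the usual target.
# The reverberation times below are hypothetical example values.
for snr_db in (10.0, 15.0):
    for rt60_s in (0.5, 2.0):
        m = modulation_transfer(2.0, rt60_s, snr_db)
        print(f"SNR {snr_db:4.1f} dB, RT60 {rt60_s:.1f} s -> m at 2 Hz = {m:.2f}")
```

With no reverberation, a 0 dB signal to noise ratio halves the modulation, while lengthening the reverberation time lowers m further at every signal to noise ratio.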
The STI test generates a non-random noise sequence from which the impulse response is
determined.
It then measures the received signal, including reverberation and noise, at some point in the room.
The received modulations are then compared to the transmitted modulations and the difference is
calculated. This is analyzed in 7 octave bands, from 125 Hz to 8 kHz. The noise in each
octave band is analyzed in terms of its modulation loss at 14 very low frequencies from 0.63 to 12.5
Hz. The averaged modulation difference for each octave band is called the Modulation Transfer
Index (MTI), as shown below.
Table 1: Modulation frequencies and octave bands. This is what is called a modulation reduction matrix.
A value of 1.000 is equivalent to no distortion or loss, meaning 100% modulation transfer; the values in
this table resulted in a mid to low level of speech intelligibility.
These averaged modulation differences (or MTI) are then weighted and scaled to arrive at the single STI
number. There are a number of different intelligibility scales and methods that are based on these
modulation transfer indices, each with small differences.
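The reduction from a 7 x 14 modulation matrix to a single number can be sketched as follows. This is a minimal illustration under stated assumptions, not the standardized procedure: the equal band weights used by default are a placeholder (IEC 60268-16 specifies its own octave band weights and further corrections, which are omitted here), and the function names are hypothetical.

```python
import math

# The 14 modulation frequencies (Hz) and 7 octave bands (Hz) used by full STI.
MOD_FREQS_HZ = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]
OCTAVE_BANDS_HZ = [125, 250, 500, 1000, 2000, 4000, 8000]

def transmission_index(m):
    """Convert one modulation reduction factor m (0..1) to a 0..1 index."""
    m = min(max(m, 1e-6), 1.0 - 1e-6)                   # keep the log finite
    apparent_snr_db = 10.0 * math.log10(m / (1.0 - m))
    apparent_snr_db = min(max(apparent_snr_db, -15.0), 15.0)  # limit to +/-15 dB
    return (apparent_snr_db + 15.0) / 30.0

def sti_from_matrix(matrix, weights=None):
    """matrix[band][mod_freq] -> (single-number index, MTI per octave band)."""
    if weights is None:
        # ASSUMPTION: equal weighting, for illustration only.
        weights = [1.0 / len(matrix)] * len(matrix)
    mti = [sum(transmission_index(m) for m in row) / len(row) for row in matrix]
    return sum(w * x for w, x in zip(weights, mti)), mti

# Hypothetical matrix: every cell keeps 60% of the original modulation.
example = [[0.6] * len(MOD_FREQS_HZ) for _ in OCTAVE_BANDS_HZ]
sti, mti = sti_from_matrix(example)
print(f"Unweighted illustration: STI = {sti:.2f}")
```

Clipping the apparent signal to noise ratio to a 30 dB window is what makes the result saturate: modulation that is almost fully preserved counts as 1.0, and modulation that is almost fully lost counts as 0.0.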
When using STI, the scale from 0 to 1.0 should not be interpreted as a percentage of intelligibility, but
rather needs to be related to descriptors as shown below. The following figures show the descriptors that
human listeners would assign to measured or predicted STI values. It should be noted that any
STI above 0.7 is considered excellent and almost equivalent to 100% intelligibility.
Any STI below 0.3 is considered unacceptable or bad. An STI below 0.45 is considered poor
or poor to fair and could be reported as ‘questionable’. An STI between 0.45 and 0.55 is between fair
and fair to good, and is reported as ‘acceptable’. To be considered ‘good’, an STI should be 0.6 or
above. These descriptors are based on comparing the percentage of words understood by real
listeners with the STI values measured in the same
rooms.
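The descriptor bands described above can be written as a small lookup. This is only a sketch of the thresholds quoted in this text; the function name is hypothetical, and treating values between 0.55 and 0.6 as 'fair' is an assumption, since the text does not assign that range explicitly.

```python
def sti_descriptor(sti):
    """Map an STI value (0.0..1.0) to the descriptor bands quoted in the text."""
    if not 0.0 <= sti <= 1.0:
        raise ValueError("STI must lie between 0.0 and 1.0")
    if sti < 0.3:
        return "bad"
    if sti < 0.45:
        return "poor"
    if sti < 0.6:
        # ASSUMPTION: 0.55 to 0.6 is grouped with 'fair' here.
        return "fair"
    if sti < 0.7:
        return "good"
    return "excellent"

for value in (0.25, 0.40, 0.50, 0.65, 0.80):
    print(f"STI {value:.2f}: {sti_descriptor(value)}")
```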
Another aspect of speech intelligibility is what is known as Articulation Loss of Consonants, or ALcons.
ALcons is a percentage value, ranging from 100 to 0%. The percentage is of lost consonants, so
100% is equivalent to total consonant loss.
For example, if you had an STI value of 1, you would have an ALcons close to 0%. ALcons was
originally based empirically on the number of consonants that could be heard at various locations by
listeners. A formula relating STI to consonant loss is as follows:
Consonant loss is very important to speech intelligibility, but an ALcons value is most useful when
compared to other ALcons values. ALcons values do not take into account many factors that strongly
affect speech intelligibility; therefore STI is generally more useful.
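As an illustration of how STI and ALcons can be related, one commonly cited empirical fit is ALcons = 170.5405 x exp(-5.419 x STI). Whether this is the exact formula intended in this text is an assumption, and the function names below are hypothetical. The cap at 100% is also an assumption, added here because the raw fit exceeds 100% for very low STI values.

```python
import math

# Coefficients of a commonly cited empirical STI-to-ALcons fit (ASSUMPTION:
# this may not be the exact formula the text refers to).
_A = 170.5405
_B = 5.419

def alcons_from_sti(sti):
    """Approximate percentage of lost consonants for a given STI (0..1)."""
    return min(100.0, _A * math.exp(-_B * sti))  # capped at total loss

def sti_from_alcons(alcons_percent):
    """Inverse of the fit; only meaningful where the cap is not active."""
    return -math.log(alcons_percent / _A) / _B

for sti in (0.3, 0.5, 0.7, 1.0):
    print(f"STI {sti:.1f} -> ALcons of about {alcons_from_sti(sti):.1f}%")
```

Under this fit an STI of 1.0 gives an ALcons slightly below 1% rather than exactly 0%, which is one reason to treat the conversion as approximate.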
Some other related scales and methods of measuring speech intelligibility include the Common Intelligibility
Scale (CIS), the Rapid Speech Transmission Index (RaSTI), the Speech Transmission Index for
Telecommunication Systems (STITEL) and STIPA (STI for Public Address).
CIS became popular simply because some people did not like that an STI of only 0.7, or
70%, was needed for excellent intelligibility. CIS provides nothing that STI does not already provide, since CIS is
obtained directly from STI by re-scaling the STI values. To derive the CIS value, the same test procedure is
used as for STI, with the same modulation frequencies and octave bands; the resulting STI value is then
converted to the CIS scale. CIS
compares to STI as shown in the following diagram.
As shown in the above figure, an STI value of 0.50 yields a CIS value of 0.70. CIS values are used
to comply with the National Fire Alarm Code minimum speech intelligibility rules for a voice alarm
system.
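The re-scaling from STI to CIS is commonly given as CIS = 1 + log10(STI), which reproduces the pairing of STI 0.50 with CIS 0.70 quoted above. The sketch below assumes that relation; the function names are hypothetical, and clamping the result at 0 for very low STI values is an assumption made here.

```python
import math

def cis_from_sti(sti):
    """Convert STI (0 < sti <= 1) to CIS using CIS = 1 + log10(STI)."""
    if not 0.0 < sti <= 1.0:
        raise ValueError("STI must be greater than 0 and at most 1")
    return max(0.0, 1.0 + math.log10(sti))  # ASSUMPTION: floor the scale at 0

def sti_from_cis(cis):
    """Inverse conversion, valid for CIS values between 0 and 1."""
    return 10.0 ** (cis - 1.0)

for sti in (0.3, 0.5, 0.7, 1.0):
    print(f"STI {sti:.2f} -> CIS {cis_from_sti(sti):.2f}")
```

Because the mapping is a fixed logarithmic curve, the two scales always rank rooms in the same order; CIS simply stretches the lower part of the STI range.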
The appendix of National Fire Alarm Code document NFPA 72 2007 Edition calls for a minimum
intelligibility of 0.70 on the CIS scale, which corresponds to an STI of 0.50 (fair – fair to good) for a
voice alarm system. A voice alarm system performing at or above either of these values constitutes a
compliant system. It should be noted that a CIS of 0.70 is far from perfect intelligibility. It corresponds
to about 80% word intelligibility, and about 95% sentence intelligibility, which has been shown to be
only slightly higher than what is required to reliably and accurately transmit an emergency message.
When testing RaSTI, fewer modulation frequencies are used and fewer octave bands are weighted to
derive the single number RaSTI value. While the test is faster, it is also less accurate than STI.
Another method of testing intelligibility is STITEL, which is similar to both RaSTI and STI. STITEL
considers each of the seven octave bands used with STI, but uses only a single, octave specific
modulation frequency for each octave band. There are a number of other methods and tests for
speech intelligibility: one can test the intelligibility of a male voice or a female voice, of
unamplified voice, or of speech for listeners who are not native speakers of the
language. Although STI was first used in the early 1970s, it is still evolving today.
We at State of the Art Acoustik have developed great expertise in designing for high STI. We use
various modeling software packages to predict STI before spaces are built, or to modify spaces once they
are built. This modeling allows us to adjust the locations and angles of walls and other reflecting
surfaces to minimize detrimental sound reflections. We can also select and place loudspeakers to
optimize coverage patterns and minimize echoes while optimizing frequency response. Sound
absorbing materials can be added in these 3D models to control unwanted reflections from surfaces
that cannot be moved. Background noise levels are also considered and varied to determine the
highest noise level that can be allowed while still providing good speech intelligibility. This is one of
the most powerful tools for providing a built space that functions well in terms of speech
communication. We have designed hundreds of models to optimize STI.