SPEECH TRANSMISSION INDEX (STI)


There are several measures of speech intelligibility, but STI has become the most common. The Speech Transmission Index (STI) was developed by Drs. T. Houtgast and H. Steeneken at the Institute for Perception in the Netherlands in the early 1970s to determine how well speech can be understood. STI has been validated in a number of studies and is now an international standard (IEC 60268-16, formerly IEC 268-16). STI values range from 0.0 to 1.0, and a value of 0.7 or above indicates excellent intelligibility. The STI method can be used for both amplified and unamplified speech.

For amplified speech to be intelligible, the speech must pass through the sound system with its modulation characteristics unchanged, so that the speech heard by a listener is the same as the speech entering the sound system. Speech sounds are created over a range of frequencies by the larynx, and STI considers those from 125 to 8000 Hz. That basic range is amplitude modulated at rates up to about 12.5 Hz by the tongue, lips, etc. to form the speech sounds needed for communication. These modulations must reach the listener undiminished, whether directly or through the sound system. If any of the modulation in the original speech is lost on the way to the listener, the intelligibility, and thus the STI, is reduced.

The principal causes of loss of intelligibility are noise, reverberation (itself a form of noise), and the frequency response of the speech transmission system. In simple terms, it is the signal to noise ratio between the speech energy and the total room noise that determines speech intelligibility.


Essentially, speech starts as noise, as displayed below.





Then the speech organs (mouth, teeth, etc.) modulate the signal, varying its amplitude and timbre.





The result is understandable speech.





The following factors reduce the modulation that reaches the listener, and therefore the STI:
1) Frequency response of the system.
The high frequencies of a voice are very important to the intelligibility of speech. If there is a significant high frequency loss, little signal is left in those bands from which to detect any modulation.
2) The ratio of the background noise in the room to the received speech sound.
The noise in the room sets the lower limit of the modulation that can be heard. It masks the quieter part of the modulation, thus reducing the total modulation received. Because speech must be at a comfortable level whether amplified or unamplified, the background noise in the room is critical to good intelligibility: typical noise levels are often only 10 dB below speech levels. A speech level of 50 dBA is quite typical, as is a noise level of 40 dBA, giving a signal to noise ratio of only 10 dB. Usually a signal to noise ratio of 15 dB is required for good intelligibility.
3) The reverberation time.
Reverberation fills in the gaps between speech modulations. After each modulation peak, the reverberant sound decays slowly, so what is heard in the room is a combination of the original speech and its reverberation (plus noise). The longer the reverberation time, the worse the STI result will be.
4) Discrete reflections or echoes.
Each major sound reflection arriving more than about 40 ms after the direct sound, which corresponds to roughly 45 feet of extra travel distance, begins to reduce the STI result. The reflected sound adds to the original sound and raises the level in the troughs of the modulation, which reduces the modulation depth. Multiple discrete reflections or echoes can be caused by either sound reflecting surfaces or multiple loudspeakers. Early reflections can be beneficial, while strong late discrete arrivals are particularly harmful to speech intelligibility.

The STI test generates a non-random noise sequence from which the impulse response is determined. It then measures the received signal, including reverberation and noise, at some point in the room. The received modulations are then compared to the output modulations and the difference is calculated. This is analyzed in 7 octave bands, from 125 Hz to 8 kHz. The noise in each octave band is analyzed in terms of its modulation loss at 14 very low frequencies from 0.63 to 12.5 Hz. The averaged modulation difference for each octave band is called the Modulation Transfer Index (MTI), as shown below.

Table 1: Modulation frequencies and octave bands. This is called a modulation reduction matrix. A value of 1.000 means no distortion or loss, that is, 100% modulation transfer; the values in this table resulted in a mid to low level of speech intelligibility.
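To illustrate how a matrix of this kind arises, the sketch below estimates the modulation transfer for each modulation frequency and octave band from just two of the factors listed above: reverberation time and signal to noise ratio. It uses the classic Houtgast and Steeneken expression for a purely diffuse, exponentially decaying sound field with steady noise, so discrete echoes, frequency response, and direct sound are ignored. The per-band reverberation times and the 10 dB signal to noise ratio are hypothetical values chosen for illustration, not measurements, and the function names are illustrative.

```python
import math

# The 14 modulation frequencies (Hz) and 7 octave bands (Hz) used by STI.
MODULATION_FREQS = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                    3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]
OCTAVE_BANDS = [125, 250, 500, 1000, 2000, 4000, 8000]


def modulation_transfer(mod_freq_hz, rt60_s, snr_db):
    """Estimated modulation transfer (0 to 1) for one modulation frequency.

    Assumes an ideal exponential reverberant decay and steady background noise.
    """
    reverb_term = 1.0 / math.sqrt(
        1.0 + (2.0 * math.pi * mod_freq_hz * rt60_s / 13.8) ** 2
    )
    noise_term = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb_term * noise_term


def modulation_matrix(rt60_by_band, snr_by_band):
    """Rows are modulation frequencies, columns are octave bands."""
    return [
        [modulation_transfer(f, rt60_by_band[b], snr_by_band[b])
         for b in OCTAVE_BANDS]
        for f in MODULATION_FREQS
    ]


# Hypothetical room: reverberation time (s) per octave band, 10 dB SNR everywhere.
rt60 = {125: 1.8, 250: 1.6, 500: 1.5, 1000: 1.4, 2000: 1.2, 4000: 1.0, 8000: 0.8}
snr = {band: 10.0 for band in OCTAVE_BANDS}

matrix = modulation_matrix(rt60, snr)
for f, row in zip(MODULATION_FREQS, matrix):
    print(f"{f:5.2f} Hz  " + "  ".join(f"{m:.3f}" for m in row))
```

In this simplified model, longer reverberation times and lower signal to noise ratios both push the values toward zero, and the loss is greatest at the highest modulation frequencies.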



These averaged modulation differences (the MTI values) are then weighted and scaled to arrive at the single STI number. There are a number of different intelligibility scales and methods based on these modulation transfer indices, each with small differences. When using STI, the scale from 0 to 1.0 should not be interpreted as a percentage of intelligibility, but rather needs to be related to descriptors as shown below. The following figures show descriptors that human listeners would assign to the STI values measured or predicted. It should be noted that any STI above 0.7 is to be considered excellent and almost equivalent to 100% intelligibility.
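The weighting and scaling step can be sketched as follows. Each modulation transfer value m is converted to an apparent signal to noise ratio, limited to the range of -15 to +15 dB, and mapped to a transmission index between 0 and 1; the 14 indices in each octave band are averaged to give that band's MTI, and the band MTIs are combined in a weighted sum. This is a simplified outline only: the equal band weights are a placeholder assumption, and the standard's own band weights, redundancy corrections, and masking and hearing-threshold adjustments are omitted.

```python
import math


def transmission_index(m):
    """Map one modulation transfer value (0 to 1) to a transmission index (0 to 1)."""
    # Keep m strictly inside (0, 1) so the logarithm and division are defined.
    m = min(max(m, 1e-6), 1.0 - 1e-6)
    apparent_snr_db = 10.0 * math.log10(m / (1.0 - m))
    # Limit the apparent signal to noise ratio to the range -15 dB to +15 dB.
    apparent_snr_db = max(-15.0, min(15.0, apparent_snr_db))
    return (apparent_snr_db + 15.0) / 30.0


def mti_per_band(matrix):
    """Average the transmission indices down each column (octave band).

    `matrix` has one row per modulation frequency and one column per band.
    """
    n_bands = len(matrix[0])
    return [
        sum(transmission_index(row[b]) for row in matrix) / len(matrix)
        for b in range(n_bands)
    ]


def sti_from_mti(mti, weights=None):
    """Weighted sum of the band MTIs.

    Equal weights are a placeholder assumption, not the standard's values.
    """
    if weights is None:
        weights = [1.0 / len(mti)] * len(mti)
    return sum(w * x for w, x in zip(weights, mti))


# Hypothetical matrix: 14 modulation frequencies x 7 octave bands, all m = 0.5.
example = [[0.5] * 7 for _ in range(14)]
print(round(sti_from_mti(mti_per_band(example)), 3))
```

A modulation transfer of 0.5 corresponds to an apparent signal to noise ratio of 0 dB, which sits in the middle of the scale.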

Any STI below 0.3 is to be considered unacceptable or bad. An STI below 0.45 is to be considered poor, or poor to fair, and could be reported as ‘questionable’. An STI between 0.45 and 0.55 is between fair and fair to good, and is reported as ‘acceptable’. To be considered ‘good’, an STI should be 0.6 or above. An STI of 0.7 or above can be considered excellent. These ratings are based on word intelligibility scores obtained with real listeners, compared with STI values measured in the same rooms.

Another aspect of speech intelligibility is what is known as Articulation Loss of Consonants, or ALcons. ALcons is a percentage value, ranging from 100% down to 0%. It is the percentage of consonants lost, so 100% is equivalent to total consonant loss. For example, if you had an STI value of 1, you would have an ALcons of approximately 0%. ALcons was originally based empirically on the number of consonants that could be heard at various locations by listeners. A formula relating STI to consonant loss is as follows:
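One widely quoted empirical relation of this kind, often attributed to Farrel Becker, is ALcons = 170.5405 × e^(-5.419 × STI), with the inverse STI = 0.9482 - 0.1845 × ln(ALcons). Whether this is the exact form intended here is an assumption; the sketch below simply evaluates that relation, and the function names are illustrative.

```python
import math


def alcons_from_sti(sti):
    """Estimated consonant loss in percent for an STI between 0 and 1.

    Uses the commonly quoted empirical fit; the result is capped at 100%.
    """
    return min(100.0, 170.5405 * math.exp(-5.419 * sti))


def sti_from_alcons(alcons_percent):
    """Inverse of the same empirical fit (ALcons in percent, greater than 0)."""
    return 0.9482 - 0.1845 * math.log(alcons_percent)


for sti in (0.3, 0.5, 0.7, 1.0):
    print(f"STI {sti:.2f} -> ALcons about {alcons_from_sti(sti):.1f}%")
```

With this fit, an STI of 1 gives a consonant loss of slightly under 1% rather than exactly 0%, and very low STI values hit the 100% cap.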


Consonant loss is very important to speech intelligibility, but an ALcons value is most useful when compared to other ALcons values. ALcons does not take into account many factors that strongly affect speech intelligibility; therefore STI is generally more useful.

Other related scales and methods of measuring speech intelligibility include the Common Intelligibility Scale (CIS), the Rapid Speech Transmission Index (RaSTI), the Speech Transmission Index for Telecommunication Systems (STITEL), and STI for Public Address systems (STIPA). CIS became popular simply because some people did not like that an STI of only 0.7, or 70%, was enough for excellent intelligibility. CIS provides nothing beyond what STI provides, since CIS is obtained directly from STI by re-scaling the STI values. The same test procedures, modulation frequencies, and octave bands are used as for STI; only the final mapping to a single CIS number differs. CIS compares to STI as shown in the following diagram.
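The re-scaling from STI to CIS is usually given as CIS = 1 + log10(STI). The short sketch below assumes that relation and converts in both directions; the function names are illustrative.

```python
import math


def cis_from_sti(sti):
    """Common Intelligibility Scale value for an STI greater than 0."""
    if sti <= 0.0:
        raise ValueError("STI must be greater than 0 for the logarithmic mapping")
    return 1.0 + math.log10(sti)


def sti_from_cis(cis):
    """Inverse mapping from CIS back to STI."""
    return 10.0 ** (cis - 1.0)


for sti in (0.3, 0.45, 0.5, 0.6, 0.7, 1.0):
    print(f"STI {sti:.2f} -> CIS {cis_from_sti(sti):.2f}")
```

Because the mapping is logarithmic, it stretches the lower part of the STI scale, so mid-range STI values appear as noticeably higher CIS numbers.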




As the figure shows, an STI value of 0.50 yields a CIS value of 0.70. CIS values are used to comply with the National Fire Alarm Code minimum speech intelligibility rules for a voice alarm system. The appendix of the National Fire Alarm Code, NFPA 72 (2007 Edition), calls for a minimum intelligibility of 0.70 on the CIS scale for a voice alarm system, which corresponds to an STI of 0.50 (between fair and fair to good). A voice alarm system performing at or above either of these values constitutes a compliant system. It should be noted that a CIS of 0.70 is far from perfect intelligibility. It corresponds to about 80% word intelligibility and about 95% sentence intelligibility, which has been shown to be only slightly higher than what is required to reliably and accurately transmit an emergency message.

RaSTI testing uses fewer modulation frequencies, and fewer octave bands are weighted to derive the single-number RaSTI value. While the test is faster, it is also less accurate than STI. Another method of testing intelligibility is STITEL, which is similar to both RaSTI and STI. STITEL considers each of the seven octave bands used with STI, but uses only a single, octave-specific modulation frequency for each band. There are a number of other methods and tests for speech intelligibility: one can test the intelligibility of a male voice or a female voice, of un-amplified voice, or for listeners who are not native speakers of the language. Although STI was first used in the early 1970s, it is still evolving today.

We at State of the Art Acoustik have developed great expertise in designing for high STI. We use various modeling software to predict STI before spaces are built, or to modify spaces once they are built. This modeling allows us to modify the locations and angles of walls and other reflecting surfaces to minimize detrimental sound reflections. We can also select and place loudspeakers to optimize coverage patterns and minimize echoes while optimizing frequency response. Sound absorbing materials can be added in these 3D models to control unwanted reflections from surfaces that cannot be moved. Background noise levels are also considered and varied to determine the highest noise level that can be allowed while still providing good speech intelligibility. This is one of the most powerful tools for providing a built space that functions well in terms of speech communication. We have designed and built hundreds of models to optimize STI.