
In June 2008 Audience Corp of the US began full-scale shipment of the A1010 voice-processing integrated circuit (IC) for mobile phone handsets in Japan. The chip has already been used in the SH705iII, manufactured by Sharp Corp of Japan for NTT DoCoMo, Inc of Japan.
The A1010 is a noise-suppression IC that separates the speaker's voice from environmental noise and transmits only that voice to the other party (see Fig). Made with 130nm-rule complementary metal-oxide semiconductor (CMOS) manufacturing technology, the chip has a 2.7 x 3.5mm footprint. Power dissipation is 25mW max in operation, and processing delay is no more than 20ms.
Audience developed an IC that reproduces the information processing system of human hearing. More specifically, the IC simulates the auditory pathway from the cochlea through the cerebral cortex. The technology is based on the work of Audience chairman and chief technology officer (CTO) Lloyd Watts, who presented it in his doctoral thesis at the California Institute of Technology in the US.
The voice signals from two microphones are first subjected to a fast cochlea transformation (FCT), which breaks them down according to frequency, timing and other factors. The cochlea is the spiral organ in the human inner ear that converts acoustic signals into electrical signals. According to Audience president and chief executive officer (CEO) Peter Santos, "FCT duplicates the conversion characteristics of the cochlea. Its output is logarithmic, matching human hearing."
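The idea of a frequency breakdown with logarithmic output can be illustrated with a minimal sketch. This is not Audience's FCT, whose details are not described here; it is a generic analysis that assumes logarithmically spaced center frequencies and decibel (log-compressed) band levels. The function names, the 8kHz sample rate and the 100Hz to 3.2kHz band range are all hypothetical choices made for illustration.

```python
import math

def log_spaced_centers(f_lo, f_hi, n_bands):
    """Center frequencies spaced evenly on a logarithmic axis (assumed layout)."""
    ratio = (f_hi / f_lo) ** (1.0 / (n_bands - 1))
    return [f_lo * ratio ** k for k in range(n_bands)]

def band_levels_db(frame, sample_rate, centers):
    """Log-compressed (dB) level of one frame at each center frequency."""
    n = len(frame)
    levels = []
    for fc in centers:
        # Correlate the frame with a cosine and a sine at the center frequency.
        re = sum(x * math.cos(2 * math.pi * fc * i / sample_rate)
                 for i, x in enumerate(frame))
        im = sum(x * math.sin(2 * math.pi * fc * i / sample_rate)
                 for i, x in enumerate(frame))
        power = (re * re + im * im) / (n * n)
        # Small floor avoids log10(0) for silent bands.
        levels.append(10 * math.log10(power + 1e-12))
    return levels

sample_rate = 8000                                  # hypothetical rate
centers = log_spaced_centers(100.0, 3200.0, 6)      # one band per octave
# Test signal: a 50ms frame containing a pure 800Hz tone.
frame = [math.sin(2 * math.pi * 800.0 * i / sample_rate) for i in range(400)]
levels = band_levels_db(frame, sample_rate, centers)
loudest = centers[levels.index(max(levels))]
print(round(loudest))
```

In this sketch the band nearest 800Hz carries almost all of the energy, while the other bands sit near the floor. A real cochlear model would use overlapping filters and track timing as well as level.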
After FCT, the data is characterized by cues such as sound frequency, spatial location (sound source distance, angle, etc) and onset time. There are other technologies that can isolate a sound source using frequency, spatial position, etc, but most of them depend on a single cue. "We combine multiple cues, so even if one is insufficient to produce the right answer, we have others to look at, too. It's the same way hearing works," explained Santos. In human beings this information processing is handled in the brainstem, he added.
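The benefit of combining cues can be sketched in a few lines. The example below is an assumption-laden illustration, not the A1010's algorithm: it supposes each cue yields a score between 0 (noise-like) and 1 (voice-like), and merges them with a weighted average that simply skips any cue that is unavailable. The cue names, weights and scores are hypothetical.

```python
def combine_cues(cues, weights):
    """Weighted average of the available cue scores.

    Scores run from 0 (noise-like) to 1 (voice-like); a cue reported as
    None is treated as unavailable and skipped.
    """
    total = 0.0
    weight_sum = 0.0
    for name, score in cues.items():
        if score is None:
            continue
        total += weights[name] * score
        weight_sum += weights[name]
    # With no usable cue at all, fall back to a neutral score.
    return total / weight_sum if weight_sum > 0 else 0.5

weights = {"spatial": 1.0, "pitch": 1.0, "onset": 1.0}   # hypothetical weights

# All three cues available.
clear = combine_cues({"spatial": 0.9, "pitch": 0.8, "onset": 0.7}, weights)
# Spatial cue missing: the remaining cues still give an answer.
partial = combine_cues({"spatial": None, "pitch": 0.8, "onset": 0.6}, weights)
print(clear, partial)
```

A system relying on the spatial cue alone would have nothing to go on in the second case, which is the weakness Santos attributes to single-cue approaches.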
The next processing step duplicates what happens in the cortex. Signals are grouped depending on various characteristics, and identified as representing the sound of a passing train, a whistle, music, or whatever. Extraneous sounds are eliminated, and then the voice data is returned to an acoustic signal via inverse FCT.
Santos added: "Noise suppression performance is 25dB. In addition to continuous noise, it can also handle instantaneous noises like horns with a delay of 500ms or less." He explained that the company adopted an information processing system similar to human hearing because "... biological systems are well designed and robust, plus they are highly expansible."
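To put the 25dB figure in perspective, the short calculation below converts decibels to linear ratios using the standard definitions. It assumes the figure describes a reduction in noise level; the article does not state the measurement conditions, and the function names are illustrative.

```python
def db_to_power_ratio(db):
    """Power ratio corresponding to a level difference in decibels."""
    return 10 ** (db / 10)

def db_to_amplitude_ratio(db):
    """Amplitude ratio corresponding to a level difference in decibels."""
    return 10 ** (db / 20)

power_ratio = db_to_power_ratio(25)          # roughly 316
amplitude_ratio = db_to_amplitude_ratio(25)  # roughly 17.8
print(round(power_ratio), round(amplitude_ratio, 1))
```

Under that assumption, 25dB of suppression means noise power is cut to about 1/316 of its original value, or noise amplitude to about 1/18.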
by Tetsuo Nozawa