Speaker recognition

Voices differ from person to person because of a combination of physiological differences, such as the shape of the vocal tract, and learned speaking habits. Speaker recognition technology uses these differences to distinguish one speaker from another.

During enrollment, a speaker recognition system captures samples of a person’s speech by having the person speak some predetermined information into a microphone or telephone several times. This information, known as a passphrase, can be a name, birth month, birth city, favorite color, or a sequence of numbers. Text-independent systems, which recognize a speaker without requiring a predefined phrase, are also available.
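Because the passphrase is spoken several times, enrollment can combine the repetitions into a single reference. The Python sketch below assumes each repetition has already been reduced to a numeric feature vector, and it stores the per-feature mean and spread. The function name `enroll`, the two example features, and the mean/standard-deviation layout are illustrative assumptions, not the template format of any particular product.

```python
from statistics import mean, pstdev

def enroll(repetitions: list[list[float]]) -> dict[str, list[float]]:
    """Combine feature vectors from repeated utterances into one template.

    The template keeps the per-feature mean (the speaker's typical value) and
    the population standard deviation (how much it varied across repetitions).
    """
    if len(repetitions) < 2:
        raise ValueError("enrollment needs at least two repetitions")
    n_features = len(repetitions[0])
    if any(len(r) != n_features for r in repetitions):
        raise ValueError("every repetition must have the same number of features")
    columns = list(zip(*repetitions))  # one tuple of values per feature
    return {
        "mean": [mean(col) for col in columns],
        "std": [pstdev(col) for col in columns],
    }

# Three repetitions of the same passphrase. Hypothetical features:
# [average pitch in Hz, speaking rate in syllables per second]
template = enroll([[120.0, 4.0], [124.0, 4.5], [122.0, 3.5]])
print(template["mean"])  # [122.0, 4.0]
```

Keeping the spread as well as the mean lets a later comparison tolerate features that naturally vary a lot for this speaker while being stricter about stable ones.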

The recorded phrase is converted from analog to digital format, distinctive vocal characteristics such as pitch, cadence, and tone are extracted, and a speaker model is built from them. A template is then generated and stored for future comparisons. Voice templates are much larger than templates generated by other biometric technologies, typically 10,000 to 20,000 bytes.
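Pitch, one of the characteristics named above, can be estimated from the digitized signal in several ways. The sketch below uses a simple autocorrelation method, chosen here for illustration rather than taken from any specific system: it searches the lags that correspond to 50 to 400 Hz (assumed bounds for speaking pitch) and returns the frequency of the lag at which the signal best matches a shifted copy of itself. The function `estimate_pitch` and its parameters are hypothetical.

```python
import math

def estimate_pitch(samples: list[float], sample_rate: int,
                   fmin: float = 50.0, fmax: float = 400.0) -> float:
    """Estimate the fundamental frequency (pitch) of a voiced frame in Hz.

    Computes the unnormalized autocorrelation r(k) = sum of x[n] * x[n + k]
    for every lag k in the assumed pitch range and returns sample_rate / k
    for the lag with the largest value.
    """
    min_lag = int(sample_rate / fmax)  # shortest period considered
    max_lag = int(sample_rate / fmin)  # longest period considered
    if max_lag >= len(samples):
        raise ValueError("frame too short for the requested fmin")
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        r = sum(samples[n] * samples[n + lag] for n in range(len(samples) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag

# A synthetic 200 Hz tone, sampled at 8 kHz (a common telephone rate),
# stands in for 0.25 s of voiced speech.
fs = 8000
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(2000)]
print(estimate_pitch(tone, fs))  # 200.0
```

Real speech would first be split into short frames, with unvoiced frames detected and skipped; the per-frame estimates could then be summarized (for example, averaged) into one feature of the speaker model.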

Speaker recognition can be used either to verify a person’s claimed identity or to identify an unknown speaker. It is often used where voice is the only available biometric identifier, such as in telephone and call-center applications.
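The two modes differ in how many templates a new sample is compared against: verification is a one-to-one comparison with the template of the claimed identity, while identification is a one-to-many search over all enrolled templates. The sketch below assumes templates are plain feature vectors compared by cosine similarity against a fixed threshold of 0.9; the similarity measure, the threshold, the speaker names, and the function names are all illustrative assumptions.

```python
import math
from typing import Optional

Templates = dict[str, list[float]]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two feature vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def verify(probe: list[float], templates: Templates, claimed_id: str,
           threshold: float = 0.9) -> bool:
    """Verification (1:1): accept only if the probe matches the claimed speaker."""
    template = templates.get(claimed_id)
    return template is not None and cosine_similarity(probe, template) >= threshold

def identify(probe: list[float], templates: Templates,
             threshold: float = 0.9) -> Optional[str]:
    """Identification (1:N): return the best-scoring enrolled speaker,
    or None if no template reaches the threshold."""
    best_id: Optional[str] = None
    best_score = threshold
    for speaker_id, template in templates.items():
        score = cosine_similarity(probe, template)
        if score >= best_score:
            best_id, best_score = speaker_id, score
    return best_id

# Hypothetical enrolled templates (already-normalized feature vectors).
templates = {"alice": [1.0, 0.2, 0.1], "bob": [0.1, 1.0, 0.3]}
probe = [0.9, 0.25, 0.1]
print(verify(probe, templates, "alice"))  # True
print(verify(probe, templates, "bob"))    # False
print(identify(probe, templates))         # alice
```

In practice, features measured in different units (hertz, syllables per second) would be normalized before comparison, and the threshold would be tuned to balance false accepts against false rejects.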