The Technologies Involved in Speech Recognition Chips

Publish Time: 2022-02-12 Article Source: POROSVOC

Speech recognition chips are also called speech recognition ICs. Compared with traditional voice chips, their defining feature is that they can recognize speech. This lets a machine understand spoken commands and act on them, for example a smart doll that blinks or opens its mouth. A speech recognition chip also provides high-quality, high-compression recording and playback, which makes man-machine dialogue possible. In this post, POROSVOC introduces the technologies involved in speech recognition chips.

Fig.1

The technologies involved in speech recognition chips include signal processing, pattern recognition, probability theory, information theory, speech production mechanisms, auditory mechanisms, artificial intelligence, and others.

Depending on the restrictions placed on the user, speech recognition chips can be divided into specific-person (speaker-dependent) chips and non-specific-person (speaker-independent) chips.

Speaker-Dependent (Specific Person) Speech Recognition

A specific-person (speaker-dependent) speech recognition chip recognizes only the speech of one particular user; other people's speech is not recognized. The user's reference speech samples must be stored in a database that serves as the basis for comparison, so the chip has to be trained before use. Training usually means following the machine's prompts and speaking each command twice.
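The enroll-then-compare flow described above can be sketched as follows. The article does not say how the chip compares an input against the stored samples; this sketch assumes dynamic time warping (DTW), a classic template-matching technique, and represents each utterance as a short 1-D list of numbers where a real chip would use multi-dimensional acoustic features. The class name, method names, and the rejection threshold are all hypothetical.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    n, m = len(a), len(b)
    # cost[i][j] holds the cheapest alignment cost of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


class SpeakerDependentRecognizer:
    """Hypothetical sketch: stores one user's reference samples per command."""

    def __init__(self, reject_threshold):
        self.templates = {}  # command -> list of enrolled feature sequences
        self.reject_threshold = reject_threshold  # assumed tuning parameter

    def enroll(self, command, features):
        # Called once per training utterance (the article says usually twice).
        self.templates.setdefault(command, []).append(list(features))

    def recognize(self, features):
        best_command, best_distance = None, inf
        for command, samples in self.templates.items():
            for sample in samples:
                distance = dtw_distance(features, sample)
                if distance < best_distance:
                    best_command, best_distance = command, distance
        # Input that is too far from every stored sample is rejected.
        if best_distance > self.reject_threshold:
            return None
        return best_command
```

In use, each command would be enrolled twice with `enroll("blink", ...)` before calling `recognize(...)`. Input that does not resemble any enrolled sample returns `None`, which mirrors the point that speech not matching the trained user's samples is not recognized.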

Speaker-Independent (Non-Specific Person) Speech Recognition

Speaker-independent (non-specific person) speech recognition does not target a particular person: it works regardless of the speaker's age or gender, as long as the same language is used. The usual approach is to fix a dozen or so voice interaction entries before the product is finalized and collect speech samples of those entries from about 200 people. An algorithm running on a PC processes the samples to produce the voice model and feature database for the interaction entries, which are then burned into the chip. Machines using this chip (smart dolls, electronic pets, children's computers) then have voice interaction capabilities.
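A minimal sketch of the offline step, in which samples from many speakers are pooled into one model per interaction entry. The article does not name the PC algorithm; this sketch assumes the simplest possible stand-in, averaging fixed-length feature vectors per entry and classifying by nearest mean. The function names and the two-dimensional vectors are hypothetical.

```python
from math import dist


def build_feature_database(samples):
    """Pool (entry, feature_vector) pairs from many speakers into one
    mean feature vector per interaction entry."""
    sums, counts = {}, {}
    for entry, vector in samples:
        if entry not in sums:
            sums[entry] = [0.0] * len(vector)
            counts[entry] = 0
        sums[entry] = [s + v for s, v in zip(sums[entry], vector)]
        counts[entry] += 1
    return {
        entry: [s / counts[entry] for s in total]
        for entry, total in sums.items()
    }


def recognize(database, vector):
    """Return the entry whose mean vector is closest (Euclidean distance)."""
    return min(database, key=lambda entry: dist(database[entry], vector))


# Toy data standing in for recordings from different speakers.
samples = [
    ("blink", [1.0, 0.0]),
    ("blink", [0.8, 0.2]),
    ("open mouth", [0.0, 1.0]),
    ("open mouth", [0.2, 0.8]),
]
database = build_feature_database(samples)  # this is what would be burned in
```

Because the database is built once from many voices, a new speaker whose features fall near an entry's mean is recognized without any training on the device.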

Some speaker-independent speech recognition applications are based on phoneme algorithms. In this mode, interactive recognition can be set up without collecting speech samples from many people, but the disadvantage is that the recognition rate is lower and recognition performance is unstable.
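To illustrate why a phoneme-based approach avoids per-entry data collection, the sketch below defines each entry only by its phoneme sequence and matches a decoded phoneme sequence against that lexicon by edit distance. This is an assumed, simplified design, not the algorithm of any particular chip; the ARPAbet-style phoneme symbols, the lexicon contents, and the function names are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of phoneme symbols."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # deletion
                current[j - 1] + 1,          # insertion
                previous[j - 1] + (x != y),  # substitution or match
            ))
        previous = current
    return previous[-1]


# Hypothetical lexicon: each entry is described by phonemes only.
LEXICON = {
    "blink": ["B", "L", "IH", "NG", "K"],
    "open": ["OW", "P", "AH", "N"],
}


def match_entry(phonemes, lexicon=LEXICON):
    """Return the lexicon entry closest to the decoded phoneme sequence."""
    return min(lexicon, key=lambda entry: edit_distance(phonemes, lexicon[entry]))
```

Adding a new entry is a one-line lexicon change such as `LEXICON["stop"] = ["S", "T", "AA", "P"]`, with no new recordings. The trade-off named in the text shows up here too: any error in the decoded phonemes directly affects the match.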

According to the continuity of the speaking mode, speech recognition can be divided into discontinuous (isolated-word) speech recognition and continuous speech recognition.

Discontinuous speech recognition

In discontinuous speech recognition, each word is recognized separately, so the speaker must pause after each word.
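The required pause is what lets the recognizer find word boundaries. A minimal sketch, assuming the audio has already been reduced to one energy value per frame and that a word ends once the energy stays below a threshold for a minimum number of frames. The function name, the threshold, and the pause length are hypothetical parameters.

```python
def split_on_pauses(frame_energies, threshold, min_pause_frames):
    """Return (start, end) frame index pairs, end exclusive, one per word."""
    words = []
    start = None    # frame index where the current word began
    silent_run = 0  # consecutive low-energy frames seen inside a word
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause_frames:
                # The pause is long enough: close the word before the silence.
                words.append((start, i - silent_run + 1))
                start, silent_run = None, 0
    if start is not None:
        # Audio ended inside a word: drop any trailing silent frames.
        words.append((start, len(frame_energies) - silent_run))
    return words
```

Short dips in energy inside a word do not split it, because only a silent run of at least `min_pause_frames` closes a segment. Each returned segment can then be passed to the recognizer on its own.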

Continuous speech recognition

Continuous speech recognition handles speech delivered in a natural, fluent manner, much as a human listener would, but good recognition results are hard to achieve because consecutive words run together.

