Wednesday, 3 July 2013

Simple Speech Recognition System Using LPC

Theory:

Linear Predictive Coding:

Linear prediction means predicting the next sample from the previous samples. The theory is given below.


w(0), w(1), ..., w(p-1) are called the LPC coefficients.
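
For reference, the standard autocorrelation-method formulation of linear prediction can be written as follows. This is a sketch using the w(k) indexing above; the symbols x(n) for the signal, e(n) for the prediction error and R(m) for the autocorrelation are notation assumed here, not taken from the original post.

```latex
\hat{x}(n) = \sum_{k=0}^{p-1} w(k)\, x(n-1-k), \qquad e(n) = x(n) - \hat{x}(n)

% Minimizing \sum_n e(n)^2 with respect to each w(i) gives the normal equations
\sum_{k=0}^{p-1} w(k)\, R(|i-k|) = R(i+1), \quad i = 0, 1, \dots, p-1,
\qquad R(m) = \sum_{n} x(n)\, x(n+m)
```

The left-hand side is a p-by-p Toeplitz matrix of autocorrelation values multiplied by the coefficient vector, so solving this system yields the p coefficients for one block.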


The LPC coefficients capture almost all of the formant frequencies, so the voice samples can be represented effectively by a much smaller number of coefficients.
These coefficients are used to train the neural network to classify the voice signals.

Neural Network:



It is a net-like structure that takes the features x1, x2, ..., xn as input and classifies them as one of the outputs. The theory is given below.
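
To make the idea concrete, here is a minimal Python sketch of how a small feed-forward network turns a feature vector into a class decision. It is an illustration only, not the network in the MATLAB files: the single hidden layer, the sigmoid activation and the toy weights are all assumptions.

```python
import math

def sigmoid(z):
    """Logistic activation, squashing any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: sigmoid(W * x + b), one output per weight row."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def classify(features, w_hidden, b_hidden, w_out, b_out):
    """Forward pass through one hidden layer; the largest output wins."""
    hidden = layer(features, w_hidden, b_hidden)
    outputs = layer(hidden, w_out, b_out)
    best = max(range(len(outputs)), key=outputs.__getitem__)
    return best, outputs

# Toy weights (made up for illustration): 2 features, 2 hidden units, 3 classes.
W_HIDDEN = [[1.0, 0.0], [0.0, 1.0]]
B_HIDDEN = [0.0, 0.0]
W_OUT = [[2.0, -2.0], [-2.0, 2.0], [0.0, 0.0]]
B_OUT = [0.0, 0.0, 0.0]

label, scores = classify([3.0, -3.0], W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)
print(label, scores)
```

In a trained network the weights come from the training algorithm rather than being set by hand; for digit recognition the output layer would have ten units, one per number.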

Implementation:

1.Record the voice samples for 1 second at a sampling rate of 8000 Hz, which gives 8000 samples.
2.Noise removal:

3.After the noise is removed by applying a threshold, the number of samples is greatly reduced.
The signal is then passed through a high-pass filter to boost the high-frequency components, since the high-frequency components of speech carry less energy and are therefore more susceptible to noise.
4.Frame blocking:
The array of voice samples is divided into a number of overlapping blocks. This is done because, over the short length of a block, the signal can be assumed to be a wide-sense stationary (WSS) process.
We have taken the block length as 240 samples and the overlap as 80 samples, so the first block runs from index 0 to 239 and the second block starts at index 160, i.e. (240-80).
5.Hamming window:
Each block is multiplied by the Hamming window. If a block were analysed directly, there would be a sudden discontinuity at its ends, so the ends are tapered using the Hamming window to remove it.

The ends are therefore attenuated in the process. That is why we use overlapping windows, so that the information near the ends of one block is still captured by the neighbouring blocks.
6.Autocorrelation matrix formation:
The matrix is formed as explained in the theory.

The LPC coefficients are obtained for each block.
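
Steps 2 to 6 can be sketched end to end in plain Python as shown below. This is not the code from the MATLAB files. The block length (240) and overlap (80) come from the text, while the threshold value, the pre-emphasis factor alpha = 0.95, the prediction order of 10 and all function names are assumptions made for illustration. Instead of inverting the autocorrelation matrix directly, the sketch uses the Levinson-Durbin recursion, which solves the same Toeplitz system.

```python
import math
import random

FRAME_LEN = 240            # block length from the text
OVERLAP = 80               # overlap from the text
HOP = FRAME_LEN - OVERLAP  # 160: start index of the second block

def remove_quiet_samples(samples, threshold):
    """Drop samples whose magnitude is at or below the threshold (assumed noise gate)."""
    return [s for s in samples if abs(s) > threshold]

def pre_emphasis(samples, alpha=0.95):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1] (alpha is assumed)."""
    return [s - alpha * prev for s, prev in zip(samples, [0.0] + samples[:-1])]

def frame_blocks(samples, frame_len=FRAME_LEN, hop=HOP):
    """Split the samples into overlapping blocks; a trailing partial block is dropped."""
    last_start = len(samples) - frame_len
    return [samples[s:s + frame_len] for s in range(0, last_start + 1, hop)]

def hamming(n_points):
    """Hamming window: 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]

def autocorrelation(frame, max_lag):
    """R(m) = sum over n of x(n) * x(n + m), for m = 0..max_lag."""
    return [sum(a * b for a, b in zip(frame, frame[lag:]))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for the predictor coefficients w,
    where the prediction is x_hat(n) = sum_k w[k] * x(n - 1 - k)."""
    w, err = [], r[0]
    for i in range(order):
        if err <= 0:  # silent or degenerate block: pad with zeros
            return w + [0.0] * (order - i)
        refl = (r[i + 1] - sum(w[k] * r[i - k] for k in range(i))) / err
        w = [w[k] - refl * w[i - 1 - k] for k in range(i)] + [refl]
        err *= 1.0 - refl * refl
    return w

def lpc_features(samples, order=10, threshold=0.02):
    """Return one list of `order` LPC coefficients per block."""
    cleaned = pre_emphasis(remove_quiet_samples(samples, threshold))
    window = hamming(FRAME_LEN)
    features = []
    for block in frame_blocks(cleaned):
        tapered = [x * w for x, w in zip(block, window)]
        features.append(levinson_durbin(autocorrelation(tapered, order), order))
    return features

# Synthetic stand-in for a 1-second recording at 8000 Hz (not real speech).
rng = random.Random(0)
signal = [math.sin(2 * math.pi * 200 * n / 8000) + 0.05 * rng.uniform(-1, 1)
          for n in range(8000)]
features = lpc_features(signal)
print(len(features), "blocks,", len(features[0]), "coefficients each")
```

Each block yields a short coefficient vector, which is what makes LPC a compact feature set for the classifier.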

The coefficients obtained are used to train the neural network. Many samples are required to train it to get an accurate result.
In our example we speak the numbers from 0 to 9, and the system should recognize each one and display the number.
We trained with 15 samples of each number from 0 to 9 in a constant acoustic environment and achieved an accuracy of 90% for a single speaker.


Matlab files:

MATLAB LPC.rar



Steps to follow after downloading the MATLAB files:

1.Open the normalizedlpc.m file and run it.
2.It will ask you to record your voice. Press Enter and then say a number.
3.Enter the corresponding number when it asks for it.
4.When you say 'zero', enter the number as 10 instead of 0 (because the program uses the entered number as an index, and MATLAB has no 0th index).
5.After going through 0 to 9, start again from 0; it will ask you to record 30 samples.
6.Each number should be repeated three times in the 30 samples; the order does not matter.
7.The LPC coefficients will be stored in the variable X and the numbers in Y. These two variables will be saved in the lpcdata.mat file.
8.Load the normlpcdatabase.mat file and copy X and Y to K and L respectively. Save these variables back to the normlpcdatabase.mat file.
9.Run the normalizedlpc.m file again to collect the next 30 training samples, and accumulate X and Y into K and L to create a large database.
10.After collecting 150 to 180 training samples, run the lpcnuralnet.m file. It will train the neural network using the backpropagation algorithm.
11.Finally, run the voicepredict.m file to check the prediction.
12.The prediction accuracy can be improved by increasing the number of training samples and by recording them in different acoustic environments.
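
The reason for entering 10 instead of 0 in step 4 can be illustrated with a short Python sketch. How the MATLAB scripts actually build their target data is not shown in this post, so the helper names and the one-hot target layout below are assumptions.

```python
NUM_CLASSES = 10

def digit_to_class(digit):
    """Map a spoken digit 0-9 to a 1-based class index; 'zero' becomes 10."""
    if not 0 <= digit <= 9:
        raise ValueError("digit must be between 0 and 9")
    return 10 if digit == 0 else digit

def class_to_digit(cls):
    """Inverse mapping, used when displaying the recognized number."""
    return 0 if cls == 10 else cls

def one_hot(cls, num_classes=NUM_CLASSES):
    """Target vector with a 1 at the 1-based position cls and 0 elsewhere."""
    target = [0] * num_classes
    target[cls - 1] = 1
    return target

print(one_hot(digit_to_class(0)))
```

With 1-based positions, class 10 occupies the last slot of the target vector, which is why 'zero' can be stored without needing a 0th index.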

This program can easily be altered to recognize other words as well.
All kinds of suggestions are welcome.