The Use of Deep Learning in Speech Enhancement

Deep learning is an emerging area of research. Convolutional Neural Networks (CNNs) and Deep Belief Networks (DBNs) are the models most commonly used in deep learning, and such models are generally termed Deep Neural Networks (DNNs). DNNs are widely used in many applications, mainly for detection and classification. In this paper, the authors use the same kind of network for signal enhancement. Noisy speech is taken as the input signal. A DNN with two hidden layers is used, and it is compared with the ADALINE model to demonstrate its efficacy.


I. INTRODUCTION
Speech enhancement in noisy conditions is a fascinating and challenging task for speech recognition, mobile communications, teleconferencing systems, hearing aid design, etc. The objective of speech enhancement is to reduce the noise and thereby increase the SNR of noisy speech signals in adverse environments. Researchers have devoted considerable attention to this area for several decades, but the results are not always satisfactory in terms of quality and intelligibility [1].
Speech signals are nonstationary in nature, and adaptive filters perform better in real-time environments. Many adaptive algorithms have been designed, such as Least Mean Squares (LMS), Recursive Least Squares (RLS), Normalized LMS (NLMS) and other LMS variants. LMS and RLS have been compared with the State Space Recursive Least Squares (SSRLS) algorithm, and the SNR improvement of the proposed algorithm is much better than that of the existing algorithms [2] [3]. The spectral subtraction (SS) algorithm suppresses background noise and works well for stationary noise. S. Vihari et al. proposed a noise estimation algorithm based on the decision-directed approach. The Wiener filter and the SS algorithms were tested on nonstationary noise and showed improved performance [4].
Machine learning approaches are increasingly in demand. Earlier, the Adaptive Linear Neuron (ADALINE) network was designed as a single-layer neural network. It is based on the principle of the Multilayer Perceptron (MLP) [5]. The network consists of an activation function, and the function's output is used to adapt the weights. Generally, the Fourier Transform (FT) is used to extract magnitude and phase features, which are passed to the ADALINE for training. In [6], Discrete Cosine Transform (DCT) and Fractional DCT (FrDCT) coefficients are extracted from the noisy speech signal and used to train the ADALINE; the FrDCT-based ADALINE gives the better enhanced signal in terms of SNR and PESQ. Artificial Neural Networks (ANNs) and Convolutional Neural Networks (CNNs) have also been designed for speech enhancement. An overview of neural networks is presented in [7] [8].
Understanding speech is difficult in a noisy environment. To improve the quality and intelligibility of the speech signal, neural-network-based speech enhancement is proposed in [9]. To achieve a high SNR, the signal is decomposed into time-frequency bins, from which features are extracted and fed to the network for better accuracy. Yong Xu et al. proposed a regression-based Deep Neural Network (DNN) for speech enhancement, in which a mapping function between the noisy features and the clean features is calculated. Different numbers of hidden layers are considered for SNR measurement [10] [11]. An improved LMS adaptive filter (ILMSAF) has been combined with the DNN for speech enhancement: the adaptive filter coefficients are estimated by the Deep Belief Network (DBN), and the enhanced speech is obtained through the ILMSAF [12]. Reinforcement learning can be used for optimization over large DNN training sets. A cochlear implant has been designed based on the application of the DNN to speech enhancement [13] [14]. Ram et al. enhanced speech signals using a DNN with three hidden layers [16]. Audio and visual enhancement can also be achieved through the DNN [17].
The rest of the work is organized as follows. Speech enhancement using ADALINE is explained in Section II. Section III presents speech enhancement using DNN considering two hidden layers: DNN_1 and DNN_2. Comparison results of DNN and ADALINE for speech enhancement are presented in Section IV, and Section V concludes the work.

III. SPEECH ENHANCEMENT USING DEEP NEURAL NETWORKS
DNN is based on supervised learning and determines the mapping from noisy features to clean features. The structure of the DNN is presented in Fig. 2 and is divided into two phases: a training phase and a testing phase. The hidden layers employ an activation function. The inverse FFT and the overlap-add method are used to reconstruct the speech signal.
Network pretraining and regularization are employed to improve the system.
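The framing, spectral analysis and overlap-add reconstruction mentioned in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the hop size and the short frame length used in the test are hypothetical, a naive DFT stands in for an FFT routine, and the magnitudes are passed through unchanged where an enhancement model would modify them.

```python
import cmath
import math

def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def dft(frame):
    """Naive O(N^2) DFT, used here only as a stand-in for an FFT."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse of dft(); returns the real part of each time sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def analyze(signal, frame_len, hop):
    """Window overlapping frames and return per-frame magnitudes and phases."""
    win = hamming(frame_len)
    mags, phases = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = dft([signal[start + i] * win[i] for i in range(frame_len)])
        mags.append([abs(c) for c in spec])
        phases.append([cmath.phase(c) for c in spec])
    return mags, phases

def synthesize(mags, phases, frame_len, hop):
    """Recombine magnitude and phase, invert each frame, and overlap-add."""
    if not mags:
        return []
    win = hamming(frame_len)
    length = hop * (len(mags) - 1) + frame_len
    out = [0.0] * length
    norm = [0.0] * length
    for f, (mag, ph) in enumerate(zip(mags, phases)):
        frame = idft([cmath.rect(m, p) for m, p in zip(mag, ph)])
        start = f * hop
        for i in range(frame_len):
            out[start + i] += frame[i]
            norm[start + i] += win[i]
    # Dividing by the summed window undoes the analysis windowing.
    return [o / w for o, w in zip(out, norm)]
```

Because the Hamming window never reaches zero, dividing by the summed window recovers every sample covered by at least one frame, so an unmodified spectrum reproduces the input. With a real enhancement stage, only the magnitudes passed to `synthesize` would change.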

IV. EXPERIMENTAL RESULTS
In this work, 10 sentences from both male and female speakers are considered for training. A total of 100 sentences have been collected with the different types of noise signals mentioned earlier. All utterances are sampled at 8 kHz. Fig. 4 shows the spectrogram of the clean speech signal, and Fig. 5 shows the noisy signal (babble noise at an SNR of 10 dB). The clean signal is taken as the target signal, and the noisy signal is applied to the ADALINE and the DNN. The enhanced signal of the ADALINE is obtained as the error signal, as shown in Fig. 6. Table I shows the SNR as well as the SNR improvement of ADALINE and DNN. The maximum SNR improvement, 2.87 dB, is achieved with DNN_2. Table II shows the PESQ measures at the different noise levels. DNN_2 provides the maximum PESQ for babble noise at 15 dB. When the number of hidden layers is increased, a better enhanced signal is obtained with the DNN.
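The SNR and SNR-improvement figures reported above can be computed along the following lines. The paper does not state its exact SNR formula, so this sketch assumes the common definition $10\log_{10}\left(\sum s^2 / \sum (s - \hat{s})^2\right)$, where $s$ is the clean signal and $\hat{s}$ is the signal being scored; the function names are hypothetical.

```python
import math

def snr_db(clean, estimate):
    """SNR in dB of `estimate` measured against the clean reference."""
    signal_power = sum(c * c for c in clean)
    if signal_power == 0:
        raise ValueError("clean reference has zero energy")
    noise_power = sum((c - e) ** 2 for c, e in zip(clean, estimate))
    if noise_power == 0:
        return math.inf  # estimate is identical to the reference
    return 10.0 * math.log10(signal_power / noise_power)

def snr_improvement_db(clean, noisy, enhanced):
    """Output SNR minus input SNR, both measured against the clean signal."""
    return snr_db(clean, enhanced) - snr_db(clean, noisy)
```

For example, halving the residual error amplitude quarters the noise power, which corresponds to an improvement of $10\log_{10}4 \approx 6.02$ dB.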

V. CONCLUSION
ADALINE and DNN are used to enhance the noisy speech signal in this work. ADALINE is considered as the basic neural network implemented for speech enhancement. The DNN is used with different hidden layers, which can demonstrate the validity of speech enhancement in the field of data mining. Good performance is obtained with the ADALINE model, but the DNN model outperforms it. Although the DNN takes more time, its speech enhancement is better. In the future, the weights of the ADALINE model can be varied and other transforms can be applied to extract the features and observe the performance.
II. SPEECH ENHANCEMENT USING ADALINE
ADALINE is a simple neural network used for noise cancellation and is based on the principle of the LMS algorithm. Fig. 1 represents the block diagram of the ADALINE used for speech enhancement.

Fig. 1. Speech enhancement using ADALINE.

This adaptive network consists of a single neuron with connected weights and a single bias. Because speech signals are nonstationary, the signal is divided into overlapping frames before processing. To avoid spectral leakage, each overlapping frame is multiplied by a Hamming window of length 256. These overlapping windowed frames are processed by the network as inputs for enhancement. To obtain the output at each instant of the speech signal, a set of weights and biases is calculated. The inputs $x_1, x_2, \ldots, x_m$ are connected to the output $y$ through the weights $w_1, w_2, \ldots, w_m$ and the bias $b$. The following steps are followed for speech enhancement using ADALINE.
1. Set the weights $\{w(m)\}$ to 0.25 and the biases $\{b(m)\}$ to 0.825, chosen experimentally.
2. Set the learning rate parameter $l$ to 0.5.
3. Consider the clean signal as the target signal $\{t(m)\}$.
4. Set the noisy signal as the input signal $\{x(m)\}$.
5. For each time index $m$, calculate the output signal $\{y(m)\}$ and the error $\{a(m)\}$ as
$$y(m) = w(m)\,x(m) + b(m)$$
$$a(m) = t(m) - y(m)$$
6. Adapt the weights and biases of the network as
$$w_{\text{new}}(m) = w(m) + l\,\{a(m)\,x(m)\}$$
$$b_{\text{new}}(m) = b(m) + l\,\{a(m)\}$$
All parameters are set experimentally for proper adaptation of the network. The weights and biases are adjusted to attain the desired signal. The enhanced signal is obtained as the error signal of the ADALINE.
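The ADALINE steps in this section can be sketched as a sample-by-sample loop. This is a minimal illustration rather than the authors' code: the function name is hypothetical, and it assumes the standard LMS reading of the update rules, with a scalar weight per time index, the error taken as target minus output, and additive updates.

```python
def adaline_enhance(noisy, clean, w0=0.25, b0=0.825, lr=0.5):
    """Run one ADALINE pass over the signal.

    noisy: input samples x(m); clean: target samples t(m).
    Returns (outputs y(m), errors a(m)).
    """
    if len(noisy) != len(clean):
        raise ValueError("noisy and clean signals must have equal length")
    w, b = w0, b0
    outputs, errors = [], []
    for x, t in zip(noisy, clean):
        y = w * x + b        # output of the neuron
        a = t - y            # error against the clean target
        w = w + lr * a * x   # weight update
        b = b + lr * a       # bias update
        outputs.append(y)
        errors.append(a)
    return outputs, errors
```

A multi-input version would replace `w * x` with a dot product over the samples of a windowed frame; the scalar form is used here because it mirrors the per-index quantities $w(m)$, $x(m)$ and $b(m)$ named in the steps.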

Fig. 2. Structure of the Deep Neural Network.

In this work, two hidden layers are considered, and the sigmoid function is used as the activation function for the output. To train the DNN on noisy log spectra, multiple restricted Boltzmann machines (RBMs) are arranged [15]. The NOIZEUS database is taken from the soft copy provided by Loizou. Babble, train, airport and restaurant noise at SNRs of 0 dB, 5 dB, 10 dB and 15 dB are considered for training, and drilling noise and street noise are considered for testing. A total of 100 speech samples of noisy as well as clean features are acquired in the training phase. Two hidden layers with 512 hidden units each and 8 output units are considered. In total, 512 × 2 = 1024 hidden units are trained on the noisy speech features.
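The layer sizes described here (two hidden layers of 512 units and 8 output units) can be illustrated with a small forward-pass sketch. Several details are assumptions made only for illustration: the input size of 257, the use of the sigmoid in the hidden layers as well as the output, the random initialisation, and all function names. RBM pretraining and training itself are not shown.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def build_network(n_in, hidden=(512, 512), n_out=8, seed=0):
    """Create the layers: input -> 512 -> 512 -> 8 by default."""
    rng = random.Random(seed)
    sizes = [n_in, *hidden, n_out]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

def forward(x, network):
    """Propagate one feature vector through every layer."""
    for weights, biases in network:
        x = [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x

def count_hidden_units(network):
    """Total units across all layers except the output layer."""
    return sum(len(biases) for _, biases in network[:-1])
```

With the default sizes, `count_hidden_units(build_network(n_in=257))` gives 1024, matching the 512 × 2 count stated in the text.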

Fig. 3. Speech enhancement using the Deep Neural Network.

For speech enhancement, the noisy sentence is divided into overlapping frames. Each frame is multiplied by a Hamming window of length 512 to avoid signal distortion. The proposed DNN-based speech enhancement method is represented in Fig. 3. The Fourier Transform is employed to extract the magnitude and phase spectra. Only the magnitude

TABLE III. PESQ SCORE OF BABBLE NOISE WITH DIFFERENT NOISE LEVELS OF