|
|
1 | (8) |
|
Jacob Benesty, Shoji Makino, Jingdong Chen |
|
|
|
|
1 | (2) |
|
1.2 Challenges and Opportunities |
|
|
3 | (1) |
|
1.3 Organization of the Book |
|
|
4 | (3) |
|
|
7 | (1) |
|
|
8 | (1) |
|
2 Study of the Wiener Filter for Noise Reduction |
|
|
9 | (34) |
|
Jacob Benesty, Jingdong Chen, Yiteng (Arden) Huang, Simon Doclo |
|
|
|
|
9 | (2) |
|
2.2 Estimation of the Clean Speech Samples |
|
|
11 | (2) |
|
2.3 Estimation of the Noise Samples |
|
|
13 | (1) |
|
2.4 Important Relationships Between Noise Reduction and Speech Distortion |
|
|
14 | (7) |
|
2.5 Particular Case: White Gaussian Noise |
|
|
21 | (2) |
|
2.6 Better Ways to Manage Noise Reduction and Speech Distortion |
|
|
23 | (6) |
|
2.6.1 A Suboptimal Filter |
|
|
23 | (3) |
|
2.6.2 Noise Reduction Exploiting the Speech Model |
|
|
26 | (1) |
|
2.6.3 Noise Reduction with Multiple Microphones |
|
|
27 | (2) |
|
2.7 Simulation Experiments |
|
|
29 | (8) |
|
|
37 | (1) |
|
|
38 | (5) |
|
3 Statistical Methods for the Enhancement of Noisy Speech |
|
|
43 | (24) |
|
|
|
|
43 | (1) |
|
|
44 | (1) |
|
3.3 The Wiener Filter and its Implementation |
|
|
45 | (6) |
|
3.4 Estimation of Spectral Amplitudes |
|
|
51 | (3) |
|
|
51 | (2) |
|
3.4.2 Maximum Likelihood and MAP Estimation |
|
|
53 | (1) |
|
3.5 MMSE Estimation Using Super-Gaussian Speech Models |
|
|
54 | (4) |
|
3.6 Background Noise Power Estimation |
|
|
58 | (2) |
|
3.6.1 Minimum Statistics Noise Power Estimation |
|
|
58 | (2) |
|
3.7 The MELPe Speech Coder |
|
|
60 | (2) |
|
|
62 | (1) |
|
|
63 | (4) |
|
4 Single- and Multi-Microphone Spectral Amplitude Estimation Using a Super-Gaussian Speech Model |
|
|
67 | (30) |
|
|
|
|
67 | (1) |
|
4.2 Single-Channel Statistical Filter |
|
|
68 | (15) |
|
|
70 | (8) |
|
|
78 | (5) |
|
4.3 Multichannel Statistical Filter |
|
|
83 | (5) |
|
4.3.1 Joint Statistical Model |
|
|
84 | (2) |
|
4.3.2 Multichannel MAP Spectral Amplitude Estimation |
|
|
86 | (2) |
|
|
88 | (5) |
|
|
93 | (1) |
|
|
93 | (4) |
|
5 From Volatility Modeling of Financial Time-Series to Stochastic Modeling and Enhancement of Speech Signals |
|
|
97 | (18) |
|
|
|
|
97 | (2) |
|
|
99 | (2) |
|
|
101 | (3) |
|
5.4 Statistical Model for Speech Signals |
|
|
104 | (1) |
|
|
105 | (1) |
|
|
106 | (2) |
|
|
108 | (3) |
|
|
111 | (4) |
|
6 Single-Microphone Noise Suppression for 3G Handsets Based on Weighted Noise Estimation |
|
|
115 | (20) |
|
Akihiko Sugiyama, Masanori Kato, Masahiro Serizawa |
|
|
|
|
115 | (2) |
|
6.2 Conventional Noise Suppression Algorithm |
|
|
117 | (3) |
|
|
117 | (2) |
|
6.2.2 Problem in Noise Estimation |
|
|
119 | (1) |
|
6.3 New Noise Suppression Algorithm |
|
|
120 | (4) |
|
6.3.1 Weighted Noise Estimation |
|
|
121 | (2) |
|
6.3.2 Spectral Gain Modification |
|
|
123 | (1) |
|
6.3.3 Computational Requirements |
|
|
123 | (1) |
|
|
124 | (7) |
|
6.4.1 Objective Evaluation for Noise Estimation |
|
|
125 | (2) |
|
6.4.2 Subjective Evaluation |
|
|
127 | (4) |
|
|
131 | (1) |
|
|
131 | (4) |
|
7 Signal Subspace Techniques for Speech Enhancement |
|
|
135 | (26) |
|
Firas Jabloun, Benoit Champagne |
|
|
|
|
135 | (2) |
|
7.2 Signal and Noise Models |
|
|
137 | (1) |
|
7.3 Linear Signal Estimation |
|
|
138 | (5) |
|
7.3.1 Least-Squares Estimator |
|
|
139 | (1) |
|
7.3.2 The Linear Minimum Mean Squared Error Estimator |
|
|
139 | (1) |
|
7.3.3 The Time-Domain Constrained Estimator |
|
|
140 | (1) |
|
7.3.4 The Spectral-Domain Constrained Estimator |
|
|
141 | (2) |
|
7.4 Handling Colored Noise |
|
|
143 | (3) |
|
|
143 | (1) |
|
7.4.2 The Generalized Eigenvalue Decomposition Method |
|
|
144 | (1) |
|
7.4.3 The Rayleigh Quotient Method |
|
|
145 | (1) |
|
7.5 A Filterbank Interpretation |
|
|
146 | (2) |
|
7.5.1 The Frequency to Eigendomain Transformation |
|
|
146 | (1) |
|
7.5.2 The Eigen Filterbank |
|
|
146 | (2) |
|
7.6 Implementation Issues |
|
|
148 | (4) |
|
7.6.1 Estimating the Covariance Matrix |
|
|
149 | (1) |
|
|
150 | (2) |
|
7.7 Fast Subspace Estimation Techniques |
|
|
152 | (3) |
|
7.7.1 Fast Eigenvalue Decomposition Methods |
|
|
153 | (1) |
|
7.7.2 Subspace Tracking Methods |
|
|
153 | (1) |
|
7.7.3 The Frame Based EVD (FBEVD) Method |
|
|
154 | (1) |
|
7.8 Some Recent Developments |
|
|
155 | (2) |
|
|
155 | (1) |
|
7.8.2 Multi-Microphone Systems |
|
|
156 | (1) |
|
|
156 | (1) |
|
|
157 | (1) |
|
|
157 | (4) |
|
8 Speech Enhancement: Application of the Kalman Filter in the Estimate-Maximize (EM) Framework |
|
|
161 | (38) |
|
|
|
|
161 | (5) |
|
|
166 | (2) |
|
|
168 | (4) |
|
8.3.1 State Estimation (E-Step) |
|
|
169 | (1) |
|
8.3.2 Parameter Estimation (M-Step) |
|
|
170 | (1) |
|
|
171 | (1) |
|
|
171 | (1) |
|
8.4 Parameter Estimation Using Higher-Order Statistics |
|
|
172 | (2) |
|
8.5 Gradient-Based Sequential Algorithm |
|
|
174 | (1) |
|
8.6 All-Kalman Speech and Parameter Estimation |
|
|
175 | (6) |
|
|
176 | (2) |
|
|
178 | (3) |
|
|
181 | (8) |
|
|
181 | (1) |
|
8.7.2 Verifying the Gaussian Assumption |
|
|
182 | (1) |
|
8.7.3 Objective Evaluation |
|
|
183 | (3) |
|
8.7.4 Subjective Evaluation |
|
|
186 | (2) |
|
8.7.5 Comparison Between EM-Based Algorithms |
|
|
188 | (1) |
|
8.7.6 Evaluation of the UKF |
|
|
188 | (1) |
|
|
189 | (6) |
|
|
195 | (4) |
|
9 Speech Distortion Weighted Multichannel Wiener Filtering Techniques for Noise Reduction |
|
|
199 | (30) |
|
Simon Doclo, Ann Spriet, Jan Wouters, Marc Moonen |
|
|
|
|
199 | (2) |
|
9.2 GSC and Spatially Pre-Processed SDW-MWF |
|
|
201 | (6) |
|
9.2.1 Notation and General Structure |
|
|
201 | (3) |
|
9.2.2 Generalized Sidelobe Canceller |
|
|
204 | (1) |
|
9.2.3 Speech Distortion Weighted Multichannel Wiener Filter |
|
|
205 | (2) |
|
9.3 Frequency-Domain Criterion for SDW-MWF |
|
|
207 | (6) |
|
9.3.1 Frequency-Domain Notation |
|
|
207 | (1) |
|
|
208 | (2) |
|
|
210 | (2) |
|
9.3.4 Practical Implementation |
|
|
212 | (1) |
|
9.4 Approximations for Reducing the Complexity |
|
|
213 | (6) |
|
9.4.1 Block-Diagonal Correlation Matrices |
|
|
213 | (3) |
|
9.4.2 Diagonal Correlation Matrices |
|
|
216 | (1) |
|
9.4.3 Unconstrained Algorithms |
|
|
217 | (1) |
|
|
218 | (1) |
|
9.5 Experimental Results |
|
|
219 | (5) |
|
9.5.1 Setup and Performance Measures |
|
|
219 | (1) |
|
9.5.2 SNR Improvement and Robustness Against Microphone Mismatch |
|
|
220 | (3) |
|
9.5.3 Tracking Performance |
|
|
223 | (1) |
|
|
224 | (1) |
|
|
225 | (4) |
10 Adaptive Microphone Arrays Employing Spatial Quadratic Soft Constraints and Spectral Shaping |
|
229 | (18) |
|
Sven Nordholm, Hai Quang Dam, Nedelko Grbic, Siow Yong Low |
|
|
|
|
229 | (2) |
|
10.2 Signal Modelling and Problem Formulation |
|
|
231 | (4) |
|
10.2.1 Analysis and Synthesis Filterbanks |
|
|
232 | (1) |
|
10.2.2 The Wiener Solution |
|
|
233 | (1) |
|
10.2.3 The Space Constrained Source Covariance Information |
|
|
234 | (1) |
|
10.3 Robust Soft Constrained Adaptive Microphone Array (RSCAMA) |
|
|
235 | (3) |
|
10.3.1 Problem Formulation |
|
|
235 | (2) |
|
10.3.2 A Recursive Algorithm for the RSCAMA |
|
|
237 | (1) |
|
10.4 Noise Statistics Updated Adaptive Microphone Array (NSUAMA) |
|
|
238 | (4) |
|
10.4.1 Problem Formulation |
|
|
238 | (1) |
|
10.4.2 The Noise Covariance Detector |
|
|
238 | (2) |
|
10.4.3 Estimation of Power Spectrum of SOI |
|
|
240 | (1) |
|
10.4.4 The NSUAMA Algorithm |
|
|
241 | (1) |
|
|
242 | (3) |
|
10.5.1 The Simulation Scenario |
|
|
242 | (1) |
|
10.5.2 Results for RSCAMA and NSUAMA Beamformers |
|
|
242 | (3) |
|
|
245 | (1) |
|
|
245 | (2) |
11 Single-Microphone Blind Dereverberation |
|
247 | (24) |
|
Tomohiro Nakatani, Masato Miyoshi, Keisuke Kinoshita |
|
|
|
|
247 | (2) |
|
11.2 Overview of Existing Approaches |
|
|
249 | (2) |
|
11.2.1 Blind Inverse Filtering |
|
|
249 | (1) |
|
11.2.2 Dereverberation Based on Speech Signal Features |
|
|
250 | (1) |
|
11.3 Harmonicity of Speech Signals and Its Robust Estimation |
|
|
251 | (4) |
|
11.3.1 Model of Speech Harmonicity |
|
|
251 | (1) |
|
11.3.2 Adaptive Harmonic Filtering |
|
|
252 | (1) |
|
11.3.3 Robust F0 Estimation and Voicing Detection |
|
|
253 | (2) |
|
11.4 Harmonicity Based Dereverberation - HERB |
|
|
255 | (5) |
|
|
255 | (1) |
|
11.4.2 Model of Reverberant Speech Signal |
|
|
256 | (2) |
|
11.4.3 Dereverberation Filter |
|
|
258 | (1) |
|
11.4.4 Interpretation of the Dereverberation Filter |
|
|
258 | (2) |
|
11.5 Implementation of a Prototype System |
|
|
260 | (2) |
|
11.5.1 Dereverberation Filter Calculation |
|
|
261 | (1) |
|
11.5.2 Heuristics Improving Accuracy of F0 Estimation and Voicing Decisions with Reverberation |
|
|
261 | (1) |
|
11.6 Simulation Experiments |
|
|
262 | (3) |
|
11.6.1 Task: Dereverberation of Word Utterances |
|
|
262 | (1) |
|
11.6.2 Energy Decay Curves of Impulse Responses |
|
|
262 | (1) |
|
11.6.3 Speaker Dependent Word Recognition Rate |
|
|
263 | (2) |
|
|
265 | (3) |
|
11.7.1 Theoretical Extension of HERB |
|
|
266 | (1) |
|
11.7.2 Accuracy Improvement of Speech Model |
|
|
266 | (2) |
|
11.7.3 Reduction of Training Data Size |
|
|
268 | (1) |
|
|
268 | (1) |
|
|
269 | (2) |
12 Separation and Dereverberation of Speech Signals with Multiple Microphones |
|
271 | (28) |
|
Yiteng (Arden) Huang, Jacob Benesty, Jingdong Chen |
|
|
|
|
271 | (3) |
|
12.2 Signal Model and Problem Formulation |
|
|
274 | (2) |
|
12.3 Blind Identification of a SIMO System |
|
|
276 | (3) |
|
12.4 Separating Reverberant Speech and Concurrent Interference |
|
|
279 | (5) |
|
12.4.1 Example: Removing Interference Signals in a 2 × 3 MIMO Acoustic System |
|
|
279 | (2) |
|
|
281 | (3) |
|
12.5 Speech Dereverberation |
|
|
284 | (3) |
|
|
284 | (2) |
|
12.5.2 The Least-Squares Implementation |
|
|
286 | (1) |
|
|
287 | (9) |
|
12.6.1 Performance Measures |
|
|
287 | (1) |
|
12.6.2 Experimental Setup |
|
|
288 | (2) |
|
12.6.3 Experimental Results |
|
|
290 | (6) |
|
|
296 | (1) |
|
|
297 | (2) |
13 Frequency-Domain Blind Source Separation |
|
299 | (30) |
|
Hiroshi Sawada, Ryo Mukai, Shoko Araki, Shoji Makino |
|
|
|
|
299 | (2) |
|
13.2 BSS for Convolutive Mixtures |
|
|
301 | (1) |
|
13.3 Overview of Frequency-Domain Approach |
|
|
302 | (2) |
|
|
304 | (2) |
|
|
306 | (5) |
|
13.5.1 Basic Theory for Nearfield Model |
|
|
307 | (1) |
|
13.5.2 DOA Estimation with Farfield Model |
|
|
308 | (3) |
|
13.6 Permutation Alignment |
|
|
311 | (4) |
|
13.6.1 Localization Approach |
|
|
312 | (1) |
|
13.6.2 Correlation Approach |
|
|
312 | (2) |
|
|
314 | (1) |
|
|
315 | (2) |
|
|
317 | (3) |
|
|
318 | (1) |
|
13.8.2 Minimizing Error by Adjusting Scaling Ambiguity |
|
|
319 | (1) |
|
13.9 Experimental Results |
|
|
320 | (4) |
|
|
320 | (2) |
|
|
322 | (2) |
|
|
324 | (1) |
|
|
324 | (5) |
14 Subband Based Blind Source Separation |
|
329 | (24) |
|
Shoko Araki, Shoji Makino |
|
|
|
|
329 | (2) |
|
14.2 BSS of Convolutive Mixtures |
|
|
331 | (2) |
|
|
331 | (1) |
|
14.2.2 Frequency-Domain BSS and Related Issue |
|
|
332 | (1) |
|
|
333 | (6) |
|
14.3.1 Configuration of Subband BSS |
|
|
333 | (3) |
|
14.3.2 Time-Domain BSS Implementation for a Separation Stage |
|
|
336 | (1) |
|
14.3.3 Solving the Permutation and Scaling Problems |
|
|
337 | (2) |
|
14.4 Basic Experiments for Subband BSS |
|
|
339 | (5) |
|
14.4.1 Experimental Setup |
|
|
339 | (1) |
|
|
340 | (1) |
|
14.4.3 Conventional Frequency-Domain BSS |
|
|
340 | (1) |
|
14.4.4 Conventional Fullband Time-Domain BSS |
|
|
341 | (1) |
|
|
341 | (2) |
|
|
343 | (1) |
|
14.5 Frequency-Appropriate Processing for Further Improvement |
|
|
344 | (5) |
|
14.5.1 Longer Separation Filters in Low Frequency Bands |
|
|
345 | (1) |
|
14.5.2 Overlap-Blockshift in Low Frequency Bands |
|
|
346 | (1) |
|
|
347 | (2) |
|
|
349 | (1) |
|
|
350 | (3) |
15 Real-Time Blind Source Separation for Moving Speech Signals |
|
353 | (18) |
|
Ryo Mukai, Hiroshi Sawada, Shoko Araki, Shoji Makino |
|
|
|
|
353 | (2) |
|
15.2 ICA Based BSS of Convolutive Mixtures |
|
|
355 | (3) |
|
15.2.1 Frequency-Domain ICA |
|
|
355 | (1) |
|
15.2.2 Permutation and Scaling Problems |
|
|
356 | (1) |
|
15.2.3 Low Delay Blockwise Batch Algorithm |
|
|
357 | (1) |
|
15.3 Residual Crosstalk Cancellation |
|
|
358 | (4) |
|
15.3.1 Straight and Crosstalk Components of BSS |
|
|
358 | (1) |
|
15.3.2 Model of Residual Crosstalk Component Estimation |
|
|
359 | (1) |
|
15.3.3 Adaptive Algorithm and Spectrum Estimation |
|
|
360 | (2) |
|
15.4 Experiments and Discussions |
|
|
362 | (5) |
|
15.4.1 Experimental Conditions |
|
|
362 | (2) |
|
15.4.2 Performance for Fixed Sources |
|
|
364 | (1) |
|
15.4.3 Moving Target and Moving Interference |
|
|
365 | (1) |
|
15.4.4 Performance of Blockwise Batch Algorithm with Postprocessing |
|
|
366 | (1) |
|
15.4.5 Performance of Online Algorithm |
|
|
367 | (1) |
|
|
367 | (1) |
|
|
368 | (3) |
16 Separation of Speech by Computational Auditory Scene Analysis |
|
371 | (32) |
|
Guy J. Brown, DeLiang Wang |
|
|
|
|
371 | (1) |
|
16.2 Auditory Scene Analysis |
|
|
372 | (1) |
|
16.3 Computational Auditory Scene Analysis |
|
|
373 | (18) |
|
16.3.1 Peripheral Auditory Processing and Feature Extraction |
|
|
375 | (1) |
|
16.3.2 Monaural Approaches |
|
|
376 | (6) |
|
16.3.3 Binaural Approaches |
|
|
382 | (5) |
|
16.3.4 Frameworks for Cue Integration |
|
|
387 | (4) |
|
16.4 Integrating CASA with Speech Recognition |
|
|
391 | (3) |
|
16.5 CASA Compared to ICA |
|
|
394 | (1) |
|
|
395 | (3) |
|
|
398 | (1) |
|
|
398 | (5) |
Index |
|
403 | |