Nonlinear Compensation and Heterogeneous Data Modeling for Robust Speech Recognition

Nonlinear Compensation and Heterogeneous Data Modeling for Robust Speech Recognition was written by Yong Zhao and released in 2013.
OCLC : 858457534

Book Synopsis Nonlinear Compensation and Heterogeneous Data Modeling for Robust Speech Recognition by Yong Zhao

Book excerpt: The goal of robust speech recognition is to maintain satisfactory recognition accuracy under mismatched operating conditions. This dissertation addresses the robustness issue from two directions.

In the first part of the dissertation, we propose the Gauss-Newton method as a unified approach to estimating noise parameters for use in prevalent nonlinear compensation models, such as vector Taylor series (VTS), data-driven parallel model combination (DPMC), and unscented transform (UT), for noise-robust speech recognition. While iterative estimation of noise means in a generalized EM framework is widely known, we demonstrate that such approaches are variants of the Gauss-Newton method. Furthermore, we propose a novel noise variance estimation algorithm that is consistent with the Gauss-Newton principle. The formulation of the Gauss-Newton method reduces the noise estimation problem to determining the Jacobians of the corrupted speech parameters. For sampling-based compensations, we present two methods, sample Jacobian average (SJA) and cross-covariance (XCOV), to evaluate these Jacobians.

The Gauss-Newton method is closely related to another noise estimation approach, which views the model compensation from a generative perspective, giving rise to an EM-based algorithm analogous to the ML estimation for factor analysis (EM-FA). We demonstrate a close connection between these two approaches: both belong to the family of gradient-based methods, but they have different convergence rates. The convergence property can be crucial to noise estimation in the many applications where model compensation has to be carried out frequently in changing noisy environments to retain the desired performance.

Several techniques are explored to further improve the nonlinear compensation approaches. To remove the need for clean speech data when training acoustic models, we integrate nonlinear compensation with adaptive training. We also investigate fast VTS compensation to improve the noise estimation efficiency, and combine VTS compensation with acoustic echo cancellation (AEC) to mitigate issues due to interfering background speech.

The proposed noise estimation algorithm is evaluated for various compensation models on three tasks. The first is to fit a GMM to artificially corrupted samples, the second is to perform speech recognition on the Aurora 2 database, and the third is to perform speech recognition on a corpus simulating a meeting of multiple competing speakers. The significant performance improvements confirm the efficacy of the Gauss-Newton method in estimating the noise parameters of the nonlinear compensation models.

The second part of the dissertation is devoted to developing more effective models that take full advantage of heterogeneous speech data, which are typically collected from thousands of speakers in various environments via different transducers. The proposed synchronous HMM, in contrast to conventional HMMs, introduces an additional layer of substates between the HMM state and the Gaussian component variables. The substates can register long-span non-phonetic attributes, such as gender, speaker identity, and environmental condition, which are collectively called speech scenes in this study. The hierarchical modeling scheme allows an accurate description of the probability distribution of speech units in different speech scenes.

To address the data sparsity problem in estimating the parameters of multiple speech scene sub-models, a decision-based clustering algorithm is presented to determine the set of speech scenes and to tie the substate parameters, achieving a balance between modeling accuracy and robustness. In addition, by exploiting the synchronous relationship among the speech scene sub-models, we propose the multiplex Viterbi algorithm to efficiently decode the synchronous HMM within a search space of the same size as for the standard HMM. The multiplex Viterbi algorithm can also be generalized to decode an ensemble of isomorphic HMM sets, a problem that often arises in multi-model systems. The experiments on the Aurora 2 task show that the synchronous HMMs produce a significant improvement in recognition performance over the HMM baseline, at the expense of a moderate increase in memory requirement and computational complexity.
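
The Gauss-Newton noise estimation described in the first part of the excerpt can be illustrated with a toy scalar sketch. This is not the dissertation's algorithm: it assumes the common log-domain mismatch function y = log(exp(x) + exp(n)) for clean speech x and noise n, and it fits a single noise value by least squares rather than by maximum likelihood over model parameters. All function and variable names below are hypothetical. The point it shows is the one the excerpt makes: once the Jacobian of the corrupted parameter with respect to the noise is available, each update is a simple linearized solve.

```python
import math


def corrupted(x, n):
    """Assumed log-domain mismatch function: y = log(exp(x) + exp(n))."""
    return math.log(math.exp(x) + math.exp(n))


def jacobian_wrt_noise(x, n):
    """Derivative dy/dn of the assumed mismatch function."""
    return 1.0 / (1.0 + math.exp(x - n))


def gauss_newton_noise(xs, ys, n0=0.0, max_iter=50, tol=1e-12):
    """Estimate a scalar noise value n by Gauss-Newton least squares.

    Each iteration linearizes corrupted(x, n) around the current n and
    solves the resulting one-dimensional normal equation for the step.
    """
    n = n0
    for _ in range(max_iter):
        residuals = [y - corrupted(x, n) for x, y in zip(xs, ys)]
        jac = [jacobian_wrt_noise(x, n) for x in xs]
        # Normal equation in one dimension: (J^T J) step = J^T r
        step = sum(j * r for j, r in zip(jac, residuals)) / sum(j * j for j in jac)
        n += step
        if abs(step) < tol:
            break
    return n


# Synthetic data: clean values corrupted with a known noise level.
clean = [0.0, 1.0, 2.0, 3.0, 4.0]
true_noise = 2.0
noisy = [corrupted(x, true_noise) for x in clean]

estimate = gauss_newton_noise(clean, noisy)
print(round(estimate, 6))
```

Because the synthetic data are generated without residual error, the estimate should recover the noise value of 2.0 used to create them within a few iterations. A real system would replace the scalar Jacobian with the Jacobians of the compensated model parameters.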


Nonlinear Compensation and Heterogeneous Data Modeling for Robust Speech Recognition Related Books

Nonlinear Compensation and Heterogeneous Data Modeling for Robust Speech Recognition
Language: en
Authors: Yong Zhao
Categories: Automatic speech recognition
Type: BOOK - Published: 2013

Robust Speech Recognition of Uncertain or Missing Data
Language: en
Pages: 387
Authors: Dorothea Kolossa
Categories: Technology & Engineering
Type: BOOK - Published: 2011-07-14 - Publisher: Springer Science & Business Media

Automatic speech recognition suffers from a lack of robustness with respect to noise, reverberation and interfering speech. The growing field of speech recognit
Compensation for Nonlinear Distortion in Noise for Robust Speech Recognition
Language: en
Pages: 0
Authors: Mark J. Harvilla
Type: BOOK - Published: 2014

Model Compensation Methods for Robust Speech Recognition
Language: en
Pages: 66
Authors: Stephen Mingyu Chu
Type: BOOK - Published: 1999

New Era for Robust Speech Recognition
Language: en
Pages: 433
Authors: Shinji Watanabe
Categories: Computers
Type: BOOK - Published: 2017-10-30 - Publisher: Springer

This book covers the state-of-the-art in deep neural-network-based methods for noise robustness in distant speech recognition applications. It provides insights