Training Methods for Deep Neural Network-Based Acoustic Models in Speech Recognition

Grósz Tamás
Training Methods for Deep Neural Network-Based Acoustic Models in Speech Recognition.
PhD, University of Szeged.
(2018)

PDF (dissertation), 1MB
PDF (thesis summary), 340kB
PDF (thesis summary), 375kB

Abstract in foreign language

Nowadays, speech recognition technology is built on Deep Neural Networks. These networks represent the latest direction of machine learning. They are based on the theory of artificial neural networks, which have been used for decades. However, unlike traditional neural networks, deep networks contain many processing layers, which allow hierarchical processing of the input data. While the concept of deep networks is not entirely new, training them efficiently required several recent advances. These new networks have completely replaced Gaussian Mixture Models in state-of-the-art speech recognition systems. In this study, we focused on Deep Neural Network-based recognition systems. First, we compared the performance of several new training algorithms with each other in order to determine the best one for later use. Then, we turned our attention to the algorithms that the new speech recognition systems have inherited from the previous Gaussian Mixture Model-based approaches, as these algorithms might not be optimal for Deep Neural Networks. We proposed new algorithms for obtaining the initial alignment of the frame-level state labels and for creating context-dependent states, and found that they are better suited to the new acoustic models. Lastly, we also experimented with a data re-sampling method to improve the accuracy of the models.

Item Type: Thesis (PhD)
Creators: Grósz Tamás
Hungarian title: Tanítási módszerek mély neuronhálós akusztikus modellekhez beszédfelismerésben
Title of the thesis in foreign language: Training Methods for Deep Neural Network-Based Acoustic Models in Speech Recognition
Divisions: Doctoral School Informatics
Discipline: Engineering Sciences > Computer Science
Defence date: 2018. October 05.
Item ID: 4225
Identification Number: 30616981
doi: https://doi.org/10.14232/phd.4225
Date Deposited: 2018. Mar. 09. 08:32
Last Modified: 2019. Apr. 01. 10:18
Depository no.: B 6425
Supervisor:
Dr. Tóth László, associate professor, PhD, University of Szeged, Institute of Informatics, Department of Computer Algorithms and Artificial Intelligence
Reviewers:
Dr.rer.nat Schlüter Ralf, senior councillor, RWTH Aachen University
Dr. Mihajlik Péter, PhD, assistant professor, BME VIK, Department of Telecommunications and Media Informatics
President:
Dr. Kató Zoltán, DSc, full professor, SZTE TTIK, Department of Image Processing and Computer Graphics
Members:
Dr. Beszédes Árpád, PhD, associate professor, SZTE TTIK, Department of Software Engineering
Dr. Varga Dániel, PhD, senior research fellow, MTA Alfréd Rényi Institute of Mathematics
Dr. Kincses Zoltán, PhD, assistant professor, SZTE TTIK, Department of Technical Informatics
URI: http://doktori.bibl.u-szeged.hu/id/eprint/4225
Defence/Citable status: Defended.
