Developing a Lightweight Deep Learning Approach for Automatic Persian Speech Recognition

Degree

Master's

Field of study

Computer Engineering - Artificial Intelligence and Robotics

Faculty

Computer Engineering

Defense date

1403/10/29 (Solar Hijri; 18 January 2025)

Page count

103 pp.

Supervisor

Hamidreza Baradaran Kashani

Persian keywords

Automatic speech recognition, lightweight deep learning, model compression, knowledge distillation, network pruning

Persian abstract

Today, deep-learning-based speech recognition models achieve high performance; however, to improve accuracy and performance, most of them rely on large and complex models. This raises two main challenges in speech recognition: (1) increased processing time and (2) the need for powerful hardware resources. The main way to overcome these challenges is to make the model lightweight using methods such as knowledge distillation, network pruning, quantization, neural architecture search, and the design of low-parameter processing layers. In this thesis, among the compression methods, the combination of knowledge distillation and network pruning was chosen as the main approach for lightweighting the speech recognition model, owing to the success of both techniques in compressing large language models. To this end, various methods are presented for transferring knowledge from the intermediate layers of the large model (teacher) to the lightweight model (student), both separately and in combination. Several techniques are also proposed to improve network pruning. In addition, hybrid approaches that integrate knowledge distillation and network pruning, aimed at improving the effectiveness of these two techniques in compressing speech recognition models, are proposed and comprehensively studied and analyzed. The results show that exploiting the information latent in intermediate layers during knowledge distillation, together with optimized network pruning methods and a suitable strategy for combining the two, can preserve the model's performance while increasing speed by 50% and reducing computational complexity and memory consumption by 32% and 20%, respectively, relative to the baseline model. These improvements make it possible to use the model in environments with limited computational resources.
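The abstract does not state which loss is used to transfer knowledge from intermediate layers. As a rough illustration only, the sketch below combines two widely used ingredients: a temperature-scaled KL term between teacher and student output distributions (soft targets) and a mean-squared-error term between projected student hidden states and selected teacher hidden states. The layer mapping `layer_map`, the projection matrix `proj`, and the weights `temperature` and `alpha` are hypothetical choices, not the thesis's method; plain Python lists stand in for tensors.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def project(vec, matrix):
    """Map a student hidden vector to the teacher's width (matrix is d_teacher x d_student)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def distillation_loss(student_logits, teacher_logits,
                      student_hidden, teacher_hidden,
                      proj, layer_map, temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target loss and an intermediate-layer loss.

    Hypothetical formulation: layer_map is a list of (student_layer, teacher_layer)
    index pairs, and proj is an assumed learned projection; neither is taken
    from the thesis.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Soft-target term, scaled by T^2 to keep its magnitude comparable across temperatures.
    logit_loss = kl_divergence(p_teacher, p_student) * temperature ** 2
    # Intermediate-layer term: match projected student states to the mapped teacher states.
    hidden_loss = sum(
        mse(project(student_hidden[s], proj), teacher_hidden[t])
        for s, t in layer_map
    ) / len(layer_map)
    return alpha * logit_loss + (1 - alpha) * hidden_loss


identity = [[1.0, 0.0], [0.0, 1.0]]
loss = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0],
                         [[1.0, 2.0]], [[0.0, 0.0], [1.0, 2.0]],
                         identity, layer_map=[(0, 1)])
print(loss)  # 0.0
```

When the student already matches the teacher on both outputs and the mapped hidden layer, both terms vanish; any mismatch makes the loss positive, which is what training would minimize.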

English keywords

Automatic Speech Recognition, Lightweight Deep Learning, Model Compression, Knowledge Distillation, Network Pruning

English title

Developing a Lightweight Deep Learning Approach for Automatic Persian Speech Recognition

Department

Artificial Intelligence Engineering

English abstract

Today, speech recognition models based on deep learning have achieved high performance. However, to enhance accuracy and efficiency, most of these models tend to be large and complex, which introduces two main challenges in speech recognition: (1) increased processing time and (2) the need for powerful hardware resources. The primary solution to these challenges is model compression using techniques such as knowledge distillation, network pruning, quantization, neural architecture search, and the design of low-parameter processing layers. In this thesis, among the available compression techniques, the combination of knowledge distillation and network pruning has been chosen as the primary approach for compressing the speech recognition model, owing to their proven effectiveness in reducing the size of large language models. To this end, various methods for transferring knowledge from the intermediate layers of the larger (teacher) model to the lightweight (student) model have been explored, both individually and in combination. Additionally, different techniques have been proposed to enhance network pruning. Furthermore, hybrid approaches integrating knowledge distillation and network pruning have been introduced and comprehensively analyzed to improve their effectiveness in compressing speech recognition models. The findings of this research indicate that leveraging the latent information in intermediate layers during knowledge distillation, combined with optimized network pruning methods and a well-designed strategy for integrating the two techniques, can improve model efficiency while preserving performance. Specifically, relative to the baseline model, this approach increases processing speed by 50% while reducing computational complexity and memory consumption by 32% and 20%, respectively. These improvements enable the deployment of the model in environments with limited computational resources.
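The abstract likewise does not name the pruning criterion. A minimal sketch, assuming structured magnitude pruning (ranking the rows of a weight matrix by their L2 norm and keeping the strongest), illustrates why removing whole units shrinks computation and memory, unlike zeroing scattered weights. The function name `prune_rows_by_l2` and the `keep_ratio` parameter are hypothetical and not taken from the thesis.

```python
import math


def prune_rows_by_l2(weights, keep_ratio):
    """Hypothetical structured-pruning sketch: keep the rows (e.g. output
    neurons) with the largest L2 norm and drop the rest.

    weights: list of rows; keep_ratio: fraction of rows to keep, in (0, 1].
    Returns the pruned rows and the kept row indices in their original order.
    """
    if not 0 < keep_ratio <= 1:
        raise ValueError("keep_ratio must be in (0, 1]")
    norms = [math.sqrt(sum(w * w for w in row)) for row in weights]
    n_keep = max(1, round(len(weights) * keep_ratio))
    # Rank rows by norm (largest first), keep the top n_keep, restore original order.
    ranked = sorted(range(len(weights)), key=lambda i: norms[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    return [weights[i] for i in kept], kept


W = [[0.1, 0.2], [1.0, -1.0], [0.0, 0.05], [0.5, 0.5]]
pruned, kept = prune_rows_by_l2(W, keep_ratio=0.5)
print(kept)    # [1, 3]
print(pruned)  # [[1.0, -1.0], [0.5, 0.5]]
```

Dropping output rows of one layer also requires deleting the matching input columns of the next layer so the shapes still agree. In pipelines that combine the two techniques, a common pattern is to fine-tune the pruned student with a distillation loss afterwards; the strategy actually used in the thesis may differ.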

External supervisor

Alireza Darvishi

Table of contents (PDF)

143528

Author

Mazaheri, Hajar

Link to this record

https://lib.ui.ac.ir/dl/search/default.aspx?Term=24938&Field=0&DTC=3