Speaker recognition in Persian using machine learning algorithms

Degree level

Master's

Field of study

Computational Linguistics

Faculty

Foreign Languages

Defense date

1404-12-23

Page count

138 pp.

Supervisor

Homa Asadi

Advisor

Esfandiar Taheri

Persian keywords

Speaker recognition, machine learning, Mel-frequency cepstral coefficients, read style, spontaneous style

Persian abstract

This study examines and compares the performance of five machine learning algorithms in recognizing Persian-speaking speakers. The main aim is to evaluate algorithms that earlier studies have shown to reach acceptable results with less data and lower training cost than deep learning models and neural networks. To this end, the random forest, decision tree, Gaussian mixture model, nearest neighbor, and support vector machine algorithms were selected, and their performance was compared on a Persian speech corpus of 60 male speakers recorded in read and spontaneous styles. To prepare the data, 12 Mel-frequency cepstral coefficients were extracted with the Praat software as the main features. Model performance was evaluated with accuracy, precision, recall, and F1 score, both overall and for each speaker class. The training time of each model, the confusion matrix, and implementation-related costs were also analyzed. The findings show that the nearest neighbor algorithm performed best, with an accuracy of 84.58% in the read style without data standardization. By contrast, the decision tree algorithm had the lowest accuracy in both styles. Notably, in every model accuracy was lower for the spontaneous style than for the read style, which indicates a significant effect of speech style on the performance of speaker recognition systems.

Latin keywords

Speaker Recognition, Machine Learning, Mel Frequency Cepstral Coefficients, Read Style, Spontaneous Style

Latin title

Speaker recognition in Persian using machine learning algorithms

Department

Linguistics

Latin abstract

The present study investigates and compares the performance of five machine learning algorithms in Persian speaker recognition. The main goal is to evaluate algorithms that previous studies have shown to achieve acceptable results with less data and lower training cost than deep learning models and neural networks. For this purpose, the random forest, decision tree, Gaussian mixture model, nearest neighbor, and support vector machine algorithms were selected, and their performance was compared on a Persian audio corpus of 60 male speakers (with read and spontaneous styles). To prepare the data, 12 Mel frequency cepstral coefficients were extracted using the Praat software as the main features. The performance of the models was evaluated using accuracy, precision, recall, and F1 score, both overall and for each speaker class. The training time of each model, the confusion matrix, and the costs associated with implementation were also analyzed. The findings show that the nearest neighbor algorithm had the highest performance, with an accuracy of 84.58% in the read style and without data standardization. In contrast, the decision tree algorithm had the lowest accuracy in both styles. It is noteworthy that in all models the accuracy of the spontaneous style was lower than that of the read style, which indicates a significant effect of speech style on the performance of speaker recognition systems.
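The evaluation described in the abstract can be illustrated with a minimal sketch. It is not the thesis code: the 12-dimensional vectors are synthetic stand-ins for MFCC frames, the three speaker labels are hypothetical, and the choice of a 1-nearest-neighbor rule with Euclidean distance is an assumption, since the record does not state the value of k or the distance measure. The sketch compares accuracy with and without standardization and reports precision, recall, and F1 per speaker class.

```python
import math
import random


def standardize(train, test):
    """Z-score each feature using statistics from the training set only."""
    dims = len(train[0])
    means = [sum(row[d] for row in train) / len(train) for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((row[d] - means[d]) ** 2 for row in train) / len(train)
        stds.append(math.sqrt(var) or 1.0)  # guard against zero variance

    def scale(rows):
        return [[(r[d] - means[d]) / stds[d] for d in range(dims)] for r in rows]

    return scale(train), scale(test)


def predict_1nn(train_x, train_y, test_x):
    """Label each test vector with the speaker of its closest training vector."""
    preds = []
    for x in test_x:
        nearest = min(range(len(train_x)), key=lambda i: math.dist(x, train_x[i]))
        preds.append(train_y[nearest])
    return preds


def evaluate(y_true, y_pred):
    """Return overall accuracy and per-speaker precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    per_class = {}
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, per_class


def make_synthetic_frames(n_speakers=3, frames_per_speaker=40, n_coeffs=12, seed=0):
    """Generate hypothetical MFCC-like vectors clustered around one centre per speaker."""
    rng = random.Random(seed)
    features, labels = [], []
    for speaker in range(n_speakers):
        centre = [rng.uniform(-5, 5) for _ in range(n_coeffs)]
        for _ in range(frames_per_speaker):
            features.append([c + rng.gauss(0, 1) for c in centre])
            labels.append(f"speaker_{speaker}")
    return features, labels


features, labels = make_synthetic_frames()
# Even-indexed frames train the model, odd-indexed frames test it.
train_x, train_y = features[0::2], labels[0::2]
test_x, test_y = features[1::2], labels[1::2]

raw_acc, raw_report = evaluate(test_y, predict_1nn(train_x, train_y, test_x))
std_train, std_test = standardize(train_x, test_x)
std_acc, std_report = evaluate(test_y, predict_1nn(std_train, train_y, std_test))

print(f"accuracy without standardization: {raw_acc:.2%}")
print(f"accuracy with standardization:    {std_acc:.2%}")
for speaker, scores in raw_report.items():
    print(speaker, {name: round(value, 2) for name, value in scores.items()})
```

Real use would replace `make_synthetic_frames` with coefficients exported from Praat and would split by recording rather than by alternating frames, so that frames from one utterance do not appear on both sides of the split.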

Number of chapters

Table of contents (PDF)

157355

Author

Mahdipour, Ali

Link to this record

https://lib.ui.ac.ir/dl/search/default.aspx?Term=25703&Field=0&DTC=3