Speaker recognition across different speaking styles: A human perception and machine learning approach

Degree level

Master's

Field of study

Linguistics

Faculty

Foreign Languages

Defense date

1404/04/04

Page count

117 pp.

Supervisors

Dr. Homa Asadi, Dr. Hamidreza Baradaran Kashani

Persian keywords

Speaker recognition, Machine learning, Human perception, Speaking style, Voice discrimination

Persian abstract

This study examines speaker recognition across different speaking styles using two approaches: human perception and machine learning. To this end, a corpus of the voices of 100 male speakers in three speaking styles (child-directed, clear, and read) was collected. An x-vector machine learning model based on the time-delay architecture and Mel-filterbank features was trained on the speech of 70 speakers and tested on the remaining 30 speakers. The same 30 speakers also served as the speakers for the human perception test. In the human perception part, the performance of 43 listeners in three groups was evaluated on 96 short audio pairs covering all 9 possible combinations of speaking styles. Analysis of the listeners' responses using signal detection theory indices showed that patterns with matched speaking styles, or combinations of more structured styles such as clear–read, lead to good perceptual performance, whereas combining the child-directed style with the other styles noticeably reduced listeners' performance in discriminating speakers and increased their conservative bias. In addition, the effect of style order within the test pair was identified for the first time as a determining factor. Furthermore, a between-group behavioral analysis of the listeners indicated that individual characteristics, including gender, can affect decision-making strategy in speaker recognition. In the machine learning part, performance on trials with audio durations similar to those of the human perception experiment was weaker, and the difference between style patterns was negligible. However, as utterance length increased, the equal error rate decreased and the effect of style patterns became stronger. Under these conditions, the model's performance improved on the more structured patterns, while patterns containing the child-directed style remained weak, a finding consistent with the results of the human perception part. Keywords: Speaker recognition, Machine learning, Human perception, Speaking style, Voice discrimination.

English keywords

Speaker recognition, Machine learning, Human perception, Speaking style, Voice discrimination

English title

Speaker recognition across different speaking styles: A human perception and machine learning approach

Department

Linguistics

English abstract

This study investigates speaker recognition across different speaking styles using two complementary approaches: human perception and machine learning. To this end, a corpus comprising 100 male speakers was collected, encompassing three speaking styles: child-directed, clear, and read speech. An x-vector model based on the TDNN architecture and Mel-filterbank features was trained using speech data from 70 speakers and evaluated on the remaining 30. These 30 speakers also formed the basis of the stimuli for the human perception experiment. In the human perception experiment, the performance of 43 listeners across three groups was evaluated using 96 short audio pairs, covering all nine possible combinations of speaking styles. Analyses based on signal detection theory (SDT) revealed that matched-style pairs or combinations involving more structured styles, such as clear–read, led to higher perceptual accuracy. In contrast, combinations involving the child-directed style significantly impaired listeners' ability to distinguish between speakers and increased their conservative response bias. Notably, the order of styles within the audio pair emerged as a critical factor affecting recognition. Furthermore, between-group behavioral analyses suggested that individual factors such as gender may influence decision-making strategies in speaker recognition tasks. In the machine learning component, performance under speech durations similar to those in the human perception experiment was relatively poor, and style-related differences were minimal. However, increasing the speech duration resulted in lower equal error rates (EER) and amplified the influence of speaking style. Under these conditions, the model performed better on structured style combinations, while combinations involving child-directed speech remained challenging, a pattern consistent with the human results.
Keywords: Speaker recognition, Machine learning, Human perception, Speaking style, Voice discrimination.

Number of chapters

5 chapters

Table of contents (PDF)

137631

Author

Zare, Razieh

Link to this record

https://lib.ui.ac.ir/dl/search/default.aspx?Term=24796&Field=0&DTC=3