Developing a Classification Method for Creating Interpretable Risk Scores

Record number
23887
Call number
COM3 125
Author
Sharifi Sedeh, Sara
Title
Developing a Classification Method for Creating Interpretable Risk Scores
Degree level
PhD
Field of study
Computer Engineering - Software
Faculty
Computer Engineering
Defense date
1403/05/31 (Solar Hijri calendar)
Page count
105 p.
Supervisors
Dr. Mohammad Ali Nematbakhsh, Dr. Afsaneh Fatemi
Persian keywords (translated)
Machine learning, risk score, risk, interpretability, RiskSlim
Persian abstract (translated)
Risk score models are a class of linear classification models that let users make quick predictions by performing simple arithmetic on features. They are widely used in applications where a human ultimately makes the decision, and they are easy to understand and evaluate. Despite their widespread use in the real world, risk scores are still built with a mix of statistical methods, heuristics, and expert judgment. Such approaches reduce performance and make it hard for practitioners to build practical, accepted risk scores. In this dissertation, an interpretable classification method is developed on top of the RiskSlim method; it rests on feature selection, small integer coefficients, and operational constraints. The proposed machine learning method is highly interpretable and easy to score, to the point that it needs neither a computer nor a calculator. Three new capabilities are added to the proposed method: first, an independent, adjustable tuning parameter for each feature coefficient; second, automatic selection of cut-off points for binarizing features; third, feature dimensionality reduction in the preprocessing stage. To validate the method, two case studies are described: predicting COVID-19 diagnosis, and predicting employee satisfaction with IT services at Mobarakeh Steel Company of Isfahan. Compared with RiskSlim, the proposed method reduced computational cost by a factor of about 13 and became robust to incorrect data and to the protection of sensitive information, without loss of performance. The dissertation also shows that, although COVID-19 shares symptoms with other illnesses such as influenza and the common cold, which makes diagnosis through self-report questionnaires difficult, the proposed risk score performed well on the test set, with an AUROC of 82% and an AUPRC of 27%. These results were compared with a penalized logistic regression model (AUROC 85%, AUPRC 22%) and a random forest model (AUROC 86%, AUPRC 23%).
Although the AUROC of the proposed method was slightly lower than that of the other two models, its AUPRC was higher. The method can also fully automate the selection of operational constraints otherwise made by domain experts, which contributes to its good accuracy and speed. The results of this research show that the proposed method is competitive in accuracy and speed with well-known machine learning methods such as random forest and penalized logistic regression, as well as with the optimal RiskSlim method.
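As a reading aid for the AUROC and AUPRC figures above, the following is a minimal sketch of how the two metrics can be computed from labels and predicted scores. The function names and the toy data are illustrative assumptions; average precision is used here as one common estimator of AUPRC, and the record does not say which estimator the dissertation used.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive.

    Assumes at least one positive example and no tied scores.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits


# Toy data (hypothetical, not from the dissertation).
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(auroc(labels, scores), average_precision(labels, scores))
```

A random ranking has an expected AUROC of 0.5, while its expected AUPRC is roughly the fraction of positive cases, so the two metrics can rank models differently on imbalanced data.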
English keywords
Machine learning, risk score, risk, interpretability, RiskSlim
English title
Developing a Classification Method for Creating Interpretable Risk Scores
Department
Software Engineering
English abstract
Risk score models are a type of linear classifier that allow users to make quick predictions by performing simple arithmetic operations on features. These models are widely used in applications where humans ultimately make the decisions, and they are easy to understand and evaluate. Despite their widespread use in the real world, risk scores are still constructed using a combination of statistical methods, heuristic approaches, and expert judgment. Such methods can reduce performance and make it difficult for professionals to develop practical and accepted risk scores. In this thesis, an interpretable classification method is developed on the basis of the RiskSlim approach; it relies on feature selection, small integer coefficients, and operational constraints. The proposed machine learning method is highly interpretable and easy to score, to the extent that it does not require a computer or even a calculator. Three new capabilities have been added to the proposed method: first, an adjustable and independent tuning parameter for each feature coefficient; second, automatic selection of cut-off points for binarizing features; third, feature dimension reduction in the preprocessing stage. To validate the proposed method, two case studies are described: predicting COVID-19 diagnosis, and predicting employee satisfaction with IT services at Mobarakeh Steel Company in Isfahan. Compared to the RiskSlim method, the proposed method reduced computational cost by a factor of approximately 13 and became robust to incorrect data and to the protection of sensitive information, without losing performance. It was also shown that, although COVID-19 has symptoms similar to those of other diseases such as influenza and the common cold, which can make its diagnosis through self-report questionnaires difficult, the proposed risk score performed well on the test set, achieving an AUROC of 82% and an AUPRC of 27%.
These results were compared to a penalized logistic regression model, which had an AUROC of 85% and an AUPRC of 22%, and a random forest model, which had an AUROC of 86% and an AUPRC of 23%. Although the AUROC of the proposed method was slightly worse than that of the other two models, its AUPRC was better. Additionally, the method can fully automate the selection of operational constraints otherwise made by domain experts, which helps it perform well in terms of accuracy and speed. The results indicate that the risk scores developed by this method can be used for the early identification of COVID-19 cases, which can help companies take appropriate measures to prevent the spread of the virus among their employees. The results also show that risk scores can be developed in a practical and accurate manner using the proposed method. The method is not limited to predicting and diagnosing COVID-19; it can also be applied to other health and work-related problems. The evaluation of results in this research shows that the proposed method is competitive in terms of accuracy and speed with well-known machine learning methods such as random forests and penalized logistic regression, as well as with the optimal RiskSlim method.
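To illustrate what the abstract means by a risk score that can be computed by hand, here is a minimal sketch: each feature is binarized at a cut-off point, each binary flag carries a small integer number of points, and the total is mapped to a probability. The feature names, cut-offs, point values, and intercept are hypothetical and do not come from the thesis; the logistic mapping is assumed because it is the usual link for this family of models.

```python
import math

# Hypothetical scorecard: (feature name, cut-off, integer points).
# None of these values are taken from the thesis.
SCORECARD = [
    ("temperature_c", 38.0, 2),
    ("age_years", 60, 1),
    ("cough_days", 3, 1),
]
INTERCEPT = -4  # hypothetical integer offset


def binarize(record):
    """Turn each raw feature into a 0/1 flag using its cut-off point."""
    return {name: int(record[name] >= cutoff) for name, cutoff, _ in SCORECARD}


def total_score(record):
    """Add up the integer points of every flag that is switched on."""
    flags = binarize(record)
    return sum(points * flags[name] for name, _, points in SCORECARD)


def predicted_risk(score):
    """Map an integer score to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-(INTERCEPT + score)))


# Hypothetical questionnaire answers for one respondent.
respondent = {"temperature_c": 38.5, "age_years": 45, "cough_days": 4}
score = total_score(respondent)
print(score, round(predicted_risk(score), 3))
```

Because the points are small integers, the score itself can be added up on paper, and the score-to-risk mapping can be printed once as a lookup table.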
Number of chapters
6
Link to this record:
https://lib.ui.ac.ir/dL/search/default.aspx?Term=23887&Field=0&DTC=3
