A Comparison of Automatic Classification of News Texts Based on Deep Learning and Ontology (WordNet) Methods

Record number
23611
Call number
LIB2 227
Author
ميرزايي، عطيه
Title
A comparison of automatic classification of news texts based on deep learning and ontology (WordNet) methods
Degree
Master's
Field of study
Information Science and Knowledge Studies - Academic Library Management
Faculty
Education and Psychology
Defense date
Bahman 140
Pagination
92 p.
Supervisor
ميترا پشوتني زاده
Persian descriptors
News text classification, Deep learning, Ontologies, Automatic text classification, Convolutional neural network, Long short-term memory
Persian abstract
With the growing volume of electronic texts, especially in the news domain, the main goal of this research is to compare deep learning-based methods and the WordNet ontology for the automatic classification of news texts. In terms of purpose, this is applied research. Because it describes how news texts can be classified automatically with deep learning-based and WordNet-based methods, and compares and evaluates the classification performance of these methods, it also belongs to the comparative research group in terms of method of execution. To this end, the English news data of the 20 Newsgroups collection, comprising 19,997 texts, were first preprocessed in the Python environment and then classified using deep learning methods (a convolutional neural network and long short-term memory), the WordNet ontology, and a hybrid approach. In the deep learning methods, the important words of each text were first identified with a tokenizer function; an embedding function then converted the words into feature vectors, which served as the input to the deep learning models. In the WordNet method, the important words of each text were first identified with TF-IDF, and their lexical-ontology equivalents, obtained through WordNet synsets, were then added to each text's set of important words. Finally, the hybrid approach, which combines the two methods, converted each text into an ontology-based text, and the study determined which of the three methods improves classification performance. The four measures of accuracy, precision, recall and F-measure were used to evaluate all three methods. The results showed that the convolutional neural network achieved an accuracy, precision, recall and F-measure of 0.70, 0.92, 0.38 and 0.54 respectively; the WordNet ontology method achieved 0.39, 0.71, 0.39 and 0.50; and the hybrid method achieved 0.79, 0.93, 0.42 and 0.58. On accuracy, precision and F-measure, the convolutional neural network outperformed the WordNet ontology, although its recall (0.38) was marginally lower than WordNet's (0.39). The precision of the convolutional neural network was 0.92, whereas that of the ontology method was 0.71, the lowest of the three.
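The first step of the WordNet method, ranking each text's words by TF-IDF and keeping the highest-scoring ones, can be sketched in plain Python. This is a minimal illustration under assumptions, not the thesis code: the abstract does not state which TF-IDF variant was used or how many words were kept, so the length-normalised term frequency, the idf = log(N / df) weighting, the alphabetical tie-break, the hypothetical function name `tfidf_top_k` and its parameter `k`, and the toy documents are all illustrative choices.

```python
import math
from collections import Counter


def tfidf_top_k(docs, k=10):
    """Return the k highest-scoring TF-IDF terms of each tokenised document.

    Illustrative variant: tf = count / document length, idf = log(N / df).
    Ties are broken alphabetically so the output is deterministic.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    top_terms = []
    for doc in docs:
        if not doc:  # an empty document has no terms to rank
            top_terms.append([])
            continue
        tf = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        top_terms.append(ranked[:k])
    return top_terms


# Toy, already-tokenised documents (hypothetical, not from 20 Newsgroups).
docs = [
    ["space", "space", "orbit", "the"],
    ["hockey", "hockey", "goal", "the"],
    ["space", "hockey", "review", "the"],
]
top = tfidf_top_k(docs, k=2)
print(top)  # [['orbit', 'space'], ['goal', 'hockey'], ['review', 'hockey']]
```

A word that occurs in every document (here `the`) gets an idf of log(1) = 0, so it is never preferred over a word that distinguishes one text from the others.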
One possible explanation for the convolutional neural network's advantage over the WordNet method is that deep learning methods avoid many problems, such as dimensionality explosion and data sparsity, and offer strong learning ability and higher predictive accuracy. Neural network methods can handle very large volumes of data, whereas the performance of the WordNet ontology degrades as the data volume grows; its main advantage is higher precision, achieved by avoiding word repetition and by taking the semantic relations between words into account. When the four measures are considered together, the hybrid method performed best and reached the highest precision of the three methods, 0.93, possibly because it combines the strengths of both methods. Overall, the significance of this research is that, although machine learning methods, including deep learning, are powerful tools for text classification, they rely only on syntactic information and relations between words in the text; incorporating knowledge of the semantic relations between words into these methods can lead to more accurate text classification.
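Expanding a text's important words with semantically related words, as the WordNet and hybrid methods do with WordNet synsets, can be sketched as follows. The dictionary `TOY_SYNSETS` and the function `expand_with_synonyms` are hypothetical: the dictionary is a small hard-coded stand-in for WordNet so the example runs offline. With NLTK's WordNet corpus installed, the synonyms would instead come from `wordnet.synsets(word)` and each synset's `lemma_names()`. Skipping lemmas that are already present mirrors the abstract's point about avoiding word repetition; which WordNet relations the thesis used beyond synsets is not stated.

```python
# Hypothetical stand-in for WordNet: word -> synonym lemmas.
# Real WordNet lemma names join multiword expressions with "_".
TOY_SYNSETS = {
    "car": ["auto", "automobile", "machine", "motorcar"],
    "game": ["match"],
    "hockey": ["ice_hockey"],
}


def expand_with_synonyms(important_terms, synsets=TOY_SYNSETS):
    """Append the synonyms of each important term, skipping repeats.

    The original terms stay first and in their original order.
    """
    expanded = list(important_terms)
    seen = set(expanded)
    for term in important_terms:
        for lemma in synsets.get(term, []):
            lemma = lemma.replace("_", " ")  # "ice_hockey" -> "ice hockey"
            if lemma not in seen:
                seen.add(lemma)
                expanded.append(lemma)
    return expanded


expanded = expand_with_synonyms(["car", "dealer", "game"])
print(expanded)
# ['car', 'dealer', 'game', 'auto', 'automobile', 'machine', 'motorcar', 'match']
```

Joining the expanded list back into a string gives an "ontology-based" version of the text that a downstream classifier, including a deep learning model in the hybrid setting, could consume.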
English descriptors
Classification of news texts, Deep learning, Ontologies, Automatic text classification, Convolutional neural network, Long short-term memory
English title
A comparison of automatic classification of news texts based on deep learning and ontology (WordNet)
Department
Information Science and Knowledge Studies
English abstract
With the increase in the volume of electronic texts, especially in the field of news, the main goal of this research is to compare methods based on deep learning and the WordNet ontology for the automatic classification of news texts. In terms of research method, the current research is applied research. Since it describes how to classify news texts automatically with methods based on deep learning and the WordNet ontology, and compares and evaluates the performance of these methods, it also belongs to the comparative research group in terms of implementation method. For this purpose, the English news data of the 20 Newsgroups collection, 19,997 texts in total, were first pre-processed in the Python environment and classified with methods based on deep learning (including a convolutional neural network and long short-term memory), the WordNet ontology, and a hybrid approach. In the deep learning methods, the important words of each text were first determined with a tokenizer function; the words were then converted into feature vectors with an embedding function and used as the input of the deep learning models. In the WordNet ontology method, the most important words of each text were first identified with TF-IDF, and their lexical-ontology equivalents were then added to each text's set of most important words using WordNet and its synsets. Finally, the hybrid approach, which combines the two methods, turned each text into an ontology-based text, and it was determined which of the three methods improves classification performance. Four criteria, accuracy, precision, recall and F-measure, were used to evaluate the performance of all three methods.
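The deep learning input pipeline (a tokenizer that maps words to integer ids, followed by an embedding that turns each id into a dense feature vector) can be sketched without a deep learning framework. This is an illustration under stated assumptions: the frequency-based vocabulary order, the reserved ids 0 (padding) and 1 (unknown word), `max_len`, the embedding size, and the helper names `build_vocab`, `texts_to_padded` and `embed` are hypothetical, and in a real CNN or LSTM the embedding table is a trainable layer rather than fixed random numbers.

```python
import random
from collections import Counter


def build_vocab(texts, max_words=None):
    """Assign ids by descending word frequency (ties broken alphabetically).

    Ids start at 2: id 0 is reserved for padding and id 1 for unknown words.
    """
    counts = Counter(word for text in texts for word in text.lower().split())
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    if max_words is not None:
        ranked = ranked[:max_words]  # keep only the most frequent words
    return {word: idx for idx, word in enumerate(ranked, start=2)}


def texts_to_padded(texts, vocab, max_len):
    """Convert each text to ids, truncate to max_len, then right-pad with 0."""
    sequences = []
    for text in texts:
        ids = [vocab.get(word, 1) for word in text.lower().split()][:max_len]
        sequences.append(ids + [0] * (max_len - len(ids)))
    return sequences


def embed(sequences, vocab_size, dim, seed=0):
    """Replace every id with a dim-sized vector from a lookup table.

    Row 0 (padding) is all zeros; the other rows are small random values
    standing in for weights that training would learn.
    """
    rng = random.Random(seed)
    table = [[0.0] * dim]
    table += [[rng.uniform(-0.05, 0.05) for _ in range(dim)]
              for _ in range(vocab_size - 1)]
    return [[table[i] for i in seq] for seq in sequences]


texts = ["The car engine", "the hockey game", "the car"]
vocab = build_vocab(texts)
ids = texts_to_padded(texts, vocab, max_len=4)
vectors = embed(ids, vocab_size=len(vocab) + 2, dim=8)
print(ids)  # [[2, 3, 4, 0], [2, 6, 5, 0], [2, 3, 0, 0]]
print(texts_to_padded(["the new car"], vocab, max_len=4))  # [[2, 1, 3, 0]]
```

The resulting nested list has shape (texts, `max_len`, `dim`), the kind of input a convolutional or LSTM layer receives after the embedding lookup.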
The findings showed that the convolutional neural network method achieved an accuracy, precision, recall and F-measure of 0.70, 0.92, 0.38 and 0.54 respectively; the WordNet ontology method achieved 0.39, 0.71, 0.39 and 0.50; and the combined method achieved 0.79, 0.93, 0.42 and 0.58. On accuracy, precision and F-measure, the convolutional neural network performed better than the WordNet ontology, although its recall (0.38) was slightly lower. The precision of the convolutional neural network was 0.92, compared with 0.71 for the ontology method, the lowest of the three. Perhaps the reason is that deep learning-based methods avoid many problems, such as dimensionality explosion and data sparsity, and have strong learning ability and higher prediction accuracy. Methods based on neural networks are able to handle huge amounts of data, whereas the performance of the WordNet ontology weakens as the amount of data increases; its most important advantage is higher precision, gained by avoiding the repetition of words and by attending to the semantic relationships between words. Considering all four criteria, the combined method had the best performance and the highest precision (0.93) of the three methods, perhaps because it uses the abilities of both methods at the same time. In general, the importance of this research can be expressed as follows: although machine learning methods, including deep learning, are powerful methods for classifying texts, they use only syntactic information and connections between words in the text; using knowledge of the semantic connections between the words of the texts in these methods can lead to more accurate text classification.
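The reported F values can be checked against the reported precision and recall, since the balanced F-measure is their harmonic mean, F = 2PR / (P + R). The short script below recomputes F for each method from the abstract's figures; it assumes the thesis used the balanced (β = 1) F-measure, which the abstract does not state explicitly. The names `f1` and `reported` are illustrative.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (F-measure with beta = 1)."""
    return 2 * precision * recall / (precision + recall)


# (accuracy, precision, recall, F) for each method, as reported in the abstract.
reported = {
    "CNN": (0.70, 0.92, 0.38, 0.54),
    "WordNet ontology": (0.39, 0.71, 0.39, 0.50),
    "Hybrid": (0.79, 0.93, 0.42, 0.58),
}

for method, (_, p, r, f_reported) in reported.items():
    print(f"{method}: F from P and R = {f1(p, r):.3f}, reported F = {f_reported:.2f}")
```

The recomputed values (0.538, 0.503 and 0.579) round to the reported 0.54, 0.50 and 0.58. Because the harmonic mean is pulled toward the smaller term, the low recall (0.38 to 0.42) keeps F near 0.5 even where precision exceeds 0.9.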
Number of chapters
5
External advisor (outside the university)
حسيني، مريم
Link to this record:
https://lib.ui.ac.ir/dL/search/default.aspx?Term=23611&Field=0&DTC=3
