Answering Multi-hop Questions from a Persian Knowledge Graph

Degree level

Master's

Field of study

Computer Engineering - Software

Faculty

Computer Engineering

Defense date

19/11/1402 (Solar Hijri)

Page count

83 pp.

Supervisor

Reza Ramezani

Persian keywords

Question answering, multi-hop questions, knowledge base, knowledge graph

Persian abstract

In recent years, question answering systems, which aim to return a final answer to users' questions, have attracted attention. In these systems, users pose a question in natural language and receive a short, precise answer in the shortest possible time. Some question answering systems use a knowledge graph as the data source for answering users' questions. These systems face two classes of questions: simple and complex. Most studies have addressed simple questions, which are answered by finding a single relation in the knowledge graph. In the real world, however, there are more complex questions whose answers require finding multiple relations in the knowledge graph. Such complex questions are called multi-hop questions, and multi-hop knowledge graph question answering systems have been developed to answer them. In these systems, finding the answer requires multi-hop reasoning in the knowledge graph from the question's main entity to the expected answer. There are three approaches to answering multi-hop questions over a knowledge graph: semantic parsing, information retrieval, and deep learning (knowledge graph embedding). Research in this area mostly targets English. For Persian, very few studies have addressed knowledge graph question answering for complex questions, and those have used semantic parsing methods. This study presents a hybrid approach that combines information retrieval and knowledge graph embedding to answer multi-hop questions in Persian. Combining these two approaches removes the dependence on syntactic and semantic rules. To this end, in the multi-hop question answering component, the answer to the user's question is obtained using the knowledge graph embedding space. Although the proposed method can find the answer to many multi-hop questions, incomplete relations in the knowledge graph can prevent an answer from being found even though the answer exists in the knowledge graph. This problem is considerably more serious in low-resource languages such as Persian. To address it, this study uses a knowledge graph completion component.
Because the knowledge graph embedding contains embeddings of both relations and entities, this property is used to extract the closest relations that can help find the answer. In addition, when no relation exists in the knowledge graph for an entity, the knowledge graph embedding space is used to create a new relation (triple). The proposed method is evaluated on a Persian translation of the MetaQA dataset. In the experiments, the method achieves 73.66% on the Hits@1 metric for answering multi-hop Persian questions over a knowledge graph. This result shows a 10.91% improvement of the proposed model over previous Persian studies.
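
The multi-hop reasoning described in the abstract can be illustrated with a minimal sketch. It is not the thesis's method (which works in an embedding space); it only shows, on a toy graph, what following a chain of relations from a question entity means and how one missing triple leaves a question unanswered. All entity names, relation names, and the helper functions `build_index` and `follow_path` are hypothetical.

```python
from collections import defaultdict

# Toy knowledge graph as (head, relation, tail) triples. All names are made up.
TRIPLES = [
    ("Film A", "directed_by", "Director X"),
    ("Director X", "born_in", "City Z"),
    ("Film B", "directed_by", "Director W"),
    # No "born_in" triple for Director W: the graph is incomplete here.
]


def build_index(triples):
    """Map each (head, relation) pair to the set of tails it points to."""
    index = defaultdict(set)
    for head, relation, tail in triples:
        index[(head, relation)].add(tail)
    return index


def follow_path(index, start_entity, relations):
    """Follow a chain of relations (one per hop) starting from one entity."""
    frontier = {start_entity}
    for relation in relations:
        frontier = {
            tail
            for entity in frontier
            for tail in index.get((entity, relation), set())
        }
    return frontier


INDEX = build_index(TRIPLES)

# Two-hop question: "Where was the director of Film A born?"
print(follow_path(INDEX, "Film A", ["directed_by", "born_in"]))  # {'City Z'}

# Same question for Film B: the missing triple breaks the path.
print(follow_path(INDEX, "Film B", ["directed_by", "born_in"]))  # set()
```

The empty result for Film B is the failure case that the abstract's knowledge graph completion component is meant to address.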

English keywords

Question Answering, Multi-hop Question, Knowledge Base, Knowledge Graph

English title

Answering Multi-hop Questions From Persian Knowledge Graph

Department

Software Engineering

English abstract

In recent years, question answering systems have received attention because they aim to give users a final answer to their questions. In these systems, users submit a question in natural language and receive a short, accurate answer in the shortest possible time. Some question answering systems use a knowledge graph as the data source for answering users' questions. In these systems, questions fall into two categories: simple and complex. Most studies have worked on simple questions, which can be answered by finding a single relation in the knowledge graph. In the real world, however, we face more complex questions that require finding multiple relations in the knowledge graph. These complex questions are called multi-hop questions, and multi-hop question answering systems based on knowledge graphs have been developed to answer them. In these systems, finding the answer requires multi-hop reasoning from the main entity of the question to the expected answer in the knowledge graph. There are three approaches to answering multi-hop questions over a knowledge graph: those based on semantic parsing, on information retrieval, and on deep learning (knowledge graph embedding). Research in this field mainly targets English. For Persian, very few studies have addressed knowledge graph question answering for complex questions, and they have used methods based on semantic parsing. This research presents a hybrid approach that combines information retrieval and knowledge graph embedding to answer multi-hop Persian questions. Combining these two approaches eliminates the dependence on syntactic and semantic rules. For this purpose, in the multi-hop question answering component, the answer to the user's question is obtained using the knowledge graph embedding space.
Although the proposed method can find answers to many multi-hop questions, incomplete relations in the knowledge graph can cause the answer not to be found even though it exists in the knowledge graph. This problem is much more important in low-resource languages such as Persian. To solve it, this research uses a knowledge graph completion component. Because the knowledge graph embedding contains embeddings of both relations and entities, this feature is used to extract the closest relations that can be used to find the answer. In addition, when no relation exists in the knowledge graph for an entity, the knowledge graph embedding space is used to create a new relation (triple). The presented method is evaluated on a Persian translation of the MetaQA dataset. Based on the experiments, an accuracy of 73.66% on the Hits@1 metric is obtained for answering multi-hop Persian questions over a knowledge graph. This result shows a 10.91% improvement in the performance of the proposed model compared to previous Persian studies.
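
For readers unfamiliar with the Hits@1 metric reported above, the following is a minimal sketch of how it is commonly computed: the fraction of questions whose top-ranked candidate is one of the gold answers. The function name `hits_at_1` and the sample data are illustrative assumptions, not taken from the thesis, and the thesis's exact evaluation protocol may differ.

```python
def hits_at_1(ranked_predictions, gold_answers):
    """Fraction of questions whose top-ranked prediction is a gold answer.

    ranked_predictions: one list per question, best candidate first.
    gold_answers: one set of acceptable answers per question.
    """
    if len(ranked_predictions) != len(gold_answers):
        raise ValueError("predictions and gold answers must align one-to-one")
    if not ranked_predictions:
        return 0.0
    hits = sum(
        1
        for ranked, gold in zip(ranked_predictions, gold_answers)
        if ranked and ranked[0] in gold  # an empty ranking counts as a miss
    )
    return hits / len(ranked_predictions)


# Hypothetical rankings for four questions (not data from the thesis).
predictions = [["a", "b"], ["c"], [], ["d", "e"]]
gold = [{"a"}, {"x"}, {"y"}, {"e"}]

# Only the first question has a gold answer at rank 1.
print(hits_at_1(predictions, gold))  # 0.25
```

Note that the fourth question has its gold answer at rank 2, which would count under Hits@3 or Hits@10 but not under Hits@1.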

Number of chapters

Table of contents (PDF)

132199

Author

Rahimi Andani, Ameneh

Link to this record

https://lib.ui.ac.ir/dl/search/default.aspx?Term=24698&Field=0&DTC=3