Developing a graph neural network for multimodal non-graph textual and visual data

Degree

Master's (M.Sc.)

Field of study

Computer Engineering - Artificial Intelligence and Robotics

Faculty

Computer Engineering

Defense date

1404/07/29 (Solar Hijri)

Page count

85 pp.

Supervisors

Dr. Seyed Peyman Adibi, Dr. Alireza Darvishi

Keywords (Persian)

Multimodal learning, graph neural network, graph structure learning, disease prediction, visual question answering

Abstract (Persian)

Graph structure learning is an emerging branch of machine learning that was developed to automatically discover hidden relationships among data without requiring humans to define the graph structure explicitly. The approach matters most for multimodal data, where complex interactions exist among textual, visual, or biological components. This research develops novel multimodal graph structure learning methods for non-graph multimodal textual and visual data. In the first step, it introduces a node-level model that predicts Alzheimer's disease and autism from multimodal medical data. Because the model uses dynamic, task-oriented graph structure learning, it outperforms the baseline model on two datasets, ABIDE and TADPOLE. On the specificity metric, our model achieves the best result on the ABIDE dataset compared with the best existing methods. In the second step, the thesis presents a novel model for visual question answering. This model is regarded as the first multimodal model that performs graph structure learning at the graph level. It discovers the relational topology among textual and visual components. With this approach, it achieves acceptable performance compared with the best existing graph-based methods for visual question answering. The results show two things. First, the proposed frameworks improve prediction accuracy on medical problems. Second, they perform acceptably at answer prediction for visual question answering on the VQA v2 dataset.

Keywords (English)

Multi-Modal Learning, Graph Neural Network, Graph Structure Learning

Title (English)

Developing graph neural network for multimodal non-graph textual and visual data

Department

Artificial Intelligence Engineering

Abstract (English)

Graph structure learning is an emerging branch of machine learning that aims to automatically discover hidden relationships among data without requiring humans to define the graph structure explicitly. This approach is particularly important for multimodal data, where complex interactions exist among textual, visual, or biological components. This study develops novel multimodal graph structure learning methods for non-graph multimodal textual and visual data. In the first step, it introduces a node-level model for predicting Alzheimer's disease and autism from multimodal medical data. By employing dynamic, task-oriented graph structure learning, our model outperforms the baseline model on the ABIDE and TADPOLE datasets. In terms of specificity, it achieves the best result on the ABIDE dataset compared with existing state-of-the-art methods. In the second step, the study presents a novel model for visual question answering. This model is the first multimodal approach that incorporates graph structure learning at the graph level. It discovers the relational topology between textual and visual components. With this approach, it achieves performance competitive with the best existing graph-based methods in the VQA domain. The results demonstrate two things. First, the proposed frameworks improve prediction accuracy in medical diagnosis tasks. Second, they achieve satisfactory answer-prediction performance for visual question answering on the VQA v2.0 dataset.
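In the graph-level setting, each VQA sample forms its own graph. The following is a minimal sketch under stated assumptions, not the thesis's model:

1. Question-token features and image-region features become the nodes. The two are assumed to share a width `d`.
2. A dense adjacency is learned for each sample with scaled dot-product attention.
3. One message-passing step with ReLU updates the node features.
4. Mean pooling turns the nodes into a single graph vector.
5. That vector is scored against a small hypothetical answer vocabulary.

The weights are random placeholders for trained parameters. The names `sample_graph_embedding` and `answer_scores` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sample_graph_embedding(words, regions, w_q, w_k, w_msg):
    """Learn a graph over one sample's text and image nodes and pool it.

    words:   (t, d) question-token features
    regions: (r, d) image-region features (same width d assumed)
    """
    nodes = np.concatenate([words, regions], axis=0)       # (t + r, d)
    q, k = nodes @ w_q, nodes @ w_k                        # attention projections
    adj = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=1)   # learned, row-stochastic adjacency
    h = np.maximum(adj @ nodes @ w_msg, 0.0)               # one message-passing step + ReLU
    return h.mean(axis=0), adj                             # mean-pool nodes into a graph vector

def answer_scores(graph_vec, w_out):
    # Probability distribution over a (hypothetical) fixed answer vocabulary.
    return softmax(graph_vec @ w_out)

rng = np.random.default_rng(1)
words = rng.normal(size=(5, 8))      # 5 hypothetical question-token vectors
regions = rng.normal(size=(10, 8))   # 10 hypothetical region vectors
g, adj = sample_graph_embedding(
    words, regions,
    w_q=rng.normal(size=(8, 4)),
    w_k=rng.normal(size=(8, 4)),
    w_msg=rng.normal(size=(8, 16)),
)
probs = answer_scores(g, rng.normal(size=(16, 3)))
print(adj.shape, g.shape, probs.shape)  # (15, 15) (16,) (3,)
```

Here the learned structure differs for every sample, which is what separates graph-level structure learning from the single population graph used in node-level prediction.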

Table of contents (PDF)

157691

Author

Rahbar, Zeinab

Link to this record

https://lib.ui.ac.ir/dl/search/default.aspx?Term=25731&Field=0&DTC=3