Development of a deep learning-based system for recognizing and interacting with mathematical expressions in PDF documents for users with visual impairments

Degree level

PhD

Field of study

Computer Engineering - Artificial Intelligence and Robotics

Faculty

Computer Engineering

Defense date

1404/06/25

Page count

159 pp.

Supervisor

Peyman Adibi (پيمان اديبي)

Advisor

Seyed Mohammad Saeed Ehsani (سيدمحمدسعيد احساني)

Keywords (translated from Persian)

Optical symbol recognition, Deep learning, Interactive PDF documents, Visual impairments, Classification, Mathematical expression structure reconstruction, Editable mathematical expressions

Abstract (translated from Persian)

Mathematical expressions are considerably more complex than ordinary text. This stems from several factors, including the very large number of symbols, the close visual similarity among symbols, the two-dimensional structure of mathematical formulas, and the complex relationships among the symbols within a formula. Meanwhile, as the volume of electronic scientific texts containing mathematical expressions keeps growing, users with visual impairments are unable to view, study, and understand these documents because no suitable interactive system exists. Developing a practical, intelligent end-to-end model for interacting with mathematical expressions requires two sub-models: one that processes images of mathematical expressions extracted from the electronic text, and one that translates the information extracted from the image into a markup language or another well-defined language. To keep the image-processing model general and efficient, it is assumed that the image carries no side information, such as pen strokes recorded while the symbols were written or annotations in the electronic text. In addition, to make the results easier to reuse, the model uses a common markup language as its target output. Under these assumptions, this dissertation presents a two-part deep learning model supported by several auxiliary modules. In the overall architecture, two deep networks with different structures, one convolutional and one recurrent, are connected to each other. The first network generates a hyperspace that better represents the input image data, and the second network interprets that space. To reach acceptable results, several sub-components were designed for the model, some of which also generate side information from the data. These are a sub-model for random marginalization (random margin placement) of the input image, a sub-model for spatial processing of the hyperspace produced by the convolutional network, an attention sub-model on the recurrent network, a reinforcement learning sub-model, and a self-supervised learning sub-model. The two auxiliary learning sub-models are implemented in two different versions of the computational model. The main developed model, which uses the reinforcement learning auxiliary structure, demonstrated its effectiveness by producing better results than other comparable models.
To further demonstrate the model's effectiveness, we also ran it on two other established datasets whose subjects are fairly closely related to the problem addressed in the dissertation. These additional runs show that the model can handle other computational problems, such as text processing. Its sound results, competitive with those of specialized models developed for these datasets, demonstrate the model's practical effectiveness.
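The abstract names a random marginalization step for the input image but does not describe how it works. The following is only a minimal sketch, assuming the step pads the formula image with a random amount of blank margin on each side before it reaches the convolutional network. The function name `random_margin` and the parameters `max_margin` and `fill` are hypothetical and do not come from the dissertation.

```python
import random


def random_margin(image, max_margin=4, fill=255, rng=None):
    """Pad a grayscale image (a list of equal-length rows of ints) with
    a random number of blank rows/columns on each side.

    Assumption: `fill` is the background intensity (255 = white).
    Returns the padded image and the margins as (top, bottom, left, right).
    """
    if rng is None:
        rng = random.Random()
    # Draw each margin independently from 0..max_margin (inclusive).
    top, bottom, left, right = (rng.randint(0, max_margin) for _ in range(4))
    width = len(image[0]) + left + right

    padded = [[fill] * width for _ in range(top)]
    for row in image:
        padded.append([fill] * left + list(row) + [fill] * right)
    padded.extend([fill] * width for _ in range(bottom))
    return padded, (top, bottom, left, right)


# Usage: a 2x2 all-black "image" padded with white margins.
img = [[0, 0], [0, 0]]
out, margins = random_margin(img, max_margin=3, rng=random.Random(0))
```

Passing a seeded `random.Random` keeps the padding reproducible, which is convenient when comparing training runs.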

English keywords

Optical symbol recognition, Deep learning, Interactive PDF documents, Visual impairment, Classification, Mathematical expression recognition, Editable mathematical expression

English title

Developing a deep learning-based approach for mathematical formula recognition and interaction in PDFs for users with visual impairments

Department

Artificial Intelligence Engineering

English abstract

Mathematical expressions are considerably more complex than ordinary text, owing to factors such as the wide variety of symbols, the close similarity of symbol shapes, the two-dimensional structure of mathematical formulas, and the complex relationships among the symbols in a formula. At the same time, as the number of electronic scientific papers containing mathematical expressions grows, users with visual impairments are unable to view, study, and understand these documents because no suitable interactive system exists. Developing an end-to-end AI-based model for interacting with mathematical expressions requires an image-processing sub-model that extracts information about mathematical expressions from electronic text, and a second sub-model that translates the information extracted from the image into a markup language or another well-defined language. To keep the image-processing model general and efficient, it is assumed that the image carries no additional information, such as pen strokes recorded while the symbols were written or annotations in the electronic text. Also, to make the results easier to reuse, the model uses a common markup language as its target output. Under these assumptions, this thesis presents a two-part deep learning model supported by several auxiliary modules. The overall architecture consists of two deep neural networks with different structures, one convolutional and one recurrent, connected to each other. The first network generates a hyperspace that better represents the input image data, and the second network interprets this space. To achieve acceptable results, several sub-models were designed for the model, some of which also generate side information from the data.
These sub-models are the random marginalization sub-model for the input image, the spatial processing sub-model for the hyperspace generated by the convolutional network, the attention sub-model on the recurrent network, the reinforcement learning sub-model, and the self-supervised learning sub-model. The two auxiliary learning sub-models are implemented in two different versions of the computational model. The main developed model, which uses the reinforcement learning auxiliary structure, demonstrated its effectiveness by producing better results than other comparable models. To further demonstrate the model's effectiveness, we also ran it on two other established datasets whose subjects are fairly closely related to the problem described in the thesis. These additional runs show that the model can handle other computational problems, such as text processing. Its results, which are competitive with those of specialized models developed for these datasets, demonstrate the model's practical effectiveness.
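The abstract mentions an attention sub-model on the recurrent network without saying which form of attention is used. As an illustration only, the sketch below shows one soft-attention step, assuming dot-product scoring between the recurrent network's current state and each position of the convolutional feature grid. The function `attend` and its arguments are hypothetical, and the thesis may use a different scoring function.

```python
import math


def attend(features, query):
    """One soft-attention step.

    features: list of feature vectors, one per position of the flattened
              convolutional feature grid (assumed layout).
    query:    the decoder's current hidden state, same length as each
              feature vector.
    Returns (weights, context): the attention weights over positions and
    the weighted average of the feature vectors.
    """
    # Dot-product score for each position (an assumed scoring function).
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in features]
    # Numerically stable softmax over the scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the feature vectors.
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, context


# Usage: three grid positions with 2-dimensional features.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend(feats, [10.0, 0.0])
```

In a decoder of this kind, the context vector would typically be combined with the recurrent state to predict the next markup token; that part is omitted here.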

Number of chapters

External advisor

Alireza Darvishi and Hans-Peter Hutter (عليرضا درويشي و هانس پيتر هوتر)

Table of contents PDF

152387

Author

Mirkazemi Mood, Abolfazl (ميركاظمي مود، ابوالفضل)

Link to this record

https://lib.ui.ac.ir/dl/search/default.aspx?Term=25510&Field=0&DTC=3