Development of an Emotional Deep Reinforcement Learning-Based Controller: A Case Study on a Quadrotor Flying Robot

Degree level

PhD

Field of study

Mechanical Engineering - Applied Design

Faculty

Engineering

Defense date

1404/10/30

Page count

101 pp.

Supervisor

Keivan Torabi (كيوان ترابي), Hamed Shahbazi (حامد شهبازي)

Persian keywords

Quadrotor, neural network controller, deep reinforcement learning, emotional intelligence, emotional deep reinforcement learning, human activation function

Persian abstract

In recent decades, remarkable progress has been made in artificial intelligence and machine learning, particularly in deep reinforcement learning. Using deep neural networks and data obtained from interaction with the environment, these algorithms can optimally learn the behavior of complex systems in dynamic and uncertain environments. Because training deep reinforcement learning algorithms is time-consuming for complex dynamic systems such as the quadrotor, this research combines human emotional intelligence with the logical intelligence of deep reinforcement learning algorithms to accelerate training and control learning. In this setting, the extraction of emotions based on a human model and their application to the dynamic model were examined, and an emotion-based deep reinforcement learning model was developed. The main challenges of deep-learning-based controllers include the need for a large volume of training data, slow convergence, and a fixed reward function. Inspired by the human learning process and by integrating emotional components into the controller design, this research seeks to increase the learning speed of the deep reinforcement learning algorithm in accordance with environmental conditions and control objectives. Since defining a fixed, comprehensive reward function that covers all control aspects is very difficult for deep reinforcement learning algorithms, a combination of a logical reward and a dynamic emotional reward is used. Adding emotional components to the reward function and the system states yields a variable, flexible structure that increases the controller's learning capability and decision-making speed. To this end, the proposed reward function is designed around the emotions of anger and satisfaction, so that the system can behave adaptively and intelligently in response to favorable or unfavorable environmental conditions. In addition, the human activation function and the induction network are introduced as complementary processing pathways alongside the deep reinforcement learning algorithm to give the network an emotional structure, so that, besides learning speed, the stability and accuracy of decision-making improve in complex settings such as the quadrotor.
To evaluate the proposed methods, a quadrotor control simulation platform was designed in the MATLAB simulation environment. The simulations and a comparison of the proposed emotional methods showed that adding emotions in the form of reward, state, and network structure increases learning speed and consequently reduces the time required for learning.

English keywords

Quadrotor, Deep reinforcement learning algorithm, Emotional deep reinforcement learning algorithm, Human activation function, Neural network, Inception network

English title

Develop and Simulate Emotional Deep Reinforcement Learning Controller: a Case Study on Quadrotor

Department

Mechanical Engineering

English abstract

In recent decades, significant advances have been made in artificial intelligence and machine learning, particularly in deep reinforcement learning. These algorithms, utilizing deep neural networks and data obtained from interactions with the environment, are capable of optimally learning the behavior of complex systems in dynamic and uncertain environments. Considering the time-consuming nature of the training process in deep reinforcement learning algorithms for complex dynamic systems such as quadrotors, this research employs a combination of human emotional intelligence and the logical intelligence of reinforcement learning algorithms to accelerate the learning process and improve control performance. To evaluate the proposed method, a simulation platform for quadrotor control was designed in the MATLAB simulation environment. In this platform, an emotion-based deep reinforcement learning model was developed to provide appropriate and adaptive control responses when faced with diverse inputs and varying environmental conditions. Among the main challenges in deep learning-based controllers are the need for a vast amount of training data, slow convergence, and a fixed reward function. This research draws inspiration from the gradual learning process of humans and integrates emotional components into the controller design, aiming to optimize the system's response according to environmental conditions and control objectives. In this regard, the proposed reward function is designed based on the emotional states of anger and satisfaction, allowing the system to exhibit adaptable and intelligent behavior in response to favorable or unfavorable environmental conditions. Since defining a fixed and comprehensive reward function for all control aspects is very challenging, a combination of logical rewards and dynamic emotional rewards has been employed.
Adding emotional components to the reward function and system states results in a variable and flexible structure that enhances the learning capability and decision-making speed of the controller. Finally, the human activation function and the induction network are introduced as complementary processing pathways alongside the deep reinforcement learning algorithm to improve not only the learning speed but also the stability and accuracy of decision-making in complex systems such as quadrotors.

Author

Norouzi Baghkameh, Peyman (نوروزي باغكمه، پيمان)

Link to this record

https://lib.ui.ac.ir/dl/search/default.aspx?Term=25675&Field=0&DTC=3