Emotional Inverse Deep Reinforcement Learning in Robotics Applications: A Case Study on a Humanoid Robot

Degree level

Master's

Field of study

Mechatronics Engineering

Faculty

Engineering

Defense date

1403/06/10

Page count

241 pp.

Supervisor

Hamed Shahbazi

Persian keywords

Emotional structure, Reinforcement learning, TD3 algorithm, GAIL algorithm, Humanoid robot

Persian abstract

One of the main challenges in controlling robots with reinforcement learning techniques is accelerating the learning process. This research designs a learning controller with an emotional inverse deep reinforcement learning structure based on the TD3 algorithm to accelerate learning and training. The controller is designed in two structures, with and without a reward function (the latter using the structure of the GAIL algorithm). Several innovations are used in this research to accelerate learning and reduce trial and error in training. First, in the Transfer innovation, the initial weights of the TD3 neural networks are set using the trainer's prior knowledge, and it is shown that this method can play the role of emotion in reducing trial and error during learning. Second, the EDC innovation uses the cost function not only from the final layer but also from an intermediate layer to compute the value of taking an action in a given state. Third, an innovation in the inverse deep reinforcement learning structure based on the GAIL algorithm bounds the reward range, which improves the algorithm's performance when it is combined with TD3 and no reward function is used. The results show that the proposed innovations can help accelerate learning and reduce trial and error in reaching the optimal policy. The designed controllers were simulated for verification and validation in several environments, including the inverted pendulum (in Balancing and Swing Up modes), the double inverted pendulum, the cheetah robot, and the hopper robot. A case study on a humanoid robot was also carried out. This study began with further simulations in the two-dimensional BipedalWalker and Walker2D environments, with 4 and 6 degrees of freedom, and continued up to a three-dimensional simulation of the PLEN2 robot with 18 degrees of freedom. For implementation, the PLEN2 robot was built, the simulation environment was integrated with reality, and its walking pattern was realized with both a classical control structure and the learning algorithms. The results show that using the innovations of this research can considerably improve the learning process and learning speed in the two-dimensional and three-dimensional simulations.

English keywords

Emotional Structure, Reinforcement Learning, TD3 Algorithm, GAIL Algorithm, Humanoid Robot

English title

Emotional Inverse Deep Reinforcement Learning in Case Study Humanoid Robots

Department

Mechanical Engineering

English abstract

One of the main challenges in controlling robots with reinforcement learning techniques is accelerating the learning process. This research designs a learning controller based on an emotional inverse deep reinforcement learning structure that uses the TD3 algorithm to speed up learning and training. The controller is designed in two structures, with and without a reward function (the latter using the GAIL algorithm structure). Several innovations are employed to accelerate learning and reduce trial and error in training. First, in the Transfer innovation, the initial weights of the TD3 neural networks are set from the trainer's prior knowledge, and it is shown that this method can play the role of emotion in reducing trial and error during learning. Second, the EDC innovation uses the cost function not only from the final layer but also from an intermediate layer to calculate the value of performing an action in a specific state. Third, an innovation in the inverse deep reinforcement learning structure based on the GAIL algorithm constrains the reward bounds, which improves the performance of this algorithm when it is combined with TD3 and no reward function is used. The results indicate that the proposed innovations can help accelerate the learning process and reduce trial and error in reaching the optimal policy. The designed controllers were verified and validated through simulations in various environments, including an inverted pendulum (in both Balancing and Swing Up modes), a double inverted pendulum, a cheetah robot, and a hopper robot. Additionally, a case study was conducted on a humanoid robot. This study began with simulations in the two-dimensional BipedalWalker and Walker2D environments, with 4 and 6 degrees of freedom respectively, and continued with three-dimensional simulations of the PLEN2 robot with 18 degrees of freedom.
For implementation, the PLEN2 robot was built and the simulation environment was integrated with reality, and its walking pattern was realized with both a classical control structure and the learning algorithms. The results indicate that the innovations from this research can significantly improve the learning process and learning speed in both two-dimensional and three-dimensional simulations.
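
The third innovation, constraining the reward bounds of the GAIL-derived reward, can be illustrated with a minimal sketch. The record does not give the thesis's actual formulation, so everything here is an assumption: the sketch uses the common surrogate reward -log(1 - D(s, a)), takes D(s, a) to be the discriminator's probability that a state-action pair looks expert-like, and uses hypothetical function names and bounds (`r_min`, `r_max`).

```python
import math


def gail_reward(d_prob: float, eps: float = 1e-8) -> float:
    """Common GAIL surrogate reward -log(1 - D(s, a)).

    d_prob is assumed to be the discriminator output in [0, 1].
    The reward grows very large as d_prob approaches 1.
    """
    return -math.log(max(1.0 - d_prob, eps))


def bounded_gail_reward(d_prob: float, r_min: float = 0.0, r_max: float = 5.0) -> float:
    """Clip the surrogate reward to [r_min, r_max].

    The bounds are hypothetical placeholders, not values from the thesis.
    """
    return min(max(gail_reward(d_prob), r_min), r_max)


if __name__ == "__main__":
    # Compare the raw and the bounded reward for a few discriminator outputs.
    for d in (0.1, 0.5, 0.9, 0.999999):
        print(d, round(gail_reward(d), 3), round(bounded_gail_reward(d), 3))
```

One plausible motivation for such a bound is that an off-policy critic such as TD3's bootstraps from stored rewards, so a few very large discriminator-derived rewards could dominate its value targets. Whether the thesis bounds the reward by clipping or by another mechanism is not stated in this record.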

Number of chapters

Table of contents PDF

34472

Author

Amirkhani Varnosfaderani, Masoud

Link to this record

https://lib.ui.ac.ir/dl/search/default.aspx?Term=23749&Field=0&DTC=3