Optimized Implementation of Semantic Segmentation on ZYNQ-Based Hardware

Degree level

Master's

Field of study

Electrical Engineering - Digital Electronic Systems

Faculty

Engineering

Defense date

1403/10/26

Page count

72 p.

Supervisor

محمد كاظمي ورنامخاستي

Persian keywords

Semantic segmentation, Deep neural networks, ZYNQ, Vitis-AI, Quantization, DPU

English abstract

Over the past few years, advances in deep learning have opened new avenues for improved scientific methods, autonomous systems, and smart applications across many fields. Although object detection algorithms are widely used to locate objects in images, they may not be suitable for certain applications, such as tumor detection in radiology or autonomous driving. These scenarios require a more precise method, namely pixel-level object recognition, and image segmentation therefore offers a viable solution. Semantic segmentation is a machine vision method based on convolutional neural networks (CNNs) for labeling the objects in an image. Given the heavy computational requirements and power consumption of deep neural networks, this research uses ZYNQ for hardware acceleration. The UNet and PIDNet models have been implemented on the ZYNQ platform using the Cityscapes and CamVid datasets. The UNet model was implemented in both TensorFlow and PyTorch on the CamVid dataset, and the results were comprehensively evaluated. The PIDNet model was trained on the Cityscapes dataset in PyTorch and then deployed on ZYNQ. Because not all PIDNet operations can be accelerated by the DPU (Deep Learning Processing Unit), the network is partitioned into a larger number of DPU and CPU subgraphs, which causes frequent data transfers between the CPU and the DPU. This increases power consumption and execution time and reduces inference speed. To address this problem, a modified implementation of the network is proposed in which the PIDNet architecture is reconfigured by substituting unsupported operations with DPU-supported alternatives.
These changes comprise replacing the sum operation along the channel dimension, the heterogeneous multiplication and subtraction operations, interpolation functions with a scale factor greater than 8, and the Sigmoid activation function, as well as removing the ReLU activation function. Although these substitutions slightly reduce the accuracy of the network, they substantially improve inference speed on ZYNQ. In the proposed framework, the mIoU decreased from 76.15% to 74.58%, while the inference speed improved from 0.23 FPS to 2.16 FPS, roughly a ninefold speedup.

English keywords

Semantic segmentation, Deep neural network, ZYNQ, Vitis-AI, Quantization, DPU

English title

Optimized Implementation of Semantic Segmentation on ZYNQ-Based Hardware

Department

Electrical Engineering

Persian abstract

Deep learning today offers new opportunities for advanced scientific methods, autonomous operations, and intelligent applications across a wide range of fields. In many domains, object detection algorithms are used to identify objects in images, but in some applications, such as tumor detection in medical images or self-driving cars, object detection alone is not sufficient and objects must be recognized at the pixel level. Image segmentation is therefore a solution for this class of applications. Semantic segmentation is a computer vision process that uses convolutional neural networks to classify the objects within an image. Given the very high computational load of deep neural networks and their power consumption, ZYNQ is used in this thesis. Two networks, UNet and PIDNet, have been implemented on ZYNQ with the Cityscapes and CamVid datasets. The UNet network was implemented on the CamVid dataset in both the TensorFlow and PyTorch development environments, and its results were examined. The PIDNet network was trained on the Cityscapes dataset in PyTorch and implemented on ZYNQ. Because not all operations of the PIDNet model are supported by the DPU, the number of subgraphs assigned to the DPU and the CPU increases. This leads to frequent data exchange between the CPU and the DPU, which raises power consumption and processing time and consequently lowers efficiency and speed. To overcome this limitation, a method is proposed in which the structure of PIDNet is modified and operations that the DPU cannot support are replaced with operations that it can. These changes comprise replacing the sum function over the channel dimension, the heterogeneous multiplication function, the heterogeneous subtraction function, the interpolation function with a scale greater than 8, and the Sigmoid activation function, and removing the ReLU activation function. These changes caused the network's accuracy to drop slightly but improved inference speed on ZYNQ. In the implementation of the proposed network, mIoU dropped from 76.15% to 74.58%, while speed rose from 0.23 frames per second to 2.16 frames per second.

Number of chapters

Table of contents (PDF)

122224

Author

سيامكي، راضيه

Link to this record

https://lib.ui.ac.ir/dl/search/default.aspx?Term=24478&Field=0&DTC=3