Persian Abstract
Over the past few years, advances in deep learning have opened new avenues for improved scientific methods, autonomous systems, and smart applications across many fields. Although object detection algorithms are widely used to locate objects in images, they may not be precise enough for certain applications, e.g., tumor detection in radiology or autonomous driving. These scenarios require a more accurate approach, namely pixel-level object recognition, for which image segmentation is a viable solution. Semantic segmentation is a machine vision method based on convolutional neural networks (CNNs) that assigns a class label to every pixel of an image.
Because deep neural networks have high computational requirements and power consumption, this research uses the ZYNQ platform for hardware acceleration. The UNet and PIDNet models were implemented on ZYNQ using the CamVid and Cityscapes datasets, respectively. UNet was implemented in both TensorFlow and PyTorch on the CamVid dataset, and its results were comprehensively evaluated. PIDNet was trained on the Cityscapes dataset in PyTorch and then deployed on ZYNQ.
Because the DPU (Deep Learning Processing Unit) cannot accelerate every PIDNet operation, the network is split into a large number of DPU and CPU subgraphs, which causes constant data transfers between the CPU and the DPU. The result is higher power consumption, longer execution time, and lower inference speed. To address this problem, the implementation of the network is modified: the PIDNet architecture is reconfigured by replacing unsupported operations with DPU-supported alternatives. The changes cover the sum operation along the channel dimension, heterogeneous multiplication and subtraction operations, interpolation functions with a scale factor greater than 8, and the Sigmoid activation function, all of which are substituted, as well as the ReLU activation function, which is eliminated. Although these substitutions slightly reduce the accuracy of the network, they substantially improve inference speed on ZYNQ.
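As an illustration of what such a substitution can look like, the following minimal sketch shows two commonly used replacements. Both are assumptions for illustration, not the substitutions confirmed by this work: a channel-wise sum rewritten as a 1x1 convolution with all-ones weights (mathematically identical), and Sigmoid approximated by a piecewise-linear hard sigmoid. The function names `channel_sum`, `conv1x1`, and `hard_sigmoid` are hypothetical, and plain Python lists stand in for tensors.

```python
import math

def channel_sum(fmap):
    """Reference op: sum a [C][H][W] feature map along the channel axis."""
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[sum(fmap[c][y][x] for c in range(channels)) for x in range(width)]
            for y in range(height)]

def conv1x1(fmap, weights):
    """1x1 convolution with a single output channel and no bias."""
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[sum(weights[c] * fmap[c][y][x] for c in range(channels))
             for x in range(width)]
            for y in range(height)]

def hard_sigmoid(x):
    """Assumed stand-in for sigmoid: relu6(x + 3) / 6 (an approximation)."""
    return min(max(x + 3.0, 0.0), 6.0) / 6.0

# Toy feature map with 3 channels and 2x2 spatial size.
fmap = [[[1.0, 2.0], [3.0, 4.0]],
        [[0.5, -1.0], [2.0, 0.0]],
        [[-2.0, 1.0], [0.0, 1.5]]]

summed = channel_sum(fmap)
# All-ones weights make the 1x1 convolution reproduce the channel sum.
replaced = conv1x1(fmap, [1.0] * len(fmap))

# Gap between the approximation and the true sigmoid at x = 1.
sigmoid_gap = abs(hard_sigmoid(1.0) - 1.0 / (1.0 + math.exp(-1.0)))
```

The first replacement is exact, so it costs no accuracy. The second is only an approximation, which is one way substitutions of this kind can lead to a small accuracy loss.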
With the proposed modifications, the mIoU decreased from 76.15% to 74.58%, a drop of 1.57 percentage points, while the inference speed improved from 0.23 FPS to 2.16 FPS, roughly a 9.4-fold speedup.
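For readers unfamiliar with the metric, the sketch below shows one standard way to compute mIoU from a confusion matrix and reproduces the arithmetic behind the reported trade-off. The helper `mean_iou` and the toy 2-class matrix are hypothetical illustrations; only the four reported figures (76.15, 74.58, 0.23, 2.16) come from the text.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix (rows: ground truth, cols: prediction)."""
    n = len(confusion)
    ious = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                        # class-k pixels that were missed
        fp = sum(confusion[r][k] for r in range(n)) - tp   # pixels wrongly labeled k
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both ground truth and prediction
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0

# Toy 2-class example (made-up pixel counts, not data from this work).
toy = [[8, 2],
       [1, 9]]
toy_miou = mean_iou(toy)

# Trade-off computed from the figures reported in the abstract.
miou_drop = round(76.15 - 74.58, 2)   # percentage points
speedup = 2.16 / 0.23                 # ratio of inference speeds in FPS
```

Here IoU for each class is TP / (TP + FP + FN), and mIoU is the unweighted average over the classes that occur.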