English Abstract
In recent years, machine learning and neural networks have found their way into a wide variety of applications, ranging from medicine and industry to economics and security. As advances in the field drive the need for higher accuracy, networks have grown deeper and their parameter counts have increased significantly. Consequently, the computational workload and complexity, especially within the convolutional layers of convolutional neural networks (CNNs), have increased, making implementation more challenging. Among the platforms used for the inference stage of deep neural networks, such as CPUs, GPUs, ASICs, and FPGAs, FPGAs stand out for their reprogrammability, support for parallel architectures, energy efficiency, and customizability. Challenges remain, however, particularly the limited bit-widths available for network weights under resource and memory constraints, which can degrade the network's final output accuracy.
The Double MAC architecture is an FPGA-based accelerator technique designed to increase computational throughput in CNNs. By packing two MAC (Multiply-Accumulate) operations into a single DSP (Digital Signal Processing) block in a SIMD (Single Instruction, Multiple Data) fashion, it produces two outputs simultaneously, effectively doubling processing speed. This thesis focuses on improving network accuracy by adjusting the bit-width of weights within the Double MAC architecture. In the proposed approach, the DSP input dedicated to network weights is fully assigned to a single 8-bit weight taken from one of two distinct convolution filter channels, and the product for the second weight is derived from the DSP output. Weights are compared in pairs under four conditions: the two weights are equal, one is the negative of the other, one is twice the other, or vice versa. Depending on which condition holds, the product for the second weight is obtained from the DSP output directly, from its two's complement, or from a shifted version of the DSP output.
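To make the four pairing conditions concrete, the following is a minimal Python sketch, assuming signed 8-bit weights and using a hypothetical helper name derive_second_product; it illustrates how the second product can be recovered from the single DSP multiplication, and is an illustration of the idea only, not the hardware implementation.

```python
def derive_second_product(w1: int, w2: int, p1: int):
    """Recover w2 * x from p1 = w1 * x when a supported relation holds.

    Returns the derived product, or None if the weight pair must be
    handled another way (e.g., by packing both weights at reduced precision).
    """
    if w2 == w1:         # weight equality: reuse the DSP output directly
        return p1
    if w2 == -w1:        # negation: two's complement of the DSP output
        return -p1
    if w2 == 2 * w1:     # second weight is twice the first: shift left
        return p1 << 1
    if w1 == 2 * w2:     # first weight is twice the second: shift right
        return p1 >> 1   # exact, since p1 = 2 * w2 * x is even
    return None

# Usage: two 8-bit weights from two filter channels, one input activation.
w1, w2, x = -23, -46, 17     # here w2 == 2 * w1
p1 = w1 * x                  # the single multiplication done in the DSP block
assert derive_second_product(w1, w2, p1) == w2 * x
```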
This structure was evaluated by implementing the lightweight 0.5 MobileNet-160 network on the ImageNet dataset on a Xilinx Virtex-6 FPGA board. Results show that, at the cost of 4% more memory and 15% more LUT (Look-Up Table) and FF (Flip-Flop) usage, network accuracy improves by up to 7.5% compared to the standard configuration in which all coefficients are 8-bit.