A Hardware Accelerator for the Inference of a Convolutional Neural Network
Acelerador en hardware para la inferencia de una red neuronal convolucional
dc.creator | González , Edwin | |
dc.creator | Villamizar Luna , Walter D. | |
dc.creator | Fajardo Ariza, Carlos Augusto | |
dc.date | 2019-11-12 | |
dc.identifier | https://revistas.unimilitar.edu.co/index.php/rcin/article/view/4194 | |
dc.identifier | 10.18359/rcin.4194 | |
dc.description | Convolutional Neural Networks (CNNs) are becoming increasingly popular in deep learning applications, e.g., image classification, speech recognition, and medicine, to name a few. However, CNN inference is computationally intensive and demands a large amount of memory resources. In this work, a hardware accelerator for CNN inference is proposed, implemented in a hardware/software co-processing scheme. The aim is to reduce hardware resource usage while achieving the best possible throughput. The design was implemented on the Digilent Arty Z7-20 development board, which is based on the Xilinx Zynq-7000 System on Chip (SoC). Our implementation achieved an accuracy of 97.59% for the MNIST database using only a 12-bit fixed-point format. The results show that the co-processing scheme, operating at a conservative speed of 100 MHz, can identify around 441 images per second, which is about 17% faster than a 650 MHz software implementation. It is difficult to compare our results against other implementations based on Field-Programmable Gate Arrays (FPGAs), because those implementations are not exactly like ours. However, some comparisons regarding the logic resources used and the accuracy suggest that our work could outperform previous works. | |
dc.description | Las redes neuronales convolucionales cada vez son más populares en aplicaciones de aprendizaje profundo, como por ejemplo en clasificación de imágenes, reconocimiento de voz, medicina, entre otras. Sin embargo, estas redes son computacionalmente costosas y requieren altos recursos de memoria. En este trabajo se propone un acelerador en hardware para el proceso de inferencia de la red LeNet-5, implementado en un esquema de co-procesamiento hardware/software. El objetivo de la implementación es reducir el uso de recursos de hardware y obtener el mejor rendimiento computacional posible durante el proceso de inferencia. El diseño fue implementado en la tarjeta de desarrollo Digilent Arty Z7-20, la cual está basada en el System on Chip (SoC) Zynq-7000 de Xilinx. Nuestra implementación logró una precisión del 97.59% para la base de datos MNIST utilizando tan solo 12 bits en el formato de punto fijo. Los resultados muestran que el esquema de co-procesamiento, el cual opera a una velocidad de 100 MHz, puede identificar aproximadamente 441 imágenes por segundo, lo que equivale aproximadamente a un 17% más rápido que una implementación de software a 650 MHz. Es difícil comparar nuestra implementación con otras implementaciones similares, porque las implementaciones encontradas en la literatura no son exactamente como la que se realizó en este trabajo. Sin embargo, algunas comparaciones, en relación con el uso de recursos lógicos y la precisión, sugieren que nuestro trabajo supera a trabajos previos. | |
dc.format | application/pdf | |
dc.format | text/xml | |
dc.language | eng | |
dc.publisher | Universidad Militar Nueva Granada | |
dc.relation | https://revistas.unimilitar.edu.co/index.php/rcin/article/view/4194/4084 | |
dc.relation | https://revistas.unimilitar.edu.co/index.php/rcin/article/view/4194/4255 | |
dc.relation | /*ref*/Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998. https://doi.org/10.1109/5.726791 | |
dc.relation | /*ref*/C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015, pp. 1-9. https://doi.org/10.1109/CVPR.2015.7298594 | |
dc.relation | /*ref*/A. Dundar, J. Jin, B. Martini, and E. Culurciello, "Embedded streaming deep neural networks accelerator with applications," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 7, pp. 1572-1583, July 2017. https://doi.org/10.1109/TNNLS.2016.2545298 | |
dc.relation | /*ref*/B. Ahn, "Real-time video object recognition using convolutional neural network," in 2015 International Joint Conference on Neural Networks (IJCNN), July 2015, pp. 1-7. https://doi.org/10.1109/IJCNN.2015.7280718 | |
dc.relation | /*ref*/B. Yu, Y. Tsao, S. Yang, Y. Chen, and S. Chien, "Architecture design of convolutional neural networks for face detection on an fpga platform," in 2018 IEEE International Workshop on Signal Processing Systems (SiPS), Oct 2018, pp. 88-93. | |
dc.relation | /*ref*/Z. Xiong, M. K. Stiles, and J. Zhao, "Robust ecg signal classification for detection of atrial fibrillation using a novel neural network," in 2017 Computing in Cardiology (CinC), Sep. 2017, pp. 1-4. https://doi.org/10.22489/CinC.2017.066-138 | |
dc.relation | /*ref*/K. Guo, L. Sui, J. Qiu, J. Yu, J. Wang, S. Yao, S. Han, Y. Wang, and H. Yang, "Angel-eye: A complete design flow for mapping cnn onto embedded fpga," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 37, no. 1, pp. 35-47, Jan 2018. https://doi.org/10.1109/TCAD.2017.2705069 | |
dc.relation | /*ref*/N. Suda, V. Chandra, G. Dasika, A. Mohanty, Y. Ma, S. Vrudhula, J.-s. Seo, and Y. Cao, "Throughput-optimized opencl-based fpga accelerator for large-scale convolutional neural networks," in Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, ser. FPGA '16. New York, NY, USA: ACM, 2016, pp. 16-25. http://doi.acm.org/10.1145/2847263.2847276 | |
dc.relation | /*ref*/C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, "Optimizing fpga-based accelerator design for deep convolutional neural networks," in Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, ser. FPGA '15. New York, NY, USA: ACM, 2015, pp. 161-170. https://doi.org/10.1145/2684746.2689060 | |
dc.relation | /*ref*/K. Ovtcharov, O. Ruwase, J.-Y. Kim, J. Fowers, K. Strauss, and E. S. Chung, "Accelerating deep convolutional neural networks using specialized hardware," Microsoft Research Whitepaper, vol. 2, no. 11, pp. 1-4, 2015. | |
dc.relation | /*ref*/T. Tsai, Y. Ho, and M. Sheu, "Implementation of fpga-based accelerator for deep neural networks," in 2019 IEEE 22nd International Symposium on Design and Diagnostics of Electronic Circuits Systems (DDECS), April 2019, pp. 1-4. https://doi.org/10.1109/DDECS.2019.8724665 | |
dc.relation | /*ref*/Y. Shen, M. Ferdman, and P. Milder, "Maximizing CNN accelerator efficiency through resource partitioning," in 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), June 2017, pp. 535-547. https://doi.org/10.1145/3140659.3080221 | |
dc.relation | /*ref*/Y. Wang, L. Xia, T. Tang, B. Li, S. Yao, M. Cheng, and H. Yang, "Low power convolutional neural networks on a chip," in 2016 IEEE International Symposium on Circuits and Systems (ISCAS), May 2016, pp. 129-132. https://doi.org/10.1109/ISCAS.2016.7527187 | |
dc.relation | /*ref*/G. Feng, Z. Hu, S. Chen, and F. Wu, "Energy-efficient and high-throughput FPGA-based accelerator for convolutional neural networks," in 2016 13th IEEE International Conference on Solid-State and Integrated Circuit Technology (ICSICT), Oct 2016, pp. 624-626. https://doi.org/10.1109/ICSICT.2016.7998996 | |
dc.relation | /*ref*/S. Ghaffari and S. Sharifian, "FPGA-based convolutional neural network accelerator design using high level synthesize," in Proceedings - 2016 2nd International Conference of Signal Processing and Intelligent Systems, ICSPIS 2016, 2017, pp. 1-6. https://doi.org/10.1109/ICSPIS.2016.7869873 | |
dc.relation | /*ref*/Y. Zhou and J. Jiang, "An FPGA-based accelerator implementation for deep convolutional neural networks," in Proceedings of 2015 4th International Conference on Computer Science and Network Technology (ICCSNT 2015), 2015, vol. 01, pp. 829-832. https://doi.org/10.1109/ICCSNT.2015.7490869 | |
dc.relation | /*ref*/V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proceedings of the 27th International Conference on International Conference on Machine Learning, ser. ICML'10. USA: Omnipress, 2010, pp. 807-814. [Online]. Available: http://dl.acm.org/citation.cfm?id=3104322.3104425 | |
dc.relation | /*ref*/Xilinx, AXI Reference Guide. [Online]. Available: https://www.xilinx.com/support/documentation/ip_documentation/axi_ref_guide/latest/ug1037-vivado-axi-reference-guide.pdf | |
dc.rights | Derechos de autor 2019 Ciencia e Ingeniería Neogranadina | |
dc.source | Ciencia e Ingeniería Neogranadina; Vol. 30 No. 1 (2020); 107-116 | |
dc.source | Ciencia e Ingeniería Neogranadina; Vol. 30 Núm. 1 (2020); 107-116 | |
dc.source | Ciencia e Ingeniería Neogranadina; v. 30 n. 1 (2020); 107-116 | |
dc.source | 1909-7735 | |
dc.source | 0124-8170 | |
dc.subject | CNN | |
dc.subject | FPGA | |
dc.subject | Hardware accelerator | |
dc.subject | MNIST | |
dc.subject | Zynq | |
dc.title | A Hardware Accelerator for the Inference of a Convolutional Neural Network | |
dc.title | Acelerador en hardware para la inferencia de una red neuronal convolucional | |
dc.type | info:eu-repo/semantics/article | |
dc.type | info:eu-repo/semantics/publishedVersion | |
dc.type | Text | |
dc.type | Texto |