A Hardware Accelerator for the Inference of a Convolutional Neural Network
Acelerador en hardware para la inferencia de una red neuronal convolucional
dc.creator | González , Edwin | |
dc.creator | Villamizar Luna , Walter D. | |
dc.creator | Fajardo Ariza, Carlos Augusto | |
dc.date | 2019-11-12 | |
dc.identifier | https://revistas.unimilitar.edu.co/index.php/rcin/article/view/4194 | |
dc.identifier | 10.18359/rcin.4194 | |
dc.description | Convolutional Neural Networks (CNNs) are becoming increasingly popular in deep learning applications, e.g., image classification, speech recognition, and medicine, to name a few. However, CNN inference is computationally intensive and demands a large amount of memory resources. In this work, a hardware accelerator for CNN inference is proposed, implemented in a hardware/software co-processing scheme. The aim is to reduce hardware resource usage while achieving the best possible throughput. The design was implemented on the Digilent Arty Z7-20 development board, which is based on the Xilinx Zynq-7000 System on Chip (SoC). Our implementation achieved an accuracy of 97.59% for the MNIST database using only a 12-bit fixed-point format. The results show that the co-processing scheme, operating at a conservative speed of 100 MHz, can identify around 441 images per second, which is about 17% faster than a 650 MHz software implementation. It is difficult to compare our results against other implementations based on Field-Programmable Gate Arrays (FPGAs), because those implementations are not exactly like ours. However, some comparisons regarding the logic resources used and the accuracy suggest that our work could outperform previous works. | |
dc.description | Las redes neuronales convolucionales cada vez son más populares en aplicaciones de aprendizaje profundo, como por ejemplo en clasificación de imágenes, reconocimiento de voz, medicina, entre otras. Sin embargo, estas redes son computacionalmente costosas y requieren altos recursos de memoria. En este trabajo se propone un acelerador en hardware para el proceso de inferencia de la red LeNet-5, implementado en un esquema de co-procesamiento hardware/software. El objetivo de la implementación es reducir el uso de recursos de hardware y obtener el mejor rendimiento computacional posible durante el proceso de inferencia. El diseño fue implementado en la tarjeta de desarrollo Digilent Arty Z7-20, la cual está basada en el System on Chip (SoC) Zynq-7000 de Xilinx. Nuestra implementación logró una precisión del 97.59% para la base de datos MNIST utilizando tan solo 12 bits en el formato de punto fijo. Los resultados muestran que el esquema de co-procesamiento, el cual opera a una velocidad de 100 MHz, puede identificar aproximadamente 441 imágenes por segundo, lo que equivale aproximadamente a un 17% más rápido que una implementación de software a 650 MHz. Es difícil comparar nuestra implementación con otras implementaciones similares, porque las implementaciones encontradas en la literatura no son exactamente como la que se realizó en este trabajo. Sin embargo, algunas comparaciones, en relación con el uso de recursos lógicos y la precisión, sugieren que nuestro trabajo supera a trabajos previos. | |
dc.format | application/pdf | |
dc.format | text/xml | |
dc.language | eng | |
dc.publisher | Universidad Militar Nueva Granada | |
dc.relation | https://revistas.unimilitar.edu.co/index.php/rcin/article/view/4194/4084 | |
dc.relation | https://revistas.unimilitar.edu.co/index.php/rcin/article/view/4194/4255 | |
dc.relation | /*ref*/Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998. https://doi.org/10.1109/5.726791 | |
dc.relation | /*ref*/C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015, pp. 1-9. https://doi.org/10.1109/CVPR.2015.7298594 | |
dc.relation | /*ref*/A. Dundar, J. Jin, B. Martini, and E. Culurciello, "Embedded streaming deep neural networks accelerator with applications," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 7, pp. 1572-1583, July 2017. https://doi.org/10.1109/TNNLS.2016.2545298 | |
dc.relation | /*ref*/B. Ahn, "Real-time video object recognition using convolutional neural network," in 2015 International Joint Conference on Neural Networks (IJCNN), July 2015, pp. 1-7. https://doi.org/10.1109/IJCNN.2015.7280718 | |
dc.relation | /*ref*/B. Yu, Y. Tsao, S. Yang, Y. Chen, and S. Chien, "Architecture design of convolutional neural networks for face detection on an fpga platform," in 2018 IEEE International Workshop on Signal Processing Systems (SiPS), Oct 2018, pp. 88-93. | |
dc.relation | /*ref*/Z. Xiong, M. K. Stiles, and J. Zhao, "Robust ecg signal classification for detection of atrial fibrillation using a novel neural network," in 2017 Computing in Cardiology (CinC), Sep. 2017, pp. 1-4. https://doi.org/10.22489/CinC.2017.066-138 | |
dc.relation | /*ref*/K. Guo, L. Sui, J. Qiu, J. Yu, J. Wang, S. Yao, S. Han, Y. Wang, and H. Yang, "Angel-eye: A complete design flow for mapping cnn onto embedded fpga," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 37, no. 1, pp. 35-47, Jan 2018. https://doi.org/10.1109/TCAD.2017.2705069 | |
dc.relation | /*ref*/N. Suda, V. Chandra, G. Dasika, A. Mohanty, Y. Ma, S. Vrudhula, J.-s. Seo, and Y. Cao, "Throughput-optimized opencl-based fpga accelerator for large-scale convolutional neural networks," in Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, ser. FPGA '16. New York, NY, USA: ACM, 2016, pp. 16-25. http://doi.acm.org/10.1145/2847263.2847276 | |
dc.relation | /*ref*/C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, "Optimizing fpga-based accelerator design for deep convolutional neural networks," in Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, ser. FPGA '15. New York, NY, USA: ACM, 2015, pp. 161-170. https://doi.org/10.1145/2684746.2689060 | |
dc.relation | /*ref*/K. Ovtcharov, O. Ruwase, J.-Y. Kim, J. Fowers, K. Strauss, and E. S. Chung, "Accelerating deep convolutional neural networks using specialized hardware," Microsoft Research Whitepaper, vol. 2, no. 11, pp. 1-4, 2015. | |
dc.relation | /*ref*/T. Tsai, Y. Ho, and M. Sheu, "Implementation of fpga-based accelerator for deep neural networks," in 2019 IEEE 22nd International Symposium on Design and Diagnostics of Electronic Circuits Systems (DDECS), April 2019, pp. 1-4. https://doi.org/10.1109/DDECS.2019.8724665 | |
dc.relation | /*ref*/Y. Shen, M. Ferdman, and P. Milder, "Maximizing CNN accelerator efficiency through resource partitioning," in 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), June 2017, pp. 535-547. https://doi.org/10.1145/3140659.3080221 | |
dc.relation | /*ref*/Y. Wang, L. Xia, T. Tang, B. Li, S. Yao, M. Cheng, and H. Yang, "Low power convolutional neural networks on a chip," in 2016 IEEE International Symposium on Circuits and Systems (ISCAS), May 2016, pp. 129-132. https://doi.org/10.1109/ISCAS.2016.7527187 | |
dc.relation | /*ref*/G. Feng, Z. Hu, S. Chen, and F. Wu, "Energy-efficient and high-throughput FPGA-based accelerator for convolutional neural networks," in 2016 13th IEEE International Conference on Solid-State and Integrated Circuit Technology (ICSICT), Oct 2016, pp. 624-626. https://doi.org/10.1109/ICSICT.2016.7998996 | |
dc.relation | /*ref*/S. Ghaffari and S. Sharifian, "FPGA-based convolutional neural network accelerator design using high level synthesize," in Proceedings - 2016 2nd International Conference of Signal Processing and Intelligent Systems, ICSPIS 2016, 2017, pp. 1-6. https://doi.org/10.1109/ICSPIS.2016.7869873 | |
dc.relation | /*ref*/Y. Zhou and J. Jiang, "An FPGA-based accelerator implementation for deep convolutional neural networks," in Proceedings of 2015 4th International Conference on Computer Science and Network Technology (ICCSNT 2015), 2015, vol. 01, pp. 829-832. https://doi.org/10.1109/ICCSNT.2015.7490869 | |
dc.relation | /*ref*/V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proceedings of the 27th International Conference on International Conference on Machine Learning, ser. ICML'10. USA: Omnipress, 2010, pp. 807-814. [Online]. Available: http://dl.acm.org/citation.cfm?id=3104322.3104425 | |
dc.relation | /*ref*/Xilinx, AXI Reference Guide. [Online]. Available: https://www.xilinx.com/support/documentation/ip_documentation/axi_ref_guide/latest/ug1037-vivado-axi-reference-guide.pdf | |
dc.rights | Derechos de autor 2019 Ciencia e Ingeniería Neogranadina | |
dc.source | Ciencia e Ingeniería Neogranadina; Vol. 30 No. 1 (2020); 107-116 | |
dc.source | Ciencia e Ingeniería Neogranadina; Vol. 30 Núm. 1 (2020); 107-116 | |
dc.source | Ciencia e Ingeniería Neogranadina; v. 30 n. 1 (2020); 107-116 | |
dc.source | 1909-7735 | |
dc.source | 0124-8170 | |
dc.subject | CNN | |
dc.subject | FPGA | |
dc.subject | Hardware accelerator | |
dc.subject | MNIST | |
dc.subject | Zynq | |
dc.title | A Hardware Accelerator for the Inference of a Convolutional Neural Network | |
dc.title | Acelerador en hardware para la inferencia de una red neuronal convolucional | |
dc.type | info:eu-repo/semantics/article | |
dc.type | info:eu-repo/semantics/publishedVersion | |
dc.type | Text | |
dc.type | Texto |