Reconocimiento de escenas violentas en imágenes de CCTV utilizando aprendizaje profundo

Hernández Díaz, Kelly Gissela

Recognition of violent scenes in CCTV images using deep learning

dc.contributor	Ballesteros Larrotta, Dora Maria
dc.contributor	Renza Torres, Diego
dc.creator	Hernández Díaz, Kelly Gissela
dc.date	2023-04-18T16:01:28Z
dc.date	2021-05-10
dc.date.accessioned	2023-09-06T18:00:32Z
dc.date.available	2023-09-06T18:00:32Z
dc.identifier	http://hdl.handle.net/10654/43682
dc.identifier	instname:Universidad Militar Nueva Granada
dc.identifier	reponame:Repositorio Institucional Universidad Militar Nueva Granada
dc.identifier	repourl:https://repository.unimilitar.edu.co
dc.identifier.uri	https://repositorioslatinoamericanos.uchile.cl/handle/2250/8693810
dc.description	El uso cada vez más generalizado de sistemas de videovigilancia para identificar acciones o situaciones violentas en lugares como bancos, hospitales o avenidas, ha provocado la necesidad de implementar un método que permita el reconocimiento automático de este tipo de escenas con el fin de evitar posibles riesgos a la seguridad e integridad de las personas. Por lo anterior, en el presente trabajo se propone un modelo de detección y clasificación de escenas violentas en imágenes de CCTV, basado en aprendizaje profundo. Específicamente, se utilizó el conjunto de datos CHU Surveillance Violence Dataset (CSVD), que corresponde a imágenes de videos de CCTV clasificadas en acciones tanto violentas como no violentas. Se evaluaron cuatro modelos pre-entrenados: VGG16, MobileNet, Inception y ResNet50, y mediante transferencia de aprendizaje se seleccionaron distintos puntos de congelamiento en cada una de sus arquitecturas. Adicionalmente, se emplearon tres optimizadores: Adam, Adadelta y SGD, con el fin de comparar su impacto en la clasificación de las imágenes. Para la evaluación del desempeño de los modelos a nivel de validación, se consideraron los valores obtenidos en las métricas Accuracy, Precision y Recall. Como resultado, el modelo proveniente de Inception logró un mejor rendimiento en general, a diferencia del modelo proveniente de ResNet50, que presentó los valores de métricas más bajos.
dc.description	1 INTRODUCTION 1.1 PROBLEM STATEMENT 1.2 JUSTIFICATION 1.3 RESEARCH QUESTION 1.4 OBJECTIVES 1.4.1 General Objective 1.4.2 Specific Objectives 1.5 METHODOLOGY 2 THEORETICAL FRAMEWORK 2.1 ARTIFICIAL INTELLIGENCE 2.2 MACHINE LEARNING 2.2.5 Artificial Neural Networks (ANN) 2.3 COMPUTER VISION 2.4 DEEP LEARNING 2.4.1 Convolutional Neural Network (CNN) 2.4.2 CNN Architectures 2.4.3 Model Hyperparameters 2.4.4 Transfer Learning 2.5 EVALUATION METRICS 2.5.1 Confusion Matrix 2.5.2 Accuracy 2.5.3 Recall 2.5.4 Precision 3 STATE OF THE ART 3.1 VIOLENCE DETECTION TECHNIQUES USING DEEP LEARNING 3.2 DATASETS FOR VIOLENCE DETECTION 4 PHASE 1: UNDERSTANDING THE PROBLEM 5 PHASE 2: DATA UNDERSTANDING AND PREPARATION 5.1 CSVD DATASET 5.2 SELECTION OF IMAGES FOR TRAINING AND VALIDATION 5.3 DATA PREPARATION: IMAGE AUGMENTATION 5.3.1 Rotation 5.3.2 Brightness 5.3.3 Channel Shift 5.3.4 Horizontal Flip 5.3.5 Zoom 6 PHASE 3: MODELING 6.1 TEST PROTOCOL DESIGN 6.1.1 Architecture Type 6.1.2 Network Depth 6.1.3 Optimization Algorithm 6.1.4 Evaluation Metrics 6.1.5 Test Protocol Design 6.2 MODEL DESIGN 6.2.1 Choice of Pre-trained Model 6.2.2 Choice of Transfer Point 6.2.3 Addition of Top (FC) and Classification Layers 6.3 DATA AUGMENTATION WITH KERAS IMAGEDATAGENERATOR 6.4 MODEL TRAINING 6.4.1 Model Compilation 6.4.2 Model Checkpoint 6.4.3 The model.fit() Function 7 PHASE 4: MODEL EVALUATION 7.1 VALIDATION-LEVEL PERFORMANCE 7.2 EVALUATION OF THE IMPACT OF THE MODEL 7.3 EVALUATION OF THE IMPACT OF THE OPTIMIZER 7.4 EVALUATION OF THE IMPACT OF DEPTH 8 CONCLUSIONS 9 REFERENCES 10 APPENDIX A. EVALUATION METRIC RESULTS FOR EACH MODEL 11 APPENDIX B. PERFORMANCE PLOTS ACROSS EPOCHS
dc.description	The increasingly widespread use of video surveillance systems to identify violent actions or situations in places such as banks, hospitals, or avenues has created a need for methods that automatically recognize such scenes, in order to reduce risks to people's safety and integrity. This work therefore proposes a deep-learning model for detecting and classifying violent scenes in CCTV images. Specifically, it uses the CHU Surveillance Violence Dataset (CSVD), which consists of frames from CCTV videos labeled as violent or non-violent actions. Four pre-trained models were evaluated (VGG16, MobileNet, Inception, and ResNet50), and, through transfer learning, different freezing points were selected in each architecture. In addition, three optimizers (Adam, Adadelta, and SGD) were used to compare their impact on image classification. Model performance at the validation level was evaluated using the Accuracy, Precision, and Recall metrics. Overall, the Inception-based model achieved the best performance, whereas the ResNet50-based model yielded the lowest metric values.
dc.description	Undergraduate
dc.description	L'utilisation de plus en plus répandue des systèmes de vidéosurveillance pour identifier des actions ou des situations violentes dans des lieux tels que les banques, les hôpitaux ou les avenues, a conduit à la nécessité de mettre en œuvre une méthode permettant la reconnaissance automatique de telles scènes afin d'éviter les risques éventuels pour la sécurité et l'intégrité des personnes. Par conséquent, cet article propose un modèle pour la détection et la classification des scènes violentes dans les images de vidéosurveillance, basé sur l'apprentissage profond. Plus précisément, le CHU Surveillance Violence Dataset (CSVD), qui correspond aux images vidéo CCTV classées en actions violentes et non violentes, a été utilisé. Quatre modèles pré-entraînés ont été évalués : VGG16, MobileNet, Inception et ResNet50, et différents points de gel dans chacune de leurs architectures ont été sélectionnés par apprentissage par transfert. En outre, trois optimiseurs, Adam, Adadelta et SGD, ont été utilisés pour comparer leur impact sur la classification des images. Pour évaluer la performance des modèles au niveau de la validation, les valeurs obtenues dans les métriques Accuracy, Precision et Recall ont été considérées. Par conséquent, le modèle provenant d'Inception a obtenu une meilleure performance globale, contrairement au modèle provenant de ResNet50, qui a présenté les valeurs métriques les plus faibles.
dc.format	application/pdf
dc.language	spa
dc.publisher	Telecommunications Engineering
dc.publisher	Faculty of Engineering
dc.publisher	Universidad Militar Nueva Granada
dc.relation	Albawi, S., Mohammed, T. A., & Al-Zawi, S. (2017). Understanding of a convolutional neural network. 2017 International Conference on Engineering and Technology (ICET). https://doi.org/10.1109/icengtechnol.2017.8308186
dc.relation	Alto, V. (2019, July 5). Neural Networks: parameters, hyperparameters and optimization strategies. Towards Data Science. Retrieved April 14, 2022, from https://towardsdatascience.com/neural-networks-parameters-hyperparameters-and-optimization-strategies-3f0842fac0a5
dc.relation	Anber, S., Alsaggaf, W., & Shalash, W. (2022). A Hybrid Driver Fatigue and Distraction Detection Model Using AlexNet Based on Facial Features. Electronics, 11(2). https://doi.org/10.3390/electronics11020285
dc.relation	Bermejo Nievas, E., Deniz Suarez, O., Bueno García, G., & Sukthankar, R. (2011). Violence Detection in Video Using Computer Vision Techniques. Computer Analysis of Images and Patterns, 332–339. https://doi.org/10.1007/978-3-642-23678-5_39
dc.relation	Coming Lopez, D. J., & Lien, C. C. (2020). Real-Time Human Violent Activity Recognition Using Complex Action Decomposition. 2020 International Computer Symposium (ICS), 360–364. https://doi.org/10.1109/ics51289.2020.00078
dc.relation	Copeland, M. (2016, July 29). The Difference Between AI, Machine Learning, and Deep Learning? NVIDIA Blog. Retrieved March 26, 2022, from https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/
dc.relation	Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition. https://doi.org/10.1109/cvpr.2009.5206848
dc.relation	Di, W., Bhardwaj, A., & Wei, J. (2018). Deep Learning Essentials: Your Hands-on Guide to the Fundamentals of Deep Learning and Neural Network Modeling. Packt Publishing.
dc.relation	Ditsanthia, E., Pipanmaekaporn, L., & Kamonsantiroj, S. (2018). Video Representation Learning for CCTV-Based Violence Detection. 2018 3rd Technology Innovation Management and Engineering Science International Conference (TIMES-iCON), 1–5. https://doi.org/10.1109/times-icon.2018.8621751
dc.relation	Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning [E-book]. The MIT Press. Retrieved June 20, 2022, from https://www.deeplearningbook.org/
dc.relation	Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Adam, H., & Andreetto, M. (2017). MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv. https://arxiv.org/pdf/1704.04861.pdf
dc.relation	IBM Cloud Education. (2020, October 20). Convolutional Neural Networks. IBM. Retrieved March 29, 2022, from https://www.ibm.com/cloud/learn/convolutional-neural-networks
dc.relation	Image Augmentation on the fly using Keras ImageDataGenerator. (2020, August 11). Analytics Vidhya. Retrieved April 22, 2022, from https://www.analyticsvidhya.com/blog/2020/08/image-augmentation-on-the-fly-using-keras-imagedatagenerator/#h2_6
dc.relation	International Business Machines. (n.d.). What is Computer Vision? IBM. Retrieved March 26, 2022, from https://www.ibm.com/topics/computer-vision
dc.relation	Irfanullah, Hussain, T., Iqbal, A., Yang, B., & Hussain, A. (2022). Real time violence detection in surveillance videos using Convolutional Neural Networks. Multimedia Tools and Applications. https://doi.org/10.1007/s11042-022-13169-4
dc.relation	Jain, A., & Vishwakarma, D. K. (2020). State-of-the-arts Violence Detection using ConvNets. 2020 International Conference on Communication and Signal Processing (ICCSP), 813–817. https://doi.org/10.1109/iccsp48568.2020.9182433
dc.relation	Jana, A., & Gopalakrishna, M. T. (2016). Violence Detection in Surveillance Video-A survey. International Journal of Latest Research in Engineering and Technology (IJLRET), 11–17. https://www.researchgate.net/publication/321873996_Violence_Detection_in_Surveillance_Video-A_survey
dc.relation	Kadre, S., & Konasani, V. R. (2021). Machine Learning and Deep Learning Using Python and TensorFlow (1st ed.). McGraw-Hill Education. https://www.accessengineeringlibrary.com/content/book/9781260462296
dc.relation	Keras. (n.d.). Keras documentation: ModelCheckpoint. Retrieved April 24, 2022, from https://keras.io/api/callbacks/model_checkpoint/
dc.relation	Kota, S. D. K. (2020, May 17). Understanding Image Augmentation Using Keras(Tensorflow). Medium. Retrieved April 22, 2022, from https://medium.com/analytics-vidhya/understanding-image-augmentation-using-keras-tensorflow-a6341669d9ca
dc.relation	Leiva Tarazona, A., & Ramírez Ríos, A. (2021). Efectos de la inseguridad Ciudadana en el bienestar de la población. Ciencia Latina Revista Científica Multidisciplinar, 5(3), 3341-3352. https://doi.org/10.37811/cl_rcm.v5i3.535
dc.relation	Liang, T., Glossner, J., Wang, L., Shi, S., & Zhang, X. (2021). Pruning and quantization for deep neural network acceleration: A survey. Neurocomputing, 461, 370–403. https://doi.org/10.1016/j.neucom.2021.07.045
dc.relation	Morales, G., Salazar-Reque, I., Telles, J., & Díaz, D. (2019). Detecting Violent Robberies in CCTV Videos Using Deep Learning. IFIP Advances in Information and Communication Technology, 282–291. https://doi.org/10.1007/978-3-030-19823-7_23
dc.relation	Mu, G., Cao, H., & Jin, Q. (2016). Violent Scene Detection Using Convolutional Neural Networks and Deep Audio Features. Communications in Computer and Information Science, 451–463. https://doi.org/10.1007/978-981-10-3005-5_37
dc.relation	Muggah, R., & Aguirre, K. (2018). Citizen Security in Latin America: The Hard Facts. Igarapé Institute, Strategic Paper, 33, 1-63.
dc.relation	Oficina de Análisis de Información y Estudios Estratégicos. (2019, December). Evaluación del Sistema de videovigilancia de Bogotá D.C. https://scj.gov.co/sites/default/files/documentos_oaiee/Imapcto%20Videovigilancia%20en%20Bogot%C3%A1.pdf
dc.relation	Okewu, E., Adewole, P., & Sennaike, O. (2019). Experimental Comparison of Stochastic Optimizers in Deep Learning. Computational Science and Its Applications – ICCSA 2019, 704–715. https://doi.org/10.1007/978-3-030-24308-1_55
dc.relation	Ongsulee, P. (2017). Artificial intelligence, machine learning and deep learning. 2017 15th International Conference on ICT and Knowledge Engineering (ICT&KE), 1–6. https://doi.org/10.1109/ictke.2017.8259629
dc.relation	Ramzan, M., Abid, A., Khan, H. U., Awan, S. M., Ismail, A., Ahmed, M., Ilyas, M., & Mahmood, A. (2019). A Review on State-of-the-Art Violence Detection Techniques. IEEE Access, 7, 107560–107575. https://doi.org/10.1109/access.2019.2932114
dc.relation	Rollins, J. B. (2015). Metodología Fundamental para la Ciencia de Datos. IBM Analytics. Retrieved March 22, 2022, from https://www.ibm.com/downloads/cas/WKK9DX51
dc.relation	Sarkar, D., Bali, R., & Ghosh, T. (2018). Hands-On Transfer Learning with Python: Implement Advanced Deep Learning and Neural Network Models Using TensorFlow and Keras. Packt Publishing, Limited.
dc.relation	Sharma, M., & Baghel, R. (2020). Video Surveillance for Violence Detection Using Deep Learning. Advances in Data Science and Management, 411–420. https://doi.org/10.1007/978-981-15-0978-0_40
dc.relation	Soliman, M. M., Kamal, M. H., El-Massih Nashed, M. A., Mostafa, Y. M., Chawky, B. S., & Khattab, D. (2019). Violence Recognition from Videos using Deep Learning Techniques. 2019 Ninth International Conference on Intelligent Computing and Information Systems (ICICIS), 80–85. https://doi.org/10.1109/icicis46948.2019.9014714
dc.relation	Subsecretaría de Inversiones y Fortalecimiento de Capacidades Operativas. (2020). Ampliación concepto de Línea de Inversión Local para la dotación con recursos tecnológicos para la seguridad. http://www.sdp.gov.co/sites/default/files/anexo_1._dotacion_con_recursos_teconologicos_para_seguridad.pdf
dc.relation	Sudhakaran, S., & Lanz, O. (2017). Learning to detect violent videos using convolutional long short-term memory. 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 1–6.
dc.relation	Tay, N. C., Connie, T., Ong, T. S., Goh, K. O. M., & Teh, P. S. (2018). A Robust Abnormal Behavior Detection Method Using Convolutional Neural Network. Lecture Notes in Electrical Engineering, 37–47. https://doi.org/10.1007/978-981-13-2622-6_4
dc.relation	Ullah, F. U. M., Ullah, A., Muhammad, K., Haq, I. U., & Baik, S. W. (2019). Violence Detection Using Spatiotemporal Features with 3D Convolutional Neural Network. Sensors, 19(11), 2472. http://dx.doi.org/10.3390/s19112472
dc.relation	Vasilev, I., Slater, D., Spacagna, G., Roelants, P., & Zocca, V. (2019). Python Deep Learning: Exploring Deep Learning Techniques and Neural Network Architectures with PyTorch, Keras, and TensorFlow (2nd ed.). Packt Publishing.
dc.relation	Vijeikis, R., Raudonis, V., & Dervinis, G. (2022). Efficient Violence Detection in Surveillance. Sensors, 22(6). https://doi.org/10.3390/s22062216
dc.relation	Zhang, A., Lipton, Z. C., Li, M., & Smola, A. J. (2021). Dive into Deep Learning [E-book]. arXiv preprint arXiv:2106.11342. Retrieved March 27, 2022, from https://d2l.ai/index.html
dc.relation	Zhou, P., Ding, Q., Luo, H., & Hou, X. (2017). Violent Interaction Detection in Video Based on Deep Learning. Journal of Physics: Conference Series, 844. https://doi.org/10.1088/1742-6596/844/1/012044
dc.rights	http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	http://purl.org/coar/access_right/c_abf2
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights	Open access
dc.subject	IMAGE COMPRESSION
dc.subject	VIDEO COMPRESSION
dc.subject	VIOLENCE
dc.subject	ARTIFICIAL INTELLIGENCE
dc.subject	violence recognition
dc.subject	image classification
dc.subject	deep learning
dc.subject	transfer learning
dc.subject	clasificación de imágenes
dc.subject	transferencia de aprendizaje
dc.subject	aprendizaje profundo
dc.subject	identificación de violencia
dc.title	Reconocimiento de escenas violentas en imágenes de CCTV utilizando aprendizaje profundo
dc.title	Recognition of violent scenes in CCTV images using deep learning
dc.type	Thesis/Degree project - Monograph - Undergraduate
dc.type	info:eu-repo/semantics/bachelorThesis
dc.type	http://purl.org/coar/resource_type/c_7a1f
dc.type	info:eu-repo/semantics/acceptedVersion
dc.coverage	Calle 100

This item belongs to the following institution

Universidad Militar Nueva Granada (Colombia)