dc.contributor | Salcedo Parra, Octavio José | |
dc.contributor | Salazar Herrera, Carlos Alberto | |
dc.creator | Céspedes Maestre, María Martha | |
dc.date.accessioned | 2021-06-25T03:01:19Z | |
dc.date.available | 2021-06-25T03:01:19Z | |
dc.date.created | 2021-06-25T03:01:19Z | |
dc.date.issued | 2021-05 | |
dc.identifier | https://repositorio.unal.edu.co/handle/unal/79722 | |
dc.identifier | Universidad Nacional de Colombia | |
dc.identifier | Repositorio Institucional Universidad Nacional de Colombia | |
dc.identifier | https://repositorio.unal.edu.co/ | |
dc.description.abstract | En la actualidad, los ciberdelincuentes perpetran ataques web de forma sencilla, en los que aplican diferentes vectores para poner en peligro la seguridad de la información y en los que entienden al ser humano como un flanco fácil para lograr sus objetivos. Generalmente, los usuarios de internet deben realizar una acción que permita el éxito del ataque, por ejemplo, dar clic a alguna URL. Es por lo anterior, que muchos esfuerzos están dirigidos a encontrar técnicas que mitiguen esta problemática y se apuestan grandes cantidades de dinero en generar soluciones.
Tomando como referencia el uso de listas negras, la clasificación heurística, y, prestando especial atención a las técnicas de aprendizaje automático capaces de detectar ataques de día cero, en el presente trabajo se despliega un diseño de detección de URLs maliciosas, haciendo uso de criterios léxicos y de ofuscación de la URL. Estas, clasificadas por medio de técnicas de aprendizaje automático como Logistic Regression, Support Vector Machine y Random Forest; demostrando que los tres clasificadores implementados mantienen una relación de eficacia y rendimiento con porcentajes de precisión del 98%, y, tiempos de respuesta satisfactorio. Es preciso aclarar que Random Forest puede estar sujeto a mejoras, ya que se pretende detectar de manera automática las URLs maliciosas y este clasificador tarda en promedio 16 segundos en hacerlo. Como resultado general del diseño, se obtiene un modelo de libre distribución que puede ser utilizado de forma masiva por diferentes usuarios en la red, capaz de detectar de forma precisa URLs maliciosas. | |
dc.description.abstract | Today, cybercriminals can mount web attacks with little effort, using a variety of vectors to compromise information security and treating the human user as an easy target. Typically, the attack succeeds only if the Internet user performs some action, for example clicking on a URL. For this reason, considerable effort and large sums of money are invested in techniques that mitigate the problem. Taking blacklists and heuristic classification as a reference, and paying special attention to machine learning techniques capable of detecting zero-day attacks, this work presents a design for detecting malicious URLs based on lexical and URL-obfuscation criteria. The URLs are classified with the machine learning techniques Logistic Regression, Support Vector Machine, and Random Forest. The results show that all three classifiers balance effectiveness and performance, reaching 98% accuracy with satisfactory response times. Random Forest leaves room for improvement, however: the goal is to detect malicious URLs automatically, and this classifier takes 16 seconds on average to do so. The overall result of the design is a freely distributable model, usable at scale by different users on the network, that accurately detects malicious URLs. | |
dc.language | spa | |
dc.publisher | Universidad Nacional de Colombia | |
dc.publisher | Bogotá - Ingeniería - Maestría en Ingeniería - Telecomunicaciones | |
dc.publisher | Departamento de Ingeniería de Sistemas e Industrial | |
dc.publisher | Facultad de Ingeniería | |
dc.publisher | Bogotá, Colombia | |
dc.publisher | Universidad Nacional de Colombia - Sede Bogotá | |
dc.relation | [API navegación segura Google, 2010] Google Safe Browsing API. Google Code, 2010. [Online]. Available: http://code.google.com/apis/safebrowsing/ | |
dc.relation | [Akiyama et al., 2017] Akiyama, M., Yagi, T., Yada, T., Mori, T., & Kadobayashi, Y. (2017). Analyzing the ecosystem of malicious URL redirection through longitudinal observation from honeypots. Computers & Security, 69, 155–173. doi:10.1016/j.cose.2017.01.003 | |
dc.relation | [Bahnsen et al., 2017] Bahnsen, A. C., Bohorquez, E. C., Villegas, S., Vargas, J., & Gonzalez, F. A. (2017). Classifying phishing URLs using recurrent neural networks. 2017 APWG Symposium on Electronic Crime Research (eCrime), Phoenix, AZ, USA, 1–8. doi:10.1109/ECRIME.2017.7945048 | |
dc.relation | [Basit et al., 2020] Basit, A., Zafar, M., Liu, X. A comprehensive survey of AI-enabled phishing attacks detection techniques. Telecommun Syst (2020). doi:10.1007/s11235-020-00733-2 | |
dc.relation | [Berners-Lee, et al., 1994] Berners-Lee, T., Masinter, L., McCahill, M. (1994). “Uniform Resource Locators (URL)”, RFC 1738, December 1994 | |
dc.relation | [Berners-Lee, 1994] Berners-Lee, T. (1994). “Universal Resource Identifiers in WWW: A Unifying Syntax for the Expression of Names and Addresses of Objects on the Network as used in the World-Wide Web”, RFC 1630, CERN, June 1994. | |
dc.relation | [Berners-Lee, 2005] Berners-Lee, T. (2005). “Uniform Resource Identifier (URI): Generic Syntax”, RFC 3986, CERN, January 2005. | |
dc.relation | [Bezzera and Feitosa, 2015] Bezzera, M. Feitosa, E., (2015). Investigando o uso de Características na Detecção de URLs Maliciosas. XV Simpósio Brasileiro em Segurança da Informação e de Sistemas Computacionais — SBSeg 2015. | |
dc.relation | [Bhagwat et al., 2019] Bhagwat, A., Lodhi, K., Dalvi, S., & Kulkarni, U. (2019). An implemention of a mechanism for malicious URLs detection. In Proceedings of the 2019 6th International Conference on Computing for Sustainable Global Development, INDIACom 2019 (pp. 1008–1013). Institute of Electrical and Electronics Engineers Inc. | |
dc.relation | [Breiman, 2001] Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5–32. doi:10.1023/a:1010933404324 | |
dc.relation | [Burgess et al., 2020] J. Burgess, D. Carlin, P. O'Kane and S. Sezer, "REdiREKT: Extracting Malicious Redirections from Exploit Kit Traffic", 2020 IEEE Conference on Communications and Network Security (CNS), Avignon, France, 2020, pp. 1-9, doi: 10.1109/CNS48642.2020.9162304. | |
dc.relation | [Cheng and Greiner, 1999] Cheng, J. & Greiner, R. (1999). Comparing Bayesian Network Classifiers. UAI'99: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, July 1999, pp. 101–108 | |
dc.relation | [Chen et al., 2019] Chen, W., Zeng, Y., & Qiu, M. (2019). Using Adversarial Examples to Bypass Deep Learning Based URL Detection System. 2019 IEEE International Conference on Smart Cloud (SmartCloud). doi:10.1109/smartcloud.2019.00031 | |
dc.relation | [Chiew et al., 2018] Chiew, K. L., Yong, K. S. C., & Tan, C. L. (2018). A survey of phishing attacks: Their types, vectors and technical approaches. Expert Systems with Applications, 106, 1–20. doi: 10.1016/j.eswa.2018.03.050 | |
dc.relation | [Cisco, 2018] Reporte Anual de Ciberseguridad, 2018. [Online]. Available: https://www.cisco.com/c/dam/global/es_mx/solutions/pdf/reporte-anual-cisco-2018-espan.pdf | |
dc.relation | [Cutler and Zhao, 2001] Cutler, A., & Zhao, G. (2001). PERT – Perfect Random Tree Ensembles. | |
dc.relation | [Das et al., 2019] Das, A., Baki, S., El Aassal, A., Verma, R., & Dunbar, A. (2019). SoK: A Comprehensive Reexamination of Phishing Research from the Security Perspective. IEEE Communications Surveys & Tutorials, 1–1. doi:10.1109/comst.2019.2957750 | |
dc.relation | [DNS-BH, s.f.] DNS-BH malware Domains. (n.d.). DNS-BH Malware Domains Blocklist by RiskAnalytics http://mirror1.malwaredomains.com/files/domains.txt | |
dc.relation | [Elwell and Polikar, 2011] Elwell, R., & Polikar, R. (2011). Incremental Learning of Concept Drift in Nonstationary Environments. IEEE Transactions on Neural Networks, 22(10), 1517–1531. doi:10.1109/tnn.2011.2160459 | |
dc.relation | [Fayrix, s.f.] Fayrix. (n.d.). “Metric selection for machine learning”. https://fayrix.com/machine-learning-metrics_es | |
dc.relation | [Filtro SmartScreen Microsoft, 2011] SmartScreen Filter - Microsoft Windows, 2011. [Online]. Available: https://support.microsoft.com/es-us/help/17443/windows-internet-explorer-smartscreen-faq | |
dc.relation | [Friedman et al., 1997] Friedman, N., Geiger, D. & Goldszmidt, M. Bayesian Network Classifiers. Machine Learning 29, 131–163 (1997). doi: 10.1023/A:1007465528199 | |
dc.relation | [Garera et al., 2007] Garera, S., Provos, N., Chew, M., & Rubin, A. D. (2007). A framework for detection and measurement of phishing attacks. Proceedings of the 2007 ACM Workshop on Recurring Malcode - WORM ’07. doi:10.1145/1314389.1314391 | |
dc.relation | [Ghafir and Prenosil, 2015] Ghafir, I., & Prenosil, V. (2015). Blacklist-based malicious IP traffic detection. 2015 Global Conference on Communication Technologies (GCCT). doi:10.1109/gcct.2015.7342657 | |
dc.relation | [Goodfellow et al., 2016] Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning [E-book]. MIT Press. http://www.deeplearningbook.org | |
dc.relation | [Gowtham et al., 2014] Gowtham, R., & Krishnamurthi, I. (2014). A comprehensive and efficacious architecture for detecting phishing webpages. Computers & Security, 40, 23–37. doi:10.1016/j.cose.2013.10.004 | |
dc.relation | [Kan and Thi, 2005] Kan, M.-Y., & Thi, H. O. N. (2005). Fast webpage classification using URL features. Proceedings of the 14th ACM International Conference on Information and Knowledge Management - CIKM ’05. doi:10.1145/1099554.1099649 | |
dc.relation | [Khonji et al., 2013] Khonji, M., Iraqi, Y., & Jones, A. (2013). Phishing Detection: A Literature Survey. IEEE Communications Surveys & Tutorials, 15(4), 2091–2121. doi:10.1109/surv.2013.032213.00009 | |
dc.relation | [Khor et al., 2010] Khor, K. C., Ting, C. Y., & Phon-Amnuaisuk, S. (2010). Comparing Single and Multiple Bayesian Classifiers Approaches for Network Intrusion Detection. 2010 Second International Conference on Computer Engineering and Applications. doi:10.1109/iccea.2010.214 | |
dc.relation | [Kim et al., 2018] Kim, S., Kim, J., & Kang, B. B. (2018). Malicious URL protection based on attackers’ habitual behavioral analysis. Computers & Security. doi: 10.1016/j.cose.2018.01.013 | |
dc.relation | [Kolosnjaji et al., 2016] Kolosnjaji, B., Zarras, A., Webster, G., & Eckert, C. (2016). Deep Learning for Classification of Malware System Call Sequences. Lecture Notes in Computer Science, 137–149. doi:10.1007/978-3-319-50127-7_11 | |
dc.relation | [Kühnel and Meyer, 2016] Kühnel, M., & Meyer, U. (2016). Applying highly space efficient blacklisting to mobile malware. Logic Journal of IGPL, 24(6), 971–981. doi:10.1093/jigpal/jzw052 | |
dc.relation | [Landwehr et al., 2005] Landwehr, N., Hall, M., & Frank, E. (2005). Logistic Model Trees. Machine Learning, 59, 161–205 | |
dc.relation | [Latorre, 2018] Latorre, M. (2018). Universidad Marcelino Champagnat. [Online]. Available: http://umch.edu.pe/arch/hnomarino/74_Historia%20de%20la%20Web.pdf | |
dc.relation | [Lee and Kim, 2013] Lee, S., & Kim, J. (2013). Fluxing botnet command and control channels with URL shortening services. Computer Communications, 36(3), 320–332. doi: 10.1016/j.comcom.2012.10.003 | |
dc.relation | [Lin et al., 2013] Lin, M.-S., Chiu, C.-Y., Lee, Y.-J., & Pao, H.-K. (2013). Malicious URL filtering — A big data application. 2013 IEEE International Conference on Big Data. doi:10.1109/bigdata.2013.6691627 | |
dc.relation | [Ma et al., 2009] Ma, J., Saul, L. K., Savage, S., & Voelker, G. M. (2009). Beyond blacklists. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’09. doi:10.1145/1557019.1557153 | |
dc.relation | [Majestic, s.f.] Majestic. (n.d.). The Majestic Million (CSV format) https://majestic.com/reports/majestic-million | |
dc.relation | [Mamun et al., 2016] Mamun, M. S. I., Rathore, M. A., Lashkari, A. H., Stakhanova, N., & Ghorbani, A. A. (2016). Detecting Malicious URLs Using Lexical Analysis. Network & System Security (9783319462974), 467 | |
dc.relation | [Manjeri et al., 2019] Manjeri, A. S., R, K., MNV, A., & Nair, P. C. (2019). A Machine Learning Approach for Detecting Malicious Websites using URL Features. 2019 3rd International Conference on Electronics, Communication and Aerospace Technology (ICECA). doi:10.1109/iceca.2019.8821879 | |
dc.relation | [Mohammad et al., 2015] Mohammad, R. M., Thabtah, F., & McCluskey, L. (2015). Tutorial and critical analysis of phishing websites methods. Computer Science Review, 17, 1–24. doi: 10.1016/j.cosrev.2015.04.001 | |
dc.relation | [Nguyen et al., 2013] Nguyen, L. A. T., To, B. L., Nguyen, H. K., & Nguyen, M. H. (2013). Detecting phishing web sites: A heuristic URL-based approach. 2013 International Conference on Advanced Technologies for Communications (ATC 2013). doi:10.1109/atc.2013.6698185 | |
dc.relation | [OpenPhish, s.f.] OpenPhish. (n.d.). “Global phishing activity”. Phishing Feeds. https://openphish.com/ | |
dc.relation | [Pedregosa et al., 2011] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830. | |
dc.relation | [PhishTank, s.f.] PhishTank. (n.d.). Developers. “Developer Information”. (CSV format) https://www.phishtank.com/ | |
dc.relation | [PhishTank, 2006] PhishTank, 2006. [Online]. Available: http://phishtank.org/ | |
dc.relation | [Prakash et al., 2010] Prakash, P., Kumar, M., Kompella, R. R., & Gupta, M. (2010). PhishNet: Predictive Blacklisting to Detect Phishing Attacks. 2010 Proceedings IEEE INFOCOM. doi:10.1109/infcom.2010.5462216 | |
dc.relation | [Python, 2021.] Python (February 24, 2021). “What's New In Python 3.9”. Python. https://docs.python.org/3.9/whatsnew/3.9.html | |
dc.relation | [Python, 2021.] Python (February 24, 2021). “re - Regular expression operations”. Python. https://docs.python.org/es/3/library/re.html | |
dc.relation | [Python, 2021.] Python (February 24, 2021). “math - Mathematical functions”. Python. https://docs.python.org/es/3.10/library/math.html | |
dc.relation | [Python, 2021.] Python (February 24, 2021). “datetime - Basic date and time types”. Python. https://docs.python.org/es/3/library/datetime.html | |
dc.relation | [Python, 2021.] Python (February 24, 2021). “collections - Container datatypes”. Python. https://docs.python.org/3/library/collections.html | |
dc.relation | [Revista Dinero (7 de abril 2019)] “4 de cada 10 empresas en América Latina sufrieron ciberataques en los últimos años”, 2019. [Online]. Available: https://www.dinero.com/tecnologia/articulo/empresas-en-colombia-sufren-de-ataques-ciberneticos-regularmente/273870 | |
dc.relation | [Schapire and Freund, 2012] Schapire, R. E., & Freund, Y. (2012). Boosting: Foundations and algorithms. ProQuest Ebook Central https://ebookcentral-proquest-com.ezproxy.javeriana.edu.co | |
dc.relation | [ScikitLearn, s.f.] ScikitLearn. (n.d.). Feature extraction. “Text feature extraction”. https://scikit-learn.org/stable/modules/feature_extraction.html#text-feature-extraction | |
dc.relation | [ScikitLearn, s.f.] ScikitLearn. (n.d.). Scikit-learn. “Machine Learning in Python”. https://scikit-learn.org/stable/ | |
dc.relation | [Shai and Shai, 2014, Capitulo 9, p 125] Shai Shalev-Shwartz and Shai Ben-David (2014). Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press. https://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/ | |
dc.relation | [Shai and Shai, 2014, Capitulo 15, p 200] Shai Shalev-Shwartz and Shai Ben-David (2014). Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press. https://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/ | |
dc.relation | [Shai and Shai, 2014, Capitulo 18, p 250] Shai Shalev-Shwartz and Shai Ben-David (2014). Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press. https://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/ | |
dc.relation | [Shubho et al., 2019] Shubho, S. A., Razib, M. R. H., Rudro, N. K., Saha, A. K., Khan, M. S. U., & Ahmed, S. (2019). Performance Analysis of NB Tree, REP Tree and Random Tree Classifiers for Credit Card Fraud Data. 2019 22nd International Conference on Computer and Information Technology (ICCIT). doi:10.1109/iccit48885.2019.9038578 | |
dc.relation | [Silva et al., 2019] Silva, C. M. R. da, Feitosa, E. L., & Garcia, V. C. (2019). Heuristic-based Strategy for Phishing Prediction: A Survey of URL-based approach. Computers & Security, 101613. doi: 10.1016/j.cose.2019.101613 | |
dc.relation | [Singh and Goyal, 2019] Singh, A. K., & Goyal, N. (2019). A Comparison of Machine Learning Attributes for Detecting Malicious Websites. 2019 11th International Conference on Communication Systems & Networks (COMSNETS). doi:10.1109/comsnets.2019.8711133 | |
dc.relation | [Stackoverflow, 2020.] Stackoverflow. (2020). “Most popular technologies”. https://insights.stackoverflow.com/survey/2020#most-popular-technologies | |
dc.relation | [UNB, s.f.] UNB: University of New Brunswick. (n.d.). “Canadian Institute for Cybersecurity”. URL dataset (ISCX-URL2016). https://www.unb.ca/cic/datasets/url-2016.html | |
dc.relation | [Vanhoenshoven et al., 2016] Vanhoenshoven, F., Napoles, G., Falcon, R., Vanhoof, K., & Koppen, M. (2016). Detecting malicious URLs using machine learning techniques. 2016 IEEE Symposium Series on Computational Intelligence (SSCI). doi:10.1109/ssci.2016.7850079 | |
dc.relation | [Varoquaux et al. 2015] Varoquaux, G., Buitinck, L., Louppe, G., Grisel, O., Pedregosa, F., & Mueller, A. (2015). Scikit-learn. GetMobile: Mobile Computing and Communications, 19(1), 29–33. doi:10.1145/2786984.2786995 | |
dc.relation | [Vazhayil et al., 2018] Vazhayil, A., Vinayakumar, R., & Soman, K. (2018). Comparative Study of the Detection of Malicious URLs Using Shallow and Deep Networks. 2018 9th International Conference on Computing, Communication and Networking Technologies (ICCCNT). doi:10.1109/icccnt.2018.8494159 | |
dc.relation | [Verma et al., 2015] Verma, R., Kantarcioglu, M., Marchette, D., Leiss, E., & Solorio, T. (2015). Security Analytics: Essential Data Analytics Knowledge for Cybersecurity Professionals and Students. IEEE Security & Privacy, 13(6), 60–65. doi:10.1109/msp.2015.121 | |
dc.relation | [Verma and Dyer, 2015] Verma, R., & Dyer, K. (2015). On the Character of Phishing URLs. Proceedings of the 5th ACM Conference on Data and Application Security and Privacy - CODASPY ’15. doi:10.1145/2699026.2699115 | |
dc.relation | [Wainberg et al., 2016] Wainberg, M., Alipanahi, B., & Frey, B. J. (2016). Are random forests truly the best classifiers? Journal of Machine Learning Research, 17, 1–5 | |
dc.relation | [W3C, s.f.] W3C, World Wide Web Consortium (n.d.). Architecture of the World Wide Web, Volume One. [Online]. Available: https://www.w3.org/TR/webarch/ | |
dc.relation | [Yuan et al., 2014] Yuan, Z., Lu, Y., Wang, Z., & Xue, Y. (2014). Droid-Sec. ACM SIGCOMM Computer Communication Review, 44(4), 371–372. doi:10.1145/2740070.2631434 | |
dc.relation | [Zhang et al., 2008] Zhang, J., Porras, P., & Ullrich, J. (2008). Highly predictive blacklisting. In Proceedings of the 17th USENIX Security Symposium (pp. 107–122). USENIX Association. | |
dc.relation | [Zhang et al., 2011] Zhang, W., Ding, Y.-X., Tang, Y., & Zhao, B. (2011). Malicious web page detection based on on-line learning algorithm. 2011 International Conference on Machine Learning and Cybernetics. doi:10.1109/icmlc.2011.6016954 | |
dc.relation | [Zhao et al., 2018] Zhao, J., Wang, N., Ma, Q., & Cheng, Z. (2018). Classifying Malicious URLs Using Gated Recurrent Neural Networks. Advances in Intelligent Systems and Computing, 385–394. doi:10.1007/978-3-319-93554-6_36 | |
dc.rights | Atribución-NoComercial-SinDerivadas 4.0 Internacional | |
dc.rights | http://creativecommons.org/licenses/by-nc-nd/4.0/ | |
dc.rights | info:eu-repo/semantics/openAccess | |
dc.rights | Derechos reservados de autor, 2021 | |
dc.title | Detección de URLs maliciosas por medio de técnicas de aprendizaje automático | |
dc.type | Trabajo de grado - Maestría | |