dc.contributor | Linares Vásquez, Mario | |
dc.contributor | Mojica Hanke, Anamaría Irmgard | |
dc.creator | Cabra Acela, Laura Helena | |
dc.date.accessioned | 2023-01-31T19:12:34Z | |
dc.date.accessioned | 2023-09-07T02:14:23Z | |
dc.date.available | 2023-01-31T19:12:34Z | |
dc.date.available | 2023-09-07T02:14:23Z | |
dc.date.created | 2023-01-31T19:12:34Z | |
dc.date.issued | 2022-12-15 | |
dc.identifier | http://hdl.handle.net/1992/64399 | |
dc.identifier | instname:Universidad de los Andes | |
dc.identifier | reponame:Repositorio Institucional Séneca | |
dc.identifier | repourl:https://repositorio.uniandes.edu.co/ | |
dc.identifier.uri | https://repositorioslatinoamericanos.uchile.cl/handle/2250/8729082 | |
dc.description.abstract | In this project, we propose a tool that lets developers search for good machine learning (ML) practices appropriate for the software engineering (SE) assignments they are working on. We expect this tool to make good ML practices easily accessible and to promote their use. To this end, we defined a structure that describes the relationships between stages of the ML pipeline, tasks, and good practices. We also implemented and validated an information retrieval (IR) model over the good practices gathered. Finally, we developed and validated a platform that allows users to search for good practices in ML for SE. This platform includes three main features: (i) a search bar that uses the implemented IR model; (ii) a tool to filter the practices by task; and (iii) an interactive tool that organizes the information by the relationships between stages, tasks, and practices. | |
dc.language | eng | |
dc.publisher | Universidad de los Andes | |
dc.publisher | Ingeniería de Sistemas y Computación | |
dc.publisher | Facultad de Ingeniería | |
dc.publisher | Departamento de Ingeniería Sistemas y Computación | |
dc.relation | M. Alshangiti, H. Sapkota, P. K. Murukannaiah, X. Liu, and Q. Yu. "Why is Developing Machine Learning Applications Challenging? A Study on Stack Overflow Posts". In: 2019 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). 2019, pp. 1-11 (cit. on p. 3) | |
dc.relation | Saleema Amershi, Andrew Begel, Christian Bird, et al. "Software engineering for machine learning: A case study". In: 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE. 2019, pp. 291-300 (cit. on pp. 3, 9, 23) | |
dc.relation | AWS. Monitor, detect, and handle model performance degradation (cit. on pp. 26, 27) | |
dc.relation | Stella Biderman and Walter J Scheirer. "Pitfalls in machine learning research: Reexamining the development cycle". In: (2020) (cit. on p. 3) | |
dc.relation | Steven Bird, Ewan Klein, and Edward Loper. Natural language processing with Python: analyzing text with the natural language toolkit. O'Reilly Media, Inc., 2009 (cit. on p. 9) | |
dc.relation | David M Blei, Andrew Y Ng, and Michael I Jordan. "Latent Dirichlet allocation". In: Journal of Machine Learning Research 3 (2003), pp. 993-1022 (cit. on p. 9) | |
dc.relation | Surajit Chaudhuri, Gautam Das, Vagelis Hristidis, and Gerhard Weikum. "Probabilistic information retrieval approach for ranking of database query results". In: ACM Transactions on Database Systems (TODS) 31.3 (2006), pp. 1134-1168 (cit. on p. 4) | |
dc.relation | Jai Raj Choudhary. What is model validation. 2020 (cit. on pp. 26, 27) | |
dc.relation | CloudFactory. The Ultimate Guide to data labeling for machine learning (cit. on pp. 26, 27) | |
dc.relation | European Commission. HIGH-LEVEL EXPERT GROUP ON ARTIFICIAL INTELLIGENCE. 2019 (cit. on p. 3) | |
dc.relation | Datagen. Model training. 2022 (cit. on pp. 26, 27) | |
dc.relation | dewangNautiyal. ML: Underfitting and overfitting. 2022 (cit. on pp. 26, 27) | |
dc.relation | Duke University. Model maintenance (cit. on pp. 26, 27) | |
dc.relation | Davide Falessi, Natalia Juristo, Claes Wohlin, et al. "Empirical software engineering experts on the use of students and professionals in experiments". In: Empirical Software Engineering 23.1 (2018), pp. 452-489 (cit. on p. 17) | |
dc.relation | Robert Feldt, Thomas Zimmermann, Gunnar R Bergersen, et al. "Four commentaries on the use of students and professionals in empirical software engineering experiments". In: Empirical Software Engineering 23.6 (2018), pp. 3801-3820 (cit. on p. 17) | |
dc.relation | Google. Creating instructions for human labelers (cit. on pp. 26, 27) | |
dc.relation | Google. Introduction to transforming data (cit. on pp. 26, 27) | |
dc.relation | Bingbing Jiang, Zhengyu Li, Huanhuan Chen, and Anthony G Cohn. "Latent topic text representation learning on statistical manifold". In: IEEE transactions on neural networks and learning systems 29.11 (2018), pp. 5643-5654 (cit. on p. 8) | |
dc.relation | Markku Lahtela and Philip (Provenance) Kaplan. What is data labeling. 1966 (cit. on pp. 26, 27) | |
dc.relation | Seok Won Lee and David C Rine. "Missing requirements and relationship discovery through proxy viewpoints model". In: Proceedings of the 2004 ACM symposium on Applied Computing. 2004, pp. 1513-1518 (cit. on pp. 4, 5) | |
dc.relation | Michael A. Lones. "How to avoid machine learning pitfalls: a guide for academic researchers". In: CoRR abs/2108.02497 (2021). arXiv: 2108.02497 (cit. on pp. 3, 23) | |
dc.relation | Lotame. What are the methods of data collection?: How to collect data. 2022 (cit. on pp. 26, 27) | |
dc.relation | Andrea De Lucia, Fausto Fasano, Rocco Oliveto, and Genoveffa Tortora. "Recovering traceability links in software artifact management systems using information retrieval methods". In: ACM Transactions on Software Engineering and Methodology (TOSEM) 16.4 (2007), 13 es (cit. on pp. 4, 5) | |
dc.relation | Anamaria Mojica-Hanke, Andrea Bayona, Mario Linares-Vásquez, Steffen Herbold, and Fabio A. González. What are the Machine Learning best practices reported by practitioners on Stack Exchange? (Cit. on pp. 4, 9) | |
dc.relation | Nicolás Munar González and Nicolás Tobo Urrutia. "Software best practices for machine learning." In: 2022 (cit. on p. 4) | |
dc.relation | Google PAIR. People + AI Guidebook. 2021 (cit. on pp. 3, 4, 9) | |
dc.relation | Harshil Patel. What is feature engineering-importance, tools and techniques for machine learning. 2021 (cit. on pp. 26, 27) | |
dc.relation | Martin F Porter. "An algorithm for suffix stripping". In: Program (1980) (cit. on p. 9) | |
dc.relation | Stephen Robertson, Hugo Zaragoza, et al. "The probabilistic relevance framework: BM25 and beyond". In: Foundations and Trends® in Information Retrieval 3.4 (2009), pp. 333-389 (cit. on p. 9) | |
dc.relation | Gerard Salton, Anita Wong, and Chung-Shu Yang. "A vector space model for automatic indexing". In: Communications of the ACM 18.11 (1975), pp. 613-620 (cit. on pp. 8, 9) | |
dc.relation | Alex Serban, Koen van der Blom, Holger Hoos, and Joost Visser. "Adoption and effects of software engineering best practices in machine learning". In: Proceedings of the 14th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). 2020, pp. 1-12 (cit. on pp. 3, 23) | |
dc.relation | Deval Shah. The Essential Guide to data augmentation in Deep Learning (cit. on pp. 26, 27) | |
dc.relation | Eric J Stierna and Neil C Rowe. "Applying information-retrieval methods to software reuse: a case study". In: Information processing & management 39.1 (2003), pp. 67-74 (cit. on pp. 4, 5) | |
dc.relation | SuperAnnotate. The Ultimate Guide to Data Labeling: How to label data for ML (cit. on pp. 26, 27) | |
dc.relation | Tableau. Guide to data cleaning: Definition, benefits, components, and how to clean your data (cit. on pp. 26, 27) | |
dc.relation | Talend. What is data profiling? data profiling tools and examples (cit. on pp. 26, 27) | |
dc.relation | CFI Team. Data Anonymization. 2022 (cit. on pp. 26, 27) | |
dc.relation | Michail Vlachos. "Dimensionality Reduction". In: Encyclopedia of Machine Learning. Ed. by Claude Sammut and Geoffrey I. Webb. Boston, MA: Springer US, 2010, pp. 274-279 (cit. on pp. 26, 27) | |
dc.relation | Kathleen Walch. How to build a machine learning model in 7 steps: TechTarget. 2021 (cit. on pp. 26, 27) | |
dc.relation | David Weedmark. A 4-step guide to machine learning model deployment. 2022 (cit. on pp. 26, 27) | |
dc.relation | Brett Wujek, Patrick Hall, and Funda Gunes. "Best practices for machine learning applications". In: SAS Institute Inc (2016) (cit. on p. 3) | |
dc.relation | Haining Yao, Letha H Etzkorn, and Shamsnaz Virani. "Automated classification and retrieval of reusable software components". In: Journal of the American society for information science and technology 59.4 (2008), pp. 613-627 (cit. on pp. 4, 5) | |
dc.relation | Martin Zinkevich. Rules of machine learning: Best Practices for ML Engineering. 2021 (cit. on p. 3) | |
dc.rights | Attribution-ShareAlike 4.0 International | |
dc.rights | http://creativecommons.org/licenses/by-sa/4.0/ | |
dc.rights | info:eu-repo/semantics/openAccess | |
dc.rights | http://purl.org/coar/access_right/c_abf2 | |
dc.title | Spärck: Information retrieval system of machine learning good practices for software engineering | |
dc.type | Undergraduate thesis | |