dc.contributorLinares Vásquez, Mario
dc.contributorMojica Hanke, Anamaría Irmgard
dc.creatorCabra Acela, Laura Helena
dc.date.accessioned2023-01-31T19:12:34Z
dc.date.accessioned2023-09-07T02:14:23Z
dc.date.available2023-01-31T19:12:34Z
dc.date.available2023-09-07T02:14:23Z
dc.date.created2023-01-31T19:12:34Z
dc.date.issued2022-12-15
dc.identifierhttp://hdl.handle.net/1992/64399
dc.identifierinstname:Universidad de los Andes
dc.identifierreponame:Repositorio Institucional Séneca
dc.identifierrepourl:https://repositorio.uniandes.edu.co/
dc.identifier.urihttps://repositorioslatinoamericanos.uchile.cl/handle/2250/8729082
dc.description.abstractIn this project, we propose a tool that lets developers search for good machine learning (ML) practices suited to the software engineering (SE) assignments they are working on. We expect this tool to make good ML practices easily accessible and to promote their use. To this end, we defined a structure that describes the relationships between stages of the ML pipeline, tasks, and good practices. We also implemented and validated an information retrieval (IR) model over the gathered good practices, and we developed and validated a platform that allows users to search for good practices in ML for SE. This platform includes three main features: (i) a search bar that uses the implemented IR model; (ii) a tool to filter the practices by task; and (iii) an interactive tool that organizes the information by the relationships between stages, tasks, and practices.
dc.languageeng
dc.publisherUniversidad de los Andes
dc.publisherIngeniería de Sistemas y Computación
dc.publisherFacultad de Ingeniería
dc.publisherDepartamento de Ingeniería Sistemas y Computación
dc.relationM. Alshangiti, H. Sapkota, P. K. Murukannaiah, X. Liu, and Q. Yu. "Why is Developing Machine Learning Applications Challenging? A Study on Stack Overflow Posts". In: 2019 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). 2019, pp. 1-11 (cit. on p. 3)
dc.relationSaleema Amershi, Andrew Begel, Christian Bird, et al. "Software engineering for machine learning: A case study". In: 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE. 2019, pp. 291-300 (cit. on pp. 3, 9, 23)
dc.relationAWS. Monitor, detect, and handle model performance degradation (cit. on pp. 26, 27)
dc.relationStella Biderman and Walter J Scheirer. "Pitfalls in machine learning research: Reexamining the development cycle". In: (2020) (cit. on p. 3)
dc.relationSteven Bird, Ewan Klein, and Edward Loper. Natural language processing with Python: analyzing text with the natural language toolkit. O'Reilly Media, Inc., 2009 (cit. on p. 9)
dc.relationDavid M Blei, Andrew Y Ng, and Michael I Jordan. "Latent dirichlet allocation". In: Journal of machine Learning research 3.Jan (2003), pp. 993-1022 (cit. on p. 9)
dc.relationSurajit Chaudhuri, Gautam Das, Vagelis Hristidis, and Gerhard Weikum. "Probabilistic information retrieval approach for ranking of database query results". In: ACM Transactions on Database Systems (TODS) 31.3 (2006), pp. 1134-1168 (cit. on p. 4)
dc.relationJai Raj Choudhary. What is model validation. 2020 (cit. on pp. 26, 27)
dc.relationCloudFactory. The Ultimate Guide to data labeling for machine learning (cit. on pp. 26, 27)
dc.relationEuropean Commission. HIGH-LEVEL EXPERT GROUP ON ARTIFICIAL INTELLIGENCE. 2019 (cit. on p. 3)
dc.relationDatagen. Model training. 2022 (cit. on pp. 26, 27)
dc.relationdewangNautiyal. ML: Underfitting and overfitting. 2022 (cit. on pp. 26, 27)
dc.relationUniversidad Duke. Model maintenance (cit. on pp. 26, 27)
dc.relationDavide Falessi, Natalia Juristo, Claes Wohlin, et al. "Empirical software engineering experts on the use of students and professionals in experiments". In: Empirical Software Engineering 23.1 (2018), pp. 452-489 (cit. on p. 17)
dc.relationRobert Feldt, Thomas Zimmermann, Gunnar R Bergersen, et al. "Four commentaries on the use of students and professionals in empirical software engineering experiments". In: Empirical Software Engineering 23.6 (2018), pp. 3801-3820 (cit. on p. 17)
dc.relationGoogle. Creating instructions for human labelers (cit. on pp. 26, 27)
dc.relationGoogle. Introduction to transforming data (cit. on pp. 26, 27)
dc.relationBingbing Jiang, Zhengyu Li, Huanhuan Chen, and Anthony G Cohn. "Latent topic text representation learning on statistical manifold". In: IEEE transactions on neural networks and learning systems 29.11 (2018), pp. 5643-5654 (cit. on p. 8)
dc.relationMarkku Lahtela and Philip (Provenance) Kaplan. What is data labeling. 1966 (cit. on pp. 26, 27)
dc.relationSeok Won Lee and David C Rine. "Missing requirements and relationship discovery through proxy viewpoints model". In: Proceedings of the 2004 ACM symposium on Applied Computing. 2004, pp. 1513-1518 (cit. on pp. 4, 5)
dc.relationMichael A. Lones. "How to avoid machine learning pitfalls: a guide for academic researchers". In: CoRR abs/2108.02497 (2021). arXiv: 2108.02497 (cit. on pp. 3, 23)
dc.relationLotame. What are the methods of data collection?: How to collect data. 2022 (cit. on pp. 26, 27)
dc.relationAndrea De Lucia, Fausto Fasano, Rocco Oliveto, and Genoveffa Tortora. "Recovering traceability links in software artifact management systems using information retrieval methods". In: ACM Transactions on Software Engineering and Methodology (TOSEM) 16.4 (2007), 13 es (cit. on pp. 4, 5)
dc.relationAnamaria Mojica-Hanke, Andrea Bayona, Mario Linares-Vásquez, Steffen Herbold, and Fabio A. González. What are the Machine Learning best practices reported by practitioners on Stack Exchange? (Cit. on pp. 4, 9)
dc.relationNicolás Munar González and Nicolás Tobo Urrutia. "Software best practices for machine learning." In: 2022 (cit. on p. 4)
dc.relationGoogle PAIR. People + AI Guidebook. 2021 (cit. on pp. 3, 4, 9)
dc.relationHarshil Patel. What is feature engineering-importance, tools and techniques for machine learning. 2021 (cit. on pp. 26, 27)
dc.relationMartin F Porter. "An algorithm for suffix stripping". In: Program (1980) (cit. on p. 9)
dc.relationStephen Robertson, Hugo Zaragoza, et al. "The probabilistic relevance framework: BM25 and beyond". In: Foundations and Trends® in Information Retrieval 3.4 (2009), pp. 333-389 (cit. on p. 9)
dc.relationGerard Salton, Anita Wong, and Chung-Shu Yang. "A vector space model for automatic indexing". In: Communications of the ACM 18.11 (1975), pp. 613-620 (cit. on pp. 8, 9)
dc.relationAlex Serban, Koen van der Blom, Holger Hoos, and Joost Visser. "Adoption and effects of software engineering best practices in machine learning". In: Proceedings of the 14th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). 2020, pp. 1-12 (cit. on pp. 3, 23)
dc.relationDeval Shah. The Essential Guide to data augmentation in Deep Learning (cit. on pp. 26, 27)
dc.relationEric J Stierna and Neil C Rowe. "Applying information-retrieval methods to software reuse: a case study". In: Information processing & management 39.1 (2003), pp. 67-74 (cit. on pp. 4, 5)
dc.relationSuperAnnotate. The Ultimate Guide to Data Labeling: How to label data for ML (cit. on pp. 26, 27)
dc.relationTableau. Guide to data cleaning: Definition, benefits, components, and how to clean your data (cit. on pp. 26, 27)
dc.relationTalend. What is data profiling? data profiling tools and examples (cit. on pp. 26, 27)
dc.relationCFI Team. Data Anonymization. 2022 (cit. on pp. 26, 27)
dc.relationMichail Vlachos. "Dimensionality Reduction". In: Encyclopedia of Machine Learning. Ed. by Claude Sammut and Geoffrey I. Webb. Boston, MA: Springer US, 2010, pp. 274-279 (cit. on pp. 26, 27)
dc.relationKathleen Walch. How to build a machine learning model in 7 steps: TechTarget. 2021 (cit. on pp. 26, 27)
dc.relationDavid Weedmark. A 4-step guide to machine learning model deployment. 2022 (cit. on pp. 26, 27)
dc.relationBrett Wujek, Patrick Hall, and Funda Gunes. "Best practices for machine learning applications". In: SAS Institute Inc (2016) (cit. on p. 3)
dc.relationHaining Yao, Letha H Etzkorn, and Shamsnaz Virani. "Automated classification and retrieval of reusable software components". In: Journal of the American society for information science and technology 59.4 (2008), pp. 613-627 (cit. on pp. 4, 5)
dc.relationMartin Zinkevich. Rules of machine learning: Best Practices for ML Engineering. 2021 (cit. on p. 3)
dc.rightsAttribution-ShareAlike 4.0 International
dc.rightshttp://creativecommons.org/licenses/by-sa/4.0/
dc.rightsinfo:eu-repo/semantics/openAccess
dc.rightshttp://purl.org/coar/access_right/c_abf2
dc.titleSpärck: Information retrieval system of machine learning good practices for software engineering
dc.typeUndergraduate thesis

