dc.creatorIparraguirre Villanueva, Orlando
dc.creatorSierra Liñan, Fernando
dc.creatorHerrera Salazar, Jose Luis
dc.creatorBeltozar Clemente, Saul
dc.creatorPucuhuayla Revatta, Félix
dc.creatorZapata Paulini, Joselyn
dc.creatorCabanillas Carbonell, Michael
dc.date.accessioned2023-10-20T12:09:20Z
dc.date.accessioned2024-05-03T19:33:58Z
dc.date.available2023-10-20T12:09:20Z
dc.date.available2024-05-03T19:33:58Z
dc.date.created2023-10-20T12:09:20Z
dc.date.issued2023-01-25
dc.identifierIparraguirre, O., Sierra, F., Herrera, J. L., Beltozar, S., Pucuhuayla, F., Zapata, J., & Cabanillas, M. (2023). Search and classify topics in a corpus of text using the latent dirichlet allocation model. Indonesian Journal of Electrical Engineering and Computer Science, 30(1), 246-256. http://doi.org/10.11591/ijeecs.v30.i1.pp246-256
dc.identifierhttps://hdl.handle.net/11537/34685
dc.identifierIndonesian Journal of Electrical Engineering and Computer Science
dc.identifierhttp://doi.org/10.11591/ijeecs.v30.i1.pp246-256
dc.identifier.urihttps://repositorioslatinoamericanos.uchile.cl/handle/2250/9280556
dc.description.abstractThis work aims to discover topics in a text corpus and to classify the most relevant terms for each discovered topic. The process was performed in four steps: first, document extraction and data processing; second, labeling and training of the data; third, labeling of the unseen data; and fourth, evaluation of the model's performance. For processing, a total of 10,322 "curriculum" documents related to data science were collected from the web during 2018-2022. The latent Dirichlet allocation (LDA) model was used to analyze and structure the topics. After processing, 12 topics were generated, which allowed ranking the most relevant terms to identify the skills of each of the candidates. This work concludes that candidates interested in data science must have technical skills, including mastery of Structured Query Language (SQL), mastery of programming languages such as R, Python, and Java, and data management, among other tools associated with the technology.
dc.languagespa
dc.publisherInstitute of Advanced Engineering and Science
dc.publisherPE
dc.rightshttps://creativecommons.org/licenses/by-nc-sa/3.0/us/
dc.rightsinfo:eu-repo/semantics/openAccess
dc.rightsAttribution-NonCommercial-ShareAlike 3.0 United States
dc.sourceUniversidad Privada del Norte
dc.sourceRepositorio Institucional - UPN
dc.subjectDiscovering
dc.subjectLatent Dirichlet allocation
dc.subjectText corpus
dc.titleSearch and classify topics in a corpus of text using the latent dirichlet allocation model
dc.typeinfo:eu-repo/semantics/article

