dc.contributor | Niño Vasquez, Luis Fernando | |
dc.contributor | LABORATORIO DE INVESTIGACIÓN EN SISTEMAS INTELIGENTES - LISI | |
dc.creator | Calero Espinosa, Juan Camilo | |
dc.date.accessioned | 2021-05-26T16:54:28Z | |
dc.date.available | 2021-05-26T16:54:28Z | |
dc.date.created | 2021-05-26T16:54:28Z | |
dc.date.issued | 2021 | |
dc.identifier | https://repositorio.unal.edu.co/handle/unal/79567 | |
dc.identifier | Universidad Nacional de Colombia | |
dc.identifier | Repositorio Institucional Universidad Nacional de Colombia | |
dc.identifier | https://repositorio.unal.edu.co/ | |
dc.description.abstract | Topic detection on a large corpus of documents requires a considerable amount of computational resources, and the number of topics increases the burden as well. However, even a large number of topics might not be as specific as desired, or topic quality may simply start to decrease beyond a certain number of topics. To overcome these obstacles, we propose a new methodology for hierarchical topic detection, which uses multi-view clustering to link different topic models extracted from the named entities and part-of-speech tags of the documents. Results on three different datasets show that the methodology decreases the memory cost of topic detection, improves topic quality, and allows the detection of more topics. | |
dc.description.abstract | La detección de temas en grandes colecciones de documentos requiere una considerable cantidad de recursos computacionales, y el número de temas también puede aumentar la carga computacional. Incluso con un elevado número de temas, estos pueden no ser tan específicos como se desea, o simplemente la calidad de los temas comienza a disminuir después de cierto número. Para superar estos obstáculos, proponemos una nueva metodología para la detección jerárquica de temas, que utiliza agrupamiento multi-vista para vincular diferentes modelos de temas extraídos de las partes del discurso y de las entidades nombradas de los documentos. Los resultados en tres conjuntos de documentos muestran que la metodología disminuye el costo en memoria de la detección de temas, permitiendo detectar más temas y al mismo tiempo mejorar su calidad. | |
dc.language | eng | |
dc.publisher | Universidad Nacional de Colombia | |
dc.publisher | Bogotá - Ingeniería - Maestría en Ingeniería - Ingeniería de Sistemas y Computación | |
dc.publisher | Departamento de Ingeniería de Sistemas e Industrial | |
dc.publisher | Facultad de Ingeniería | |
dc.publisher | Bogotá | |
dc.publisher | Universidad Nacional de Colombia - Sede Bogotá | |
dc.relation | Stephen E. Palmer. “Hierarchical structure in perceptual representation”. In: Cognitive Psychology 9.4 (Oct. 1977), pp. 441–474. issn: 0010-0285. doi: 10.1016/0010-0285(77)90016-0. url: https://www.sciencedirect.com/science/article/pii/0010028577900160. | |
dc.relation | E. Wachsmuth, M. W. Oram, and D. I. Perrett. “Recognition of Objects and Their Component Parts: Responses of Single Units in the Temporal Cortex of the Macaque”. In: Cerebral Cortex 4.5 (Sept. 1994), pp. 509–522. issn: 1047-3211. doi: 10.1093/cercor/4.5.509. url: https://academic.oup.com/cercor/article-lookup/doi/10.1093/cercor/4.5.509. | |
dc.relation | N K Logothetis and D L Sheinberg. “Visual Object Recognition”. In: Annual Review of Neuroscience 19.1 (Mar. 1996), pp. 577–621. issn: 0147-006X. doi: 10.1146/annurev.ne.19.030196.003045. url: http://www.annualreviews.org/doi/10.1146/annurev.ne.19.030196.003045. | |
dc.relation | Daniel D. Lee and H. Sebastian Seung. “Learning the parts of objects by non-negative matrix factorization”. In: Nature 401.6755 (Oct. 1999), pp. 788–791. issn: 00280836. doi: 10.1038/44565. url: http://www.nature.com/articles/44565. | |
dc.relation | David M. Blei, Andrew Y. Ng, and Michael I. Jordan. “Latent Dirichlet Allocation”. In: Journal of Machine Learning Research 3.Jan (2003), pp. 993–1022. issn: 1533-7928. url: http://www.jmlr.org/papers/v3/blei03a.html. | |
dc.relation | Thomas L. Griffiths et al. “Hierarchical Topic Models and the Nested Chinese Restaurant Process”. In: Advances in Neural Information Processing Systems (2003), pp. 17–24. url: https://papers.nips.cc/paper/2466-hierarchical-topic-models-and-the-nested-chinese%20-restaurant-process.pdf. | |
dc.relation | Stella X. Yu and Jianbo Shi. “Multiclass spectral clustering”. In: Proceedings of the IEEE International Conference on Computer Vision. Vol. 1. Institute of Electrical and Electronics Engineers Inc., 2003, pp. 313–319. doi: 10.1109/iccv.2003.1238361. url: https://ieeexplore.ieee.org/abstract/document/1238361. | |
dc.relation | S. Bickel and T. Scheffer. “Multi-View Clustering”. In: Fourth IEEE International Conference on Data Mining (ICDM’04). IEEE, 2004, pp. 19–26. isbn: 0-7695-2142-8. doi: 10.1109/ICDM.2004.10095. url: http://ieeexplore.ieee.org/document/1410262/. | |
dc.relation | Nevin L. Zhang. Hierarchical Latent Class Models for Cluster Analysis. Tech. rep. 2004, pp. 697–723. url: https://www.jmlr.org/papers/volume5/zhang04a/zhang04a.pdf. | |
dc.relation | David Newman, Chaitanya Chemudugunta, and Padhraic Smyth. “Statistical entity-topic models”. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Vol. 2006. Association for Computing Machinery, 2006, pp. 680–686. isbn: 1595933395. doi: 10.1145/1150402.1150487. | |
dc.relation | Wei Li and Andrew McCallum. “Pachinko allocation: DAG-structured mixture models of topic correlations”. In: ACM International Conference Proceeding Series. Vol. 148. 2006, pp. 577–584. isbn: 1595933832. doi: 10.1145/1143844.1143917. url: https://dl.acm.org/doi/abs/10.1145/1143844.1143917. | |
dc.relation | David M. Blei, Thomas L. Griffiths, and Michael I. Jordan. “The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies”. In: (Oct. 2007). url: https://arxiv.org/abs/0710.0845. | |
dc.relation | David Mimno, Wei Li, and Andrew McCallum. “Mixtures of hierarchical topics with Pachinko allocation”. In: ACM International Conference Proceeding Series. Vol. 227. 2007, pp. 633–640. doi: 10.1145/1273496.1273576. url: https://dl.acm.org/doi/abs/10.1145/1273496.1273576. | |
dc.relation | Tomoaki Nakamura, Takayuki Nagai, and Naoto Iwahashi. “Multimodal object categorization by a robot”. In: 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, Oct. 2007, pp. 2415–2420. isbn: 978-1-4244-0911-2. doi: 10.1109/IROS.2007.4399634. url: http://ieeexplore.ieee.org/document/4399634/. | |
dc.relation | Chaitanya Chemudugunta et al. “Modeling Documents by Combining Semantic Concepts with Unsupervised Statistical Learning”. In: 2008, pp. 229–244. doi: 10.1007/978-3-540-88564-1_15. | |
dc.relation | Yi Wang, Nevin L. Zhang, and Tao Chen. “Latent tree models and approximate inference in Bayesian networks”. In: Journal of Artificial Intelligence Research 32 (Aug. 2008), pp. 879–900. issn: 10769757. doi: 10.1613/jair.2530. url: https://www.jair.org/index.php/jair/article/view/10564. | |
dc.relation | Nevin L. Zhang et al. “Latent tree models and diagnosis in traditional Chinese medicine”. In: Artificial Intelligence in Medicine 42.3 (Mar. 2008), pp. 229–245. issn: 09333657. doi: 10.1016/j.artmed.2007.10.004. url: https://www.sciencedirect.com/science/article/pii/S0933365707001443. | |
dc.relation | David Andrzejewski, Xiaojin Zhu, and Mark Craven. “Incorporating domain knowledge into topic modeling via Dirichlet forest priors”. In: ACM International Conference Proceeding Series. Vol. 382. 2009. isbn: 9781605585161. doi: 10.1145/1553374.1553378. | |
dc.relation | Jonathan Chang et al. Reading Tea Leaves: How Humans Interpret Topic Models. Tech. rep. 2009. url: http://rexa.info. | |
dc.relation | Tomoaki Nakamura, Takayuki Nagai, and Naoto Iwahashi. “Grounding of word meanings in multimodal concepts using LDA”. In: 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, Oct. 2009, pp. 3943–3948. isbn: 978-1-4244-3803-7. doi: 10.1109/IROS.2009.5354736. url: http://ieeexplore.ieee.org/document/5354736/. | |
dc.relation | Guangcan Liu et al. “Robust Recovery of Subspace Structures by Low-Rank Representation”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 35.1 (Oct. 2010), pp. 171–184. doi: 10.1109/TPAMI.2012.88. url: http://arxiv.org/abs/1010.2955. | |
dc.relation | James Petterson et al. Word Features for Latent Dirichlet Allocation. Tech. rep. 2010, pp. 1921–1929. | |
dc.relation | Nakatani Shuyo. Language Detection Library for Java. 2010. url: http://code.google.com/p/language-detection/. | |
dc.relation | Abhishek Kumar and Hal Daumé III. A Co-training Approach for Multi-view Spectral Clustering. Tech. rep. 2011. url: http://legacydirs.umiacs.umd.edu/~abhishek/cospectral.icml11.pdf. | |
dc.relation | Abhishek Kumar, Piyush Rai, and Hal Daumé III. Co-regularized Multi-view Spectral Clustering. Tech. rep. 2011. | |
dc.relation | David Mimno et al. Optimizing Semantic Coherence in Topic Models. Tech. rep. 2011, pp. 262–272. url: https://www.aclweb.org/anthology/D11-1024.pdf. | |
dc.relation | Tomoaki Nakamura, Takayuki Nagai, and Naoto Iwahashi. “Bag of multimodal LDA models for concept formation”. In: 2011 IEEE International Conference on Robotics and Automation. IEEE, May 2011, pp. 6233–6238. isbn: 978-1-61284-386-5. doi: 10.1109/ICRA.2011.5980324. url: http://ieeexplore.ieee.org/document/5980324/. | |
dc.relation | Ehsan Elhamifar and Rene Vidal. “Sparse Subspace Clustering: Algorithm, Theory, and Applications”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 35.11 (Mar. 2012), pp. 2765–2781. url: http://arxiv.org/abs/1203.1005. | |
dc.relation | Jagadeesh Jagarlamudi, Hal Daumé III, and Raghavendra Udupa. Incorporating Lexical Priors into Topic Models. Tech. rep. 2012, pp. 204–213. url: https://www.aclweb.org/anthology/E12-1021.pdf. | |
dc.relation | Xiao Cai, Feiping Nie, and Heng Huang. Multi-View K-Means Clustering on Big Data. Tech. rep. 2013. url: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.415.8610&rep=rep1&type=pdf. | |
dc.relation | Zhiyuan Chen et al. “Discovering Coherent Topics Using General Knowledge”. In: dl.acm.org (2013), pp. 209–218. doi: 10.1145/2505515.2505519. url: http://dx.doi.org/10.1145/2505515.2505519. | |
dc.relation | Zhiyuan Chen et al. “Leveraging Multi-Domain Prior Knowledge in Topic Models”. In: IJCAI International Joint Conference on Artificial Intelligence. Nov. 2013, pp. 2071–2077. | |
dc.relation | Linmei Hu et al. “Incorporating entities in news topic modeling”. In: Communications in Computer and Information Science. Vol. 400. Springer Verlag, Nov. 2013, pp. 139–150. isbn: 9783642416439. doi: 10.1007/978-3-642-41644-6_14. url: https://link.springer.com/chapter/10.1007/978-3-642-41644-6_14. | |
dc.relation | Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. “Linguistic Regularities in Continuous Space Word Representations”. In: June (2013), pp. 746–751. | |
dc.relation | Tomas Mikolov et al. Distributed Representations of Words and Phrases and their Compositionality. Tech. rep. 2013. url: http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and. | |
dc.relation | Tomas Mikolov et al. “Efficient estimation of word representations in vector space”. In: 1st International Conference on Learning Representations, ICLR 2013 - Workshop Track Proceedings. International Conference on Learning Representations, ICLR, Jan. 2013. | |
dc.relation | Konstantinos N. Vavliakis, Andreas L. Symeonidis, and Pericles A. Mitkas. “Event identification in web social media through named entity recognition and topic modeling”. In: Data and Knowledge Engineering 88 (Nov. 2013), pp. 1–24. issn: 0169023X. doi: 10.1016/j.datak.2013.08.006. | |
dc.relation | Yuening Hu et al. “Interactive topic modeling”. In: Mach Learn 95 (2014), pp. 423–469. doi: 10.1007/s10994-013-5413-0. url: http://www.policyagendas.org/page/topic-codebook. | |
dc.relation | Yeqing Li et al. Large-Scale Multi-View Spectral Clustering with Bipartite Graph. Tech. rep. 2015. url: https://dl.acm.org/doi/10.5555/2886521.2886704. | |
dc.relation | Zechao Li et al. “Robust structured subspace learning for data representation”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 37.10 (Oct. 2015), pp. 2085–2098. issn: 01628828. doi: 10.1109/TPAMI.2015.2400461. url: https://ieeexplore.ieee.org/document/7031960. | |
dc.relation | Andrew J. McMinn and Joemon M. Jose. “Real-time entity-based event detection for twitter”. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 9283. Springer Verlag, 2015, pp. 65–77. isbn: 9783319240268. doi: 10.1007/978-3-319-24027-5_6. url: https://link.springer.com/chapter/10.1007/978-3-319-24027-5_6. | |
dc.relation | John Paisley et al. “Nested hierarchical dirichlet processes”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 37.2 (Feb. 2015), pp. 256–270. issn: 01628828. doi: 10.1109/TPAMI.2014.2318728. url: https://ieeexplore.ieee.org/abstract/document/6802355. | |
dc.relation | Zhao Zhang et al. “Joint low-rank and sparse principal feature coding for enhanced robust representation and visual classification”. In: IEEE Transactions on Image Processing 25.6 (June 2016), pp. 2429–2443. issn: 10577149. doi: 10.1109/TIP.2016.2547180. url: https://ieeexplore.ieee.org/document/7442126. | |
dc.relation | Mehdi Allahyari and Krys Kochut. “Discovering Coherent Topics with Entity Topic Models”. In: Proceedings - 2016 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2016. Institute of Electrical and Electronics Engineers Inc., Jan. 2017, pp. 26–33. isbn: 9781509044702. doi: 10.1109/WI.2016.0015. | |
dc.relation | Peixian Chen et al. “Latent Tree Models for Hierarchical Topic Detection”. In: Artificial Intelligence 250 (May 2017), pp. 105–124. url: http://arxiv.org/abs/1605.06650. | |
dc.relation | Zhourong Chen et al. Sparse Boltzmann Machines with Structure Learning as Applied to Text Analysis. Tech. rep. 2017. url: www.aaai.org. | |
dc.relation | Matthew Honnibal and Ines Montani. “spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing”. 2017. | |
dc.relation | Ashish Vaswani et al. “Transformer: Attention is all you need”. In: Advances in Neural Information Processing Systems 30 (2017), pp. 5998–6008. issn: 10495258. url: https://arxiv.org/abs/1706.03762. | |
dc.relation | Jing Zhao et al. “Multi-view learning overview: Recent progress and new challenges”. In: Information Fusion 38 (2017), pp. 43–54. issn: 15662535. doi: 10.1016/j.inffus.2017.02.007. url: http://dx.doi.org/10.1016/j.inffus.2017.02.007. | |
dc.relation | Xiaojun Chen et al. “Spectral clustering of large-scale data by directly solving normalized cut”. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, July 2018, pp. 1206–1215. isbn: 9781450355520. doi: 10.1145/3219819.3220039. url: https://dl.acm.org/doi/10.1145/3219819.3220039. | |
dc.relation | Jacob Devlin et al. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”. In: (Oct. 2018). url: http://arxiv.org/abs/1810.04805. | |
dc.relation | Zhao Kang et al. “Multi-graph Fusion for Multi-view Spectral Clustering”. In: Knowledge-Based Systems 189 (Sept. 2019). url: http://arxiv.org/abs/1909.06940. | |
dc.relation | Alec Radford et al. “Language Models are Unsupervised Multitask Learners”. In: (2019). url: http://www.persagen.com/files/misc/radford2019language.pdf. | |
dc.relation | Tom B. Brown et al. “Language Models are Few-Shot Learners”. In: arXiv (May 2020). url: http://arxiv.org/abs/2005.14165. | |
dc.rights | Reconocimiento 4.0 Internacional | |
dc.rights | http://creativecommons.org/licenses/by/4.0/ | |
dc.rights | info:eu-repo/semantics/openAccess | |
dc.title | Multi-view learning for hierarchical topic detection on corpus of documents | |
dc.type | Trabajo de grado - Maestría | |