dc.contributorSidorov, Grigori
dc.contributorAdeel Nawab, Rao Muhammad
dc.creatorAmeer, Iqra
dc.date.accessioned2018-03-22T22:47:36Z
dc.date.accessioned2023-06-28T22:02:46Z
dc.date.available2018-03-22T22:47:36Z
dc.date.available2023-06-28T22:02:46Z
dc.date.created2018-03-22T22:47:36Z
dc.date.issued2018-03-08
dc.identifierAmeer, Iqra. (2017). Cross genre author profilling using syntactic N-Grams. (Maestría en Ciencias de la Computación). Instituto Politécnico Nacional, Centro de Investigación en Computación, México.
dc.identifierhttp://tesis.ipn.mx/handle/123456789/24319
dc.identifier.urihttps://repositorioslatinoamericanos.uchile.cl/handle/2250/7128826
dc.description.abstractABSTRACT: Author profiling is the automatic identification of an author’s demographic traits, such as gender, age, native language, geographical location, and personality type, from his or her written text. Rapid technological growth raises many challenging research problems, and author profiling is one of them; it has become important in fields such as forensic linguistics, marketing, and security. Most text is now online, where people write and share their opinions and ideas behind a curtain of anonymity. In recent years, online social platforms such as Twitter, Facebook, blogs, and hotel review sites have expanded remarkably and have allowed users of all age groups to develop and maintain personal and professional relationships. However, a shared characteristic of these platforms is that it is easy to provide a false name, age, gender, and location in order to hide one’s true identity, which gives criminals such as pedophiles new ways to groom their victims. The aim of this research is therefore to predict the demographic traits of authors in an existing benchmark corpus built from Twitter, hotel review, social media, and blog profiles. We explored state-of-the-art techniques for detecting three author traits, including age and gender. We used four sets of features: syntactic n-grams of part-of-speech tags, traditional n-grams of part-of-speech tags, combinations of word n-grams, and combinations of character n-grams. To detect an author’s demographic information from his or her text, we applied information gain as the feature selection method to select the most discriminative features. We used word unigrams and character trigrams as the baseline approach, and we compared our results with both the baseline and state-of-the-art results on the same corpora. Evaluation was carried out using the accuracy measure. Results showed that these approaches are useful in detecting different author traits and that performance improves when combinations of word n-grams are used.
dc.languageen
dc.subjectAutomatic identification
dc.subjectGeographic location
dc.titleCross genre author profilling using syntactic N-Grams

