dc.contributorIzbicki, Rafael
dc.contributorhttp://lattes.cnpq.br/9991192137633896
dc.contributorhttp://lattes.cnpq.br/5022046007587066
dc.creatorVaz, Afonso Fernandes
dc.date.accessioned2018-07-18T17:28:34Z
dc.date.available2018-07-18T17:28:34Z
dc.date.created2018-07-18T17:28:34Z
dc.date.issued2018-05-17
dc.identifierVAZ, Afonso Fernandes. Quantificação em problemas com mudança de domínio. 2018. Dissertation (Master's in Statistics) – Universidade Federal de São Carlos, São Carlos, 2018. Available at: https://repositorio.ufscar.br/handle/ufscar/10300.
dc.identifierhttps://repositorio.ufscar.br/handle/ufscar/10300
dc.description.abstractSeveral machine learning applications use classifiers to quantify the prevalence of positive class labels in a target dataset, a task named quantification. For instance, a naive way of determining what proportion of the reviews about a given product on Facebook, where no labeled reviews are available, are positive is to (i) train a classifier on Google Shopping reviews to predict whether a user likes a product given their review, and then (ii) apply this classifier to Facebook posts about that product. Unfortunately, it is well known that such a two-step approach, named Classify and Count, fails because of dataset shift, and thus several improvements have recently been proposed under an assumption named prior shift. However, these methods only explore the relationship between the covariates and the response via classifiers, and none of them takes advantage of the fact that one often has access to a few labeled samples in the target set. Moreover, the literature lacks approaches that can handle a target population that varies with another covariate; for instance, how can one accurately estimate how the proportion of new posts or new webpages in favor of a political candidate varies over time? We propose novel methods that fill these important gaps and compare them using both real and artificial datasets. Finally, we provide a theoretical analysis of the methods.
dc.languageeng
dc.publisherUniversidade Federal de São Carlos
dc.publisherUFSCar
dc.publisherPrograma Interinstitucional de Pós-Graduação em Estatística - PIPGEs
dc.publisherCâmpus São Carlos
dc.rightsOpen access
dc.subjectQuantificação
dc.subjectMudança de domínio
dc.subjectAprendizado de máquina
dc.subjectQuantification
dc.subjectDataset shift
dc.subjectPrior shift
dc.subjectMachine Learning
dc.titleQuantificação em problemas com mudança de domínio
dc.typeThesis

