dc.contributorElenter Juan, Universidad de la República (Uruguay). Facultad de Ingeniería.
dc.contributorEtchebarne Guillermo, Universidad de la República (Uruguay). Facultad de Ingeniería.
dc.contributorHounie Ignacio, Universidad de la República (Uruguay). Facultad de Ingeniería.
dc.contributorFariello María Inés, Universidad de la República (Uruguay). Facultad de Ingeniería.
dc.contributorLecumberry Federico, Universidad de la República (Uruguay). Facultad de Ingeniería.
dc.creatorElenter, Juan
dc.creatorEtchebarne, Guillermo
dc.creatorHounie, Ignacio
dc.creatorFariello, María Inés
dc.creatorLecumberry, Federico
dc.date.accessioned2023-04-26T11:47:08Z
dc.date.accessioned2023-07-13T17:32:02Z
dc.date.available2023-04-26T11:47:08Z
dc.date.available2023-07-13T17:32:02Z
dc.date.created2023-04-26T11:47:08Z
dc.date.issued2021
dc.identifierElenter, J., Etchebarne, G., Hounie, I. et al. Machine Learning methods for genome enabled prediction of complex traits [online]. Poster, 2021.
dc.identifierhttps://meetings.cshl.edu/meetings.aspx?meet=PROBGEN&year=21
dc.identifierhttps://hdl.handle.net/20.500.12008/36814
dc.identifier.urihttps://repositorioslatinoamericanos.uchile.cl/handle/2250/7425354
dc.description.abstractA plethora of machine learning and statistical methods have been applied to genome-enabled prediction. Here we address the prediction of complex traits from SNP marker data in agriculture. The datasets used present different levels of trait complexity: yeast yield, Holstein cattle milk yield, German bulls' sire conception rate, and wheat yield. Population structure and the numbers of samples and SNPs also vary among datasets. We benchmark several popular models, including Bayesian and penalized linear regressions, kernel methods, and decision tree ensembles. Through exhaustive hyperparameter tuning we outperform state-of-the-art results on all datasets. Furthermore, we compare two genome codifications: one-hot encoding and additive encoding, the latter being the standard codification used in quantitative genetics. We show that, in these datasets, additive encoding outperforms categorical encodings even though the variables are categorical in nature. This difference in performance may be caused by the predominance of additive effects, the increase in dimensionality, and the loss of the one-to-one correspondence between variables and biological markers. Regarding robustness to random marker elimination, we found that on all datasets most models present a negligible loss in predictive power even when trained on a small, random sample of markers. We argue that sample size limits the number of SNPs that are informative with respect to the downstream prediction task.
dc.languageen
dc.publisherCold Spring Harbor Laboratory (CSHL)
dc.relationProbabilistic Modeling in Genomics: Virtual Meeting, 14-16 Apr. 2021, Cold Spring Harbor, NY, USA.
dc.rightsCreative Commons Attribution - NonCommercial - NoDerivatives License (CC BY-NC-ND 4.0)
dc.rightsWorks deposited in the Repository are governed by the Ordinance on Intellectual Property Rights of the Universidad de la República (Res. No. 91 of the C.D.C., 8/III/1994 – D.O. 7/IV/1994) and by the Ordinance of the Open Repository of the Universidad de la República (Res. No. 16 of the C.D.C., 07/10/2014)
dc.subjectGenomic prediction
dc.subjectMachine learning
dc.subjectDimensionality reduction
dc.titleMachine Learning methods for genome enabled prediction of complex traits : Benchmarking and robustness to marker elimination
dc.typePoster
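
The abstract contrasts additive and one-hot codifications of SNP genotypes and describes training on a random sample of markers. The sketch below illustrates those three ideas only; it is not the authors' code. It assumes diploid genotypes already coded as allele counts 0, 1 or 2, and the function names (`additive_encoding`, `one_hot_encoding`, `random_marker_subset`) and the toy data are hypothetical.

```python
import numpy as np

def additive_encoding(genotypes):
    """One numeric column per SNP holding the allele count (assumed 0, 1 or 2)."""
    return np.asarray(genotypes, dtype=float)

def one_hot_encoding(genotypes, n_classes=3):
    """One indicator column per (SNP, genotype class) pair.

    The feature count grows from n_snps to n_snps * n_classes, which is the
    dimensionality increase mentioned in the abstract.
    """
    g = np.asarray(genotypes, dtype=int)
    n_samples, n_snps = g.shape
    # Indexing the identity matrix by genotype yields shape (samples, snps, classes).
    return np.eye(n_classes)[g].reshape(n_samples, n_snps * n_classes)

def random_marker_subset(X, fraction, rng):
    """Keep a uniformly random fraction of the marker columns of X."""
    n_snps = X.shape[1]
    n_keep = max(1, int(round(fraction * n_snps)))
    keep = np.sort(rng.choice(n_snps, size=n_keep, replace=False))
    return X[:, keep], keep

# Toy data (made up for illustration): 5 samples, 4 SNPs.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(5, 4))

X_add = additive_encoding(genotypes)                 # shape (5, 4)
X_hot = one_hot_encoding(genotypes)                  # shape (5, 12)
X_sub, kept = random_marker_subset(X_add, 0.5, rng)  # shape (5, 2)
```

Under additive encoding each column still corresponds to exactly one marker, whereas under one-hot encoding each marker is spread over three columns; a model would then be fitted on `X_add`, `X_hot` or `X_sub` and compared by predictive accuracy.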

