Machine Learning methods for genome enabled prediction of complex traits : Benchmarking and robustness to marker elimination

Elenter, Juan; Etchebarne, Guillermo; Hounie, Ignacio; Fariello, María Inés; Lecumberry, Federico

dc.contributor	Elenter Juan, Universidad de la República (Uruguay). Facultad de Ingeniería.
dc.contributor	Etchebarne Guillermo, Universidad de la República (Uruguay). Facultad de Ingeniería.
dc.contributor	Hounie Ignacio, Universidad de la República (Uruguay). Facultad de Ingeniería.
dc.contributor	Fariello María Inés, Universidad de la República (Uruguay). Facultad de Ingeniería.
dc.contributor	Lecumberry Federico, Universidad de la República (Uruguay). Facultad de Ingeniería.
dc.creator	Elenter, Juan
dc.creator	Etchebarne, Guillermo
dc.creator	Hounie, Ignacio
dc.creator	Fariello, María Inés
dc.creator	Lecumberry, Federico
dc.date.accessioned	2023-04-26T11:47:08Z
dc.date.accessioned	2023-07-13T17:32:02Z
dc.date.available	2023-04-26T11:47:08Z
dc.date.available	2023-07-13T17:32:02Z
dc.date.created	2023-04-26T11:47:08Z
dc.date.issued	2021
dc.identifier	Elenter, J., Etchebarne, G., Hounie, I. et al. Machine Learning methods for genome enabled prediction of complex traits [online]. Poster, 2021.
dc.identifier	https://meetings.cshl.edu/meetings.aspx?meet=PROBGEN&year=21
dc.identifier	https://hdl.handle.net/20.500.12008/36814
dc.identifier.uri	https://repositorioslatinoamericanos.uchile.cl/handle/2250/7425354
dc.description.abstract	A plethora of machine learning and statistical methods have been applied to genome-enabled prediction. Here we address the prediction of complex traits from SNP marker data in agriculture. The datasets used present different levels of trait complexity: yeast yield, Holstein cattle milk yield, sire conception rate in German bulls, and wheat yield. Population structure, number of samples, and number of SNPs also vary among datasets. We benchmark several popular models, including Bayesian and penalized linear regressions, kernel methods, and decision tree ensembles. Through exhaustive hyperparameter tuning we outperform state-of-the-art results in all datasets. Furthermore, we compare two genome codifications: one-hot encoding and additive encoding, the latter being the standard codification used in quantitative genetics. We show that, in these datasets, additive encoding outperforms categorical encodings even though the variables are categorical in nature. This difference in performance may be caused by the predominance of additive effects, the increase in dimensionality, and the loss of the one-to-one correspondence between variables and biological markers. Regarding robustness to random marker elimination, we found that on all datasets most models show a negligible loss in predictive power even when trained on a small, random sample of markers. We argue that sample size limits the number of SNPs that are informative with respect to the downstream prediction task.
dc.language	en
dc.publisher	Cold Spring Harbor Laboratory (CSHL)
dc.relation	Probabilistic Modeling in Genomics : Virtual Meeting, 14-16 apr. 2021, Cold Spring Harbor, NY, USA.
dc.rights	Creative Commons Attribution - NonCommercial - NoDerivatives License (CC BY-NC-ND 4.0)
dc.rights	Works deposited in the Repository are governed by the Intellectual Property Rights Ordinance of the Universidad de la República (Res. No. 91 of the C.D.C., 8/III/1994 – D.O. 7/IV/1994) and by the Open Repository Ordinance of the Universidad de la República (Res. No. 16 of the C.D.C., 07/10/2014)
dc.subject	Genomic prediction
dc.subject	Machine learning
dc.subject	Dimensionality reduction
dc.title	Machine Learning methods for genome enabled prediction of complex traits : Benchmarking and robustness to marker elimination
dc.type	Poster

This item belongs to the following institution

Universidad de la República (Uruguay)
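
The abstract contrasts additive and one-hot genome codifications and describes training on a random sample of markers. The record does not include the authors' code, so the following is only a minimal sketch of those two ideas under stated assumptions: genotypes are assumed to be diploid, biallelic SNPs already coded as allele counts 0, 1 or 2, and the function names (`additive_encode`, `one_hot_encode`, `subsample_markers`) and the toy matrix are hypothetical, introduced here purely for illustration.

```python
import random


def additive_encode(genotypes):
    """Additive encoding: one feature per marker, holding the allele count (0, 1 or 2)."""
    return [list(row) for row in genotypes]


def one_hot_encode(genotypes, n_classes=3):
    """One-hot encoding: n_classes indicator features per marker."""
    encoded = []
    for row in genotypes:
        features = []
        for g in row:
            indicator = [0] * n_classes
            indicator[g] = 1  # mark the observed genotype class
            features.extend(indicator)
        encoded.append(features)
    return encoded


def subsample_markers(genotypes, fraction, seed=0):
    """Keep a random subset of marker columns (the same columns for every sample)."""
    n_markers = len(genotypes[0])
    n_keep = max(1, int(round(fraction * n_markers)))
    kept = sorted(random.Random(seed).sample(range(n_markers), n_keep))
    return [[row[j] for j in kept] for row in genotypes], kept


# Hypothetical toy genotype matrix: 3 samples x 4 SNPs (not data from the poster).
toy_genotypes = [
    [0, 1, 2, 0],
    [2, 2, 1, 0],
    [1, 0, 0, 2],
]

additive = additive_encode(toy_genotypes)
one_hot = one_hot_encode(toy_genotypes)
reduced, kept_columns = subsample_markers(toy_genotypes, fraction=0.5, seed=42)

# Feature counts per sample: additive, one-hot, and after marker elimination.
print(len(additive[0]), len(one_hot[0]), len(reduced[0]))  # prints: 4 12 2
```

With three genotype classes, one-hot encoding triples the number of features and each marker no longer maps to a single variable, which illustrates the dimensionality increase and the loss of one-to-one correspondence mentioned in the abstract. The marker subsampling here is uniform at random; the poster may have used a different elimination scheme.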