Statistical machine-learning methods for genomic prediction using the SKM library

Montesinos-Lopez, O.A.; Mosqueda-Gonzalez, B.A.; Montesinos-Lopez, A.; Crossa, J.

dc.creator	Montesinos-Lopez, O.A.
dc.creator	Mosqueda-Gonzalez, B.A.
dc.creator	Montesinos-Lopez, A.
dc.creator	Crossa, J.
dc.date	2023-06-22T20:10:11Z
dc.date	2023
dc.date.accessioned	2023-07-17T20:10:37Z
dc.date.available	2023-07-17T20:10:37Z
dc.identifier	https://hdl.handle.net/10883/22617
dc.identifier	10.3390/genes14051003
dc.identifier.uri	https://repositorioslatinoamericanos.uchile.cl/handle/2250/7514360
dc.description	Genomic selection (GS) is revolutionizing plant breeding. However, because it is a predictive methodology, a basic understanding of statistical machine-learning methods is necessary for its successful implementation. This methodology uses a reference population that contains both the phenotypic and genotypic information of genotypes to train a statistical machine-learning method. After optimization, this method is used to make predictions of candidate lines for which only genotypic information is available. However, due to a lack of time and appropriate training, it is difficult for breeders and scientists of related fields to learn all the fundamentals of prediction algorithms. With smart or highly automated software, it is possible for these professionals to appropriately implement any state-of-the-art statistical machine-learning method for their collected data without the need for an exhaustive understanding of statistical machine-learning methods and programming. For this reason, we introduce state-of-the-art statistical machine-learning methods using the Sparse Kernel Methods (SKM) R library, with complete guidelines on how to implement seven statistical machine-learning methods that are available in this library for genomic prediction (random forest, Bayesian models, support vector machine, gradient boosted machine, generalized linear models, partial least squares, feed-forward artificial neural networks). This guide includes details of the functions required to implement each of the methods, as well as others for easily implementing different tuning strategies, cross-validation strategies, and metrics to evaluate the prediction performance and different summary functions that compute it. A toy dataset illustrates how to implement statistical machine-learning methods and facilitates their use by professionals who do not possess a strong background in machine learning and programming.
dc.language	English
dc.publisher	MDPI
dc.relation	https://www.mdpi.com/article/10.3390/genes14051003/s1
dc.rights	CIMMYT manages Intellectual Assets as International Public Goods. The user is free to download, print, store and share this work. In case you want to translate or create any other derivative work and share or distribute such translation/derivative work, please contact CIMMYT-Knowledge-Center@cgiar.org indicating the work you want to use and the kind of use you intend; CIMMYT will contact you with the suitable license for that purpose
dc.rights	Open Access
dc.source	5
dc.source	14
dc.source	2073-4425
dc.source	Genes
dc.source	1003
dc.subject	AGRICULTURAL SCIENCES AND BIOTECHNOLOGY
dc.subject	Sparse Kernel Methods
dc.subject	R package
dc.subject	Statistical Machine Learning
dc.subject	Genomic Selection
dc.subject	MARKER-ASSISTED SELECTION
dc.subject	MACHINE LEARNING
dc.subject	GENOMICS
dc.subject	METHODS
dc.subject	Genetic Resources
dc.title	Statistical machine-learning methods for genomic prediction using the SKM library
dc.type	Article
dc.type	Published Version
dc.coverage	Basel (Switzerland)

This item belongs to the following institution

Centro Internacional de Mejoramiento de Maíz y Trigo (México)