Dissertação
Viés associado ao arranjo de dados e tamanho amostral e suas implicações na acurácia da seleção indireta no melhoramento de plantas
Fecha
2017-02-20Autor
Olivoto, Tiago
Institución
Resumen
Some data arrangement methods currently used may overestimate Pearson correlation coefficient (r) among explanatory traits, increasing multicollinearity in analysis that uses multiple regression. In this sense, the aims of the present research were to reveal the impact of different data arrangement scenarios on the multicollinearity of matrices, on the efficiency of the used methods to adjust it, on the estimates of coefficients and accuracy of the path analysis, as well as to use simulations to reveal the statistical behavior of the r and the optimal sample size for estimating r between maize traits. For this, data from an experiment conducted in a randomized complete design in a 15 × 3 factorial scheme (15 maize hybrids × three growing sites), arranged in four replicates were used. The traits analyzed in five plants of each plot were: plant height, ear insertion height, diameter and length of ear, number of rows per ear, number of kernels per row, diameter and length of cob, cob diameter/ear diameter ratio, number of kernels per ear, kernel mass per ear and thousand-kernel weight. At first, three path analysis methods (traditional, with k inclusion and with the exclusion of traits) having as a dependent trait the kernel mass per ear were tested in two scenarios: 1) with the linear correlation matrix (X’X) between the traits estimated with all sampled observations, n = 900 and 2) with the X’X matrix estimated with the average value of the five sampled plants in each plot, n = 180. Subsequently, aiming to evaluate the statistical behavior of r, in addition to the two described scenarios, the average value of treatments at each site, n = 45, was also considered. In each scenario, 60 sample sizes were simulated by using bootstrap simulations with replacement. Confidence intervals for combinations of different magnitudes were estimated in each scenario and sample size. One hundred and eighty correlation matrices (three scenarios × 60 sample sizes) were estimated and the multicollinearity evaluated. The number of kernels per ear and the thousand-kernel weight presented the most expressive direct effects to kernel mass per ear (r = 0.892 and r = 0.733, respectively). The use of average values reduces the individual variance of a set of n-traits, overestimates the magnitude of the r between the trait pairs, increases the multicollinearity of the matrix, and reduces the effectiveness of the used methods to adjust it as well as the accuracy of the path coefficient estimates. The number of plants required to estimate correlation coefficients with a 95% bootstrap confidence interval is greater when all sampled observations are used and increases in the sense of combination pairs with lower magnitude. By using all sampled observations, 210 plants are sufficient to estimate r between traits of simple maize hybrids in the 95% bootstrap confidence interval < 0.30. A simple method that reduces the multicollinearity of matrices and improves the accuracy of path analysis is proposed.