info:eu-repo/semantics/article
Biases with the generalized euclidean distance measure in disparity analyses with high levels of missing data
Date
2019-05
Record in:
Lehmann, Oscar Emilio Rodrigo; Ezcurra, Martin Daniel; Butler, Richard J.; Lloyd, Graeme Thomas; Biases with the generalized euclidean distance measure in disparity analyses with high levels of missing data; Wiley Blackwell Publishing, Inc; Palaeontology; 62; 5; 5-2019; 837-849
0031-0239
CONICET Digital
CONICET
Authors
Lehmann, Oscar Emilio Rodrigo
Ezcurra, Martin Daniel
Butler, Richard J.
Lloyd, Graeme Thomas
Abstract
The Generalized Euclidean Distance (GED) measure has been extensively used to conduct morphological disparity analyses based on palaeontological matrices of discrete characters. This is in part because some implementations allow the use of morphological matrices with high percentages of missing data without needing to prune taxa for a subsequent ordination of the data set. Previous studies have suggested that this way of using the GED may generate a bias in the resulting morphospace, but a detailed study of this possible effect has been lacking. Here, we test whether the percentage of missing data for a taxon artificially influences its position in the morphospace, and if missing data affects pre- and post-ordination disparity measures. We find that this use of the GED creates a systematic bias, whereby taxa with higher percentages of missing data are placed closer to the centre of the morphospace than those with more complete scorings. This bias extends into pre- and post-ordination calculations of disparity measures and can lead to erroneous interpretations of disparity patterns, especially if specimens present in a particular time interval or clade have distinct proportions of missing information. We suggest that this implementation of the GED should be used with caution, especially in cases with high percentages of missing data. Results recovered using an alternative distance measure, Maximum Observed Rescaled Distance (MORD), are more robust to missing data. As a consequence, we suggest that MORD is a more appropriate distance measure than GED when analysing data sets with high amounts of missing data.
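The direction of the bias described in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation or any particular software package's. It assumes binary, unordered, equally weighted characters; a simplified GED in which every incomparable character is filled with the matrix-wide mean per-character difference; and a MORD computed as the summed observed differences divided by the maximum possible difference over the comparable characters only. All function and variable names are hypothetical.

```python
import math

def char_diff(a, b):
    """Difference for one binary character; None if either score is missing."""
    if a is None or b is None:
        return None
    return 0.0 if a == b else 1.0

def mean_fractional_difference(matrix):
    """Mean per-character difference over all comparable scores in the matrix."""
    total, count = 0.0, 0
    for i in range(len(matrix)):
        for j in range(i + 1, len(matrix)):
            for a, b in zip(matrix[i], matrix[j]):
                d = char_diff(a, b)
                if d is not None:
                    total += d
                    count += 1
    return total / count if count else 0.0

def ged(x, y, fill):
    """Simplified GED (assumption): incomparable characters contribute `fill`."""
    squared = 0.0
    for a, b in zip(x, y):
        d = char_diff(a, b)
        squared += (fill if d is None else d) ** 2
    return math.sqrt(squared)

def mord(x, y):
    """Simplified MORD (assumption): observed differences rescaled by the
    maximum possible difference over the comparable characters only."""
    diffs = [d for d in (char_diff(a, b) for a, b in zip(x, y)) if d is not None]
    return sum(diffs) / len(diffs) if diffs else None

# Hypothetical taxa: C matches B in every scored character but is 50% missing.
A = [0, 0, 0, 0]
B = [1, 1, 1, 1]
C = [1, 1, None, None]

fill = mean_fractional_difference([A, B, C])
ged_AB, ged_AC, ged_BC = ged(A, B, fill), ged(A, C, fill), ged(B, C, fill)
mord_AB, mord_AC, mord_BC = mord(A, B), mord(A, C), mord(B, C)
```

In this toy matrix the simplified GED places C closer to A than B is, and at a non-zero distance from B, although C agrees with B in every character it is scored for. The filled-in mean differences pull the incomplete taxon towards an intermediate position. The simplified MORD leaves C at distance 0 from B and at the same distance from A as B is.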