Article
Machine learning approach to support taxonomic species discrimination based on helminth collections data
Citation:
BORBA, Victor Hugo et al. Machine learning approach to support taxonomic species discrimination based on helminth collections data. Parasites & Vectors, v. 14, n. 230, 15 p, 2021.
1756-3305
10.1186/s13071-021-04721-6
Authors
Borba, Victor Hugo
Martin, Coralie
Silva, José Roberto Machado
Xavier, Samanta C. C.
Mello, Flávio L. de
Iñiguez, Alena Mayo
Abstract
Background: There are more than 300 species of capillariids that parasitize various vertebrate groups worldwide.
Species identification is hindered by the small number of taxonomically informative structures available, which makes
the task laborious and genus definition controversial. As a result, capillariid taxonomy is one of the most complex
among Nematoda. Eggs are the parasitic structures most often observed in coprological analysis of both modern and
ancient samples; consequently, their presence indicates a positive diagnosis of infection. The structure of the egg
could play a role in genus or species discrimination. Institutional biological collections are taxonomic repositories
of specimens described and strictly identified by systematics specialists.
Methods: The present work aims to characterize eggs of capillariid species deposited in institutional helminth
collections and to process the morphological, morphometric and ecological data using machine learning (ML) as a new
approach for taxonomic identification. Specimens of 28 species and 8 genera deposited at Coleção Helmintológica
do Instituto Oswaldo Cruz (CHIOC, IOC/FIOCRUZ/Brazil) and Collection de Nématodes Zooparasites du Muséum
National d’Histoire Naturelle de Paris (MNHN/France) were examined under light microscopy. In the morphological
and morphometric analyses (MM), the total length and width of the eggs, as well as the plugs and shell thickness, were
considered. In addition, eggshell ornamentations and the ecological parameters of geographical location (GL) and host
(H) were included.
Results: The logistic model tree (LMT) algorithm showed the highest values in all metrics compared with the other
algorithms. Algorithm J48, alongside REPTree, produced the most reliable decision tree for species identification.
The Majority Voting algorithm showed high metric values, but combining the classifiers did not attenuate the errors
revealed by each algorithm alone. The statistical evaluation of the dataset indicated a significant difference
between trees, with the GL+H+MM and MM-only datasets achieving the best scores.
Conclusions: The present research proposed a novel procedure for taxonomic species identification, integrating data
from century-old biological collections with the logic of artificial intelligence techniques. This study will support
future research on taxonomic identification and diagnosis of both modern and archaeological capillariids.
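
The Majority Voting step mentioned in the Results can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the tie-breaking rule (first label seen wins) and the placeholder labels `species_A`, `species_B` and `species_C` are all assumptions made for illustration, and the per-classifier predictions are invented rather than taken from the study.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by most classifiers.

    Ties are broken in favour of the label that appears first
    (an assumption of this sketch, not a rule from the article).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def vote_per_sample(per_classifier_predictions):
    """Combine several prediction lists (one label per sample) sample by sample."""
    return [majority_vote(list(column)) for column in zip(*per_classifier_predictions)]


# Hypothetical predictions from three classifiers for three egg samples.
lmt = ["species_A", "species_B", "species_C"]
j48 = ["species_A", "species_C", "species_C"]
reptree = ["species_B", "species_C", "species_A"]

combined = vote_per_sample([lmt, j48, reptree])
print(combined)
```

Because the vote simply follows the majority, a sample that two of the three classifiers mislabel is still mislabelled after voting. That is one possible reading of the finding that combining classifiers did not attenuate the individual errors.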