ARTICLE
Water quality assessment with emphasis in parameter optimisation using pattern recognition methods and genetic algorithm
Date
2017
Author
Sotomayor Valarezo, Gonzalo Patricio
Hampel, Henrietta
Vazquez Zambrano, Raul Fernando
Abstract
Two pattern recognition algorithms, an unsupervised one (k-means) and a supervised one (k-Nearest Neighbour
combined with genetic algorithm optimisation, k-NN/GA), were applied to evaluate and interpret a large,
complex matrix of water quality (WQ) data collected over five years (2008 and 2010-2013) in the Paute river
basin (southern Ecuador). Twenty-one physical, chemical and microbiological parameters collected at 80 WQ
sampling stations were examined. First, the k-means algorithm was used to group the sampling stations into
classes according to their WQ status, guided by three internal validation indexes: the Silhouette
coefficient, the Davies-Bouldin index and the Caliński-Harabasz index. As a result, two WQ classes were
identified, representing low (C1) and high (C2) pollution. The k-NN/GA algorithm was then applied to the
available data to build a classification model, with the two WQ classes previously defined by k-means as
the dependent variable and the 21 physical, chemical and microbiological parameters as the independent
variables. This algorithm reduced the multidimensional space of independent variables from 21 to only nine
parameters, which are likely to explain most of the structure of the two identified WQ classes: electrical
conductivity, faecal coliforms, dissolved oxygen, chlorides, total hardness, nitrate, total alkalinity,
biochemical oxygen demand and turbidity. Furthermore, the land use cover of the study basin agreed very well
with the spatial distribution of WQ suggested by the k-means algorithm, confirming the credibility of the
main results of the WQ data mining approach used.