Journal articles
Software for the detection of outliers and influential points based on the HAT method
Date
2017-04-01
Registered in:
Australian Journal of Crop Science, v. 11, n. 4, p. 459-463, 2017.
1835-2707
1835-2693
10.21475/ajcs.17.11.04.356
2-s2.0-85018362256
Author
Universidade Estadual Paulista (Unesp)
Institution
Abstract
We developed software in Visual Basic for application in Microsoft Excel that identifies outliers (OUTs) and influential data points (IPs) in scattered data using the HAT method (Hoaglin and Welsch). OUTs are commonly identified by visual inspection, which is prone to error. Identifying IPs is not trivial and requires statistical tests. HAT is the most common statistical method for selecting OUTs and IPs in regression analyses, and it sorts the data into four groups: 1) data within the standard range of variability, 2) OUTs, 3) IPs, and 4) points that are both OUTs and IPs (OUT+IPs). The decision to remove data from the database rests with the researcher, and the HAT method supports that decision. Removing an OUT usually improves the accuracy of a model. Removing IPs, however, may or may not improve the accuracy. A small hypothetical data set of rainfall from automatic and conventional rain gauges was used to extensively test the software. The amount of data the software can handle is limited by the number of rows in the Excel spreadsheet (65 518). The first step in identifying OUTs and IPs is to analyse all the data. In our example, the raw data gave an R² of 0.11, indicating a weak relationship between the variables. The HAT test identified two OUTs, three IPs, and one OUT+IP in the data. If all OUTs were removed, R² would increase to 0.19. If the OUT+IP was removed, R² would increase to 0.86. If all IPs were also removed, R² would decrease to 0.45. The software is free and can be requested by email from reinaldojmoraes@gmail.com.
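The four-group classification described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Visual Basic code, and the abstract does not state which cut-offs the software applies. The sketch assumes a simple linear regression with an intercept, flags IPs where the leverage (the diagonal of the hat matrix) exceeds 2p/n, the cut-off suggested by Hoaglin and Welsch, and flags OUTs where the absolute externally studentized residual exceeds 2, a common rule of thumb. The function names `hat_classify` and `r_squared` and the sample readings are hypothetical.

```python
import numpy as np


def hat_classify(x, y, resid_cut=2.0):
    """Label each point as 'normal', 'OUT', 'IP' or 'OUT+IP'.

    Assumed cut-offs (not taken from the article): leverage > 2p/n marks
    an influential point, |studentized residual| > resid_cut an outlier.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    X = np.column_stack([np.ones(n), x])      # design matrix with intercept
    p = X.shape[1]                            # number of fitted parameters

    xtx_inv = np.linalg.inv(X.T @ X)
    # Diagonal of the hat matrix H = X (X'X)^-1 X'
    h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)

    beta = xtx_inv @ X.T @ y                  # least-squares coefficients
    e = y - X @ beta                          # ordinary residuals
    sse = e @ e

    # Externally studentized residuals: the error variance for point i
    # is estimated with point i left out.
    s2_deleted = (sse - e**2 / (1.0 - h)) / (n - p - 1)
    t = e / np.sqrt(s2_deleted * (1.0 - h))

    is_out = np.abs(t) > resid_cut
    is_ip = h > 2.0 * p / n
    labels = np.where(
        is_out & is_ip, "OUT+IP",
        np.where(is_out, "OUT", np.where(is_ip, "IP", "normal")),
    )
    return h, t, labels


def r_squared(x, y):
    """Coefficient of determination of a straight-line fit of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


# Made-up rain gauge readings (mm), for illustration only.
automatic = np.array([2.0, 5.1, 7.9, 11.2, 14.0, 17.3, 20.1, 23.0, 26.2, 60.0])
conventional = np.array([2.3, 4.8, 8.4, 10.9, 30.0, 17.0, 20.6, 22.7, 26.0, 58.5])

leverage, student, groups = hat_classify(automatic, conventional)
for xi, yi, g in zip(automatic, conventional, groups):
    print(f"{xi:6.1f} {yi:6.1f}  {g}")

# Compare the fit before and after dropping the points flagged as outliers.
keep = ~np.isin(groups, ["OUT", "OUT+IP"])
print("R2 all data:     ", round(r_squared(automatic, conventional), 3))
print("R2 without OUTs: ", round(r_squared(automatic[keep], conventional[keep]), 3))
```

Recomputing R² after each removal mirrors the workflow in the abstract, where the researcher decides which flagged points to drop based on how the fit changes.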