A Reliable Method to Reduce Observations and Variables
Registration in:
T016300000194/0
Authors
Colmenares L., Gerardo A.
Pérez, Rafael
A Reliable Method to Reduce Observations and Variables When Building Neural Network Models
(Gerardo Colmenares; Rafael Pérez)
Abstract
This paper describes a method for reducing the number of observations and variables in large data sets, so that reliable neural network models can be built from the reduced data in less time. The method can also be used to select, from an original data set, representative data for training, testing, and validating models.
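Selecting representative subsets in this way can be sketched with stratified sampling, which the method uses to choose observations. The abstract does not say how strata are defined, so this Python sketch assumes, purely for illustration, equal-width bins over the target variable; `assign_strata` and `stratified_subsets` are hypothetical names. The same fractions are drawn from every stratum for training, testing, and validation, and the remaining observations are dropped. Each subset therefore keeps the stratum proportions of the full data set while the total number of observations shrinks.

```python
import random
from collections import defaultdict


def assign_strata(targets, n_bins=4):
    """Map each observation to one of n_bins equal-width bins of its target value.

    Assumption for illustration: strata are target-value bins; the paper's
    actual stratification criterion is not given in the abstract.
    """
    lo, hi = min(targets), max(targets)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant target
    return [min(int((t - lo) / width), n_bins - 1) for t in targets]


def stratified_subsets(targets, fractions=(0.3, 0.1, 0.1), n_bins=4, seed=0):
    """Return (train, test, validation) lists of observation indices.

    The same fractions are drawn from every stratum, so each subset keeps the
    stratum proportions of the full data set. Observations not drawn (half of
    them with the default fractions) are discarded, which reduces the data set.
    """
    rng = random.Random(seed)
    members_by_stratum = defaultdict(list)
    for index, stratum in enumerate(assign_strata(targets, n_bins)):
        members_by_stratum[stratum].append(index)

    subsets = ([], [], [])
    for members in members_by_stratum.values():
        rng.shuffle(members)  # random choice happens only *within* a stratum
        start = 0
        for subset, fraction in zip(subsets, fractions):
            count = round(len(members) * fraction)
            subset.extend(members[start:start + count])
            start += count
    return subsets


targets = [float(i) for i in range(200)]  # toy target values
train, test, validation = stratified_subsets(targets)
print(len(train), len(test), len(validation))  # 60 20 20
```

With 200 evenly spread targets, each of the four strata holds 50 observations and contributes 15, 5, and 5 of them to the three subsets. A purely random draw of the same size would let these per-stratum counts drift from one draw to the next.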
The method combines stratification, which selects representative observations, with principal component analysis, which eliminates redundant variables. Neural network models built from the reduced data sets it produces perform very similarly to models built from the entire data set, and significantly better and more consistently than models built from randomly reduced data sets. The paper also compares reducing the data set with stratification alone against reducing it with stratification plus principal component analysis.
gcolmen@ula.ve, perez@csee.usf.edu
Monographic level
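The variable-reduction step can be sketched in the same spirit. The abstract does not say how principal component analysis is used to decide which variables are redundant, so this Python sketch swaps in a discarding rule in the style of Jolliffe's "B2" method. It computes the eigenvalues and eigenvectors of the correlation matrix with a small pure-Python Jacobi solver. For each component whose eigenvalue falls below an assumed cutoff of 0.7, it drops the not-yet-dropped variable with the largest absolute loading. A near-zero eigenvalue signals an almost exact linear dependence among variables, which is what makes one of them redundant. All function names and the cutoff are hypothetical, not taken from the paper.

```python
import math
import random


def correlation_matrix(rows):
    """Pearson correlation matrix of the columns of `rows` (assumes no constant column)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for r in rows] for j in range(m)]
    norms = [math.sqrt(sum(x * x for x in col)) for col in centered]
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(m)] for i in range(m)]


def jacobi_eigen(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues and eigenvectors of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # columns p, q: A <- A J and V <- V J
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # rows p, q: A <- J^T A
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    values = [a[i][i] for i in range(n)]
    vectors = [[v[k][i] for k in range(n)] for i in range(n)]  # vectors[i] pairs with values[i]
    return values, vectors


def discard_redundant(rows, cutoff=0.7):
    """Return (kept, dropped) column indices.

    Walks the principal components from the smallest eigenvalue upward; for each
    one below `cutoff` (an assumed default), drops the not-yet-dropped variable
    with the largest absolute loading. Correlation eigenvalues sum to the number
    of variables, so with cutoff < 1 at least one variable is always kept.
    """
    values, vectors = jacobi_eigen(correlation_matrix(rows))
    dropped = []
    for comp in sorted(range(len(values)), key=lambda idx: values[idx]):
        if values[comp] >= cutoff:
            break
        remaining = [j for j in range(len(values)) if j not in dropped]
        dropped.append(max(remaining, key=lambda j: abs(vectors[comp][j])))
    kept = [j for j in range(len(values)) if j not in dropped]
    return kept, dropped


rng = random.Random(0)
rows = []
for _ in range(200):
    x, y = rng.gauss(0, 1), rng.gauss(0, 1)
    rows.append([x, y, 2 * x + rng.gauss(0, 0.01)])  # column 2 is nearly 2 * column 0
kept, dropped = discard_redundant(rows)
print(kept, dropped)
```

In the toy data, the third column is almost exactly twice the first, so one of columns 0 and 2 is dropped and the independent column 1 is kept. Applying this step after the stratified selection would give a sketch of the combined reduction the abstract compares against stratification alone.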