Hardware acceleration of frequent itemsets mining on data streams
LAZARO BUSTIO MARTINEZ
In recent times, processing of data streams is gaining the attention of the scientific community due to its practical applications, e.g. network traffic analysis and surveillance, sensor networks control, and financial transactions mining, among others. Data streams are unbounded and infinite flows of data arriving at high rates and; therefore, traditional Data Mining approaches (such as Data Classification, Clustering, and Frequent Itemsets Mining among others) cannot be used straightforwardly. Frequent Itemsets Mining is a Data Mining technique focused on obtaining hidden patterns in data, and it has been used to extract useful knowledge from various data sources, recently including data streams. Finding alternatives to improve the processing time in the discovery of frequent itemsets on data streams is an active research topic. One option to improve the discovery of frequent itemsets is to develop single-pass parallel algorithms that can be implemented in hardware architectures (taking advantage of the inherent parallelism of such devices). Two methods for Frequent Itemsets Mining on data streams are proposed in this dissertation. The first method is based on systolic trees, and the other relies on hash functions and the lexicographic order of items received in transactions of data streams. Both methods fulfill the Continuity, Expiration and Infinity constraints of data streams. The general problem of finding frequent itemsets on data streams can be divided into four subproblems: 1. Data streams are composed of long transactions in a large alphabet. 2. Data streams are composed of long transactions in a small alphabet. 3. Data streams are composed of short transactions in a small alphabet. 4. Data streams are composed of short transactions in a large alphabet. The methods reported in this dissertation are oriented to solve the second and fourth subproblems, and by solving these two subproblems, the third subproblem is indirectly also solved. The first subproblem was not faced because it has been attacked from the software perspective. The method based on systolic tree faces the second subproblem by implementing a hardware architecture based on a tree-like data structure where transactions flow from the top to the bottom and from the left to the right concurrently without interruptions. This method outperforms the processing time of the algorithms reported in the state-of-the-art using software and hardware development platforms and can process the incom
Showing items related by title, author, creator and subject.
Material de empaque para biofiltración con base en poliuretano modificado con almidón, metodos para la manufactura del mismo y sistema de biofiltración OLGA BRIGIDA GUTIERREZ ACOSTA; VLADIMIR ALONSO ESCOBAR BARRIOS; SONIA LORENA ARRIAGA GARCIA
Preparación de materiales compuestos mediante la funcionalización covalente de nanotubos de carbono (CNTs) con polímero CLAUDIA GUADALUPE ESPINOSA GONZALEZ
Attachment of transition metal nanoparticles on nitrogen doped carbon nanotubes (MWNTs-CNx) and their further reactions. XAVIER NORBERTO LEPRÓ CHÁVEZ