Hardware architecture for frequent itemset mining in static datasets using a segmentation strategy

MAURO MARTIN LETRAS LUNA

Thesis

Record available at:

http://inaoe.repositorioinstitucional.mx/jspui/handle/1009/214

http://repositorioslatinoamericanos.uchile.cl/handle/2250/2258362

Author

MAURO MARTIN LETRAS LUNA

Institution

Conacyt (México)

Abstract

In recent years there has been a significant increase in the information generated in distinct domains, and the size of datasets overwhelms the human capacity to process them and extract valuable information. Because of this, Data Mining has emerged as a set of techniques and algorithms dedicated to finding patterns in datasets; these patterns are then used to classify or predict the behavior of phenomena related to the data. Association Rule Mining is an important branch of Data Mining, and it consists of finding relationships among the data in the form of implication rules. The problem is usually decomposed into two subproblems. The first is to find the itemsets whose number of occurrences in the database exceeds a predefined threshold; these itemsets are called frequent itemsets. The second is to generate association rules from those frequent itemsets. This research explores Frequent Itemset Mining because, due to the exhaustive nature of the problem, the huge amount of data in some cases makes it difficult to obtain a response within the time required by the application. Many algorithms have been proposed for finding frequent itemsets; the most widely used are Apriori, FP-Growth, and Eclat. They traverse the search space using breadth-first or depth-first search strategies. All of them must scan the dataset, and some, such as Apriori, must access it many times. FP-Growth reads the dataset only twice, but it must keep large amounts of data in memory. Frequent Itemset Mining is an exhaustive task because the database must be read many times, regardless of whether the data is stored in main memory or on a hard disk.
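To make the breadth-first strategy and its repeated dataset scans concrete, the following is a minimal Python sketch of Apriori-style mining. It assumes transactions are given as sets of hashable items and that the threshold is an absolute support count; as a common convention, the sketch keeps an itemset when its count is at least `min_support`. The function name `apriori` and these conventions are illustrative choices, not taken from the thesis.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every itemset whose
    support count is at least min_support, found level by level (breadth-first)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: one pass over the dataset to count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets and keep a union
        # only if all of its (k-1)-subsets are frequent (Apriori property).
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k and all(
                    frozenset(sub) in frequent for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)

        # Support counting: another full pass over the dataset for this level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result
```

For example, with the five transactions `{a,b,c}`, `{a,b}`, `{a,c}`, `{b,c}`, `{a,b,c}` and `min_support=3`, the sketch returns the three single items (count 4 each) and the three pairs (count 3 each), while `{a,b,c}` (count 2) is discarded. Each new level triggers another full pass over `transactions`, which is the cost the abstract attributes to Apriori.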
Two ways to accelerate Frequent Itemset Mining have been reported in the literature: the first consists of improving existing software algorithms by proposing new heuristics that save time, and the second consists of developing hardware architectures dedicated to this task. The main goal of this research is to design a hardware architecture that accelerates the Frequent Itemset Mining process. A segmentation strategy based on equivalence classes is proposed to guarantee that all frequent itemsets are found regardless of the available hardware resources. An FPGA implementation will be carried out to validate the proposed architecture and compare it with software-only implementations.
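The general idea of segmenting the search space with equivalence classes can be illustrated in software using prefix-based classes, as in Eclat: once the items are put in a fixed total order, every itemset belongs to exactly one class, identified by its smallest item, so the classes can be mined independently and their results merged. The following Python sketch assumes that partitioning scheme and a vertical (tidset) data representation; it is not the thesis's hardware architecture, and the names `vertical`, `mine_class`, and `segmented_mining` are hypothetical.

```python
def vertical(transactions):
    """Map each item to its tidset: the ids of the transactions containing it."""
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def mine_class(prefix, members, min_support, out):
    """Depth-first search inside one equivalence class.

    members: list of (item, tidset) pairs, where tidset belongs to prefix + (item,).
    """
    for i, (item, tids) in enumerate(members):
        itemset = prefix + (item,)
        out[frozenset(itemset)] = len(tids)
        # Child class: frequent extensions of `itemset` with later items only.
        children = []
        for other, other_tids in members[i + 1:]:
            common = tids & other_tids
            if len(common) >= min_support:
                children.append((other, common))
        if children:
            mine_class(itemset, children, min_support, out)


def segmented_mining(transactions, min_support):
    """Mine all frequent itemsets one equivalence class (segment) at a time."""
    tidsets = vertical(transactions)
    # Frequent single items in a fixed total order (items must be sortable).
    items = sorted(i for i, tids in tidsets.items() if len(tids) >= min_support)
    result = {}
    for k, head in enumerate(items):
        # Segment [head]: itemsets whose smallest item is `head`.
        segment = {frozenset([head]): len(tidsets[head])}
        members = []
        for other in items[k + 1:]:
            common = tidsets[head] & tidsets[other]
            if len(common) >= min_support:
                members.append((other, common))
        mine_class((head,), members, min_support, segment)
        # Segments are disjoint, so merging never overwrites an entry.
        result.update(segment)
    return result
```

Because each class only needs the tidsets of its own items, a unit with limited resources could, under this scheme, process one segment at a time (or split a large class further by longer prefixes) without losing any frequent itemset. For instance, with transactions `{a,b,c}`, `{a,b}`, `{a,c}`, `{b,c}`, `{a,b,c}` and `min_support=3`, the classes `[a]`, `[b]`, and `[c]` contribute `{a}`, `{a,b}`, `{a,c}`; `{b}`, `{b,c}`; and `{c}`, respectively.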

Subjects

info:eu-repo/classification/Arquitectura de hardware/Hardware architecture

info:eu-repo/classification/Frequent itemset/Frequent itemset

info:eu-repo/classification/FPGA/FPGA

info:eu-repo/classification/cti/1

info:eu-repo/classification/cti/12

info:eu-repo/classification/cti/1203
