Control Engineering of China ›› 2019, Vol. 26 ›› Issue (11): 2136-2140.

Previous Articles     Next Articles

Frequent Itemset Mining Using Prefix Tree in Big Data Environment

  

  • Online:2019-11-20 Published:2023-11-29

大数据环境下基于前缀树的频繁项集挖掘

  

Abstract:  For the problems of low efficiency and scalability in frequent itemset mining, a new distributed FIM algorithm is proposed, and implements it on MapReduce framework. Firstly, the algorithm applies the idea of prefix sequence to construct a tree, by which all frequent itemsets can be found without exhaustive search over the transaction databases. Then, it produces frequent itemsets in a breadth-wide support-based approach. In each MapReduce iteration, the infrequent itemsets will be pruned away. It significantly deducts memory consumption and iteration time of each MapReduce job. Finally, the experimental comparison with different algorithms is performed under different scales of business and support degree. The results show the good efficiency and scalability of sequence-growth especially for dealing with big data and long itemsets.

Key words: Frequent itemset mining, MapReduce, prefix sequence tree, fuzzy support; big data

摘要: 针对大数据环境下频繁项查找效率低和可扩展性问题,提出了一种基于MapReduce框架运行的新分布式FIM算法。首先,使用前缀序列树来构建候选序列子集,避免了昂贵的扫描过程。接着,使用宽幅支持度的方法产生频繁项集,每个MapReduce迭代将修剪掉非频繁项集,显著地压缩内存消耗,以及每一个MapReduce作业的迭代时间。最后,在不同事务规模和支持度下,与不同算法进行实验对比。实验结果表明,提出的序列增长算法获得了良好的效率和可扩展性,特别是在处理大数据集和长项集方面。

关键词: 频繁项集挖掘, MapReduce, 前缀序列树, 模糊支持度, 大数据