You are here: Home » News » Albert Bifet defends his Ph.D. thesis
Document Actions

Albert Bifet defends his Ph.D. thesis

Share Share
Today, april 24th, Albert Bifet defended his Ph. D. thesis "Adaptive Learning and Mining for Data Streams and Frequent Patterns". It received the grade "excel·lent cum laude" (outstanding with praise).

The jury members were Jost N. Kok (Leiden U.), Lluís Belanche (UPC), Joao Gama (U. Porto), Minos Garofalakis (Tech. U of Crete), and Christian Borgelt (European Center for Soft Computing, Mieres). The advisors are Ricard Gavaldà and José L. Balcázar

Congratulations to the new doctor!



Thesis summary:


This thesis is devoted to the design of data mining algorithms for evolving data streams and for the extraction of closed frequent trees. First, we deal with each of these tasks separately, and then we deal with them together, developing classification methods for data streams containing items that are trees.

In the data stream model, data arrive at high speed, and the algorithms that must process them have very strict constraints of space and time. In the first part of this thesis we propose and illustrate a framework for developing algorithms that can adaptively learn from data streams that change over time. Our methods are based on using change detectors and estimator modules at the right places. We propose an adaptive sliding window algorithm ADWIN for detecting change and keeping updated statistics from a data stream, and use it as a black-box in place or counters or accumulators in algorithms initially not designed for drifting data. Since ADWIN has rigorous performance guarantees, this opens the possibility of extending such guarantees to learning and mining algorithms. We test our methodology with several learning methods as Na¨ıve Bayes, clustering, decision trees and ensemble methods. We build an experimental framework for data stream mining with concept drift, based on the MOA framework, similar to WEKA, so that it will be easy for researchers to run experimental data stream benchmarks.

 Trees are connected acyclic graphs and they are studied as link-based structures in many cases. In the second part of this thesis, we describe a rather formal study of trees from the point of view of closure-based mining. Moreover, we present efficient algorithms for subtree testing and for mining ordered and unordered frequent closed trees. We include an analysis of the extraction of association rules of full confidence out of the closed sets of trees, and we have found there an interesting phenomenon: rules whose propositional counterpart is nontrivial are, however, always implicitly true in trees due to the peculiar combinatorics of the structures.

 Finally, using these results on evolving data streams mining and closed frequent tree mining, we present high performance algorithms for mining closed unlabeled rooted trees adaptively from data streams that change over time. We introduce a general methodology to identify closed patterns in a data stream, using Galois Lattice Theory. Using this methodology, we then develop an incremental one, a sliding-window based one, and finally one that mines closed trees adaptively from data streams. We use these methods to develop classification methods for tree data streams.

last modified : January 2010

News

RSS RSS  About this web  Accessibility Laboratory for Relational Algorithmics, Complexity and Learning. LARCA.
© UPC (open in new window). Universitat Politècnica de Catalunya BarcelonaTech