Le forage distribué des données : une approche basée sur l'agrégation et le raffinement de modèles [Distributed data mining: an approach based on model aggregation and refinement]

Authors: Aoun-Allah, Mohamed
Advisor: Mineau, Guy W.
Abstract: With the pervasive use of computers in all spheres of activity, we now face an explosion of electronic data. We therefore need tools that can automatically analyze this data and return relevant, summarized information in response to a query. Data mining techniques are generally used for this task, but they require considerable computing time on very large data volumes. Moreover, when the data is geographically distributed, gathering it at a single site in order to build a model (a classifier, for instance) can be time-consuming. To address this problem, we propose building several models, namely one classifier per site. The rules that make up these classifiers are then aggregated and filtered using statistical measures, and a validation process is carried out on samples from each site. The resulting model, called a metaclassifier, is on the one hand a prediction tool for any new (unseen) instance and, on the other hand, an abstract view of the whole data set. Our rule filtering approach relies on a confidence measure associated with each rule, which is computed statistically and then validated using the data samples (one from each site). Several validation techniques were considered and are discussed in this thesis.
Document Type: Doctoral thesis
Issue Date: 2006
Open Access Date: 12 April 2018
Permalink: http://hdl.handle.net/20.500.11794/18746
Grantor: Université Laval
Collection: Thèses et mémoires
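
Illustrative sketch (not taken from the thesis): the pipeline summarized in the abstract, with per-site rule sets, aggregation, confidence-based filtering, and validation on site samples, can be pictured roughly as follows. Everything here is an assumption made for illustration: the `Rule` class, the function names, the 0.6 threshold, the toy data, and above all the confidence measure, which is taken to be simple pooled accuracy over the site samples. The thesis defines its own statistical measure and validation techniques, which may differ substantially.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Instance = Dict[str, str]
Sample = List[Tuple[Instance, str]]  # (attributes, class label) pairs


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a conjunction of attribute tests and a predicted label."""
    conditions: Tuple[Tuple[str, str], ...]
    label: str

    def covers(self, x: Instance) -> bool:
        return all(x.get(attr) == value for attr, value in self.conditions)


def aggregate(site_rules: List[List[Rule]]) -> List[Rule]:
    """Union of the per-site rule sets, dropping exact duplicates, order preserved."""
    seen: Dict[Rule, None] = {}
    for rules in site_rules:
        for rule in rules:
            seen.setdefault(rule, None)
    return list(seen)


def confidence(rule: Rule, samples: List[Sample]) -> float:
    """Assumed measure: fraction of covered sample instances the rule labels correctly."""
    covered = correct = 0
    for sample in samples:
        for x, y in sample:
            if rule.covers(x):
                covered += 1
                correct += int(y == rule.label)
    return correct / covered if covered else 0.0


def build_metaclassifier(site_rules, samples, threshold=0.6):
    """Keep rules whose confidence reaches the (arbitrary) threshold, best first."""
    scored = [(confidence(r, samples), r) for r in aggregate(site_rules)]
    kept = [(c, r) for c, r in scored if c >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


def predict(meta, x: Instance, default: Optional[str] = None) -> Optional[str]:
    """Return the label of the most confident rule covering x, else the default."""
    for _, rule in meta:
        if rule.covers(x):
            return rule.label
    return default


# Toy data: rule sets assumed to come from classifiers trained at two sites.
site_a = [Rule((("outlook", "sunny"),), "play"), Rule((("wind", "strong"),), "stay")]
site_b = [Rule((("outlook", "sunny"),), "play"), Rule((("outlook", "rain"),), "play")]
sample_a = [({"outlook": "sunny", "wind": "weak"}, "play"),
            ({"outlook": "rain", "wind": "strong"}, "stay")]
sample_b = [({"outlook": "sunny", "wind": "strong"}, "play"),
            ({"outlook": "rain", "wind": "weak"}, "stay")]

meta = build_metaclassifier([site_a, site_b], [sample_a, sample_b])
print(predict(meta, {"outlook": "sunny", "wind": "weak"}))
```

In this sketch only the small samples, not the full site data, are needed at the aggregation site, which reflects the motivation given in the abstract for avoiding centralizing the data.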

Files in this item:
File        Description   Size      Format
23393.pdf   Text          1.34 MB   Adobe PDF
All documents in CorpusUL are protected by the Copyright Act of Canada.