Catégorisation automatique de textes et cooccurrence de mots provenant de documents non étiquetés

Authors: Réhel, Simon
Advisor: Mineau, Guy W.
Abstract: Automated text categorization consists of developing computer programs that autonomously assign texts to predefined categories based on their content. Such applications rely on supervised learning, which requires a training phase on manually labeled documents. Building a training set, however, is slow and expensive. This study proposes a way to help text classifiers gather vocabulary when the training set is small: word cooccurrence is analyzed across a large collection of unlabeled documents in order to augment the vocabulary obtained from the labeled texts. The representation of new documents to classify can then be modified to better match the vocabulary used by the classifier. The expected outcome is an improvement in the classifier's ability to categorize texts.
Document Type: Master's thesis
Issue Date: 2005
Open Access Date: 12 April 2018
Grantor: Université Laval
Collection: Thèses et mémoires
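
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the thesis's actual method, which this record does not detail. It assumes whitespace tokenization, document-level cooccurrence counts, and a simple substitution rule: each word of a new document that is missing from the classifier's vocabulary is replaced by the in-vocabulary word it cooccurs with most often in the unlabeled collection. The function names and the `min_count` threshold are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(documents):
    """For each word, count how many documents it shares with every other word."""
    counts = defaultdict(Counter)
    for doc in documents:
        # Assumption: lowercase whitespace tokens; each word counted once per document.
        words = sorted(set(doc.lower().split()))
        for a, b in combinations(words, 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def adapt_document(doc, vocabulary, counts, min_count=2):
    """Replace out-of-vocabulary words with their strongest in-vocabulary cooccurrent.

    Words with no sufficiently frequent in-vocabulary cooccurrent are kept unchanged.
    """
    adapted = []
    for word in doc.lower().split():
        if word in vocabulary:
            adapted.append(word)
            continue
        neighbours = counts.get(word, {})
        candidates = [
            (n, w) for w, n in neighbours.items()
            if w in vocabulary and n >= min_count
        ]
        if candidates:
            best_count = max(n for n, _ in candidates)
            # Break ties alphabetically so the result is deterministic.
            adapted.append(min(w for n, w in candidates if n == best_count))
        else:
            adapted.append(word)
    return adapted


# Hypothetical unlabeled collection and classifier vocabulary.
unlabeled = [
    "the striker scored a goal",
    "a goal by the striker",
    "the keeper saved the goal",
    "stock prices fell",
]
vocabulary = {"goal", "prices"}
counts = cooccurrence_counts(unlabeled)
print(adapt_document("striker injured", vocabulary, counts))
```

Here "striker" is unknown to the classifier but appears with "goal" in two unlabeled documents, so it is mapped to "goal", while "injured" never occurs in the collection and is left as is. Substitution is only one possible design; adding the related word alongside the original one would be an equally plausible reading of "modifying the representation".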

Files in this item:
File        Description   Size        Format
22376.pdf   Text          627.06 kB   Adobe PDF
All documents in CorpusUL are protected by the Copyright Act of Canada.