Catégorisation automatique de textes et cooccurrence de mots provenant de documents non étiquetés
|Abstract:||Automated text categorization consists of developing computer programs able to autonomously assign texts to predefined categories, on the basis of their content. Such applications are possible thanks to supervised learning, which implies a training phase on manually labeled documents. However, the construction of a training set is long and expensive. This study suggests a way to assist text classifiers in the gathering of the vocabulary when the size of the training set is limited. So, it is proposed to analyze word cooccurrence inside a text collection of many non-labeled documents, to augment the vocabulary produced by the analysis of the labeled texts. The representation of new documents to classify can then be modified in order to better match the vocabulary used by the classifier. What is expected, of course, is an improvement of its ability to categorize texts.|
|Document Type:||Mémoire de maîtrise|
|Open Access Date:||12 April 2018|
|Collection:||Thèses et mémoires|
All documents in CorpusUL are protected by Copyright Act of Canada.