Similarité statistique pour le CBR textuel
|Advisor:||Lamontagne, Luc D.|
|Abstract:||E-mails have recently become a popular mean of communication for exchanges between companies and their customers. However the increasing volume of messages makes manual processing difficult to achieve and automatic methods are foreseen as a more efficient solution. Automatic management systems help users in the processing of the messages and in the creation of a response from the messages kept in the company databases. One important question in this type of application is how to select existing e-mails to respond to a new request. The creation of new response messages requires texts pertaining to the new request topics. Finding similarity between documents is also an important task. Our goal for this research effort was to study how to detect similarity between small documents. To accomplish it, we followed a two-pronged approach: - finding similarity between words in order to augment a document’s vocabulary; - estimating similarity between documents, using all the similar words resulting from the previous step. We dedicated our work to determine the most interesting techniques to detect textual similarity between documents, and to improve those techniques using cooccurrences detection and lexical semantic similarity. During our experimentations, we tried different combinations, using cooccurrences detection and lexical similarity. We proposed techniques to augment the vocabulary of each message, based on different kind of reasoning to improve the estimation of similarity between documents. Our results indicate that the proposed augmentation techniques improve significantly the estimation of document similarity. The best results were obtained when using a combination of cooccurrences filter and cosine metric. However our experiments clearly indicate these results do not overcome the performance of similarity techniques based on tf*idf weights.|
|Document Type:||Mémoire de maîtrise|
|Open Access Date:||13 April 2018|
|Collection:||Thèses et mémoires|
All documents in CorpusUL are protected by Copyright Act of Canada.