VENCE : un modèle performant d'extraction de résumés basé sur une approche d'apprentissage automatique renforcée par de la connaissance ontologique

Authors: Motta, Jesus Antonio
Advisor: Capus, LaurenceTourigny, Nicole
Abstract: Several methods and techniques of artificial intelligence for information extraction, pattern recognition and data mining are used for extraction of summaries. More particularly, new machine learning models with the introduction of ontological knowledge allow the extraction of the sentences containing the greatest amount of information from a corpus. This corpus is considered as a set of sentences on which different optimization methods are applied to identify the most important attributes. They will provide a training set from which a machine learning algorithm will can abduce a classification function able to discriminate the sentences of new corpus according their information content. Currently, even though the results are interesting, the effectiveness of models based on this approach is still low, especially in the discriminating power of classification functions. In this thesis, a new model based on this approach is proposed and its effectiveness is improved by inserting ontological knowledge to the training set. The originality of this model is described through three papers. The first paper aims to show how linear techniques could be applied in an original way to optimize workspace in the context of extractive summary. The second article explains how to insert ontological knowledge to significantly improve the performance of classification functions. This introduction is performed by inserting lexical chains of ontological knowledge based in the training set. The third article describes VENCE , the new machine learning model to extract sentences with the most information content in order to produce summaries. An assessment of the VENCE performance is achieved comparing the results with those produced by current commercial and public software as well as those published in very recent scientific articles. The use of usual metrics recall, precision and F_measure and the ROUGE toolkit showed the superiority of VENCE. This model could benefit other contexts of information extraction as for instance to define models for sentiment analysis.
Document Type: Thèse de doctorat
Issue Date: 2014
Open Access Date: 23 April 2018
Permalink: http://hdl.handle.net/20.500.11794/26076
Grantor: Université Laval
Collection:Thèses et mémoires

Files in this item:
SizeFormat 
30781.pdf3.89 MBAdobe PDFView/Open
All documents in CorpusUL are protected by Copyright Act of Canada.