Inference algorithms for the regression approach to sequence prediction

Authors: Rolland, Amélie
Advisor: Marchand, MarioLaviolette, François
Abstract: Sequence prediction algorithms have many applications in natural language processing, bioinformatics, and computer vision. However, the computational complexity required to find the optimal sequence among an exponential number of possibilities limits the use of such algorithms. In this thesis, we propose an approach to solve this search efficiently for two types of sequence prediction problems. More precisely, we address the pre-image problem encountered in structured output prediction, which consists of finding the sequence associated with an arbitrary input, and the problem of finding a sequence maximizing the prediction function of various kernel-based classifiers and regressors. We demonstrate that these problems reduce to a common combinatorial problem valid for many sequence kernels. For this problem, we propose an upper bound on the prediction function which has low computational complexity and which can be used in a branch and bound search algorithm to obtain optimal solutions. On the practical tasks of optical word recognition and grapheme-to-phoneme prediction, the proposed approach is shown to be competitive with state-of-the-art structured prediction algorithms. Moreover, the exact solution of the pre-image problem is shown to significantly improve the prediction accuracy in comparison with an approximation found by the best known heuristic. On the task of finding a sequence maximizing the prediction function of kernelbased classifiers and regressors, we highlight that existing methods can be biased toward long sequences that contain many repeated symbols. We demonstrate that this bias is removed when using normalized kernels. Finally, we present results for the discovery of lead compounds in drug discovery. The source code can be found at
Document Type: Mémoire de maîtrise
Issue Date: 2016
Open Access Date: 24 April 2018
Grantor: Université Laval
Collection:Thèses et mémoires

Files in this item:
32781.pdf1.7 MBAdobe PDFView/Open
All documents in CorpusUL are protected by Copyright Act of Canada.