Stochastic systems divergence through reinforcement learning
|Advisor:||Desharnais, Josée; Laviolette, François|
|Abstract:||Modelling real-life systems and phenomena using mathematical based formalisms is ubiquitous in science and engineering. The reason is that mathematics offer a suitable framework to carry out formal and rigorous analysis of these systems. For instance, in software engineering, formal methods are among the most efficient tools to identify flaws in software. The behavior of many real-life systems is inherently stochastic which requires stochastic models such as labelled Markov processes (LMPs), Markov decision processes (MDPs), predictive state representations (PSRs), etc. This thesis is about quantifying the difference between stochastic systems. The main contributions are: 1. a new approach to quantify the divergence between pairs of stochastic systems based on reinforcement learning, 2. a new family of equivalence notions which lies between trace equivalence and bisimulation, and 3. a refined testing framework to define equivalence notions. The important point of the thesis is that reinforcement learning (RL), a branch of artificial intelligence particularly efficient in presence of uncertainty, can be used to quantify efficiently the divergence between stochastic systems. The key idea is to define an MDP out of the systems to be compared and then to interpret the optimal value of the MDP as the divergence between them. The most appealing feature of the proposed approach is that it does not rely on the knowledge of the internal structure of the systems. Only a possibility of interacting with them is required. Because of this, the approach can be extended to different types of stochastic systems. The second contribution is a new family of equivalence notions, moment, that constitute a good compromise between trace equivalence (too weak) and bisimulation (too strong). This family has a natural definition using coincidence of moments of random variables but more importantly, it has a simple testing characterization. moment turns out to be part of a bigger framework called test-observation-equivalence (TOE), which we propose as a third contribution of this thesis. It is a refined testing framework to define equivalence notions with more flexibility.|
|Document Type:||Thèse de doctorat|
|Open Access Date:||13 April 2018|
|Collection:||Thèses et mémoires|
All documents in CorpusUL are protected by Copyright Act of Canada.