FATMAS: a methodology to design fault-tolerant multi-agent systems
|Advisor:||Mineau, Guy W.; Moulin, Bernard|
|Abstract:||A multi-agent system (MAS) consists of several agents interacting together. In a MAS, each agent performs several tasks. However, each agent is prone to individual failures so that it can no longer perform its tasks. This can lead the MAS to a failure. Ideally, the MAS should be able to identify the possible sources of failures and try to overcome them in order to continue operating correctly ; we say that it should be fault-tolerant. There are two kinds of sources of failures to an agent : errors originating from the environment with which the agents interacts, and programming exceptions. There are several works on fault-tolerant systems which deals with programming exceptions. However, these techniques does not allow the MAS to identify errors originating from an agent’s environment. In this thesis, we propose a design methodology, called FATMAS, which allows a MAS designer to identify errors originating from agents’ environments. Doing so, the designer can determine the sources of failures it could be able to control and those it could not. Hence, it can determine the errors it can prevent and those it cannot. Consequently, this allows the designer to determine the system’s boundary from its environment. The system boundary is the area within which the decision-taking process of the MAS has power to make things happen, or prevent them from happening.We distinguish between the system’s environment and an agent’s environment. An agent’s environment is characterized by the components (hardware or software) that the agent does not control. However, the system may control some of the agent’s environment components. Consequently, some of the agent’s environment components may not be a part of the system’s environment. The development of a fault-tolerant MAS (FTMAS) requires the use of a methodology to design FTMAS and of a reorganization technique that will allow the MAS designer to identify and control, if possible, different sources of system failure. However, current MAS design methodologies do not integrate such a technique. FATMAS provides four models used to design and implement the target system and a reorganization technique to assist the designer in identifying and controlling different sources of system’s failures. FATMAS also provides a macro process which covers the entire life cycle of the system development as well as several micro processes that guide the designer when developing each model. The macro-process is based on an iterative approach based on a cost/benefit evaluation to help the designer to decide whether to go from one iteration to another. The methodology has three phases : analysis, design, and implementation. The analysis phase develops the task-environment model. This model identifies the different tasks the agents will perform, their resources, and their preconditions. It identifies several possible sources of system failures. The design phase develops the agent model and the agent interaction model. The agent model describes the agents and their resources. Each agent performs several tasks identified in the task-environment model. The agent interaction model describes the messages exchange between agents. The implementation phase develops the implementation model, and allows an automatic code generation of Java agents. The implementation model describes the infrastructure upon which the MAS will operate and the development environment to be used when developing the MAS. The reorganization technique includes three techniques required to design a fault-tolerant system : a fault-prevention technique, a fault-recovery technique, and a fault-tolerance technique. The fault-prevention technique assists the designer in delimiting the system’s boundary. The fault-recovery technique proposes a MAS architecture allowing it to detect failures. The fault-tolerance technique is based on agent and task redundancy. Contrary to existing fault-tolerance techniques, this technique replicates tasks and agents and not only agents. Thus, it minimizes the system complexity by minimizing the number of agents operating in the system. Furthermore, FATMAS helps the designer to deal with possible physical component failures, on which the MAS will operate. It proposes a way to either control these components or to distribute the agents on these components in such a way that if a component is in failure, then the MAS could continue operating properly. The FATMAS methodology presented in this dissertation assists a designer, in its development process, to build fault-tolerant systems. It has the following main contributions : 1. it allows to identify different sources of system failure ; 2. it proposes to introduce new tasks in a MAS to control the identified sources of failures ; 3. it proposes a mechanism which automatically determines which tasks (agents) should be replicated and in which other agents ; 4. it reduces the system complexity by minimizing the replication of agents ; Abstract vii 5. it proposes a MAS reorganization technique which is embedded within the designed MAS and assists the designer to determine the system’s boundary. It proposes a MAS architecture to detect and recover from failures originating from the system boundary. Moreover, it proposes a way to distribute agents on the physical components so that the MAS could continue operating properly in case of a component failure. This could make the MAS more robust to fault prone environments. FATMAS alows to determine different sources of failures of a MAS. The MAS controls the sources of failures situated in its boundary. It does not control the sources of failures situated in its environments. Consequently, the reorganization technique proposed in this dissertation will be proven valid only in the case where the sources of failures are controlled by the MAS. However, it cannot be proven that the future system is fault-tolerant since faults originating from the environment or from coding are not dealt with.|
|Document Type:||Thèse de doctorat|
|Open Access Date:||11 April 2018|
|Collection:||Thèses et mémoires|
All documents in CorpusUL are protected by Copyright Act of Canada.