Using Interface Patterns for Compositional Discovery of Distributed System Models

Process mining offers various tools for studying process-aware information systems. Such systems typically involve several participants (or agents) managing and executing operations on the basis of process models. To reveal the actual behavior of agents, we can use process discovery. However, for large-scale processes, discovery does not yield models that help us understand how agents interact, since agents are independent and their concurrent execution can lead to very sophisticated behavior. To overcome this problem, we propose interface patterns, which allow us to obtain models of multi-agent processes with a clearly identified agent behavior and interaction scheme. The correctness of the patterns is ensured via morphisms. We also conduct a preliminary experiment, the results of which are highly competitive with process discovery without interface patterns.


Introduction
Process mining is a relatively new direction in the study of process-aware information systems. These include information systems that manage and execute operational processes involving people, applications, and information resources on the basis of process models [1]. Examples of such systems include workflow management systems, business process management systems, and enterprise information systems. Process-aware information systems with interacting participants (also called agents) are intrinsically distributed multi-agent systems. An agent acts autonomously, but it can interact with other agents via shared resources, restrictions, and other means. Process mining helps to extract a model of such a system for further study from a record of its execution called an event log. However, extracted models are hard to analyze, since there may be complex interactions among a potentially large number of process participants. In this paper, we propose a compositional approach to address this problem. Given an event log of a distributed system, we can filter it by agents and mine a model of each agent. Then, agent models can be composed to obtain a complete model of the multi-agent distributed system, which can be simulated. Composing agent models yields more structured models than those extracted from complete logs, since the behavior of each agent can be clearly identified. We compose agent models via interface patterns, which describe how agents intercommunicate. This approach was presented at TMPA-2017 [2]; the conference proceedings will be available later. The formal proof of composition correctness is based on net morphisms [3]. Moreover, interface patterns allow the composition to inherit deadlock-freeness and proper termination from the agents by construction. We conduct a preliminary experiment on using one interface pattern for mining multi-agent models.
The outcomes are evaluated with the help of conformance checking quality dimensions [1,4] and the complexity metrics proposed in [5]. This paper is structured as follows. The next section provides an overview of process discovery and compositional approaches. Section 3 introduces the basic terms used in the paper. Section 4 outlines the compositional approach to process discovery. Section 5 briefly describes how we compose agent models using interface patterns and net morphisms. Section 6 describes the preliminary experiment and analyzes its results.

Related Work
There are three types of process mining, namely discovery, conformance checking, and enhancement. Process discovery produces a process model from an event log, a record of executed activities. Existing discovery approaches can yield a model in a variety of notations, including Petri nets, heuristic nets, process trees, BPMN, and EPC. Petri nets are the most widespread representation of process models discovered from event logs. Conformance checking is used to check whether a discovered model corresponds to an input event log and to identify probable deviations. The main idea of enhancement is to improve existing processes using knowledge of the actual processes (usually denoted AS-IS) obtained from event logs. Process discovery offers several methods for constructing models from event logs. One of the first and most straightforward discovery approaches is the α-algorithm, which identifies ordering relations among activities in logs, but it has severe limitations connected with cycles and with the overall quality of the obtained models [1]. It has several refined versions and improvements, for example [6], but there are other, more sophisticated and efficient discovery algorithms. S. Leemans et al. [7] proposed the inductive miner, which extracts process models from logs containing infrequent or incomplete behavior and also handles activity lifecycles, in which each activity has separate start and finish actions. Moreover, the inductive miner always produces well-structured models in the form of Petri nets. HeuristicsMiner is another process discovery algorithm, proposed by A. Weijters et al. [8]. It can process event logs with a lot of noise (excessive activities) and also deals with infrequent process behavior. HeuristicsMiner uses intermediate causal matrices and produces a heuristics net, which can easily be converted into a Petri net or into other notations, including EPC, BPMN, and UML. S. van Zelst et al. [9] proposed an approach to process discovery based on integer linear programming and the theory of regions. Their algorithm can produce Petri nets with complex control-flow patterns, and its recent improvements guarantee the structural correctness of the discovered models. C. Günther and W. van der Aalst proposed the adaptive fuzzy mining approach [10] to deal with unstructured processes; it can produce different abstractions of a process, distinguishing "important" behavior. Since state-of-the-art process discovery algorithms can deal with complex process behavior, another problem is to obtain models with an appropriate structure. A good process model is readable and well-structured, i.e. it has no redundant elements or unnecessary structural complications. There is a so-called continuum of processes ranging from highly structured processes (Lasagna models) to unstructured processes (Spaghetti models) [1]. The problem of obtaining well-structured models is extensively studied in the literature. Researchers offer different techniques to improve model structure [11] and to produce well-structured process models directly [12,13,14]. In the case of multi-agent and distributed systems, well-structured models should also allow us to identify agent behavior clearly, which improves model understandability. We suggest discovering models of agents independently and then composing them to produce a structured multi-agent system model in which the behavior of each agent is clearly visible. Several compositional approaches to process discovery have been proposed. In [15], A. Kalenkova et al. have shown how to obtain a more readable model from an event log by decomposing extracted transition systems. A special technique for dealing with cancellations in process execution and for producing clear, structured process models that can contain cancellations has been studied in [16].
Also, in [17] the authors proposed a technique for compositional process discovery based on localizing events using region theory to improve the overall quality of discovered models. Correct coordination of system components is an error-prone task, since their interaction can generate complex behavior. The majority of process discovery tools produce Petri nets, and a large body of literature has investigated the problem of composing Petri nets. They can be composed via straightforward merging of places and transitions [18], but the composition result will not preserve component properties. One possible way to achieve the inheritance of component behavioral properties is to use morphisms [19]. Special constructs for composing Petri nets based on morphisms were studied in [3,20,21]. The key idea of this approach is that distributed system components refine an abstract interface describing the interactions between them. In [22], I. Lomazova proposed a compositional approach to the flexible re-engineering of business processes using a system of interacting workflow nets. There also exist several techniques for the compositional synthesis of web services [23]. In addition, in [24] R. Hamadi and B. Benatallah proposed an algebraic approach to the regular composition of services. These compositional approaches do not allow one to specify an explicit order on the inner behavior of two interacting components. This situation is schematically represented in Fig. 1. Given two discovered component models with always executable actions A and B, we want to require that they interact in such a way that A is executed before B. This way of intercommunication is also shown in the form of a Petri net.

Fig. 1. Defining relations on inner actions of components
In [2], we proposed a solution to this problem, as well as two other patterns for composing two interacting components. The obtained composition inherits properties such as deadlock-freeness and proper termination from the components. In this paper, we show how these patterns can be used for discovering a multi-agent system model from an event log in a compositional way. Applying compositional patterns allows us to obtain a more readable model and to improve time performance through the parallelization of process discovery. We can assess process models obtained from event logs against four standard quality dimensions: fitness, precision, generalization, and simplicity [4]. Fitness identifies how accurately an extracted model can replay a source event log. Precision penalizes behavior that is allowed by the model but not seen in the event log: the less such behavior a model allows, the more precise it is. Generalization measures the extent to which the model can reproduce behavior of the process not yet seen in the log. Simplicity focuses on structural complexity along with other graph characteristics, such as the number of elements and a structuredness measure [5].
2. T = {t1, t2, …, tm}: a finite non-empty set of transitions, P ∩ T = ∅.
Pictorially, places are shown as circles and transitions as boxes (silent transitions are depicted as black boxes). The flow relation is depicted by directed arcs (see Fig. 2). Let X = P ∪ T. We call the set •x = {y ∈ X | (y, x) ∈ F} the preset of x and the set x• = {y ∈ X | (x, y) ∈ F} the postset of x. The behavior of Petri nets is defined by the firing rule, which specifies when an action can occur and how it modifies the overall state of the system.
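The firing rule can be illustrated with a minimal Python sketch. It assumes unit arc weights and represents a marking as a `Counter` from places to token counts; the class `PetriNet` and its methods are hypothetical and are not part of any tool used in this paper. A transition t is enabled when every place in •t holds a token; firing t removes one token from each place in •t and adds one token to each place in t•.

```python
from collections import Counter


class PetriNet:
    """Place/transition net with unit arc weights (hypothetical helper)."""

    def __init__(self, places, transitions, flow):
        self.places = set(places)
        self.transitions = set(transitions)
        self.flow = set(flow)  # F ⊆ (P × T) ∪ (T × P)

    def preset(self, x):
        # •x = {y ∈ X | (y, x) ∈ F}
        return {y for (y, z) in self.flow if z == x}

    def postset(self, x):
        # x• = {y ∈ X | (x, y) ∈ F}
        return {z for (y, z) in self.flow if y == x}

    def enabled(self, marking, t):
        # t is enabled iff every input place holds at least one token
        return all(marking[p] >= 1 for p in self.preset(t))

    def fire(self, marking, t):
        # consume one token from each place in •t, produce one in each place of t•
        if not self.enabled(marking, t):
            raise ValueError(f"transition {t!r} is not enabled")
        new = Counter(marking)
        for p in self.preset(t):
            new[p] -= 1
        for p in self.postset(t):
            new[p] += 1
        return +new  # unary plus drops places with zero tokens


net = PetriNet({"p1", "p2"}, {"t"}, {("p1", "t"), ("t", "p2")})
m0 = Counter({"p1": 1})
print(net.enabled(m0, "t"))  # True
m1 = net.fire(m0, "t")
print(m1)                    # Counter({'p2': 1})
print(net.enabled(m1, "t"))  # False
```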

Event Logs
Process discovery techniques allow generating process models from event logs containing information on executed actions. In the simplest case, an event log contains action names and their execution order. This record can be augmented with a timestamp (when an action occurs) and an executor (which agent performs it). Definition 3: Let be a set of action names and be a set of agent names. An activity is a triple (n, e, t), where n∈ , e∈ , and t corresponds to a timestamp. The set of all activities is denoted by Act. A trace σ ∈ Act⁺ is a sequence of activities. An event log L is a multiset over Act⁺, L ∈ m(Act⁺). Different traces can be combined to form a case corresponding to a process execution scenario. XES is a standard representation format adopted by IEEE [25] for logging events and processing them via process mining tools.
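Definition 3 can be mirrored directly in code. The sketch below is an illustrative assumption and not the XES format: an activity is a named tuple (n, e, t), a trace is a tuple of activities, and an event log is a multiset of traces represented by Python's `collections.Counter`. The names `Activity` and `make_log` are hypothetical.

```python
from collections import Counter
from typing import NamedTuple, Sequence


class Activity(NamedTuple):
    """An activity (n, e, t): action name, agent name, timestamp."""
    name: str
    agent: str
    timestamp: int


def make_log(traces: Sequence[Sequence[Activity]]) -> Counter:
    """Build an event log L as a multiset of traces σ ∈ Act⁺."""
    log: Counter = Counter()
    for trace in traces:
        if not trace:
            raise ValueError("a trace must contain at least one activity")
        log[tuple(trace)] += 1  # tuples are hashable, so they can be counted
    return log


t1 = [Activity("a", "A", 1), Activity("b", "B", 2)]
log = make_log([t1, t1, [Activity("a", "A", 1)]])
print(len(log), sum(log.values()))  # 2 3  (two distinct traces, three in total)
```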

General Outline
To support the compositional discovery of models from event logs generated by multi-agent systems, we assume that each recorded action is labeled with the agent that executed it. The compositional synthesis procedure includes the following steps:
1. Capturing a complete event log L from the operation of the multi-agent system;
2. Filtering the event log L by agent labels and producing a set of event logs Le (|Le|=| |), one for each agent e, in which each trace consists only of actions executed by e;
3. Discovering a model of each agent separately from its event log Le;
4. Defining an interface pattern that describes how the agents intercommunicate;
5. Composing the agent models and producing a multi-agent system model.
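Step 2 of the procedure amounts to projecting every trace onto the activities of a single agent. The following sketch assumes the log is a `Counter` of traces whose activities are plain (name, agent, timestamp) tuples; `filter_by_agent` is a hypothetical helper, and in the experiment this filtering was done in ProM rather than with custom code. Steps 3 to 5 depend on discovery tools and expert input and are not reproduced here.

```python
from collections import Counter


def filter_by_agent(log):
    """Project an event log onto each agent (step 2 of the procedure).

    `log` maps traces (tuples of (name, agent, timestamp)) to multiplicities.
    Returns {agent: Counter of projected traces}; empty projections are
    dropped because traces must be non-empty.
    """
    agents = {agent for trace in log for (_, agent, _) in trace}
    logs = {}
    for e in sorted(agents):
        projected = Counter()
        for trace, count in log.items():
            sub = tuple(act for act in trace if act[1] == e)
            if sub:
                projected[sub] += count
        logs[e] = projected
    return logs


L = Counter({
    (("a", "A", 1), ("b", "B", 2), ("c", "A", 3)): 2,
    (("a", "A", 1), ("b", "B", 4)): 1,
})
logs = filter_by_agent(L)
print({e: sum(c.values()) for e, c in logs.items()})  # {'A': 3, 'B': 3}
```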
So far, the step of defining an interface pattern for agent interaction is performed manually: we rely on an expert view of how agents should intercommunicate.

Software Overview
A wide range of process discovery tools is implemented within the open-source framework ProM [26], which is continuously being improved. However, there also exist many commercial tools that use process mining to analyze and improve business processes. They include Disco [27], QPR ProcessAnalyzer [28], and myInvenio [29], to name but a few. Unlike ProM, they provide more business-oriented solutions for process performance analysis and further improvement.
To process event logs, we use the advanced ProM plugin GENA [30], which allows generating event logs with timestamps and originator labels, as well as augmenting logs with artificial events representing noise.

Composing Petri Nets via Interface Patterns
This section provides a brief introduction to our approach to Petri net composition using interfaces and net morphisms.

Composing Petri Nets via morphisms
The notion of ω-morphism on Petri nets was first introduced in [3] for elementary net systems and can be applied to safe nets. Definition 4: Let Ni = (Pi, Ti, Fi, m0i, Li), i = 1, 2, be two safe Petri nets. An ω-morphism is a total surjective map φ: N1 → N2 such that:
To use morphisms for Petri net composition, we need to define morphisms from the agent nets to an interface net, which describes how the agents intercommunicate. Then we merge transitions having common labels and images. Figure 4 shows how two Petri nets are composed via ω-morphisms, represented as dotted arrows.

Fig. 4. Composing two Petri nets via ω-morphisms
As proved in [19], the use of morphisms allows us to preserve properties of interacting components in the composed process net. A composition obtained via ω-morphisms is deadlock-free and terminates properly if and only if the source component nets and the interface net are deadlock-free and terminate properly as well.
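The final merging step, in which transitions with common labels are fused, can be sketched as follows. This is a simplified illustration only: it does not construct the interface net or check the ω-morphism conditions, it assumes that node identifiers are disjoint between the two nets and that each label occurs at most once per net, and the function `compose_by_labels` and the label `sync` are hypothetical.

```python
def compose_by_labels(net1, net2):
    """Merge two labelled safe nets on transitions with common labels.

    Each net is a dict with keys "places" (set), "trans" (dict id -> label),
    "flow" (set of arcs (x, y)) and "m0" (set of initially marked places).
    Assumptions: all ids are disjoint between the nets, and each label occurs
    at most once per net.
    """
    by_label = {lab: t for t, lab in net1["trans"].items()}
    # net2 transitions sharing a label with net1 are fused into net1's one
    rename = {t: by_label[lab] for t, lab in net2["trans"].items() if lab in by_label}
    trans = dict(net1["trans"])
    for t, lab in net2["trans"].items():
        if t not in rename:
            trans[t] = lab
    flow = set(net1["flow"])
    flow |= {(rename.get(x, x), rename.get(y, y)) for (x, y) in net2["flow"]}
    return {
        "places": net1["places"] | net2["places"],
        "trans": trans,
        "flow": flow,
        "m0": net1["m0"] | net2["m0"],
    }


# Agent A: pa0 -a-> pa1 -x-> pa2, agent B: pb0 -y-> pb1 -b-> pb2,
# where x and y both carry the hypothetical synchronisation label "sync".
agent_a = {"places": {"pa0", "pa1", "pa2"}, "trans": {"a": "a", "x": "sync"},
           "flow": {("pa0", "a"), ("a", "pa1"), ("pa1", "x"), ("x", "pa2")},
           "m0": {"pa0"}}
agent_b = {"places": {"pb0", "pb1", "pb2"}, "trans": {"y": "sync", "b": "b"},
           "flow": {("pb0", "y"), ("y", "pb1"), ("pb1", "b"), ("b", "pb2")},
           "m0": {"pb0"}}
net = compose_by_labels(agent_a, agent_b)
print(sorted(net["trans"]))                             # ['a', 'b', 'x']
print(sorted(p for (p, t) in net["flow"] if t == "x"))  # ['pa1', 'pb0']
```

The fused transition `x` needs tokens from both agents and produces tokens for both, which is how merging on a common label synchronizes the components.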

Compositional Interface Patterns
To facilitate Petri net composition, we use the compositional patterns for typical interfaces that we proposed in [2]. One such pattern, called simple causality, is schematically shown in Fig. 1, and Fig. 5 provides its instantiation. A pattern includes component nets and an interface net, which can be merged according to the morphism composition rules when a model for comprehensive simulation is needed.

Note also that, to preserve concurrency in the execution of interacting agents, we extend interface nets with additional places and transitions while keeping them weakly bisimilar to the original interfaces. Consequently, extended interfaces allow us to obtain composition results in which the behavior of each component is clearly identified. Figure 5(b) shows how we have extended the interface net for this pattern. We use extended interfaces only internally; the end user does not need to know the underlying theoretical aspects of our approach.
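The effect of the simple causality requirement (A before B) can be checked on a toy net. The sketch below, assuming a small acyclic safe net with set-valued markings, enumerates all maximal firing sequences with and without a hypothetical causal place `c` linking A to B. It models only the basic idea of Fig. 1, not the instantiated pattern of Fig. 5 or its extended interface.

```python
def firing_sequences(flow, transitions, m0):
    """All maximal firing sequences of a small acyclic safe net.

    Markings are frozensets of marked places (safe nets). Recursion
    terminates only because the example net is acyclic.
    """
    pre = {t: {x for (x, y) in flow if y == t} for t in transitions}
    post = {t: {y for (x, y) in flow if x == t} for t in transitions}
    sequences = []

    def explore(marking, seq):
        enabled = [t for t in sorted(transitions) if pre[t] <= marking]
        if not enabled:  # dead marking: the sequence is maximal
            sequences.append(seq)
            return
        for t in enabled:
            explore((marking - pre[t]) | post[t], seq + [t])

    explore(frozenset(m0), [])
    return sequences


# Two always-executable actions A and B of independent components
base = {("p1", "A"), ("A", "p2"), ("q1", "B"), ("B", "q2")}
# Simple causality: a hypothetical place c makes A a prerequisite of B
causal = base | {("A", "c"), ("c", "B")}
print(firing_sequences(base, {"A", "B"}, {"p1", "q1"}))    # [['A', 'B'], ['B', 'A']]
print(firing_sequences(causal, {"A", "B"}, {"p1", "q1"}))  # [['A', 'B']]
```

Without the causal place both interleavings are possible; with it, B can fire only after A has produced a token in `c`.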

Some Experimental Evaluation
In this section, we describe a preliminary experiment on using the simple causality pattern for compositional process discovery. To test our approach, we use artificial event logs generated from the instantiated simple causality pattern. We then assess quality metrics of the discovered models and compare the directly discovered models with the composed ones.

Processing Event Logs
Using GENA and the composition result obtained from the instantiated simple causality pattern (see Fig. 5), we generated an event log L with 3000 traces. Then we filtered the initial log by executors using ProM. The characteristics of the obtained event logs are presented in Table 2. Generation results for Agent A show larger values due to cycles. Figure 6 shows a fragment of the Petri net discovered from the event log L using Inductive Miner in ProM. The behavior of agents is distinguished by colors.
Fig. 6. The fragment of the system model discovered from L
This discovered model is quite well-structured (constructed out of clear blocks), but it does not allow us to identify the behavior of different agents. That is why it is hard to obtain a complete picture of the agent intercommunication scheme.

Discovering and Composing Models from Logs LA and LB
Figure 7 shows a fragment of the composed Petri net we discovered from the agent event logs LA and LB, also using Inductive Miner in ProM. Note that Petri nets discovered by Inductive Miner are always safe. Hence, we can apply the morphism-based approach to compose separately discovered models of agent behavior. The merged model allows us to clearly identify the behavior of the agents and how they intercommunicate. Using morphisms guarantees that the entire net inherits properties such as deadlock-freeness and proper termination from the agents.
3. event logs with lifecycle events (start/finish of events);
4. exhaustive k-successor algorithm.
We do not work with incomplete logs or with lifecycle logs for now. So, in our experiment, we discovered the system and agent models shown in the previous subsections using options 1 and 4, and compared them using structural process discovery metrics. Table 3 compares the structural characteristics of the directly discovered and composed system models. We compared the obtained models with respect to the number of Petri net elements and the structuredness metric, which assesses the overall complexity of a model by breaking it into trivial constructs and assigning weights to each reduction step. Models discovered with the infrequent configuration are denoted INFR, and models discovered with the exhaustive configuration are denoted EXHS.
The experiment results show an increase in the number of transitions due to added silent transitions. Compositional patterns decrease the number of arcs compared to direct discovery, since they simplify agent intercommunication. Composed models also preserve complex control flows, as shown by the structuredness measure. Separately discovered agent models and their composition exhibit more precise cycle discovery. We have also conducted conformance checking for the directly discovered and composed models. As mentioned above, there are four standard quality dimensions, namely fitness, precision, simplicity, and generalization. Simplicity is covered above by the structural analysis. We do not estimate generalization, since no complex cyclic or concurrent constructs were used to instantiate the simple causality pattern. Table 4 shows the fitness and precision values of the discovered and composed system models. Both discovered and composed system models retain an appropriate level of fitness, i.e. composition does not degrade it. More importantly, using compositional patterns produces models whose precision is closer to that of the source model than that of directly discovered models. Composed models are approximately 30% more precise than discovered ones.
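The paper does not state which fitness measure produced Table 4, so the following only illustrates one standard way to quantify fitness: token-based replay, where p, c, m, and r count produced, consumed, missing, and remaining tokens, and fitness = ½(1 − m/c) + ½(1 − r/p). The sketch assumes that every event corresponds to a unique visible transition (no silent or duplicate transitions), unit arc weights, and a single trace; the function name `token_replay_fitness` is hypothetical.

```python
from collections import Counter


def token_replay_fitness(pre, post, m0, mf, trace):
    """Token-based replay fitness of a single trace (illustrative sketch).

    pre/post map each transition to its input/output places (unit weights);
    m0/mf are the initial and final markings as Counters. Each event in
    `trace` is assumed to be the id of a unique visible transition.
    """
    marking = Counter(m0)
    produced = sum(m0.values())  # the environment produces the initial tokens
    consumed = missing = 0
    for t in trace:
        for p in pre[t]:
            if marking[p] == 0:  # token missing: add it artificially and count it
                missing += 1
                marking[p] += 1
            marking[p] -= 1
            consumed += 1
        for p in post[t]:
            marking[p] += 1
            produced += 1
    for p, k in mf.items():  # the environment consumes the final marking
        present = min(marking[p], k)
        missing += k - present
        marking[p] -= present
        consumed += k
    remaining = sum(marking.values())
    return 0.5 * (1 - missing / consumed) + 0.5 * (1 - remaining / produced)


# Sequential net i -> a -> p -> b -> o
pre = {"a": ["i"], "b": ["p"]}
post = {"a": ["p"], "b": ["o"]}
m0, mf = Counter({"i": 1}), Counter({"o": 1})
print(token_replay_fitness(pre, post, m0, mf, ["a", "b"]))  # 1.0
print(token_replay_fitness(pre, post, m0, mf, ["b"]))       # 0.5
```

For the trace ⟨b⟩ one token is missing in p and one token remains in i, so both halves of the formula drop to ½.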
To sum up, we used the simple causality pattern to produce a model of the multi-agent system. The assessment results showed that the composed models are highly competitive with the models directly discovered from complete event logs in terms of both structural complexity and conformance checking results.

Conclusion and Future Work
In this paper, we have proposed a solution to the problem of discovering structured models of processes with several participants (agents). The key idea is to automatically obtain a correct and complete process model from separately discovered models of its components. The interaction between agents is defined by experts.
To prove the correctness of the composition, we adopt the approach based on Petri net morphisms. We rely on the compositional patterns proposed for the correct synthesis of models of multi-agent processes. In this work, we conducted a preliminary experiment on using the simple causality pattern to construct the complete model from discovered agent models. The analysis of the experimental results (conformance and complexity) showed that composed models are highly competitive with models obtained directly. Moreover, our compositional approach to process discovery produces models in which the behavior of interacting agents is clearly identified. We aim to continue developing compositional patterns for typical interfaces and providing experimental process discovery implementations for them, also using real-life event logs. We will also address complex synchronization patterns with relations on sets of actions and their correct combinations.