A Technique for Parameterized Verification of Cache Coherence Protocols

This paper introduces a technique for scalable functional verification of cache coherence protocols that is based on a verification method previously developed by the author. Scalability means that the verification effort does not depend on the model size (that is, the number of processors in the system under verification). The article presents an approach to the development of formal Promela models of cache coherence protocols and shows examples taken from the Elbrus-4C protocol model. The resulting formal models consist of language constructs that directly reflect the way protocol designers describe their developments. The paper describes the development of a tool, written in C++ with the Boost.Spirit library as the parser generator, that automatically performs the syntactical transformations of Promela models. These transformations are part of the verification method. A procedure for refining the transformed models, intended to eliminate spurious error messages, is presented. Finally, the overall verification technique is described. The technique has been successfully applied to verification of the MOSI protocol implemented in the Elbrus computer systems. Experimental results show that computer memory requirements for parameterized verification are negligible and the amount of manual work needed is acceptable.


Burenkov V.S. A Technique for Parameterized Verification of Cache Coherence Protocols. Trudy ISP RAN/Proc. ISP RAS, vol. 29, issue 4, 2017, pp. 231-246.

Introduction
Shared memory multiprocessors constitute one of the most common classes of high-performance computer systems. In particular, multicore microprocessors, which combine several processors (cores) on a chip, are widely used [1]. The number of cores is constantly increasing. Because each core has its own local cache memory, the system must keep the memory state coherent. To satisfy this need, microprocessor developers design cache coherence protocols and implement them in hardware [2]. Cache coherence mechanisms are extremely complex; therefore, both their design and their implementation are error-prone. Protocol bugs are especially critical and should be revealed before the hardware is implemented.
The widely recognized method for protocol verification is model checking [3]. It is fully automated but suffers from a principal drawback: it is not scalable due to the state space explosion problem. Verification of a cache coherence protocol for five or more processors is impossible (or at least highly problematic) with the traditional methods [4]. To overcome the problem and develop scalable verification technologies, researchers focus mostly on verification of parameterized designs [3].
Previous articles of the author [5][6][7][8] presented a method for parameterized verification of cache coherence protocols. The author successfully applied the method to verification of the cache coherence protocol of the Elbrus-4C computing system. This paper presents an approach to the development of formal Promela models that can be analyzed by the verification method, describes the development of the tool that performs transformations of Promela models according to the method, and presents the overall verification technique.
The paper is structured as follows. Section 2 takes a brief look at related work and provides the necessary links. Section 3 considers the development of Promela models of cache coherence protocols. Section 4 describes how to perform parameterized verification of the Promela models in a semi-automatic way, examines the development of the tool that automates parts of the verification method, and presents the technique for verification of cache coherence protocols. Section 5 provides experimental results on using the technique for verifying the Elbrus-4C protocol. Section 6 summarizes the work and defines further research directions.

Related Work
This work extends the previous works [5][6][7][8] by addressing the practical application of the method for parameterized verification of cache coherence protocols presented there. Article [5] reviews related work and gives the motivation for developing a new method. The developed method builds on works [9][10][11][12][13], which present a method of compositional model checking based on syntactical transformations of models written in the Murφ language and on counterexample-guided abstraction refinement. The method [5][6][7][8] is used in the context of the following verification process:
1) Development of formal models of cache coherence protocols.
2) Parameterized verification by means of the method.

Development of Formal Models
It is highly desirable to have a modeling language that allows us to conveniently describe cache coherence protocols. To choose or develop such a language, we need to define a mathematical model of cache coherence protocols.
In accordance with the microprocessor system model used in work [2] for the representation and analysis of cache coherence protocols, I chose to model cache coherence protocols as a set of communicating finite-state machines. An element of this set may be either a cache controller or the system commutator. Let us define these notions. Each memory device of the microprocessor is operated by a coherence controller, which is a finite-state machine. Coherence controllers are coordinated by a special device, the system commutator, which is also a finite-state machine. Together, these machines constitute a distributed system in which the machines communicate by message passing in order to maintain cache coherence. Each coherence controller connected with cache memory logically implements a set of independent and identical finite-state machines, one for each cache line. These machines are called cache controllers. Because cache controllers are independent and identical, it is customary to reflect only one cache line in the models of cache coherence protocols. The states of cache controllers are divided into two classes: stable states and transient states. The stable states of cache controllers are often a subset of the common set {Modified, Owned, Exclusive, Shared, Invalid} [2]. Transitions between these states are not atomic and occur through transient states. Transient states are specific to each microprocessor, and their presence is one of the factors that determine the high verification complexity.
Conditions that define correctness of cache coherence protocols are formulated as statements about stable states, for example: "Cache line can never be in Modified state in two caches simultaneously" [5]. Such statements belong to the class of invariant properties [14].
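In Promela, an invariant of this kind can be stated as an inline LTL formula over a global array of stable states. The following is only a minimal sketch: the names N, line_state, no_two_modified and the mtype constants are assumptions made for illustration and are not taken from the Elbrus-4C model, and the inline ltl block assumes Spin version 6 or later.

```promela
#define N 3   /* hypothetical number of caches */

/* Hypothetical stable states of a MOSI-family protocol */
mtype = { INVALID, SHARED, OWNED, MODIFIED };

/* Stable state of the single modeled cache line in each cache */
mtype line_state[N];

/* Invariant: the line is never Modified in caches 0 and 1 simultaneously */
ltl no_two_modified {
  [] !((line_state[0] == MODIFIED) && (line_state[1] == MODIFIED))
}

init {
  byte i;
  /* Every cache starts with the line in the Invalid state */
  for (i : 0 .. N-1) { line_state[i] = INVALID }
}
```

The formula mentions only two array elements, which matches the observation that correctness conditions of this kind concern the state of a cache line in two caches.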
The use of a set of communicating finite-state machines as the model of cache coherence protocols, together with invariant properties as the specification, determined the choice of the Promela language for modeling cache coherence protocols:
• In contrast to other languages (for example, Murφ and NuSMV), Promela provides process types and the means of synchronous and asynchronous interprocess communication (channels).
• Promela provides a convenient specification language, Linear Temporal Logic (LTL).
• Spin, the system that implements Promela, provides different verification algorithms and optimizations, and is a modern and constantly developing tool.
The development of formal models of cache coherence protocols is insufficiently covered in the literature. Here, I present an approach to the construction of such models. According to the approach, a formal model of a cache coherence protocol of a system with n cores consists of n Promela processes for cache controllers and one Promela process for the system commutator.
For the considered cache coherence protocols, the following property holds: only one initial request may be in process at a given point in time. The system commutator performs a sequence of steps during request processing, for example, reception of the initial request and its analysis, sending of snoop-requests and other requests according to the results of the analysis, and reception of the answers to these requests. Initial requests correspond to the memory access instructions that the processor core is executing. Reception of messages from other devices can only occur at particular steps. Thus, it is convenient to represent the system commutator as a Promela process whose body simply consists of operators that follow each other (Fig. 1).
Cache controllers operate differently. On the one hand, we can still identify a number of steps, for example, sending an initial request, changing state from stable to transient, and receiving snoop-requests. On the other hand, the relative order of these steps is often unspecified, and the same messages from other devices may be processed in different states of a cache controller. Thus, it is convenient to represent processes of this kind as infinite do-loops consisting of guarded commands (Fig. 2).
See papers [5,6,8] for more details on how to organize processes and their communication. For example, a situation in which a cache controller sends an initial request and the system commutator receives it may be modeled as follows:
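A minimal sketch of such an exchange is given here under stated assumptions. It is not the Elbrus-4C model: the names N, READ_REQ, req_chan, CacheController and SystemCommutator are hypothetical, and the choice of a rendezvous (zero-capacity) channel is an assumption as well. The cache controller is written as an infinite do-loop of guarded commands, and the system commutator as a sequence of steps that starts with the reception of the initial request.

```promela
#define N 3   /* hypothetical number of cache controllers */

mtype = { READ_REQ };   /* hypothetical type of initial request */

/* Rendezvous channel from the cache controllers to the system commutator;
   a message carries the request type and the index of the sender. */
chan req_chan = [0] of { mtype, byte };

proctype CacheController(byte id) {
  do
  :: req_chan ! READ_REQ, id   /* guarded command: send an initial request */
  /* guarded commands for snoop-requests and state changes would follow */
  od
}

proctype SystemCommutator() {
  byte requester;
  do
  :: req_chan ? READ_REQ, requester;   /* step 1: receive the initial request */
     skip                              /* placeholder for analysis, snoops, answers */
  od
}

init {
  byte i;
  atomic {
    run SystemCommutator();
    for (i : 0 .. N-1) { run CacheController(i) }
  }
}
```

Because the channel has zero capacity, the send in the cache controller and the receive in the system commutator execute as one synchronous step. A buffered channel, declared with a positive capacity, would model asynchronous delivery instead.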

Parameterized Verification of Cache Coherence Protocols
The method for parameterized verification of cache coherence protocols presented in works [5,6,8] consists of two stages:
1. Performing the syntactical transformations of Promela models.
2. Refining the obtained model in accordance with the proposed procedure.
Model transformations have the following effect:
1. Reduction of the number of processes from n+1 (n cache controller processes and one system commutator process) to 4: two fully functioning cache controller processes, one abstract cache controller process that models the environment of the two processes, and the system commutator process. This transformation is possible due to the symmetry inherent in models of cache coherence protocols (all cache controller processes are identical and interchangeable, and they have no behaviors that depend on a particular process index value) and because the specification of cache coherence protocols only contains properties that concern the state of a cache line in two caches.
2. Syntactical transformations of Promela operators constituting the model.
These transformations preserve invariant properties. This means that if such a property is true for the reduced model, then it is true for the initial model. A mathematical proof of the corresponding theorem is presented in articles [5,6,8].

Performing the Syntactical Transformations
The syntactical transformations presented in [5,6,8] may be performed manually. However, manual model modification is a tedious, laborious, and error-prone process. Moreover, some of the errors made may go undetected, as they will only lead to incorrect state space reduction and not to counterexamples. Therefore, it is highly desirable to perform the transformations automatically. To achieve that, I have developed a dedicated tool: the verification engineer provides a Promela model as input, and the tool generates the transformed Promela model. To automate the syntactical transformations, I have used a widespread approach to this kind of problem, according to which a tool builds the abstract syntax tree that represents the syntactical structure of the source code and then performs the transformations while traversing the tree (Fig. 3).
Abstract syntax trees are usually constructed by parsers. There are two ways to implement a parser: manually or by means of a parser generator (for example, Bison, ANTLR, Boost.Spirit). Because the first approach is unnecessarily complex, I have chosen the second one. The Boost.Spirit library was chosen as the parser generator for the following reasons:
• Boost.Spirit promotes modern usage of the C++ language that allows us to work with abstractions that are suitable for a given domain, without performance loss.
• Boost.Spirit eliminates the need for additional tools like Bison or ANTLR: the only tools needed are a C++ compiler and the Boost library.
• The grammars that Boost.Spirit accepts are attributed, which results in a very convenient way of abstract syntax tree generation.
• Boost.Spirit contains a number of built-in parsers.
• The generated parsers are very efficient [15].
The mechanism of synthesized and inherited attributes allows us to simplify the task of abstract syntax tree generation by dividing it into two sequentially performed subtasks:
1. Development, testing, and debugging of the grammar. During this step, we only need to focus on whether the grammar correctly determines the syntactical correctness of a Promela model.
2. Development of data structures for the nodes of the abstract syntax tree and definition of the types of attributes of the grammar rules. The attribute mechanism allows Boost.Spirit to generate abstract syntax trees automatically, without any need to add node construction operators to the grammar.
The use of the abstract syntax tree generated by Boost.Spirit as an intermediate representation of Promela models allowed us to divide the task of performing the syntactical transformations automatically into three subtasks:
1. Development of the Promela grammar in the C++ language by means of Boost.Spirit.
2. Development of data structures for abstract syntax tree representation.
3. Development of algorithms for abstract syntax tree traversal and abstract model generation.
The Promela grammar is presented in [16]. Its implementation in C++ using Boost.Spirit closely follows that description. However, as Boost.Spirit generates recursive descent parsers, I have eliminated left recursion from the grammar. Data structures for the nodes of the abstract syntax tree are designed according to the information that the nodes should represent and the attribute propagation rules defined in the Boost.Spirit documentation. In the developed tool, the data structures that correspond to the synthesized attributes of the Promela grammar rules contain information about the nonterminals that are part of the rules. This is a straightforward and convenient way to implement these data structures. For example, the following rule that describes the nonterminal "module" of the Promela grammar

Abstraction Refinement
Execution of each type of initial request consists of a particular sequence of events described in the cache coherence protocol documentation. Considerations about the ordering of the events inspired the following refinement procedure:
1. For each type of initial request, define (according to the documentation) a partially ordered set $(A, \prec)$ of events, where $\prec$ is a strict partial order: $\forall a_i, a_j \in A$: $a_i \prec a_j$ if action $a_i$ occurs earlier than action $a_j$.
2. While there are false counterexamples:
2.1. Find the action $a_j$ that led to the appearance of the counterexample. Find the set $A$ that contains action $a_j$: $a_j \in A$. In the set $A$, find an action $a_i$ such that $a_i \prec a_j$.
2.2. Introduce a logical variable $v$ with the initial value $\mathit{false}$. In the model, replace $a_i$ with the atomic sequence $a_i;\ v := \mathit{true}$.
2.3. By means of the logical AND, add $v$ to the guard of the command that contains action $a_j$. Replace $a_j$ with the atomic sequence $a_j;\ v := \mathit{false}$.
For example, for one type of initial request defined for the Elbrus-4C microprocessor, the set $(A, \prec)$ is as follows. Here, $c_i$ denotes the $i$-th cache controller and $n$ is the number of cache controllers.
$A = \{a_0, \dots, a_8\}$, where
$a_0$ = processing of the previous request from process $c_i$, $1 \le i \le n$, is finished,
$a_1$ = the requester sends an initial request,
$a_2$ = the system commutator receives the initial request,
$a_3$ = the system commutator sends snoop-requests to all $c_i$, $1 \le i \le n$, other than the requester,
$a_4$ = $c_i$ receives a snoop-request, $1 \le i \le n$, where $c_i$ is not the requester,
$a_5$ = $c_i$ sends an answer to the snoop-request to the requester,
$a_6$ = the requester receives the coherent answer from $c_i$,
$a_7$ = the requester sends the operation completion message to the system commutator,
$a_8$ = the system commutator receives the operation completion message.
The relation $\prec$ is defined as follows: $\forall i, j = 0, \dots, |A| - 1$: $i < j \Rightarrow a_i \prec a_j$. We identify the auxiliary variables with the elements of the set $A$.
Refinement of the abstract model of the Elbrus-4C cache coherence protocol required us to introduce two auxiliary variables, because there were two spurious counterexamples. Let us examine the introduction of the first variable. The analysis of the first counterexample showed that the abstract process had sent the operation completion message to the system commutator before the coherent answer had been received. Examination of the set $A$ allows us to conclude that action $a_7$ happening at the wrong time led to the counterexample. According to the refinement procedure, in the set $A$ we find action $a_6$ and introduce an auxiliary variable ack_received with the initial value $\mathit{false}$. Then we replace the operator that corresponds to $a_6$ with the atomic sequence consisting of this operator and the operator that assigns $\mathit{true}$ to ack_received. After this, we add ack_received to the guard of the command of the abstract process that contains $a_7$ and replace the operator that corresponds to $a_7$ with the atomic sequence consisting of this operator and the operator that assigns $\mathit{false}$ to ack_received. Thus, we guarantee that the behavior of the abstract process that led to the false counterexample will no longer be exhibited.
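The effect of this refinement step on the abstract process can be sketched in Promela as follows. Only the variable name ack_received comes from the text above. The channel names, the message types, the stub Environment process, and the assumption that the abstract process itself receives the coherent answer are hypothetical and serve only to make the fragment self-contained.

```promela
mtype = { COH_ANSWER, COMPLETION };    /* hypothetical message types */

chan answer_chan = [1] of { mtype };   /* hypothetical: coherent answers */
chan done_chan   = [1] of { mtype };   /* hypothetical: completion messages */

bool ack_received = false;             /* auxiliary variable, initially false */

/* Abstract cache controller after the refinement step */
proctype AbstractController() {
  do
  :: /* earlier action: receive the answer, then set the flag */
     atomic { answer_chan ? COH_ANSWER; ack_received = true }
  :: /* later action: the flag is added to the guard and reset afterwards */
     ack_received ->
       atomic { done_chan ! COMPLETION; ack_received = false }
  od
}

/* Stub environment that produces answers and consumes completions */
proctype Environment() {
  do
  :: answer_chan ! COH_ANSWER
  :: done_chan ? COMPLETION
  od
}

init { atomic { run AbstractController(); run Environment() } }
```

Without the ack_received guard, the second option would be executable at any time, so the abstract process could send the completion message before any coherent answer had arrived. The guard removes exactly that interleaving and leaves the rest of the behavior unchanged.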

Verification Technique
According to the results obtained by the author in this and the previous works, the proposed verification technique consists of the following steps (Fig. 4):
1. Development of a concrete Promela model of the cache coherence protocol under verification. Using the proposed approach to model description, the verification engineer develops Promela processes that model cache controllers and the system commutator, together with the necessary infrastructure elements (channel definitions, process creation). Specific actions performed by the processes correspond to the cache coherence protocol documentation.
2. Development of the abstract Promela model of the cache coherence protocol under verification. This step is performed automatically by the developed tool.
3. Verification of the abstract model. This step is the usual verification process of Promela models using the Spin model checker [17].
4. Analysis of the verification report generated by Spin. If there are no errors, then the verification process is finished with the conclusion that the cache coherence protocol is correct. If the report states the presence of an error, then the verification engineer should analyze the corresponding counterexample. If the engineer concludes that the counterexample is spurious because the corresponding sequence of steps is impossible in a real system, then the engineer refines the model in accordance with the proposed procedure and goes to step 3. Otherwise, if the counterexample represents an actual error in the cache coherence protocol, then the error is reported. When the protocol developers fix the error, the verification engineer incorporates the changes into the model and starts the verification process again (goes to step 1).
This sequence of steps is repeated until there are no counterexamples.

Experimental Results
The proposed method was used to verify the MOSI family cache coherence protocol implemented in the Elbrus-4C computer system. The abstraction refinement step was completed after the introduction of two auxiliary variables. Table 1 and Table 2 show that even for n = 3 there is a gain in state space size and memory consumption. The needed amount of manual work is acceptable. Meanwhile, verification of the constructed abstract model means verification of the protocol for any n ≥ 3. The task has been reduced to checking of ~10 states, which consumes ~100 Mb of memory.