Predicate Abstractions Memory Modeling Method with Separation into Disjoint Regions

Introduction
Software verification is an activity focused on software quality control and the detection of errors in software [1]. Static verification is verification performed without executing the software under analysis.

Special software, namely tools for static verification, often works with a program's source code. Depending on the tools used for static verification, it is possible to analyze the source code to search for errors in the program's behavior. One of the tools that can be used for static verification is CPAchecker. It takes a program's source code as input, creates a CFA (control-flow automaton) and uses it to run the analysis. One of the analyses the tool is capable of is reachability analysis. In this paper we consider reachability properties that can be expressed as checking whether a call to an error function is reachable. The tool's strength is that the CPA (configurable program analysis) [2] concept allows a composition of several analyses to be used for program verification. The combination of Value Analysis and Predicate Analysis produces good results in terms of the ratio of verification precision to verification time.

Definitions and notations
We will call a model of a program's memory, or just a memory model, a strategy of organization and representation of the program's memory. By a region we will refer to a set of lvalues with the following restriction: if two lvalues are taken from two different regions, they necessarily reference disjoint memory locations [3]. For example, different regions may be safely assigned to the lvalues referring to distinct structure fields under the following conditions: (1) the fields do not occur as an argument to the address-taking operator (&); (2) the fields do not become targets of pointers through pointer type conversion or address arithmetic. A situation in which a program's error state is reported as reachable only because of imprecision in the abstraction employed by the analysis is called a false alarm.
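The two conditions can be illustrated with a small C sketch. The function names below are hypothetical and chosen only for this illustration. In write_fields the fields are accessed solely through the arrow operator, so p->f1 and q->f2 can never overlap and may be placed in different regions. Once the address of f1 is taken, a plain int pointer may refer to it, so f1 can no longer be separated from other int lvalues.

```c
struct s {
    int f1;
    int f2;
};

/* Both fields are accessed only through the arrow operator. The store to
   q->f2 cannot change p->f1, even when p and q are the same pointer. */
int write_fields(struct s *p, struct s *q)
{
    p->f1 = 6;
    q->f2 = 5;
    return p->f1;
}

/* The address of f1 is taken here, which violates condition (1). */
int *address_of_f1(struct s *p)
{
    return &p->f1;
}

/* A store through an ordinary int pointer may or may not hit p->f1,
   depending on where q points. */
int store_then_read(struct s *p, int *q)
{
    p->f1 = 6;
    *q = 5;
    return p->f1;
}
```

Calling store_then_read with q obtained from address_of_f1 on the same structure overwrites the field, while calling it with a pointer into a different structure leaves the field intact.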

CPAchecker's memory model
The existing memory model employed by the Predicate Analysis of the CPAchecker tool uses uninterpreted functions. Each of those functions has only a name and a number of arguments. If f(x) is an uninterpreted function and a and b are arguments for which a = b holds, then f(a) = f(b) [4]. Uninterpreted functions in the CPAchecker tool are used to establish a correspondence between a memory location and the value stored at this memory location. Depending on the type of the expression, different uninterpreted functions are used. The existing memory model of the CPAchecker tool uses typed regions. This means that all lvalues of the same type belong to the same region. However, any sufficiently large program written in the C programming language contains a large number of lvalues of the same type. This leads to the addition of a large number of logical constraints for each update of memory through a pointer. The constraints express checks for potential equality of the updated lvalue to each memory location in the region. Those checks make it possible to determine precisely what memory should also be updated, but they noticeably increase the length of path formulas.
The problem with the current memory model is that if a function returning a pointer to the program's memory lacks a body, arbitrary assumptions can be made about its return value during verification. In other words, it is considered possible for this pointer to point at any lvalue in the region. Although possible, this situation is very improbable in practice. In those cases it is hard to determine whether a path leading to an error label really exists. One approach capable of resolving this issue is to introduce smaller regions that divide a bigger typed region.
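As a hedged illustration of why large regions lengthen path formulas, the constraints for a single store *p = v can be written out as follows. The symbols are assumptions made for this sketch only: f and f' denote the region's uninterpreted function before and after the store, and l_1, ..., l_m are the other memory locations known to be in the region. This shows the general scheme, not necessarily the exact encoding used by the tool.

```latex
% Congruence property of any uninterpreted function f:
a = b \;\rightarrow\; f(a) = f(b)

% Store *p = v in a region with known locations l_1, ..., l_m:
f'(p) = v \;\wedge\; \bigwedge_{i=1}^{m} \bigl( l_i \neq p \;\rightarrow\; f'(l_i) = f(l_i) \bigr)
```

Every store contributes m conjuncts, so the path formula grows with the number of locations in the region, and a smaller region directly means a smaller m.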

Memory model overview
The B&B memory model was proposed by Richard Bornat and is based on the work of Rod Burstall [5], [6]. It is used in the Jessie plugin of the Frama-C verification tool, which is capable of verifying C programs. It is founded on assumptions that allow regions of smaller size to be introduced instead of one very big region per type. These assumptions state that if fields of a struct data type never occur as arguments to the address-taking operator (&) in the program's source code, then those fields can be placed in separate regions. Otherwise they must belong to the same region as ordinary pointers of the same type. This memory model has some flaws. It does not take into account that struct fields can be accessed through address arithmetic and pointer conversions. It should also be mentioned that region support incurs some overhead. Weighing these pros and cons, the B&B memory model looks promising.

Formal specification
For ease of specification we will assume the following: (1) variables can only be of type struct s *; (2) fields of struct s can only be of type int; (3) struct s has n fields: struct s { int f1, f2, …, fn; };
A program's memory location can be represented by an lvalue expression such as a pointer dereference. To model changes to the program's state caused by assignments to lvalues, the CPAchecker tool uses uninterpreted functions [4]. We assume the absence of pointer arithmetic and restrict pointer dereferences to applications of the arrow operator (p → f_i, where p is a pointer to the struct type and f_i is one of the struct fields). Let Υ be a set of uninterpreted functions. It consists of the uninterpreted function G that is used for accessing a memory location in the global region, a finite number of uninterpreted functions F_i, where each function F_i represents the state of the memory region corresponding to lvalues of the form p → f_i, i = 1, …, n, and the uninterpreted function undef_ptr with zero arity that models the usage of the program's functions returning an unknown pointer. Let G(x) be the uninterpreted function used for modeling global memory locations and F_i(x), i = 1, …, n, the finite set of uninterpreted functions used for modeling memory locations in the regions corresponding to the F_i uninterpreted functions. Addresses are represented by expressions of the form a, where a is a variable. The axioms of the memory model (positivity of addresses and their non-intersection within one region) can be represented as follows: (1) a > 0; (2) each such address is equal to a number k that is unique for each such variable. The tool uses an SSA representation to model the changing state of program variables and memory regions. In this representation each use of a name is split into uses of its versions. Each time an assignment is made to a program variable or a memory region, represented in the path formula by the corresponding variable or uninterpreted function, the version number (index) of that variable or uninterpreted function increases. Let Index : Υ → ℕ be a mapping from the set of uninterpreted functions Υ to the set of their indices.
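A small worked example may help to show how SSA versions appear in a path formula. The subscript notation, the region function f and the location l are assumptions made for this sketch. The constraints that keep other locations of the region unchanged are omitted.

```latex
% x = 1; x = x + 1;        (program variable x)
x_1 = 1 \;\wedge\; x_2 = x_1 + 1

% p->f = 6; p->f = 7;      (region function f, l is the location of p->f)
f_1(l) = 6 \;\wedge\; f_2(l) = 7
```

After these two stores the index of the region function has increased twice, so a later read of p->f refers to version 2 and yields 7.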
Let Alloc : Υ → A be a mapping from the set of uninterpreted functions Υ to the set A of subsets of the set of memory locations L: A = 2^L. We will use a supplementary function mem_upd that defines a check for address equality for all of the lvalues in the same region as pointer p (locations in the Alloc(f) region are modeled by the uninterpreted function f). We define offset(s, f_i) as the constant offset of a field f_i from the base address of a variable of struct type s. Because we assume that there is only one structure type, struct s, in our programs, offset(s, f_i) can be shortened to offset(f_i).
In the B&B memory model implemented on top of CPAchecker's existing memory model, the strongest post-condition of a symbolic abstract state φ under an operation op is defined as φ ∧ Γ(op), where the constraints Γ(op) are defined by table 1.

Example
Why is the conjunction unsat? 1) In the existing memory model the memory allocated for pointers p1 and p2 cannot intersect, because it was allocated using the known alloc() function (the corresponding path formula is not given). 2) In the given Γ constraints for this path (using the B&B model) the following contradicting elements are present: F_1(G(p1) + offset(f1)) = F_2(G(p2) + offset(f2)); F_1(G(p1) + offset(f1)) = 5; F_2(G(p2) + offset(f2)) = 6.
Let's take a look at the example program below. In the program's source code there is a call to the function undef_ptr() that returns an unknown pointer. The pointer p2 is initialized using this function. Γ constraints in terms of the B&B memory model for the program are shown in table 3. The path formula can be built as the conjunction of all formulas in the Γ column of table 3.

    void * undef_ptr();
    struct s { int f1, f2; };
    struct s * p1;
    struct s * p2;
    p1 = alloc();
    p2 = undef_ptr();
    p1 -> f1 = 6;
    p2 -> f2 = 5;
    assume(p1 -> f1 == p2 -> f2);

In the B&B memory model p1 → f1 and p2 → f2 exist in separate memory regions. Thus, an update of one of them does not affect the other. In the Γ constraints for this path the same contradicting elements as in the previous example are present. Because of that, the verification result is that the error state is unreachable (the path formula is still unsat). However, in the existing memory model fields f1 and f2 of struct s exist in the same memory region, and a single uninterpreted function is used for them (see table 2 in [4]). Memory for their base pointers p1 and p2 was obtained from the known alloc() function and from the function undef_ptr() returning an unknown pointer, respectively. It cannot be confirmed that an update to field f2 of p2 does not affect an access to field f1 of p1. In the formula the location of field f2 of p2 is (G(p2) + offset(f2)), which is undef_ptr + offset(f2). Locations (G(p1) + offset(f1)) and (G(p2) + offset(f2)) exist in the same region and may be equal. Thus the formula is satisfiable. This means that verification with the existing memory model reports a reachable path to the program's error state. In practice such situations are usually false alarms, because different fields of different structures do not normally overlap. Thus, the assumptions about this behavior in the existing memory model are not incorrect, but they are quite improbable in practice. Using the B&B memory model can reduce the number of false alarms caused by these assumptions (continued in section 6).
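The difference between the two models on this program can be reproduced with a brute-force sketch in C. It is an illustration under stated assumptions and not the tool's encoding: base addresses are small positive integers, the offsets of f1 and f2 are 0 and 1, each region is modeled by a plain array, and the names error_path_feasible and assume_holds are hypothetical. With one shared region the two stores may hit the same cell, so the assume can hold. With separate regions the comparison is always 6 == 5.

```c
#include <stdbool.h>

#define MAX_ADDR 8                         /* base addresses range over 1..MAX_ADDR */
#define OFFSET_F1 0                        /* assumed offset of f1 */
#define OFFSET_F2 1                        /* assumed offset of f2 */
#define MEM_SIZE (MAX_ADDR + OFFSET_F2 + 1)

/* Simulates the example path for base addresses a1 (value of p1) and
   a2 (value of p2) and reports whether the assume condition holds. */
static bool assume_holds(int a1, int a2, bool separate_regions)
{
    int region_f1[MEM_SIZE] = {0};
    int region_f2_storage[MEM_SIZE] = {0};
    /* With typed regions both int fields share one array. */
    int *region_f2 = separate_regions ? region_f2_storage : region_f1;

    region_f1[a1 + OFFSET_F1] = 6;         /* p1->f1 = 6 */
    region_f2[a2 + OFFSET_F2] = 5;         /* p2->f2 = 5 */
    return region_f1[a1 + OFFSET_F1] == region_f2[a2 + OFFSET_F2];
}

/* The error path is feasible if some choice of addresses satisfies the assume. */
bool error_path_feasible(bool separate_regions)
{
    for (int a1 = 1; a1 <= MAX_ADDR; a1++)
        for (int a2 = 1; a2 <= MAX_ADDR; a2++)
            if (assume_holds(a1, a2, separate_regions))
                return true;
    return false;
}
```

In the shared-region case the path becomes feasible as soon as the unknown pointer is chosen so that a2 + 1 equals a1, which corresponds to the false alarm described above.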

Implementation notes
The creation of memory regions is an automated process. In the CPAchecker verification tool a CFA (control-flow automaton) is used as the internal representation of the program. It is sufficient to traverse it and find all of the struct field accesses. This makes it possible to identify those fields whose address is never taken anywhere in the program.
In the implementation we do not take into consideration the possibility of field accesses through pointer arithmetic or pointer conversions, because such field accesses are highly improbable in a program's source code.
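The traversal described above can be sketched as a single pass over the collected field accesses. The data structures and the name assign_regions are hypothetical and are not taken from the CPAchecker code base; the sketch assumes that the accesses have already been extracted from the CFA into an array.

```c
#include <stdbool.h>
#include <stddef.h>

enum access_kind { FIELD_READ, FIELD_WRITE, FIELD_ADDRESS_TAKEN };

struct field_access {
    size_t field;              /* index of the accessed struct field */
    enum access_kind kind;     /* how the field is used at this CFA edge */
};

/* Sets separate[i] to true if field i never has its address taken, in which
   case it may get its own region. Otherwise the field stays in the typed region. */
void assign_regions(const struct field_access *accesses, size_t n_accesses,
                    bool *separate, size_t n_fields)
{
    for (size_t i = 0; i < n_fields; i++)
        separate[i] = true;

    for (size_t k = 0; k < n_accesses; k++) {
        if (accesses[k].kind == FIELD_ADDRESS_TAKEN && accesses[k].field < n_fields)
            separate[accesses[k].field] = false;
    }
}
```

A single address-taking occurrence is enough to keep a field in the shared region, which mirrors the conservative rule of the B&B model.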

Experiments
To determine the efficiency of the B&B memory model implementation in comparison to the existing memory model of the CPAchecker tool, a number of runs were performed on predefined sets of Linux kernel modules. To use the implemented memory model: (1) the CPAchecker verification tool must be at revision 23271 or higher from the trunk branch; (2) the option cpa.predicate.useMemoryRegions must be set to 'true'. The following experiments were made using revision trunk:23271 of the tool.

False alarm set
The review of error traces obtained during the verification of Linux kernel 3.14 made it possible to identify situations in which the error state was reachable due to updates to the memory of same-typed pointers. This set consists of the 26 kernel modules that caused false alarms due to updates to pointers' memory. The goal of this experiment was to find out what effect the B&B memory model has on the tool's precision. Tables 4 and 5 hold information about changes of the tool's verdicts. The launch was performed for the rule that checks the correctness of functions working with the usb_get_* and usb_put_* functions of the USB subsystem. Launch results can be found in tables 6 and 7. Launch configuration: time limit: 15 minutes; memory limit: 15 GB; number of CPU cores: 4. The differences between the regions of the two models led to differences in the program paths covered by the tool. This explains the Unsafe → Unknown, Unknown → Safe and Unknown → Unsafe transitions, where Safe means that the program's error state is unreachable, Unsafe means that the error state is reachable, and Unknown means a timeout or runtime error. The results of this experiment show that the tool's precision improves while the verification speed remains competitive.

The following program will be considered correct if we use either of the memory models. Γ constraints in terms of the B&B memory model for the program are shown in table 2. The path formula can be built as the conjunction of all formulas in the Γ column of table 2. It is unsat in terms of either of the memory models. This means that the tool cannot follow this path (i.e. it won't consider it as a potential error trace candidate).

Table 1. Γ constraints creation rules

Table 2. Example build of path formula for the correct program

Table 3. Example build of path formula for the program with unknown memory function

Mandrykin M. Predicate Abstractions Memory Modeling Method with Separation into Disjoint Regions. Trudy ISP RAN/Proc. ISP RAS, vol. 29, issue 4, 2017, pp. 203-216.

Table 5. Verdict changes

A set of Linux kernel drivers (version 4.2-rc1) was selected to study the efficiency of the B&B memory model implementation in comparison to the existing memory model of the CPAchecker tool.