Static Verification of Linux Kernel Configurations

. The Linux kernel is often used as a real world case study to demonstrate novel software product line engineering research methods. It is one of the most sophisticated programs nowadays. To provide the most safe experience of building of Linux product line variants it is necessary to analyse Kconfig file as well as source code. Ten of thousands of variable statements and options even by the standards of modern software development. Verification researchers offered lots of solutions for this problem. Standard procedures of code verification are not acceptable here due to time of execution and coverage of all configurations. We offer to check the operating system with special wrapper for tools analyzing built code and configuration file connected with coverage metric. Such a bundle is able to provide efficient tool for calculating all valid configurations for predetermined set of code and Kconfig. Metric can be used for improving existing analysis tools as well as decision of choice the right configuration. Our main goal is to contribute to a better understanding of possible defects and offer fast and safe solution to improve the validity of evaluations based on Linux. This solution will be described as a program with instruction for inner architecture implementation.


Introduction
Nowadays, software is used to solve increasingly important and complex tasks, due to this fact the complexity of software architectures is also constantly growing. With the increasing complexity of programs, the complexity of development, analysis and maintenance arises. There are many methods that allow you to reduce the costs of supporting the software life cycle. One such method is the creation of variable systems (or family of systems, software product families, software product lines). The superiority over the usual development method is that systems are manufactured with the condition of multiple elements used for several systems with a similar set of functions, taking into account a specific target audience of users. At the same time, a complex and widely known representation of variable software is the Linux operating system [1][2][3][4][5].
In the development of variable systems, a variability model and a variation mechanism play a fundamental role. The variability model specifies the space of possible variants of this family of systems. Usually it is determined by a set of features or configuration parameters, by the sets of their possible values and constraints on possible combinations of these values, each variant of the system corresponds to a certain set of values of all features. The variation mechanism provides the ability to build all possible system variants from a limited set of created and followed artifacts. In Linux, the variability model and its relationship to the variation mechanism is built on the basis of Kconfig files, Makefile files and additional scripts. Kconfig describes all possible features, as well as their relationship with each other in a special language. Then on the basis of Kconfig, the configuration file .config is defined, which describes the version of the system. It consists of a set of configuration variables described in Kconfig and values that satisfy the constraints of Kconfig. During kernel assembly, the values of variables specified in .config are passed to the code as constants for the preprocessor and to Makefiles, which will be used in the variation mechanism. Makefiles contain information about objects in kernel: what files are included and which mode of compilation will be used [5]. In the field of operating systems (hereinafter the operating system we mean the kernel and the underlying OS libraries, providing the interfaces for work with computing resources and hardware), a mechanism of conditional compilation of C / C ++ languages is widely used as a mechanism for variability of the mixed type (based on macros #ifdef, #if, else #else). Blocks that are surrounded with variability mechanism macros, are called variable blocks. It allows you to compose code at the build stage that combines various variables specified by a set of characteristic values that are conditional compilation parameters in this case (defined by the #define and #undef macros, as well as preprocessor setup parameters). The expression after macros is called block precondition, if configuration turns it into 'true', block gets compiled [5]. The complexity of variability models for modern operating systems is very high, for example, the Linux kernel version 2.6.32 has 6319 characteristics, more than 10,000 constraints that can be used up to 22 individual characteristics, with the majority of characteristics depending on at least 4 others, and the maximum depth of the dependency tree is 8 [6]. This complexity causes a large number of errors, primarily due to the difficulty of taking into account all the factors that a developer of a separate code element should do. To identify and cope with these errors, it is necessary to use specialized techniques of analysis and verification. Complexity of analysis that is typical for systems with such a variability mechanism arise because of the huge size of the possible variants space (which makes it completely unrealistic to check them all). Due to using of conditional compilation, each fragment does not have to be a separate component with a certain behavior that can be analyzed separately from the rest of the code, usually such fragments are just insertions into the common code, and can only be checked in certain combinations with each other. The need to solve these problems imposes special requirements on the tools and analysis methods used for complex variable operating systems and system software in general. These requirements are specific for analysis and verification -the methods used to create such systems, by themselves, do not facilitate their analysis [7,8]. The main goal of this work is to propose a method capable of coping with the verification of the Linux OS taking into account the variability, to give acceptable accuracy of verification and speed of execution comparable to the verification of common programs. An errors analysis of such complex systems as Linux can be done with a lot of different tools. The most convenient of them are static analyzers and static verifiers. Static code analysis is the analysis of software produced (as opposed to dynamic analysis) without real execution of the programs under investigation. Existing static analyzers (such as Coverity [9] or Svace [10]) and static verifiers (such as BLAST, CPAchecker [11]) are only designed to work with the code already compiled. Accurate analysis requires pre-assembling of a specific configuration, and only after that start of the actual analysis. As a result, the total inspection time becomes unacceptably large. There is a class of tools that are focused on analysis of a set of possible configurations, they do not split the phase of building configurations and code analysis. As the example of such tools we can take a look at TypeChef and Undertaker. These tools are designed to solve special problems in the sphere of variability. Undertaker is looking for a "dead code" -such a problems when different configurations give the same product, besides it has a lot of built-in helper modules to provide main task, and one more function -assembling the minimal Linux kernels for individual use cases. TypeChef is looking for linking and compile errors with a variability-aware method. It is important to say, that TypeChef can not find difficult problems in code like complex static analysers. Suggested in this paper tool should analyze code deeply like BLAST or CPAchecker and, on the other hand, should do it in variability-aware way without all-configuration brute forcing. For this task we suggest to use CPAchecker due to its outstanding abilities in the error findings, despite CPAchecker's expenditure of time [11]. The maximum reduction in verification time should be achieved through the optimal selection of configurations that will be directed to the input of the static analyzer.

Configuration Set Selection
The problem of selecting configurations can be considered as a splitting of the configuration space into equivalence classes. Each of the selected configurations will belong to one of the classes. To split the configuration space into classes, it is suggested to use the MC/DC test coverage criterion. The advantage of this choice is that it allows you to significantly reduce the number of classes, even for large software systems. In addition, methods of analysis/construction of configurations for preprocessor directives are similar to methods for test data generation for coverage of programs in programming language, where the MC/DC criterion has proven itself well. If we take a look at standard usage of MC/DC for code coverage, we can find out that conditionals in usual programming language has the same structure as conditional compilation directives, the difference is in compilation (conditional compiling directives will not compile code under the block if condition is not true, which is the same as not giving control to code inside usual conditionals. There are 2 basic notions in MC/DC: decision and condition. Decision is a propositional formula which consists of conditions. If it is true, then control will be given to block with such decision (in the case of preprocessor -code will be compiled). Otherwise, control will not be given to this block (code will not be compiled). Condition is a logical part of decision which connects to other conditions with logical functions. A set is considered to reach 100% coverage by this metric, if: 1. Each decision takes every possible outcome.
2. Each condition in a decision takes every possible outcome.  Now we will find those pairs of conditions values where the change of one of them affects on decision. For each condition we have to get only one pair, and then choose minimal amount of them to cover all conditions. This will satisfy third MC/DC point. Pairs will be ( Table 2)

Each condition in a decision
According to table we will test only pairs that are marked as A1, B1 and C3 or C2; These 2 are minimal sets for MC/DC coverage. We can notice that from 8 possible combinations, we can use only 4 to get full coverage according to MC/DC method (0-1-0, 0-1-1,1-0-0,1-1-0). In general, the MC / DC metric allows 2n different situations to be used instead of n 2 condition combinations. [14]

Kernel Check Stages
For each of the found configurations, you should run a kernel verification. The program will scan the kernel in 6 stages: configuration, search for a "dead code", preprocessing, compiling, linking, searching for run-time errors. It is also worth noting that we will look for not only errors, but also configuration defects. Defectsis a broader concept, and it includes not only system errors, but also possible errors of the kernel without processing the interrupt.
Description of the stages: A. Search for a "dead code". A "dead" code is such a code in which control is not transferred under any circumstances. This code contributes to a common configuration error when two different configurations produce the same product at the output. This problem can be solved using the built-in tools of the Undertaker.
B. Configurations. At the time of build, Linux itself checks the configuration file, but it's worth checking it for recursive dependencies and non-existent variables in the code, but existing in Kconfig (and vice versa).

C. Preprocessing.
Preprocessing is performed just before compilation and at this stage we will get the code that will be compiled into the F. Execution Errors. The most difficult part of the test is using the LDV and CPAchecker tools. LDV assigns labels to the code of the program according to the preset rules, while CPAchecker searches for them and builds accessibility graphs to them so we can see how a label is achieved.

Configuration Set Selection
To solve this task we suppose the workflow for a program called OStap (Fig. 2). This program will find all variable blocks, get all the propositional formulas for entering each of the variable blocks. After that program will extract all conditions from those formulas and will use MC/DC metric to get all necessary configurations that will be checked with static verifier and dead code detector to perform verification stages A-F.

Feature model extraction
First of all, we transform Kconfig files using Kconfigdump module of Undertaker to get feature model of Linux kernel. Feature model is splitted by architectures sets of formulas. All dependencies implemented in Kconfig file are represented in a dump as a set of logical formulas. So it is possible to get full precondition for each feature just looking at line with it's name in model file. For example: if we have to turn on option A to turn on option B there will be …&&A addition to any decision where config B is have to be turned on.

Coarse block precondition extraction
Secondary, we need to get all variable blocks in the given code, which is a module or a set of modules that consists of different .c files. These modules can be compiled according to Makefiles, where modules are marked as compiled, compiled as LKM, non-compiled. This mechanism of variability (Kbuild variability) in Linux is made for modules and helps to organize builds. Due to the fact, that another variability mechanism in Linux uses preprocessors we can find all #ifdef, #if, #else, #ifndef blocks to be sure that we have found all the variable blocks in the given code. This may be done using standard tools of most of popular operating systems or using programming language tools for work with file system. When we have all necessary blocks and their positions in the files we can get propositional formulas using Undertaker [13] tool for them. As a result of the undertaker infrastructure we can easily calculate preconditions for preprocessor blocks in a file, just by specifying the file and line number. If a configuration model is loaded, it will also fetch all interesting items from the model. Say you want to have the block precondition for line 359 and line 370 in init/main.c. Then for each block we use Undertaker's option -blockpc to get blocks precondition based on dumped feature model and code. This precondition is a decision in MC/DC metric theory. It is also necessary to say that later we will use SMT-solver to work with MC/DC metrics, so we patched Undertaker for providing output in a SMT-lib way (Polish notation). C expressions with operands, are not able to be converted due to architecture of undertaker, such code may be marked with special symbols to be changed in the future.

Detailed block precondition extraction
When we have formula by Undertaker, we have to change all the marked code to prefix view, in other words: to represent decision in SMT-lib way. Due to the fact that such code will be in a usual C representation, we can use any C parser that is able to build expression tree to rebuild the string. For example we can use Pycparser [15]. Pycparser can parse C code and represent it as ast-tree. Ast-tree is an expression tree with all operators of C language. Running through the tree, we can rewrite any C expression from infix to prefix view. To do this thing we need to apply this algorithm:

MC/DC driven configuration generation
Now we have ready-to-use prefix formula that is the same as 'decision' term in MC/DC metrics so it is time to start extracting conditions from a decision. To pass this stage we need to get all the single variables inside unary operator or without unary operator. Those variables will be the same as conditions. When we have all conditions and decision for a variable block, we need to build a truth table for conditions. For example, we extracted 3 conditions: a > 0, b == true, a < 0, and out decision is (a > 0) && (b == true) || (a < 0). We look over all of their combinations that we got from truth table like it was described in II chapter (Table  3). To calculate decision column we need to put such equalities and full decision formula into SMT-solver, so it calculates possible values of variables through equalities and then put it into decision formula to find solution. In our example in case of a > 0 = true and a < 0 = true SMT-solver will return error, so these sets of variables will not be used in future (Table 4). In classical MC/DC theory we have to say that this example can not be covered by MC/DC due to the fact that 6 and 8 line gave us impossible conditions to resolve, but in a real case adaptation we can say that we covered all of possible conditions [14]. When the table is ready, we can find influencing variables, like it was described in chapter II (Table 5).

Verification
Finally, we have got necessary configurations to get 100% coverage using MC/DC metrics. Now we will push these configurations to static verifier (for example, CPAchecker). Also we will check code with Undertaker tool to find dead code blocks (Stage A of kernel check), which is also can be declared as configuration error. Next step for pushing configurations is to create launch file for LDV-Klever tool and launch it. Launch file is filled with modules-to-check and rules for finding unsafe modules [16]. Before the launch, code will be compiled and stages B-E will be passed with compiler default methods. After that LDV-Klever main activity will be invoked to check code for a runtime defects (stage F). Output is a module with result: safe, unsafe or unknown. If module is unsafe, verifier shows error trace for this module. During verifier launches we push results into the dataset where the structure unite modules, that we are checking, each module is splitted with configurations and results of LDV-Klever, LDV-Klever results are splitted with verdict and unsafe traces.

Conclusion
This article describes the method, which allows you to check the Linux operating system for errors without being depended on the configuration. This approach provides high code coverage and an improved speed of verification comparing to brute force method.
The prototype of this program is already implemented. It will be finished and fixed in a nearest time.