Proceedings of ISP RAS


Platform-independent and scalable tool for binary code clone detection

H.K. Aslanyan (ISP RAS, Moscow, Russia)
S.F. Kurmangaleev (ISP RAS, Moscow, Russia)
V.G. Vardanyan (ISP RAS, Moscow, Russia)
M.S. Arutunian (ISP RAS, Moscow, Russia)
S.S. Sargsyan (ISP RAS, Moscow, Russia)

Abstract

During software development, developers often copy and paste fragments of code to achieve the desired result. Copying code can lead to a variety of errors and can also increase the size of both the source and the binary code. Because the source code of many programs is unavailable, the problem of finding semantically similar pieces of code (clones) in binary code is becoming increasingly relevant. The first part of the article analyzes existing methods for finding code clones in binary code. In the second part, we present a newly developed tool for finding code clones in binary code. The tool works in three main stages. The first stage is based on the BinNavi [1] framework and is responsible for generating program dependence graphs (PDGs). The graphs are generated using REIL (Reverse Engineering Intermediate Language). Using REIL makes it possible to generate graphs for multiple architectures (x86, x86-64, ARM, MIPS, PPC), which makes the tool independent of the target architecture. In the second stage, code clones are found using the previously created graphs: a maximum common subgraph is built for each pair of graphs, and code clones are detected based on it. In the third stage, the detected clones are visualized for convenient analysis of the results.
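The second stage (comparing each pair of graphs through a maximum common subgraph) can be illustrated with a small sketch. This is not the tool's implementation. It assumes a PDG is given as a dictionary of node labels (REIL opcode names such as `ldm`, `add`, `stm`) plus a set of directed dependency edges, computes a maximum common induced subgraph by exhaustive backtracking, and treats two graphs as clones when that subgraph covers at least a given fraction of the larger graph. The graph encoding, the induced-subgraph variant, the similarity formula and the 0.8 threshold are all illustrative assumptions, and the graphs are hand-written toy data rather than BinNavi output.

```python
from typing import Dict, Set, Tuple

# Hypothetical PDG representation: node id -> label (here a REIL opcode name)
# plus a set of directed dependency edges (u, v).
Graph = Tuple[Dict[int, str], Set[Tuple[int, int]]]


def max_common_subgraph(g1: Graph, g2: Graph) -> Dict[int, int]:
    """Largest label-preserving node mapping g1 -> g2 whose induced edges agree
    (a maximum common induced subgraph). Exhaustive backtracking with a simple
    size bound: exponential time, suitable only for tiny graphs."""
    labels1, edges1 = g1
    labels2, edges2 = g2
    nodes1 = sorted(labels1)
    best: Dict[int, int] = {}

    def consistent(mapping: Dict[int, int], u: int, v: int) -> bool:
        # A self-loop must exist in both graphs or in neither.
        if ((u, u) in edges1) != ((v, v) in edges2):
            return False
        # Same for edges (in both directions) to every already-mapped node.
        for a, b in mapping.items():
            if ((u, a) in edges1) != ((v, b) in edges2):
                return False
            if ((a, u) in edges1) != ((b, v) in edges2):
                return False
        return True

    def search(i: int, mapping: Dict[int, int], used: Set[int]) -> None:
        nonlocal best
        # Prune: mapping all remaining g1 nodes still would not beat `best`.
        if len(mapping) + (len(nodes1) - i) <= len(best):
            return
        if i == len(nodes1):
            best = dict(mapping)  # strictly larger than the previous best
            return
        u = nodes1[i]
        for v in labels2:
            if v not in used and labels1[u] == labels2[v] and consistent(mapping, u, v):
                mapping[u] = v
                used.add(v)
                search(i + 1, mapping, used)
                del mapping[u]
                used.discard(v)
        search(i + 1, mapping, used)  # also try leaving u unmapped

    search(0, {}, set())
    return best


def similarity(g1: Graph, g2: Graph) -> float:
    # Assumed metric: MCS size normalized by the size of the larger graph.
    larger = max(len(g1[0]), len(g2[0]))
    return len(max_common_subgraph(g1, g2)) / larger if larger else 0.0


def is_clone(g1: Graph, g2: Graph, threshold: float = 0.8) -> bool:
    # The threshold value is an arbitrary assumption for illustration.
    return similarity(g1, g2) >= threshold


# Toy PDGs: load two values, add them, store the result.
g1 = ({0: "ldm", 1: "ldm", 2: "add", 3: "stm"}, {(0, 2), (1, 2), (2, 3)})
# The same pattern plus one extra xor node feeding the store.
g2 = ({10: "ldm", 11: "ldm", 12: "add", 13: "stm", 14: "xor"},
      {(10, 12), (11, 12), (12, 13), (14, 13)})
print(similarity(g1, g2))  # 0.8
print(is_clone(g1, g2))    # True
```

Exhaustive search is exponential in the number of nodes, so a tool that compares every pair of functions would likely need heuristics or size limits; the sketch only shows what quantity is being computed.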

Keywords

code clone, semantic analysis of binary code, REIL, program dependence graph

Edition

Proceedings of the Institute for System Programming, vol. 28, issue 5, 2016, pp. 215-226

ISSN 2220-6426 (Online), ISSN 2079-8156 (Print).

DOI: 10.15514/ISPRAS-2016-28(5)-13

Full text of the paper in PDF (in Russian).