Recovering the structure of binary data from program traces.
In this paper we consider the problem of recovering binary data formats and describe the format recovery system implemented at ISP RAS. First, we enumerate the general approaches to this problem (static analysis, dynamic analysis, and network trace analysis) together with their advantages and limitations. We also describe the fundamental limitation of dynamic analysis, incomplete code coverage, and several possible methods to partly compensate for it in this particular problem. Second, we discuss data sources and the specifics of analyzing such objects as files, network packets of different levels and of different kinds of protocols (stateful and stateless), and incoming and outgoing messages. We also discuss the problem of protocol analysis, specifically the problem of recovering the protocol state machine. Third, we describe our function specification facility, which lets us define models of functions and their parameters and makes our format recovery approach more accurate by taking into account the user's knowledge of the features of a specific software environment. We also present the general scheme of our approach and test results of the implemented system. Finally, we discuss future research directions: encrypted traffic analysis and several possible applications of the recovery results.
Proceedings of the Institute for System Programming, vol. 22, 2012, pp. 95-118.
ISSN 2220-6426 (Online), ISSN 2079-8156 (Print).
DOI: 10.15514/ISPRAS-2012-22-7