Binary Code Analysis Platform Based on QEMU Emulator
QEMU is a full-system multi-target open source emulator. It is widely used for software cross-development. Many large companies (e.g., Google, Samsung, Oracle) prototype and emulate their hardware platforms and peripheral devices on QEMU.
QEMU 2.9 emulates 20 different hardware platform families, including x86, PowerPC, SPARC, MIPS, and ARM.
Because QEMU is open source, it can be extended and used for:
- creating new virtual platforms,
- prototyping peripheral device models,
- debugging OS kernel code, firmware code, drivers for emulated devices,
- malware analysis,
- recording virtual machine execution for later replay and analysis.
Remote debugging in the emulator
QEMU supports remote debugging of a virtual machine through a GDB-compatible interface. The debugging service runs inside the emulator and does not affect virtual machine behavior.
GDB (an open-source debugger) can connect to the emulator over a network socket and inspect processor registers, memory cells, the call stack, and so on. Both application and kernel code in the virtual machine can be debugged. Popular binary analysis tools and IDEs such as IDA and Eclipse can also connect to QEMU to debug and analyze the virtual machine, because they support the GDB-compatible remote debugging interface.
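To make the "GDB-compatible interface" more concrete: the GDB remote serial protocol frames each command as `$<payload>#<checksum>`, where the checksum is the modulo-256 sum of the payload bytes written as two hexadecimal digits. The Python sketch below shows only that framing. The helper names are illustrative and are not part of QEMU or GDB, and escaping, run-length encoding, and acknowledgements are deliberately left out.

```python
def rsp_checksum(payload: bytes) -> int:
    """Modulo-256 sum of the payload bytes, as the remote protocol requires."""
    return sum(payload) % 256


def rsp_frame(payload: str) -> bytes:
    """Wrap a command such as 'g' (read registers) into a $...#xx packet."""
    data = payload.encode("ascii")
    return b"$" + data + b"#" + f"{rsp_checksum(data):02x}".encode("ascii")


def rsp_unframe(packet: bytes) -> str:
    """Check the framing and checksum of a packet and return its payload."""
    if not packet.startswith(b"$") or len(packet) < 4 or packet[-3:-2] != b"#":
        raise ValueError("malformed packet")
    data = packet[1:-3]
    if int(packet[-2:], 16) != rsp_checksum(data):
        raise ValueError("checksum mismatch")
    return data.decode("ascii")
```

In practice the framing is handled by the debugger itself: QEMU started with `-s -S` waits for a debugger on TCP port 1234, and GDB attaches with `target remote localhost:1234`.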
Virtual machine execution recording and replay
Debugging usually means tracing back from the failure to the line of code where the error actually occurred, which implies moving "back in time". To restore a past program state, one has to re-run the program and try to find the source of the failure. This is usually done many times, moving backward step by step.
Debugging is significantly more difficult when the failure manifests unstably, that is, when it is affected by "random" factors such as multithreaded execution, hardware behavior, or user interaction with a graphical interface.
Deterministic replay provides stable reproduction of a program (or virtual machine) run and thus facilitates debugging.
Deterministic replay reconstructs program execution using previously recorded input data. The first program run is used to record these inputs into a log. All following runs then reconstruct the same behavior, because the program uses only the recorded inputs. Deterministic replay reconstructs the sequence of program (or virtual machine) states, including CPU registers, memory cells, the state of peripheral devices, and hard disk contents. Replay moves from one state to the next by executing CPU instructions and passing the previously recorded inputs to the program. These inputs include user input, network packets, and serial and USB communications.
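The record-then-replay idea can be illustrated with a toy model in Python. This is a sketch under strong simplifying assumptions, not QEMU's implementation: the "machine" is a single integer state, and `random.randrange` stands in for any nondeterministic input such as a key press or a network packet. All names are made up for the example.

```python
import random
from typing import Callable, List, Tuple


def run(program_steps: int, read_input: Callable[[], int]) -> List[int]:
    """Toy 'machine': its state depends only on the inputs it consumes."""
    state, trace = 0, []
    for _ in range(program_steps):
        state = (state * 31 + read_input()) % 1_000_003
        trace.append(state)
    return trace


def record(program_steps: int) -> Tuple[List[int], List[int]]:
    """First run: take inputs from a nondeterministic source and log them."""
    log: List[int] = []

    def read_and_log() -> int:
        value = random.randrange(256)  # stands in for a key press or packet
        log.append(value)
        return value

    return run(program_steps, read_and_log), log


def replay(program_steps: int, log: List[int]) -> List[int]:
    """Later runs: feed the logged inputs back instead of the real source."""
    logged = iter(log)
    return run(program_steps, lambda: next(logged))
```

Because `run` sees nothing but the logged values during replay, every replayed trace equals the recorded one, no matter how often it is repeated.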
Full-system replay may be used for analysis of user-level applications, system kernels, firmware, and multi-threaded programs. Every guest operating system supported by the emulator may be recorded and replayed.
Every replay run produces an equivalent execution (the same sequence of instructions and hardware states) and can therefore be used to debug volatile bugs conveniently. The debugger and other analysis tools do not alter program execution, because they work outside the guest system.
Deterministic replay in QEMU was developed by ISP RAS. QEMU can record and replay virtual machine executions for the x86, ARM, and MIPS platforms.
Reverse debugging can be used to inspect past states of the program. The developer starts debugging at the point where an error manifests itself or an exception occurs and then tries to determine the cause of that behavior. Usually the failure is caused by operations performed in the past.
Reverse debugging does not require restarting the program, because it relies on a faster "rewind" to the past. The GDB interface includes "reverse step" and "reverse continue" commands. Their implementation in QEMU uses deterministic replay and virtual machine snapshots to recover past states faster. These patches will later be included into QEMU mainline.
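How snapshots and deterministic replay combine into a "reverse step" can be sketched with a toy model. The Python code below is an illustration under assumptions, not the QEMU code: the machine state is one integer, the input log is a list, and the class and method names are hypothetical. Stepping back restores the nearest earlier snapshot and replays forward to the previous step.

```python
from typing import Dict, List


class ReplayDebugger:
    """Toy model: state after step n is a pure function of the logged inputs."""

    def __init__(self, log: List[int], snapshot_every: int = 4):
        self.log = log
        self.snapshot_every = snapshot_every
        self.snapshots: Dict[int, int] = {0: 0}  # step number -> saved state
        self.step_no = 0
        self.state = 0

    def step(self) -> None:
        """Execute one step forward, consuming the next logged input."""
        self.state = (self.state * 31 + self.log[self.step_no]) % 1_000_003
        self.step_no += 1
        if self.step_no % self.snapshot_every == 0:
            self.snapshots[self.step_no] = self.state

    def goto(self, target: int) -> None:
        """Restore the nearest snapshot at or before target, then replay."""
        base = max(s for s in self.snapshots if s <= target)
        self.step_no, self.state = base, self.snapshots[base]
        while self.step_no < target:
            self.step()

    def reverse_step(self) -> None:
        """Move one step back in time without restarting from step 0."""
        if self.step_no == 0:
            raise RuntimeError("already at the start of the recording")
        self.goto(self.step_no - 1)
```

The snapshot interval is the usual trade-off: denser snapshots cost more storage but shorten the forward replay needed for each backward step. In GDB, the corresponding commands are `reverse-step`, `reverse-stepi`, and `reverse-continue`.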
Guest system analysis
Virtual machine debugging requires information about where programs and modules are located in memory. We have developed an introspection mechanism that retrieves this information from virtual machines running Windows or Linux. Introspection can be used to retrieve:
- instruction execution sequence,
- memory access sequence,
- executed system calls,
- created processes,
- loaded modules,
- file accesses.
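One use of the "loaded modules" information is to turn a raw guest address into a module name and offset. The Python sketch below shows such a lookup; it is a hypothetical illustration and not the introspection mechanism itself. The `Module` and `ModuleMap` names and the example load addresses are assumptions made for the example.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Module:
    name: str
    base: int   # load address in guest virtual memory
    size: int   # size of the mapped image in bytes


class ModuleMap:
    """Maps guest addresses to loaded modules, as an introspection layer might."""

    def __init__(self, modules: List[Module]):
        self._modules = sorted(modules, key=lambda m: m.base)
        self._bases = [m.base for m in self._modules]

    def resolve(self, address: int) -> Optional[Tuple[str, int]]:
        """Return (module name, offset) for an address, or None if unmapped."""
        i = bisect.bisect_right(self._bases, address) - 1
        if i < 0:
            return None
        module = self._modules[i]
        offset = address - module.base
        return (module.name, offset) if offset < module.size else None
```

With such a map, an instruction or memory access trace can be annotated with module-relative offsets, which stay meaningful across runs even when load addresses change.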
Emulation of new platforms and peripheral devices
Emulating new devices and platforms in QEMU requires a complete set of documentation describing the instruction set architecture of the processor, memory map and peripherals. Every new peripheral device must be provided with its own documentation.
Developing a new platform in QEMU from scratch requires implementing:
- a new virtual CPU and a translator of its instructions into the intermediate representation,
- virtual memory management unit (MMU),
- virtual peripheral devices,
- new platform which integrates all of the above,
- extension of QEMU interfaces for new devices connected to the real world.
Even when QEMU already includes implementations of virtual CPU, MMU, and peripheral devices, all of these parts need to be interconnected with virtual system buses into one virtual platform.
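What "interconnecting with a virtual system bus" means can be shown with a toy model: the bus owns the memory map and routes each access to the device whose address window contains it. The Python sketch below is illustrative only and does not use QEMU's device model. The `Ram`, `Uart`, and `Bus` classes and the addresses in the usage note are assumptions.

```python
from typing import List, Tuple


class Ram:
    """Byte-addressable memory device."""

    def __init__(self, size: int):
        self.data = bytearray(size)

    def read(self, offset: int) -> int:
        return self.data[offset]

    def write(self, offset: int, value: int) -> None:
        self.data[offset] = value & 0xFF


class Uart:
    """Write-only serial port stub that collects transmitted bytes."""

    def __init__(self):
        self.sent = bytearray()

    def read(self, offset: int) -> int:
        return 0

    def write(self, offset: int, value: int) -> None:
        self.sent.append(value & 0xFF)


class Bus:
    """Routes each access to the device whose address window contains it."""

    def __init__(self):
        self.regions: List[Tuple[int, int, object]] = []  # (base, size, device)

    def attach(self, base: int, size: int, device) -> None:
        for other_base, other_size, _ in self.regions:
            if base < other_base + other_size and other_base < base + size:
                raise ValueError("address windows overlap")
        self.regions.append((base, size, device))

    def _find(self, address: int):
        for base, size, device in self.regions:
            if base <= address < base + size:
                return device, address - base
        raise LookupError(f"no device at {address:#x}")

    def read(self, address: int) -> int:
        device, offset = self._find(address)
        return device.read(offset)

    def write(self, address: int, value: int) -> None:
        device, offset = self._find(address)
        device.write(offset, value)
```

For example, attaching `Ram(0x100)` at address 0x0 and `Uart()` at 0x1000 makes a write to 0x1000 land in the UART's transmit buffer, while an access to an unmapped address raises an error, which is the kind of failure that reveals a gap in the platform's memory map.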
When documentation is missing or incomplete, debugging the virtual platform becomes very difficult. Information about the platform can then be extracted only from the binary code available for existing devices, and code execution failures point to flaws in the virtual hardware implementation. Emulator development takes more effort in this case, because binary code analysis is needed to recover the expected behavior of the virtual device.
We provide semi-automatic Python scripts that simplify the development of new virtual platforms. They offer a declarative API for describing the configuration and a graphical interface that makes configuration simpler.
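The scripts' actual API is not shown here. Purely as a hypothetical illustration of what a declarative platform description in Python could look like, the sketch below lists devices as data and derives a checked memory map from them. Every name in it (`DeviceDesc`, `PlatformDesc`, `demo-board`, and so on) is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class DeviceDesc:
    name: str
    base: int   # first address of the device's window
    size: int   # window size in bytes


@dataclass
class PlatformDesc:
    name: str
    cpu: str
    devices: List[DeviceDesc] = field(default_factory=list)

    def validate(self) -> None:
        """Reject descriptions whose device windows overlap."""
        ordered = sorted(self.devices, key=lambda d: d.base)
        for prev, cur in zip(ordered, ordered[1:]):
            if prev.base + prev.size > cur.base:
                raise ValueError(f"{prev.name} overlaps {cur.name}")

    def memory_map(self) -> List[str]:
        """Render one line per device, ordered by base address."""
        self.validate()
        return [
            f"{d.base:#010x}-{d.base + d.size - 1:#010x} {d.name}"
            for d in sorted(self.devices, key=lambda d: d.base)
        ]
```

The point of the declarative style is that the description contains no control flow: the same data can be validated, rendered in a graphical editor, or fed to a code generator.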