Evaluating the number of cache coherency misses based on a statistical model
False cache sharing occurs when threads running in parallel on distinct processor cores update different variables that happen to reside in the same cache line. Each such update invalidates the copy of the line held in the other core's cache, forcing that core's thread to stall while the up-to-date line is fetched again.
In this paper we propose to estimate the number of cache misses using code instrumentation and post-mortem trace analysis: the probability of a false-sharing cache miss (defined as a memory write issued by one thread between two consecutive accesses to the same cache line issued by another thread) is computed from the gathered event trace, where each recorded event is a memory access with a timestamp. The tracer is implemented as a GCC compiler pass that inserts the necessary tracing instructions before each memory access. The pass is scheduled after all other optimization passes, which allows the tracer to be used on optimized code. The post-mortem analyzer is a separate application that takes as its input the trace collected on a sample input of the instrumented program. The slowdown of our approach is roughly 10x; it depends on the sampling probability but not on the cache line size.
Proceedings of the Institute for System Programming, vol. 27, issue 4, 2015, pp. 39-48.
ISSN 2220-6426 (Online), ISSN 2079-8156 (Print).
DOI: 10.15514/ISPRAS-2015-27(4)-3. Full text of the paper is available in Russian.