I recently gave a talk at QCon London entitled “Understanding CPU Microarchitecture for Performance” on the details of CPU internals and how they affect the speed of programs that run on them.
I’ve given this talk twice recently: once at QCon London, and again at a virtual event for the London Java Community (LJC). Although the two presentations cover similar content, I updated the slides slightly for the LJC event to cover a new release of one of the tools I recommended, and to include a new project that hadn’t been open-sourced at the time of the QCon talk.
The QCon London presentation has the advantage of a transcript and synchronised slides, so pick whichever form you find more useful. Here are the links:
The abstract for both is the same:
Microprocessors have evolved over decades to eke out performance from existing code. But the microarchitecture of the CPU leaks into the assumptions of a flat memory model, with the result that equivalent code can run significantly faster by working with, rather than fighting against, the microarchitecture of the CPU.
This talk, given for QCon London and the London Java Community in 2020, presents the microarchitecture of modern CPUs, showing how misaligned data can cause cache line false sharing, how branch prediction works and when it fails, and how to read CPU-specific performance monitoring counters and use them in conjunction with tools like perf and toplev to discover where the bottlenecks in CPU-heavy code live. We’ll use these facts to revisit performance advice on general code patterns and the things to look out for in executing systems. The talk will be language agnostic, although it will be based on the Linux/x86-64 architecture.
If you have any comments or questions, feel free to reach out to me via Twitter, e-mail or any other means you have at your disposal.