How LinkedIn Identified a Kernel Lock Contention Issue Causing Recurring System Freezes

When LinkedIn engineers encountered short-lived, recurring outages in which the database powering their user feed became unavailable and then recovered without leaving helpful traces, they had to devise a novel approach, based on off-CPU profiling with eBPF, to uncover the root cause. By Sergio De Simone