Resolving Lock Contention in Apache Solr: A Performance-Analysis Detective Story
Problem Statement and Initial Approach
Performance Analysis in Complex Systems
Beginning the Exploration
If that were true, the stacks of these memory-allocation events would fall inside one of the lock's critical sections. To verify, we needed to collect the stack of each suspicious memory-allocation event, aggregate the stacks, and compare them against the source code to see whether they matched any lock critical section. The answer was yes: the profiler revealed that the heavy lock contention in Solr NRT indexing was caused by excessive Block Cache-related memory allocation in a critical section (see profiler screenshot below)!
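The verification step described here (collect stacks, aggregate them, check them against known critical sections) can be sketched roughly as follows. This is a minimal illustration, not the tooling actually used: the event data, the frame names, and the set of critical-section methods are all hypothetical stand-ins for what a profiler export and a reading of the source code would provide.

```python
from collections import Counter

# Hypothetical allocation events, one stack per event, innermost frame first.
# These frame names are invented for illustration; they are not real Solr methods.
EVENTS = [
    ("BlockCache.allocate", "IndexWriterLike.flushUnderLock", "Indexer.run"),
    ("BlockCache.allocate", "IndexWriterLike.flushUnderLock", "Indexer.run"),
    ("Buffer.grow", "QueryHandler.handle", "Server.serve"),
]

# Hypothetical set of methods that, per the source code, run while holding the lock.
CRITICAL_SECTION_FRAMES = {"IndexWriterLike.flushUnderLock"}


def aggregate_stacks(events):
    """Count how many allocation events share each identical stack."""
    return Counter(tuple(stack) for stack in events)


def stacks_in_critical_section(counts, critical_frames):
    """Return (stack, count) pairs that pass through a critical-section frame,
    most frequent first."""
    hits = [
        (stack, n)
        for stack, n in counts.items()
        if any(frame in critical_frames for frame in stack)
    ]
    return sorted(hits, key=lambda item: item[1], reverse=True)


counts = aggregate_stacks(EVENTS)
hits = stacks_in_critical_section(counts, CRITICAL_SECTION_FRAMES)
for stack, n in hits:
    print(n, " <- ".join(stack))
```

Aggregating before matching keeps the manual source-code comparison small: only the handful of distinct, frequently occurring stacks need to be checked against the lock's critical sections.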
Solution & Results
Furthermore, at Cloudera, the efforts to improve Solr performance never end. Although this particular issue was resolved, it has revealed more opportunities for optimization that have a place on the Cloudera roadmap. We’ll describe some of those in future posts as the work is completed.
Michael Sun is a Software Engineer at Cloudera, working on the Cloudera Search team.