Uber set out to reduce the cost of its compute capacity by improving efficiency, and Go GC tuning saved around 70K cores across 30 mission-critical services. Here, we summarize how they implemented it, what they achieved, and why GC tuning matters. For more detail on their highly effective, low-risk, large-scale, semi-automated Go GC tuning mechanism, see the article by Cristian Velazquez, a software engineer at Uber, titled ‘How We Saved 70K Cores Across 30 Mission-Critical Services (Large-Scale, Semi-Automated Go GC Tuning @Uber)’.

How did they implement it?

Initially, they used a ticker that ran every second to monitor heap metrics and adjust the GOGC value accordingly. However, this polling introduced considerable overhead. They then found an alternative: Go has finalizers, which run when an object is garbage collected. They attach a self-referencing finalizer that re-registers itself on every GC invocation, so the tuning logic runs once per GC cycle instead of on a fixed timer. This lets them react to every collection while avoiding the CPU overhead of polling.
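A minimal Go sketch of the finalizer-driven approach follows. It assumes a fixed memory target, uses `runtime.MemStats.HeapAlloc` read right after a GC as a rough stand-in for the live heap, and clamps GOGC to hypothetical bounds. The names `computeGOGC`, `gcSentinel`, `rearm`, `startGCHook`, and `tune` are illustrative, not Uber's GOGCTuner API. The formula relies on Go's heap goal being roughly live * (1 + GOGC/100).

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// Hypothetical clamp bounds for this sketch; not taken from Uber's article.
const (
	minGOGC = 25
	maxGOGC = 500
)

// computeGOGC picks the GOGC that lets the heap grow from liveBytes up to
// roughly targetBytes before the next GC. Since the heap goal is about
// live * (1 + GOGC/100), GOGC = (target - live) / live * 100.
func computeGOGC(liveBytes, targetBytes uint64) int {
	if liveBytes == 0 {
		return maxGOGC // nothing live yet: allow the most headroom
	}
	if targetBytes <= liveBytes {
		return minGOGC // already at or over target: collect aggressively
	}
	gogc := (targetBytes - liveBytes) * 100 / liveBytes
	if gogc < minGOGC {
		return minGOGC
	}
	if gogc > maxGOGC {
		return maxGOGC
	}
	return int(gogc)
}

// gcSentinel is a heap object that nothing references; its finalizer acts
// as a per-GC hook.
type gcSentinel struct {
	onGC func()
}

// rearm runs the callback and then re-attaches itself to the same object.
// The runtime clears a finalizer before running it, so re-registering makes
// it fire again after a later GC cycle (the self-referencing finalizer).
func rearm(s *gcSentinel) {
	s.onGC()
	runtime.SetFinalizer(s, rearm)
}

// startGCHook arranges for onGC to run roughly once per GC cycle, without a
// polling ticker.
func startGCHook(onGC func()) {
	runtime.SetFinalizer(&gcSentinel{onGC: onGC}, rearm)
}

// tune recomputes GOGC after a GC. Treating HeapAlloc as the live heap is an
// assumption of this sketch.
func tune(targetBytes uint64) {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	debug.SetGCPercent(computeGOGC(ms.HeapAlloc, targetBytes))
}

func main() {
	const target = 512 << 20 // hypothetical memory target: 512 MiB
	startGCHook(func() { tune(target) })

	// 100 MiB live with a 512 MiB target gives GOGC = 412.
	fmt.Println(computeGOGC(100<<20, target))
}
```

All finalizers in a Go program run sequentially on a single goroutine, so the callback should stay short; a slow callback delays every other finalizer.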

What did they achieve?

Once they deployed GOGCTuner across a few of their services, they saw double-digit improvements in CPU utilization. The accumulated savings from these services alone amount to around 70K cores. Here are two examples:

  1. An observability service running on thousands of compute cores, with a high standard deviation in live_dataset (the maximum was 10X the minimum), showed a ~65% reduction in p99 CPU utilization.
  2. A mission-critical Uber Eats service running on thousands of compute cores showed a ~30% reduction in p99 CPU utilization.
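The 10X spread in live_dataset hints at why a fixed GOGC can be costly: the heap goal is roughly live * (1 + GOGC/100), so the headroom between collections shrinks along with the live heap. The sketch below uses a simplified model with hypothetical numbers (a 1 GiB memory target and live heaps of 50 MiB and 500 MiB) to compare GC frequency under a static GOGC=100 and under a GOGC recomputed toward the target. It illustrates the trend only and is not based on Uber's measured data; `cyclesPerGiB` and `dynamicGOGC` are hypothetical helpers.

```go
package main

import "fmt"

// cyclesPerGiB estimates how many GC cycles run per GiB allocated. In this
// simplified model each cycle allows about live*GOGC/100 MiB of new
// allocation before the heap goal is reached (stacks, globals, and pacer
// details are ignored).
func cyclesPerGiB(liveMiB, gogc float64) float64 {
	headroomMiB := liveMiB * gogc / 100
	return 1024 / headroomMiB
}

// dynamicGOGC is the (unclamped) GOGC that lets the heap grow to targetMiB.
func dynamicGOGC(liveMiB, targetMiB float64) float64 {
	return (targetMiB - liveMiB) / liveMiB * 100
}

func main() {
	const targetMiB = 1024 // hypothetical memory target
	// Hypothetical live heaps spanning a 10X range.
	for _, live := range []float64{50, 500} {
		dyn := dynamicGOGC(live, targetMiB)
		fmt.Printf("live=%3.0f MiB  static GOGC=100: %5.2f cycles/GiB  dynamic GOGC=%4.0f: %4.2f cycles/GiB\n",
			live, cyclesPerGiB(live, 100), dyn, cyclesPerGiB(live, dyn))
	}
}
```

With a static GOGC=100, the smaller live heap triggers about 10 times as many collections per GiB allocated (roughly 20 versus 2). With GOGC recomputed toward the target, both cases stay near one or two cycles per GiB, because the headroom tracks the memory target rather than the live heap.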

Why is GC tuning important?

Their story shows that garbage collection is one of the most elusive and underestimated influences on application performance. Garbage collection (GC) tuning is crucial for efficient memory management, minimizing performance-hindering pauses, and keeping application behavior predictable. It improves resource usage, scalability, and cost-effectiveness, which in turn enhances application stability and user experience. You can apply GC tuning to optimize your own application and reduce costs, as Uber did.