A few of my developer friends say: “Garbage Collection is automatic. So, I do not have to worry about it”. The first part is true: garbage collection is automatic in all modern managed platforms (Java, .NET, Golang, Python…). But I respectfully disagree with the second part of the statement, i.e., “I don’t have to worry about it”. Ignoring GC analysis entirely can be a costly mistake. Let me present my case here.
What are the benefits of GC tuning?
Tuning garbage collection performance provides the following key benefits to your application:
1. Boost Application Performance
a. Response Time
b. Throughput
c. Avoid Long Pauses
2. Reduce Computing Cost
a. Reduce CPU compute
b. Identify optimal Memory Size
c. Reduce Software Licensing Cost
3. Quickly Troubleshoot Production Problems
a. Unearth OutOfMemoryError easily
b. Forecast Memory Problems Several Minutes Ahead
Let’s review these benefits with real case studies in this post.
1. Boost Application Performance
By studying and tuning GC behavior, you can dramatically improve your application’s overall performance characteristics, such as response time and throughput.
a. Response Time
One of the primary goals of garbage collection tuning is to reduce GC pause time. When you reduce GC pause time, the application’s overall response time also improves. Here is a real case study of one of the world’s largest automobile manufacturers, who improved their application’s response time by 49.46% (from 1.88 seconds to 0.95 seconds) just by tuning GC settings. The percentage of transactions taking more than 25 seconds also dropped from 0.7% to 0.31%, a reduction of about 55%.
Most other forms of application-wide response time improvement require infrastructure changes, architectural changes, or code-level changes. These changes are expensive and high-risk, and even if you embark on them, there is no guarantee that response time will improve. In contrast, GC tuning is a low-risk change because you are only changing the GC arguments that you pass to your application.
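Since GC tuning comes down to the arguments passed to the JVM, it helps to first confirm which ones are actually in effect. Here is a minimal Java sketch that does this with the standard java.lang.management API. The class name GcSettingsReport and the prefix filter are my own illustrative choices, and the filter is deliberately rough: -XX: matches every advanced HotSpot option, not only the GC-related ones.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named for illustration only.
final class GcSettingsReport {

    // Keeps arguments that set heap sizes, GC logging, or any -XX: option.
    // The prefix list is an illustrative assumption, not a complete catalogue.
    static List<String> gcRelatedArgs(List<String> jvmArgs) {
        List<String> result = new ArrayList<>();
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xmx") || arg.startsWith("-Xms")
                    || arg.startsWith("-Xmn") || arg.startsWith("-XX:")
                    || arg.startsWith("-Xlog:gc")) {
                result.add(arg);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Arguments the running JVM was started with.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Heap/GC-related arguments: " + gcRelatedArgs(jvmArgs));

        // Names of the collectors the JVM actually selected.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("Active collector: " + gc.getName());
        }
    }
}
```

Running it before and after a change to the startup arguments is a cheap way to verify that the new settings were really picked up.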
b. Throughput
An insurance company was suffering from back-to-back long GC pauses, making the application intermittently unresponsive. Upon studying GC performance, it was found that the application’s heap size was under-allocated. Increasing the heap size from 8GB to 12GB stopped the long GC pauses, improving the overall application’s throughput by 23%.
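In GC analysis, throughput is commonly expressed as the percentage of elapsed time that the application did not spend in garbage collection. The sketch below estimates it for the running JVM using the standard GarbageCollectorMXBean counters. The class and method names are hypothetical, and the figure is approximate: for concurrent collectors the reported collection time may not correspond exactly to application pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class, named for illustration only.
final class GcThroughput {

    // Percentage of elapsed time NOT spent in garbage collection.
    static double throughputPercent(long uptimeMillis, long gcMillis) {
        if (uptimeMillis <= 0) {
            throw new IllegalArgumentException("uptimeMillis must be positive");
        }
        return 100.0 * (uptimeMillis - gcMillis) / uptimeMillis;
    }

    public static void main(String[] args) {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                gcMillis += t;
            }
        }
        // Guard against a zero uptime reading right after JVM start.
        long uptime = Math.max(1, ManagementFactory.getRuntimeMXBean().getUptime());
        System.out.printf("GC throughput: %.2f%%%n", throughputPercent(uptime, gcMillis));
    }
}
```

For example, 5 seconds of GC time over 100 seconds of uptime gives a throughput of 95%.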
c. Avoid Long Pauses
When a garbage collector runs a stop-the-world phase, it pauses the entire application to mark the objects that are in use and sweep away the objects that are no longer referenced. During this pause period, all customer transactions in motion are stalled. Depending on the GC and memory settings, pause times can range from milliseconds to minutes. A robotics application in a warehouse experienced 5+ minute daily pauses due to garbage collection, causing confusion as robots made autonomous decisions. Switching the GC algorithm reduced GC pause time to 2 seconds, tremendously improving the shipment process.
2. Reduce Computing Cost
Here is the white paper we published, which explains how enterprises waste millions of dollars in computing cost due to garbage collection. The cost of computing rises significantly due to enormous CPU consumption and pauses incurred during garbage collection.
a. Reduce CPU Compute
Uber, the major ride-sharing company, faced ballooning cloud computing costs. When they studied their application’s performance characteristics, they observed that a significant portion of their CPU consumption originated from garbage collection. By tuning their GC behavior, they reduced CPU usage by 70,000 cores. This optimization resulted in several million dollars in cost savings for them.
b. Identify Optimal Memory Size
Most applications have either under-allocated or over-allocated memory. Analyzing GC behavior helps to determine whether your memory size is under-allocated or over-allocated. Under-allocation results in poor application throughput, while over-allocation increases computing costs. Effective GC behavior analysis can avoid over-allocation, thereby reducing computing costs.
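One rough signal for sizing is how much of the heap is still occupied right after a collection. The following sketch reads that value from the standard MemoryPoolMXBean API and classifies it. The class name and the 80% and 30% thresholds are purely my assumptions for illustration, and the sum is approximate because each pool reports usage from its own most recent collection. Treat the result as a hint for deeper GC log analysis, not as a verdict.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper class, named for illustration only.
final class HeapSizingHint {

    // Hypothetical thresholds, chosen for illustration only.
    static final double HIGH_OCCUPANCY = 0.80;
    static final double LOW_OCCUPANCY = 0.30;

    // Classifies heap sizing from the fraction of heap still used after GC.
    static String classify(long usedAfterGcBytes, long maxHeapBytes) {
        if (maxHeapBytes <= 0) {
            throw new IllegalArgumentException("maxHeapBytes must be positive");
        }
        double ratio = (double) usedAfterGcBytes / maxHeapBytes;
        if (ratio >= HIGH_OCCUPANCY) {
            return "possibly under-allocated";
        }
        if (ratio <= LOW_OCCUPANCY) {
            return "possibly over-allocated";
        }
        return "within range";
    }

    public static void main(String[] args) {
        long usedAfterGc = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // Usage after the most recent collection; null if the pool does not support it.
            MemoryUsage afterGc = pool.getCollectionUsage();
            if (pool.getType() == MemoryType.HEAP && afterGc != null) {
                usedAfterGc += afterGc.getUsed();
            }
        }
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Heap sizing hint: " + classify(usedAfterGc, maxHeap));
    }
}
```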
c. Reduce Software Licensing Cost
When you lower CPU consumption and memory size (i.e., #a and #b), you can automatically run on fewer containers/devices. When you run on fewer containers, the vendor software licensing cost associated with your application also comes down.
3. Quickly Troubleshoot Production Problems
By studying Garbage Collection behavior, you can troubleshoot several production performance problems effectively.
a. Unearth OutOfMemoryError easily
There are actually 9 different types of OutOfMemoryError. The effects of several of them are reflected in GC behavior. When an OutOfMemoryError surfaces, GC events start to run very frequently, yet objects in memory do not get reclaimed at all, as shown in the figure below:


Once you observe this behavior, you can quickly confirm that the application is suffering from OutOfMemoryError.
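The pattern described above, where GC events run but reclaim almost nothing, can be checked mechanically. Here is a minimal sketch, assuming you have already extracted the heap usage before and after each GC event from your GC log. The class name, the window size, and the 5% reclaim threshold are hypothetical choices of mine. The sketch covers only the "nothing gets reclaimed" half of the pattern and does not measure how frequently GC events occur.

```java
// Hypothetical helper class, named for illustration only.
final class GcReclaimCheck {

    // Returns true when each of the last `window` GC events reclaimed less than
    // `minReclaimFraction` of the heap that was in use before that event.
    static boolean looksExhausted(long[] beforeGc, long[] afterGc,
                                  int window, double minReclaimFraction) {
        if (beforeGc.length != afterGc.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        if (window <= 0 || beforeGc.length < window) {
            return false; // not enough events to judge
        }
        for (int i = beforeGc.length - window; i < beforeGc.length; i++) {
            long reclaimed = beforeGc[i] - afterGc[i];
            if (reclaimed >= minReclaimFraction * beforeGc[i]) {
                return false; // this event freed a meaningful amount of memory
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up sample values in MB, for demonstration only.
        long[] before = {1000, 1000, 1000};
        long[] after = {990, 995, 998};
        System.out.println("Looks exhausted: " + looksExhausted(before, after, 3, 0.05));
    }
}
```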
b. Forecast Memory Problems Several Minutes Ahead
Most of the monitoring tools that we use are reactive when it comes to OutOfMemoryError, i.e., they start to generate alerts only after the OutOfMemoryError surfaces. However, when you study garbage collection behavioral patterns, you can forecast memory-related problems well before they surface in the production environment.
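One simple way to illustrate such forecasting is to fit a straight line through the heap usage measured after each GC event and extrapolate to the point where it reaches the maximum heap size. The sketch below uses an ordinary least-squares fit. It is my own simplification under stated assumptions, not the method any particular tool uses: the class name is hypothetical, and real memory growth is rarely perfectly linear.

```java
// Hypothetical helper class, named for illustration only.
final class HeapExhaustionForecast {

    // Fits a least-squares line through (time, heap used after GC) samples and
    // returns the estimated seconds from the last sample until the line reaches
    // maxHeapMb. Returns -1 when there is too little data or no upward trend.
    static double secondsUntilFull(double[] timeSec, double[] usedAfterGcMb, double maxHeapMb) {
        int n = timeSec.length;
        if (n != usedAfterGcMb.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        if (n < 2) {
            return -1;
        }
        double sumT = 0, sumU = 0, sumTT = 0, sumTU = 0;
        for (int i = 0; i < n; i++) {
            sumT += timeSec[i];
            sumU += usedAfterGcMb[i];
            sumTT += timeSec[i] * timeSec[i];
            sumTU += timeSec[i] * usedAfterGcMb[i];
        }
        double denom = n * sumTT - sumT * sumT;
        if (denom == 0) {
            return -1; // all samples share the same timestamp
        }
        double slope = (n * sumTU - sumT * sumU) / denom; // MB per second
        if (slope <= 0) {
            return -1; // usage is flat or shrinking
        }
        double intercept = (sumU - slope * sumT) / n;
        double timeWhenFull = (maxHeapMb - intercept) / slope;
        return Math.max(0, timeWhenFull - timeSec[n - 1]);
    }

    public static void main(String[] args) {
        // Made-up samples: post-GC usage grows by 100 MB every 60 seconds.
        double[] time = {0, 60, 120, 180};
        double[] used = {400, 500, 600, 700};
        System.out.printf("Estimated seconds until heap is full: %.0f%n",
                secondsUntilFull(time, used, 1000));
    }
}
```

An alert raised when this estimate drops below some chosen lead time would fire minutes before the error itself, rather than after it.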
Conclusion
I hope my developer friends will give garbage collection tuning a second thought and take advantage of the immense benefits it offers. Here is a post which shares 9 tips to optimize GC pause times, which might be a good starting point for tuning GC performance.
FAQ
How can application throughput be improved?
To improve application throughput, one needs to reduce GC pause times. One effective way to reduce these pauses is by properly allocating heap memory.
What are the ways to reduce long GC pauses?
There are several ways to reduce long GC pauses, depending on application requirements. Some commonly used approaches include:
- Properly allocating heap memory
- Changing the GC algorithm
What are the key metrics to consider for application performance?
- Response time
- Throughput
- GC Pause time
- CPU usage
How can production performance problems be identified?
Studying GC behavior helps identify common production issues, including:
- Impending memory exhaustion
- OutOfMemoryError
How can computing costs be reduced?
One effective way to reduce computing costs is to minimize the CPU time and pauses incurred during garbage collection. If computing costs increase, a detailed GC analysis can help characterize GC behavior and uncover optimization opportunities.

