There’s a famous saying: “If you can’t measure it, you can’t improve it.” This is especially true for Garbage Collection (GC) tuning. There are numerous GC metrics to consider, each providing insight into a different aspect of your application’s performance and behavior, but not all of them are equally important. In this post, we’ll focus on the key GC metrics that have the most significant impact on your application’s efficiency and stability. By understanding and monitoring these critical metrics, you can make informed decisions when optimizing GC performance.
Key Garbage Collection Metrics
Following are the key GC metrics:
1. GC Latency
2. GC Throughput
3. Footprint
   a. Memory Size
   b. CPU Consumption
4. Application Metrics
   a. Response Time
   b. Throughput
Note: Application Metrics (i.e., #4) are not GC Metrics. However, the ultimate goal of GC tuning is to enhance the overall application’s performance. Thus, GC metrics should be studied in conjunction with Application Metrics to ensure that the tuning efforts lead to tangible improvements in application performance.
1. GC Latency
This is the primary GC performance metric. When a stop-the-world GC event runs, it pauses your application. Until that pause ends, no customer transactions can be processed. The measure of how long each GC event paused your application is called ‘GC Latency’. It is reported in time units (milliseconds, seconds, minutes). It’s essential to study the average GC pause time, the maximum GC pause time, and the distribution of pause times. High-performing applications should aim for low GC latency.
Real case study: Here is a robotics application which was suffering from 5+ minutes GC Pause time. However, with proper GC tuning, Max GC pause time was reduced to 2 seconds.
What is the ideal GC Pause Time? It varies from application to application.
- High-performance applications (e.g., stock trading platforms, space mission software) require pause times in milliseconds.
- Enterprise business applications can typically tolerate pauses of 1 to 5 seconds.
- Batch applications running asynchronously can tolerate several seconds of pause time.
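To make the three views of GC latency concrete (average, maximum, and distribution), here is a minimal Java sketch. The class name `GcPauseStats`, its methods, and the pause values are hypothetical and made up for illustration; in practice the pause times would come from your GC log or a GC analysis tool. The percentile uses the simple nearest-rank method, which is one of several reasonable definitions.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of any GC tool.
final class GcPauseStats {

    // Mean pause in milliseconds; 0.0 for an empty input.
    static double average(double[] pausesMs) {
        return Arrays.stream(pausesMs).average().orElse(0.0);
    }

    // Longest single pause in milliseconds; 0.0 for an empty input.
    static double max(double[] pausesMs) {
        return Arrays.stream(pausesMs).max().orElse(0.0);
    }

    // Nearest-rank percentile (p in (0, 100]) computed on a sorted copy.
    static double percentile(double[] pausesMs, double p) {
        if (pausesMs.length == 0) {
            return 0.0;
        }
        double[] sorted = pausesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Made-up pause times in milliseconds, one entry per GC event.
        double[] pausesMs = {12.0, 15.0, 9.0, 480.0, 11.0, 14.0, 10.0, 13.0, 2100.0, 16.0};
        System.out.printf("avg=%.1f ms, max=%.1f ms, p50=%.1f ms, p90=%.1f ms%n",
                average(pausesMs), max(pausesMs),
                percentile(pausesMs, 50), percentile(pausesMs, 90));
    }
}
```

In this made-up data set the median pause is small while the average is dominated by two outliers, which is why the distribution and the maximum deserve as much attention as the average.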
2. GC Throughput
GC Throughput reports the share of total time the JVM spends processing customer transactions, as opposed to the time it spends on garbage collection activities. It is reported as a percentage. If an application’s GC throughput is 98%, the application spends 98% of its time processing customer transactions and 2% of its time on GC activities. One should always aim for a high GC throughput percentage. If your GC throughput is on the lower side, your computing cost will go up. Here is a white paper which discusses how enterprises are wasting millions of dollars in computing costs when GC throughput goes lower.
Real Case Study: Here is a case study of an insurance application that was suffering from 96% GC throughput. With proper GC tuning, they were able to improve their GC throughput to 98.5%.
What is the ideal GC Throughput? The ideal throughput varies by application. For customer-facing applications, one should target a GC throughput above 98%.
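The percentage described above can be sketched in a few lines of Java. The class `GcThroughput` and its method `throughputPercent` are hypothetical names for illustration. The `main` method reads the standard `java.lang.management` beans to get a rough in-process figure; note that `getCollectionTime()` is documented as an approximate accumulated collection time, and for some collectors it may include work that does not pause the application, so treat the result as an estimate rather than a substitute for GC log analysis.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for illustration only.
final class GcThroughput {

    // GC throughput (%) = time not spent in GC / total elapsed time * 100.
    static double throughputPercent(long elapsedMs, long gcTimeMs) {
        if (elapsedMs <= 0) {
            throw new IllegalArgumentException("elapsedMs must be positive");
        }
        return 100.0 * (elapsedMs - gcTimeMs) / elapsedMs;
    }

    public static void main(String[] args) {
        long gcTimeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                gcTimeMs += t;
            }
        }
        // Guard against a zero uptime reading right after JVM start.
        long uptimeMs = Math.max(1, ManagementFactory.getRuntimeMXBean().getUptime());
        System.out.printf("Approximate GC throughput so far: %.2f%%%n",
                throughputPercent(uptimeMs, gcTimeMs));
    }
}
```

For example, 2,000 ms of GC time over 100,000 ms of elapsed time corresponds to the 98% figure used in the text.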
3. Footprint
Footprint metrics provide insights into the effects of memory size adjustments on GC efficiency and the influence of GC settings on CPU consumption.
a. Memory Consumption: The Garbage Collector needs native memory to do its job. Thus, whenever GC settings are changed (for example, switching to a new GC algorithm or introducing new GC arguments), memory consumption can vary. Memory consumption should be monitored whenever GC settings are tuned.
b. CPU Consumption: Garbage Collection is a CPU-intensive operation. Changes to GC algorithms or settings can significantly impact CPU consumption. Optimizing GC behavior can lead to considerable savings in CPU usage.
Real Case study: Uber, the major ride-sharing app, optimized their GC behavior, and lowered their CPU consumption by 70K CPU cores, which resulted in several million dollars in cost savings.
4. Application Metrics
GC performance can’t be studied in isolation. It has to be evaluated in conjunction with the overall application’s performance. Here are the application metrics one should study, when doing GC tuning:
a. Response Time: This metric measures the time taken for the application to respond to user requests. The overall application’s response time is directly affected whenever GC settings are tweaked. Monitoring response time helps ensure that changes in GC settings do not adversely impact the user experience, particularly in applications where quick responses are critical.
Real Case Study: Here is a case study of one of the world’s automobile manufacturers, who tweaked their GC settings and improved their overall application’s response time by 50%.
b. Throughput: Optimizing GC settings can also improve the overall application’s throughput, which is the number of transactions processed per unit of time (e.g., 803 transactions/sec). This metric reflects the application’s capacity to handle load and should not be confused with GC throughput, which measures the share of time the JVM is not spending on GC activities. Improved application throughput means that the application can process more transactions in a given time frame, leading to better performance and scalability.
Real Case Study: Here is a case study of an application which modified its heap size and saw a considerable increase in its application throughput.
How to source GC Metrics?
The first 3 metrics (GC Latency, GC Throughput, and Footprint, i.e., Memory Size and CPU Consumption) can be sourced from your application’s Garbage Collection log. You can upload your GC log to a GC log analysis tool like GCeasy, IBM GC Visualizer, HP JMeter, or Garbage Cat. These tools provide instant reports on these GC KPIs, allowing you to quickly assess GC performance.

Application metrics such as response time and throughput can be sourced using traditional monitoring tools or by analyzing access logs.
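To illustrate what a GC log analysis tool does under the hood, here is a minimal Java sketch that pulls pause durations out of log lines. Everything in it is an assumption made for illustration: the class `GcLogPauseParser` is hypothetical, and the sample lines are hand-written in the general shape of JDK unified logging output rather than copied from a real run. Actual log formats vary by JDK version, collector, and logging options, so the tools listed above remain the practical choice.

```java
import java.util.List;
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for illustration only; real GC log formats vary.
final class GcLogPauseParser {

    // Matches a line that contains the word "Pause" and ends in a duration such as "3.456ms".
    private static final Pattern PAUSE =
            Pattern.compile("\\bPause\\b.*?(\\d+(?:\\.\\d+)?)ms\\s*$");

    // Returns the pause duration in milliseconds, or empty if the line is not a pause line.
    static OptionalDouble pauseMillis(String line) {
        Matcher m = PAUSE.matcher(line);
        return m.find()
                ? OptionalDouble.of(Double.parseDouble(m.group(1)))
                : OptionalDouble.empty();
    }

    public static void main(String[] args) {
        // Hand-written sample lines (assumed format, not taken from a real log).
        List<String> lines = List.of(
                "[0.010s][info][gc] Using G1",
                "[1.234s][info][gc] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 24M->4M(256M) 3.456ms",
                "[2.500s][info][gc] GC(1) Pause Young (Normal) (G1 Evacuation Pause) 30M->6M(256M) 12.5ms");

        double maxPauseMs = lines.stream()
                .map(GcLogPauseParser::pauseMillis)
                .filter(OptionalDouble::isPresent)
                .mapToDouble(OptionalDouble::getAsDouble)
                .max()
                .orElse(0.0);
        System.out.printf("Max pause: %.3f ms%n", maxPauseMs);
    }
}
```

The extracted pause times are the raw input for GC latency figures (average, maximum, distribution), and their sum over the elapsed time gives GC throughput.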
Trade-offs in GC Tuning
Optimizing GC settings involves trade-offs between key performance indicators (KPIs) like GC Latency, GC Throughput, and Footprint (Memory Size and CPU Consumption). Typically, you can optimize for a maximum of two KPIs simultaneously. For example, reducing GC latency (pause times) often increases CPU consumption due to more frequent collections. Similarly, aiming for high GC throughput can result in longer individual GC pause times, which may not be suitable for applications requiring low latency.
For instance, high-performance trading applications prioritize low GC latency and high throughput, accepting higher CPU consumption. In contrast, batch processing applications can afford longer GC pauses but focus on minimizing CPU usage and maximizing throughput. Balancing these trade-offs requires careful monitoring and tuning based on your specific application needs.
Conclusion
We hope this post has highlighted the key Garbage Collection metrics you need to focus on when tuning GC performance. Understanding and optimizing these metrics, in conjunction with application metrics, will help you build high-performing systems.
FAQ
What are the main application metrics?
Response time (latency) and throughput are the main application metrics commonly used as performance benchmarks for any application.
What is GC latency?
GC latency refers to the duration for which an application is paused during garbage collection.
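What is GC throughput?
GC throughput measures the proportion of time the JVM spends processing user transactions compared to the time it spends performing garbage collection. It is typically expressed as a percentage. It should not be confused with application throughput, which is the number of transactions processed per unit of time.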
What is the ideal GC throughput?
For customer-facing applications, an ideal GC throughput is generally considered to be greater than 98%.
How does garbage collection affect the CPU usage?
Garbage collection is a CPU-intensive operation. Changes in GC algorithms can significantly impact CPU usage. By optimizing the GC algorithm, it is possible to reduce CPU consumption and improve overall application efficiency.


