Garbage Collection (GC) has a major influence on application performance. Besides that, GC data is a vital artifact for diagnosing production problems. In this post, we will share an effective technique to analyze Garbage Collection behavior at scale across your fleet of production containers/servers.

Why study GC Behavior in Production?

Analyzing Garbage Collection behavior across your fleet of production containers/servers offers several valuable benefits. Some of the key benefits are:

a. Application Performance Monitoring: GC behavior reflects how efficiently memory is managed at runtime, helping you to identify bottlenecks that impact responsiveness and throughput.

b. Troubleshoot Memory Problems: Studying GC patterns can highlight memory inefficiencies like leaks or excessive churn, allowing you to pinpoint and resolve memory-related issues quickly.

c. Effective Capacity Planning: By understanding allocation patterns and memory needs, you can correctly size the JVM’s internal memory regions, avoid over-provisioning, and optimize infrastructure costs.

Challenges in Studying GC Behavior in Production

Studying Garbage Collection behavior in your production environment poses the following challenges:

1. APM Tools Overhead & Lack of Details: APM tools show insightful metrics on application performance; however, their reporting is shallow when it comes to GC behavior. On top of that, APM tools are quite expensive and add considerable performance overhead to your application. The best approach to analyzing GC behavior is to study Garbage Collection logs. These logs are generated directly by the JVM and add almost zero overhead to your application. They provide highly insightful micro-metrics about GC performance.

2. Hundreds/Thousands of JVMs: In our production environment we have hundreds/thousands of JVMs. Studying each JVM’s Garbage Collection log individually is tedious and cumbersome. There are GC log analysis tools; however, manually uploading each GC log file to a tool and analyzing it is repetitive, error-prone work that does not scale.

GC Log Analysis REST API

This is where the GCeasy tool’s GC log analysis REST API comes in handy. You can invoke this API through a simple HTTP POST call directly from your production servers, passing the location of your GC log file. Here is the curl command to invoke the REST API:

curl -X POST --data-binary @{GC_LOG_FILE_PATH} "https://api.gceasy.io/analyzeGC?apiKey={YOUR_API_KEY}" --header "Content-Type:text"
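
For teams that prefer to script the upload rather than shell out to curl, the same call can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the endpoint, the apiKey query parameter, and the Content-Type header are taken from the curl command above, while the function names and the timeout value are hypothetical choices made for illustration.

```python
import urllib.parse
import urllib.request

# Endpoint copied from the curl command above.
GCEASY_ENDPOINT = "https://api.gceasy.io/analyzeGC"


def build_gc_analysis_request(gc_log_bytes: bytes, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl command."""
    query = urllib.parse.urlencode({"apiKey": api_key})
    return urllib.request.Request(
        url=f"{GCEASY_ENDPOINT}?{query}",
        data=gc_log_bytes,                 # raw file contents, like --data-binary
        headers={"Content-Type": "text"},  # same header as the curl command
        method="POST",
    )


def analyze_gc_log(gc_log_path: str, api_key: str, timeout_seconds: float = 60.0) -> bytes:
    """Upload one GC log file and return the raw response body.

    The 60-second default timeout is an assumption, not a documented value.
    """
    with open(gc_log_path, "rb") as log_file:
        request = build_gc_analysis_request(log_file.read(), api_key)
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return response.read()
```

Separating request construction from sending keeps the network call in one place, so a scheduler or deployment hook running on each host can call analyze_gc_log with the local GC log path and its own API key.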

How to invoke GCeasy REST API?

If you are interested in automating GC log analysis, you may refer to this post: How to invoke the GCeasy REST API? It’s quite straightforward to use!

You can either run the GCeasy server on-premises or use the cloud edition, depending on your corporate security mandates. The tool analyzes the GC log file, sends back a JSON response containing problem statements, insightful metrics, and GC behavior graph images, and archives the analysis report in the dashboard.

REST API Response

Fig: GCeasy REST API Response

Above is an excerpt from the GCeasy REST API’s JSON response. It contains insightful information. Here are a few important metrics that I would like to highlight:

a. GC/Memory Problem Detection: GCeasy uses pattern recognition technologies and ML algorithms to detect memory-related performance problems. Detected problems are reported in the ‘problem’ element of the response. If a ‘problem’ element is present in the response, you can take corrective actions such as generating alerts, restarting JVMs, or capturing troubleshooting artifacts.

b. Insightful Metrics: GC Throughput, GC Pause Time, and GC Overhead (in terms of CPU and memory) are considered the Key Performance Indicators of a GC study. The JSON response contains not only these KPIs but also deeper micro-metrics such as Object Creation Rate, statistics on the GC events’ internal phases (such as initial-remark, concurrent-scanning, concurrent marking, final-mark, …), the reasons why GC events were triggered, and several more metrics.

c. GC Graph Images: Heap Usage Graph, GC Pause Time Graph & other graph images are sent as hyperlinks. You can embed these image hyperlinks into your reports, dashboards, JIRA tickets, emails.
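
The check described in item (a) can be sketched as a small Python helper. This is an illustrative sketch under stated assumptions: only the ‘problem’ element name comes from the response described above; the function name is hypothetical, and the code assumes the element holds either a single string or a list of strings, which should be verified against a real response.

```python
import json


def extract_problems(response_body: str) -> list:
    """Return the problems reported in a GCeasy JSON response, or [] if none.

    Assumes the 'problem' element, when present, is a string or a list of strings.
    """
    report = json.loads(response_body)
    problems = report.get("problem") or []  # missing or null means no problems
    if isinstance(problems, str):
        return [problems]  # normalize a single string to a one-element list
    return list(problems)
```

A caller could raise an alert, restart the JVM, or capture troubleshooting artifacts whenever the returned list is non-empty.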

One Dashboard

Fig: GCeasy Dashboard

Besides sending back the JSON response, the GCeasy tool also stores the analyzed GC log report in the dashboard, which you can access from any location at any time. You will reap the following benefits by using the GCeasy dashboard:

a. Tagging/Search: By default, GC log analysis reports are tagged by application name, the host address from which the GC log was captured, and problem statement hints. You can add further tags such as release number, JIRA ticket number… Since reports are tagged, they can be searched by custom tags, application name, host name, and time period.

b. Historical Look up/Compare Reports: All the GC logs uploaded through the API are archived in the GCeasy server, so all historical GC log reports can be looked up from the dashboard. Besides that, the ‘Compare Report’ capability in the dashboard lets you compare GC analysis reports side by side and see KPI improvements (or degradations) and GC trends in a single window. Using this feature, you can compare the application’s GC performance before and after a release, between two different applications …

c. Any Time, Any Location: Since the GCeasy dashboard is an online web application, it can be accessed at any time and from any location, through desktop browsers or mobile phones.

d. Safety & Security: Since GC logs are captured from the production environment, several organizations classify them as confidential information. GCeasy’s security features, such as data sanitization, encrypted transmission, SSO authentication, granular access controls, data masking, and automated incident cleanup, ensure that sensitive information stays protected throughout the debugging workflow. For more details on security features, refer to this post.

Fig: GCeasy Analysis Report

Conclusion

Automating GC log analysis at scale can be helpful for maintaining optimal application performance, troubleshooting memory issues, and managing infrastructure efficiently. With tools like GCeasy’s REST API, you can seamlessly integrate GC analysis into your production workflow, gain valuable insights, and ensure system stability without the overhead of manual log parsing.