tFileInputJSON gives "GC Overhead Limit" error

Six Stars

tFileInputJSON gives "GC Overhead Limit" error

Hello Champ,

 

I have created a simple job:

tFileInputJSON --> (main) --> tLogRow

 

The source JSON file is around 2 GB. I even increased the JVM settings in the Run view to -Xms2G -Xmx4G, but the job always fails with a memory error.

 

NOTE: the job works fine when the JSON file is smaller and simpler.

 

Is there a way to extract such a large file, or is it a product limitation? I have seen some articles about this for CSV files, but nothing for JSON.

Looking forward to hearing some valuable answers. Thanks.

 

regards,

K


All Replies
Fourteen Stars

Re: tFileInputJSON gives "GC Overhead Limit" error

@rvkiruba, what is your system's RAM size? Also, if you are using the Enterprise edition, you can execute the Job on a remote system.

Manohar B
Don't forget to give kudos/accept the solution when a reply is helpful.
Six Stars

Re: tFileInputJSON gives "GC Overhead Limit" error

@manodwhb my PC has 8GB RAM. Is it not sufficient?

 

Win 7 Enterprise. Can you please tell me more about running the Job on a remote system?

 

Thanks.

K

Moderator

Re: tFileInputJSON gives "GC Overhead Limit" error

Hello,

Could you please let us know if this article helps?

https://community.talend.com/t5/Migration-Configuration-and/GC-overhead-limit-error-when-running-Job...

Best regards

Sabrina

--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
Six Stars

Re: tFileInputJSON gives "GC Overhead Limit" error

Hi Sabrina,

Thanks for your reply. The article did not help; I now get the error below.

 

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

 

I also followed some other instructions and updated the JVM arguments as below, but no luck.

With 8 GB of memory available on a 64-bit system, the optimal settings can be:
-Xms1024m -Xmx4096m -XX:MaxPermSize=512m -Dfile.encoding=UTF-8

 

Thanks.

K

Moderator

Re: tFileInputJSON gives "GC Overhead Limit" error

Hello,

Could you please open the Job to which you want to allocate more memory, go to the Run view, open the Advanced settings tab, and select the "Use specific JVM arguments" box? Then allocate more memory to the active Job by double-clicking the default JVM arguments and editing them.

This change only applies to the active Job. The JVM settings persist in the job script and take effect when the job is exported and executed outside Talend Studio.
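For reference, when a Job is exported, the JVM arguments set in Run > Advanced settings end up in the generated launcher script. A typical exported `.sh` looks roughly like the sketch below; the paths and class names are illustrative placeholders, not taken from this thread:

```sh
#!/bin/sh
cd "$(dirname "$0")"
# The -Xms/-Xmx values below come from the Job's
# "Use specific JVM arguments" setting in Talend Studio,
# so they travel with the exported job.
java -Xms1024M -Xmx4096M -Dfile.encoding=UTF-8 \
  -cp classpath.jar myproject.myjob_0_1.MyJob "$@"
```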

Let us know if it is OK with you.

Best regards

Sabrina

 

Six Stars

Re: tFileInputJSON gives "GC Overhead Limit" error

Hi Sabrina,

As I said in my previous post, I have already made the necessary JVM changes in Run > Advanced settings and ran the job on my PC. Screenshot attached for your reference.

Are you saying that the JVM setting change only takes effect when the job runs in the cloud? Is that correct?

 

Thanks,

K

 

Moderator

Re: tFileInputJSON gives "GC Overhead Limit" error

Hello,

It means the change only applies to your current Job, not to the whole Studio. The JVM settings will persist in the job script and take effect when your job is exported and executed outside Talend Studio (as a .bat or .sh file).

Best regards

Sabrina

 

Six Stars

Re: tFileInputJSON gives "GC Overhead Limit" error (Accepted Solution)

Thanks Sabrina. Unfortunately that didn't work. We found an alternative solution: we split the JSON file into multiple smaller files and processed them successfully.

Moderator

Re: tFileInputJSON gives "GC Overhead Limit" error

Hello,

Thanks for sharing your solution with us on forum.

Best regards

Sabrina

Five Stars

Re: tFileInputJSON gives "GC Overhead Limit" error

I am having the same issue with JSON processing; my file size is 3 GB. Did you split the input at the source?

Or are you processing the 2 GB input file and splitting it inside the job?

Employee

Re: tFileInputJSON gives "GC Overhead Limit" error

@mamohan 

 

If you have a complex JSON, it would be a good idea to produce smaller files from the source stage itself, since that avoids the overhead of splitting the big files into smaller ones.

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving kudos for sharing their time on your query. If your query is answered, please mark the topic as resolved :-)

Five Stars

Re: tFileInputJSON gives "GC Overhead Limit" error

Thanks Nikhil,
The file is not complex, and we are in the process of creating a POC. I would like to know whether the component copies the whole file into memory before starting the next step. I have an XML file of similar size and didn't face any issue with it, so why is this a problem only for JSON?
Six Stars

Re: tFileInputJSON gives "GC Overhead Limit" error

Hi Mohan,

 

Option 1:

Check your PC's RAM size (I would suggest 16 GB as a minimum). Based on that, increase your JVM argument sizes:

-Xms (for ex, -Xms2048m)

-Xmx (for ex, -Xmx4096m)

To do this:

1. Go to the Run view of your job.

2. Click "Advanced settings".

3. Enable "Use specific JVM arguments".

4. Change the values.

5. Run the job.

 

Option 2:

Split the file into smaller files and run the job on each.

 

Let me know how it goes.

 

regards,

kiruba

Five Stars

Re: tFileInputJSON gives "GC Overhead Limit" error

Hi Kiruba, 

Option 1: increase Memory

I have tried increasing the memory and it still fails. The server has a total of 16 GB of memory, and we have other processes running on the same server, so I was able to allocate a maximum of 4 GB; anything above that failed to allocate. My input file is around 2.9 GB. Based on a discussion with Talend support, job execution itself takes some memory. Apparently the JSON is read as a single 2.9 GB object and loaded into memory, and this is where it fails.

Option 2: split the input file into multiple smaller files.

Our POC input is 2.9 GB. We can do the split, but we are required to use Talend components for everything. Is there any way to do the split in Talend?

My thought is that we would need to read the file at least once to process it, and it would fail there. Correct me if I am wrong.

Four Stars

Re: tFileInputJSON gives "GC Overhead Limit" error

You could try using the component tJSONDocInputStream by Jan Lolling from Talend Exchange. It is designed specifically for reading very large files.

Five Stars

Re: tFileInputJSON gives "GC Overhead Limit" error

Thanks Fred for the suggestion. In the end we went ahead with splitting the file outside Talend and processing it, and that worked fine for us. The problem with the original JSON file was that it was stored as a single JSON object, and Talend tries to load that whole object into memory. So we created a file with each record as its own JSON object and processed that, which avoided the GC issue.
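The record-per-object fix described above (essentially the JSON Lines format) can be sketched in Python outside Talend. This is a minimal illustration with toy data, not the poster's actual script; for a file too large to load at once, you would replace `json.load` with a streaming parser such as `ijson` (Python) or Jackson's streaming API (Java):

```python
import json

# Toy stand-in for the original file: a single top-level JSON array.
# (The real 2.9 GB file would be streamed, not loaded whole.)
records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
with open("big.json", "w") as f:
    json.dump(records, f)

# Convert to one JSON object per line ("JSON Lines"), so each record
# can later be parsed independently without holding the whole file in memory.
with open("big.json") as src, open("records.jsonl", "w") as dst:
    for record in json.load(src):  # use a streaming parser here for huge files
        dst.write(json.dumps(record) + "\n")

# Downstream processing now reads and parses one record at a time.
with open("records.jsonl") as f:
    parsed = [json.loads(line) for line in f]
print(len(parsed))  # 3
```

The design point is the same one Talend support made in the thread: a single top-level object must be materialized whole, whereas line-delimited records bound memory use to one record at a time.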
