OutOfMemoryError:GC overhead limit exceeded

Four Stars

Hi,
Let's say I have two HDFS files, A and B. File A has 40 columns, and I am generating 10 million records for it. While generating data for file B, I need to take random records from file A (only 7 or 8 columns, for a primary key/foreign key relationship). Because I am using all 10 million records as the lookup, I get the GC overhead error. I have 8 GB of RAM and have set Xmx as high as 8096. I also tried 4096.

Please suggest a solution to this issue, or an alternative method for taking random records.

As I am generating the data dynamically, I don't think it is possible to break the file into smaller files and fetch random records from them. Please clarify.

I have also set a temporary location for tMap and increased the buffer size, but I couldn't tell whether it is working. Please let me know if there is a way to check whether temp data is being stored in this location and whether increasing the buffer size has any effect.
Employee

Re: OutOfMemoryError:GC overhead limit exceeded

Hi,

When the lookup is large, you will have to use the "Store temp data on disk" option in tMap. The details are in the link below.

https://help.talend.com/reader/EJfmjmfWqXUp5sadUwoGBA/J4xg5kxhK1afr7i7rFA65w
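One rough way to check whether temp data is really landing in the configured location is to look at that directory while the job is running. The sketch below is not a Talend feature, just a small standalone Java program; the class name `TempDirCheck` and the directory passed on the command line are assumptions for illustration. It counts the regular files directly under the directory and sums their sizes, so running it a few times during the job should show whether the numbers grow.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class TempDirCheck {

    // Returns {fileCount, totalBytes} for the regular files directly under dir.
    static long[] summarize(Path dir) throws IOException {
        long count = 0;
        long bytes = 0;
        try (Stream<Path> entries = Files.list(dir)) {
            for (Path p : (Iterable<Path>) entries::iterator) {
                if (!Files.isRegularFile(p)) {
                    continue;
                }
                try {
                    bytes += Files.size(p);
                    count++;
                } catch (NoSuchFileException e) {
                    // The file was deleted between listing and sizing; skip it.
                }
            }
        }
        return new long[] {count, bytes};
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the first argument is the temp data directory set in tMap.
        // Falls back to the JVM default temp directory when no argument is given.
        Path dir = Paths.get(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        long[] summary = summarize(dir);
        System.out.println(dir + ": " + summary[0] + " files, " + summary[1] + " bytes");
    }
}
```

If the file count and byte total stay at zero for the whole run, the lookup is probably still being held in memory rather than written to that location.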

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved.

Four Stars

Re: OutOfMemoryError:GC overhead limit exceeded

Hi Nikhil,

Thank you for your suggestion. But as I said, I have already set a temporary location for storing temp data, and I still get the memory issue. An Xmx of 8096 is not enough for this lookup, but I cannot set it any higher either.

When I store temp data on disk, I also get a warning on my desktop (something like "an application is using a lot of space, close the program?").
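It may also be worth confirming that the JVM running the job actually received the heap setting. A minimal sketch, assuming a standalone class with the hypothetical name `HeapCheck` (this is plain Java, not something Talend provides): it prints the maximum heap the JVM will try to use and the amount currently in use.

```java
public class HeapCheck {

    private static final long MB = 1024L * 1024L;

    // Maximum heap the JVM will attempt to use, in megabytes.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    // Heap currently occupied by objects (allocated minus free), in megabytes.
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MB):  " + maxHeapMb());
        System.out.println("used heap (MB): " + usedHeapMb());
    }
}
```

Running it with the same option as the job, for example `java -Xmx4096m HeapCheck`, should report a maximum close to the requested value. Note that a heap close to the machine's physical RAM leaves little room for the operating system and other programs.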
Fourteen Stars

Re: OutOfMemoryError:GC overhead limit exceeded

@lbhavya345, you need to split the data and perform the required operations on the parts; otherwise you need to increase the RAM and rerun the process.

Manohar B
Don't forget to give kudos/accept the solution when a reply is helpful.
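As for an alternative way to take random records without loading all 10 million rows as a lookup, one option is reservoir sampling: read file A once as a stream and keep only a fixed number of rows in memory. This technique was not proposed in the replies above; the sketch below is generic Java, and the class name `ReservoirSample`, the sample size and the integer keys are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;
import java.util.stream.IntStream;

public class ReservoirSample {

    // Reservoir sampling (Algorithm R): returns a uniform random sample of up to
    // k items from a stream of unknown length, holding at most k items in memory.
    static <T> List<T> sample(Iterator<T> rows, int k, Random rng) {
        List<T> reservoir = new ArrayList<>(k);
        long seen = 0;
        while (rows.hasNext()) {
            T row = rows.next();
            seen++;
            if (reservoir.size() < k) {
                // Fill the reservoir with the first k rows.
                reservoir.add(row);
            } else {
                // Pick a slot uniformly in [0, seen); replace only if it falls inside the reservoir.
                long j = (long) (rng.nextDouble() * seen);
                if (j < k) {
                    reservoir.set((int) j, row);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        // Stand-in for streaming the key columns of file A; real code would read rows from the file.
        Iterator<Integer> keys = IntStream.range(0, 10_000_000).boxed().iterator();
        List<Integer> picked = sample(keys, 1000, new Random());
        System.out.println("kept " + picked.size() + " keys in memory");
    }
}
```

The rows of file B could then draw their foreign keys from this small in-memory sample instead of from the full lookup. The trade-off is that only the sampled keys of file A will ever be referenced by file B, so the sample size has to be chosen with that in mind.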
