Five Stars

OutOfMemoryError: GC overhead limit exceeded

Hello

 

We have a problem with a job that failed with java.lang.OutOfMemoryError: GC overhead limit exceeded.

The job is reading 13 million records from Oracle to tHashOutput. I would like to know what causes this error.

I know that there are Xmx and Xms parameters, but I don't really know how I should configure them.

Are there any rules for changing those parameters?

 

Thanks

14 REPLIES
Ten Stars

Re: OutOfMemoryError: GC overhead limit exceeded


Boof1977 wrote:

Hello

 

We have a problem ...I would like to know what cause this error .

 


the job is reading 13 million records from Oracle to tHashOutput

If you want to use memory to speed up the job, that memory must be available to Talend.

It all depends on the size of those 13M records (and don't forget the other parts of the job).

The default maximum (Xmx) is 1024m; try 4096m or bigger if you have that much memory free on the Talend machine.

Or stop using tHashInput: the job will run slower but use less memory.
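
To confirm which heap limits a JVM actually received, a small check like the one below can help. It is only a sketch using the standard `Runtime` API; the class name `HeapReport` is made up for illustration and is not part of Talend.

```java
final class HeapReport {
    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() is the ceiling the heap may grow to (driven by -Xmx, or the JVM default when unset).
        System.out.println("max heap   (MiB): " + toMiB(rt.maxMemory()));
        // totalMemory() is the heap currently reserved by the JVM.
        System.out.println("total heap (MiB): " + toMiB(rt.totalMemory()));
        // freeMemory() is the unused part of the currently reserved heap.
        System.out.println("free heap  (MiB): " + toMiB(rt.freeMemory()));
    }
}
```

On Java 11 or later this can be run directly, for example with `java -Xms1024m -Xmx4096m HeapReport.java`, to see whether the numbers change with the arguments.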

-----------
Five Stars

Re: OutOfMemoryError: GC overhead limit exceeded

Hi

 

Thanks for your answer. We have 65 GB of RAM on the Talend Linux machine. We execute the job from the Studio on the Linux server, and we tried setting

xmx 5G xms 15G, but it failed.

a. How can I know how much memory the job will consume?

b. What should the relation between xmx and xms be, and what exactly is the purpose of those parameters?

 

Thanks

 

 

Ten Stars

Re: OutOfMemoryError: GC overhead limit exceeded

Xms - initial (minimum) heap memory available to Java

Xmx - maximum heap memory
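
Note that the combination mentioned above (xmx 5G with xms 15G) asks for an initial heap larger than the maximum, which a HotSpot JVM refuses to start with. The sketch below only illustrates that relation (Xms must not exceed Xmx); `HeapFlags`, `parseSize` and `isValid` are hypothetical names for this example, not a JVM or Talend API.

```java
final class HeapFlags {
    /** Parses a JVM-style size such as "512m", "5G" or "1048576" into bytes. */
    static long parseSize(String text) {
        String s = text.trim().toLowerCase();
        char unit = s.charAt(s.length() - 1);
        long factor;
        switch (unit) {
            case 'k': factor = 1024L; break;
            case 'm': factor = 1024L * 1024L; break;
            case 'g': factor = 1024L * 1024L * 1024L; break;
            default:  factor = 1L; // no suffix: the value is already in bytes
        }
        String digits = (factor == 1L) ? s : s.substring(0, s.length() - 1);
        return Long.parseLong(digits) * factor;
    }

    /** True when the initial heap (Xms) does not exceed the maximum heap (Xmx). */
    static boolean isValid(String xms, String xmx) {
        return parseSize(xms) <= parseSize(xmx);
    }

    public static void main(String[] args) {
        System.out.println(isValid("15G", "5G")); // false: initial heap above the maximum
        System.out.println(isValid("5G", "15G")); // true
    }
}
```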

 

You have not provided enough information about your job:

- what is the data structure?

- which columns are you trying to load, and which are really needed for the lookup?

- what other data flows are in your job?

Without that information it is just guessing.

13M * 1 KB = 13 GB, 13M * 2 KB = 26 GB

Another question: why are you trying to do a lookup over such a huge table right in Talend? Maybe the best place for this is the database.

Think of an in-memory lookup as a point of failure.

Even if you fix the problem now (just set xmx to 48000M, assuming you have a 64-bit JDK installed), what will you do when the next day it is 20M records?

-----------
Five Stars

Re: OutOfMemoryError: GC overhead limit exceeded

Hi

 

I read that "This component loads data to the cache memory to offer high-speed access, facilitating transactions involving a large amount of data".

What does "large amount" mean? Is there a number?

How do I know when to use a DB lookup or tHash? Which is preferred?

Is there any way to calculate the memory the job will consume?

Thanks

 

 

Ten Stars

Re: OutOfMemoryError: GC overhead limit exceeded

You will never find a 100% correct answer, only logic and an educated guess ;-)

- what else runs on this server?

- how much memory is really free, not used by the file cache and other processes?

- what other tasks do you plan to run in the future at the same time as this one?

- etc., etc., etc.

It is not possible to answer without full information, even for a simple question: what is your table structure?

A very simple answer: sum all the column lengths and multiply by 1.5 to be safe.

 

13M * INT = 64 MB

13M * Long = 128 MB

13M * VARCHAR(512) = 6.5 GB (in the worst case)
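
As a rough cross-check of these figures, the raw arithmetic (rows times bytes per value) can be sketched as below. The byte widths are assumptions: 4 for an int, 8 for a long, and 1 byte per character for a full VARCHAR(512). The raw results for INT and Long come out somewhat below the rounded figures above, and Java objects held in a hash carry extra per-object overhead, so treat the output as a lower bound. `ColumnEstimate` is a made-up name.

```java
final class ColumnEstimate {
    /** Raw payload size: number of rows times bytes per value, ignoring any JVM overhead. */
    static long rawBytes(long rows, int bytesPerValue) {
        return rows * bytesPerValue;
    }

    /** Converts bytes to decimal megabytes (1 MB = 1,000,000 bytes). */
    static double toMB(long bytes) {
        return bytes / 1_000_000.0;
    }

    public static void main(String[] args) {
        long rows = 13_000_000L;
        System.out.println("int          : " + toMB(rawBytes(rows, 4)) + " MB");   // 52.0 MB
        System.out.println("long         : " + toMB(rawBytes(rows, 8)) + " MB");   // 104.0 MB
        System.out.println("varchar(512) : " + toMB(rawBytes(rows, 512)) + " MB"); // 6656.0 MB
    }
}
```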

Which is preferred?

There is no single answer here either.

What is the size of the main flow? What resources does the database server have? What must the final result be?

Normally:

Hash - for a small amount of lookup data that must be used in more than one tMap or lookup.

Even if you use a lookup from the database, Talend will also load all the data into memory; in this case it is better to load the data into the database and do the lookup in SQL.

 

-----------
Five Stars

Re: OutOfMemoryError: GC overhead limit exceeded

Hi

 

Regarding your first questions: in the future it will be hard to know, because resource consumption changes dynamically.

The only thing I can tell you for now is that it's a new server with nothing running on it.

 

Can you please explain again the calculation of

13M * INT = 64 MB

13M * Long = 128 MB

13M * VARCHAR(512) = 6.5 GB (in the worst case)

For example, if I have a table with 5 columns and 13 million records:

column 1 - number

column 2 - varchar(20)

column 3 - long

column 4 - varchar(20)

column 5 - BigDecimal

a. What should the result of the calculation be?

b. What should I put for xmx and xms, and what is the relation between them?

c. What is the difference in memory utilization between using a DB lookup and tHash?

 

Thanks

Ten Stars

Re: OutOfMemoryError: GC overhead limit exceeded

number - 20 bytes

varchar(20) - for UTF-8, 2 x 2 x 20 = 80 bytes, could be more

BigDecimal - 4 bytes

long - 8 bytes

total: 20 + 80 + 4 + 8 = 112 bytes per row, about 1.5 GB for 13M rows

But this is only for the hash, and only if Talend does not add anything on top, so it is better to assume approximately 2 GB. And you have the other parts of the job.
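
The per-row estimate can be written down as a small calculation. This is only a sketch: the column names are invented, the 80 bytes are read here as two varchar(20) columns at 40 bytes each, and the 1.5 safety factor is the rule of thumb from earlier in the thread. Real heap usage in Talend will differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RowEstimate {
    /** Sums the assumed byte width of every column in one row. */
    static long bytesPerRow(Map<String, Integer> columnBytes) {
        long total = 0;
        for (int width : columnBytes.values()) {
            total += width;
        }
        return total;
    }

    /** Rows times bytes per row, padded by a safety factor. */
    static long estimate(long rows, long bytesPerRow, double safety) {
        return Math.round(rows * bytesPerRow * safety);
    }

    public static void main(String[] args) {
        Map<String, Integer> cols = new LinkedHashMap<>();
        cols.put("col1_number", 20);
        cols.put("col2_varchar20", 40); // assumed 2 bytes x 20 characters
        cols.put("col3_long", 8);
        cols.put("col4_varchar20", 40); // assumed 2 bytes x 20 characters
        cols.put("col5_bigdecimal", 4);

        long perRow = bytesPerRow(cols);
        System.out.println("bytes per row: " + perRow);                                // 112
        System.out.println("raw bytes    : " + estimate(13_000_000L, perRow, 1.0));   // 1456000000
        System.out.println("with x1.5    : " + estimate(13_000_000L, perRow, 1.5));   // 2184000000
    }
}
```

The padded result of roughly 2.2 GB is in line with the "approximately 2 GB" suggested above, and it still covers only the hash, not the rest of the job.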

 

If the server is empty, start with the biggest value and reduce it until the job stops working.

-----------
Ten Stars

Re: OutOfMemoryError: GC overhead limit exceeded


Boof1977 wrote:

Hi

 

 

c. what is the difference in memory utilization between using a DB lookup and tHash?

 

 

A DB lookup in Talend will load the data every time you use this table for a lookup. From this point of view the hash looks more effective: load once, then use it in 10 tMaps afterwards. But I mean doing the operation in the database: if the source and target are on the same server, filter first with a query, then load into Talend. If they are different, load everything to the target or a staging server and then filter with SQL queries again. Databases (most of them) are designed to work with data many times bigger than memory.
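
The "load once, use many times" idea can be pictured with a plain Java map. This is a conceptual sketch only, not how tHashOutput/tHashInput are implemented; `LookupSketch` and its methods are hypothetical names. The point is that every lookup row stays on the heap for as long as the map is referenced.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LookupSketch {
    /** Builds the in-memory lookup once; all entries stay on the heap while the map is referenced. */
    static Map<Long, String> buildLookup(long[] ids, String[] names) {
        Map<Long, String> lookup = new HashMap<>();
        for (int i = 0; i < ids.length; i++) {
            lookup.put(ids[i], names[i]);
        }
        return lookup;
    }

    /** Joins a main flow of keys against the lookup; keys without a match are skipped. */
    static List<String> join(long[] mainFlow, Map<Long, String> lookup) {
        List<String> matched = new ArrayList<>();
        for (long key : mainFlow) {
            String name = lookup.get(key);
            if (name != null) {
                matched.add(key + "=" + name);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        Map<Long, String> lookup = buildLookup(new long[] {1L, 2L, 3L}, new String[] {"a", "b", "c"});
        // The same map serves several joins without being reloaded.
        System.out.println(join(new long[] {1L, 3L, 9L}, lookup)); // [1=a, 3=c]
        System.out.println(join(new long[] {2L}, lookup));         // [2=b]
    }
}
```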

-----------
Five Stars

Re: OutOfMemoryError: GC overhead limit exceeded

hi

 

Does the length of the field in "Edit schema" represent the number of bytes?

What is the relation between xmx and xms?

 

Ten Stars

Re: OutOfMemoryError: GC overhead limit exceeded


Boof1977 wrote:

 

what is the relation between xmx and xms?

 


You are asking me what the relation is between a minimum value and a maximum value? ;-)

The second one is bigger :-)

Sorry for the humour, but what are you really trying to ask?

https://stackoverflow.com/questions/14763079/what-are-the-xms-and-xmx-parameters-when-starting-jvms

https://docs.oracle.com/cd/E15523_01/web.1111/e13814/jvm_tuning.htm#PERFM150

https://www.ibm.com/support/knowledgecenter/SSYKE2_7.0.0/com.ibm.java.lnx.70.doc/diag/appendixes/cmd...

 

 


does the length of the field at the "edit schema" represent the number of bytes?

 


I do not know; maybe, I am not sure. In any case, the minimum size was already calculated in the previous answers... and that is only for the hash.

I am trying to explain a simple idea: something could be wrong in the job design, maybe in the idea of the job itself.

Think differently.

-----------
Five Stars

Re: OutOfMemoryError: GC overhead limit exceeded

Regarding the humour :-) I meant: does the maximum value need to be twice or maybe four times bigger than the min value? Is there any rule for how much bigger it should be?

Do you have any useful link where I can see the sizes of the types you wrote (number = 20 bytes)?

You wrote "... and this is only for Hash" - can you explain?

Thanks

 

Ten Stars

Re: OutOfMemoryError: GC overhead limit exceeded


Boof1977 wrote:

 

do you have any useful  link I can see the size of the types you wrote (number = 20 bytes )

 


 

Google and Oracle Docs ;-)

https://stackoverflow.com/questions/24240087/oracle-numberp-storage-size

https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1856720300346322149

https://docs.oracle.com/cd/B28359_01/server.111/b28318/datatype.htm#CNCPT1824

https://docs.oracle.com/cd/B19306_01/server.102/b14237/limits001.htm#i287903

 

You have not provided full information, so everything about the 20 is just guessing.

Is it NUMBER(38) or NUMBER(2)?!

tHashInput must keep all the data in memory for future use.

Doesn't Talend also need some memory to manage this data?

For example, you have a flow with another 100M rows and want to split it into matched and rejected rows... does Talend need any memory to do this?

Does your job have only one component, tHashInput, and nothing more? Does Talend need memory for the other components as well? Because the memory used for the hash is not released until your job has finished.

You have just described the error message... and nothing more.

 

-----------
Five Stars

Re: OutOfMemoryError: GC overhead limit exceeded

Thanks for your help.

Moderator

Re: OutOfMemoryError: GC overhead limit exceeded

Hello Boof 1977,


 
Can we consider your issue as resolved?

Best regards
  Sabrina