Five Stars

Heap space and GC issues

Hello Experts,

Can you please advise on improving the performance of a job I developed?

In my job, my Oracle source table contains 4 million records, and I have a couple of lookup tables with the same number of records.

But I am facing heap space and garbage collector issues. When I set the min and max JVM run arguments it worked fine. But going forward the record count will increase, so can you give your valuable suggestions on how to handle this situation without running into heap and garbage collector issues?
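For context, the min and max JVM run arguments are the standard `-Xms` and `-Xmx` flags (in Talend, set under the job's Advanced settings on the Run tab). A quick sanity check, e.g. from a tJava component, can confirm what heap the JVM actually received; this is a minimal sketch, the flag values in the comment are illustrative:

```java
// Print the heap limits the JVM is actually running with,
// e.g. after setting something like -Xms1024M -Xmx4096M.
public class HeapCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long maxMb = rt.maxMemory() / (1024 * 1024);     // upper bound set by -Xmx
        long totalMb = rt.totalMemory() / (1024 * 1024); // currently committed heap
        long freeMb = rt.freeMemory() / (1024 * 1024);   // free within committed heap
        System.out.println("max=" + maxMb + "MB total=" + totalMb + "MB free=" + freeMb + "MB");
    }
}
```

If `max` is far below what you think you configured, the arguments are not reaching the job's JVM.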


Thanks in advance.


Nine Stars

Re: Heap space and GC issues

- If possible, do your lookup in Oracle; that will save a lot of memory.
- Store your 4 million records in multiple split files and process them separately.
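The second suggestion can be sketched outside Talend as well: split the large extract into fixed-size chunk files so each run only has to hold one chunk in memory. File names and the chunk size below are illustrative, not from the original post:

```java
import java.io.*;
import java.util.*;

// Sketch: split a large extract into fixed-size chunk files so each
// subjob only needs to hold one chunk in memory at a time.
public class FileSplitter {
    public static List<File> split(File source, int linesPerChunk, File outDir) throws IOException {
        List<File> chunks = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new FileReader(source))) {
            String line;
            int count = 0;
            PrintWriter out = null;
            while ((line = in.readLine()) != null) {
                if (count % linesPerChunk == 0) {
                    if (out != null) out.close();        // finish the previous chunk
                    File chunk = new File(outDir, "chunk_" + chunks.size() + ".csv");
                    chunks.add(chunk);
                    out = new PrintWriter(new FileWriter(chunk));
                }
                out.println(line);
                count++;
            }
            if (out != null) out.close();
        }
        return chunks;
    }
}
```

Each chunk file can then be fed to the job in a loop (e.g. tFileList plus an iterate link), so peak heap usage depends on the chunk size, not the full 4 million rows.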
Four Stars

Re: Heap space and GC issues

The only issue I've seen with this in practice is that the disk can fill up really quickly with heap dumps if the service is configured to auto-restart itself upon failure (which I imagine is the case for most services). I've mostly encountered OOM issues where it would keep happening over and over and over...

This isn't a problem if there is only one service running on that particular box, but can impact other services & applications if it is a multi-use box.


Five Stars

Re: Heap space and GC issues

My lookup is on an Oracle source and the lookup model is "Load once". Do you mean the same thing?

Nine Stars

Re: Heap space and GC issues

No. If possible, do the matching in Oracle, i.e. your (left/right) joins with those big table(s), not in a Talend tMap.
Your DB engine is designed to perform this kind of task...
A Talend/Java job will eventually crash if there's no memory left... inevitable heap space trouble.

So if you insist on Talend doing the matching, your matching strategy should be:
0 - Only fetch the data you actually need to match. Reduce the data footprint and calculate the max heap space needed.
1 - Make sure Cartesian products don't occur by mistake: if the join should be 1:n and it turns out to be n:n, check for duplicates on the unique side.
2 - Match on the relevant columns only, store the match result, then match it back and add the additional columns you need. Reduction of bytes!
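Steps 1 and 2 above can be sketched in plain Java; record and column names here are illustrative, not from the post:

```java
import java.util.*;

// Sketch of steps 1-2: verify the lookup side is really unique (no
// accidental Cartesian product), then match on the key column only.
// Wide payload columns are fetched later, just for keys that matched.
public class SlimMatcher {
    // Step 1: fail fast if the "unique" side actually has duplicates.
    public static Set<String> uniqueKeysOrFail(List<String> lookupKeys) {
        Set<String> seen = new HashSet<>();
        for (String k : lookupKeys) {
            if (!seen.add(k)) {
                throw new IllegalStateException("Duplicate lookup key: " + k
                        + " - a 1:n join would silently become n:n");
            }
        }
        return seen;
    }

    // Step 2: match on the key only, keeping memory per row minimal.
    public static List<String> matchedKeys(List<String> mainKeys, Set<String> lookupKeys) {
        List<String> matched = new ArrayList<>();
        for (String k : mainKeys) {
            if (lookupKeys.contains(k)) matched.add(k);
        }
        return matched;
    }
}
```

Holding only the key column in memory instead of full rows is what makes the "reduction of bytes" concrete: the heap then scales with key size, not row width.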

Never used it myself, but I will in the future: a memory analyzer.
However, if an antivirus program runs in the background it can prevent allocating memory above 512 MB, so switch it off. And finally, don't get into garbage collection tuning; your JVM does it for you.

The min heap space setting mainly affects startup behaviour, but the max heap space is what the JVM is allowed to use at most, so focus on calculating your max value. Some processes need so much memory that you can only run them separately... probably this 4 mln record matching, it's the elephant in the room.
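The max-value calculation can be done with rough arithmetic. The per-record overhead below is an assumption (object headers, references, boxed keys), not a measured value; profile your own job to confirm:

```java
// Back-of-the-envelope max-heap estimate for an in-memory lookup.
// The ~100 bytes/record overhead is an assumed figure, not measured.
public class HeapEstimate {
    public static long estimateMb(long records, long payloadBytesPerRecord) {
        long overheadPerRecord = 100;  // assumed per-object JVM overhead
        long bytes = records * (payloadBytesPerRecord + overheadPerRecord);
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        // 4 million records x (50 payload + 100 overhead) bytes ~ 572 MB
        System.out.println(estimateMb(4_000_000L, 50) + " MB");
    }
}
```

With an estimate like this, you can decide up front whether a lookup fits in the `-Xmx` you can afford, or whether it has to be pushed back to the database.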