Handling Large Lookup in tMap (10 Million+)

One Star

Handling Large Lookup in tMap (10 Million+)

Hi,
We have a Talend job (loading a fact table in our data warehouse) that uses more than 4 lookup tables. Among those lookup tables, only one has a large amount of data (10 million rows). Whenever the job is executed, we get an "Out of Heap Memory" error.
RUN 1:
As suggested on the Talend Help site, I tried increasing the JVM parameters, but even with the larger values the job still fails.
JVM parameters:
Xms256M
Xmx1610M
Source - SQL Server
Lookup/Target - Oracle
Cursor is enabled on each of the lookup inputs.
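For reference, these values are passed as JVM arguments in the job's Run view (Advanced settings); with their leading dashes they would read:

```
-Xms256M
-Xmx1610M
```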
RUN 2:
I also tried loading with the lookup data stored in a local directory by enabling the "Store temp data" option in tMap.
The problem with this method is that we are unable to load all of the data from source to target. For example, if the source has 10 million records, only half a million records reach the target (meaning the lookup fails for the records that were not processed).
It also takes longer to process.
Please note:
RAM - 4GB
Both attempts were unsuccessful. Is there any way in Talend to handle this lookup effectively?
If so, please let us know; any input would be helpful.

Below I have also attached a screenshot of my job:
Two Stars

Re: Handling Large Lookup in tMap(10 Million+)

Hi,
Use appropriate Xms and Xmx values, i.e. increase both the Xms and Xmx values accordingly.
Also increase the cursor size.
Read only the columns that are required for the lookup.
Enable the parallel lookup option in tMap.
Thanks,
Bhanu
One Star

Re: Handling Large Lookup in tMap(10 Million+)

Hi Bhanu,
1. Can you please tell us the maximum JVM heap we can set with 4GB of RAM?
2. We tried increasing the cursor size; beyond a certain limit it throws an error.
3. We kept only the two id columns that are required.
4. Can you please explain the parallel lookup option a little, so that we can justify our approach?

Thanks
Arul
One Star

Re: Handling Large Lookup in tMap(10 Million+)

Any inputs on this? We are still facing the same issue.
Four Stars

Re: Handling Large Lookup in tMap(10 Million+)

Hi,
If you have 4GB RAM, then you can think of using 4096m for -Xmx.
Moreover, I would suggest splitting the job into two subjobs. As far as your two columns are concerned, if the two columns together take 16 bytes per row, the total raw space required would be about 10 million * 16 bytes = 160 million bytes.
1 GB is roughly 1,000,000 KB, so
160 million bytes is roughly 0.16 GB.
There should not be any problem in holding this.
You can perform similar calculations on your actual data and estimate the memory requirement. If you can break the job into two subjobs, memory management will be more efficient.
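That sizing arithmetic can be sketched as follows (the 16-bytes-per-row figure is an assumption carried over from above; note that the raw data size understates real heap usage, because lookup rows are held as Java objects with per-object overhead):

```java
public class LookupSizing {
    // Raw bytes needed to hold `rows` lookup rows of `bytesPerRow` of key data.
    static long rawBytes(long rows, int bytesPerRow) {
        return rows * bytesPerRow;
    }

    public static void main(String[] args) {
        long raw = rawBytes(10_000_000L, 16);       // two 8-byte id columns (assumed)
        System.out.println(raw);                    // 160000000
        System.out.printf("%.2f GB%n", raw / 1e9);  // 0.16 GB of raw key data
        // Real heap use is several times higher: object headers, boxed fields
        // and hash-map entries can easily multiply this by 5-10x.
    }
}
```

In other words, the raw key data fits comfortably; it is the JVM-level overhead that pushes a 10-million-row lookup toward the heap limit.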
Thanks
Vaibhav
One Star

Re: Handling Large Lookup in tMap(10 Million+)

Hi,
As said, we have 4GB RAM, but even so I cannot increase the Java Xmx beyond 1610M. If I do, I get the error message below:
"Could not reserve enough space for object heap".
Thanks
Arul
Seventeen Stars

Re: Handling Large Lookup in tMap(10 Million+)

hi all,
since your large volume is a lookup, try the "Reload at each row" mode, filtering the lookup data in a WHERE clause if possible:
https://help.talend.com/search/all?query=Handling+Lookups&content-lang=en
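A minimal self-contained sketch of that pattern (the table, column, and globalMap key names here are hypothetical): in "Reload at each row" mode, tMap writes the current main-row key into globalMap, and the lookup input's query reads it back, so only the rows matching the current main row are fetched:

```java
import java.util.HashMap;
import java.util.Map;

public class ReloadAtEachRow {
    // Talend exposes a shared Map<String, Object> called globalMap at runtime;
    // it is simulated here so the sketch is self-contained.
    static Map<String, Object> globalMap = new HashMap<>();

    // Expression as it might appear in the lookup tOracleInput's Query field.
    static String lookupQuery() {
        return "SELECT dim_id, dim_value FROM dim_lookup WHERE dim_id = "
                + (Integer) globalMap.get("row1_dim_id");
    }

    public static void main(String[] args) {
        globalMap.put("row1_dim_id", 42); // done by tMap's globalMap key expression
        System.out.println(lookupQuery());
        // SELECT dim_id, dim_value FROM dim_lookup WHERE dim_id = 42
    }
}
```

Only the rows for the current key are ever in memory, at the cost of one lookup query per main row, which is why this pays off mainly when the lookup table is indexed on the join key.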
regards
laurent
One Star

Re: Handling Large Lookup in tMap(10 Million+)

Hi,
We went through that before but have not implemented it, as it is suggested for large lookups where the source data is fairly small. In our case the source data is just as large as the lookup.
Thanks
Arul
Four Stars

Re: Handling Large Lookup in tMap(10 Million+)

What Java version are you using? I have set up 10240m at a client site with Java 1.7.0 (note that this needs a 64-bit JVM; a 32-bit JVM is typically capped at roughly 1.5 GB of heap regardless of installed RAM).
Have you tried with 1610m?
Have you tried breaking your job into two sections?
Vaibhav
Seventeen Stars

Re: Handling Large Lookup in tMap(10 Million+)

did you try the "store on disk" option for tMap?
regards
One Star

Re: Handling Large Lookup in tMap(10 Million+)

Hi sanvaibhav,

The Java version we are using is 1.6.0_35.
Yes, I did try with 1610M, but got the same result.
I haven't tried breaking the job; I will give that a try.
Is there anything that can be done to increase the Java Xmx size?

Hi kzone,
I am trying that now - storing the lookup data on disk.
I do have a doubt about it:
as I have more than 4 lookup tables, should I store all the lookup data on disk, or only the one with the most lookup data?

Thanks
Arul
Four Stars

Re: Handling Large Lookup in tMap(10 Million+)

If your job fails again after using the store-on-disk option, consider upgrading your Java version if that is an option for you. Also, disable the second tMap and execute only the first tMap and its lookup. Store all the lookup data on disk and try again.
One Star

Re: Handling Large Lookup in tMap(10 Million+)

Hi,
After using the Store on Disk option in tMap the Job is not failing whereas it is not processing all the source records into target.
My source - 8Million record
Expected O/P- 8Million Record
Actual O/P - 3 Million Record
Thanks
Arul
Four Stars

Re: Handling Large Lookup in tMap(10 Million+)

Disable the store-on-disk option and check again.
Seventeen Stars

Re: Handling Large Lookup in tMap(10 Million+)

I haven't tried breaking the job; I will give that a try

that could be a better solution!
If possible, join your data across several tMaps and release memory between them (for example, a temp result stored in a file, since file I/O is the fastest way).
It could be:
1. join with the big lookup (storing on disk or reloading at each row - test empirically) and store the result in a flat file
2. read the result back and join the other data
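A self-contained sketch of those two steps with tiny stand-in datasets (all names and the file layout are assumed): the intermediate join result goes to a temp delimited file so the big lookup can be dropped from memory before the remaining joins run:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TwoStageJoin {
    static List<String> run() throws IOException {
        // Stage 1: join main rows with the big lookup, then spill to disk.
        Map<Integer, String> bigLookup = Map.of(1, "US", 2, "FR"); // stands in for 10M rows
        List<int[]> mainRows = List.of(new int[]{100, 1}, new int[]{101, 2});

        Path temp = Files.createTempFile("stage1_", ".csv");
        List<String> stage1 = new ArrayList<>();
        for (int[] row : mainRows) {
            stage1.add(row[0] + ";" + bigLookup.get(row[1]));
        }
        Files.write(temp, stage1);
        // bigLookup is not referenced after this point, so its heap
        // can be reclaimed before the next join starts.

        // Stage 2: read the spilled result back and join the smaller lookups.
        Map<String, String> smallLookup = Map.of("US", "Dollar", "FR", "Euro");
        List<String> out = new ArrayList<>();
        for (String line : Files.readAllLines(temp)) {
            String[] f = line.split(";");
            out.add(f[0] + ";" + f[1] + ";" + smallLookup.get(f[1]));
        }
        Files.delete(temp);
        return out;
    }

    public static void main(String[] args) throws IOException {
        run().forEach(System.out::println);
        // 100;US;Dollar
        // 101;FR;Euro
    }
}
```

Only one large structure is live at a time, which is the point of splitting the job into subjobs.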
PS: increasing the JVM heap size is not a lasting solution: if the data volume increases, the job will fail again (in production), so it can end up being no solution at all, since you cannot increase the JVM heap indefinitely.
It can work for a stable volume of data.
hope it helps
regards
One Star

Re: Handling Large Lookup in tMap(10 Million+)

Hi Guys,
Many thanks for your time and input.
I was able to run the job after breaking the single job into 3 separate jobs.
Thanks
Arul
Four Stars

Re: Handling Large Lookup in tMap(10 Million+)

That's great!
How did you manage that with tMap, in addition to splitting the job? Did you see a performance improvement with temp storage on disk as well as with in-memory tMap settings?
Thanks
Vaibhav