Hi, We have a Talend job (loading a fact table in our DW) that uses more than 4 lookup tables. Among those lookup tables, only one has a large amount of data (10 million rows). Whenever the job is executed we get an "Out of Heap Memory" error. RUN 1: As suggested on the Talend Help site, I tried increasing the JVM parameters, but even after increasing them I am unable to execute the job. JVM parameters: -Xms256M -Xmx1610M. Source - SQL Server; Lookup/Target - Oracle. On each lookup table we have enabled CURSOR. RUN 2: We also tried loading with the lookup data stored in a local directory by enabling the "Store temp data" directory option in tMap. The problem with this method is that we are unable to load all the data from source to target. For example, if the source has 10 million records, we are able to load only half a million records into the target (meaning the lookup fails for the unprocessed records). The processing time is also much longer. Please note: RAM - 4GB. Both attempts were unsuccessful. Is there any way in Talend to handle the lookup effectively? If so, please let us know. Any inputs would be helpful.
Hi, Use appropriate Xms and Xmx values, i.e. increase both Xms and Xmx accordingly. Also increase the cursor size, read only those columns which are required for the lookup, and enable the parallel lookup option in tMap. Thanks, Bhanu
Hi Bhanu, 1. Can you please tell us the maximum JVM heap (-Xmx) limit for 4GB RAM? 2. We tried increasing the cursor size; beyond a certain limit it throws an error. 3. We took only the 2 id columns that are required. 4. Can you please give a brief explanation of the parallel lookup option, so that we can justify our approach?
Hi, If you have 4GB RAM, then you can think of using up to 4096m for -Xmx (in practice somewhat less, since the OS and other processes also need memory). Moreover, I would suggest you split the job into two subjobs. As far as your two columns are concerned: if the combined length of the two columns is 16 bytes, then the total space required would be roughly 10M rows * 16 bytes = 160 million bytes, i.e. about 0.16 GB. There should not be any problem holding that in memory. You can perform a similar calculation on your actual data and estimate the memory requirement. If you can break the job into two subjobs, then memory management will be more efficient. Thanks, Vaibhav
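The back-of-the-envelope estimate above can be sketched as a quick calculation. Note the per-row JVM overhead figure below (object headers, boxing, map entries) is an assumption of around 48 bytes per row, not a measured value, but it shows why the real heap usage is several times the raw payload:

```java
public class LookupMemoryEstimate {

    // Total bytes = rows * (payload per row + assumed JVM per-row overhead).
    public static long estimateBytes(long rows, int payloadBytesPerRow, int jvmOverheadPerRow) {
        return rows * (payloadBytesPerRow + jvmOverheadPerRow);
    }

    public static void main(String[] args) {
        long rows = 10_000_000L; // lookup table size from the thread
        int payload = 16;        // two id columns, 8 bytes each (assumption)
        int overhead = 48;       // rough JVM object/entry overhead per row (assumption)

        System.out.printf("raw payload:       %.0f MB%n", estimateBytes(rows, payload, 0) / 1e6);
        System.out.printf("with JVM overhead: %.0f MB%n", estimateBytes(rows, payload, overhead) / 1e6);
    }
}
```

Even with generous overhead assumptions, 10M short rows should fit in a 1.6GB heap, which suggests the other lookups and row buffers are contributing to the pressure too.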
Hi, We went through this before but haven't implemented it, as it is suggested to be useful for a large lookup where the source data is fairly small. In our case the source data is just as large as the lookup. Thanks, Arul
The Java version we are using is 1.6.0_35. Yeah, I did try with 1610M, but got the same result. I didn't try breaking the job; I will give that a try. Is there anything that can be done to increase the Java Xmx size?
Hi kzone, I am trying that now, storing the lookup data on disk. I do have a doubt about it: as I have more than 4 lookup tables, should I store all the lookup data on disk, or only the one with the most data?
If your job fails again after using the store-on-disk option, think of upgrading your Java version if you have no issues with that. Also, disable the second tMap and execute only the first tMap and its lookup. Store all lookup data to disk and try again.
Hi, After using the Store on Disk option in tMap the job no longer fails, but it does not process all the source records into the target. My source - 8 million records. Expected output - 8 million records. Actual output - 3 million records. Thanks, Arul
A better solution, if possible, is to join your data across several tMaps and release memory between them (using a temp file, for example, since file I/O is the fastest option here): 1. Join with the big lookup, using store-on-disk or "reload at each row" (test empirically which works better), and store the result in a flat file. 2. Read that result back and join the other lookup data. P.S. Increasing the JVM heap size is not a durable solution: if the data volume grows, it will fail again (in production), so it can become a non-solution since you cannot increase the JVM heap indefinitely. It may be acceptable for a stable data volume. Hope it helps. Regards
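The two-step pattern above can be sketched in plain Java (this is an illustration of the idea, not Talend-generated code; the table contents, column layout, and file handling are all assumptions):

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

/**
 * Minimal sketch of the two-subjob pattern: join the source against the big
 * lookup first, stage the result in a flat file, then apply the remaining
 * small lookup in a second pass, so both lookups never sit in memory at once.
 */
public class StagedJoin {

    static List<String> stagedJoin(List<String[]> source,
                                   Map<String, String> bigLookup,
                                   Map<String, String> smallLookup) {
        try {
            // "Subjob 1": join with the big lookup and spill the result to disk,
            // so the big lookup can be garbage-collected before the next join.
            Path stage = Files.createTempFile("stage", ".csv");
            try (BufferedWriter w = Files.newBufferedWriter(stage)) {
                for (String[] row : source) {
                    String cust = bigLookup.get(row[0]);
                    if (cust != null) {
                        w.write(row[0] + "," + row[1] + "," + cust);
                        w.newLine();
                    }
                }
            }

            // "Subjob 2": stream the staged file back and join the small lookup.
            List<String> out = new ArrayList<>();
            try (BufferedReader r = Files.newBufferedReader(stage)) {
                String line;
                while ((line = r.readLine()) != null) {
                    String[] f = line.split(",");
                    out.add(line + "," + smallLookup.get(f[2]));
                }
            }
            Files.delete(stage);
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String[]> source = List.of(
                new String[]{"1", "orderA"},
                new String[]{"2", "orderB"});
        Map<String, String> big = Map.of("1", "cust-100", "2", "cust-200");
        Map<String, String> small = Map.of("cust-100", "GOLD", "cust-200", "SILVER");
        stagedJoin(source, big, small).forEach(System.out::println);
    }
}
```

In Talend terms, "subjob 1" is a tMap with only the big lookup writing to a tFileOutputDelimited, and "subjob 2" is a tFileInputDelimited feeding a second tMap with the remaining lookups.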
That's great! How did you manage to do that with tMap, in addition to splitting the job? Have you measured a performance improvement with temp storage on disk as well as with the in-memory tMap settings? Thanks, Vaibhav