How to effectively handle a large volume of data

One Star

How to effectively handle a large volume of data

Hi Team,
Our company is one of Talend's prestigious customers. The issue is that we are now handling a client with a minimum of 5 million records in every table.
In this scenario we need to do a lot of lookups and computations to build a warehouse.
The image below shows 5 Oracle sources, each table with nearly 20 million records. We have restricted the extracts with WHERE clauses and also tried tuning the .ini files, but we still have at least 2 million records to take from each table and use in lookups.
No matter what we try, we cannot eliminate the "java out of memory" error.
We even tried storing temp data on disk, but the job still cannot run. On the other hand, if we do the joins and other processing with SQL in the Oracle input, then the load is pushed onto the database.
Then we are left asking what the purpose of the tool is.
Other tools like Informatica, by comparison, can process millions and even trillions of records with multiple table lookups, all handled effectively by the tool itself.
So please tell us:
1. How can we process this huge volume of data with TOS? Please share a sample mapping design able to process such a large volume of data.
2. Is there any tuning we need to do for processing such a large volume of data?
3. How effectively can TOS perform with large source tables?
One Star

Re: How to effectively handle a large volume of data

Is there any option in Talend like the uncached lookup in Informatica, so that we can at least fire the select query for every single source row?
Because we don't know how to get past the "java heap space" error.
How can we handle larger volumes in the source and lookup tables?
Please help.
Regards,
John
One Star

Re: How to effectively handle a large volume of data

Is it not possible for you to simply assign more memory to the Java virtual machine?
Employee

Re: How to effectively handle a large volume of data

Hello Paul,
When you do the lookup with tMap, you have to activate disk caching by pressing the "grey box" icon on the lookup input. This tells your process to store the data in chunks on disk. In the tMap component settings you should then specify the folder where those chunks are placed. If the records are too big, reduce the chunk size by decreasing the value of "Max buffer size" (take away one "0" at the end).
It is also a good idea to reduce the row width to only the needed columns rather than the "default full record" that TOS provides. Use only the columns that are required for further processing; an example is sketched below.
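For instance (the table and column names here are invented purely for illustration), instead of selecting the full record in tOracleInput, narrow the query to the columns the lookup actually needs and match the tMap schema to them:
    "SELECT CUSTOMER_ID, CUSTOMER_NAME FROM CUSTOMERS WHERE LOAD_DATE >= TO_DATE('2010-01-01','YYYY-MM-DD')"
Every column you drop directly reduces the memory (or disk) that each cached lookup row consumes.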
Yes, it is also possible to run a query against the lookup tables for each processed row. Press the green arrow on the left of the lookup section in the tMap editor and choose "Reload at each row" instead. You can find more details on that use case in the documentation; a rough sketch follows.
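As a rough sketch, assuming a main flow row1 with a customer_id column (these names are made up, so adjust them to your own job): in the tMap lookup settings you define a globalMap key from the main row, and the lookup tOracleInput then references that key in its query, for example
    globalMap key in tMap:  "CUST_ID" = row1.customer_id
    Query of the lookup tOracleInput:
        "SELECT CUSTOMER_ID, CUSTOMER_NAME FROM CUSTOMERS WHERE CUSTOMER_ID = " + ((Integer) globalMap.get("CUST_ID"))
With this setup only the matching lookup row is fetched per main row instead of caching the whole lookup table, at the cost of one query per incoming row.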
Basically it is always a good idea to provide as much memory as possible to the JVM that runs your process.
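For the memory itself, the standard -Xms/-Xmx heap options are the usual knobs (the values below are only an example; what fits depends on your machine, 32/64-bit JVM, and TOS version). In the studio you can raise them in the .ini file after the -vmargs line:
    -vmargs
    -Xms256m
    -Xmx2048m
For a single job, the Run view also has advanced settings where you can enable specific JVM arguments and put the same -Xms/-Xmx values there. If you export the job, the generated launch script calls java with an -Xmx setting that you can edit in the same way.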
One Star

Re: How to effectively handle a large volume of data

Hi Thomas,
Thanks for your reply. I have tried storing temp data on disk, but it slows down the process. Anyway, thanks. Can you tell me where I can increase the memory for the JVM?
John
One Star

Re: How to effectively handle a large volume of data

Hi Team,
We already have 4 GB of RAM in our machine. We have four source tables, as shown in the picture:
First table: 2,000,000 rows (main)
Second table: 500,000 rows (lookup)
Third table: 2,000,000 rows (lookup)
Fourth table: 50 rows (lookup)
While loading the second table itself, it throws an error.
We also tried storing temp data on disk, but after loading the source records into tMap it throws the error shown in image 2 and the flow does not complete.
Please help us with how to allocate the maximum JVM memory size and perform such large data processing.
Regards,
Paul