One Star

million records

Hi,
I have a few questions:
Q1. Can Talend process 60-80 million records?
Q2. What is the best approach, performance-wise, when the lookup has 4-6 million records?
Q3. Which is the better option for storing 4-6 million lookup records: a file or a DB?
Regards.
2 REPLIES
Moderator

Re: million records

Hi,
There is no standard answer for the maximum volume of data Talend can handle. It depends on the project scale, job design, data source, and so on.
Do you want to handle the lookups with a SQL query, or with a join in tMap?
Please provide the details. ELT components (when all tables are in the same DB) and bulk execute components are better ways to load large volumes of data quickly.
Best regards
Sabrina
One Star

Re: million records

Probably a late reply, but it might be useful for others.
Q1. Assuming that you are reading from an RDBMS table: yes, but the read should run in stream mode to avoid a Java heap space error. I have loaded 33 million records, extracted from a 1-billion-record table in MySQL, through Talend.
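To illustrate what stream mode buys you, here is a minimal sketch in Python rather than Talend. It assumes an in-memory SQLite table as a stand-in for the MySQL source; the table name `records`, the helper `stream_rows`, and the batch size are all made up for the example. The point is only that rows are pulled in fixed-size batches, so the full result set never sits in memory at once.

```python
import sqlite3


def stream_rows(conn, query, batch_size=10_000):
    """Yield rows batch by batch instead of loading the whole result set."""
    cur = conn.cursor()
    try:
        cur.execute(query)
        while True:
            batch = cur.fetchmany(batch_size)  # at most batch_size rows held at a time
            if not batch:
                break
            yield from batch
    finally:
        cur.close()


# Hypothetical demo data: SQLite stands in for the large source table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany(
    "INSERT INTO records (id, amount) VALUES (?, ?)",
    ((i, i * 2) for i in range(1, 50_001)),
)
conn.commit()

row_count = 0
total = 0
for _id, amount in stream_rows(conn, "SELECT id, amount FROM records"):
    row_count += 1      # process each row as it arrives
    total += amount
```

Memory use here is bounded by the batch size, not by the table size, which is the same idea as enabling streaming on the input side of a job.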
Q2. For better performance, consider both the total number of input records and the number of output records. In the scenario above, I needed only 33 million of the 1 billion records, so I used an inner join in the extract query with stream mode instead of a lookup. A lookup table's data has to reside in the RAM of the server on which Talend is installed, so the size of lookup table or file you can use depends on that server's RAM.
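A rough way to sanity-check whether a lookup would fit in memory is a back-of-the-envelope estimate like the sketch below. Every number in it is an assumption for illustration, not a measurement: the average row size, the overhead factor for in-memory objects, and the heap sizes are hypothetical and should be replaced with figures from your own data and server.

```python
def estimate_lookup_bytes(row_count, avg_row_bytes, overhead_factor):
    """Rough in-memory size of a lookup: rows x bytes per row x object overhead."""
    return row_count * avg_row_bytes * overhead_factor


def fits_in_heap(row_count, avg_row_bytes, overhead_factor, heap_bytes):
    """True if the estimated lookup size is within the available heap."""
    return estimate_lookup_bytes(row_count, avg_row_bytes, overhead_factor) <= heap_bytes


GIB = 1024 ** 3

# Hypothetical inputs: 6 million lookup rows, 100 bytes per row,
# and an assumed 3x overhead for holding the rows as in-memory objects.
needed = estimate_lookup_bytes(6_000_000, 100, 3)
fits_4g = fits_in_heap(6_000_000, 100, 3, 4 * GIB)
fits_1g = fits_in_heap(6_000_000, 100, 3, 1 * GIB)
```

If the estimate comes out close to or above the available heap, that is a sign to push the join into the database instead of loading the lookup.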
Q3. I advise you to store the lookup data in the DB and join it in the extract query, as the lookup data is huge.
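As a small sketch of joining in the extract query, the example below again uses an in-memory SQLite database in place of MySQL. The table and column names (`main_records`, `lookup_customers`, `customer_id`, `region`) are hypothetical. The database performs the inner join, so only matching rows ever reach the client and no lookup has to be loaded into memory.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_records (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    CREATE TABLE lookup_customers (customer_id INTEGER PRIMARY KEY, region TEXT);
""")
conn.executemany(
    "INSERT INTO main_records VALUES (?, ?, ?)",
    [(1, 10, 100), (2, 20, 200), (3, 30, 300), (4, 10, 400)],
)
conn.executemany(
    "INSERT INTO lookup_customers VALUES (?, ?)",
    [(10, "EU"), (20, "US")],
)
conn.commit()

# The join runs inside the database; record 3 has no lookup match and is dropped.
EXTRACT_QUERY = """
    SELECT m.id, m.amount, l.region
    FROM main_records AS m
    INNER JOIN lookup_customers AS l ON l.customer_id = m.customer_id
    ORDER BY m.id
"""

# list() is fine for this tiny demo; a large extract would be read in batches instead.
joined_rows = list(conn.execute(EXTRACT_QUERY))
```

In a real job the same SELECT would go into the input component's query, with an index on the join column of the lookup table so the database can do the join efficiently.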
Thanks,
Srini,
AgilitX