HELP. join with table with many records

One Star

HELP. join with table with many records

Hi

I have to read 2 tables in join and update a third one. Simple !!

But when talend performs a join on tMap, it first reads all records in the second table and then runs. this table can have hundreds of thousands of records. Is it possible not to read all the records first? Can you help me?

 

Best Regards

Benny


Accepted Solutions
Sixteen Stars

Re: HELP. join with table with many records

There are a couple of ways to approach this problem.

 

If the tables are from the same database, you should join in one query. Always filter your data at the database where possible. It is incredibly inefficient to load all of your data into Talend to filter it if you do not need to.

 

If the data is from different databases then it is important to filter out as much data as you can in your initial queries. Since you are comparing one row with potentially thousands (hundreds of thousands?), removing even a handful of rows from your Main or Lookup will help with performance. You also need to think about whether it is more efficient to bring all of your Lookup data in at once (at the beginning), or fire a Lookup query for every Main row that arrives? If you want to fire a query for each Main row that arrives, you can filter the query using data from the Main row. This will massively reduce the number of rows returned BUT comes at the expense of potentially hundreds of thousands of queries. To do this you need to select "Reload at each row" from the "Lookup Model" dropdown in the Lookup settings within the tMap. I have a tutorial which shows how this is used (although for a very different problem) here: https://www.rilhia.com/quicktips/quick-tip-row-multiplication

 


All Replies
Sixteen Stars

Re: HELP. join with table with many records

There are a couple of ways to approach this problem.

 

If the tables are from the same database, you should join in one query. Always filter your data at the database where possible. It is incredibly inefficient to load all of your data into Talend to filter it if you do not need to.

 

If the data is from different databases then it is important to filter out as much data as you can in your initial queries. Since you are comparing one row with potentially thousands (hundreds of thousands?), removing even a handful of rows from your Main or Lookup will help with performance. You also need to think about whether it is more efficient to bring all of your Lookup data in at once (at the beginning), or fire a Lookup query for every Main row that arrives? If you want to fire a query for each Main row that arrives, you can filter the query using data from the Main row. This will massively reduce the number of rows returned BUT comes at the expense of potentially hundreds of thousands of queries. To do this you need to select "Reload at each row" from the "Lookup Model" dropdown in the Lookup settings within the tMap. I have a tutorial which shows how this is used (although for a very different problem) here: https://www.rilhia.com/quicktips/quick-tip-row-multiplication

 

One Star

Re: HELP. join with table with many records

I'm just a fool. The table are in the same database and I can work with SQL, view, index, etc.

 

Thank you for your suggestions

 

Benny