Lookups on large Oracle table without reading the whole table

One Star

Lookups on large Oracle table without reading the whole table

Hello
When doing a lookup against a large Oracle table using an exact primary key column match, I find that TOS402 reads the whole table and caches it (either in memory or on disk) before doing the lookups. If the lookup data set is very large compared with the source data set, this is very slow. I would like it to do a lookup query for each record in the flow.
I tried using the options in tMap to specify reload at each row, but this just reads the whole table for every single row. I really need the query itself to effectively use a WHERE clause so that it queries for the specific primary key value.
How can I do this?
Thanks
Paul
Community Manager

Re: Lookups on large Oracle table without reading the whole table

Hello
Yes, you are right: the lookup flow loads the whole table and caches the records in memory or on disk. In this case, you can iterate over each row of the main table. For example:
tOracleInput("select * from from main table")--row1--tFlowToIterate---iterate--tOracleInput("select * from lookup table where id="+(Integer)globalMap.get("row1.id"))---->tLogRow
Here, id is the primary key of the main table.
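For anyone curious what this job design amounts to under the hood, here is a minimal plain-JDBC sketch of the same per-row lookup (the connection string, table and column names are placeholders, not taken from an actual generated job):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PerRowLookup {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//localhost:1521/ORCL", "user", "password")) {

            // The lookup query is prepared once with a bind variable, so Oracle
            // can hit the primary-key index instead of scanning the whole table.
            try (PreparedStatement lookup = conn.prepareStatement(
                     "SELECT * FROM lookup_table WHERE id = ?");
                 PreparedStatement main = conn.prepareStatement(
                     "SELECT id FROM main_table");
                 ResultSet mainRows = main.executeQuery()) {

                // Iterate the main flow and fire one indexed lookup per row,
                // mirroring tFlowToIterate plus the parameterised tOracleInput.
                while (mainRows.next()) {
                    lookup.setInt(1, mainRows.getInt("id"));
                    try (ResultSet hit = lookup.executeQuery()) {
                        if (hit.next()) {
                            System.out.println(mainRows.getInt("id") + " -> matched");
                        }
                    }
                }
            }
        }
    }
}

Note that the sketch uses a bind variable rather than concatenating the id into the query string, which lets Oracle reuse the parsed statement across iterations.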
Best regards
Shong
----------------------------------------------------------
Talend | Data Agility for Modern Business
One Star jgo

Re: Lookups on large Oracle table without reading the whole table

This is quite an old post, but I am having a similar issue.
I am loading data into Salesforce using tSalesforceOutputBulkExec. I have two lookups against huge Salesforce objects; one of them is Account, which contains a list of all clients. It takes a VERY LONG time to spool through the records before the upload process can even start.
I am very new to Talend, so I am having a hard time understanding how to fit the tFlowToIterate component into my existing job (see screenshot). I don't know where it will fit, or where to put the query (remember, I am using Salesforce, so it does not use the SQL query language).
Please advise.
Thanks.
One Star

Re: Lookups on large Oracle table without reading the whole table

Can you leverage the Oracle data dictionary? In Oracle: SELECT NUM_ROWS FROM DBA_TABLES WHERE TABLE_NAME = 'XYZ'. The only catch is that statistics need to be collected on these tables. Consult your DBA and see if you can leverage this option.
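For completeness, here is a small JDBC sketch of that dictionary lookup (connection details and the table name are placeholders; ALL_TABLES is used here because DBA_TABLES usually requires DBA privileges):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class TableRowCount {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//localhost:1521/ORCL", "user", "password");
             // NUM_ROWS is only as fresh as the last statistics gathering run
             // (e.g. via DBMS_STATS), which is the catch mentioned above.
             PreparedStatement ps = conn.prepareStatement(
                 "SELECT NUM_ROWS FROM ALL_TABLES WHERE TABLE_NAME = ?")) {
            ps.setString(1, "XYZ");
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    System.out.println("Approximate rows: " + rs.getLong("NUM_ROWS"));
                }
            }
        }
    }
}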
Five Stars

Re: Lookups on large Oracle table without reading the whole table

You can push joins down to Salesforce.
As one of your tables is a custom table, you may be limited in what you can do.
This is not really a Talend issue, so personally I'd refer this to a Salesforce SOQL expert. If it can be written in SOQL, then it can be executed from Talend.
Firing an efficient SOQL query and then letting Talend process the result-set may be your best bet.
You'll still need to cache your Salesforce lookup data, but at least you have the possibility of reducing the input dataset, which is what you really need to do.
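As a purely hypothetical illustration of reducing the input dataset, the sketch below builds a filtered SOQL query from only the keys present in the load, which could then go into a tSalesforceInput query box instead of pulling the whole Account object. The ClientNumber__c field is an assumed custom field, not something from the original job:

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SoqlFilterBuilder {
    public static void main(String[] args) {
        // Hypothetical: the set of client numbers present in the load file.
        List<String> clientNumbers = Arrays.asList("C001", "C002", "C003");

        // SOQL IN lists take quoted literals; Salesforce caps the total query
        // length, so very large key sets must be chunked into several queries.
        String inList = clientNumbers.stream()
                .map(n -> "'" + n + "'")
                .collect(Collectors.joining(", "));

        String soql = "SELECT Id, Name, ClientNumber__c FROM Account"
                + " WHERE ClientNumber__c IN (" + inList + ")";

        System.out.println(soql);
    }
}

The point of the design is that the filter travels to Salesforce with the query, so only the matching Account rows ever come back over the wire.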
Sucking huge datasets out of Salesforce and then pushing them back is never going to be great, Talend or not.
I don't know what you're trying to do, but it does seem unusual that you need to extract two large datasets from Salesforce only to push them back, so maybe you'll need to rethink the architecture?
The documentation for the standard Salesforce API does say that it is for small data sets, although I have used it for large datasets without issue, other than the long time it takes to get the data out.
Maybe if you can explain what the update is trying to achieve and how the data is joined, there may be some suggestions of how this can be better achieved.