[resolved] Help with Tmap Lookup table optimization

I have a job that fetches data from a table in batches, for example 1000 records per batch. Table A supplies the 1000 records, which are then joined to tables B, C, D, E, F, G, H and more (30+ tables) using tMap. I would like to filter the lookup tables so that only the data relevant to the current batch is fetched from the database when the job runs. Currently, when I join using tMap, it fetches all the records from every lookup table, which is highly inefficient when each lookup table holds millions of records.
Is there some way I can store the IDs in tHashmapInput and pass them as a parameter to each of the lookup tables, so that an IN clause in the SQL query fetches only the records relevant to the current batch?
The only solution I can think of right now is to first store all the IDs fetched in a batch in a temporary table and then join that temp table in the queries for the lookup inputs.
Any suggestions on how to do this in a better way in Talend Studio would be much appreciated.
I have already configured all my lookup tables to store temp data.
Talend Version: 5.6.1
Build id: 20141207_1530
3 REPLIES
One Star

Re: [resolved] Help with Tmap Lookup table optimization

Your idea of creating a temporary table and joining to it is how I would do it. Is there a reason you do not like this design? Very large IN () clauses can cause problems of their own.
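
If you do want to try the IN () route anyway, here is a rough sketch of how the list could be built in plain Java (for example in a routine). It splits the IDs into several smaller IN lists joined with OR, since some databases cap the number of items in a single list (Oracle allows 1000, for instance). The class name, method name, and the assumption that the IDs are numeric are all mine for illustration, not anything Talend provides.

```java
import java.util.List;

// Hypothetical helper, not part of Talend: builds a chunked IN condition.
public class ChunkedInClause {

    // Returns e.g. "(id IN (1, 2) OR id IN (3))" for ids [1, 2, 3] and maxPerList 2.
    // Assumes numeric IDs, so no quoting or escaping is needed.
    public static String build(String column, List<Long> ids, int maxPerList) {
        if (maxPerList <= 0) {
            throw new IllegalArgumentException("maxPerList must be positive");
        }
        if (ids.isEmpty()) {
            return "1 = 0"; // empty batch: condition that matches no rows
        }
        StringBuilder sql = new StringBuilder("(");
        for (int start = 0; start < ids.size(); start += maxPerList) {
            if (start > 0) {
                sql.append(" OR ");
            }
            int end = Math.min(start + maxPerList, ids.size());
            sql.append(column).append(" IN (");
            for (int i = start; i < end; i++) {
                if (i > start) {
                    sql.append(", ");
                }
                sql.append(ids.get(i));
            }
            sql.append(")");
        }
        return sql.append(")").toString();
    }
}
```

The resulting string could be put into globalMap once the batch has been read and concatenated into the WHERE clause of each lookup query. With 100000 IDs the statement gets very long, though, which is exactly the kind of trouble I mean.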

Re: [resolved] Help with Tmap Lookup table optimization

For a bigger batch size of, say, 100000 records, it takes about 70-75 seconds to commit the IDs to the temporary table before the job starts. That is why I was looking for a better way to do this, to cut the overall job time.
Employee

Re: [resolved] Help with Tmap Lookup table optimization

It sounds like the lookup tables and the primary table are all on the same database. Is doing the transformation on the database an option? Have you considered using one of the tELTMap components?