tMap default innerjoin / user defined condition - execution time


Hi All,
I need to merge two tables on a condition, so I am using tMap with an inner join. Even for a huge set of records, it works absolutely fine.
In my new requirement, I need to merge the tables on a user-defined condition.
When I add a condition in tMap, the execution time of the job increases considerably.
Please see the attached screenshots and advise:
1. Is there a flaw in my setup, or is this simply how tMap works?
2. On merge, I get only the matched records in the output. Is there any way to catch the rejected / unmatched records from both tables?
For example: I am merging 'table A' and 'table B' on some condition. On execution, I get only the matched records in the output; I also need to catch the records in table A and table B that did not match.
Thanks
Chaya

Re: tMap default innerjoin / user defined condition - execution time

Hi Chaya
I can't see any flaws in the image. All of the settings depend on your job logic.
If 'table A' is inner joined with 'table B', you can get the unmatched records of table A by setting 'Catch lookup inner join reject' to true.
If 'table B' is inner joined with 'table A', you get the unmatched records of table B in the same way.
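The two passes described above can be sketched in plain Java. This is only an illustration of the logic, not the code Talend generates: the class name `RejectSketch`, the method `innerJoinRejects` and the sample keys are made up for the example, and each row is reduced to its join key.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RejectSketch {
    // Returns the keys of 'main' that have no partner in 'lookup',
    // i.e. the rows an inner-join reject output would receive.
    static List<String> innerJoinRejects(List<String> main, List<String> lookup) {
        Set<String> lookupKeys = new HashSet<>(lookup);
        List<String> rejects = new ArrayList<>();
        for (String key : main) {
            if (!lookupKeys.contains(key)) {
                rejects.add(key);
            }
        }
        return rejects;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: one key per row.
        List<String> tableA = List.of("1", "2", "3");
        List<String> tableB = List.of("2", "3", "4");
        // A as main, B as lookup: rows of A with no partner in B.
        System.out.println("unmatched in A: " + innerJoinRejects(tableA, tableB));
        // Roles swapped: rows of B with no partner in A.
        System.out.println("unmatched in B: " + innerJoinRejects(tableB, tableA));
    }
}
```

Each pass only reports rejects for the side acting as the main flow, which is why the roles have to be swapped to cover both tables.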
Regards,
Pedro

Re: tMap default innerjoin / user defined condition - execution time

Hi Chaya,
According to the filter you are applying, you are effectively performing an inner join on the key of row4 (identification_number).
I would suggest putting row3.id_number back where it was in screenshot one, that is, in the expression column to the left of row4.identification_number, instead of in the filter. This should make Talend perform a true inner join. I think it does not perform an inner join when there is only a filter and no join condition, and that may be what slows down your process. You might even add id_type to the join, if my interpretation of your filter is correct, but that's up to you.
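To illustrate why a join key can matter so much, here is a minimal Java sketch comparing the two strategies. It assumes that a filter-only lookup ends up testing every pair of rows, while a keyed lookup is indexed once and then probed; that is a simplification for illustration and not Talend's actual generated code. The class and method names are hypothetical, and rows are reduced to their key values.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class JoinKeySketch {
    // Filter only: every main row is compared with every lookup row.
    // Matches go to 'out'; the return value is the number of comparisons.
    static long filterOnly(List<String> mainIds, List<String> lookupIds, List<String> out) {
        long comparisons = 0;
        for (String m : mainIds) {
            for (String l : lookupIds) {
                comparisons++;
                if (m.equals(l)) {
                    out.add(m);
                }
            }
        }
        return comparisons;
    }

    // Join key: index the lookup once, then probe it once per main row.
    // Matches go to 'out'; the return value is the number of hash probes.
    static long keyedJoin(List<String> mainIds, List<String> lookupIds, List<String> out) {
        Map<String, Integer> index = new HashMap<>();
        for (String l : lookupIds) {
            index.merge(l, 1, Integer::sum); // count how often each key occurs
        }
        long probes = 0;
        for (String m : mainIds) {
            probes++;
            int hits = index.getOrDefault(m, 0);
            for (int i = 0; i < hits; i++) {
                out.add(m);
            }
        }
        return probes;
    }

    public static void main(String[] args) {
        // Hypothetical sample keys standing in for id_number / identification_number.
        List<String> mainIds = List.of("A1", "B2", "C3");
        List<String> lookupIds = List.of("B2", "C3", "D4", "E5");
        List<String> slow = new ArrayList<>();
        List<String> fast = new ArrayList<>();
        System.out.println("filter-only comparisons: " + filterOnly(mainIds, lookupIds, slow));
        System.out.println("keyed-join probes: " + keyedJoin(mainIds, lookupIds, fast));
        System.out.println("same matches: " + slow.equals(fast));
    }
}
```

Both methods produce the same matches; the difference is that the first does work proportional to the product of the two row counts, and the second to their sum.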
As for question 2: do as Pedro suggests. You can add an output row (create new or create join; I suppose you'll need "create join table from") and set its options to "catch inner join rejects". This will send all unmatched row3 records to this output. An outer join that catches all records from all inputs is unfortunately not possible.
Hope this helps.
Regards,
Arno

Re: tMap default innerjoin / user defined condition - execution time

Thanks for the reply, Pedro.
Well, catching the rejected records is fine; I could achieve that. Thanks.
But my main concern is performance. I have a simple job
with two input files, a tMap (in which I use a user-defined expression) and a tLog.
Why does it take more time when we define conditions? If I remove the condition from the same job and use a normal inner join, it runs faster. I am confused. Please advise.

Re: tMap default innerjoin / user defined condition - execution time

Hi
The condition here will certainly lead to more calculations and comparisons, which affects performance.
I need to know whether the execution time is acceptable.
How many rows are in each of the two files?
Regards,
Pedro

Re: tMap default innerjoin / user defined condition - execution time

Hey Pedro,
While working on the above concern, we noticed that if the user-defined condition contains only && (AND) operations, it works fine; if it contains || (OR) operations, only those comparisons take more time.
I have around 5,00,000 (500,000) records on one side and around 3,50,000 (350,000) records on the other.
The above comparison takes around 3-4 hours, and sometimes it just hangs partway through.
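For scale: if an OR condition cannot be served by a single join key and every pair of rows has to be tested, 500,000 x 350,000 rows is 175,000,000,000 comparisons. One possible workaround, sketched below in Java, is to treat each OR branch as its own keyed lookup and merge the results without duplicates. The real condition is only visible in the screenshots, so the `Row` class, its `id` and `altId` fields and the method `joinOnEitherKey` are assumptions made up for this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class OrJoinSketch {
    // Hypothetical row with two candidate keys (the thread does not show the real columns).
    static final class Row {
        final String id;
        final String altId;
        Row(String id, String altId) { this.id = id; this.altId = altId; }
    }

    // Matches rows where a.id equals b.id OR a.altId equals b.altId,
    // using one hash index per OR branch instead of comparing every pair.
    // Each match is returned as "indexInA:indexInB".
    static Set<String> joinOnEitherKey(List<Row> a, List<Row> b) {
        Map<String, List<Integer>> byId = new HashMap<>();
        Map<String, List<Integer>> byAltId = new HashMap<>();
        for (int j = 0; j < b.size(); j++) {
            byId.computeIfAbsent(b.get(j).id, k -> new ArrayList<>()).add(j);
            byAltId.computeIfAbsent(b.get(j).altId, k -> new ArrayList<>()).add(j);
        }
        // A set, so a pair that satisfies both branches is reported once.
        Set<String> pairs = new LinkedHashSet<>();
        for (int i = 0; i < a.size(); i++) {
            for (int j : byId.getOrDefault(a.get(i).id, List.of())) {
                pairs.add(i + ":" + j);
            }
            for (int j : byAltId.getOrDefault(a.get(i).altId, List.of())) {
                pairs.add(i + ":" + j);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Row> a = List.of(new Row("1", "x"), new Row("2", "y"), new Row("3", "z"));
        List<Row> b = List.of(new Row("1", "x"), new Row("9", "y"), new Row("8", "q"));
        System.out.println(joinOnEitherKey(a, b));
        // Worst case for an all-pairs comparison at the sizes quoted in the thread.
        System.out.println(500_000L * 350_000L + " pairwise comparisons");
    }
}
```

In a job, this would roughly correspond to one keyed lookup per OR branch, with the outputs merged and de-duplicated afterwards. Whether that fits depends on the actual condition.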

Thanks
Chaya