One Star

Performance tMap lookup vs tJoin

Hi, can someone help?
In Ab Initio, when the main stream is big and the lookup stream is very small (a ratio of something like 1:20), Lookup gives better performance.
When the main stream and the lookup stream are the same or a similar size, Join gives better performance.
Do the same criteria apply in Talend when choosing between tJoin and tMap?
Much appreciated.
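
For readers unfamiliar with the distinction, here is a minimal Java sketch of what a lookup-style join does conceptually. It is not Talend's or Ab Initio's actual code; the class name, method name and sample rows are made up for illustration. The whole lookup side is loaded into a hash map, which is why this approach tends to suit a small lookup and becomes memory-hungry as the lookup grows.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a hash lookup join; rows are {key, value} pairs.
final class HashLookupSketch {

    /** Inner-joins each main row against an in-memory map built from the lookup rows. */
    static List<String> lookupJoin(List<String[]> mainRows, List<String[]> lookupRows) {
        // Build phase: the entire lookup side is held in memory, keyed by the join key.
        Map<String, String> lookup = new HashMap<>();
        for (String[] row : lookupRows) {
            lookup.put(row[0], row[1]);
        }

        // Probe phase: main rows are visited one at a time. A List is used here
        // for brevity; a real job would read them from a stream instead.
        List<String> joined = new ArrayList<>();
        for (String[] row : mainRows) {
            String match = lookup.get(row[0]);
            if (match != null) { // inner join: unmatched main rows are dropped
                joined.add(row[0] + "," + row[1] + "," + match);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String[]> main = List.of(
                new String[] {"1", "order-a"},
                new String[] {"2", "order-b"},
                new String[] {"9", "order-c"});
        List<String[]> lookup = List.of(
                new String[] {"1", "Alice"},
                new String[] {"2", "Bob"});
        System.out.println(lookupJoin(main, lookup));
    }
}
```

Memory use in this sketch grows with the number of lookup rows, not with the number of main rows, which matches the rule of thumb in the question.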
6 REPLIES
Moderator

Re: Performance tMap lookup vs tJoin

Hi,
No, you can use tMap all the time.
If you have the MPx Edition, you can also go for tFileScaleJoin.
Benjamin
One Star

Re: Performance tMap lookup vs tJoin

Thanks for your information. But how is the performance when the main stream and the lookup stream are of similar size?
For example, 10 million records in the main stream and 10 million records in the lookup stream.
One Star

Re: Performance tMap lookup vs tJoin

I have the same issue. My lookup is ~800K rows and input is 1.8M. Talend basically chokes up on this, never completes and runs out of memory.
I use tMap. I enabled "Store temp data on disk" and set Xmx to 4GB, to no avail. Talend runs out of heap memory, or the GC runs out of memory.
I am kind of out of options.
Any suggestions, Talend folks?
Peter.
One Star

Re: Performance tMap lookup vs tJoin

Hi Guys,
If you've tried this already then please ignore it, but in my experience with other DI tools we always preprocessed large lookup files so that they contained only the data we required for the downstream process, thus making more effective use of the memory available and not having to store data to disk. In some cases by examining the two files and what was needed from each we actually made the file with more rows the lookup, because we needed much less data from it. It may be that you need all the data from both the files but it's always worth checking when you are dealing with large volumes.
That said, there do seem to be a number of examples, including this one, where Talend could benefit from a serial (sorted) joiner component/method as well as a lookup method!
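
To make the idea of a serial (sorted) joiner concrete, here is a rough Java sketch of a sort-merge join. It is only an illustration under stated assumptions, not an existing Talend component: both inputs are assumed to be already sorted ascending by an integer key, lookup keys are assumed unique, and the class and method names are invented.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration of a sort-merge join; rows are {key, value} int pairs.
final class SortedMergeJoinSketch {

    /**
     * Inner-joins two inputs that are already sorted ascending by key (column 0).
     * Assumes keys are unique on the lookup side; the main side may repeat keys.
     */
    static List<String> mergeJoin(Iterator<int[]> mainRows, Iterator<int[]> lookupRows) {
        List<String> joined = new ArrayList<>();
        int[] look = lookupRows.hasNext() ? lookupRows.next() : null;
        while (mainRows.hasNext() && look != null) {
            int[] main = mainRows.next();
            // Advance the lookup side until its key catches up with the main key.
            while (look != null && look[0] < main[0]) {
                look = lookupRows.hasNext() ? lookupRows.next() : null;
            }
            if (look != null && look[0] == main[0]) {
                joined.add(main[0] + ":" + main[1] + "," + look[1]);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<int[]> main = List.of(
                new int[] {1, 10}, new int[] {2, 20}, new int[] {2, 21}, new int[] {4, 40});
        List<int[]> lookup = List.of(
                new int[] {2, 200}, new int[] {3, 300}, new int[] {4, 400});
        System.out.println(mergeJoin(main.iterator(), lookup.iterator()));
        // prints [2:20,200, 2:21,200, 4:40,400]
    }
}
```

Only one row from each side is held at a time (the result list here exists just for the demo), so memory stays flat however large the two inputs are. The cost is that both inputs have to be sorted first.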
Regards,
Rick
Moderator

Re: Performance tMap lookup vs tJoin

Hi,
To avoid running out of memory, you can:
With Talend Open Studio:
-(with a database as source) activate stream/cursor mode on the tDBInput
-load only the used columns into the lookup
-filter lookup rows before they go into the tMap
-activate the store on disk option on the tMap
-use ELT mode to make the database perform the join
-use a 64-bit JVM and allocate more than 4 GB
With Talend Integration Suite MPx edition (parallelization version):
-use tFileScaleJoin, which performs a parallel sorted join/lookup (equivalent to a parallel join in Ab Initio)
-use Hadoop patterns to perform a distributed join/lookup
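
As a rough illustration of the "only used columns" and "filter lookup rows" suggestions in the list above, here is a small Java sketch. It is not what tMap generates; the row layout {customerId, name, country, notes}, the country filter and all names are assumptions made up for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of trimming a lookup before it is held in memory.
final class LookupPruningSketch {

    /**
     * Builds a compact lookup from wide rows laid out as
     * {customerId, name, country, notes}: keeps only rows for one country
     * and stores only the name column against the key.
     */
    static Map<String, String> buildPrunedLookup(List<String[]> wideRows, String country) {
        Map<String, String> lookup = new HashMap<>();
        for (String[] row : wideRows) {
            if (!country.equals(row[2])) {
                continue;               // filter: skip rows the job will never match
            }
            lookup.put(row[0], row[1]); // project: key -> the one column actually used
        }
        return lookup;
    }

    public static void main(String[] args) {
        List<String[]> wide = List.of(
                new String[] {"1", "Alice", "FR", "long free-text notes"},
                new String[] {"2", "Bob", "US", "more notes"},
                new String[] {"3", "Carol", "FR", "even more notes"});
        Map<String, String> lookup = buildPrunedLookup(wide, "FR");
        System.out.println(lookup.size() + " rows kept, name for 3 = " + lookup.get("3"));
    }
}
```

The fewer rows and columns that reach the in-memory lookup, the less heap the join needs, so this kind of trimming is usually worth trying before raising the JVM memory settings.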
benjamin
One Star

Re: Performance tMap lookup vs tJoin

Thanks a lot guys