Hi, can someone help? In Ab Initio, when the main stream is big and the lookup stream is very small (a ratio of roughly 1:20, say), a Lookup gives better performance. When the main stream and the lookup stream are the same or similar in size, a Join gives better performance. Do the same criteria apply in Talend when choosing between tJoin and tMap? Much appreciated.
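For readers unfamiliar with the trade-off in the question, here is a minimal Java sketch of the lookup pattern: the small stream is loaded into an in-memory hash map and the big stream is read once against it. This is a generic illustration, not how tMap or Ab Initio implement it internally; the class name, method names and the two-column row layout are all made up for the example. Memory use grows with the size of the lookup stream, which is why the pattern suits a small lookup and a large main stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashLookupSketch {

    // Build the in-memory index from the (small) lookup stream: key -> value.
    // Assumed row layout: column 0 is an integer key, column 1 is the value.
    public static Map<Integer, String> buildLookup(List<String[]> lookupRows) {
        Map<Integer, String> index = new HashMap<>();
        for (String[] row : lookupRows) {
            index.put(Integer.parseInt(row[0]), row[1]);
        }
        return index;
    }

    // Read the main rows once and enrich each one with its lookup value.
    // Rows without a match are dropped (inner-join behaviour).
    public static List<String> enrich(List<String[]> mainRows, Map<Integer, String> index) {
        List<String> out = new ArrayList<>();
        for (String[] row : mainRows) {
            String match = index.get(Integer.parseInt(row[0]));
            if (match != null) {
                out.add(row[1] + "," + match);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> lookup = Arrays.asList(
                new String[]{"1", "FR"},
                new String[]{"2", "US"});
        List<String[]> main = Arrays.asList(
                new String[]{"1", "alice"},
                new String[]{"2", "bob"},
                new String[]{"3", "carol"}); // key 3 has no lookup entry
        System.out.println(enrich(main, buildLookup(lookup)));
    }
}
```

Only the lookup side is held in memory; the main side could just as well come from a file reader or a database cursor instead of a list.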
I have the same issue. My lookup is ~800K rows and my input is 1.8M rows. Talend basically chokes on this: it never completes and runs out of memory. I use tMap. I enabled "Store temp data on disk" and set Xmx to 4 GB, to no avail. Either Talend runs out of heap memory or the GC runs out of memory. I am kind of out of options. Any suggestions, Talend folks? Peter.
Hi guys, if you've already tried this then please ignore it, but in my experience with other DI tools we always preprocessed large lookup files so that they contained only the data required by the downstream process. That makes more effective use of the available memory and avoids having to store data on disk. In some cases, after examining the two files and what was needed from each, we actually made the file with more rows the lookup, because we needed much less data from it. You may need all the data from both files, but it's always worth checking when you are dealing with large volumes. That said, there do seem to be a number of examples, including this one, where Talend could benefit from a serial (sorted) joiner component/method as well as a lookup method! Regards, Rick
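To make the "serial (sorted) joiner" idea concrete, here is a rough Java sketch of a merge join over two inputs that are already sorted on the join key. It is only an illustration of the general technique, not an existing Talend component; the class and method names are hypothetical, and it assumes integer keys in column 0 and unique keys on the lookup side. Because it only ever holds one row from each input, memory use does not depend on the size of either file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SortedMergeJoinSketch {

    // Both inputs must already be sorted ascending on the integer key in column 0.
    // Assumption: lookup keys are unique; the main input may repeat a key.
    public static List<String> mergeJoin(Iterator<String[]> main, Iterator<String[]> lookup) {
        // A real job would write each joined row out instead of collecting a list.
        List<String> out = new ArrayList<>();
        String[] m = main.hasNext() ? main.next() : null;
        String[] l = lookup.hasNext() ? lookup.next() : null;
        while (m != null && l != null) {
            int cmp = Integer.compare(Integer.parseInt(m[0]), Integer.parseInt(l[0]));
            if (cmp == 0) {
                out.add(m[1] + "," + l[1]);
                // Advance main only: the next main row may share this key.
                m = main.hasNext() ? main.next() : null;
            } else if (cmp < 0) {
                // Main key is smaller: this main row has no match, skip it.
                m = main.hasNext() ? main.next() : null;
            } else {
                // Lookup key is smaller: no main row needs it any more.
                l = lookup.hasNext() ? lookup.next() : null;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> main = Arrays.asList(
                new String[]{"1", "a"}, new String[]{"1", "b"},
                new String[]{"3", "c"}, new String[]{"4", "d"});
        List<String[]> lookup = Arrays.asList(
                new String[]{"1", "X"}, new String[]{"2", "Y"}, new String[]{"4", "Z"});
        System.out.println(mergeJoin(main.iterator(), lookup.iterator()));
    }
}
```

The cost moves to sorting both inputs beforehand, which can be done on disk or by the source database.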
Hi, to avoid running out of memory, you can:

With Talend Open Studio:
- (with a database as the source) activate stream/cursor mode on the tDBInput
- load only the columns you actually use into the lookup
- filter the lookup rows before they go into the tMap
- activate the store on disk option on the tMap
- use ELT mode to make the database perform the join
- use a 64-bit JVM and allocate more than 4 GB

With Talend Integration Suite MPx edition (the parallelization version):
- use tFileScaleJoin, which performs a parallel sorted join/lookup (equivalent to a parallel join in Ab Initio)
- use Hadoop patterns to perform a distributed join/lookup

benjamin
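As a rough illustration of the "only used columns" and "filter lookup rows" points in the list above, the Java sketch below prunes the lookup while it is being loaded, so that only the key and the one needed column of the matching rows end up in memory. In a Talend job this would normally be done in the input query or with a filter component ahead of the tMap rather than in hand-written code; the class name, method name and row layout here are invented for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class PrunedLookupSketch {

    // Keep only the rows that pass the filter, and from those rows
    // keep only the key column and the single value column that is needed.
    public static Map<String, String> buildPrunedLookup(
            List<String[]> rows, int keyCol, int valueCol, Predicate<String[]> keep) {
        Map<String, String> index = new HashMap<>();
        for (String[] row : rows) {
            if (keep.test(row)) {
                index.put(row[keyCol], row[valueCol]);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Hypothetical lookup rows: id, country, status, wide free-text column.
        List<String[]> rows = Arrays.asList(
                new String[]{"1", "FR", "active", "long description 1"},
                new String[]{"2", "US", "closed", "long description 2"},
                new String[]{"3", "DE", "active", "long description 3"});
        // Only active rows are kept, and only id -> country is stored.
        Map<String, String> index =
                buildPrunedLookup(rows, 0, 1, r -> "active".equals(r[2]));
        System.out.println(index);
    }
}
```

The wide text column and the closed rows never reach the map, so the per-row footprint of the lookup shrinks before any join happens.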