I want to develop a job that uses parallelization. Suppose the job contains a sort (tSortRow), a tMap that joins two files/tables, and some other components. How will this work?
What is the best way to build the job so that performance improves and the output is the same with and without parallelization?
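A minimal sketch of why a parallel sort can still produce the same output as a serial one. This is plain Java, not Talend-generated code, and the class and method names are hypothetical. It splits the rows into partitions, sorts each partition on its own thread, and then merges the sorted partitions. The final merge is what makes the parallel result match a single serial sort; without it, each partition would be sorted but the overall output would not be.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration: partition -> sort each partition in parallel -> k-way merge.
public class ParallelSortSketch {

    // Serial baseline: one sort over all rows.
    public static List<Integer> serialSort(List<Integer> rows) {
        List<Integer> copy = new ArrayList<>(rows);
        Collections.sort(copy);
        return copy;
    }

    // Split rows round-robin into partitions, sort each on its own thread, then merge.
    public static List<Integer> parallelSort(List<Integer> rows, int partitions) throws Exception {
        List<List<Integer>> parts = new ArrayList<>();
        for (int i = 0; i < partitions; i++) parts.add(new ArrayList<>());
        for (int i = 0; i < rows.size(); i++) parts.get(i % partitions).add(rows.get(i));

        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            List<Future<List<Integer>>> futures = new ArrayList<>();
            for (List<Integer> p : parts) futures.add(pool.submit(() -> serialSort(p)));
            List<List<Integer>> sorted = new ArrayList<>();
            for (Future<List<Integer>> f : futures) sorted.add(f.get());
            return merge(sorted);
        } finally {
            pool.shutdown();
        }
    }

    // k-way merge: repeatedly take the smallest head value across the sorted partitions.
    private static List<Integer> merge(List<List<Integer>> sorted) {
        // Queue entries are {value, partitionIndex, positionInPartition}.
        PriorityQueue<int[]> heads = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int p = 0; p < sorted.size(); p++) {
            if (!sorted.get(p).isEmpty()) heads.add(new int[]{sorted.get(p).get(0), p, 0});
        }
        List<Integer> out = new ArrayList<>();
        while (!heads.isEmpty()) {
            int[] h = heads.poll();
            out.add(h[0]);
            int next = h[2] + 1;
            List<Integer> part = sorted.get(h[1]);
            if (next < part.size()) heads.add(new int[]{part.get(next), h[1], next});
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        List<Integer> rows = List.of(42, 7, 19, 3, 88, 7, 56, 1, 23);
        System.out.println(serialSort(rows));       // [1, 3, 7, 7, 19, 23, 42, 56, 88]
        System.out.println(parallelSort(rows, 3));  // [1, 3, 7, 7, 19, 23, 42, 56, 88]
    }
}
```

If rows can tie on the sort key but differ in other columns, add a tiebreaker (for example the original row position) to the comparator. Otherwise the order of tied rows may differ between the serial and the parallel run.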
@mailforsaggy, to improve performance, I suggest enabling the "Store on disk" option in tMap (Basic settings) and in tSortRow (Advanced settings).
@manodwhb, I can use the "Store on disk" option when the data is huge or does not fit into memory.
My question is different: it is about how parallelization is implemented, and how it works when the job contains components such as a sort and a tMap. What is the run-time behavior when a job containing these components is set to run in parallel?
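For the tMap-style join, one common approach (sketched here under assumptions, not taken from Talend's implementation) is to hash-partition both the main flow and the lookup on the join key. Matching rows then always land in the same partition, so each partition can be joined independently on its own thread. The record types, `partitionOf`, and the `id|name|city` output format are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of a hash-partitioned inner join.
public class PartitionedJoinSketch {

    // Hypothetical row shapes: main flow (id, name) and lookup (id, city).
    public record Row(int id, String name) {}
    public record Lookup(int id, String city) {}

    // Inner join of one main partition against one lookup partition,
    // using an in-memory hash map on the join key.
    public static List<String> hashJoin(List<Row> main, List<Lookup> lookup) {
        Map<Integer, List<String>> byId = new HashMap<>();
        for (Lookup l : lookup) byId.computeIfAbsent(l.id(), k -> new ArrayList<>()).add(l.city());
        List<String> out = new ArrayList<>();
        for (Row r : main) {
            for (String city : byId.getOrDefault(r.id(), List.of())) {
                out.add(r.id() + "|" + r.name() + "|" + city);
            }
        }
        return out;
    }

    // Same key -> same partition, for both inputs.
    static int partitionOf(int key, int n) {
        return Math.floorMod(Integer.hashCode(key), n);
    }

    public static List<String> partitionedJoin(List<Row> main, List<Lookup> lookup, int n) throws Exception {
        List<List<Row>> mainParts = new ArrayList<>();
        List<List<Lookup>> lookupParts = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            mainParts.add(new ArrayList<>());
            lookupParts.add(new ArrayList<>());
        }
        for (Row r : main) mainParts.get(partitionOf(r.id(), n)).add(r);
        for (Lookup l : lookup) lookupParts.get(partitionOf(l.id(), n)).add(l);

        ExecutorService pool = Executors.newFixedThreadPool(n);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                List<Row> m = mainParts.get(i);
                List<Lookup> lk = lookupParts.get(i);
                futures.add(pool.submit(() -> hashJoin(m, lk)));
            }
            // Collect the per-partition results (a "collector" step).
            List<String> out = new ArrayList<>();
            for (Future<List<String>> f : futures) out.addAll(f.get());
            return out;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Row> mainFlow = List.of(new Row(1, "a"), new Row(2, "b"), new Row(3, "c"), new Row(2, "d"));
        List<Lookup> lookup = List.of(new Lookup(1, "X"), new Lookup(2, "Y"), new Lookup(4, "Z"));
        List<String> serial = new ArrayList<>(hashJoin(mainFlow, lookup));
        List<String> parallel = new ArrayList<>(partitionedJoin(mainFlow, lookup, 3));
        Collections.sort(serial);
        Collections.sort(parallel);
        System.out.println(serial.equals(parallel)); // true
    }
}
```

The parallel join produces the same set of rows as a single unpartitioned join, but their order follows the partitions rather than the original input. If a downstream step needs a fixed order, sort after collecting, as `main` does before comparing.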
@mailforsaggy, did you check the link below?