I want to build a job that uses parallelization. Suppose the job contains a sort, a tMap that joins one or two files/tables, and some other components. How does parallelization work in that case?
What is the best way to build the job so that performance improves and the output is the same with and without parallel execution?
@mailforsaggy, to improve performance, I suggest enabling the store-on-disk option in tMap (Basic settings) and in tSortRow (Advanced settings).
@manodwhb, I can use the store-on-disk option when the data is huge or does not fit into memory.
My question is different. It is about implementing parallelization and how it works when the job contains components such as a sort and tMap. What is the runtime behavior when a job made up of these components is set to run in parallel?
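To make the "same output with and without parallel" requirement concrete, here is a minimal plain-Java sketch of the general partition / process / recollect pattern. It is not Talend-generated code and does not claim to show how Talend implements parallel execution. The class and method names (`ParallelSortSketch`, `sequentialSort`, `parallelSort`) are made up for illustration. The assumption is that rows are split by a hash of the sort/join key, each partition is sorted in its own thread, and the partitions are then combined with a final stable sort.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSortSketch {

    // Baseline: one stable sort over all rows, ordered by key only.
    static List<Map.Entry<Integer, String>> sequentialSort(List<Map.Entry<Integer, String>> rows) {
        List<Map.Entry<Integer, String>> out = new ArrayList<>(rows);
        out.sort(Map.Entry.comparingByKey());
        return out;
    }

    // Hypothetical parallel variant: hash-partition by key, sort each
    // partition in its own thread, then recollect and re-sort.
    static List<Map.Entry<Integer, String>> parallelSort(List<Map.Entry<Integer, String>> rows, int partitions) {
        // 1. Partition by key hash so rows with equal keys share a partition
        //    and keep their input order inside it.
        List<List<Map.Entry<Integer, String>>> parts = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (Map.Entry<Integer, String> row : rows) {
            parts.get(Math.floorMod(row.getKey().hashCode(), partitions)).add(row);
        }

        // 2. Sort every partition concurrently.
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (List<Map.Entry<Integer, String>> part : parts) {
                futures.add(pool.submit(() -> part.sort(Map.Entry.comparingByKey())));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for the partition, surface any failure
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }

        // 3. Recollect. The final stable sort stands in for a k-way merge;
        //    it restores the global key order across partitions.
        List<Map.Entry<Integer, String>> merged = new ArrayList<>();
        for (List<Map.Entry<Integer, String>> part : parts) {
            merged.addAll(part);
        }
        merged.sort(Map.Entry.comparingByKey());
        return merged;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> rows = new ArrayList<>();
        rows.add(Map.entry(3, "a"));
        rows.add(Map.entry(1, "b"));
        rows.add(Map.entry(3, "c"));
        rows.add(Map.entry(2, "d"));
        System.out.println(parallelSort(rows, 2).equals(sequentialSort(rows)));
    }
}
```

In this sketch the two results match because rows with the same key never end up in different partitions, so ties keep their input order. If the rows were split round-robin instead, rows with equal keys could come back in a different relative order than in the sequential run, which is the kind of difference to check for when comparing parallel and non-parallel output.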
@mailforsaggy, did you check the link below?
Introduction to Talend Open Studio for Data Integration.