I have used the parallelization components in a Talend job, and I am confused about how they work internally.
In my job, three threads are created and passed to the tSortRow component. I am partitioning on one column (let's say A) and sorting on another column (let's say B). How does the tSortRow component do the sorting? Does it combine all the rows received from all threads and then sort them? And once the sorting is done, how is the sorted data passed back as three threads to the departitioner?
The Talend help site gives only an overview of the different parallelization components.
I want to go into a little more detail (the details I asked about above) so that I can use these components in my job to improve performance.
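A minimal Java sketch of the general partition/sort/merge pattern the question describes, assuming (not confirmed against Talend's generated code) that each parallel thread runs its own copy of tSortRow on its own partition only. Rows are hash-partitioned on column A, each of three threads sorts just its partition on column B, and the three sorted streams are then combined with a k-way merge so the final output is ordered on B. Without that merge step, simply collecting the three streams would not give a global order on B. The class `PartitionSortMerge`, the `Row` record and the helper methods are hypothetical names for illustration; the record needs Java 16+.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PartitionSortMerge {

    // Hypothetical row type: a is the partition key (column A), b is the sort key (column B).
    record Row(String a, int b) {}

    // Hash-partition on column A so all rows with the same A land in the same partition.
    static List<List<Row>> partition(List<Row> rows, int threads) {
        List<List<Row>> parts = new ArrayList<>();
        for (int i = 0; i < threads; i++) parts.add(new ArrayList<>());
        for (Row r : rows) {
            parts.get(Math.floorMod(r.a().hashCode(), threads)).add(r);
        }
        return parts;
    }

    // k-way merge of already-sorted partitions, producing one stream ordered on column B.
    static List<Row> mergeSorted(List<List<Row>> sortedParts) {
        // Each heap entry is {partition index, position within that partition}.
        Comparator<int[]> byB = Comparator.comparingInt(
                (int[] e) -> sortedParts.get(e[0]).get(e[1]).b());
        PriorityQueue<int[]> heap = new PriorityQueue<>(byB);
        for (int p = 0; p < sortedParts.size(); p++) {
            if (!sortedParts.get(p).isEmpty()) heap.add(new int[] {p, 0});
        }
        List<Row> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] e = heap.poll();
            List<Row> part = sortedParts.get(e[0]);
            out.add(part.get(e[1]));
            // Advance the cursor of the partition we just took a row from.
            if (e[1] + 1 < part.size()) heap.add(new int[] {e[0], e[1] + 1});
        }
        return out;
    }

    static List<Row> run(List<Row> rows, int threads) throws Exception {
        List<List<Row>> parts = partition(rows, threads);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<Row>>> futures = new ArrayList<>();
            for (List<Row> part : parts) {
                // Assumption: each thread sorts only its own partition, like one tSortRow per thread.
                futures.add(pool.submit(() -> {
                    List<Row> copy = new ArrayList<>(part);
                    copy.sort(Comparator.comparingInt(Row::b));
                    return copy;
                }));
            }
            List<List<Row>> sorted = new ArrayList<>();
            for (Future<List<Row>> f : futures) sorted.add(f.get());
            return mergeSorted(sorted);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Row> rows = List.of(
                new Row("x", 5), new Row("y", 2), new Row("z", 9),
                new Row("x", 1), new Row("y", 7), new Row("z", 3));
        for (Row r : run(rows, 3)) System.out.println(r);
    }
}
```

Running `main` prints the six sample rows in ascending order of b. The merge keeps one cursor per partition in a heap, so combining k sorted partitions costs O(n log k), which is why sorting partitions in parallel and merging at the end can be cheaper than a single sort over all rows.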
I believe this will shed some light on the matter: https://help.talend.com/display/KB/How+to+automatically+enable+parallelization+of+data+flows+for+bet...
As a rule of thumb, if you are concerned about efficiency here, look at the number of threads you will be using. This depends on the number of cores your machine has: on a 4-core machine, use at most 3 threads (I believe this is explained in the link).
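The rule of thumb above (leave one core free for everything else) fits in a few lines of plain Java. `suggestedThreads` is a hypothetical helper, not a Talend setting. `Runtime.getRuntime().availableProcessors()` reports the processors visible to the JVM, which may count hyper-threads, so treat the result as a starting point for tuning rather than a hard limit.

```java
public class ThreadCount {

    // Hypothetical helper: cores minus one, but never fewer than one thread.
    static int suggestedThreads(int cores) {
        return Math.max(1, cores - 1);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Cores: " + cores + ", suggested threads: " + suggestedThreads(cores));
    }
}
```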
OK, I just realised I answered a different question. I guess this *may* still be of some use, so I have left it.
However, I have to ask: why are you trying to make this more efficient by getting Java to sort your data? Do the sorting in the DB. Or is this just an experiment?