One Star

[resolved] parallelization In talend

Hi,
I tested the test case (reading from Excel, sorting, and writing into a file) mentioned in the
https://help.talend.com/search/all?query=How+to+automatically+enable+parallelization+of+data+flows+f... article, and the results are as follows.
My configuration is:
i3 processor (4 logical cores)
4GB RAM
Test details | Time taken, single thread (sec) | Time taken, 3 threads (sec) | Time taken, 2 threads (sec) | Rows
Reading from Excel and writing to DB | 25 | 29 | 26 | 190853
Reading from Excel and writing to file | 3 | 15 | 5 | 190853
Reading from Excel and writing to DB | 58 | 59 | 59 | 381706
Reading from Excel, sorting and writing into file | 16 | 81 | 21 | 381706
Reading from Excel, sorting and writing into file | 8 | 9 | 8 | 190853
 
It seems that enabling parallelization actually makes the Jobs slower. So what is the use of parallelization?
 
Please explain.
 
Thanks,
Pankaj
11 REPLIES
Moderator

Re: [resolved] parallelization In talend

Hi napsterpp,
Have you already subscribed to one of the Talend Platform solutions?
Could you please open a ticket on the Talend Support Portal so that we can provide remote assistance and check whether something is wrong with the settings in your current workflow? https://support.talend.com/otrs/customer.pl
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
One Star

Re: [resolved] parallelization In talend

Hi,
I am evaluating the trial version provided by your sales team as part of a POC.
Talend Platform for Data Management: 5.6.2
So, could you help me out?
Thanks,
Pankaj
One Star

Re: [resolved] parallelization In talend

My Job Design:



Community Manager

Re: [resolved] parallelization In talend

Hi Pankaj,
Sabrina will look into this, but let me also involve the Customer Success Management team on the topic (as we may need to look into your configuration in more depth).
Elisa
One Star

Re: [resolved] parallelization In talend

Hi esabot,
Thank you for your reply.
My system configuration is:
i3 processor (4 logical cores)
4GB RAM
I have also provided the Job design.
What more information do you need?
Regards,
Pankaj
One Star

Re: [resolved] parallelization In talend

Hi,
I was finally able to get the desired results. All I had to do was disable the departition row.
I skipped the steps below (I disabled them manually):


Departitioning
Recollecting
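
For readers unfamiliar with the terms, here is a rough plain-Java sketch of what partitioning and then departitioning/recollecting mean conceptually for a sort: rows are dealt out to several threads, each thread sorts its share, and a final single-threaded step merges the shares back into one ordered flow. This is an illustration only, not the code Talend generates; the class and method names are made up. In general, the merge step and the thread start-up are extra work that a single-threaded sort does not have, which can outweigh the gain on small inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: NOT Talend's generated code.
class PartitionSortSketch {

    static List<Integer> parallelSort(List<Integer> rows, int parts) {
        // "Partitioning": deal the rows round-robin into `parts` lists.
        List<List<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < parts; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int i = 0; i < rows.size(); i++) {
            partitions.get(i % parts).add(rows.get(i));
        }

        // Sort every partition on its own thread.
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<Integer> partition : partitions) {
                Runnable task = () -> Collections.sort(partition);
                pending.add(pool.submit(task));
            }
            for (Future<?> f : pending) {
                f.get(); // wait until this partition is sorted
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }

        // "Departitioning / recollecting": merge back into one ordered flow.
        return merge(partitions);
    }

    // Single-threaded k-way merge of already sorted partitions.
    static List<Integer> merge(List<List<Integer>> sortedParts) {
        // Queue entries: {value, partition index, position in that partition}.
        PriorityQueue<int[]> heads =
                new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int p = 0; p < sortedParts.size(); p++) {
            if (!sortedParts.get(p).isEmpty()) {
                heads.add(new int[] {sortedParts.get(p).get(0), p, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heads.isEmpty()) {
            int[] head = heads.poll();
            out.add(head[0]);
            List<Integer> part = sortedParts.get(head[1]);
            int next = head[2] + 1;
            if (next < part.size()) {
                heads.add(new int[] {part.get(next), head[1], next});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parallelSort(Arrays.asList(5, 3, 9, 1, 4, 8, 2), 3));
    }
}
```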

Thanks,
Pankaj
One Star

Re: [resolved] parallelization In talend

Hi, I am trying to solve a performance issue with sorting a huge file (50 million records) on an integer column plus an alpha column (the file has 6 columns). tSort takes around 30 minutes with sort on disk enabled.
I am using TOS 5.6.2 and evaluating this sort for my POC. Please advise. The Job design provided in this discussion uses a parallelization tab, which I don't have, so I can't try it out.
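
For reference, the ordering described here (integer column first, then the alpha column) corresponds to a two-level comparator in plain Java. This is only a sketch of the sort key, not what tSort generates; the `Row` class and its field names are invented for the example, and only the two sort columns are modeled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the sort key; names are illustrative only.
class CompositeSortSketch {

    // Only the two sort columns are modeled; the real file has 6 columns.
    static final class Row {
        final int id;
        final String name;

        Row(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Compare on the integer column first, then on the alpha column.
    static final Comparator<Row> BY_ID_THEN_NAME =
            Comparator.comparingInt((Row r) -> r.id).thenComparing(r -> r.name);

    // Returns a sorted copy and leaves the input list untouched.
    static List<Row> sorted(List<Row> rows) {
        List<Row> copy = new ArrayList<>(rows);
        copy.sort(BY_ID_THEN_NAME);
        return copy;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row(2, "b"));
        rows.add(new Row(1, "z"));
        rows.add(new Row(2, "a"));
        for (Row r : sorted(rows)) {
            System.out.println(r.id + ";" + r.name);
        }
    }
}
```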
Moderator

Re: [resolved] parallelization In talend

Hi RRaj
The parallel execution feature is not available in Talend Open Source. What is the rate of your Job?
Best regards
Sabrina
One Star

Re: [resolved] parallelization In talend

Thanks for the quick response. Could you explain what exactly you mean by rate?
Basically, my regular tSort with sort on disk enabled is expensive: it takes 30 minutes to process 26 million records, so I wanted to check whether there are any performance tuning measures within TOS that can be leveraged.
Moderator

Re: [resolved] parallelization In talend

Hi RRaj,
The rate (rows/s) is the row rate, meaning the number of rows processed per second.
Did you use only the tSort component in your Job? What is your target output: a DB or a flat file? Could you please show us your Job design?
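
As a quick worked example of the rate, it is simply the number of rows divided by the elapsed seconds. With the figures quoted earlier in this thread (about 26 million records in about 30 minutes), that comes to roughly 14,400 rows/s. The small helper below is a hypothetical illustration of the arithmetic and not part of Talend.

```java
// Hypothetical helper: only illustrates the rows/s arithmetic.
class RowRateSketch {

    // rows per second = rows processed / elapsed seconds
    static double rowsPerSecond(long rows, double seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("seconds must be positive");
        }
        return rows / seconds;
    }

    public static void main(String[] args) {
        // About 26 million records in about 30 minutes, as quoted in the thread.
        double rate = rowsPerSecond(26_000_000L, 30 * 60);
        System.out.println(Math.round(rate) + " rows/s");
    }
}
```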
Best regards
Sabrina
One Star

Re: [resolved] parallelization In talend

Hi,
I have created a Job to process a large CSV file (around 5 million records), and the file size will keep increasing daily. I have 3 subjobs that process this file to get the required information. These Jobs use various components, such as tFileInputDelimited, tSortRow, tMap, tUniqRow, and tFileOutputDelimited. Please see the images below for one of my 3 subjobs.
I am using the 'Sort on disk' option with a buffer size of 100,000 for tSort, and 'Store temp data' for the tMaps with a buffer size of 100,000, to handle memory issues.
My problem is that this Talend Job takes too much time to process the file. It currently takes around 20 minutes, and since the file grows daily, the processing time will also increase. I would like to learn how to deal with such scenarios when creating Talend Jobs.
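
To make the 'Sort on disk' plus buffer size idea concrete: a disk-based sort with a row buffer is commonly an external merge sort, where buffer-sized chunks are sorted in memory, written to temp files, and then merged. The sketch below shows that general technique in plain Java. It is an assumption that the component works along these lines, and this is not its actual implementation; the class name is invented, and whole lines are compared as strings where a real Job would compare parsed columns. A smaller buffer means more temp files and more disk I/O during the merge.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical external merge sort sketch: NOT the Talend implementation.
class DiskSortSketch {

    // Sorts the lines of `input` into `output`, keeping at most
    // `bufferSize` lines in memory at any time.
    static void sort(Path input, Path output, int bufferSize) {
        if (bufferSize < 1) {
            throw new IllegalArgumentException("bufferSize must be at least 1");
        }
        try {
            List<Path> runs = new ArrayList<>();
            try (BufferedReader in = Files.newBufferedReader(input)) {
                List<String> buffer = new ArrayList<>(bufferSize);
                String line;
                while ((line = in.readLine()) != null) {
                    buffer.add(line);
                    if (buffer.size() == bufferSize) {
                        runs.add(spill(buffer));
                    }
                }
                if (!buffer.isEmpty()) {
                    runs.add(spill(buffer));
                }
            }
            mergeRuns(runs, output);
            for (Path run : runs) {
                Files.deleteIfExists(run);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sorts one buffer in memory, writes it to a temp file, empties the buffer.
    private static Path spill(List<String> buffer) throws IOException {
        Collections.sort(buffer);
        Path run = Files.createTempFile("sort-run-", ".txt");
        Files.write(run, buffer);
        buffer.clear();
        return run;
    }

    // Current head line of one sorted run, plus the reader it came from.
    private static final class Head {
        final String line;
        final BufferedReader reader;

        Head(String line, BufferedReader reader) {
            this.line = line;
            this.reader = reader;
        }
    }

    // k-way merge: repeatedly emit the smallest head line across all runs.
    private static void mergeRuns(List<Path> runs, Path output) throws IOException {
        PriorityQueue<Head> heads =
                new PriorityQueue<>((a, b) -> a.line.compareTo(b.line));
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter out = Files.newBufferedWriter(output)) {
            for (Path run : runs) {
                BufferedReader reader = Files.newBufferedReader(run);
                readers.add(reader);
                String first = reader.readLine();
                if (first != null) {
                    heads.add(new Head(first, reader));
                }
            }
            while (!heads.isEmpty()) {
                Head head = heads.poll();
                out.write(head.line);
                out.newLine();
                String next = head.reader.readLine();
                if (next != null) {
                    heads.add(new Head(next, head.reader));
                }
            }
        } finally {
            for (BufferedReader reader : readers) {
                reader.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path in = Files.createTempFile("demo-in-", ".txt");
        Path out = Files.createTempFile("demo-out-", ".txt");
        Files.write(in, Arrays.asList("pear", "apple", "fig", "kiwi", "date"));
        sort(in, out, 2); // tiny buffer to force several temp files
        System.out.println(Files.readAllLines(out));
    }
}
```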
 
Thanks,
Shakeel