Seven Stars

Talend Job taking too much time for 30 million records

Hi,

I am executing a simple Talend job to import 30 million records from a CSV file into an MSSQL database. When we do this through a Java Spring Batch job, it takes around 4 hours. But when we run it through the Talend Enterprise version on the same server, it takes around 9 hours.

 

We have tried almost every option - Batch Size, parallelization, increasing the initial and maximum heap size - but in vain. The structure of the job is below.

 

tFileList --> tFileInputDelimited --> tMap --> tMSSQLOutput

 

The organization's biggest priority when purchasing the license was to decrease the loading time.

Could someone please suggest how we can improve the performance?

 

Best Regards,

Abhishek

1 ACCEPTED SOLUTION

Accepted Solutions
Fifteen Stars

Re: Talend Job taking too much time for 30 million records

I've not experienced poor performance like this when using MSSQL components, but I have seen posts from people who have had issues with the latest versions of Talend. Apparently adding "sendStringParametersAsUnicode=false" to the Advanced Settings of your component may help with performance. You will also need to tweak the Batch Size and Commit Every options to fine-tune.
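For context on where that parameter ends up: in Talend it goes in the component's Additional JDBC Parameters field, which the generated Java code appends to the connection URL. A minimal sketch of the resulting URL, assuming the Microsoft driver's URL format (the host, port, and database names here are placeholders, not from this thread):

```java
public class MssqlUrlExample {

    // sendStringParametersAsUnicode=false makes the driver send string
    // parameters as VARCHAR rather than NVARCHAR, which avoids implicit
    // type conversions on non-Unicode columns and can speed up inserts.
    public static String buildUrl(String host, int port, String db) {
        return "jdbc:sqlserver://" + host + ":" + port
                + ";databaseName=" + db
                + ";sendStringParametersAsUnicode=false";
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("dbhost", 1433, "StagingDB"));
    }
}
```

The same property name is also accepted by jTDS, so it should apply regardless of which JDBC provider the component is configured with.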

 

Parallel execution will also help, but be sure to set your parallel processes to a maximum of n-1 (where n is the number of cores on your system).

 

You may also want to use the tParallelize component so you can process more than one file at a time. To do that, you will need to add a mechanism that stops the same file from being read twice concurrently. That's not too difficult, but you will need to implement it yourself: a static routine that holds the file names and hands each one out only once will do. Combining this WITH the parallel execution on the db component will need to be balanced to make it optimal. For example, if you have 8 cores and try to read 4 files concurrently using the tParallelize component, you might set 3 of the db components to 2 parallel executions and leave 1 with just 1, reserving a core for controlling the process. That is just a rule of thumb, though. Have a play and see what you can get.
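The static routine mentioned above could look something like this, written as a Talend user routine. This is a sketch under my own assumptions - the class and method names are illustrative, not part of Talend - and each parallel subjob would call tryClaim() with the file it is about to read, skipping the file if the call returns false:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical routine to ensure each file is processed exactly once
// across concurrent subjobs started by tParallelize.
public class FileClaimRoutine {

    // Thread-safe set of file names that have already been claimed.
    private static final Set<String> claimed = ConcurrentHashMap.newKeySet();

    // Returns true only for the first caller to claim a given file name;
    // Set.add() is atomic here, so two subjobs racing on the same name
    // cannot both get true.
    public static boolean tryClaim(String fileName) {
        return claimed.add(fileName);
    }
}
```

In the job, a Run If trigger with a condition like `FileClaimRoutine.tryClaim(((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")))` would be one way to wire it in.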

 

If you implement all of the above (or a combination of it), you should see performance improvements.

Rilhia Solutions
7 REPLIES
Seven Stars

Re: Talend Job taking too much time for 30 million records

Thank you, Rhall.
Let me try all the options that you have suggested.


Best Regards,
Abhishek
Five Stars

Re: Talend Job taking too much time for 30 million records

Will it improve the performance if we set sendStringParametersAsUnicode to false?

Seven Stars

Re: Talend Job taking too much time for 30 million records

Hi Rhall,
Also, I am using "Open source jTDS" as the JDBC provider. Should I use "Microsoft" instead? I mean, does the driver choice have any impact on job performance?

Best Regards,
Abhishek


Fifteen Stars

Re: Talend Job taking too much time for 30 million records

Have you tried using the Microsoft one? I'm afraid I do not have an MS db to try this on at the moment.

Rilhia Solutions
Seven Stars

Re: Talend Job taking too much time for 30 million records

Hi Rhall,
I have implemented the points you suggested and was able to improve the performance: we can now migrate 90 lakh (9 million) records in 20 minutes.

Best Regards,
Abhishek
Fifteen Stars

Re: Talend Job taking too much time for 30 million records

That is some improvement. Glad it worked :-)

Rilhia Solutions