How to increase performance?

Five Stars

How to increase performance?

Performance loading from source db to target db is very slow. Our source db is on premise and target db on Azure.  Loading three tables and all of them total around 25,000 records and it takes 5 minutes to load. No transformation is done, only straight load.  My job uses tMSSqlConnection to connect and auto commit option is selected.  Is there any options to set to increase the speed?

Seven Stars

Re: How to increase performance?

Hi,

You can execute them in parallelize (If you are using enterprize edition) else use the Multi Thread option in Job tab.

Along with that set the Batch size to 10k and commit size as well to 10k. 

Also the number of processes you can put in tParallelize should be less than the number of cores you are using on your server.

 

Let me know if that helps.

 

 

Best Regards,

Abhishek

 

Community Manager

Re: How to increase performance?

@abhishek mentions some good things to check/change, but there is also a known issue with the MSSqlServer components (depending on the version you are using). There is an easy workaround though. All you need to do is to add....

 

 "sendStringParametersAsUnicode=false"

....to the "Additional JDBC Parameters" in the "Advanced" tab of your Insert or Update component. You should see an improvement from that. 

 

You should also keep in mind that your performance will suffer a bit by the fact that you data is moving from on-premise to the cloud.....although nowhere near to the point where you are getting only 250 rows a second, unless you have a VERY slow internet connection. 

Five Stars

Re: How to increase performance?

I don't see any option under Advanced settings in the tmssqlinput or tmssqloutput to set

"sendStringParametersAsUnicode=false"

.  Also,  where do you set the # of how many rows to commit?

 

Capture3.PNG

Community Manager

Re: How to increase performance?

Sorry, I didn't take in that you are using a connection component. The advanced settings will be there when you use a connection component.

Ten Stars

Re: How to increase performance?

Do you have a (unique/primary) key and have UPDATE or INSERT or the other way around?
Update/insert is the performance killer.

Suggestion, load from premises into tmp_tbl and run a t-sql script to load this tmp table into target tbl. Destroy the tmp table on success.
Moderator

Re: How to increase performance?

Hello,

Would you mind posting your current work flow screenshots on forum which will get more visibility and more help?

Best regards

Sabrina

--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
Five Stars

Re: How to increase performance?

I don't see any option in tMssqlconnection under advanced settings to add sendStringParametersAsUnicode=false.Capture3.PNG

Highlighted
Community Manager

Re: How to increase performance?

Look in the Basic Settings. For some reason the "Additional JDBC Parameters" are on the Advanced tab for the Input and Output components, but on the Basic tab for the Connection components.

Five Stars

Re: How to increase performance?

I've added this string and still my performance didn't improve.  I'm only loading around 6,000 records and no transformation at all.  Use Batch size is set to 10000

 

Capture.PNGCapture2.PNG

Community Manager

Re: How to increase performance?

You are showing two screenshots. The first is showing over 10000 rows a second. The second is much slower. Are you saying that 10000 rows/second is not fast enough? Also what is the difference between the two?

Five Stars

Re: How to increase performance?

First screen shot value in blue shows the speed it is reading from the source.  2nd screen with green value shows the speed it takes to load data to the database.  It is taking too long to upload the data to the destination.  Anything I can do to improve this?

Four Stars

Re: How to increase performance?

Hi,

 

I have a similar problem. I am trying to move data from one server to another server and i am using Talend Open Studio. there are 82M records in source(data is in views) and everything should be moved to destination(should be moved to Tables with no primary keys and indexes). I have the set the 'batch size' and 'commit every' to 20000. There are close to 80 columns in source and same goes with Destination. The rate of Transfer is just 1000 rows/s. is there any way that i can improve the transfer rate in the job. could someone please help me with this. i have set the JVM's for 4096M.

I have tried to use bulk components, but since the SQL server and studio both are in different machine, it is not very helpful for me.

Please find the attached screenshots. Any help to move the data faster would be helpful.

Ten Stars

Re: How to increase performance?

Did you calculate the block size you are trying to upload:
1000r/s * (  80cols : totalSum(bytesize per colum)) = x MB
Maybe your upload speed is the performance killer in this one, wifi (20Mbit) or utp cable Gbit?

My experience:
Create a backup of the source table (it will create an insert into record) and next upload this file first to your new server, and then process this file. You will get the maximum performance.

 

How busy is your target DB? if there're many read/writes by other processes and maybe a transaction replication configured, suggest search for a time/window where the amount of concurrent processes are minimum.


Other wise sit back and relax, its just a backup, doing it once you will never do it again.

Community Manager

Re: How to increase performance?

As an experiment can you remove your target connection component, configure the connection details in your output component, play around with the "Commit every" settings, set "Use Batch Size" and play around with that number. Add the String I told you about before in the Advanced settings of this component. Also, post your config if you cannot see an improvement.

 

@Dijke also makes a good point, are you sure that this limitation is not being caused by the server you are writing to OR the bandwidth between Talend and your target server?

Five Stars

Re: How to increase performance?

I'm using tmssqlconnection and only option it gives me is auto commit.  Even in tmssqloutput, I don't have an option commit every.  How do I set # of rows to commit?  Is auto commit, commits every row?

 

Thanks

Community Manager

Re: How to increase performance?

You need to remove the connection component and untick "Use an existing connection" in your output component. The connection options change after that. Look in both the Basic settings and the Advanced settings

Four Stars

Re: How to increase performance?

Di Dijke,

I didnt calculate the block size. I am connected to a wifi at my work and the DB isnt that busy, it is only me who is performing ETL operations to the DB.

 

Also, Can you tell me what do you mean by creating the backup of source table? i mean i am not catching your point.

 

Thanks

Rahul.

Four Stars

Re: How to increase performance?

@rhall_2_0,

I did Add the String you told about before in the Advanced settings of this component and it didnt quite improve the performance. just to add, i am using talend on machine which is only 7GB ram and when i checked the CPU utilization, it is almost full. just speculating if that is the issue. 

 

Thanks

Rahul.

Ten Stars

Re: How to increase performance?

I dont know your wifi speed, but lets say 80Mbit/s = 9 MegaBytes = 9 000 000 Bytes
How many columns? summarize the size... say between 100 - 300 Bytes per record (avg record) = MAX 100.000 Record /s when it are only inserts and without calculating overhead and other stuff.
But if you want to insert/update... it will do a lookup ... and that's the performance serial killer.

Search for your (input) database how to backup tables, or how to migrate. High chance it will generate insert into statements. Upload this file first to your target database if possible using a migration tool... search for it on google and you will find better instructions.
Six Stars

Re: How to increase performance?

Hi, was this eventually resolved or not?

 

I'm facing the same issue - reading from salesforce connection and inserting in a test table on Azure SQL Database takes too long. Compared to the same action on a PostgreSQL environment, it is 100 times slower. Azure SQL is a standard 100 DTU environment. Postgres is free RDS on AWS. The Azure processing takes a total of 529 seconds for 50K records

PastedGraphic-1.png

 

 

The postgres variant doing exactly the same, only takes 49seconds. Go figure ..

PastedGraphic-2.png

 

 

Any Azure / Talend experts that can recommend some improvements? 

 

The process start with SF connection and DB connection, then uses TDBrow to truncate the target table (using existing connection), calls the SF api (Bulk Query mode) and transfers the data for insertion into the target tabel. Target table uses Parallel 4, Data action = Insert, table is always empty to start with, no keys required, batch size = 10000

 

Post-job does the commit and close of connection. 

 

 
Four Stars

Re: How to increase performance?


@rp2018 wrote:

Performance loading from source db to target db is very slow. Our source db is on premise and target db on Azure.  Loading three tables and all of them total around 25,000 records and it takes 5 minutes to load. No transformation is done, only straight load.  My job uses tMSSqlConnection to connect and auto commit option is selected.  Is there any options to set to increase the speed?


Hello everyone,,

Do you have a (unique/primary) key and have UPDATE or INSERT or the other way around?
Update/insert is the performance killer.

Suggestion, load from premises into tmp_tbl and run a t-sql script to load this tmp table into target tbl. Destroy the tmp table on success..

Five Stars

Re: How to increase performance?

After we upgraded the Azure database to Premium (P1)  the performance drastically improved.  

2019 GARNER MAGIC QUADRANT FOR DATA INTEGRATION TOOL

Talend named a Leader.

Get your copy

OPEN STUDIO FOR DATA INTEGRATION

Kickstart your first data integration and ETL projects.

Download now

What’s New for Talend Summer ’19

Watch the recorded webinar!

Watch Now

Agile Data lakes & Analytics

Accelerate your data lake projects with an agile approach

Watch

Definitive Guide to Data Quality

Create systems and workflow to manage clean data ingestion and data transformation.

Download

Tutorial

Introduction to Talend Open Studio for Data Integration.

Watch