Loading from our source DB to the target DB is very slow. The source DB is on-premise and the target DB is on Azure. We are loading three tables totaling around 25,000 records, and it takes 5 minutes. No transformation is done, only a straight load. My job uses tMSSqlConnection to connect, and the auto commit option is selected. Are there any options I can set to increase the speed?
You can execute them in parallel with tParallelize (if you are using the Enterprise edition); otherwise, use the Multi thread execution option in the Job tab.
Along with that, set the Batch Size to 10,000 and the Commit size to 10,000 as well.
Also, the number of processes you put in tParallelize should be less than the number of cores on your server.
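To make the batch/commit advice concrete, here is a rough sketch of the pattern those settings control: buffer rows, send them in batches (one round trip per batch), and commit once per chunk instead of per row. SQL Server isn't scriptable in a forum post, so this uses Python's built-in sqlite3 purely as a stand-in; the table and column names are made up.

```python
import sqlite3

BATCH_SIZE = 10_000    # rows sent per round trip (Talend's "Use Batch Size")
COMMIT_EVERY = 10_000  # rows per transaction (Talend's "Commit every")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER, name TEXT)")

# Stand-in for the ~25,000 source rows from the original question.
rows = [(i, f"row-{i}") for i in range(25_000)]

buffer = []
for i, row in enumerate(rows, start=1):
    buffer.append(row)
    if len(buffer) == BATCH_SIZE:
        # One batched round trip instead of 10,000 single INSERTs.
        conn.executemany("INSERT INTO target VALUES (?, ?)", buffer)
        buffer.clear()
    if i % COMMIT_EVERY == 0:
        conn.commit()  # one log flush per chunk, not per row

# Flush and commit whatever is left over.
if buffer:
    conn.executemany("INSERT INTO target VALUES (?, ?)", buffer)
conn.commit()

loaded = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(loaded)  # 25000
```

The same idea applies regardless of driver: fewer round trips and fewer commits per row is where the speedup comes from.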
Let me know if that helps.
@abhishek mentions some good things to check/change, but there is also a known issue with the MSSqlServer components (depending on the version you are using). There is an easy workaround though. All you need to do is to add....
....to the "Additional JDBC Parameters" in the "Advanced" tab of your Insert or Update component. You should see an improvement from that.
You should also keep in mind that your performance will suffer a bit from the fact that your data is moving from on-premise to the cloud, although nowhere near to the point where you are getting only ~83 rows a second (25,000 rows in 5 minutes), unless you have a VERY slow internet connection.
I don't see any option under Advanced settings in tMSSqlInput or tMSSqlOutput to set that string. Also, where do you set the number of rows to commit?
Sorry, I missed that you are using a connection component. The setting will be there when you use a connection component.
Would you mind posting screenshots of your current workflow on the forum, where they will get more visibility and more help?
Look in the Basic Settings. For some reason the "Additional JDBC Parameters" are on the Advanced tab for the Input and Output components, but on the Basic tab for the Connection components.
I've added this string and my performance still didn't improve. I'm only loading around 6,000 records with no transformation at all. Use Batch Size is set to 10,000.
You are showing two screenshots. The first shows over 10,000 rows a second; the second is much slower. Are you saying that 10,000 rows/second is not fast enough? Also, what is the difference between the two?
The blue value in the first screenshot shows the speed at which it reads from the source. The green value in the second screenshot shows the speed at which it loads the data into the database. It is taking too long to upload the data to the destination. Is there anything I can do to improve this?
I have a similar problem. I am trying to move data from one server to another using Talend Open Studio. There are 82M records in the source (the data is in views), and everything should be moved to the destination (tables with no primary keys or indexes). I have set 'Batch Size' and 'Commit every' to 20,000. There are close to 80 columns in the source and the same in the destination. The transfer rate is just 1,000 rows/s. Is there any way I can improve the transfer rate of the job? Could someone please help me with this? I have set the JVM to 4096M.
I have tried the bulk components, but since the SQL Server and the Studio are on different machines, they are not very helpful for me.
Please find the attached screenshots. Any help to move the data faster would be helpful.
Did you calculate the size of the block you are trying to upload?

1,000 rows/s × (sum of the byte sizes of all 80 columns per row) = x MB/s
Maybe your upload speed is the performance killer here. Are you on Wi-Fi (20 Mbit) or a gigabit UTP cable?
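As a worked example of that calculation: the average of 20 bytes per column below is purely an assumption for illustration (measure your real row size instead), but it shows how quickly 1,000 rows/s can approach the capacity of a 20 Mbit link.

```python
# Back-of-the-envelope payload estimate.
rows_per_sec = 1_000       # observed transfer rate from the post above
cols = 80
avg_bytes_per_col = 20     # ASSUMPTION: illustrative only, not from the post

row_bytes = cols * avg_bytes_per_col          # 1,600 bytes per row
mb_per_sec = rows_per_sec * row_bytes / 1e6   # ~1.6 MB/s of raw payload

wifi_mbps = 20                                # a 20 Mbit Wi-Fi link
wifi_mb_per_sec = wifi_mbps / 8               # = 2.5 MB/s of capacity

print(mb_per_sec, wifi_mb_per_sec)
```

With protocol overhead on top of the raw payload, a rate like that can saturate a slow link, which is why checking the network before tuning the job is worthwhile.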
Create a backup of the source table (it will generate INSERT INTO statements), upload that file to your new server first, and then process the file there. You will get the maximum performance.
How busy is your target DB? If there are many reads/writes by other processes, and perhaps transaction replication configured, I suggest looking for a time window when the number of concurrent processes is at a minimum.
Otherwise, sit back and relax; it's just a backup, and once you've done it you will never have to do it again.
As an experiment, can you remove your target connection component, configure the connection details in your output component, play around with the "Commit every" setting, set "Use Batch Size", and play around with that number? Add the string I told you about before in the Advanced settings of this component. Also, post your config if you cannot see an improvement.
@Dijke also makes a good point: are you sure this limitation is not being caused by the server you are writing to, or by the bandwidth between Talend and your target server?
I'm using tMSSqlConnection, and the only option it gives me is auto commit. Even in tMSSqlOutput, I don't have a "Commit every" option. How do I set the number of rows to commit? Does auto commit commit every row?
You need to remove the connection component and untick "Use an existing connection" in your output component. The connection options change after that. Look in both the Basic settings and the Advanced settings
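On the "does auto commit commit every row?" question: with JDBC auto-commit enabled, each statement is committed as its own transaction, whereas "Commit every" groups rows into a few large transactions. A rough sketch of the difference in commit counts, assuming row-by-row inserts (i.e. batching off):

```python
import math

rows = 25_000

# Auto commit: every individual INSERT is its own transaction,
# so roughly one commit (and one log flush) per row.
autocommit_commits = rows

# "Commit every" 10,000: rows are grouped into a handful of transactions.
commit_every = 10_000
chunked_commits = math.ceil(rows / commit_every)

print(autocommit_commits, chunked_commits)  # 25000 3
```

Each commit forces a transaction-log flush on the server, so going from 25,000 commits to 3 is a real saving, independent of batching the inserts themselves.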
I didn't calculate the block size. I am connected to Wi-Fi at work, and the DB isn't that busy; I am the only one performing ETL operations against it.
Also, can you tell me what you mean by creating a backup of the source table? I'm not catching your point.
I did add the string you mentioned before in the Advanced settings of the component, and it didn't really improve the performance. Just to add: I am running Talend on a machine with only 7GB of RAM, and when I checked, CPU utilization was almost full. I'm just speculating whether that is the issue.