Bottom line: for the tMSSqlOutput component, the default "Commit every" and "Batch Size" values of 10,000 provide good performance.
I had reason to investigate performance given some issues I was experiencing. I created a simple job that read from one table and inserted the data into another table in the same database. The table contains 463,122 rows with a single varchar(50) clustered primary key, 50 columns, and an average record size of 1 KB. The database was idle, with no operations other than my testing. The job uses the "Open source JTDS" provider, with each component having its own built-in connection and no "Field Options".
I tried several combinations of sizes for "Commit" and "Batch" to see how they impact performance. This was not a scientific study where I ran hundreds of tests and reconciled the results; it was a seat-of-the-pants look at general behavior. The tests did not account for network or other potential throughput issues.
Based on the results, I concluded that "Batch" is the number of rows accumulated before they are sent to the database, and "Commit" is the number of rows sent to the database before a commit is issued. (For those coming from DataStage, these appear comparable to "Array size" and "Transaction size".)
For this testing configuration, I found that "Batch" should definitely be enabled (performance is horrendous without it) and should be kept the same size as "Commit". Setting "Batch" smaller than "Commit" significantly reduced performance compared to keeping the two equal. Setting "Batch" larger than "Commit" also reduced performance, though not as much.
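To make the batch-versus-commit distinction concrete, here is a minimal sketch of the loading pattern described above: rows are buffered and sent to the database in batches of `BATCH_SIZE`, and a commit is issued after every `COMMIT_SIZE` inserted rows. This is an illustration only, not the code Talend generates; it uses Python with an in-memory SQLite database as a stand-in for SQL Server, and the table name `target` and its columns are hypothetical.

```python
import sqlite3

BATCH_SIZE = 1000   # rows buffered before being sent to the database
COMMIT_SIZE = 1000  # rows inserted before a commit is issued

def load_rows(conn, rows, batch_size=BATCH_SIZE, commit_size=COMMIT_SIZE):
    """Insert rows using separate batch and commit intervals."""
    cur = conn.cursor()
    buffer = []
    rows_since_commit = 0
    for row in rows:
        buffer.append(row)
        if len(buffer) >= batch_size:
            # Send one batch to the database in a single round trip.
            cur.executemany(
                "INSERT INTO target (id, payload) VALUES (?, ?)", buffer)
            rows_since_commit += len(buffer)
            buffer.clear()
        if rows_since_commit >= commit_size:
            conn.commit()
            rows_since_commit = 0
    # Flush any partial batch, then commit the remainder.
    if buffer:
        cur.executemany(
            "INSERT INTO target (id, payload) VALUES (?, ?)", buffer)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id TEXT PRIMARY KEY, payload TEXT)")
load_rows(conn, ((str(i), "x" * 50) for i in range(5000)))
print(conn.execute("SELECT COUNT(*) FROM target").fetchone()[0])  # 5000
```

Note that when `batch_size == commit_size`, every batch is followed immediately by a commit, which matches the configuration that performed best in the tests above; a smaller batch size means more round trips per transaction.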
Thanks for sharing your best practices on the forum.
Commit after the insertion process completes; this strategy will save a huge amount of time.