Performance impact of tMSSqlOutput Commit Every and Batch Size

Highlighted
Seven Stars

Performance impact of tMSSqlOutput Commit Every and Batch Size

Bottom-line:  For the tMSSqlOutput component, the default "Commit every" and "Batch Size" of 10k provide good performance.

 

I had reason to investigate performance given some issues I was experiencing.  I created a simple job that read from one table and inserted the data into another table in the same database.  The table contains 463,122 rows with a single, varchar(50) primary clustered key, 50 columns, and an average record size of 1k bytes.   The database was idle with no operations other than my testing.  The job uses the "Open source JTDS" provider, with each component having its own connection (built-in) and no "Field Options".

 

I tried several combinations of sizes for "Commit" and "Batch" to see how they impact performance.  This was not a scientific study where I ran hundreds of tests and reconciled the results.  This was a seat-of-the-pants look at general behavior.  The tests did not account for network or other potential through-put issues.

 

Based upon the results, I concluded that "Batch" is the number of rows to be "batched up" before sending them to the database and "Commit" is the number of rows to be sent to the database before sending a commit.  (For those coming from Datastage, these appear comparable to "Array size" and "Transaction size".)

 

For this testing configuration, I found that "Batch" size should be definitely enabled (horrendous performance without it) and should be kept the same size as "Commit".  Setting "Batch" smaller than "Commit" significantly reduced performance compared to when the same as "Commit".  Setting "Batch" larger than "Commit" also reduced performance though not as much.

 

Screen Shot 2018-03-22 at 11.00.19 AM.png

Moderator

Re: Performance impact of tMSSqlOutput Commit Every and Batch Size

Hello,

Thanks for sharing your best practices on forum.

Best regards

Sabrina

 

--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
Seven Stars

Re: Performance impact of tMSSqlOutput Commit Every and Batch Size

use commit after the insertion process complete. this strategy will save huge amount of time.

Regards
Aashish
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.

2019 GARNER MAGIC QUADRANT FOR DATA INTEGRATION TOOL

Talend named a Leader.

Get your copy

OPEN STUDIO FOR DATA INTEGRATION

Kickstart your first data integration and ETL projects.

Download now

What’s New for Talend Summer ’19

Watch the recorded webinar!

Watch Now

Best Practices for Using Context Variables with Talend – Part 4

Pick up some tips and tricks with Context Variables

Blog

How Media Organizations Achieved Success with Data Integration

Learn how media organizations have achieved success with Data Integration

Read

Tutorial

Introduction to Talend Open Studio for Data Integration.

Watch