What are the Performance Tuning steps one should follow?

Hello All,
When dealing with huge data sets, what performance improvement steps do you follow when developing a Talend DI job?
I need your expert input.
3 REPLIES
Moderator

Re: What are the Performance Tuning steps one should follow?

Hi,
I think it depends on the job design.
1. If there is a t<DB>Output component in the job design, we usually suggest using the "Commit every" option to improve job performance (see the sketch after this list).
2. If your DB supports bulk loading, Talend also provides bulk components for that.
3. A SQL query is faster than a tMap join, because the query is executed on the DB engine (this applies when your main flow has a lookup on huge data). In addition, the ELT components are also a good option.
4. The "Use an existing connection" option (pointing to a t<DB>Connection) affects job performance, because batch mode is then no longer activated, and without batch mode performance drops significantly.
Hope it will be useful.
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
Seventeen Stars

Re: What are the Performance Tuning steps one should follow?

Hi Sabrina,
Your first hint is, in my opinion, totally wrong. Committing every single dataset is a performance pitfall!
I would also suggest using a cursor size greater than 1000 (or switching on the streaming mode for MySQL) in the input components.
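In plain JDBC the cursor size is simply the fetch size; here is a rough sketch (the connection URL, query and row handler are made up, not Talend code):

import java.sql.*;

public class FetchSizeSketch {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://host:3306/db", "user", "pass");
             Statement st = con.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            // Cursor size > 1000: fetch rows from the server in bigger chunks.
            st.setFetchSize(10000);
            // MySQL streaming mode instead: Integer.MIN_VALUE tells Connector/J
            // to stream rows one by one instead of buffering the whole result set.
            // st.setFetchSize(Integer.MIN_VALUE);
            try (ResultSet rs = st.executeQuery("SELECT id, name FROM big_table")) {
                while (rs.next()) {
                    process(rs.getLong(1), rs.getString(2)); // made-up row handler
                }
            }
        }
    }
    static void process(long id, String name) { /* do something with the row */ }
}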
The hint about joins in the database is also not a good rule of thumb. It depends on how large the main table and the lookup table are. I would always recommend doing the lookup within the tMap if the amount of lookup data is not larger than 5 million rows (per lookup). Joining huge tables with (even smaller) tables in the database can cause various problems, like overrunning the shared memory area, overloading the temporary tablespaces and so on.
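To make the trade-off concrete, here is a sketch of the two approaches with made-up orders/customers tables (this is not generated Talend code): a tMap lookup essentially loads the lookup data into memory and joins in the job's JVM, while the alternative pushes the join down to the database.

import java.sql.*;
import java.util.HashMap;
import java.util.Map;

public class LookupSketch {
    // Option A (like a tMap lookup): load the smaller lookup table into memory
    // and join row by row in the JVM. Works well while the lookup fits in memory
    // (the "5 million rows per lookup" rule of thumb above).
    static void joinInJob(Connection con) throws SQLException {
        Map<Long, String> lookup = new HashMap<>();
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT id, name FROM customers")) {
            while (rs.next()) lookup.put(rs.getLong(1), rs.getString(2));
        }
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT order_id, customer_id FROM orders")) {
            while (rs.next()) {
                String customerName = lookup.get(rs.getLong(2)); // in-memory join
                // ... write the joined row to the target
            }
        }
    }

    // Option B: push the join down to the database. Fine for moderate sizes, but
    // joining huge tables can strain shared memory and temporary tablespaces.
    static void joinInDatabase(Connection con) throws SQLException {
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(
                "SELECT o.order_id, c.name FROM orders o JOIN customers c ON c.id = o.customer_id")) {
            while (rs.next()) {
                // ... write the joined row to the target
            }
        }
    }
}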
ELT is a good option but has the same disadvantage as all other database-internal procedures: you never see any progress, and therefore it feels much slower than doing it in ETL mode (which is probably a bit slower), but in ETL mode you can estimate the duration. Mostly the goal is not to squeeze out the last percent of performance; mostly it is more important to be able to estimate how long the job will need.
For huge datasets it is always a good idea to separate the data into smaller groups (e.g. by a partitioning key) and process the groups in parallel jobs.
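A minimal sketch of that idea; the partition values and processPartition() are made up (in a real job this would typically be a tRunJob per partition key, or the parallel execution options):

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PartitionedRunSketch {
    public static void main(String[] args) throws Exception {
        // Made-up partition key values; in practice: date ranges, hash buckets, regions, ...
        List<Integer> partitions = List.of(0, 1, 2, 3, 4, 5, 6, 7);
        ExecutorService pool = Executors.newFixedThreadPool(4); // process 4 groups at a time
        for (Integer p : partitions) {
            pool.submit(() -> processPartition(p));
        }
        pool.shutdown();
        pool.awaitTermination(4, TimeUnit.HOURS);
    }

    // Stand-in for running one child job on its slice of the data,
    // e.g. WHERE MOD(id, 8) = partitionKey in the child job's input query.
    static void processPartition(int partitionKey) {
        System.out.println("processing partition " + partitionKey);
    }
}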
One Star

Re: What are the Performance Tuning steps one should follow?

This is really a great topic to discuss.
-- 1. If there is a t<DB>Output in the job design, we usually suggest using the "commit every" option to improve job performance
I'm not sure what exactly was meant by that.
By default it is 10000.
To improve performance in Oracle (on Amazon)
I set it to smaller values like 50 or 100, for 2 reasons:
1) The transaction was too big with 10000, and it affected performance badly; really, it was NOT able to work at all.
2) With 50, I can see progress with queries like SELECT COUNT(*), to see the speed.
One more thing: let us say that out of 10000 entries, one is bad.
With "commit every" set to 10000, the whole transaction will fail.
With 50, only the latest portion fails (the previous portions are already saved).
It depends; I preferred the second option.
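What that difference looks like in plain JDBC (again only a sketch; the table and columns are made up):

import java.sql.*;

public class SmallCommitSketch {
    static final int COMMIT_EVERY = 50; // small interval, as described above

    static void load(Connection con, Iterable<Object[]> rows) throws SQLException {
        con.setAutoCommit(false);
        try (PreparedStatement ps = con.prepareStatement(
                "INSERT INTO target_table (id, amount) VALUES (?, ?)")) {
            int count = 0;
            for (Object[] row : rows) {
                ps.setObject(1, row[0]);
                ps.setObject(2, row[1]);
                ps.addBatch();
                if (++count % COMMIT_EVERY == 0) {
                    try {
                        ps.executeBatch();
                        con.commit(); // everything committed so far stays in the table
                    } catch (SQLException bad) {
                        con.rollback(); // only the latest 50-row portion is lost
                        throw bad;
                    }
                }
            }
            ps.executeBatch();
            con.commit();
        }
        // While this runs, another session can watch progress with
        // a query like: SELECT COUNT(*) FROM target_table
    }
}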
PS: I would like to read some best practices on performance;
if there are articles, please give references to them.