I am working on optimizing the performance of a job. Could you please let me know:
Does adding more components to a job increase the execution time? (Server used: Git server.)
There are various factors that can degrade the performance of a job:
1. Data volume
2. Network Latency
3. Poor configuration
4. Poor coding (this is where too many components come in, though it is not always the cause)
Please share a screenshot of what you are trying to achieve; that will help us identify what's going on.
Thanks very much for the reply.
The job does quite a few tasks: extracting XML files from an Amazon S3 bucket and, after transformation, loading them into the tables (using 4 child jobs), then emptying the S3 buckets after the load.
Just to make sure I understand: you are extracting the data from Amazon S3 alone
(i.e., not joining data from anywhere else)?
Which database are you loading into? Are you performing any transformations at this stage?
Log the time at each stage to a file, along with the stage name, and find out which stage takes the longest; we can start tuning from there.
Screenshots of all the jobs would help us understand the situation better.
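The per-stage logging suggested above can be sketched as a small Java helper (Talend jobs compile to Java). This is a minimal illustration, not Talend's own API; the class name `StageTimer` and the stage names used in the example are hypothetical.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch: record elapsed time per stage, then append the
// results to a log file so the slowest stage stands out.
public class StageTimer {
    private final Map<String, Long> startTimes = new LinkedHashMap<>();
    private final Map<String, Long> elapsedMs = new LinkedHashMap<>();

    // Call at the start of a stage (e.g. "extract_s3").
    public void start(String stage) {
        startTimes.put(stage, System.currentTimeMillis());
    }

    // Call when the stage finishes; stores elapsed milliseconds.
    public void stop(String stage) {
        Long start = startTimes.get(stage);
        if (start != null) {
            elapsedMs.put(stage, System.currentTimeMillis() - start);
        }
    }

    // Append one line per stage to the given log file.
    public void writeLog(String path) throws IOException {
        try (PrintWriter out = new PrintWriter(new FileWriter(path, true))) {
            for (Map.Entry<String, Long> e : elapsedMs.entrySet()) {
                out.printf("%s took %d ms%n", e.getKey(), e.getValue());
            }
        }
    }

    public Map<String, Long> getElapsedMs() {
        return elapsedMs;
    }
}
```

In a Talend job you would call the equivalent of `start`/`stop` around each child job (extract, transform, load, cleanup) and inspect the log to decide where to tune first.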
Yes, extracting from S3 alone ==> extracting the fields from the XML ==> loading into Oracle DB tables.
Okay, I will try running it stage by stage as you suggested. Is there anything else we need to take care of, or avoid, to improve the execution time?
Sorry Gatha, at the moment I am not able to send screenshots due to security restrictions here.