Five Stars

Optimize job performance - too slow

I would like to optimize the attached job to run faster. It currently takes 11-12 minutes to run.

Advanced Settings: -Xms256M -Xmx60240M

 

 

I have additional transforms that start with the same input file and also take this long to run. I would really appreciate some guidance on how to improve this job, e.g. is there a way to avoid reading row by row and just pull the needed information into the job immediately?

 

6 REPLIES
TRF
Twelve Stars

Re: Optimize job performance - too slow

Hi,

Most of the elapsed time comes from the tFileInputDelimited components reading the BillOfMaterial.tab file, and you read it twice, for a total of 10 minutes.

You should try adding a tHashOutput after the 1st tFileInputDelimited and replacing the 2nd tFileInputDelimited with the corresponding tHashInput.

Depending on the actual size of the input file, this should be significantly faster.
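Outside of Talend, the tHashOutput/tHashInput idea amounts to parsing the file once and serving later passes from an in-memory cache. A minimal sketch in plain Java (the class and method names are illustrative, not part of Talend's API; it assumes a tab-delimited file that fits in memory):

```java
import java.util.ArrayList;
import java.util.List;

public class CachedRead {
    // Rows parsed on the first pass are kept in memory, like tHashOutput;
    // later passes read the cache, like tHashInput, instead of re-parsing.
    // Assumes every call refers to the same file.
    static List<String[]> cache = new ArrayList<>();

    static List<String[]> load(List<String> rawLines) {
        if (!cache.isEmpty()) return cache;       // second "read" is free
        for (String line : rawLines) {
            cache.add(line.split("\t", -1));      // .tab file: tab-delimited
        }
        return cache;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("a\t1", "b\t2");
        List<String[]> first = load(lines);   // parses the "file"
        List<String[]> second = load(lines);  // served from memory
        System.out.println(first == second);  // same cached object
    }
}
```

The trade-off is the same as with tHashOutput: you pay the parse cost once, but the whole dataset must fit in the JVM heap.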


TRF
Five Stars

Re: Optimize job performance - too slow

I've tried feeding the tFileInputDelimited into a tHashOutput and it doesn't work. It stops processing after 4M records.

 

I've also tried using tBufferOutput with a dynamic schema, but I don't know how to convert it back to a multi-field schema once I read it with tBufferInput.

 

 

TRF
Twelve Stars

Re: Optimize job performance - too slow

What if you run the job with just the tFileInputDelimited (to confirm whether the read time is due to the file size rather than to the operations in the following tMap)?

If it still runs for 5 minutes, you don't have a lot of choices; otherwise, share both of your tMaps (Map_Cols and tMap_9).


TRF
Five Stars

Re: Optimize job performance - too slow

Reading the file alone, without the operations, takes more than 5 minutes.

I'm not sure why just reading the file would take so long, even with the record count. It's just a text file. I expect this file to grow considerably, so I would like to find a way to optimize.

 

TRF
Twelve Stars

Re: Optimize job performance - too slow

Try a smaller file (for example, 1,000,000 records) with the same schema and compare the throughput (rows/sec).
If it's really better than the current throughput, consider using a set of smaller files instead of one big file if you can (though in my opinion you'll probably get the same result, as text files are read sequentially and size should not affect throughput).
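To compare runs on the sample and on the full file, the throughput measurement can be sketched in plain Java (the class and method names are illustrative; a real check would wrap a FileReader around the actual file instead of the in-memory string used here):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class Throughput {
    // Count rows so that elapsed time can be turned into rows/sec,
    // comparable between the 1,000,000-record sample and the full file.
    static long countRows(BufferedReader reader) throws IOException {
        long rows = 0;
        while (reader.readLine() != null) rows++;
        return rows;
    }

    public static void main(String[] args) throws IOException {
        String sample = "r1\nr2\nr3\n";           // stand-in for the file
        long start = System.nanoTime();
        long rows = countRows(new BufferedReader(new StringReader(sample)));
        double secs = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d rows, %.0f rows/sec%n", rows, rows / secs);
    }
}
```

If rows/sec is roughly the same for both file sizes, the read really is sequential and splitting the file won't help.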

TRF
Five Stars

Re: Optimize job performance - too slow

I split the input into smaller files with a maximum of 3M records each.

The design changed to process the files: tFileList > tFileInputDelimited > .. > .. > tBufferOutput

OnSubjobOk --> tBufferInput --> tHashInput

At the end of the job, tFileList is used again to do the lookups in each file, then the results are appended to the same output file.

 

This has reduced processing time to 7 minutes, but that is still too high.
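The splitting step described above (at most 3M records per file) can be sketched in plain Java; the class and method names are illustrative, and a real job would write each chunk to its own file for tFileList to iterate over:

```java
import java.util.ArrayList;
import java.util.List;

public class SplitFile {
    // Split the rows into chunks of at most maxRows each
    // (3,000,000 per file in the post above).
    static List<List<String>> split(List<String> rows, int maxRows) {
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += maxRows) {
            chunks.add(rows.subList(i, Math.min(i + maxRows, rows.size())));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("r1", "r2", "r3", "r4", "r5");
        List<List<String>> chunks = split(rows, 2);
        System.out.println(chunks.size()); // 3 chunks: [r1,r2] [r3,r4] [r5]
    }
}
```

Note that the chunks are views over the original list (`subList`), so splitting itself copies nothing; the cost is only in writing the chunk files out.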