"tOracleOutputBulk" and "tOracleOutputBulkExec"

One Star

"tOracleOutputBulk" and "tOracleOutputBulkExec"

I have been trying the two components "tOracleOutputBulk" and "tOracleOutputBulkExec" after a complex tMap, and I found that they give a dramatic performance increase over tOracleOutput. I wonder if there are any known (and accepted) drawbacks to these components? (Other, of course, than having to write a temporary file to disk while the job runs.)

Another question: is there an automatic way to delete the delimited file that is used? (I use a tFileExists and a tFileDelete, but I wonder whether it could be done inside tOracleOutputBulkExec.)
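If it helps, the same cleanup could probably be done with a short tJava instead. A minimal sketch, assuming a tJava component triggered OnSubjobOk after the load, and with context.bulkFilePath as a placeholder for wherever the path of the delimited file is kept:

// Hypothetical tJava snippet, run after the bulk load finishes (OnSubjobOk).
// context.bulkFilePath is a placeholder context variable holding the path
// of the delimited file written by tOracleOutputBulk.
java.io.File bulkFile = new java.io.File(context.bulkFilePath);
if (bulkFile.exists() && !bulkFile.delete()) {
    System.err.println("Could not delete bulk file: " + bulkFile.getAbsolutePath());
}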

Thanks in advance,
Diane2
One Star

Re: "tOracleOutputBulk" and "tOracleOutputBulkExec"

Hello,

Sorry for disturbing your topic, Diane.
Until now I had not understood the point of these components, and I had not tried to use them.
You mean they do the same thing as a tOracleOutput, but faster, just because they create some temp files? (How is that possible? Shouldn't it be slower because of the disk accesses?)

Thank you,
Best regards,

Nicolas
One Star

Re: "tOracleOutputBulk" and "tOracleOutputBulkExec"

From what I have seen, they really do perform better. In my case I load data from one database, map it with tMap, and then write it to another database.
I guess that writing a temp file to disk and loading it with SQL*Loader through these components saves a lot of time compared to a row-by-row insert, because it is much more efficient. (In fact I load from 8 tables in one DB and insert into 7 tables in another; the main table has "only" 400,000 rows, but compared to a simple tOracleOutput the execution is 10 to 20 times faster.)
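As far as I understand it, the bulk exec step boils down to handing the delimited file to SQL*Loader, conceptually something like the sketch below (a hand-written illustration, not the code Talend generates; the connection string, control file, and file names are placeholders):

import java.io.IOException;

// Illustration only: load a delimited file with SQL*Loader instead of sending
// one INSERT per row over JDBC. All names and credentials below are placeholders.
public class SqlLoaderSketch {
    public static void main(String[] args) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(
                "sqlldr",
                "userid=scott/tiger@ORCL",  // placeholder connection string
                "control=load_target.ctl",  // control file describing the delimited layout
                "log=load_target.log",
                "direct=true"               // direct-path loads avoid most per-row overhead
        );
        pb.inheritIO();                     // show the sqlldr output in the console
        int exitCode = pb.start().waitFor();
        System.out.println("sqlldr exit code: " + exitCode);
    }
}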

That's all I know, but I'm sure the people from Talend can explain it much better than I can.

Another remark: unlike what is stated in the component documentation, one cannot use "Insert or Update" with this component. Is this a known bug, or is there a technical reason for it?

Regards,
Diane
One Star

Re: "tOracleOutputBulk" and "tOracleOutputBulkExec"

OK, thank you Diane :)

Diane2 wrote:
Another remark: unlike what is stated in the component documentation, one cannot use "Insert or Update" with this component. Is this a known bug, or is there a technical reason for it?

I've just seen that. For the moment, that is the drawback of this component (I use that action most of the time).
Also, I think you only gain performance with big, or huge, tables... Imagine your job has to run daily; you won't necessarily have thousands of new rows to insert or update every day... In that case, couldn't it be slower? ;)
And why are we obliged to create a temp file? Can't the data be kept in RAM? (Obviously it would take a lot of memory, but it could be an option.)

Best regards,

Nicolas
One Star

Re: "tOracleOutputBulk" and "tOracleOutputBulkExec"

Diane2 wrote:
I have been trying the two components "tOracleOutputBulk" and "tOracleOutputBulkExec" after a complex tMap, and I found that they give a dramatic performance increase over tOracleOutput. I wonder if there are any known (and accepted) drawbacks to these components? (Other, of course, than having to write a temporary file to disk while the job runs.)

Another question: is there an automatic way to delete the delimited file that is used? (I use a tFileExists and a tFileDelete, but I wonder whether it could be done inside tOracleOutputBulkExec.)

Thanks in advance,
Diane2

Diane,

I am not seeing the performance increase you are. I repeatedly benchmark loading 11M rows of identical data using two scenarios:

A. tOracleInput=>tOracleOutputBulk=>tOracleBulkExec (bulk out and then bulk load)

versus

B. tOracleInput=>tOracleOutput (conventional load)


In my tests, scenario B above (conventional load) always beats scenario A (bulk) by about 25%.


What I am seeing is that tOracleOutputBulk is extremely slow to write the data to disk; when tOracleBulkExec finally executes it is lightning fast, but overall the bulk path is still about 25% slower than scenario B.

Dave
Employee

Re: "tOracleOutputBulk" and "tOracleOutputBulkExec"

David,

Instead of tOracleOutputBulk, you can also use tFileOutputDelimited to create the bulk file before calling tOracleBulkExec. Normally you'll get the same performance, but it would be interesting to compare.

Your conclusion is rather strange, because tOracleOutputBulk just writes to disk, and writing to disk is much faster than a database load.
Are you writing the bulk file to a different machine than the one running the job? Maybe network latency is the bottleneck in your case.

You should get better performance using a BULK LOAD instead of a CLASSICAL LOAD.

Best regards,
One Star

Re: "tOracleOutputBulk" and "tOracleOutputBulkExec"

cantoine wrote:
David,

Instead of tOracleOutputBulk, you can also use tFileOutputDelimited to create the bulk file before calling tOracleBulkExec. Normally you'll get the same performance, but it would be interesting to compare.

Your conclusion is rather strange, because tOracleOutputBulk just writes to disk, and writing to disk is much faster than a database load.
Are you writing the bulk file to a different machine than the one running the job? Maybe network latency is the bottleneck in your case.

You should get better performance using a BULK LOAD instead of a CLASSICAL LOAD.

Best regards,

I am outputting to disk first with tOracleOutputBulk before calling tOracleBulkExec. Also, the disk output is on the same computer that runs the job, so there is no network bottleneck to deal with. I am very surprised that for 11M rows (10 columns, 6 characters in each column, no nulls), it takes 22 minutes to write the file to disk and then less than 1 minute for tOracleBulkExec to load the 11M rows. The conventional load takes only 17 minutes. What this seems to tell me is that bulk would only be faster in a distributed environment, where flat files are prepared first and then sent over (FTP, etc.) to a job that loads them with tOracleBulkExec.

That said, I am running on a laptop, so maybe tOracleOutputBulk performs very poorly when run that way.

Dave
One Star

Re: "tOracleOutputBulk" and "tOracleOutputBulkExec"

dadumas wrote:
I am outputting to disk first with tOracleOutputBulk before calling tOracleBulkExec. Also, the disk output is on the same computer that runs the job, so there is no network bottleneck to deal with. I am very surprised that for 11M rows (10 columns, 6 characters in each column, no nulls), it takes 22 minutes to write the file to disk and then less than 1 minute for tOracleBulkExec to load the 11M rows. The conventional load takes only 17 minutes. What this seems to tell me is that bulk would only be faster in a distributed environment, where flat files are prepared first and then sent over (FTP, etc.) to a job that loads them with tOracleBulkExec.

That said, I am running on a laptop, so maybe tOracleOutputBulk performs very poorly when run that way.

Dave

I would guess the answer here is linked to the heap space allowed to the process. Using Sun's Java tools could help you understand the problem.
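A quick first check along those lines, assuming you drop it into a tJava component somewhere in the job, is to print how much heap the JVM actually has and compare it with the -Xmx value configured for the job:

// Print the heap available to the running job; compare it with the -Xmx
// JVM argument set for the job.
long maxHeapMb  = Runtime.getRuntime().maxMemory() / (1024 * 1024);
long usedHeapMb = (Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()) / (1024 * 1024);
System.out.println("Max heap: " + maxHeapMb + " MB, used: " + usedHeapMb + " MB");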
One Star

Re: "tOracleOutputBulk" and "tOracleOutputBulkExec"

Diane2 wrote:
I would guess the answer here is linked to the heap space allowed to the process. Using Sun's Java tools could help you understand the problem.

To be fair, I think I need to run this on a server, as logically I would expect writing out to a file to be very fast.
Dave
