Four Stars

ETL is slow between Oracle to Vertica

Hi Everyone,

     I have an Oracle to Vertica job built in the Talend open source version.  I just have a tOracleInput --> tVerticaOutput. On the tOracleInput, I have the "use Cursor" checked with cursor size at 500,000. On the tVerticaOutput, I have the "use batchmode" selected and set to commit every 500,000 rows. Loading two million rows from Oracle to Vertica in 3 to 4 minutes. I think that is really long. If I dump the data from oracle into a csv file and then upload that file to vertica, it takes seconds. Does anyone have experience with this? Am I missing some option I'm not aware of?  Or is there a difference between Talend Open Source vs Talend Open Studios that would make it faster?  Thanks in advance. 

  • Data Integration
1 ACCEPTED SOLUTION

Accepted Solutions
Seven Stars TRF
Seven Stars

Re: ETL is slow between Oracle to Vertica

Hi,

 

Well, to confirm which part is slow (Oracle or Vertica), you can replace the tVerticaOutput by a simple tFileOutputDelimited.

Now, regarding the tVerticaOutput component, you have to change also the batchsize.

But you may also change to use tVerticaOutputBulkExec which is (as all t<DB name>OutputBulkExec components) dedicated to mass transfer.

 

Hope this helps.


TRF
2 REPLIES
Seven Stars TRF
Seven Stars

Re: ETL is slow between Oracle to Vertica

Hi,

 

Well, to confirm which part is slow (Oracle or Vertica), you can replace the tVerticaOutput by a simple tFileOutputDelimited.

Now, regarding the tVerticaOutput component, you have to change also the batchsize.

But you may also change to use tVerticaOutputBulkExec which is (as all t<DB name>OutputBulkExec components) dedicated to mass transfer.

 

Hope this helps.


TRF
Four Stars

Re: ETL is slow between Oracle to Vertica

Hi, 

    Sorry for the delayed response. So I tried your solution and did notice an uptick in speed by a couple of seconds. So what I tested was to output oracle data to a delimited file and then use tVerticaOutputBulkexec to upload the file. This method was faster for large tables but for smaller tables, it didn't seem to matter. 

 

That said, i was trying to keep everything to be database to database instead of having to generate a file and then load that. 

 

In our vertica server, we installed a sqlplus add on that can pull data from oracle to vertica much faster than what talend can do. I was just curious if i was missing anything when i setup the job. Thanks