ETL is slow between Oracle to Vertica

Four Stars

ETL is slow between Oracle to Vertica

Hi Everyone,

     I have an Oracle to Vertica job built in the Talend open source version.  I just have a tOracleInput --> tVerticaOutput. On the tOracleInput, I have the "use Cursor" checked with cursor size at 500,000. On the tVerticaOutput, I have the "use batchmode" selected and set to commit every 500,000 rows. Loading two million rows from Oracle to Vertica in 3 to 4 minutes. I think that is really long. If I dump the data from oracle into a csv file and then upload that file to vertica, it takes seconds. Does anyone have experience with this? Am I missing some option I'm not aware of?  Or is there a difference between Talend Open Source vs Talend Open Studios that would make it faster?  Thanks in advance. 


Accepted Solutions
Fifteen Stars TRF
Fifteen Stars

Re: ETL is slow between Oracle to Vertica

Hi,

 

Well, to confirm which part is slow (Oracle or Vertica), you can replace the tVerticaOutput by a simple tFileOutputDelimited.

Now, regarding the tVerticaOutput component, you have to change also the batchsize.

But you may also change to use tVerticaOutputBulkExec which is (as all t<DB name>OutputBulkExec components) dedicated to mass transfer.

 

Hope this helps.


TRF

All Replies
Fifteen Stars TRF
Fifteen Stars

Re: ETL is slow between Oracle to Vertica

Hi,

 

Well, to confirm which part is slow (Oracle or Vertica), you can replace the tVerticaOutput by a simple tFileOutputDelimited.

Now, regarding the tVerticaOutput component, you have to change also the batchsize.

But you may also change to use tVerticaOutputBulkExec which is (as all t<DB name>OutputBulkExec components) dedicated to mass transfer.

 

Hope this helps.


TRF
Four Stars

Re: ETL is slow between Oracle to Vertica

Hi, 

    Sorry for the delayed response. So I tried your solution and did notice an uptick in speed by a couple of seconds. So what I tested was to output oracle data to a delimited file and then use tVerticaOutputBulkexec to upload the file. This method was faster for large tables but for smaller tables, it didn't seem to matter. 

 

That said, i was trying to keep everything to be database to database instead of having to generate a file and then load that. 

 

In our vertica server, we installed a sqlplus add on that can pull data from oracle to vertica much faster than what talend can do. I was just curious if i was missing anything when i setup the job. Thanks 

Five Stars

Re: ETL is slow between Oracle to Vertica

Hi All,

I agree on this solution. It works well. Normally my environment row processing around ~100K rows/sec.

Oracle --> tMap --> Vertica(outputBulkExecute)

cursor=500K             staging with outfile, rejfile, and excfile.

 

Currently I'm looking for it possible to do partially commit using outputBulkExectue of Vertica.

Please advise.

 

15TH OCTOBER, COUNTY HALL, LONDON

Join us at the Community Lounge.

Register Now

2019 GARNER MAGIC QUADRANT FOR DATA INTEGRATION TOOL

Talend named a Leader.

Get your copy

OPEN STUDIO FOR DATA INTEGRATION

Kickstart your first data integration and ETL projects.

Download now

What’s New for Talend Summer ’19

Watch the recorded webinar!

Watch Now

Best Practices for Using Context Variables with Talend – Part 4

Pick up some tips and tricks with Context Variables

Blog

How Media Organizations Achieved Success with Data Integration

Learn how media organizations have achieved success with Data Integration

Read

APIs for Dummies

View this on-demand webinar about APIs....

Watch Now