Performance issue reading from Oracle and writing to Netezza

One Star

Performance issue reading from Oracle and writing to Netezza

I have a simple job reading from an Oracle table and writing into a Netezza table, with NO transformations. I am using only these two components: tOracleInput and tNetezzaOutput. The table has about 600,000 rows. The job starts off fast (~500 rows/sec) but crawls to about 1 row/sec once it has read about 400,000 rows. I tried the job with fewer rows to read and write. Here are the stats:
100,000 rows = 3 minutes
200,000 rows = 7 minutes
300,000 rows = 11 minutes
all (600,000) rows = 2 hours and counting
tNetezzaOutput uses bulk load: the job writes to a pipe, and Netezza reads from the pipe (using an external table) and writes to the target. This part runs very fast once all the data is available.
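For anyone unfamiliar with this load path, it boils down to roughly the following sketch (the connection details, pipe path, and table names are made-up placeholders, and the exact external-table options may differ from what the component actually generates):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class NetezzaPipeLoadSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details -- replace with your own.
            Connection conn = DriverManager.getConnection(
                    "jdbc:netezza://nzhost:5480/MYDB", "user", "password");
            try {
                Statement stmt = conn.createStatement();
                // Stage the named pipe as an external table. While the INSERT
                // below runs, the job process writes delimited rows into the pipe.
                stmt.execute(
                        "CREATE EXTERNAL TABLE ext_stage SAMEAS target_table "
                        + "USING (DATAOBJECT('/tmp/talend_pipe') DELIMITER '|')");
                // One-shot load: all rows or nothing.
                stmt.execute("INSERT INTO target_table SELECT * FROM ext_stage");
                stmt.execute("DROP TABLE ext_stage");
            } finally {
                conn.close();
            }
        }
    }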
Version: TIS PE 3.0.2
Xms1024m
Xmx2048m
When I write to a delimited flat file instead of Netezza, the job dumps all 600,000 rows in less than 4 minutes. The flat file is about 100 MB. So I could probably rewrite the job to dump to a flat file first and use BulkExec to load into the target, but I would like to know if there is a way to improve performance as it is.
Any help is appreciated.
Thanks
One Star

Re: Performance issue reading from Oracle and writing to Netezza

Hi,
I don't know Netezza, but I could take a look at the generated code. Could you please post the options you use in your components?
Bye
Volker
One Star

Re: Performance issue reading from Oracle and writing to Netezza

Options used in the tOracleInput component:
Basic Settings: For the Query, I added a filter condition and a few CASE statements in the SELECT. All other settings are at their defaults (host, port, login, password, schema, table, etc.).
Advanced Settings: "Use cursor" is unchecked
Options used in the tNetezzaOutput component:
Apart from the standard values for host, port, database, table, login, and password, I have:
Action on Table: None
Action on Data: Insert
Please let me know if you need to look at the actual Java code.
Thanks a lot for looking into it.
-sundar
One Star

Re: Performance issue reading from Oracle and writing to Netezza

Hi sundar,
to me, it looks like a bug in tNetezzaOutput:
All inserts are prepared for batch execution, and there is a hard-coded commit frequency. But the code block that checks the commit interval only performs a commit; it never executes the batch. So tNetezzaOutput effectively ignores the commit interval and keeps all your data in memory.
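In plain JDBC terms, the generated main loop seems to behave like this simplified sketch (illustrative names, not the actual generated code):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class BatchCommitBugSketch {
        // Sketch of the suspected bug: rows are added to a JDBC batch, but the
        // commit-interval check never executes the batch, so it grows unbounded.
        static void copyRows(ResultSet rs, Connection conn, PreparedStatement pstmt)
                throws Exception {
            final int commitEvery = 10000; // the hard-coded commit frequency
            int rowCount = 0;
            while (rs.next()) {
                pstmt.setObject(1, rs.getObject(1)); // ...bind the other columns too
                pstmt.addBatch();                    // row is buffered in memory
                rowCount++;
                if (rowCount % commitEvery == 0) {
                    conn.commit(); // BUG: commits without a prior executeBatch(),
                                   // so nothing is flushed and memory keeps growing
                }
            }
            pstmt.executeBatch(); // the whole batch is executed only here,
            conn.commit();        // after all 600,000 rows have been buffered
        }
    }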
I tested this against Talend 3.0.3. Which version do you use?
Bye
Volker
One Star

Re: Performance issue reading from Oracle and writing to Netezza

I am using 3.0.1.
Yes, it could be a bug in tNetezzaOutput, but it may not have anything to do with the commit interval, because the component does not really use it. With smaller data sets, once the job completes I can see the SQL queries it ran in the Netezza log: a Netezza external table is created on a pipe (the job writes data into the pipe), and then it does an INSERT INTO the target SELECT FROM the external table. It's all rows or nothing.
If I write to a delimited flat file instead of Netezza, all 600,000 rows get written out. It's only when writing to Netezza that the job crawls to a stop.
I will open a ticket.
thanks-
One Star

Re: Performance issue reading from Oracle and writing to Netezza

Hi sundar,
as I wrote above, I do not have any knowledge of Netezza. But with batch execution, all data/SQL is stored until the batch is actually executed, and here that happens only at the end (after 600,000 rows). Computer systems in most cases slow down more than linearly as the load increases, especially when all the data has to be held in memory. Besides, committing every 10,000 rows for statements that apparently have not been executed yet looks strange. So I think someone from the Talend team (with more knowledge than I have) should check this code block.
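The main loop should instead flush the batch at every commit interval, roughly like this simplified sketch (again with illustrative names, not the actual generated code):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class BatchCommitFixSketch {
        // Sketch of the fixed pattern: execute and clear the batch at every
        // commit interval so memory use stays flat.
        static void copyRows(ResultSet rs, Connection conn, PreparedStatement pstmt)
                throws Exception {
            final int commitEvery = 10000;
            int rowCount = 0;
            while (rs.next()) {
                pstmt.setObject(1, rs.getObject(1)); // ...bind the other columns too
                pstmt.addBatch();
                rowCount++;
                if (rowCount % commitEvery == 0) {
                    pstmt.executeBatch(); // flush the buffered rows to the database
                    pstmt.clearBatch();   // and release them from memory
                    conn.commit();
                }
            }
            pstmt.executeBatch(); // flush the final partial batch
            conn.commit();
        }
    }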
If you have a test system you could try to patch Talend at the following position:
/org.talend.designer.components.localprovider/components/tNetezzaOutput/tNetezzaOutput_main.javajet
Try to use a commit interval, and add the following code to the above file (after making a backup) at line 690, before the commit statement:
    // flush and clear the pending batch before the commit
    pstmt_<%=cid %>.executeBatch();
    pstmt_<%=cid %>.clearBatch();

Bye
Volker
One Star

Re: Performance issue reading from Oracle and writing to Netezza

Hi Volker,
This seems to have done the trick! I left the commit interval the same (10,000 rows) and added the code you suggested to the tNetezzaOutput_main.javajet file, and now the job completes without any degradation in performance: it loaded all 600,000 rows in about 20 minutes.
Thanks a lot for your help! I will add this to the bug I created so Talend can make the change to the component.
-sundar
One Star

Re: Performance issue reading from Oracle and writing to Netezza

Hello,
I have a simple job. I am using Talend 4.0. A tFileExcelInput component reads the data from an Excel 2007 file (.xlsx), but the data is large: the file is about 20 MB and contains 10 sheets, one of which has around 250,000 rows.
When I limit the input to 1,000 rows the job works, but with 100,000 rows the job fails.

Thanks for your help,
naveen