Seven Stars

Taking too much time to load data

Hi,

 

I am loading data from a Netezza database to a Vertica database. I have 1.3 crore records, and it is taking almost 2 hours to load them.

I tried using bulk load, and I also enabled the cursor option in the NetezzaInput component.

I also tried to load the data using the Dynamic data type option in Edit schema, but I got the following error:

org.netezza.error.NzSQLException: An existing connection was forcibly closed by the remote host

 

Can anyone help me with this error, or suggest any other option to load the data faster?

 

Thanks,

Bharath.

20 REPLIES
Twelve Stars

Re: Taking too much time to load data

I'm assuming that cr corresponds to crore (10,000,000). Is that correct? If so, that is a rate of around 1800 rows a second. Are your source and target dbs local or remote (that will have an impact)? Have you got a comparison of rates that you expect and have seen from other software?  
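For reference, the ~1,800 rows/second figure follows from simple arithmetic (a minimal sketch; the 13,000,000 rows and 2-hour runtime are taken from the original post):

```java
// Back-of-the-envelope check of the load rate quoted above:
// 1.3 crore = 13,000,000 rows loaded in roughly 2 hours.
public class LoadRate {
    static long rowsPerSecond(long rows, long seconds) {
        return rows / seconds; // integer division is fine for an estimate
    }
    public static void main(String[] args) {
        System.out.println(rowsPerSecond(13_000_000L, 2L * 60 * 60)); // prints 1805
    }
}
```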

Rilhia Solutions
Seven Stars

Re: Taking too much time to load data

Hi rhall,

 

Yes, the data is 1.3 crore and the databases are in a remote location. The throughput is around 1,600 to 2,000 rows/second.

But that rate is not constant; it keeps fluctuating up and down.

 

Thanks,

Bharath.

Twelve Stars

Re: Taking too much time to load data

Hmmm, it sounds like you first need to work out whether your network bandwidth is sufficient to deal with the load. 1,800 records a second may not be too bad if the dbs are located remotely, especially if they are in different remote locations. Can you test the bandwidth during a run?

Rilhia Solutions
Seven Stars

Re: Taking too much time to load data

Hi,

 

Thanks for your quick response.

By bandwidth, do you mean Ethernet speed? If so, the speed is 1.5 MB/s.

If not, could you please let me know where I need to check it.

 

Thanks,

Bharath.

Twelve Stars

Re: Taking too much time to load data

I'm afraid I cannot answer your questions about your network; I was merely suggesting you consider it, unless you can prove that you can get faster performance from other tools on the same infrastructure. Problems like this are very difficult to isolate, and people generally don't consider the most obvious reasons why performance may not be what is expected.

 

If 1.5 MB/s is 1.5 megabytes per second, your network speed is not very fast. If it is megabits per second, it is even worse. You also have to consider your source and target dbs' upload and download bandwidths, AND whether anyone else is using your network at the same time.
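To make the bandwidth concern concrete, here is a rough transfer-time estimate. The 200-bytes-per-row figure is an assumption purely for illustration; measure your real average row size before drawing conclusions:

```java
// Rough transfer-time estimate for moving the full data set over the link.
// ASSUMPTION: 200 bytes per row is a made-up illustrative figure.
public class TransferEstimate {
    static long transferSeconds(long rows, long bytesPerRow, double mbPerSecond) {
        double totalBytes = (double) rows * bytesPerRow;
        return Math.round(totalBytes / (mbPerSecond * 1_000_000.0));
    }
    public static void main(String[] args) {
        long oneWay = transferSeconds(13_000_000L, 200L, 1.5);
        // If the job pulls rows down to the Studio machine and pushes them
        // back up to Vertica, the data crosses the link twice.
        System.out.println("one-way: " + oneWay + "s, round trip: " + 2 * oneWay + "s");
    }
}
```

Even under these optimistic assumptions, a 1.5 MB/s link puts a hard floor of roughly half an hour each way on the job, before any db-side work.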

 

As an example of how location can affect performance: I was on a project where I was running Talend on an internal network, and the performance was reasonable (I cannot remember the number of rows per second). I then went off site and ran the same code in a different country. The performance went down by a factor of around 20. The data had to be retrieved from my source db over the internet to my laptop, processed, and then sent back to the target db over the internet. The latency this caused was massive. You need to consider this before assuming there is anything wrong with the job.

Rilhia Solutions
Seven Stars

Re: Taking too much time to load data

Thanks for your quick response.

The speed is 1.5 megabytes/sec, so I will treat it as a network bandwidth problem.

Is there any other option that can load the data faster (in less than 2 hours)?

 

Thanks,

Bharath.

Twelve Stars

Re: Taking too much time to load data

I wasn't saying that it is definitely a network issue; I was saying that you need to diagnose this like a doctor might diagnose a patient. The patient might have a pain in their hip that is caused by a defect in their foot. You need to know what the limitations are and what you should realistically be able to expect.

Rilhia Solutions
Seven Stars

Re: Taking too much time to load data

Ya, I will definitely try to diagnose this issue. Thanks for taking the time to help me out.

As I mentioned earlier, I am getting the following exception:

org.netezza.error.NzSQLException: An existing connection was forcibly closed by the remote host

Do you have any idea about this?

 

Thanks,

Bharath.

Twelve Stars

Re: Taking too much time to load data

I'm afraid I cannot answer that question without trying it out, and I do not have a suitable environment.

Rilhia Solutions
Seven Stars

Re: Taking too much time to load data

OK.

Thank you for spending your valuable time! Please let me know if you find anything positive.

Thanks,

Bharath.

Twelve Stars TRF

Re: Taking too much time to load data

First, regarding the cause of the NzSQLException: it may be a timeout.
I've encountered a similar issue with Salesforce in the past. The operations triggered by the tSalesforceOutputBulkExec were so slow that at some point the component failed with a timeout exception.
As you are facing a response-time issue, this may be a trail worth following. What happens if you reduce the number of records you try to get from Netezza?

Second, even if the case is not solved yet, my opinion is that the explanations given are so detailed and meaningful that you should give him some kudos for his time.

TRF
Seven Stars

Re: Taking too much time to load data

Hi,

 

Thanks for your response.

When I tried to load the data with a reduced number of records, it worked fine, but when I don't reduce them I get this exception.

 

Thanks,

Bharath.

Twelve Stars TRF

Re: Taking too much time to load data

Hi,

Thanks for the kudo, even if I was talking about @rhall_2_0's effort!

So, it seems the trail I suggested is not so bad ;)

I don't know the specifics of the tELTNetezza components, nor how to optimize the query or work with subsets and iterate to get the expected result.


TRF
Seven Stars

Re: Taking too much time to load data

Hi,

 

I am not using tELTNetezza; I am using the ETL components only, and I am using the Dynamic data type to load the data.

So could you please help me with this?

 

Thanks,

Bharath.

Ten Stars

Re: Taking too much time to load data

If you're running into a connection timeout, you can try to break up your data set into smaller files and load each separately.
Twelve Stars TRF

Re: Taking too much time to load data

OK, Netezza components were not visible in my palette, now they are.

As I suggested in my previous answer, the first thing to do is to optimize the query.

Then you may also try to retrieve a subset of rows instead of all the rows at the same time.

Using "LIMIT" you can decide how many rows to select at a time, and using "OFFSET" you decide how many rows to skip.

So, you can select rows 1-1000000, then 1000001-2000000, and so on.

Just add a loop around the "select" operation, and change the query each time to set the desired values for the OFFSET/LIMIT parameters.
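To illustrate the idea, here is a minimal Java sketch that builds the chunked queries. The table name `my_table` and the `ORDER BY id` column are hypothetical placeholders; note that a stable ordering is needed for OFFSET paging to return repeatable slices:

```java
import java.util.ArrayList;
import java.util.List;

// Build one SELECT per chunk, stepping the OFFSET by the chunk size.
public class ChunkedQueries {
    static List<String> buildQueries(String table, long totalRows, long chunkSize) {
        List<String> queries = new ArrayList<>();
        for (long offset = 0; offset < totalRows; offset += chunkSize) {
            queries.add("SELECT * FROM " + table
                    + " ORDER BY id"        // stable order so chunks don't overlap
                    + " LIMIT " + chunkSize
                    + " OFFSET " + offset);
        }
        return queries;
    }
    public static void main(String[] args) {
        for (String q : buildQueries("my_table", 3_000_000L, 1_000_000L)) {
            System.out.println(q);
        }
    }
}
```

In a Talend job, the equivalent loop would drive the query string of the input component via context/global variables rather than a Java list.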


TRF
Seven Stars

Re: Taking too much time to load data

Thanks for your reply.

But there is nothing to optimize in the query; it is just a direct load from Netezza to Vertica.

There are many other tables which need to be automated, so I think LIMIT will not work in this scenario.

Please correct me if I am wrong.

 

 

Thanks,

Bharath.

Twelve Stars TRF

Re: Taking too much time to load data

OK, nothing to optimize; that's possible.

However, why do you think the proposed scenario will not work?

In fact, I think it should. You just have to set the LIMIT and OFFSET values to minimize the number of queries needed to get all the expected rows.

The open question is how these parameters are processed by the db engine.

If the engine does not need to scan all the rows just to apply the offset, the response time should not be too bad.


TRF
Seven Stars

Re: Taking too much time to load data

OK.

If we use offsets, can we still automate the jobs, or do we need to do it manually? We don't know exactly what the row count of each table is, so what would the solution be in this case? Please correct me if I am wrong.

 

Thanks,

Bharath.

Twelve Stars TRF

Re: Taking too much time to load data

You don't need to know how many rows are in the table. Just decide/test how many you are able to get at a time, for example 1,000,000. Then, with the appropriate logic, iterate over the select tasks (tNetezzaInput, tFileOutputDelimited), with a piece of Java code to set the offset and limit values using global variables.
I'm sure it works, but I can't be sure how the db will react (I don't work with Netezza); I just know the engine accepts OFFSET and LIMIT.
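The "iterate until exhausted" logic above can be sketched in plain Java. This is a minimal sketch: `fetchPage` here reads from an in-memory list as a hypothetical stand-in for the real tNetezzaInput query, and the page size is arbitrary:

```java
import java.util.ArrayList;
import java.util.List;

// Fetch fixed-size pages and stop when a page comes back short,
// so the total row count never needs to be known in advance.
public class PagedLoad {
    static List<Integer> makeRows(int n) {       // test data generator
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) rows.add(i);
        return rows;
    }
    static List<Integer> fetchPage(List<Integer> source, int offset, int limit) {
        int from = Math.min(offset, source.size());
        int to = Math.min(offset + limit, source.size());
        return source.subList(from, to);         // stand-in for the OFFSET/LIMIT query
    }
    static int loadAll(List<Integer> source, int limit) {
        int offset = 0, loaded = 0;
        while (true) {
            List<Integer> page = fetchPage(source, offset, limit);
            loaded += page.size();               // real job: write this page to Vertica
            if (page.size() < limit) break;      // short page => no more rows
            offset += limit;
        }
        return loaded;
    }
    public static void main(String[] args) {
        System.out.println(loadAll(makeRows(2_500), 1_000)); // prints 2500
    }
}
```

The termination condition (a page shorter than the limit) is what removes the need to know the row count up front, which was the concern in the previous post.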

TRF