[resolved] Errors when loading large data sets to/from Microsoft Azure

One Star

[resolved] Errors when loading large data sets to/from Microsoft Azure

Hello,
When trying to use a Microsoft Azure connection in my Talend package (Data Integration v5.6) I get errors like the following:
"I/O Error: Connection reset
Exception in component tMSSqlOutput_1
java.sql.SQLException: Invalid state, the Connection object is closed.
at net.sourceforge.jtds.jdbc.ConnectionJDBC2.checkOpen(ConnectionJDBC2.java:1699)
at net.sourceforge.jtds.jdbc.ConnectionJDBC2.commit(ConnectionJDBC2.java:2085)
at dbservicestest.azure_query_test_0_1.Azure_Query_Test.tMSSqlInput_2Process(Azure_Query_Test.java:1397)
at dbservicestest.azure_query_test_0_1.Azure_Query_Test.tMSSqlConnection_1Process(Azure_Query_Test.java:529)
at dbservicestest.azure_query_test_0_1.Azure_Query_Test.runJobInTOS(Azure_Query_Test.java:1755)
at dbservicestest.azure_query_test_0_1.Azure_Query_Test.main(Azure_Query_Test.java:1540)"
For example, in a job where I send rows from a local SQL database (via tMSSqlInput) to a tMSSqlOutput component connected to Azure, it is successful at loading a small number of rows (1-97), but when loading more rows (more than 98 or so) it fails with an I/O Error. 
I will need to send tens of thousands of rows, so batching things in groups of 50 is not practical.
--Porter

Accepted Solutions
One Star

Re: [resolved] Errors when loading large data sets to/from Microsoft Azure

I figured out how to get this work.
1. Add some keys to the registry, as described in this article: https://msdn.microsoft.com/en-us/library/hh290696(v=sql.110).aspx
2. In the Advanced settings of the component, change the batch size to 10
This runs very slow, and I'm not sure it's the cleanest solution, but it's the only way I was able to get it to work.

All Replies
Moderator

Re: [resolved] Errors when loading large data sets to/from Microsoft Azure

Hi,
Could you please show us the screenshots of job setting and sql query?
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
One Star

Re: [resolved] Errors when loading large data sets to/from Microsoft Azure

Here are some screenshots:

The error is in the "tMSSqlOutput_1" component.
One Star

Re: [resolved] Errors when loading large data sets to/from Microsoft Azure

Looks like I'm having trouble uploading screenshots... let me try again.

EDIT: Still unable to post screenshots. I'm using "Drop files here to attach them" below, which seems to work until I hit submit, at which point it scrubs the images from the post before submitting it.
One Star

Re: [resolved] Errors when loading large data sets to/from Microsoft Azure

I figured out how to get this work.
1. Add some keys to the registry, as described in this article: https://msdn.microsoft.com/en-us/library/hh290696(v=sql.110).aspx
2. In the Advanced settings of the component, change the batch size to 10
This runs very slow, and I'm not sure it's the cleanest solution, but it's the only way I was able to get it to work.
Moderator

Re: [resolved] Errors when loading large data sets to/from Microsoft Azure

Hi,
Thanks for your solution. Have you tried to change the batch size to 100 to see if it is better?
What's the current rate for your work flow?
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
One Star

Re: [resolved] Errors when loading large data sets to/from Microsoft Azure

I had initially tried lowering the batch size to 100 and even to 50, but it was still dropping the connection at times (I'm guessing this is because of a slow or unstable connection to Azure). Only by reducing it to 10 was I able to get it to work--though I may be able to experiment more to find the "sweet spot".
The data flow rate seems to average between 200-400 rows/second. I don't necessarily blame Talend for this--it could be that our connection to Azure is just slow--but it seems that sending things in bigger batches might help reduce overhead, if I could get it to not drop the connection. C'est la vie!
One Star

Re: [resolved] Errors when loading large data sets to/from Microsoft Azure

Hi,
Did you ever get a better solution to this?  I am having the same or very similar issue trying to write to Azure.  I get a "I/O Error: Connection reset by peer: socket write error" whenever the batch size is about 10.  I started at 10,000 and kept trying different numbers until it would work.  The problem is this makes the job take 5 hours to complete for 1 of 12 tables.  That isn't going to work for me.
Maybe I should update my MS SQL java driver but I am not really sure how.
One Star

Re: [resolved] Errors when loading large data sets to/from Microsoft Azure

Hi,
Did you ever get a better solution to this?  I am having the same or very similar issue trying to write to Azure.  I get a "I/O Error: Connection reset by peer: socket write error" whenever the batch size is about 10.  I started at 10,000 and kept trying different numbers until it would work.  The problem is this makes the job take 5 hours to complete for 1 of 12 tables.  That isn't going to work for me.
Maybe I should update my MS SQL java driver but I am not really sure how.

It looks like our current approach is to uncheck "Die on error", so that the process will silently continue on its way if there is an error writing to the database. We have our batch size set to 10.
Probably a better approach would be to make a custom component (or just ditch Talend and write a custom app) that handles the IO exception and resumes sending data using some sort of exponential backoff. But we didn't have the time to do that, and there were other priorities, so we left it like this for now.
One Star

Re: [resolved] Errors when loading large data sets to/from Microsoft Azure

I actually found a solution today.  It seems that Talend's MS components are using an open source jdbc driver that might be out of date. I am not sure if the problem I had is with all MS sql but at least with Azure data sources. To make this work for me I downloaded a jdbc driver directly from Microsoft and set up a Generic JDBC source in Talend.  The JDBC connection string should be like this.
jdbc:sqlserver://<serveraddress>;databaseName=<dbname>;user=<username>;password=<password>
I was able to leave the batch size the same and complete my task in a few minutes instead of the 5 hours it took with built in components.
One Star

Re: [resolved] Errors when loading large data sets to/from Microsoft Azure

Hi robbz23,
I am facing the same issue for which your comment is to get it solved.
Can you please guide me about the steps to be followed in order to resolve it?
One Star

Re: [resolved] Errors when loading large data sets to/from Microsoft Azure

Dear all,
I've noticed the same problem. With the default Talend driver I got 300 rows / second. With the most recent jdbc driver downloade from MS, I get 900 records / second.
Moderator

Re: [resolved] Errors when loading large data sets to/from Microsoft Azure

Hello, 
Could you please indicate what's the build version you are using? So far, the lastest released version is 6.3.
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.