Salesforce Bulk API java.net.SocketException: Connection reset

Hi all,
I am using Talend 5.6.0_20141024_1545
My organization is trialing Talend for ELT, and we are testing with Salesforce.
We have a large LEADS table: ~3 million rows with 400+ columns.
This means that the bulk component is necessary to pull the data.
I have tried many ways to get this data, including splitting the SOQL query using CreatedDate as a filter.
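Splitting the extract by CreatedDate, as described above, amounts to issuing one bounded SOQL query per date window. The Java sketch below only builds the query strings; the class and method names, the field list, and the window size are illustrative assumptions, not what the poster's Talend job actually generated.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class SoqlDateSplitter {

    // Hypothetical helper: builds one SOQL query per [start, end) window on CreatedDate.
    // Windows are half-open so that no record is fetched twice at a boundary.
    static List<String> splitByCreatedDate(String fields, String sObject,
                                           LocalDate from, LocalDate to, int daysPerChunk) {
        if (daysPerChunk <= 0) {
            throw new IllegalArgumentException("daysPerChunk must be positive");
        }
        List<String> queries = new ArrayList<>();
        LocalDate start = from;
        while (start.isBefore(to)) {
            LocalDate end = start.plusDays(daysPerChunk);
            if (end.isAfter(to)) {
                end = to; // clip the last window to the overall upper bound
            }
            // LocalDate.toString() yields ISO yyyy-MM-dd, so this forms SOQL dateTime literals.
            queries.add(String.format(
                "SELECT %s FROM %s WHERE CreatedDate >= %sT00:00:00Z AND CreatedDate < %sT00:00:00Z",
                fields, sObject, start, end));
            start = end;
        }
        return queries;
    }

    public static void main(String[] args) {
        // Example: Q1 2014 in roughly monthly windows (assumed range and field list).
        for (String q : splitByCreatedDate("Id, Name", "Lead",
                LocalDate.of(2014, 1, 1), LocalDate.of(2014, 4, 1), 31)) {
            System.out.println(q);
        }
    }
}
```

Each resulting query could then be run as a separate bulk query (for example, by iterating over them in the job), so a failure only costs one window rather than the whole extract.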
I get this error:

Exception in component tSalesforceInput_1
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at sun.security.ssl.InputRecord.readFully(Unknown Source)
at sun.security.ssl.InputRecord.read(Unknown Source)
at sun.security.ssl.SSLSocketImpl.readRecord(Unknown Source)
at sun.security.ssl.SSLSocketImpl.readDataRecord(Unknown Source)
at sun.security.ssl.AppInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read1(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at sun.net.www.http.ChunkedInputStream.fastRead(Unknown Source)
at sun.net.www.http.ChunkedInputStream.read(Unknown Source)
at java.io.FilterInputStream.read(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(Unknown Source)
at java.util.zip.InflaterInputStream.fill(Unknown Source)
at java.util.zip.InflaterInputStream.read(Unknown Source)
disconnected
at java.util.zip.GZIPInputStream.read(Unknown Source)
at sun.nio.cs.StreamDecoder.readBytes(Unknown Source)
at sun.nio.cs.StreamDecoder.implRead(Unknown Source)
at sun.nio.cs.StreamDecoder.read(Unknown Source)
at java.io.InputStreamReader.read(Unknown Source)
at java.io.BufferedReader.fill(Unknown Source)
at java.io.BufferedReader.read1(Unknown Source)
at java.io.BufferedReader.read(Unknown Source)
at java.io.BufferedReader.fill(Unknown Source)
at java.io.BufferedReader.read1(Unknown Source)
at java.io.BufferedReader.read(Unknown Source)
at com.talend.csv.CSVReader.fill(CSVReader.java:444)
at com.talend.csv.CSVReader.readNext(CSVReader.java:189)
at org.talend.salesforceBulk.SalesforceBulkAPI.getQueryResult(SalesforceBulkAPI.java:370)
at dw2.bulk_lead_pt1_0_1.bulk_lead_pt1.tSalesforceInput_1Process(bulk_lead_pt1.java:16439)
at dw2.bulk_lead_pt1_0_1.bulk_lead_pt1$6.run(bulk_lead_pt1.java:23414)

This is my batch info from the console:
-------------- waiting ----------,firstDayOfWeek=1,minimalDaysInFirstWeek=1,ERA=1,YEAR=2014,MONTH=10,WEEK_OF_YEAR=46,WEEK_OF_MONTH=3,DAY_OF_MONTH=13,DAY_OF_YEAR=317,DAY_OF_WEEK=5,DAY_OF_WEEK_IN_MONTH=2,AM_PM=0,HOUR=5,HOUR_OF_DAY=5,MINUTE=48,SECOND=50,MILLISECOND=0,ZONE_OFFSET=0,DST_OFFSET=0]'
systemModstamp='java.util.GregorianCalendar,firstDayOfWeek=1,minimalDaysInFirstWeek=1,ERA=1,YEAR=2014,MONTH=10,WEEK_OF_YEAR=46,WEEK_OF_MONTH=3,DAY_OF_MONTH=13,DAY_OF_YEAR=317,DAY_OF_WEEK=5,DAY_OF_WEEK_IN_MONTH=2,AM_PM=0,HOUR=6,HOUR_OF_DAY=6,MINUTE=44,SECOND=23,MILLISECOND=0,ZONE_OFFSET=0,DST_OFFSET=0]'
numberRecordsProcessed='1345006'
numberRecordsFailed='0'
totalProcessingTime='0'
apiActiveProcessingTime='0'
apexProcessingTime='0'
]

As you can see, it only processed 390,949 rows out of 1,345,006 before failing.

Re: Salesforce Bulk API java.net.SocketException: Connection reset

I have also tried a much smaller table (219 columns, 133,932 rows).
I get the same error.


Re: Salesforce Bulk API java.net.SocketException: Connection reset

I have also tried storing the tMap on disk and writing the CSV in row mode to see if it would help, but it does not.
Additionally, I tried increasing the TimeOut to 600000 ms on the Salesforce connection, but this does not seem to help either.
Can anyone give some tips on how else to troubleshoot? 
Has anyone else experienced this problem before? I have tried searching the forums and the rest of the web, but I can't find anything.
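For reference on the timeout experiment, here is a minimal Java sketch of how connect and read timeouts are set on a plain `HttpURLConnection`; whether Talend's TimeOut setting maps onto these exact calls is an assumption, and the class and method names are hypothetical. It also illustrates why a longer timeout may not help: a read timeout surfaces as `SocketTimeoutException`, while `Connection reset` is a different `SocketException`, raised when the connection is aborted.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.SocketException;
import java.net.SocketTimeoutException;
import java.net.URI;

public class TimeoutSketch {

    // Hypothetical helper: creates (but does not yet connect) an HTTP connection with explicit timeouts.
    static HttpURLConnection openWithTimeouts(String url, int connectMs, int readMs) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(connectMs); // max wait to establish the connection
        conn.setReadTimeout(readMs);       // max wait for each blocking read, not for the whole download
        return conn;
    }

    // Classifies a failure. SocketTimeoutException is not a subclass of SocketException,
    // so a read timeout and a "Connection reset" are distinct failure modes.
    static String classify(IOException e) {
        if (e instanceof SocketTimeoutException) {
            return "read timeout";
        }
        if (e instanceof SocketException) {
            return "connection reset/aborted";
        }
        return "other I/O error";
    }

    public static void main(String[] args) throws IOException {
        // 600000 ms = 10 minutes; the 30000 ms connect timeout is an assumed value.
        HttpURLConnection conn = openWithTimeouts("https://login.salesforce.com/", 30_000, 600_000);
        System.out.println("read timeout (ms): " + conn.getReadTimeout());
        System.out.println(classify(new SocketException("Connection reset")));
    }
}
```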

Re: Salesforce Bulk API java.net.SocketException: Connection reset

Update: I created a JIRA ticket:
https://jira.talendforge.org/browse/TDI-31213

Re: Salesforce Bulk API java.net.SocketException: Connection reset

Generally, this error is caused by a network problem, and so far no communication with Salesforce has taken place. Therefore, changing the query parameters or similar settings cannot help.
Increasing the timeout will, in this case, probably only lengthen the time until you get this error.
I would suggest checking whether you need to use a proxy or whether some firewall rules prevent TCP traffic to Salesforce.
As a simple first test, you could ping the Salesforce server from the server where your job runs.
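ICMP ping is sometimes blocked even when HTTPS works (and vice versa), so a TCP connection attempt to port 443 is a check closer to what the job actually does. A minimal Java sketch; the class name is hypothetical, and `login.salesforce.com` is only an assumed default target, to be replaced with your own instance host.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ReachabilityCheck {

    // Returns true if a TCP connection to host:port can be opened within timeoutMs.
    // DNS failures, refused connections, and timeouts all count as "not reachable".
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "login.salesforce.com";
        System.out.println(host + ":443 reachable = " + canConnect(host, 443, 5000));
    }
}
```

A successful connect only shows that the path is open at that moment; it cannot rule out a device that drops long-running transfers part-way through.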

Re: Salesforce Bulk API java.net.SocketException: Connection reset

Thanks for the feedback, Jan.
Unfortunately, I don't think this is what is causing my problems.
For one, I have been able to use the bulk query many times, and it only fails on very large objects (3M+ rows and 300+ columns).
Second, you can clearly see in the first screenshot that data has actually been transferred.
Finally, I am able to run the same query in regular (non-bulk) query mode, although it takes 20+ hours to complete.
Additionally, I can monitor the inbound traffic and see data coming in from Salesforce.
Thanks again!

Re: Salesforce Bulk API java.net.SocketException: Connection reset

I have also checked, and I can ping Salesforce. I also do not need to change any firewall rules or use a proxy.

Re: Salesforce Bulk API java.net.SocketException: Connection reset

Hi, I have the same problem.
I was using Talend TOS DI 5.3 and upgraded to 5.6, hoping the Salesforce Bulk Query would work, as I have a table with more than 100k rows. Unfortunately, it does not work because of the 10k limit.
Does anyone have suggestions? I really need to speed up the query.
Thanks!