Four Stars

tFileFetch hangs at random url in iterate loop

Hi Group,

I have a job that reads a list of image URLs from a DB and downloads each one in an iterate loop. The job hangs at a random URL in the tFileFetch. The debugger is no help:

2017-06-08_17-03-59.png

I have the timeout set to 10 seconds (10000 ms) in the tFileFetch.

I set the timeout to 10 ms and the job did finish, but it missed 80% of the files.

I set it to 100 ms and it hung...

Any advice at all on how to proceed would be greatly appreciated. I am running v6.2.1.

8 REPLIES
Moderator

Re: tFileFetch hangs at random url in iterate loop

Hi,

Could you please also post a screenshot of your current job design to the forum? How does your loop work?

Best regards

Sabrina

--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
Four Stars

Re: tFileFetch hangs at random url in iterate loop

Here is the main job. It looks at two URL sources. Then I iterate to a subjob to fetch the two images. The URLs are passed as params:

job.png

The subjob checks whether the URL field is null or blank. If it is not, it tries the fetch:

subjob.png

There are about 4000 images to download, and it gets anywhere between 10 and 500 before it hangs. The files come from many different locations. When it hangs, I see the file it's trying to download in the folder in my work directory, but it shows a size of 0. When it hangs it will sit forever, until you manually cancel the job.

I'm going to try building simplified test jobs to see if I can get more insight: one without a subjob and multiple iterate operations, and another that just tries to download the same file in an iterate loop.
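
Since a stalled download leaves a 0-byte file in the work directory, one way to see which URL the job was stuck on is to list the empty files after cancelling. Below is a minimal sketch in plain Java that could be called from a tJava or a routine. The class and method names (`EmptyDownloadFinder`, `findEmptyFiles`) are made up for illustration and are not part of Talend.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, not a Talend API.
final class EmptyDownloadFinder {

    // Returns the regular files directly inside workDir whose size is 0 bytes.
    static List<Path> findEmptyFiles(Path workDir) {
        List<Path> empty = new ArrayList<>();
        try (Stream<Path> entries = Files.list(workDir)) {
            for (Path p : (Iterable<Path>) entries::iterator) {
                if (Files.isRegularFile(p) && Files.size(p) == 0L) {
                    empty.add(p);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return empty;
    }
}
```

The returned paths can be matched back to the URL list, assuming the saved file name identifies the source row.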

Ten Stars

Re: tFileFetch hangs at random url in iterate loop

In the Advanced settings of tFileFetch is an option to "Print response to console". Does that reveal anything interesting just before the job hangs?
Four Stars

Re: tFileFetch hangs at random url in iterate loop

I did try this but no error is shown:

Status Line: HTTP/1.1 200 OK
*** Response Header ***
Date: Fri, 09 Jun 2017 16:16:05 GMT
Server: Apache/2.2.15 (CentOS)
Last-Modified: Tue, 28 Feb 2017 22:09:31 GMT
ETag: "ac0509-8a47-5499e73433cc0"
Accept-Ranges: bytes
Content-Length: 35399
Connection: close
Content-Type: image/jpeg
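
The headers show the server announcing 35399 bytes while the stuck file on disk stays at 0, so a finished download can be sanity-checked by comparing the size on disk with Content-Length. This is a rough sketch only. `DownloadSizeCheck` and `isComplete` are hypothetical names, and the caller is assumed to pass -1 when the header is missing.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not a Talend API.
final class DownloadSizeCheck {

    // expectedLength is the Content-Length value, or -1 if the header was absent.
    static boolean isComplete(Path file, long expectedLength) {
        try {
            if (!Files.isRegularFile(file)) {
                return false; // nothing was written at all
            }
            long actual = Files.size(file);
            if (expectedLength < 0) {
                return actual > 0; // no header: only check that the file is non-empty
            }
            return actual == expectedLength;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```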
Four Stars

Re: tFileFetch hangs at random url in iterate loop

Another update. I created a simple job: a tLoop from 1 to 3000 ----> iterate ----> tFileFetch.

The file is the same file every time:

...https://images.tradeservice.com/TF6WT1M87RNDP45U/PRODUCTIMAGES/DIR100002/BRONUTC00003_16_TN_001.jpg

After a random number of downloads, the Talend job hangs. When I quickly try the URL in a browser, it spins, so the server is slow...

But the question is: why doesn't the tFileFetch give up and move on to the next iteration after the specified number of milliseconds?
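
If a component's own timeout does not fire, one generic workaround is to enforce a wall-clock deadline from outside by running each download on a worker thread and waiting on it with a time limit. The sketch below is plain Java and does not describe how tFileFetch works internally. `DeadlineRunner` and `runWithDeadline` are hypothetical names, and the task passed in stands for whatever performs one download.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not a Talend API.
final class DeadlineRunner {

    // Runs task on a worker thread; returns true only if it finished within timeoutMs.
    static boolean runWithDeadline(Callable<?> task, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "download-worker");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(task);
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Interrupts the worker; a blocking socket read may not react to the interrupt.
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            return false; // the task itself threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

The caller moves on to the next iteration when the method returns false. The abandoned worker may still hold its socket open until the server responds or closes the connection, which is why the thread is marked as a daemon.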


Four Stars

Re: tFileFetch hangs at random url in iterate loop

Another update. I built the following routine for testing and call it in a loop from a tJava. It runs without hanging. Note that the code disables SSL certificate checking, so it is not a production solution:

// Imports go at the top of the routine class
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

public static String HttpDownloadFile(String fileURL, String saveDir, String saveName) {
    String msg = "";

    // Create a trust manager that trusts all certificates (testing only)
    TrustManager[] trustAllCerts = new TrustManager[] {
        new X509TrustManager() {
            public X509Certificate[] getAcceptedIssuers() {
                // The interface contract asks for a non-null array
                return new X509Certificate[0];
            }
            public void checkClientTrusted(X509Certificate[] certs, String authType) {
            }
            public void checkServerTrusted(X509Certificate[] certs, String authType) {
            }
        }
    };

    // Activate the new trust manager
    try {
        SSLContext sc = SSLContext.getInstance("SSL");
        sc.init(null, trustAllCerts, new SecureRandom());
        HttpsURLConnection.setDefaultSSLSocketFactory(sc.getSocketFactory());
    } catch (Exception e) {
        msg = "DOWNLOAD ERROR: Unhandled exception.";
        e.printStackTrace();
    }

    try {
        URL url = new URL(fileURL);
        URLConnection connection = url.openConnection();

        String fileName = "";
        String disposition = connection.getHeaderField("Content-Disposition");
        String contentType = connection.getContentType();
        int contentLength = connection.getContentLength();

        if (disposition != null) {
            // Extracts the file name from the header field;
            // assumes a quoted value such as filename="image.jpg"
            int index = disposition.indexOf("filename=");
            if (index > 0) {
                fileName = disposition.substring(index + 10, disposition.length() - 1);
            }
        } else {
            // Extracts the file name from the URL
            fileName = fileURL.substring(fileURL.lastIndexOf("/") + 1, fileURL.length());
        }

        System.out.println("Content-Type = " + contentType);
        System.out.println("Content-Disposition = " + disposition);
        System.out.println("Content-Length = " + contentLength);
        System.out.println("fileName = " + fileName);

        String saveFilePath = saveDir + File.separator + saveName;

        // Copy the HTTP response body to the file; try-with-resources
        // closes both streams even if the copy fails part way through
        try (InputStream inputStream = connection.getInputStream();
             FileOutputStream outputStream = new FileOutputStream(saveFilePath)) {
            int bytesRead;
            byte[] buffer = new byte[8192];
            while ((bytesRead = inputStream.read(buffer)) != -1) {
                outputStream.write(buffer, 0, bytesRead);
            }
        }

        System.out.println("File downloaded");
    } catch (FileNotFoundException e) {
        msg = "DOWNLOAD ERROR: File not found.";
    } catch (IOException ex) {
        msg = "DOWNLOAD ERROR: Unhandled exception.";
        ex.printStackTrace();
    }

    return msg;
}
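
The routine above sets no timeouts on the connection, so a server that accepts the request and then stops sending data could in principle still block it. A possible hardening, sketched here as an assumption rather than a tested fix, is to set both a connect and a read timeout on the `URLConnection` before reading from it. `TimeoutConfig` and `openWithTimeouts` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper, not a Talend API.
final class TimeoutConfig {

    // Creates the connection object and applies both timeouts (in milliseconds).
    static URLConnection openWithTimeouts(String fileURL, int connectMs, int readMs) {
        try {
            // openConnection() only creates the object; no network traffic happens yet
            URLConnection connection = new URL(fileURL).openConnection();
            connection.setConnectTimeout(connectMs); // limit for establishing the connection
            connection.setReadTimeout(readMs);       // limit for each blocking read
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With these set, a stalled connect or read raises `java.net.SocketTimeoutException`, which is an `IOException` and would land in the routine's existing catch block. The read timeout applies to each individual read, not to the whole download, so a server that trickles data slowly can still take longer in total.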



Moderator

Re: tFileFetch hangs at random url in iterate loop

Hello,

Could you please go to Window > Show view > General > Error Log to see if there is any error message?

Best regards

Sabrina

Four Stars

Re: tFileFetch hangs at random url in iterate loop

Here are the log entries for a job.

!ENTRY org.talend.platform.logging 1 0 2017-07-10 10:31:16.262
!MESSAGE 2017-07-10 10:31:16,262 INFO org.eclipse.m2e.core.internal.lifecyclemapping.LifecycleMappingFactory - Using NULL lifecycle mapping for MavenProject: org.talend.master.butler:code.Master:6.2.1 @ C:\Talend\Workspaces_6_2_1\WS_Butler\.Java\pom.xml.

!ENTRY org.talend.platform.logging 1 0 2017-07-10 10:31:17.402
!MESSAGE 2017-07-10 10:31:17,402 INFO org.eclipse.m2e.core.internal.lifecyclemapping.LifecycleMappingFactory - Using NULL lifecycle mapping for MavenProject: org.talend.master.butler:code.Master:6.2.1 @ C:\Talend\Workspaces_6_2_1\WS_Butler\.Java\pom.xml.

!ENTRY org.talend.platform.logging 1 0 2017-07-10 10:31:18.543
!MESSAGE 2017-07-10 10:31:18,543 INFO org.eclipse.m2e.core.internal.lifecyclemapping.LifecycleMappingFactory - Using NULL lifecycle mapping for MavenProject: org.talend.master.butler:code.Master:6.2.1 @ C:\Talend\Workspaces_6_2_1\WS_Butler\.Java\pom.xml.

!ENTRY org.talend.platform.logging 1 0 2017-07-10 10:31:19.715
!MESSAGE 2017-07-10 10:31:19,715 INFO org.eclipse.m2e.core.internal.lifecyclemapping.LifecycleMappingFactory - Using NULL lifecycle mapping for MavenProject: org.talend.master.butler:code.Master:6.2.1 @ C:\Talend\Workspaces_6_2_1\WS_Butler\.Java\pom.xml.

!ENTRY org.talend.platform.logging 1 0 2017-07-10 10:31:20.403
!MESSAGE 2017-07-10 10:31:20,403 INFO org.talend.designer.core.runprocess.Processor - Command line: C:/Java/jre1.8.0_112/bin/java.exe -Xms256M -Xmx1024M -Dfile.encoding=UTF-8 -cp C:/Talend/Workspaces_6_2_1/WS_Butler/.Java/target/classes;.;C:/Talend/Workspaces_6_2_1/WS_Butler/.Java/lib/commons-codec-1.6.jar;C:/Talend/Workspaces_6_2_1/WS_Butler/.Java/lib/commons-httpclient-3.0.1.jar;C:/Talend/Workspaces_6_2_1/WS_Butler/.Java/lib/commons-logging-1.1.jar;C:/Talend/Workspaces_6_2_1/WS_Butler/.Java/lib/dom4j-1.6.1.jar;C:/Talend/Workspaces_6_2_1/WS_Butler/.Java/lib/jcifs-1.3.0.jar;C:/Talend/Workspaces_6_2_1/WS_Butler/.Java/lib/jtds-1.3.1-patch.jar;C:/Talend/Workspaces_6_2_1/WS_Butler/.Java/lib/log4j-1.2.16.jar;C:/Talend/Workspaces_6_2_1/WS_Butler/.Java/lib/talendcsv.jar;C:/Talend/Workspaces_6_2_1/WS_Butler/.Java/lib; butler.testfilefetch_0_1.testFileFetch --context=Test --stat_port=3978 %*