Temp path for spark jars missing a slash?

I'm trying to run a Spark batch job on an EMR cluster, using Data Fabric 6.1.1.
Jobs are being started on the cluster, but they fail with an error like:
java.io.FileNotFoundException: File file:/...<local_path_to_Talend>.../Talend-Studio-macosx-cocoa.app/Contents/MacOS/temp/reads3/lib/talend-spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar

The job name is reads3 in this case. From the debug logs, I can see it trying to pass this file, and several others in this directory, to the cluster as -Dspark.jars= arguments:
: org.apache.spark.deploy.yarn.Client -         SPARK_YARN_CACHE_FILES -> file:/...<local_path_to_Talend>.../Talend-Studio-macosx-cocoa.app/Contents/MacOS/temp/reads3/lib/talend-spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar#__spark__.jar
: org.apache.spark.deploy.yarn.Client -     resources:
: org.apache.spark.deploy.yarn.Client -         __spark__.jar -> resource { scheme: "file" port: -1 file: "/...<local_path_to_Talend>.../Talend-Studio-macosx-cocoa.app/Contents/MacOS/temp/reads3/lib/talend-spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar" } size: 183166734 timestamp: 1467949359000 type: FILE visibility: PRIVATE

The key point here appears to be that it's expecting these directories:
.../Talend-Studio-macosx-cocoa.app/Contents/MacOS/temp/reads3/lib
.../Talend-Studio-macosx-cocoa.app/Contents/MacOS/temp/reads3/reads3

But I find the following directories have been created (though they are mostly empty):
.../Talend-Studio-macosx-cocoa.app/Contents/MacOS/temp/reads3lib
.../Talend-Studio-macosx-cocoa.app/Contents/MacOS/temp/reads3reads3
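
The difference between the two lists is what you would get from joining path segments without a separator. A minimal shell sketch of that pattern follows. The variable names are hypothetical and the path is shortened to temp/reads3; this only illustrates the naming and makes no claim about how Talend actually builds these paths.

```shell
# Hypothetical values, chosen only to mirror the directory names above.
tmp_root="temp"
job="reads3"
job_dir="$tmp_root/$job"            # note: no trailing slash

# Joining without a separator gives the names that were actually created.
broken_lib="${job_dir}lib"          # temp/reads3lib
broken_job="${job_dir}${job}"       # temp/reads3reads3

# Joining with a separator gives the names the Spark client is looking for.
expected_lib="${job_dir}/lib"       # temp/reads3/lib
expected_job="${job_dir}/${job}"    # temp/reads3/reads3

printf '%s\n' "$broken_lib" "$broken_job" "$expected_lib" "$expected_job"
```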

It looks like Talend is not putting a trailing slash on a temporary directory name, but I'm not sure where I would go to fix this. It's possible that it's then not decompressing the archive there, but the directory names seem like the first issue. I tried:
mkdir ./Talend-Studio-macosx-cocoa.app/Contents/MacOS/temp/reads3/
tar -xC ./Talend-Studio-macosx-cocoa.app/Contents/MacOS/temp/reads3/ -f ./workspace/.Java/target/reads3_0_1.tar.gz

This puts the required jar file in the right place. I then ran the job again, but it looks like Talend removes this directory when it builds, though it leaves the reads3lib and reads3reads3 directories.
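
For anyone trying to reproduce this, here is a rough sketch for checking which of the four directories exist after a build, and whether the archive contains any jars at all. It assumes it is run from the same directory as the commands above; report_dirs is a hypothetical helper name, and the paths are the ones quoted in this post.

```shell
# Hypothetical helper: report which candidate directories exist under a base dir.
report_dirs() {
  base="$1"
  for d in reads3/lib reads3/reads3 reads3lib reads3reads3; do
    if [ -d "$base/$d" ]; then
      echo "present: $d"
    else
      echo "missing: $d"
    fi
  done
}

archive=./workspace/.Java/target/reads3_0_1.tar.gz
tmp=./Talend-Studio-macosx-cocoa.app/Contents/MacOS/temp

# List the jars inside the archive without extracting it (skipped if absent).
if [ -f "$archive" ]; then
  tar -tzf "$archive" | grep '\.jar$'
fi

report_dirs "$tmp"
```

Running it once right after the manual extraction and once after a build should show whether reads3/lib disappears while reads3lib and reads3reads3 remain.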

Re: Temp path for spark jars missing a slash?

Hi Mark.Nettle,
Is your network running well?
Could you please open a ticket on the Talend Support Portal for your Data Fabric 6.1.1 product? That way, we can provide remote assistance on this issue and, through the support cycle, check with priority whether it is a bug.

Best regards
Sabrina