org.apache.spark.SparkContext - Error initializing SparkContext

Hi,

I'm new to Talend Big Data integration. I'm currently trying to create a Spark Big Data Batch job in Talend, and I'm encountering the following error.

The job simply reads data from Hive in a Big Data Batch job and writes it to a tLogRow.
(screenshot: talend big data batch.png)
Starting job TEST_SPARK at 15:23 30/09/2018.

[statistics] connecting to socket on port 3746
[statistics] connected
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/D:/TalendStudio/talendworkspace/.Java/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/D:/TalendStudio/talendworkspace/.Java/lib/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[WARN ]: org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[WARN ]: org.apache.spark.SparkConf - In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
[WARN ]: org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory - The short-circuit local reads feature cannot be used because UNIX Domain sockets are not available on Windows.
[WARN ]: org.apache.hadoop.hdfs.DFSClient - DFSOutputStream ResponseProcessor exception  for block BP-1971060428-10.1.98.58-1536015946021:blk_1073769294_28491
java.io.IOException: An existing connection was forcibly closed by the remote host
    at sun.nio.ch.SocketDispatcher.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:197)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
    at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2390)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:244)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:837)
[WARN ]: org.apache.hadoop.hdfs.DFSClient - Error Recovery for block BP-1971060428-10.1.98.58-1536015946021:blk_1073769294_28491 in pipeline DatanodeInfoWithStorage[10.1.98.60:50010,DS-c101ddeb-7b7b-4bce-a539-56779c4d2787,DISK], DatanodeInfoWithStorage[10.1.98.61:50010,DS-a1a58bf4-8cb2-4795-a8ac-b06bf9160196,DISK]: bad datanode DatanodeInfoWithStorage[10.1.98.60:50010,DS-c101ddeb-7b7b-4bce-a539-56779c4d2787,DISK]
[WARN ]: org.apache.hadoop.hdfs.DFSClient - DataStreamer Exception
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[10.1.98.61:50010,DS-a1a58bf4-8cb2-4795-a8ac-b06bf9160196,DISK]], original=[DatanodeInfoWithStorage[10.1.98.61:50010,DS-a1a58bf4-8cb2-4795-a8ac-b06bf9160196,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:1036)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1110)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1268)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:993)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:500)
[ERROR]: org.apache.spark.SparkContext - Error initializing SparkContext.
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[10.1.98.61:50010,DS-a1a58bf4-8cb2-4795-a8ac-b06bf9160196,DISK]], original=[DatanodeInfoWithStorage[10.1.98.61:50010,DS-a1a58bf4-8cb2-4795-a8ac-b06bf9160196,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:1036)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1110)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1268)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:993)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:500)
[WARN ]: org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint - Attempted to request executors before the AM has registered!
[WARN ]: org.apache.spark.metrics.MetricsSystem - Stopping a MetricsSystem that is not running
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[10.1.98.61:50010,DS-a1a58bf4-8cb2-4795-a8ac-b06bf9160196,DISK]], original=[DatanodeInfoWithStorage[10.1.98.61:50010,DS-a1a58bf4-8cb2-4795-a8ac-b06bf9160196,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:1036)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1110)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1268)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:993)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:500)
[ERROR]: eds_spark_test.test_spark_0_1.TEST_SPARK - TalendJob: 'TEST_SPARK' - Failed with exit code: 1.
Exception in thread "main" java.lang.RuntimeException: TalendJob: 'TEST_SPARK' - Failed with exit code: 1.
    at eds_spark_test.test_spark_0_1.TEST_SPARK.main(TEST_SPARK.java:1049)
Job TEST_SPARK ended at 15:31 30/09/2018. [exit code=0]

I'm not sure what's causing this.

I'm currently using Talend Big Data 6.4.1 and Hortonworks HDP 2.6.5.0.

Let me know if more details are needed.

Moderator

Re: org.apache.spark.SparkContext - Error initializing SparkContext

Hello,

Are you using JDK 1.8? Is your cluster correctly configured, and does your job use the connection defined in the repository? More information would be appreciated.
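
In the meantime, the stack trace itself points at HDFS write-pipeline recovery rather than at Spark: the client loses a datanode mid-write and cannot find a replacement, and the error message names 'dfs.client.block.write.replace-datanode-on-failure.policy' as the setting that controls this. On small clusters (roughly three datanodes or fewer) a common workaround is to relax that policy. Below is a minimal sketch, assuming the properties can be forwarded through SparkConf (Spark copies any spark.hadoop.* entry into the Hadoop Configuration); in a Talend Spark Batch job you would normally enter them in the Spark configuration's Hadoop properties table instead. The two dfs.client.* property names come straight from the error message; the class and method names here are made up for illustration.

import org.apache.spark.SparkConf;

public class RelaxedHdfsWritePolicy {
    public static SparkConf buildConf() {
        return new SparkConf()
                .setAppName("TEST_SPARK")
                // NEVER = do not try to replace a failed datanode; keep writing
                // to the surviving ones. Appropriate only for clusters with
                // about three datanodes or fewer.
                .set("spark.hadoop.dfs.client.block.write.replace-datanode-on-failure.policy",
                     "NEVER");
        // Less drastic alternative: keep the DEFAULT policy but tolerate a failed
        // replacement by setting
        // "spark.hadoop.dfs.client.block.write.replace-datanode-on-failure.best-effort"
        // to "true".
    }
}

Unrelated to the failure, but worth cleaning up: the SLF4J warning shows two copies of slf4j-log4j12 (1.7.10 and 1.7.16) on the classpath; removing one of them silences the multiple-bindings message.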

Best regards

Sabrina

--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
