tHDFSOutput Error: File ... could only be replicated to 0 nodes

Four Stars

Dear Talend Community,

I am using the following software:

  • Ubuntu 16.04 LTS
  • Virtual Box 5.0.40
  • TOS_BD-20170623_1246-V6.4.1
  • HDP_2.6_virtualbox_05_05_2017_14_46_00_hdp.ova

I can successfully access the sandbox in a Firefox browser at 127.0.0.1:8888, 127.0.0.1:8080, and 127.0.0.1:50070.

 

Now I want to send data from Talend to the sandbox. The connection seems to work fine, as I can browse the files by clicking the button behind the FileName field in the Components tab of tHDFSOutput. But running the job results in the following error:


Exception in component tHDFSOutput_1 (testHDPConnection)
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/raj_ops/out-2.csv could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1703)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3336)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3260)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:849)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:503)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)

The file is created and I can see it using the Files View in the sandbox, but no data is written; it stays an empty file. After reading posts about the same error, I still don't know how to make this work.

 

Thank you in advance for helping me resolve this issue!

Employee

Re: tHDFSOutput Error: File ... could only be replicated to 0 nodes

Hello! For a quick check, can you open the URI http://sandbox-host-name:50070/ in a browser and verify that your DataNode is healthy?

 

In the overview table on that page, you should see Live Nodes: 1
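The same information is also exposed by the NameNode's JMX endpoint (/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState), which can be easier to script against than the web UI. A minimal sketch in Python; the sample payload below is illustrative, not taken from your sandbox:

```python
import json
import urllib.request

def live_datanodes(jmx_json):
    """Extract NumLiveDataNodes from a NameNode FSNamesystemState JMX response."""
    beans = json.loads(jmx_json)["beans"]
    state = next(b for b in beans
                 if b.get("name") == "Hadoop:service=NameNode,name=FSNamesystemState")
    return state["NumLiveDataNodes"]

def fetch_jmx(namenode="127.0.0.1", port=50070):
    """Fetch the FSNamesystemState bean from a running NameNode."""
    url = (f"http://{namenode}:{port}/jmx"
           "?qry=Hadoop:service=NameNode,name=FSNamesystemState")
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode()

# Illustrative sample of the JSON shape this endpoint returns:
sample = """{"beans": [{"name": "Hadoop:service=NameNode,name=FSNamesystemState",
                        "NumLiveDataNodes": 1, "NumDeadDataNodes": 0}]}"""

print(live_datanodes(sample))  # prints 1
```

On a healthy single-node sandbox, running live_datanodes(fetch_jmx()) against the forwarded port should report 1; a 0 here would explain the "replicated to 0 nodes" error directly.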

Four Stars

Re: tHDFSOutput Error: File ... could only be replicated to 0 nodes

Thank you for your reply. I attached a screenshot where you can see what 127.0.0.1:50070 looks like on my system.
Four Stars

Re: tHDFSOutput Error: File ... could only be replicated to 0 nodes

In the meantime I tried to debug and followed some posts. I added the HDFS ports to VirtualBox (as described in the link in this post: https://community.hortonworks.com/questions/82072/file-userroottmptesttxt-could-only-be-replicated-t...). As this didn't work, I tried an older sandbox version without Docker (2.4), but I still get the same error.
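For anyone trying the same thing: the port forwarding in question can be set up from the host with VBoxManage. This is a sketch; "Hortonworks Sandbox" is a placeholder for the actual VM name (check with VBoxManage list vms), and the ports are the usual HDP 2.6 defaults:

```shell
# One-time setup on the Ubuntu host (VM powered off; for a running VM,
# use "VBoxManage controlvm <name> natpf1 ..." instead):
VBoxManage modifyvm "Hortonworks Sandbox" --natpf1 "hdfs-nn-rpc,tcp,,8020,,8020"
VBoxManage modifyvm "Hortonworks Sandbox" --natpf1 "hdfs-dn-data,tcp,,50010,,50010"
VBoxManage modifyvm "Hortonworks Sandbox" --natpf1 "hdfs-dn-ipc,tcp,,50020,,50020"
```

The DataNode data-transfer port (50010 by default) matters here: the NameNode connection alone lets the client create the file, but the actual block data goes straight to the DataNode, which is consistent with an empty file being created.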

In this post: https://www.talendforge.org/forum/viewtopic.php?id=50662 a parameter in hdfs-default.xml is mentioned, but where is hdfs-default.xml? Logging into the sandbox with ssh root@127.0.0.1 -p 2222, I only find hdfs-site.xml, which unfortunately does not contain the mentioned parameter "dfs.client.use.datanode.hostname". Running the tHDFSOutput job with "Use Datanode Hostname" both enabled and disabled: still the same error, and an empty file is created.
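A note on hdfs-default.xml, in case it helps others: it is not a file you edit on the cluster; it ships inside the hadoop-hdfs JAR and only documents the defaults. Overrides go into hdfs-site.xml, and dfs.client.use.datanode.hostname is a client-side property (which is what the "Use Datanode Hostname" checkbox sets). As a sketch, an override in the client's hdfs-site.xml would look like:

```xml
<!-- Client-side override: connect to DataNodes by hostname instead of the
     NAT-internal IP the NameNode reports, which the host cannot reach. -->
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
```

This only helps if the DataNode's hostname actually resolves on the client machine, which ties in with the hostname question below.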

 

And although I added the IP (as described here: https://www.youtube.com/watch?v=xG3nQAfkEyM&feature=youtu.be ), I still cannot open sandbox:8080 or sandbox:8088, only 127.0.0.1:8088 or localhost:8088. Could this be related to the error? And what did I do wrong?
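For reference, reaching the sandbox by name only works if the host OS can resolve it, which usually means an /etc/hosts entry on the Ubuntu host. A sketch, assuming the usual sandbox hostnames (verify the real one with hostname -f inside the VM):

```
127.0.0.1   sandbox.hortonworks.com sandbox
```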

 

Thanks for any hint!
