One Star

[resolved] Issue writing to HDP 2.0 from Talend 5.4.1

I downloaded the most recent Hortonworks HDP Sandbox (v2.0) and the latest Talend Open Studio for Big Data (v5.4.1) on 2/10/14. I am able to interact with and upload data to the HDP VM through the Hue interface. However, when trying to upload data via Talend per the tutorial at http://hortonworks.com/kb/how-to-connectwrite-a-file-to-hortonworks-sandbox-from-talend-studio/, I receive the following error:
Starting job createrow at 17:03 10/02/2014.
connecting to socket on port 3958
connected
: org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
: org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
: org.apache.hadoop.util.Shell - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:278)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:300)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:293)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
at org.apache.hadoop.conf.Configuration.getTrimmedStrings(Configuration.java:1546)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:519)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:453)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:136)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2433)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
at org.apache.hadoop.fs.FileSystem$1.run(FileSystem.java:156)
at org.apache.hadoop.fs.FileSystem$1.run(FileSystem.java:153)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:153)
at firsttest.createrow_0_1.createrow.tRowGenerator_1Process(createrow.java:632)
at firsttest.createrow_0_1.createrow.tHDFSConnection_1Process(createrow.java:364)
at firsttest.createrow_0_1.createrow.runJobInTOS(createrow.java:1010)
at firsttest.createrow_0_1.createrow.main(createrow.java:875)
: org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream
java.net.ConnectException: Connection timed out: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1303)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1128)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1088)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
: org.apache.hadoop.hdfs.DFSClient - Abandoning BP-1578958328-10.0.2.15-1382306880516:blk_1073742537_1717
: org.apache.hadoop.hdfs.DFSClient - Excluding datanode 10.0.2.15:50010
: org.apache.hadoop.hdfs.DFSClient - DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/root/testfilez could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2503)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2053)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2047)
at org.apache.hadoop.ipc.Client.call(Client.java:1347)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
Exception in component tHDFSOutput_1
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/root/testfilez could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2503)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
disconnected
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2053)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2047)
at org.apache.hadoop.ipc.Client.call(Client.java:1347)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
: org.apache.hadoop.hdfs.DFSClient - Failed to close file /user/root/testfilez
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/root/testfilez could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2503)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2053)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2047)
at org.apache.hadoop.ipc.Client.call(Client.java:1347)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
Job createrow ended at 17:03 10/02/2014.
___________
The file is created in HDP, but it is 0 bytes and thus contains no content.
Based on research online, I have verified that the datanode is running and is not full (see attached cluster summary image). I have seen similar issues with other versions of HDP/Talend in online forums, but none of them provided a solution. I am trying to have a tRowGenerator in Talend generate 100 rows and output them to Hadoop via a tHDFSOutput, and I receive this error every time I run the job. I have the following configured in Talend:
tHDFSConnection:
Hadoop Version: Hortonworks Data Platform V2.0.0 (BigWheel)
NameNode URI: "hdfs://127.0.0.1:8020/"
User name: "root" (have also tried "sandbox" and "hue")
tHDFSOutput:
File Name: "/user/root/testfile" (have tried "/" and "/user/hue/testfile")
tRowGenerator:
Configured to generate 100 rows, each with two string columns and one int column.
The tHDFSConnection is connected to the tRowGenerator via an OnComponentOk trigger, and the tRowGenerator is connected to the tHDFSOutput via a Row Main.
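For reference, here is a minimal sketch of the equivalent logic in the plain Hadoop 2.x client API. This is not the code Talend generates; the file path matches my tHDFSOutput setting and the row content is just illustrative:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Mirrors the tHDFSConnection settings: NameNode URI and user name.
        FileSystem fs = FileSystem.get(
                new URI("hdfs://127.0.0.1:8020/"), conf, "root");
        // Mirrors tRowGenerator -> tHDFSOutput: 100 rows of two strings and an int.
        FSDataOutputStream out = fs.create(new Path("/user/root/testfile"));
        for (int i = 0; i < 100; i++) {
            out.writeBytes("foo;bar;" + i + "\n");
        }
        out.close();
        fs.close();
    }
}

The fs.create() call only talks to the namenode, which would explain why a 0-byte file shows up, but streaming the actual rows requires opening a socket straight to a datanode, and that is where the trace above falls over.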
Can anyone please suggest a solution to this connectivity issue? Thanks in advance for your advice!

6 REPLIES
Employee

Re: [resolved] Issue writing to HDP 2.0 from Talend 5.4.1

Hello,
The first issue is not a Talend problem, but a Hadoop issue:
: org.apache.hadoop.util.Shell - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
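This winutils message appears on any Windows client that has no local Hadoop layout; in your run it is only logged and the job carries on. If you want to get rid of it, the usual workaround (the path below is an example, not something the Studio configures for you) is to copy winutils.exe into a local bin directory and point hadoop.home.dir at its parent before any Hadoop class loads:

// Workaround sketch: assumes winutils.exe has been copied to C:\hadoop\bin.
// Equivalent to launching the job JVM with -Dhadoop.home.dir=C:\hadoop.
System.setProperty("hadoop.home.dir", "C:\\hadoop");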
The second one looks like a network issue. In your tHDFSConnection, you shouldn't use the IP address but the hostname of your namenode. Keep in mind that the namenode only serves metadata: on a write it returns the address the datanode registered with (10.0.2.15:50010 in your log), and your client machine must be able to open a direct connection to that address.
One Star

Re: [resolved] Issue writing to HDP 2.0 from Talend 5.4.1

Hi,
Regarding the second issue, I have updated the tHDFSConnection to use the hostname (sandbox.hortonworks.com) instead of the IP (127.0.0.1). Note: the hostname is bound to that same IP, so I'm not sure it made a difference. My NameNode URI is now "hdfs://sandbox.hortonworks.com:8020". After this change, however, I still get the same error: the only datanode on the VM is excluded, replication fails, and I am left with a 0-byte file.
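For what it's worth, the trace shows the namenode handing back 10.0.2.15:50010 as the datanode address, so a bare socket probe from the client host to that address (a sketch; address and port are copied from the log above) should hit the same timeout if this is purely a reachability problem:

import java.net.InetSocketAddress;
import java.net.Socket;

public class ProbeDatanode {
    public static void main(String[] args) throws Exception {
        // 10.0.2.15:50010 is the address from the "Excluding datanode" log line.
        Socket s = new Socket();
        s.connect(new InetSocketAddress("10.0.2.15", 50010), 5000); // 5-second timeout
        System.out.println("datanode reachable");
        s.close();
    }
}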
Do you have any additional suggestions to try? Thanks!
Employee

Re: [resolved] Issue writing to HDP 2.0 from Talend 5.4.1

Hello,
I have recorded a short video to walk through the setup: https://www.youtube.com/watch?v=xG3nQAfkEyM&feature=youtu.be
Cheers,
One Star

Re: [resolved] Issue writing to HDP 2.0 from Talend 5.4.1

Hi,
rdubois - thank you for taking the time to make that video. To resolve the issue I was facing, I first went to my Oracle VM VirtualBox Manager and changed the network settings for my Hortonworks Sandbox 2.0 VM to attach to the Host-Only Adapter (it was previously attached to NAT by default). This gave the sandbox a very different IP address than I had been getting previously. I then followed the steps in your video and was able to write to HDP from Talend.
I had tried the steps in your video without changing the VM settings and it still did not work, so the network change appears to be a necessary step.
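For anyone wanting to script the same change, the equivalent VirtualBox command line is roughly the following; the VM name and adapter name are examples (check VBoxManage list vms and VBoxManage list hostonlyifs for yours), and the VM must be powered off first:

VBoxManage modifyvm "Hortonworks Sandbox 2.0" --nic1 hostonly --hostonlyadapter1 "VirtualBox Host-Only Ethernet Adapter"

Once the sandbox boots and reports its host-only IP, map that IP to the hostname in the client's hosts file (the IP below is only an example):

192.168.56.101  sandbox.hortonworks.com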
Thanks again for your help with this - I'll mark this issue as resolved.
Employee

Re: [resolved] Issue writing to HDP 2.0 from Talend 5.4.1

Hi dgoverman,
Yes, you are right: the NAT connection mode does not work correctly with VirtualBox.
Using the Host-Only adapter or the Bridged option is better.
By contrast, NAT works well with VMware.
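As a side note, when changing the VM network is not an option, a client-side setting sometimes suggested for this NAT symptom is dfs.client.use.datanode.hostname, which makes the HDFS client contact datanodes by hostname instead of the IP they registered with. This is only a sketch of an alternative, not what was done here, and it only helps if the hostname resolves to a reachable address on the client machine (in Talend it would go in the component's Hadoop properties table):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HostnameDialSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://sandbox.hortonworks.com:8020");
        // Ask the client to contact datanodes by hostname rather than by the
        // NAT-internal IP the namenode returns.
        conf.setBoolean("dfs.client.use.datanode.hostname", true);
        FileSystem fs = FileSystem.get(conf);
        System.out.println("connected to " + fs.getUri());
        fs.close();
    }
}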
Good to see your issues are resolved.
One Star

Re: [resolved] Issue writing to HDP 2.0 from Talend 5.4.1

Thanks Remmy,
It's so simple once the correct procedure is followed. I wish that video had been available 2 weeks ago.
I also hit this problem with VirtualBox, which I was using because it is recommended on the Hortonworks web site.
Thanks again
Jan