Unable to read a file with tHDFSInput (HDFS, Cloudera VM)


Hello,
I have a problem with TOS Big Data v6. I am trying to read a file located in HDFS with a tHDFSInput component, and I get this error:
: org.apache.hadoop.hdfs.DFSClient - Could not obtain block: BP-286282631-127.0.0.1-1433865208026:blk_1073742460_1646 file=/user/cloudera/achats.txt No live nodes contain current block Block locations: DatanodeInfoWithStorage Dead nodes:  DatanodeInfoWithStorage. Throwing a BlockMissingException
Exception in component tHDFSInput_1
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-286282631-127.0.0.1-1433865208026:blk_1073742460_1646 file=/user/cloudera/achats.txt
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:938)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:607)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:847)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:897)
at java.io.DataInputStream.read(DataInputStream.java:149)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.PushbackInputStream.read(PushbackInputStream.java:186)
at org.talend.fileprocess.UnicodeReader.<init>(UnicodeReader.java:25)
at org.talend.fileprocess.TOSDelimitedReader.<init>(TOSDelimitedReader.java:77)
at org.talend.fileprocess.FileInputDelimited.<init>(FileInputDelimited.java:93)
at local_project.tes_001_0_1.tes_001.tHDFSInput_1Process(tes_001.java:827)
at local_project.tes_001_0_1.tes_001.tHDFSConnection_1Process(tes_001.java:364)
: org.apache.hadoop.hdfs.DFSClient - DFS Read
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-286282631-127.0.0.1-1433865208026:blk_1073742460_1646 file=/user/cloudera/achats.txt
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:938)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:607)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:847)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:897)
at java.io.DataInputStream.read(DataInputStream.java:149)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.PushbackInputStream.read(PushbackInputStream.java:186)
at org.talend.fileprocess.UnicodeReader.<init>(UnicodeReader.java:25)
at org.talend.fileprocess.TOSDelimitedReader.<init>(TOSDelimitedReader.java:77)
at org.talend.fileprocess.FileInputDelimited.<init>(FileInputDelimited.java:93)
at local_project.tes_001_0_1.tes_001.tHDFSInput_1Process(tes_001.java:827)
at local_project.tes_001_0_1.tes_001.tHDFSConnection_1Process(tes_001.java:364)
at local_project.tes_001_0_1.tes_001.runJobInTOS(tes_001.java:1198)
at local_project.tes_001_0_1.tes_001.main(tes_001.java:1055)
disconnected
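
For reference, this read can be reproduced outside Talend with the plain HDFS client API. A minimal sketch, assuming the Cloudera QuickStart VM's default NameNode URI (hdfs://quickstart.cloudera:8020) and the cloudera user; dfs.client.use.datanode.hostname is the property commonly needed when the client runs outside the VM, because the NameNode otherwise hands back the VM-internal DataNode address:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;

public class HdfsReadTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Connect to DataNodes by host name instead of the VM-internal IP
        // that the NameNode reports (assumption: the client runs outside the VM).
        conf.set("dfs.client.use.datanode.hostname", "true");
        // Default NameNode URI and user of the Cloudera QuickStart VM (assumptions).
        FileSystem fs = FileSystem.get(URI.create("hdfs://quickstart.cloudera:8020"), conf, "cloudera");
        try (FSDataInputStream in = fs.open(new Path("/user/cloudera/achats.txt"));
             BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        } finally {
            fs.close();
        }
    }
}

If this standalone test fails with the same BlockMissingException, the problem is connectivity from the Windows machine to the DataNode (the data transfer port, 50010 by default on CDH 5) rather than the Talend job itself.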

When I tried to put a file into HDFS with the Talend tHDFSPut component, the file was created but empty, and I got this error:
: org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in component tHDFSPut_1
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/cloudera/soc.txt could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1541)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3286)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:667)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:212)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:483)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
: org.apache.hadoop.hdfs.DFSClient - DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/cloudera/soc.txt could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1541)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3286)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:667)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:212)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:483)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)
at org.apache.hadoop.ipc.Client.call(Client.java:1468)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy7.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy8.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1544)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:600)
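
The write path can be checked the same way. A minimal standalone sketch, under the same assumptions as the read test above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;

public class HdfsWriteTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same assumption as the read test: address DataNodes by host name.
        conf.set("dfs.client.use.datanode.hostname", "true");
        FileSystem fs = FileSystem.get(URI.create("hdfs://quickstart.cloudera:8020"), conf, "cloudera");
        // Create (or overwrite) the target file and stream one test line to it.
        try (FSDataOutputStream out = fs.create(new Path("/user/cloudera/soc.txt"), true)) {
            out.write("test line\n".getBytes("UTF-8"));
        } finally {
            fs.close();
        }
    }
}

The empty-file symptom fits this failure mode: the NameNode creates the file entry (so the file appears), but the client never manages to stream a block to the DataNode, hence "could only be replicated to 0 nodes".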

My configuration:
Windows 7 machine
TOS Big Data v6 (installed on the local machine)
Cloudera VM 5.4.2
I have already installed the Hadoop binaries (v2.6) and created the HADOOP_HOME environment variable.
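
As a quick sanity check of that Windows-side setup, this sketch prints whether HADOOP_HOME is set and whether winutils.exe sits in the bin subfolder (that location is the usual convention for the Hadoop Windows binaries, not something specific to this job):

import java.io.File;

public class HadoopHomeCheck {
    public static void main(String[] args) {
        String home = System.getenv("HADOOP_HOME");
        System.out.println("HADOOP_HOME = " + home);
        if (home != null) {
            // Hadoop on Windows conventionally expects winutils.exe under %HADOOP_HOME%\bin.
            File winutils = new File(home, "bin" + File.separator + "winutils.exe");
            System.out.println("winutils.exe present: " + winutils.exists());
        }
    }
}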
Thanks for your help.

Re: Unable to read a file with tHDFSInput (HDFS, Cloudera VM)

Hi,
Are you able to access HDFS successfully from your local machine?
Best regards
Sabrina

Re: Unable to read a file with tHDFSInput (HDFS, Cloudera VM)

Hi Sabrina,
I'm able to access HDFS from my local machine.
Actually, I have already configured the Hadoop cluster in the Talend repository. I am able to retrieve files from the cluster, but when I try to use tHDFSInput to read the file, I get this message:
Exception in component tHDFSInput_1
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-286282631-127.0.0.1-1433865208026:blk_1073742440_1620 file=/user/hdfs/achats.txt
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:938)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:607)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:847)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:897)
at java.io.DataInputStream.read(DataInputStream.java:149)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.PushbackInputStream.read(PushbackInputStream.java:186)
at org.talend.fileprocess.UnicodeReader.<init>(UnicodeReader.java:25)
at org.talend.fileprocess.TOSDelimitedReader.<init>(TOSDelimitedReader.java:77)
at org.talend.fileprocess.FileInputDelimited.<init>(FileInputDelimited.java:93)
at local_project.tes_001_0_1.tes_001.tHDFSInput_1Process(tes_001.java:741)
at local_project.tes_001_0_1.tes_001.runJobInTOS(tes_001.java:1136)

Any help would be appreciated.
