One Star

Connecting to HDFS system from Windows environment... <Newbie to BD>

Hi,
I am a newbie to TOS_BD. I was trying to connect to a remote HDFS system and write some files there, but I keep getting errors. I have attached the job for the flow; please have a look at it and suggest a fix.

Process flow description:
-------------------
1) Connected to the remote Linux system using tSSH
2) Used tHDFSConnection to connect to the HDFS system on that machine
3) Used tHDFSPut to put the file into HDFS

Error Log:
-------------------
Starting job test02 at 12:04 13/08/2012.

connecting to socket on port 3489
connected
Talend Open Studio
Exception in component tHDFSPut_1
java.net.UnknownHostException: unknown host: cmtest001
at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:195)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:850)
at org.apache.hadoop.ipc.Client.call(Client.java:720)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy0.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
at test_proj.test02_0_1.test02.tHDFSPut_1Process(test02.java:530)
at test_proj.test02_0_1.test02.tHDFSConnection_1Process(test02.java:469)
at test_proj.test02_0_1.test02.tSSH_1Process(test02.java:382)
at test_proj.test02_0_1.test02.runJobInTOS(test02.java:862)
at test_proj.test02_0_1.test02.main(test02.java:730)
disconnected
Job test02 ended at 12:04 13/08/2012.

11 REPLIES
One Star

Re: Connecting to HDFS system from Windows environment... <Newbie to BD>

Can TOS_BD connect to a remote HDFS?
--
Regards,
Vinod
One Star

Re: Connecting to HDFS system from Windows environment... <Newbie to BD>

Any help would be appreciated.
--
Regards,
Vinod
Employee

Re: Connecting to HDFS system from Windows environment... <Newbie to BD>

I haven't tried with tSSH, but TOSBD certainly works when that's not a requirement. See the Talend channel on YouTube for a good number of big data examples. None of them use tSSH, though.
Ciaran
One Star

Re: Connecting to HDFS system from Windows environment... <Newbie to BD>

Thanks a lot for the reply.
Unfortunately I cannot access YouTube from here, but I will browse through the videos later.
From your reply I understand that it certainly works when Talend is installed in the BD environment, on the same Linux box.
I will get back to you after trying the suggested options. At the moment I am planning to reformat my system to free up space for a Hadoop environment where I can test HDFS get/put.
--
Vinod
Employee

Re: Connecting to HDFS system from Windows environment... <Newbie to BD>

Hi
I run Hadoop on a VM and connect using TOSBD over the HDFS and/or the Templeton interface. TOSBD doesn't require you to use SSH.
Ciaran
One Star

Re: Connecting to HDFS system from Windows environment... <Newbie to BD>

Hi Ciaran,
I believe the VM and HDFS are in the same environment, which may be why no authentication is required. But what about accessing the HDFS server from a totally different location? In my case I am running TOS_BD on Windows XP, and from this box I am trying to copy some files to a different machine, a Hadoop server that is geographically located somewhere else.
As I understand it, that should not cause any problems, as I have all the access credentials.
It is also a real need for us: we are planning to migrate all our files to our Hadoop environment for many reasons. One is to place all the files in a high-speed central location so that later, if required, we can perform analytics and use the features of the big data environment.
--
Regards,
Vinod
One Star

Re: Connecting to HDFS system from Windows environment... <Newbie to BD>

Hi Vinod,
I am trying to do the same: connecting my TOS (my OS: Windows XP) to a remote Ubuntu system where HDFS is located.
I am getting the same error you posted.
Any help would be appreciated.
Regards,
Nayan.
Employee

Re: Connecting to HDFS system from Windows environment... <Newbie to BD>

Hello,
It's totally possible to connect to HDFS from a Windows client (Talend on Windows and a remote cluster).
Looking at your error, there are several possibilities:
1 - Your namenode hostname and the namenode IP are not mapped correctly on the namenode side (/etc/hosts file)
2 - Your namenode hostname and the namenode IP are not mapped correctly on the client side (hosts file in the C:/Windows/System32/drivers/etc folder)
3 - You have not set the right parameters in the component. What did you put in the namenode URI parameter?
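
Possibilities 1 and 2 both come down to a missing or wrong line in a hosts file. As a minimal sketch, assuming the usual hosts format (an IP address followed by one or more names, with `#` starting a comment), the Java helper below looks up a hostname in hosts-file lines. The class name `HostsFileCheck` and the address `192.0.2.10` are placeholders for illustration, not values from this thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class HostsFileCheck {

    /** Returns the IP mapped to host in hosts-format lines, if any. */
    public static Optional<String> lookup(List<String> lines, String host) {
        for (String raw : lines) {
            // Drop trailing comments and surrounding whitespace.
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            if (line.isEmpty()) {
                continue;
            }
            // The first field is the IP; the remaining fields are names for it.
            String[] fields = line.split("\\s+");
            for (int i = 1; i < fields.length; i++) {
                if (fields[i].equalsIgnoreCase(host)) {
                    return Optional.of(fields[0]);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical hosts content; 192.0.2.10 is a documentation-only address.
        List<String> lines = Arrays.asList(
            "127.0.0.1   localhost",
            "# Hadoop namenode",
            "192.0.2.10  cmtest001");
        System.out.println(lookup(lines, "cmtest001").orElse("no entry"));
    }
}
```

To check a real file, the same method can be fed the result of `Files.readAllLines` on the hosts path mentioned in possibility 2.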
Rémy.
One Star

Re: Connecting to HDFS system from Windows environment... <Newbie to BD>

Sh.t! Do Talend's helpdesk people know what their tools are made for? They just keep replying; it's a waste of time!
Four Stars

Re: Connecting to HDFS system from Windows environment... <Newbie to BD>

"java.net.UnknownHostException: unknown host: cmtest001"
That is your problem. Your laptop is unable to resolve the hostname you provided. Use the IP address, add the machine to your hosts file, or sort out your DNS.
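
A quick way to confirm this outside the job is to run the same kind of lookup on the client machine. The sketch below, assuming a namenode URI of the form `hdfs://host:port`, extracts the host and asks the JVM to resolve it. The class name `NamenodeHostCheck` and the port `8020` are illustrative assumptions; use whatever is set in your tHDFSConnection component.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;

public class NamenodeHostCheck {

    /** Extracts the host part of a URI such as hdfs://cmtest001:8020. */
    public static String hostOf(String namenodeUri) {
        return URI.create(namenodeUri).getHost();
    }

    /** True if this machine can turn the name into an IP address. */
    public static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The port here is an assumption; pass your own URI as the first argument.
        String uri = args.length > 0 ? args[0] : "hdfs://cmtest001:8020";
        String host = hostOf(uri);
        System.out.println(host + (resolves(host) ? " resolves" : " does not resolve"));
    }
}
```

If this reports that the name does not resolve on the Windows box, the job will keep failing the same way until the name is resolvable or replaced by the IP address.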
One Star

Re: Connecting to HDFS system from Windows environment... <Newbie to BD>

Hi, I am working on a POC for a client and can't get Hadoop big data mapping working. Any ideas before I assume it won't work?
Thanks
Hadoop is running on AWS (4.6) and I haven't modified any configurations. I can run standard Hive jobs OK, so it is not a cluster issue.
I am running the evaluation client Studio locally (I can access the namenode URI and SSH to the namenode).
Also, in the Hadoop config I left the staging directory at its default value, since the XML files do not contain the entries mentioned in the Talend documentation. I am using the default user hdfs to execute.
This is the error (I was getting a winutils error, but installed winutils to a bin directory to get around that one):
Starting job SimpleJobMapR at 10:33 13/10/2016.
: org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
connecting to socket on port 3571
connected
java.io.IOException: DataStreamer Exception:
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:574)
Caused by: java.nio.channels.UnresolvedAddressException
    at sun.nio.ch.Net.checkAddress(Unknown Source)
    at sun.nio.ch.SocketChannelImpl.connect(Unknown Source)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1548)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1324)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1277)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:454)
disconnected
Job SimpleJobMapR ended at 10:34 13/10/2016.
This is the map screen shot after running
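
For context on the trace above: `java.nio.channels.UnresolvedAddressException` is what `SocketChannel.connect` throws when it is handed a socket address whose hostname was never resolved to an IP. The frames show this happening while the client opens a socket for the write pipeline, so one thing worth checking is whether the local machine can resolve the hostnames the cluster hands back. The sketch below only reproduces the exception itself; `datanode.invalid` and port `50010` are made-up values, not taken from this job.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.channels.SocketChannel;
import java.nio.channels.UnresolvedAddressException;

public class UnresolvedAddressDemo {

    /** True if connecting to the address fails because it is unresolved. */
    public static boolean failsAsUnresolved(InetSocketAddress address) {
        try (SocketChannel channel = SocketChannel.open()) {
            channel.connect(address);
            return false;
        } catch (UnresolvedAddressException e) {
            // Same exception type as the "Caused by" line in the job log.
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // createUnresolved skips the DNS lookup, so no network access is needed.
        InetSocketAddress address =
            InetSocketAddress.createUnresolved("datanode.invalid", 50010);
        System.out.println("unresolved: " + address.isUnresolved());
        System.out.println("connect fails as unresolved: " + failsAsUnresolved(address));
    }
}
```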