Five Stars

How to use hdfs or hadoop commands in Big Data Sandbox

I'm trying to put my local files into HDFS through the terminal in the Big Data Sandbox, but I'm getting an error saying the hdfs or hadoop commands aren't found. I understand the Cloudera cluster is running as Docker containers, but I would like to know if there is any way we can access the Hadoop cluster through the terminal. Appreciate your help!

4 REPLIES
Eight Stars

Re: How to use hdfs or hadoop commands in Big Data Sandbox

Hello,

You need to set your PATH variable so the system knows where your binaries are located.
Please read this:
https://askubuntu.com/questions/60218/how-to-add-a-directory-to-the-path
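
For example, if the Hadoop client binaries were installed somewhere on the host, you could add their directory to PATH (the path below is only an illustration, adjust it to wherever the binaries actually live):

# hypothetical location of the hadoop binary on the host; adjust as needed
export PATH=$PATH:/usr/lib/hadoop/bin
hadoop version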

Regards
Lojdr
Five Stars

Re: How to use hdfs or hadoop commands in Big Data Sandbox

Appreciate your response. The concern is that in the Big Data Sandbox provided by Talend, I couldn't even find the binaries. Are they pre-installed somewhere in the sandbox?

Employee

Re: How to use hdfs or hadoop commands in Big Data Sandbox

Hello! When running a cluster in Docker, the commands are installed inside the Docker container (and usually not on the host).

This means that you need to execute the command in the specific container. Fortunately, Docker lets you do this, but the exact invocation depends heavily on how the container was configured. I don't have a running sandbox available, but I suggest trying:

docker exec docker_cdh580_1 gosu hdfs hadoop fs -mkdir -p /user/newuser

In this example, docker_cdh580_1 is the name of the container running the cluster, gosu hdfs runs the command as the hdfs user, and everything that follows is the command to run. You can get the name of the container by running docker ps (and hopefully the image name makes it obvious which one it is!)
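
Since your original goal was to get local files into HDFS, one way to do that (untested here, and the file names and paths below are only placeholders) is to copy the file into the container first and then push it into HDFS:

# copy a local file into the container (placeholder paths)
docker cp /path/on/host/data.csv docker_cdh580_1:/tmp/data.csv
# push it into HDFS as the hdfs user
docker exec docker_cdh580_1 gosu hdfs hadoop fs -put /tmp/data.csv /user/newuser/

Depending on the file permissions after the copy, the hdfs user inside the container may need read access to /tmp/data.csv.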

 

I hope this helps, and let us know if it works!

Five Stars

Re: How to use hdfs or hadoop commands in Big Data Sandbox

Thank you so much, that worked like a charm! The container name is cluster1_cdh580_1, so this is the command I had to use:

docker exec cluster1_cdh580_1 gosu hdfs hadoop fs -mkdir -p /user/newuser
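
In case it helps anyone else, you can also check the result afterwards with the same pattern (same container name as above):

# list /user inside the container to confirm the new directory is there
docker exec cluster1_cdh580_1 gosu hdfs hadoop fs -ls /user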