Error while running tHDFSCopy


Hi everybody,
I am testing Talend Open Studio for Big Data 5.5.0 with a Hadoop cluster on AWS (Cloudera distribution, version CDH4.4.0). I have a file called customer.csv that I am trying to copy from my home directory to a subdirectory called /new. I set up a job that consists of a single component, tHDFSCopy. The job runs for a while, produces an EMPTY customer.csv in the target directory, and ends with the following error:
Exception in component tHDFSCopy_1
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-425321293-10.15.244.108-1401446443266:blk_8047645766350991207_142708 file=/user/kpopov/customer.csv
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:839)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:531)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:750)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:794)
at java.io.DataInputStream.read(Unknown Source)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:78)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:52)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:260)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:232)
at copyfileinhdfs.copyfileinhdfs_0_1.CopyFileInHDFS.tHDFSCopy_1Process(CopyFileInHDFS.java:339)
at copyfileinhdfs.copyfileinhdfs_0_1.CopyFileInHDFS.runJobInTOS(CopyFileInHDFS.java:589)
Can anyone tell me what is going on?

Re: Error while running tHDFSCopy

Have you checked the scenario - https://help.talend.com/pages/viewpage.action?pageId=9310644#ychen-20120907-bigdata-thdfslist_scenar...
Is the connection OK?
Vaibhav

Re: Error while running tHDFSCopy

Yes, the connection is correct: the IP is right, the port (8020) is right, and the Hadoop version is correct. When I open the Component tab and click the button next to the File Name field, Open Studio connects to HDFS fine and lets me choose a directory for the file to be copied to. The only problem is, as I said before, that the copied file turns out to be empty and Open Studio ends the job with the error.

Re: Error while running tHDFSCopy

OK, it seems the issue was that port 50010, which the datanode uses for data transfer, was closed.
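This also explains the symptoms: browsing directories and creating the target file only need the namenode (port 8020), while reading the file's blocks requires a direct connection from the client to a datanode. A quick way to check both from the machine running Open Studio is a plain TCP connection test. The snippet below is only a minimal sketch: the function name and the host names are placeholders I made up, and 50010 is the default data transfer port on CDH4, so your cluster may use a different one.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical host names: replace with your own namenode and datanode.
    checks = [
        ("namenode.example.com", 8020),   # namenode RPC (metadata, browsing)
        ("datanode.example.com", 50010),  # datanode data transfer (block reads)
    ]
    for host, port in checks:
        state = "open" if port_is_reachable(host, port) else "NOT reachable"
        print(f"{host}:{port} is {state}")
```

If 8020 is open but 50010 is not, the firewall or AWS security group in front of the datanodes is the first thing to look at. Note that this only tests TCP reachability; it does not prove that HDFS itself is healthy.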
Moderator

Re: Error while running tHDFSCopy

Hi kpopov,
Is the component working for you now? If the issue is fixed, may I ask you to click the "Set this topic as resolved" link, which is right underneath your initial post? This way, other users will know that this thread has been resolved.
Many thanks
Best regards
Sabrina