Hi everybody,

I am testing Talend Open Studio for Big Data 5.5.0 with a Hadoop cluster on AWS (Cloudera distribution, CDH4.4.0 version). I have a file called customer.csv, which I am trying to copy from my home directory to a subdirectory called /new. I set up a job that consists of only one component: tHDFSCopy. The job runs for a while, produces an EMPTY file customer.csv in the target directory, and ends with the following error:

Exception in component tHDFSCopy_1
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-425321293-10.15.244.108-1401446443266:blk_8047645766350991207_142708 file=/user/kpopov/customer.csv
    at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:839)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:531)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:750)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:794)
    at java.io.DataInputStream.read(Unknown Source)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:78)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:52)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:260)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:232)
    at copyfileinhdfs.copyfileinhdfs_0_1.CopyFileInHDFS.tHDFSCopy_1Process(CopyFileInHDFS.java:339)
    at copyfileinhdfs.copyfileinhdfs_0_1.CopyFileInHDFS.runJobInTOS(CopyFileInHDFS.java:589)

Can anyone tell me what is going on?
Yes, the connection is correct: the IP is right, the port (8020) is right, and the Hadoop version is correct. When I open the Component tab and click the button next to the File Name field, Open Studio connects to HDFS fine and lets me choose a directory for my file to be copied to. The only problem, as I said before, is that the copied file turns out to be empty and Open Studio ends the job with the error.
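
The symptoms described here (browsing directories works, the target file is created but stays empty, and the read then fails with BlockMissingException) would be consistent with a client that can reach the NameNode on port 8020 but cannot reach the DataNodes that hold the file's blocks. That is only a possibility and is not confirmed anywhere in this thread. A minimal sketch for checking it from the machine that runs the job is below. It assumes the DataNode data transfer port is the Hadoop default of 50010, which may differ on a given cluster; the class name PortCheck and the host arguments are hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolvable hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("usage: java PortCheck <namenode-host> <datanode-host>");
            return;
        }
        // 8020 is the NameNode port from the post; 50010 is an assumed default.
        System.out.println("NameNode " + args[0] + ":8020 reachable: "
                + canConnect(args[0], 8020, 3000));
        System.out.println("DataNode " + args[1] + ":50010 reachable: "
                + canConnect(args[1], 50010, 3000));
    }
}
```

If the first check succeeds and the second fails, the client likely cannot read block data even though metadata operations work. A successful result on both would point away from this explanation.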
Hi kpopov,

Is the component working well for you? If the issue is fixed, may I ask you to click the "Set this topic as resolved" link which is right underneath your initial post? This way, other users will be informed that this thread has been resolved.

Many thanks
Best regards
Sabrina