Moving files on HDFS supported?

Hello,
I was wondering whether moving files on HDFS is supported by Talend.
1) I tried the HDFS rename component, but it fails when the rename changes the parent location. For example, this fails:
move /root/path1/folder123 /root/path2/folder123
Renaming the folder itself within the same parent works, though:
move /root/path1/folder123 /root/path1/folder456
2) I tried the HDFS copy component with different configurations, but it appears to perform a copy in every case. I tried several variations, including checking the "Rename" box and the "Remove Source" box, but none of them did an actual move; they all copy the data, which is very expensive on HDFS.
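For reference, the kind of move I am after is what the HDFS shell does with `hdfs dfs -mv`, which renames the path rather than copying the data. A minimal sketch in Python, assuming the `hdfs` CLI is on the PATH; `move_command` and `move` are hypothetical helper names, not anything from Talend:

```python
import subprocess
from typing import List


def move_command(src: str, dst: str) -> List[str]:
    """Build the HDFS shell command for a rename-style move."""
    return ["hdfs", "dfs", "-mv", src, dst]


def move(src: str, dst: str) -> None:
    """Run the move; raises CalledProcessError if the hdfs CLI reports failure."""
    subprocess.run(move_command(src, dst), check=True)
```

Calling `move("/root/path1/folder123", "/root/path2/folder123")` would correspond to the operation in 1).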
Any suggestions?

Re: Moving files on HDFS supported?

Any suggestions on this one?

Re: Moving files on HDFS supported?

Well, since no one else has answered the question, I (while sitting next to you) will answer it for you and, more importantly, for everyone else who might be interested.
The method we use to "move" files or directories on HDFS is the tHdfsCopy component, which is relatively straightforward. However, there is one big caveat (it might be a bug). On the first run it copies the source directory into the target location perfectly, but on the second run it nests the source directory one level deeper on the target side, and on additional runs it errors out.
Example:
Source Directory Path: /user/admin/srcDir/{file1, file2, file3, dir1, dir2}
Target Directory Path: /user/admin/targetDir
Using tHdfsCopy, fill in "Source File or Directory" with the Source Directory Path defined above, then fill in "Target Location" with the Target Directory Path defined above (don't check "Remove Source"). The first time we run this component, the following happens:
Source Directory Path: /user/admin/srcDir/{file1, file2, file3, dir1, dir2}
Target Directory Path: /user/admin/targetDir/srcDir/{file1, file2, file3, dir1, dir2}
The result is great. However, if you run it a second time this happens:
Source Directory Path: /user/admin/srcDir/{file1, file2, file3, dir1, dir2}
Target Directory Path: /user/admin/targetDir/srcDir/srcDir/{file1, file2, file3, dir1, dir2}
See how it adds a nested srcDir. If you run it a third time, it errors out saying something like "the directory already exists". Toggling the "Override Target File" option doesn't change anything. The solution we (you and I together) came up with was the following flow:
tHdfsExists -- if true  --> tHdfsDelete target location --> tHdfsCopy srcDir to targetDir --> done.
            -- if false --> tHdfsCopy srcDir to targetDir --> done.
This ensures that no odd nesting happens if a directory with the same name already exists on the target side. However, be careful with this, because if there's something in that directory you need... well, you just deleted it.
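The same exists/delete/copy flow can be sketched outside Talend with the HDFS shell. This is only a rough sketch under a few assumptions: the `hdfs` CLI is on the PATH, targetDir already exists, and the path that gets checked and deleted is the landing path targetDir/srcDir rather than the whole target location. `run_hdfs` and `replace_copy` are made-up names for illustration.

```python
import posixpath
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_hdfs(args: Sequence[str]) -> int:
    """Run `hdfs dfs <args>` and return its exit code."""
    return subprocess.run(["hdfs", "dfs", *args]).returncode


def replace_copy(src: str, target_dir: str, run: Runner = run_hdfs) -> str:
    """Copy src into target_dir, deleting any previous copy first.

    Assumption: target_dir exists, so the copy lands at target_dir/<name of src>.
    """
    landing = posixpath.join(target_dir, posixpath.basename(src.rstrip("/")))

    # tHdfsExists: `-test -e` exits with 0 when the path exists.
    if run(["-test", "-e", landing]) == 0:
        # tHdfsDelete: this wipes whatever is already at the landing path.
        if run(["-rm", "-r", landing]) != 0:
            raise RuntimeError("could not delete " + landing)

    # tHdfsCopy: copy the source directory into the target directory.
    if run(["-cp", src, target_dir]) != 0:
        raise RuntimeError("could not copy " + src + " to " + target_dir)
    return landing
```

With the paths from the example, `replace_copy("/user/admin/srcDir", "/user/admin/targetDir")` would check and clear /user/admin/targetDir/srcDir before copying. Passing a different `run` function makes it possible to dry-run the sequence without touching the cluster.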
Hopefully, other people will find this response helpful. As I mentioned, Nicholas and I are colleagues.