Six Stars

tHDFSGet all files in directory

Hi,

Is it possible to get all files in an HDFS directory and save them as a single file or as multiple files on a local machine?

I want to extract files from a directory using regular expressions, but it doesn't seem to work. The documentation here (https://help.talend.com/reader/g8zdjVE7fWNUh3u4ztO6Dw/PUKLf_wAqRMmwe4w~Lw1wA) says that regular expressions are supported in filemasks, though.

I'm basically trying to grab files that match ".+part-.*" inside a directory (iterating through subdirectories).

These files are the output of the tFileOutputDelimited component in a Spark Streaming job.

Thank you.


Accepted Solutions
Six Stars

Re: tHDFSGet all files in directory

Have you tried tHDFSList? It lets you specify a filemask (glob or regex) and iterate through the files, directories, and subdirectories of a specific HDFS location. You could then pass the global variable

((String)globalMap.get("tHDFSList_1_CURRENT_FILEPATH"))

to the "HDFS directory" property of tHDFSGet.
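
For what it's worth, here is a rough standalone Java sketch of what the mask and the global variable do. It is not Talend code: the HashMap only stands in for globalMap, the paths are made up, and the class and helper names (HdfsListSketch, maskMatches, parentDirectory) are hypothetical. Whether tHDFSList applies the mask to the bare file name or to the full path is an assumption you would need to check, and it matters here, because ".+part-.*" requires at least one character before "part-".

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class HdfsListSketch {
    // Regex from the question. matches() must consume the whole input string.
    static final Pattern MASK = Pattern.compile(".+part-.*");

    static boolean maskMatches(String candidate) {
        return MASK.matcher(candidate).matches();
    }

    // Same cast as in the reply; the map is a plain stand-in for Talend's globalMap.
    static String currentFilePath(Map<String, Object> globalMap) {
        return ((String) globalMap.get("tHDFSList_1_CURRENT_FILEPATH"));
    }

    // Directory part of an HDFS-style path; falls back to "/" when there is no parent.
    static String parentDirectory(String filePath) {
        int slash = filePath.lastIndexOf('/');
        return slash > 0 ? filePath.substring(0, slash) : "/";
    }

    public static void main(String[] args) {
        // Hypothetical paths, as a recursive listing might produce them.
        List<String> listed = Arrays.asList(
                "/user/demo/out/batch1/part-00000",
                "/user/demo/out/batch1/_SUCCESS",
                "/user/demo/out/batch2/part-00000");

        Map<String, Object> globalMap = new HashMap<>();
        for (String path : listed) {
            // Simulates what the list component would publish on each iteration.
            globalMap.put("tHDFSList_1_CURRENT_FILEPATH", path);
            String current = currentFilePath(globalMap);
            if (maskMatches(current)) {
                System.out.println(parentDirectory(current) + " -> " + current);
            }
        }

        // The bare file name does not match, because ".+" needs a leading character.
        System.out.println(maskMatches("part-00000")); // false
    }
}
```

If the mask turns out to be applied to the file name only, a pattern like "part-.*" (or the glob "part-*") would be the one to try instead.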

Six Stars

Re: tHDFSGet all files in directory

Thank you! I happened to stumble across an example at the bottom of the documentation too. I didn't know that autocomplete also works in the component fields.