tHDFSGet all files in directory

Six Stars

Hi,

 

Is it possible to get all files in an HDFS directory and save them as a single file or as multiple files on a local machine?

I want to extract files from a directory using regular expressions, but it doesn't seem to work, even though the documentation (https://help.talend.com/reader/g8zdjVE7fWNUh3u4ztO6Dw/PUKLf_wAqRMmwe4w~Lw1wA) says that regular expressions are supported in filemasks.

 

I'm basically trying to grab files that match: ".+part-.*" inside a directory (iterating through subdirectories).
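
One thing worth checking with this mask is which string it is compared against. The thread does not say whether the filemask is applied to the bare file name or to the full path, so the snippet below is only a plain-Java sketch of how the regex itself behaves under full-match semantics, not a description of what the component does internally. The class name and the sample paths are made up for illustration.

```java
import java.util.regex.Pattern;

class FilemaskDemo {
    // The regex from the question.
    static final Pattern MASK = Pattern.compile(".+part-.*");

    // Full-match semantics: the whole candidate string must match the pattern.
    static boolean matchesMask(String candidate) {
        return MASK.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        // Hypothetical sample values, not taken from a real cluster.
        String[] candidates = {
            "part-00000",                  // bare file name
            "/user/demo/out/part-00000",   // full path
            "/user/demo/out/_SUCCESS"      // marker file without "part-"
        };
        for (String c : candidates) {
            System.out.println(c + " -> " + matchesMask(c));
        }
    }
}
```

Because `.+` requires at least one character before `part-`, the bare name `part-00000` does not match, while the full path does. If the mask turns out to be tested against file names only, a pattern such as `part-.*` would be the one to try.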

These files are the output of a tFileOutputDelimited component in a Spark Streaming job.

 

Thank you.

 

Accepted Solutions
Eight Stars

Re: tHDFSGet all files in directory

Have you tried tHDFSList? It lets you specify a filemask (glob or regex) and iterate through the files, directories, and subdirectories of a specific HDFS location. You could then pass the global variable

((String)globalMap.get("tHDFSList_1_CURRENT_FILEPATH"))

to the "HDFS directory" property of tHDFSGet.
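
If the value needs to be split into a directory part and a file-name part before it is used, a small helper along these lines might do. This is a minimal sketch under assumptions: a plain `HashMap` stands in for the job's `globalMap`, the sample path is invented, and the stored value is assumed to be a slash-separated path (it may look different in a real job, for example if it carries a URI scheme). `parentDir` and `fileName` are hypothetical helper names, not Talend routines.

```java
import java.util.HashMap;
import java.util.Map;

class GlobalMapDemo {
    // Returns everything before the last '/', "/" for a file directly
    // under the root, or "" when the path has no directory component.
    static String parentDir(String path) {
        int slash = path.lastIndexOf('/');
        if (slash < 0) return "";
        if (slash == 0) return "/";
        return path.substring(0, slash);
    }

    // Returns the last path segment, i.e. the file name.
    static String fileName(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        // Stand-in for the job's globalMap (assumption for this sketch).
        Map<String, Object> globalMap = new HashMap<>();
        // Invented sample value for the current iteration.
        globalMap.put("tHDFSList_1_CURRENT_FILEPATH", "/user/demo/out/batch-1/part-00000");

        // Same cast expression as in the reply above.
        String current = ((String) globalMap.get("tHDFSList_1_CURRENT_FILEPATH"));

        System.out.println("directory: " + parentDir(current));
        System.out.println("file:      " + fileName(current));
    }
}
```

The cast is needed because the map stores its values as `Object`; the key embeds the component's name, so it changes if the list component is not called `tHDFSList_1`.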


All Replies
Six Stars

Re: tHDFSGet all files in directory

Thank you! I also happened to stumble across an example near the bottom of the documentation page. I didn't know that autocomplete also works in the component fields.
