Six Stars

How to define tHDFSInput for Hadoop archive (.har) files

I am mapping Hadoop files to an Oracle RDBMS (transferring the files from HDFS to the RDBMS).

I have successfully integrated the files using tHDFSInput with a pre-defined schema, then tMap to tOracleOutput. This works: Talend is able to see the files and transfer them to Oracle.

 

But I have archived files in .har format. I have tried to define the NameNode URI as har://file-myhost:8020//user/job.har, using the format (har://<scheme>-<host>:<port>/path-to-har-folder), but Talend is not able to un-archive the files, nor even list them.
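As a sanity check on the URI itself, Hadoop's documented HAR format is `har://<underlying-scheme>-<host>:<port>/<path-to>/<name>.har[/<file-inside>]`, where the part before the hyphen is the scheme of the filesystem that stores the archive (typically `hdfs` when the archive lives under a NameNode, not `file`). A minimal sketch of building such a URI, using the host, port, and path from the question as placeholder values:

```python
def har_uri(underlying_scheme, host, port, archive_path, inner_path=""):
    """Build a Hadoop Archive URI of the form
    har://<underlying-scheme>-<host>:<port>/<path-to>/<name>.har[/<file-inside>]
    """
    base = f"har://{underlying_scheme}-{host}:{port}{archive_path}"
    # Append the path of a file inside the archive, if one is given
    return f"{base}/{inner_path}" if inner_path else base

# Archive stored on HDFS; "myhost", 8020, and "part-00000" are placeholders
uri = har_uri("hdfs", "myhost", 8020, "/user/job.har", "part-00000")
print(uri)  # har://hdfs-myhost:8020/user/job.har/part-00000
```

Note the single slash before `user` and the `hdfs` scheme; whether tHDFSInput accepts a `har://` URI at all is exactly the open question in this thread.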

 

Does anyone know how I can define the URI and File Name in tHDFSInput?

 

Thanks,

Buks

  • Big Data
  • Data Integration
3 REPLIES
Moderator

Re: How to define tHDFSInput for Hadoop archive (.har) files

Hello,

So far, there is no component to read archive files from HDFS.

There is a new-feature Jira issue about 'Support for Hadoop Archives': https://jira.talendforge.org/browse/PMBD-123

 

Best regards,

Sabrina

--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
Six Stars

Re: How to define tHDFSInput for Hadoop archive (.har) files

When I opened the link, here is what I got (below). How do I request access if the issue still exists?

 

This issue can't be viewed

The issue you're trying to view can't be displayed.
It may have been deleted or you don't have permission to view it right now.

Moderator

Re: How to define tHDFSInput for Hadoop archive (.har) files

Hi,

This new-feature issue is still in progress (on the roadmap). We will keep you posted when it becomes available in a released version.

Best regards,

Sabrina
