Is there a way to load data to redshift in Talend spark job without using S3 ?


Hello All,
Is there a way to load data to Redshift in a Talend Spark job without using S3?
In a Spark job (Big Data Batch job), tRedshiftConfiguration looks for a tS3Configuration by default.

Thanks
Vijay


Re: Is there a way to load data to redshift in Talend spark job without using S3 ?

Hi,

 

The Talend Big Data job components for Redshift mandate the use of S3 components as part of the data load. The same applies if you use the bulk components for Redshift in a Standard job.

 

The only component that lets you load directly is tRedshiftOutput in a Standard job, but it is not advised for huge data volumes.
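For context, here is a minimal sketch of what a direct JDBC load into Redshift amounts to (the cluster endpoint, credentials, table, and columns are hypothetical placeholders). Every row travels through an individual INSERT over JDBC, which is why this path does not scale to large volumes the way a bulk COPY from S3 does:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Sketch of a direct row-by-row JDBC load into Redshift.
// Endpoint, credentials, and table are hypothetical placeholders.
public class DirectRedshiftLoad {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:redshift://example-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev";
        try (Connection conn = DriverManager.getConnection(url, "dbuser", "dbpass");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO sales (id, amount) VALUES (?, ?)")) {
            for (int i = 0; i < 10_000; i++) {
                ps.setInt(1, i);
                ps.setDouble(2, i * 1.5);
                ps.addBatch();
                // Batching reduces round trips, but each row is still an INSERT;
                // a COPY from S3 loads staged files in parallel and is far faster at scale.
                if (i % 1_000 == 999) ps.executeBatch();
            }
            ps.executeBatch();
        }
    }
}
```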

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving kudos for sharing their time on your query. If your query is answered, please mark the topic as resolved.


Re: Is there a way to load data to redshift in Talend spark job without using S3 ?

Thank you, Nikhil, for your response.

 

tS3Configuration in a Big Data Batch job uses the s3a file system to enable inheriting credentials from an AWS role.

Is there any option to use the s3n file system with the inherit-credentials option, i.e., without providing an access key and secret key?
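As far as I know, s3n only accepts static keys (fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey), so there is no role-based equivalent there. Below is a hedged sketch of the s3a setting that makes the connector read credentials from the EC2/EMR instance profile instead of a key pair; the bucket and app name are hypothetical, and tS3Configuration presumably sets a similar property under the hood:

```java
import org.apache.spark.sql.SparkSession;

// Sketch: have the s3a connector resolve credentials from the instance
// profile (IAM role) instead of a static access key / secret key pair.
// Bucket and application names are hypothetical.
public class S3aInstanceProfileDemo {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("s3a-instance-profile-demo")
                .getOrCreate();

        // s3a supports pluggable credential providers; this one reads the
        // role credentials from the EC2 instance metadata service.
        spark.sparkContext().hadoopConfiguration().set(
                "fs.s3a.aws.credentials.provider",
                "com.amazonaws.auth.InstanceProfileCredentialsProvider");

        // s3n, by contrast, only understands static keys:
        //   fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey
        spark.read().text("s3a://my-staging-bucket/input/").show(5);
    }
}
```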

 

The problem is: when we check the s3a file system option in tS3Configuration, we get access errors while running the Spark job on EMR.

 

Error:

java.nio.file.AccessDeniedException: s3a://<<Location>>/_temporary/0: innerMkdirs on s3a://<<location>>/_temporary/0: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied;

 

In my organization, they are not willing to share the access key and secret key with all users; instead, we have to use roles to execute jobs.

I see there is no assume-role option like the one in the bulk load components in Standard jobs.

 

If the s3a file system is the only option without an access key and secret key, what are those temp folders, and what permissions does the instance role or Redshift role need to execute the job?
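For what it's worth, the _temporary directories come from Hadoop's FileOutputCommitter: Spark stages task output under <output path>/_temporary and then renames it into place, so the role needs to list the bucket and read, write, and delete objects under the staging prefix (roughly s3:ListBucket on the bucket plus s3:GetObject, s3:PutObject, and s3:DeleteObject on the prefix). A quick diagnostic sketch with the AWS SDK for Java v1, using a hypothetical bucket and prefix, to verify the instance role before running the job:

```java
import com.amazonaws.auth.InstanceProfileCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

// Diagnostic sketch: check that the instance role can perform the S3
// operations the Spark committer needs. Bucket, prefix, and region are
// hypothetical placeholders.
public class StagingPrefixCheck {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withCredentials(new InstanceProfileCredentialsProvider(false))
                .withRegion("us-east-1")
                .build();

        String bucket = "my-staging-bucket";
        String key = "redshift-staging/_temporary/perm-check.txt";

        s3.putObject(bucket, key, "ok");                       // needs s3:PutObject
        System.out.println(s3.getObjectAsString(bucket, key)); // needs s3:GetObject
        s3.listObjectsV2(bucket, "redshift-staging/");         // needs s3:ListBucket
        s3.deleteObject(bucket, key);                          // needs s3:DeleteObject
    }
}
```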

 

Please provide some information on S3 usage when loading data to Redshift in a Talend Big Data job. Thanks again.

 


Re: Is there a way to load data to redshift in Talend spark job without using S3 ?

Hi,

 

Unfortunately, my view is that it is not possible in a Big Data Spark job at the moment. I would recommend either raising a support case or creating a JIRA ticket:

 

https://jira.talendforge.org

 

So I would recommend a hybrid approach (a Talend Standard job plus a Big Data job) as a workaround.
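In that hybrid setup, the bulk-load step in the Standard job ultimately boils down to a COPY from S3 that can authenticate with an IAM role rather than keys, which would sidestep the access key / secret key restriction. A rough sketch over JDBC; the cluster endpoint, table, bucket, and role ARN are all hypothetical:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Sketch of a role-based bulk load: files are staged in S3 and Redshift
// pulls them with COPY. All identifiers below are hypothetical.
public class RedshiftCopyWithRole {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:redshift://example-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev";
        try (Connection conn = DriverManager.getConnection(url, "dbuser", "dbpass");
             Statement st = conn.createStatement()) {
            // IAM_ROLE avoids embedding an access key / secret key in the job;
            // the role must be attached to the cluster and allowed to read the bucket.
            st.execute(
                "COPY sales FROM 's3://my-staging-bucket/redshift-staging/' " +
                "IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole' " +
                "FORMAT AS CSV");
        }
    }
}
```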

 

Warm Regards,
Nikhil Thampi

