Integration with Amazon S3

Hello,
I am using Amazon's cloud-based services, EC2 (Elastic Compute Cloud), Redshift, and S3 (Simple Storage Service), to host a data warehouse. To carry out this process, I am using Talend load jobs to transfer files from FTP to Redshift via a three-step process:
i) Fetching of data from FTP and downloading it to EC2
ii) Uploading the data to S3 from EC2
iii) Loading data from S3 to Redshift tables
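For reference, steps ii) and iii) can also be reproduced by hand from the EC2 instance to isolate which step fails. A minimal sketch, assuming a hypothetical bucket name, path, and region (the upload and load commands are commented out because they need real credentials; only the path construction runs):

```shell
# The daily file is date-stamped; Talend's "CCYYMMDD" pattern corresponds
# to %Y%m%d in shell date formatting.
FILE="adv_oli_$(date +%Y%m%d).dat"

# ii) Upload from EC2 to S3. Passing --region (or exporting
#     AWS_DEFAULT_REGION) avoids the "A region must be specified" error.
#     Bucket name, local path, and region below are assumptions.
# aws s3 cp "/data/$FILE" "s3://my-bucket/Acxiom/$FILE" --region us-east-1

# iii) Load from S3 into Redshift, e.g. via psql or tRedshiftRow:
# psql -h my-cluster.redshift.amazonaws.com -p 5439 -d mydb \
#      -c "COPY my_schema.my_table FROM 's3://my-bucket/Acxiom/$FILE' ..."

# Print the key the job would look for, to compare against the bucket:
echo "s3://my-bucket/Acxiom/$FILE"
```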
While executing the Talend load job, I found that step i) executes normally and the files are successfully downloaded to EC2, but step ii) fails. The error log displays an error stating that "a region must be specified --region or specifying the region in a configuration file or as an environment variable."
The detailed error log is shown below:
A region must be specified --region or specifying the region
in a configuration file or as an environment variable.
Alternately, an endpoint can be specified with --endpoint-url
Exception in component tRedshiftRow_1
org.postgresql.util.PSQLException: ERROR: The specified S3 prefix 'Acxiom/adv_oli_20140910.dat' does not exist
  Detail:
  -----------------------------------------------
  error:  The specified S3 prefix 'Acxiom/adv_oli_20140910.dat' does not exist
  code:      8001
  context:  
  query:     226570
  location:  s3_utility.cpp:545
  process:   padbmaster
  -----------------------------------------------
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2096)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1829)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:510)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:372)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:364)
    at talend_jobs.dup_adv_oli_load_with_error_catching_0_1.Dup_adv_oli_Load_with_error_catching.tRedshiftRow_1Process(Dup_adv_oli_Load_with_error_catching.java:2313)
disconnected
    at talend_jobs.dup_adv_oli_load_with_error_catching_0_1.Dup_adv_oli_Load_with_error_catching.tSystem_1Process(Dup_adv_oli_Load_with_error_catching.java:2199)
    at talend_jobs.dup_adv_oli_load_with_error_catching_0_1.Dup_adv_oli_Load_with_error_catching.tSystem_5Process(Dup_adv_oli_Load_with_error_catching.java:2008)
    at talend_jobs.dup_adv_oli_load_with_error_catching_0_1.Dup_adv_oli_Load_with_error_catching.runJobInTOS(Dup_adv_oli_Load_with_error_catching.java:3442)
    at talend_jobs.dup_adv_oli_load_with_error_catching_0_1.Dup_adv_oli_Load_with_error_catching.main(Dup_adv_oli_Load_with_error_catching.java:3290)
adv_oli_20140910.dat is the name of the file to be transferred from EC2 to S3, and
talend_jobs.dup_adv_oli_load_with_error_catching_0_1 is the name of the Talend 'Load' Job.
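For context on the first error message: the AWS CLI refuses to run an S3 command without knowing the region, and there are three ways to supply it (the region value below is an assumption; use the bucket's actual region):

```shell
# 1. Per-command flag:
#      aws s3 cp file.dat s3://bucket/key --region us-east-1

# 2. Environment variable. A command launched from a Talend tSystem
#    component only sees this if it is set in that process's environment,
#    not merely in an interactive shell profile.
export AWS_DEFAULT_REGION=us-east-1

# 3. Config file on the EC2 instance (~/.aws/config):
#      [default]
#      region = us-east-1

echo "$AWS_DEFAULT_REGION"
```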

Any help would be highly appreciated.
Thanks,
Shobhit

Re: Integration with Amazon S3

Hi Shobhit,
Can you show the job design? I'm not sure how you are using the tRedshiftRow_1 component to load data into Redshift.
Please check the example scenario for loading data into Redshift:
https://help.talend.com/pages/viewpage.action?pageId=19426484#tredshiftinputscenario
Please elaborate more on the design side.
Thanks
Vaibhav

Re: Integration with Amazon S3

Hi Vaibhav,
Please find attached the screenshot of the job design.

 

Re: Integration with Amazon S3

PFA

Re: Integration with Amazon S3

I think there is an issue here. S3 is file storage, while tRedshiftRow fires a query against the database. How are you reading a file using a query? Can you show the query that reads the file from S3 and inserts the data into Redshift? Please look at the following scenario:
https://help.talend.com/pages/viewpage.action?pageId=19426484#tredshiftinputscenario
I think you need to download the file to a local drive and then insert it into Redshift.
Please let me know if I have understood this differently.
Thanks
Vaibhav

Re: Integration with Amazon S3

The files in raw format are picked up from FTP and put on S3, as in the screenshot shown. The tSystem_5 component downloads the file from FTP to EC2.
The second component copies the data from EC2 to S3 using the cp utility.
The third component copies the data from S3 to Redshift, using the following query for a fixed-width raw file:
"create table phase2.adv_oli_increment_raw_temp as select top 1 * from phase2.adv_oli_increment_raw_2;
copy phase2.adv_oli_increment_raw_temp from 's3://WoltersKluwerHealthMR/Acxiom/adv_oli_"+TalendDate.getDate("CCYYMMDD")+".dat' CREDENTIALS '' 
 COMPUPDATE ON ACCEPTINVCHARS MAXERROR AS 50 REMOVEQUOTES TRUNCATECOLUMNS;
"
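One thing worth noting: if the bucket lives in a different region from the Redshift cluster, the COPY command also accepts a REGION clause. A sketch of the same statement with it added (credentials elided as in the original; the region value is an assumption):

```sql
COPY phase2.adv_oli_increment_raw_temp
FROM 's3://WoltersKluwerHealthMR/Acxiom/adv_oli_20140910.dat'
CREDENTIALS ''
REGION 'us-east-1'
COMPUPDATE ON ACCEPTINVCHARS MAXERROR AS 50 REMOVEQUOTES TRUNCATECOLUMNS;
```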

Thanks,
Shobhit

Re: Integration with Amazon S3

OK, great. As per the error, the following path prefix does not exist:
The specified S3 prefix 'Acxiom/adv_oli_20140910.dat' does not exist
Do the bucket and the prefix exist in S3? Can you check manually? If they do, use a print statement to check the query,
and test the query using a DB client.
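To check the prefix manually, something like the following helps: it rebuilds the key the same way the Talend job's date expression does and prints it for comparison. The listing command is commented out since it needs credentials, and the region value is an assumption:

```shell
# Rebuild the S3 key exactly as the Talend job's date expression does:
KEY="Acxiom/adv_oli_$(date +%Y%m%d).dat"
echo "Expected object: s3://WoltersKluwerHealthMR/$KEY"

# List the prefix to see what is actually in the bucket (needs
# credentials; region value is an assumption):
# aws s3 ls "s3://WoltersKluwerHealthMR/Acxiom/" --region us-east-1
```

If the printed key differs from what the listing shows, even by one character or by date, the COPY will fail with exactly the "prefix does not exist" error above.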

Vaibhav

Re: Integration with Amazon S3

@Shobhit_G

I would like to know where Talend is installed. Is it on an EC2 instance? I have the same requirement, so this will help me.