I am trying to move data from several files in an S3 bucket into Redshift tables, and I would like to know the best approach.
Note that I can COPY the files from S3 into Redshift tables without Talend and it works great. However, I want to use Talend because I will be running incremental loads (without dropping or truncating the target tables).
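For context, the incremental pattern I want the Talend job to drive is the usual Redshift staging-table merge, sketched below. The table names (sales, staging_sales), the id key, the bucket path, and the credentials are placeholders:

```sql
-- Sketch of the incremental (merge) load, assuming an "id" business key.
-- Table names, bucket path, and credentials are placeholders.

-- Start each run with an empty staging table.
TRUNCATE staging_sales;

BEGIN;

-- 1. Bulk-load the new S3 files into the staging table.
COPY staging_sales
FROM 's3://my-bucket/incoming/'
CREDENTIALS 'aws_access_key_id=<access-key>;aws_secret_access_key=<secret-key>'
DELIMITER '|';

-- 2. Remove target rows that the new batch supersedes.
DELETE FROM sales
USING staging_sales
WHERE sales.id = staging_sales.id;

-- 3. Append the new batch.
INSERT INTO sales SELECT * FROM staging_sales;

COMMIT;
```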
I have tried the following options:
1. tS3List --> tRedshiftBulkExec: fails with a "Delimiter not found" error. I am still trying to identify why, because the equivalent COPY command works when run directly (see the sketch after this list).
- tRedshiftBulkExec also requires me to re-enter my S3 credentials. This is frustrating, as I already have an S3 connection in the job, and it would be great if the component could simply reuse it.
2. tRedshiftRow: I enter the COPY command here (along the lines of the sketch after this list) and it works. However, hand-writing SQL is not ideal inside an ETL tool; I might as well use basic scripting. Ideally option 1 should work.
3. tS3Get (downloads to local disk, ugh!) --> tFileInput (reads the local copy) --> tRedshiftOutput: routing the data through the local filesystem defeats the purpose of a direct S3-to-Redshift load.
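For reference, the COPY statement I use in tRedshiftRow (option 2) looks roughly like the sketch below, and it is what tRedshiftBulkExec should be generating in option 1. Bucket, table, and credential values are placeholders; the DELIMITER has to match the files exactly, and querying STL_LOAD_ERRORS after a failed bulk load shows the exact file, line, and reason behind a "Delimiter not found" error:

```sql
-- The COPY statement I run in tRedshiftRow; it works when executed directly.
-- Bucket, table, and credential values below are placeholders.
COPY my_table
FROM 's3://my-bucket/data/'
CREDENTIALS 'aws_access_key_id=<access-key>;aws_secret_access_key=<secret-key>'
DELIMITER '|'       -- must match the files exactly, or COPY raises "Delimiter not found"
IGNOREHEADER 1      -- only if the files carry a header row
MAXERROR 0;         -- fail on the first bad row (the default)

-- After a failed load, Redshift records the offending file, line, and reason here.
SELECT starttime, filename, line_number, colname, err_reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 10;
```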
What is the best way to accomplish this using Talend Open Studio for Big Data 6.1.1?