I'm looking for a better approach to incremental loads into Redshift. The source might be Hive, S3, or any database.
I would like to do the updates without deletes. Please give me a high-level overview of which components and approach would work best.
Updating data is a costly operation in any data warehouse, and Redshift is no exception. You can use the tRedshiftOutput component to perform the update. However, it would be a better idea to design your Redshift tables so that they accept multiple records for the same underlying customer information, and then pick only the latest record for each id (even if multiple records are present) for downstream operations. Since Redshift has no conventional indexes, choose your sort and distribution keys accordingly.
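To illustrate the "keep every version, read only the latest" idea, here is a minimal sketch. The table and column names (`customer_history`, `id`, `updated_at`) are hypothetical, and the SQL string is only one possible way to express the query using the `ROW_NUMBER()` window function. It is not executed here. The Python function applies the same selection rule to in-memory rows so the logic can be checked locally.

```python
from datetime import datetime

# Hypothetical query against an append-only table. The names
# customer_history, id and updated_at are assumptions for illustration.
LATEST_PER_ID_SQL = """
SELECT *
FROM (
    SELECT h.*,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn
    FROM customer_history h
) ranked
WHERE rn = 1;
"""


def latest_per_id(rows):
    """Return one row per id, keeping the row with the greatest updated_at."""
    latest = {}
    for row in rows:
        current = latest.get(row["id"])
        # Keep the row if this id is new or this version is more recent.
        if current is None or row["updated_at"] > current["updated_at"]:
            latest[row["id"]] = row
    return list(latest.values())


# Sample data: id 1 arrives twice, as it would after an incremental load.
rows = [
    {"id": 1, "name": "Ann", "updated_at": datetime(2020, 1, 1)},
    {"id": 2, "name": "Bob", "updated_at": datetime(2020, 1, 2)},
    {"id": 1, "name": "Ann Lee", "updated_at": datetime(2020, 3, 1)},
]
result = latest_per_id(rows)
```

With this design each incremental load only appends rows, and the cost of resolving duplicates moves to read time, so it helps if the load always populates a reliable timestamp or version column.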