I am working on a Data Acquisition System and I only want to update the records that have changed. I noticed one of the settings for tPostgresqlOuput is to Update/Insert. How does it compare the incoming records with the record at the source? How does it know what to update? What's the where clause in the update statement? Thanks for your response ahead of time.
How does it compare the incoming records with the record at the source?
As you didn't say if you're using a Java or a Perl project, I'll answer for the 2 generated languages. First in Perl. There is not "update or insert" and "insert or update" data action for PostgreSQL because of the atomic transaction mode of this RDBMS. We only have "insert or update". 1. load a cache with all existing keys in the destination table 2. for each incoming line of the data flow, checks if the key is in the cache. Perform an insert if the key is not in the cache, an update is the key is already in the cache. Now in Java, the algorithm is slightly different. For "insert or update", foreach input line: 1. perform a "select count(*) from table where id = <line_id>" 2. if count > 0, perform an update, else perform an insert For "update or insert", foreach input line: 1. perform an update 2. if no line updated, perform an insert Conclusion: Perl algorithm should be faster but memory consuming and dangerous in "concurrent mode". Java algorithm is slower (one more query per data flow line) but takes no memory and should be safer in "concurrent mode".
How does it know what to update? What's the where clause in the update statement?
This is the same for Perl and Java now. By default, if a column of the schema is marked as "key", it is used in the where clause, else it's a column to update. The behaviour can be changed thanks to 2063 (in Perl, only for tMysqlOutput, add a feature request to have it in PostgreSQL) and 2759 (in Java, all existing database connectors). With the "field options", you can select with precision which columns are in "set" list (field is "updatable) and "where" clause (field is in the "update key"). As you can read in the bugtracker, the "field option" feature is available since 2.3.0 only (very recent feature).
In this post are a lot of wrong statements! 1. PostgreSQL does not perform atomic statements per default. 2. It is probably a very bad idea to load all keys into the memory to check if insert is possible! And I am pretty shure, perl wouldn't do this per default. By the way Perl is gone in Talend. An incremental load is a design pattern and depends on the way how to extract the source data. It is not a matter of insert or update table action.