I want to implement one functionality in which need to compare source data with destination and depending on the result would like to insert or update at destination tables.
How we can implement this in Talend.
Depending of the target, you may have the choice to decide the operation (insert, update, insert or update, update or insert, upsert and so on).
So, in these cases, you don't have to decide to split the flow giving 2 data sets with 1 operation (insert / update).
If you don't have this facility on your target (Excel for example), you need to read the existing data first, then using a classical tMap with a lookup, get the matching rows on a 1rst flow (for update) and rejected rows on a 2nd flow (for insert).
Hope this helps.
My Target is postgressql and even if 1 field id changes i would like to update and for new records would like to insert.
Do you have any example for this implementation.
If you need a database independent way to know if records in a table have changed, using the free version of Talend (the subscription versions already have CDC components for this) then you can use the following general approach:
Things to be aware of:
Just though I'd preempt any comments/questions regarding generating the CRC/checksum directly in the database instead of in the Talend job.
Most DBMSs have a checksum or similar function, and using this is definitely much more efficient than returning a full result set from a query, and generating a CRC for each row in Talend.
However, each DBMS has its own approach, with everything from MSSQL Server's simple, elegant CHECKSUM function, to MySQL's awful "convert every individual column to a string, handling NULLs correctly, concatenate them all together, and generate a CRC or MD5 hash from this" approach.
Not all data sources will have this of course, and using the approach detailed above, you could even do change detection on data from a spreadsheet or CSV file.
Doing things in a Talend job means you can also join multiple data sources (of completely different types if necessary) and carry out any processing, filtering etc. before generating a CRC of the columns you're interested in tracking changes for, which is really powerful.
If you do decide to generate your checksums on the database server, avoid just doing a CHECKSUM(*) for an entire table. This might appear to be a nice quick way to do things, but if you add/remove fields on the table/query, then the checksums returned for each row will change immediately. Instead, either use a view which explicitly defines the columns to be returned, in which case you *can* do a CHECKSUM(*) on that (recommended) or specify the individual fields in the CHECKSUM function. Doing this will mean that you can add new fields and they wont affect the checksums until you're ready to update the view/query, and can carry out a refresh of the "previous" date as detailed above.
Thanks for the information.
Just to clarify i am not expecting job implementation from anyone here. just requesting direction to some problems which we are facing in implementation. I think this community is to share problems and is someone faced similar issues in the past and all, etc.. we can discuss and help each other..
Absolutely, and my apologies if it sounded like I was suggesting you were expecting this. If I'd felt you were looking for a full implementation rather than just help/suggestions as to how your goal might be accomplished, then I'd simply not have replied to your post at all.
There have been posts recently from people who are certainly looking for someone to do their work for them, and they've blatantly ignored perfectly good, straightforward suggestions as to how they might accomplish their task, and instead asked again for a full solution. My comments in this regard were very much intended for them.
I use the free versions of Talend products for a good deal of my day-to-day work, and so think it's only fair that I try to spend an hour each day giving back to the community via this forum. I'm therefore more than happy to suggest approaches which have worked or me in the past, and help where I can. If that involves exchanges over many days, to work through a problem, then I'm happy to spend the time doing this, as long as I feel that it's a two way process.
Introduction to Talend Open Studio for Data Integration.
Practical steps to developing your data integration strategy.
Create systems and workflow to manage clean data ingestion and data transformation.