I am new Talend user. I have a quite simple task to do: create a job that has to take data from a table in a database A and update or insert records in a table in a database B.
I tried using the following structure but it is very slow (2 records/s)
Is there a way to perform an Oracle merge (so: check if a record exists based on a specific condition, update if exists and insert otherwise).
I find a tSQLTemplateMerge but I don't know how to use it
If I understand your requirement very well, do you want to capture the changed data and load it into your target DB?
You can compare tables by using tMap.
The work flow should be:
Target DB-->tMap(make inner join on your input and set the "Catch lookup inner join reject" as true)-->output
The output will be the changed data.
you should consider this approach for very large tables. Instead of processing the data in Talend, first stage the incoming data to a temp table, then use a SQL statement to merge the data using the DB.
You are not giving enough information. Carrying out an insert/update on a table with over 2 million records every day is a bad solution unless there is absolutely no other information to identify which rows are new, have changed and have not changed. You are essentially carrying out a cartesian join in terms of comparisons.
What @xdshi suggested is a fair enough suggestion given the original information. 2 million records should not lead to memory issues unless you are working with a very small amount of RAM. Have you tried upping the RAM? How many columns do you import per row? Can you use heuristics to identify which records need to be compared? Are there any records within the data set which will help identify changes? Can you filter your source tables in the DB pre-emptively before carrying out the comparison that @xdshi suggested? Can you limit the number of columns returned by both source and target tables? Can you make use of the tMap memory functionality (lookup Temp Data Directory here https://help.talend.com/reader/NNO~fmVQU4rlkF9Depfdxw/usVFZrXILonTCNdL86safA)?
Once we know more about the problem it will be easier to suggest a better way of approaching this. @JGM has suggested using a Temp table which might be a reasonable thing to give a go. But moving that many records to carry out an in DB process (which is sometimes the best solution, granted), seems a bit of overkill if you are only looking for hundreds of rows (changed/created).
If you don't have a last_modified_date_timestamp column on your tables you should add one.
Then you can find the max(last_modified_date_timestamp) in database B.
Next select from database A where last_modified_date_timestamp is > than the max(last_modified_date_timestamp) from database B.
Watch the recorded webinar!
Accelerate your data lake projects with an agile approach
Create systems and workflow to manage clean data ingestion and data transformation.
Introduction to Talend Open Studio for Data Integration.