Scenario based Question in Talend???

Four Stars

Scenario based Question in Talend???

I have one scenario based question to all the Talend forum members. It come like this as given below :-
I)Suppose you are having source text file named "A.txt" which contains billons of records.
The task is to extract all the data from source file and loads into target table named " Table D".
When millions of records are inserted into your target table, in the middle of your job for the particular record/data you get error.
After resolving the error, when you execute the job again I want my job to do the following :-
The already inserted records should not be inserted again in the "target table D" keeping the performance of your job in mind.
How can we do?
Consider the cases : 1) Source file may contain unique records
and 2) It may contains duplicate records
B) If you the use of tmemorize components, how we can achieve/do this task without using the components?
Thanks for your help in advance.
Regards
SWABHOSA
Moderator

Re: Scenario based Question in Talend???

Hi,
Have you checked "insert or update" option in t<DB>output. On schema of t<DB>Output, check the 'key' box of the columns which are the primary key in database, if you select 'insert or update' option, the data will be inserted if it does not exist yet, otherwise, it is updated.
Best  regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
Four Stars

Re: Scenario based Question in Talend???

Hello Sabrina,
Thanks for your reply. I am aware of that "insert and update" option in t<db> component.
What if you have to insert only their will be no use of the "Update" options.
When I re-run my job, it should insert the new records only, not the already inserted records.
Best Regards:-swabhosa
One Star

Re: Scenario based Question in Talend???

If it has a key column.
You can simply look up the data in "Target D" and join it with your input file using a tmap.  Records that match can be filtered out.
Four Stars

Re: Scenario based Question in Talend???

Hello Egoetsch,
Thanks for reply!!!
It does not have an key column.
Regards
Swabhosa
Five Stars

Re: Scenario based Question in Talend???

Any answer for this post????
One Star

Re: Scenario based Question in Talend???

can tell me what could be the error ? if it  is  data mismatch error, you can always do the schema level checks  then pull out the  rejected records into the file and only correct data can taken further for inserting into database
Five Stars

Re: Scenario based Question in Talend???

1. In case of Source file may contain unique records:

==> In this case, you can import all the files to the Destination table. And before inserting into Destination table use tMap to lookup with the source file unique column. Map the primary key from the inserting main flow to the primary unique column of lookup source file. And keep the 'Inner Join Reject' setting to True in tMap at the output side.

This will insert only those records which are not in the destination table if few of the records are already there is destination table, then it will reject based on an inner join.

 

1. In case of Source file may contain duplicate records:

==> I am assuming here the whole row does not have duplicated values, only a few columns have duplicated values.

For example:  | Id   |  Name |  Age |  Country |

                         2       John       25       USA

                         5       John       28       USA

                         8       Jack       32        UAE

 

For above case use tAddCRCRow component while inserting data into dest table. And apply/check CRC for all columns, so each record will get a unique CRC value. After that before inserting data into Dest. table join the main flow with source file CRC value as lookup and keep the 'Inner Join Reject' to True at the output side in tMap.

This will insert only those records which are not in the destination table if few of the records are already there is destination table, then it will reject based on an inner join.

 

Don't forget to give kudos if this reply is worked for you. Also, suggestions/ improvements for this solutions are appreciated.