Source Table -----> tMap --------> Destination Table

I want to skip the first few rows of the source table from being processed. How can I do it?

PS: This job runs several times, with new data added to the source table each time. I don't want data that was already loaded into the destination to be loaded again. Thanks!
Hi,

If you just want the data that does not exist in the target table to be inserted, you need to do an inner join between the source data and the target table, get the unmatched (rejected) rows, and insert those into the target table.

Shong
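The "insert only the unmatched rows" idea can be sketched outside Talend as a simple anti-join. This is a minimal illustration, not the tMap implementation itself; it assumes each row can be compared by its full contents (hashable tuples), which sidesteps the need for a dedicated unique key:

```python
def rows_to_insert(source_rows, target_rows):
    """Return the source rows not already present in the target.

    Rows are compared by their entire contents, which mimics an inner
    join on all columns and then keeping the rejected (unmatched) rows.
    """
    existing = set(target_rows)  # snapshot of what the target already holds
    return [row for row in source_rows if row not in existing]

source = [("a", 1), ("b", 2), ("c", 3)]
target = [("a", 1)]
print(rows_to_insert(source, target))  # [('b', 2), ('c', 3)]
```

In a Talend job the same effect is typically achieved by looking up the target table in tMap and routing the reject flow to the insert.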
---------------------------------------------------------- Talend | Data Agility for Modern Business
Hi ksultania,

What is the column structure of your input table? Do you have a unique identification column that identifies new rows? Do you have a timestamp in your input table? Can you show a snapshot of your data with new rows and old rows?

Vaibhav
I do not have a unique key on which I can do an inner join, and I don't have a timestamp in the input table either. What I was thinking is: every time I pass the data from the input table to the output table, I will maintain a count of rows loaded. The next time the job runs, I will start reading from row count+1. Is this possible?
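The count-and-resume idea is workable if the source table is strictly append-only and row order is stable. A minimal sketch, assuming the row count is persisted between runs in a hypothetical `load_state.json` file (in Talend this role is usually played by a context variable loaded via tContextLoad, or by a LIMIT/OFFSET clause in the input query):

```python
import json
import os

STATE_FILE = "load_state.json"  # hypothetical file persisting the row count

def load_offset():
    """Read how many rows were loaded by previous runs (0 on first run)."""
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return json.load(f)["rows_loaded"]
    return 0

def save_offset(n):
    """Persist the total row count for the next run."""
    with open(STATE_FILE, "w") as f:
        json.dump({"rows_loaded": n}, f)

def incremental_load(source_rows):
    """Process only the rows beyond the last saved count."""
    offset = load_offset()
    new_rows = source_rows[offset:]
    save_offset(offset + len(new_rows))
    return new_rows
```

Note the caveat: if a run fails after the count is saved but before the insert commits, rows are skipped, so the count should only be saved after a successful load.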
Hi Sultania,

Do you also have deletes and updates in the source table, or just new insertions? If there are only new insertions, I have another approach:
- Once your initial load is completed, create a copy of your source data (table A is the source and table B is the copy)
- During the second execution use (A - B) to get the additional records in A which are not present in B
- Insert these new records into the target
- Flush out B and make another copy of A

If the records are not too many, another approach would be to perform an inner join between A and B and get the rejected records from A, which are the insertions (they could be updates as well).

Thanks
Vaibhav
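One cycle of the copy-table approach above can be sketched as follows. This is an in-memory illustration only (in practice A and B would be database tables and the diff would be done in SQL or in a Talend subjob); it assumes an append-only source, as stated:

```python
def diff_and_refresh(table_a, table_b):
    """One cycle of the copy-table approach (append-only source assumed).

    Returns the rows in A that are not in B (the new inserts for the
    target), then replaces B's contents with a fresh copy of A so the
    next run diffs against the current state.
    """
    b_set = set(table_b)
    new_rows = [row for row in table_a if row not in b_set]
    table_b[:] = table_a  # flush B and copy A into it, in place
    return new_rows
```

A usage example: after `diff_and_refresh([1, 2, 3], b)` with `b = [1]`, the function returns `[2, 3]` and `b` becomes `[1, 2, 3]`, ready for the next execution.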
No, I don't have to delete anything in the source table. The approach you mentioned will work, but I don't want to create a separate table; a lot of storage would be wasted that way. Won't this approach also be inefficient in terms of performance when there are millions of rows? Thanks for the suggestion, though.
In design, a lot depends on the basic requirements; if you state the requirements clearly at the start, it becomes easy to plan an approach. It would be better if you provided the use case scenario with all the details, so that a better approach could be devised. Have a look at the problem definition in your first post and then reformulate it.

Thanks
Vaibhav
Yeah, I should have given more details. The scenario is like this:
1. File ---> Source Table
2. Source Table -----> tMap --------> Destination Table
We need to copy the content of the source, after transformation, to the destination, but only the rows that are not already present in the destination. The source table and destination table do not have a unique key. The data is huge (millions of records/rows), so making an extra table would consume extra storage. The second job runs each time a new file loads data into the source table.
Each file is approx. 1 MB (containing 3k-4k rows and around 60 columns), and there may be thousands of files coming in. I do not have any column combination that forms a unique key. I am thinking of inserting a column as an index and then making it work.
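The added-index idea amounts to a high-watermark incremental load: remember the largest index value loaded so far and pick up only rows beyond it. A minimal sketch, assuming rows carry a monotonically increasing `idx` column (e.g. an auto-increment added to the source table; in SQL this would be `WHERE idx > last_loaded_idx`):

```python
def load_new_by_index(source_rows, last_index):
    """Return rows past the saved watermark, plus the new watermark.

    Assumes each source row is a dict with a monotonically increasing
    'idx' column; 'last_index' is the highest idx loaded so far.
    """
    new_rows = [row for row in source_rows if row["idx"] > last_index]
    # Advance the watermark only if something new was found.
    new_watermark = max((row["idx"] for row in new_rows), default=last_index)
    return new_rows, new_watermark
```

The new watermark would then be persisted (context variable, state table, or file) for the next run, with the same caveat as the row-count approach: save it only after the insert into the destination commits.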