
Skip rows

Hello,
I have a job like this:

Source Table ----->tMap-------->Destination Table
I want to skip the first few rows of the source table so that they are not processed. How can I do it?
PS: This job runs several times, with new data added to the source table each time. I don't want data that has already been loaded into the destination to be loaded again.
Thanks!

Re: Skip rows

Hi
If you only want data that does not already exist in the target table to be inserted, you need to do an inner join between the source data and the target table, capture the unmatched (rejected) rows, and insert those rows into the target table.
Shong
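A minimal sketch of this join-and-reject idea in plain Python rather than as a Talend job: the target's keys are loaded into an in-memory lookup (roughly what a tMap lookup flow does by default), and source rows with no match are treated as new. It assumes some column or combination of columns identifies a row; `unmatched_rows` and its `key_cols` parameter are hypothetical names, and the sample rows are made up.

```python
from typing import Iterable, Iterator, Sequence


def unmatched_rows(
    source: Iterable[tuple], target: Iterable[tuple], key_cols: Sequence[int]
) -> Iterator[tuple]:
    """Yield source rows whose key has no match in target (an anti-join)."""
    # Load the target keys into an in-memory set, similar to a lookup flow
    target_keys = {tuple(row[i] for i in key_cols) for row in target}
    for row in source:
        if tuple(row[i] for i in key_cols) not in target_keys:
            yield row  # the "inner join reject": this row is new to the target


if __name__ == "__main__":
    source = [(1, "a"), (2, "b"), (3, "c")]
    target = [(1, "a"), (2, "b")]
    print(list(unmatched_rows(source, target, key_cols=[0])))  # [(3, 'c')]
```

Because every target key is held in memory, the cost grows with the size of the target table, which is the concern raised in the next reply.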

Re: Skip rows

It might do the job for now, but won't performance suffer with millions of rows? Or will it be fine?

Re: Skip rows

Hi ksultania,
What is the column structure of your input table?
Do you have a unique identifier column that identifies new rows?
Do you have a timestamp column in your input table?
Can you show a snapshot of your data with new rows and old rows?
Vaibhav

Re: Skip rows

I do not have a unique key on which I can do an inner join. Also, I don't have a timestamp in the input table.
What I was thinking is this: every time I pass data from the input table to the output table, I will maintain a count of the rows. The next time the job runs, I will start reading from row count+1.
Is this possible?
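A minimal Python sketch of the row-count idea, assuming the rows arrive in the same stable order on every run. The state file and the helpers `load_count`, `save_count` and `new_rows` are hypothetical. One caution: a SQL table has no guaranteed row order unless the query sorts by some column, so "start at row count+1" is only reliable if such an ordering column exists.

```python
import json
import tempfile
from itertools import islice
from pathlib import Path


def load_count(state_file: Path) -> int:
    """Return how many rows earlier runs have loaded (0 on the first run)."""
    if state_file.exists():
        return json.loads(state_file.read_text())["rows_loaded"]
    return 0


def save_count(state_file: Path, count: int) -> None:
    """Persist the running row count for the next run."""
    state_file.write_text(json.dumps({"rows_loaded": count}))


def new_rows(ordered_rows, state_file: Path) -> list:
    """Skip the rows counted so far, return the rest, and advance the count.

    Assumes `ordered_rows` comes back in the same order on every run.
    In a real job, save the count only after the destination insert succeeds.
    """
    already_loaded = load_count(state_file)
    fresh = list(islice(ordered_rows, already_loaded, None))  # start at row count+1
    save_count(state_file, already_loaded + len(fresh))
    return fresh


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        state = Path(tmp) / "rows_loaded.json"
        print(new_rows(["r1", "r2"], state))        # ['r1', 'r2']
        print(new_rows(["r1", "r2", "r3"], state))  # ['r3']
```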

Re: Skip rows

Hi Sultania,
Do you also have deletes and updates in the source table, or just new insertions?
If there are only new insertions, I have another approach:
- Once your initial load is completed, create a copy of your source data (Table A is the source and Table B is the copy)
- During the second execution, use (A-B) to get the additional records in A that are not present in B
- Insert these new records into the target
- Flush out B and make a fresh copy of A
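A minimal sketch of the A-B steps using SQL `EXCEPT`, run through Python's built-in `sqlite3` module so it is self-contained. The table names `source_a`, `snapshot_b` and `target` are hypothetical, and `target` simply receives the raw rows where the real job would apply the tMap transformation. Note that `EXCEPT` compares whole rows and removes duplicates, so without a unique key, identical rows arriving in a new file would be collapsed or skipped.

```python
import sqlite3


def incremental_load(conn: sqlite3.Connection) -> int:
    """Insert rows of A (source_a) missing from B (snapshot_b) into target, then refresh B."""
    cur = conn.cursor()
    # A-B: rows added to the source since the last snapshot
    cur.execute(
        "INSERT INTO target SELECT * FROM source_a "
        "EXCEPT SELECT * FROM snapshot_b"
    )
    inserted = cur.rowcount
    # Flush B and take a fresh copy of A for the next run
    cur.execute("DELETE FROM snapshot_b")
    cur.execute("INSERT INTO snapshot_b SELECT * FROM source_a")
    conn.commit()
    return inserted


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for table in ("source_a", "snapshot_b", "target"):
        conn.execute(f"CREATE TABLE {table} (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO source_a VALUES (?, ?)", [(1, "a"), (2, "b")])
    print(incremental_load(conn))  # 2 (first run: B is still empty)
    conn.execute("INSERT INTO source_a VALUES (3, 'c')")
    print(incremental_load(conn))  # 1
```

The snapshot doubles the storage for the source data, which is the trade-off discussed in the following replies.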
If the records are not too many, another approach would be to perform an inner join between A and B and take the rejected records from A; these are the insertions (though they could also be updates).
Thanks
vaibhav

Re: Skip rows

No, I don't delete anything from the source table.
The approach you mentioned will work, but I don't want to create a separate table. Also, a lot of memory would be wasted with this approach.
Won't this approach also be inefficient in terms of performance when there are millions of rows?
Thanks for the suggestion though.

Re: Skip rows

In design, a lot depends on the basic requirements. If you make the requirements clear up front, it becomes much easier to plan an approach.
It is better if you provide the use case scenario with all the details; then a better approach can be devised. Have a look at the problem definition in your first post and then reformulate it.
Thanks
Vaibhav

Re: Skip rows

Yeah, I should have given more details.
The scenario is like this.
1. File ---> Source Table
2. Source Table ----->tMap-------->Destination Table

We need to copy the transformed content of Source to Destination, but only the rows that are not already present in Destination.
The source and destination tables do not have a unique key. The data is huge (millions of rows), so creating an extra table would consume extra memory.
The second job runs after each new file has loaded its data into the source table.

Re: Skip rows

Hi Sultania,
Do you have a unique column, or a combination of columns, that identifies a row? If you don't, I'm not sure how this can be done.
What is the file size?
Vaibhav

Re: Skip rows

Each file is approx. 1 MB (containing 3k-4k rows and around 60 columns).
There may be thousands of files coming in.
Also, I do not have any combination of columns that forms a unique key. I am thinking of inserting a column as an index and then making it work.
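One common way to use such an index column is a "high-water mark": each run copies only the rows whose index is greater than the largest index already in the destination, so no copy table is needed. A minimal sketch using Python's built-in `sqlite3`, assuming the database assigns an increasing `row_id` as each file is loaded into the source table; the table and column names are hypothetical. A strictly increasing timestamp column could play the same role. With an index on `row_id` in both tables, both queries should avoid scanning the full tables.

```python
import sqlite3


def incremental_copy(conn: sqlite3.Connection) -> int:
    """Copy source rows whose row_id is above the highest row_id already in destination."""
    # High-water mark: the largest index value loaded so far (0 before the first run)
    (last_id,) = conn.execute(
        "SELECT COALESCE(MAX(row_id), 0) FROM destination"
    ).fetchone()
    rows = conn.execute(
        "SELECT row_id, payload FROM source WHERE row_id > ? ORDER BY row_id",
        (last_id,),
    ).fetchall()
    # The row_id is carried into destination so the next run can find the mark
    conn.executemany("INSERT INTO destination (row_id, payload) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # row_id is assigned by the database as each file is loaded into source
    conn.execute("CREATE TABLE source (row_id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)")
    conn.execute("CREATE TABLE destination (row_id INTEGER, payload TEXT)")
    conn.executemany("INSERT INTO source (payload) VALUES (?)", [("file1-row1",), ("file1-row2",)])
    print(incremental_copy(conn))  # 2
    conn.execute("INSERT INTO source (payload) VALUES ('file2-row1')")
    print(incremental_copy(conn))  # 1
    print(incremental_copy(conn))  # 0
```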

Re: Skip rows

Hi Vaibhav,
I have inserted a timestamp as a unique key. Is there any way other than A-B to achieve this? A-B consumes a lot of extra memory.

Re: Skip rows

In the tMap properties, you can have tMap store its temporary data in files on disk instead of in system memory while processing records.
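To illustrate the memory-versus-disk trade-off outside Talend, here is a minimal Python sketch (not how tMap implements it) of a lookup whose keys live in an on-disk SQLite file, so memory use stays small at the cost of slower lookups. The class name `DiskKeySet` and the file name are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path


class DiskKeySet:
    """Lookup keys kept in an on-disk SQLite file rather than in RAM."""

    def __init__(self, path: Path) -> None:
        self.conn = sqlite3.connect(str(path))
        self.conn.execute("CREATE TABLE IF NOT EXISTS keys (k TEXT PRIMARY KEY)")

    def add_many(self, keys) -> None:
        # INSERT OR IGNORE skips keys that are already stored
        self.conn.executemany(
            "INSERT OR IGNORE INTO keys (k) VALUES (?)", ((k,) for k in keys)
        )
        self.conn.commit()

    def __contains__(self, key) -> bool:
        # The primary-key index turns each probe into an indexed lookup on disk
        row = self.conn.execute("SELECT 1 FROM keys WHERE k = ?", (key,)).fetchone()
        return row is not None

    def close(self) -> None:
        self.conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lookup = DiskKeySet(Path(tmp) / "lookup.db")
        lookup.add_many(["k1", "k2"])
        print([k for k in ["k1", "k3"] if k not in lookup])  # ['k3']
        lookup.close()
```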
Vaibhav