Hi, I have a file that I need to import while adding a sequence column. The rule for the sequence is as follows:
Columns: ID, ST, PT, BReq, XT
When a new ID appears, the sequence restarts. If the ID is the same as in the previous row(s), we compare the other 4 columns: if any of their values has changed, the sequence increments by 1; if all 4 columns are the same as in a previous row, I do not want that row inserted in the final file.
1, j1, o1, p1, q1 - 001
1, j2, o1, p1, q1 - 002
1, j2, o1, p2, q1 - 003
1, j1, o1, p1, q1 - should be neglected as the four columns are equal to the ones in the first row.
2, j1, o1, p3, q2 - 001
Any help would be appreciated, thanks ...
OK, this can be done quite easily with a few components. You may need to tweak this if a particular ordering is required, but it meets the rules you have specified. I'll describe it in steps.
1) Output your data from your file component to a tAggregateRow component. Set this component up to group by ALL of the fields. This will remove duplicates immediately.
2) The next component should be a tMap. This is where you sort out your sequence. The tMap is a straight pass-through: all your columns entering will also exit the other side, PLUS a sequence column. Create this output column. Now in a tMap variable (the box in the middle), use the following code:
routines.Numeric.sequence(row1.ID+"", 1, 1)
I've assumed your row will be "row1" and your ID is not a String (hence I added the +"" to convert it). The first argument is the name of your sequence, and the two 1s are the start value and the step. So when the same ID passes through again, it carries on the count for that ID. A new ID creates a new count.
3) After that, you are done. Do whatever you want with the data and the new sequence.
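To see the logic outside of Talend, here is a minimal plain-Java sketch of the same two steps: drop rows that duplicate an earlier row, then keep one counter per ID. The class name SequenceSketch, the helper methods and the comma-separated row format are assumptions made for illustration; this is not the code Talend generates. It also assumes rows should keep their first-seen order, and it zero-pads the counter to three digits to match the 001/002 style in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SequenceSketch {

    // Hypothetical stand-in for a named sequence: each distinct name
    // gets its own counter, starting at 1 and stepping by 1.
    private static int next(Map<String, Integer> counters, String name) {
        return counters.merge(name, 1, Integer::sum);
    }

    public static List<String> addSequence(List<String> rows) {
        // Step 1: remove rows that repeat an earlier row exactly
        // (the effect of grouping by ALL fields). LinkedHashSet keeps
        // first-seen order, which is an assumption of this sketch.
        Set<String> distinct = new LinkedHashSet<>();
        for (String row : rows) {
            distinct.add(row.trim());
        }

        // Step 2: pass each row through and append a per-ID sequence.
        Map<String, Integer> counters = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String row : distinct) {
            String id = row.split(",")[0].trim();
            int seq = next(counters, id);
            out.add(row + ", " + String.format("%03d", seq));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "1, j1, o1, p1, q1",
                "1, j2, o1, p1, q1",
                "1, j2, o1, p2, q1",
                "1, j1, o1, p1, q1",   // duplicate of the first row, dropped
                "2, j1, o1, p3, q2");
        for (String line : addSequence(input)) {
            System.out.println(line);
        }
    }
}
```

With the sample rows from the question, the fourth row is dropped and the remaining rows get 001, 002, 003 for ID 1 and 001 for ID 2. Inside a tMap, the same zero padding could be applied by wrapping the sequence value in String.format("%03d", ...) if the output column is a String.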