Hi, I have a file that I need to import while adding a sequence column. The rule for the sequence is as follows:
Columns: ID, ST, PT, BReq, XT
When a new ID appears, the sequence restarts. If the ID is the same as in the previous row(s), we compare the other 4 columns: if any of their values has changed, the sequence increments by 1; if all 4 columns match a previous row for that ID, I do not want to insert the row in the final file.
1, j1, o1, p1, q1 - 001
1, j2, o1, p1, q1 - 002
1, j2, o1, p2, q1 - 003
1, j1, o1, p1, q1 - should be skipped, as the four columns are equal to the ones in the first row.
2, j1, o1, p3, q2 - 001
Any help would be appreciated, thanks ...
OK, this can be done quite easily with a few components. You may need to tweak this if a particular ordering is required, but it meets the rules you have specified. I'll describe it in steps.
1) Output your data from your file component to a tAggregateRow component. Set this component up to group by ALL of the fields. This will remove duplicates immediately.
2) The next component should be a tMap. This is where you sort out your sequence. The tMap is a straight pass-through: all the columns entering also exit the other side, PLUS a sequence column. Create this output column. Now, in a tMap variable (the box in the middle), use the following code:
routines.Numeric.sequence(row1.ID+"", 1, 1)
I've assumed your row is named "row1" and that your ID is not a String (hence the +"" to convert it). The first argument is the name of the sequence, so using the ID as the name gives each ID its own counter: when the same ID passes through again, the count for that ID carries on, and a new ID starts a new count at 1.
3) After that, you are done. Do whatever you want with the data and the new sequence.
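To make the logic of the steps above easier to follow outside of Talend, here is a minimal plain-Java sketch. It is not Talend's implementation: the class SequenceSketch, the process method, and the local sequence helper are hypothetical names invented for illustration, and the helper only imitates the "one counter per sequence name" behaviour described above. It assumes rows arrive as String arrays in the column order ID, ST, PT, BReq, XT, and it drops a row when an identical row has already been seen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SequenceSketch {

    // Hypothetical stand-in for a named sequence: one counter per name.
    private static final Map<String, Integer> counters = new HashMap<>();

    static int sequence(String name, int start, int step) {
        Integer current = counters.get(name);
        int next = (current == null) ? start : current + step;
        counters.put(name, next);
        return next;
    }

    // Step 1: drop rows identical to one already seen (grouping by all fields).
    // Step 2: append a sequence number that is counted separately for each ID.
    static List<String> process(List<String[]> rows) {
        counters.clear();
        Set<String> seen = new HashSet<>();
        List<String> out = new ArrayList<>();
        for (String[] row : rows) {
            String key = String.join(",", row);
            if (!seen.add(key)) {
                continue; // exact duplicate of an earlier row: skip it
            }
            int seq = sequence(row[0], 1, 1); // row[0] is the ID column
            out.add(key + "," + String.format("%03d", seq));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"1", "j1", "o1", "p1", "q1"},
            new String[] {"1", "j2", "o1", "p1", "q1"},
            new String[] {"1", "j2", "o1", "p2", "q1"},
            new String[] {"1", "j1", "o1", "p1", "q1"}, // duplicate of the first row
            new String[] {"2", "j1", "o1", "p3", "q2"}
        );
        for (String line : process(rows)) {
            System.out.println(line);
        }
    }
}
```

Run against the sample rows from the question, this prints four lines ending in 001, 002, 003 and 001, with the repeated row left out. Unlike this sketch, the job itself may not preserve input order after the aggregation step, which is why some tweaking may be needed if a particular ordering is required.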