Hi, I have a file that I need to import while adding a sequence column. The rule for the sequence is as follows:
Columns: ID, ST, PT, BReq, XT
When a new ID appears, the sequence restarts. If the ID is the same as in the previous row(s), we compare the other 4 columns: if any of their values has changed, the sequence increments by 1; if all 4 columns are the same as in a previous row, I do not want to insert that row in the final file.
1, j1, o1, p1, q1 - 001
1, j2, o1, p1, q1 - 002
1, j2, o1, p2, q1 - 003
1, j1, o1, p1, q1 - should be skipped, as the four columns are equal to the ones in the first row.
2, j1, o1, p3, q2 - 001
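To pin down the expected result, here is a minimal plain-Java sketch of the rule as I read it from the example above. It is not Talend code: the class name SequenceRule and its apply method are made up for illustration. It assumes that "previous one(s)" means any earlier row with the same ID, and that the values never contain the "|" character used to build the comparison key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SequenceRule {

    // Each input row is {ID, ST, PT, BReq, XT}.
    // Each kept row is returned as "ID, ST, PT, BReq, XT - NNN".
    public static List<String> apply(List<String[]> rows) {
        // For every ID, the set of (ST, PT, BReq, XT) combinations already seen.
        Map<String, Set<String>> seenPerId = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String[] r : rows) {
            // Assumption: "|" never occurs inside a value.
            String key = String.join("|", r[1], r[2], r[3], r[4]);
            Set<String> seen = seenPerId.computeIfAbsent(r[0], k -> new HashSet<>());
            // add() returns false when this combination already exists for the ID,
            // in which case the row is dropped.
            if (seen.add(key)) {
                // The set size after the add is the next sequence number for this ID.
                out.add(String.join(", ", r) + " - " + String.format("%03d", seen.size()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"1", "j1", "o1", "p1", "q1"},
            new String[] {"1", "j2", "o1", "p1", "q1"},
            new String[] {"1", "j2", "o1", "p2", "q1"},
            new String[] {"1", "j1", "o1", "p1", "q1"}, // repeats the first row, dropped
            new String[] {"2", "j1", "o1", "p3", "q2"}
        );
        for (String line : apply(rows)) {
            System.out.println(line);
        }
    }
}
```

Run on the five sample rows, this keeps four of them and numbers them 001, 002, 003 for ID 1 and 001 for ID 2.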
Any help would be appreciated, thanks ...
OK, this can be done quite easily with a few components. You may need to tweak this if a particular ordering is required, but this certainly meets the rules you have specified. I'll describe this in steps.....
1) Output your data from your file component to a tAggregateRow component. Set this component up to group by ALL of the fields. This will remove duplicates immediately.
2) The next component should be a tMap. This is where you sort out your sequence. The tMap is a straight pass-through: all your incoming columns also exit the other side, PLUS a sequence column. Create this output column. Now in a tMap variable (the box in the middle), use the following code....
routines.Numeric.sequence(row1.ID+"", 1, 1)
I've assumed your row will be "row1" and your ID is not a String (hence I added the +""). The first argument is the name of your sequence, so when the same ID passes through, it carries on the count for that ID. A new ID creates a new count.
3) After that, you are done. Do whatever you want with the data and the new sequence.
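If it helps to see why naming the sequence after the ID restarts the count, here is a small stand-alone Java sketch of a named counter. It is only an illustration of the behaviour described in step 2, not Talend's actual source: the class NamedSequence and its next method are hypothetical stand-ins for routines.Numeric.sequence.

```java
import java.util.HashMap;
import java.util.Map;

public class NamedSequence {

    // One running value per sequence name.
    private static final Map<String, Integer> counters = new HashMap<>();

    // Hypothetical stand-in for a named sequence: the first call for a given
    // name returns start, and every later call for that name adds step.
    public static int next(String name, int start, int step) {
        Integer current = counters.get(name);
        int value = (current == null) ? start : current + step;
        counters.put(name, value);
        return value;
    }

    public static void main(String[] args) {
        // IDs of the rows that are left after the duplicates have been removed.
        int[] ids = {1, 1, 1, 2};
        for (int id : ids) {
            // id + "" turns the numeric ID into the sequence name, as in step 2.
            System.out.println(id + " -> " + String.format("%03d", next(id + "", 1, 1)));
        }
    }
}
```

Each distinct name gets its own counter, so the three rows for ID 1 receive 001, 002, 003 and the first row for ID 2 starts again at 001. The String.format call is only there to match the zero-padded numbers in the question.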