Duplicate rows between dates

Highlighted
Four Stars

Duplicate rows between dates

Hello,

 

I have a user profile history table with the following format : 

Capture.PNG

(Here the same User H950LM has changed profile several times).

I want to duplicate the rows on a monthly basis, in order to visualize the user in each month, with its respective Profile. like the following :

 

Captureé.PNG

 

Thank you,

El Hadi


Accepted Solutions
Community Manager

Re: Duplicate rows between dates

This is an interesting problem and there will be a number of ways to achieve this. However, none that sprung to mind in an easy way. So I have written a job to do this which I will describe below.

 

The job looks like this. I have numbered the components and will describe them in order...

Screenshot 2019-11-21 at 11.16.05.png

 

1) tFixedFlowInput_1

This component simply recreates the data you showed in your example. You will not need this in your job.

 

2) tSortRow_1

This component is used to order your data by your CODE_USER then by your MODIF_DATE. The order (asc or desc) does not matter for CODE_USER, but it is essential that your MODIF_DATE is set to desc.

Screenshot 2019-11-21 at 11.17.43.png

 

3) tMap_2

This component is used to add an END_DATE to each row. 

Screenshot 2019-11-21 at 11.18.00.png

Essentially it makes use of some functionality that is not well known in Talend. The tMap variables (the box in the middle) are processed in order from top to bottom AND they keep their values between rows. What we are doing here is checking when the CODE_USER changes. When it is the same as the previous row, we are adding the previous row's MODIF_DATE to the record as an END_DATE. I will show the code and variables that are used below this.

 

Expression Type Nullable  Variable
Var.THIS_CODE_USER String No OLD_CODE_USER
row2.CODE_USER  String No THIS_CODE_USER
Var.THIS_CODE_USER.equals(Var.OLD_CODE_USER)  boolean No SAME_USER
Var.SAME_USER ? routines.TalendDate.addDate(Var.MODIF_DATE, -1, "dd") : null  Date No END_DATE
row2.MODIF_DATE  Date No MODIF_DATE

 

The order of the variables must be kept the same as above. 

 

This will produce an output recordset that looks like below.....

Screenshot 2019-11-21 at 11.36.19.png

 

4) tJavaFlex_1

This component is a lookup. It takes values passed in to the tMap_1 (described next) and uses those to calculate the dates that need to be returned. It also multiplies the rows. So if your first record has a MODIF_DATE and END_DATE which are 10 months apart, 10 records will be returned with each of those months as dates. The code for this is shown below...

 

Start Code

// start part of your Java code
//Retrieve the MODIF_DATE from the globalMap value set in tMap_1
Date MODIF_DATE = ((Date)globalMap.get("MODIF_DATE"));
//Retrieve the END_DATE from the globalMap value set in tMap_1
Date END_DATE = ((Date)globalMap.get("END_DATE"));

int months = 0;

if(END_DATE!=null){	
        //Set the number of months that need to be set
	months = (int)routines.TalendDate.diffDate(END_DATE,MODIF_DATE, "MM");
}

//Open a FOR LOOP to iterate through the dates
for(int i = 0; i<=months; i++){

 

Main Code

// here is the main part of the component,
// a piece of code executed in the row
// loop

//Set the value being sent to the tMap_1 as a lookup. We are using the 
//"addDate" method to use the iteration number as a value to add in months to
//the MODIF_DATE.
row4.MODIF_DATE = routines.TalendDate.addDate(MODIF_DATE, i, "MM");

 

End Code

// end of the component, outside/closing the loop
}  

5) tMap_1

This component is used to take the lookup data (described above) and multiply the input rows. It also sets the globalMap values used in the above component. The configuration looks like below...

Screenshot 2019-11-21 at 11.18.40.png

Pay attention to the "Lookup Model" and the "globalMap Key" section. When you set the "Lookup Model" to "Reload at each row", it reveals the "globalMap Key" section. This sets the globalMap keys to the MODIF_DATE and END_DATE of each input record supplied by your previous component.

 

6) tSortRow_2

This component is simply used to sort your data by the CODE_USER and new MODIF_DATE. It is essentially the same as the previous tSortRow, but the order of the MODIF_DATE is reversed. This outputs your data in date order asc.

Screenshot 2019-11-21 at 11.19.02.png

 

7) tLogRow_1

This component is simply used to print the output. When I ran this, the output looked like below....

Screenshot 2019-11-21 at 11.51.10.png

View solution in original post


All Replies
Community Manager

Re: Duplicate rows between dates

This is an interesting problem and there will be a number of ways to achieve this. However, none that sprung to mind in an easy way. So I have written a job to do this which I will describe below.

 

The job looks like this. I have numbered the components and will describe them in order...

Screenshot 2019-11-21 at 11.16.05.png

 

1) tFixedFlowInput_1

This component simply recreates the data you showed in your example. You will not need this in your job.

 

2) tSortRow_1

This component is used to order your data by your CODE_USER then by your MODIF_DATE. The order (asc or desc) does not matter for CODE_USER, but it is essential that your MODIF_DATE is set to desc.

Screenshot 2019-11-21 at 11.17.43.png

 

3) tMap_2

This component is used to add an END_DATE to each row. 

Screenshot 2019-11-21 at 11.18.00.png

Essentially it makes use of some functionality that is not well known in Talend. The tMap variables (the box in the middle) are processed in order from top to bottom AND they keep their values between rows. What we are doing here is checking when the CODE_USER changes. When it is the same as the previous row, we are adding the previous row's MODIF_DATE to the record as an END_DATE. I will show the code and variables that are used below this.

 

Expression Type Nullable  Variable
Var.THIS_CODE_USER String No OLD_CODE_USER
row2.CODE_USER  String No THIS_CODE_USER
Var.THIS_CODE_USER.equals(Var.OLD_CODE_USER)  boolean No SAME_USER
Var.SAME_USER ? routines.TalendDate.addDate(Var.MODIF_DATE, -1, "dd") : null  Date No END_DATE
row2.MODIF_DATE  Date No MODIF_DATE

 

The order of the variables must be kept the same as above. 

 

This will produce an output recordset that looks like below.....

Screenshot 2019-11-21 at 11.36.19.png

 

4) tJavaFlex_1

This component is a lookup. It takes values passed in to the tMap_1 (described next) and uses those to calculate the dates that need to be returned. It also multiplies the rows. So if your first record has a MODIF_DATE and END_DATE which are 10 months apart, 10 records will be returned with each of those months as dates. The code for this is shown below...

 

Start Code

// start part of your Java code
//Retrieve the MODIF_DATE from the globalMap value set in tMap_1
Date MODIF_DATE = ((Date)globalMap.get("MODIF_DATE"));
//Retrieve the END_DATE from the globalMap value set in tMap_1
Date END_DATE = ((Date)globalMap.get("END_DATE"));

int months = 0;

if(END_DATE!=null){	
        //Set the number of months that need to be set
	months = (int)routines.TalendDate.diffDate(END_DATE,MODIF_DATE, "MM");
}

//Open a FOR LOOP to iterate through the dates
for(int i = 0; i<=months; i++){

 

Main Code

// here is the main part of the component,
// a piece of code executed in the row
// loop

//Set the value being sent to the tMap_1 as a lookup. We are using the 
//"addDate" method to use the iteration number as a value to add in months to
//the MODIF_DATE.
row4.MODIF_DATE = routines.TalendDate.addDate(MODIF_DATE, i, "MM");

 

End Code

// end of the component, outside/closing the loop
}  

5) tMap_1

This component is used to take the lookup data (described above) and multiply the input rows. It also sets the globalMap values used in the above component. The configuration looks like below...

Screenshot 2019-11-21 at 11.18.40.png

Pay attention to the "Lookup Model" and the "globalMap Key" section. When you set the "Lookup Model" to "Reload at each row", it reveals the "globalMap Key" section. This sets the globalMap keys to the MODIF_DATE and END_DATE of each input record supplied by your previous component.

 

6) tSortRow_2

This component is simply used to sort your data by the CODE_USER and new MODIF_DATE. It is essentially the same as the previous tSortRow, but the order of the MODIF_DATE is reversed. This outputs your data in date order asc.

Screenshot 2019-11-21 at 11.19.02.png

 

7) tLogRow_1

This component is simply used to print the output. When I ran this, the output looked like below....

Screenshot 2019-11-21 at 11.51.10.png

View solution in original post

Four Stars

Re: Duplicate rows between dates

Thank you, the solution covers my need.

 

El Hadi

2019 GARTNER MAGIC QUADRANT FOR DATA INTEGRATION TOOL

Talend named a Leader.

Get your copy

OPEN STUDIO FOR DATA INTEGRATION

Kickstart your first data integration and ETL projects.

Download now

Best Practices for Using Context Variables with Talend – Part 2

Part 2 of a series on Context Variables

Blog

Best Practices for Using Context Variables with Talend – Part 1

Learn how to do cool things with Context Variables

Blog

Migrate Data from one Database to another with one Job using the Dynamic Schema

Find out how to migrate from one database to another using the Dynamic schema

Blog