CSV data manipulation

Four Stars

CSV data manipulation

Hi, I'm trying to create a job to process csv data input as follow :

The Input : 

ID             Start_Date     End_Date      Action          Cost 

ID1           Date_S1        Date_E1        Action1        1000

ID1           Date_S2        Date_E2        Action2        200

 

The Output :

If Action2=Value1, the output will be : 

ID             Start_Date         End_Date    Action_Date             Action          Cost 

ID1           Date_S1            Date_E1       Date_S2                  Action2        1000

ID1           Date_S2+1        Date_E2                                                           1200

 

 

I need to apply this changes for each ID with several rows

Any suggestions?

Employee

Re: CSV data manipulation

Hi,

 

     Before going to solution part, I have some queries related to input data. Could you please share your thoughts for these scenarios?

 

a) If we are having a single record, what sould we do for matched records for Action and non matched records?

 

ID             Start_Date     End_Date      Action          Cost 

ID1           Date_S1        Date_E1        Action1        1000

ID2           Date_S2        Date_E2        Action2        200

 

b) If we are having more than two records, what should we do for matched and non matched ids?

 

ID             Start_Date     End_Date      Action          Cost 

ID1           Date_S1        Date_E1        Action1        1000

ID1           Date_S2        Date_E2        Action2        200

ID1           Date_S3        Date_E3        Action3        300

 

c) Is there any chance of duplication with same Action for same id? If yes, what should we do here?

 

ID             Start_Date     End_Date      Action          Cost 

ID1           Date_S1        Date_E1        Action1        1000

ID1           Date_S2        Date_E2        Action2        200

ID1           Date_S3        Date_E3        Action2        300

 

There might be lot of other permutations and combinations also. Could you please first list down the valid combinations before going to the solution part?

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved :-)


Warm Regards,
Nikhil Thampi
Please appreciate our members by giving Kudos for spending their time for your query. If your query is answered, please mark the topic as resolved :-)
Four Stars

Re: CSV data manipulation

Hi and thank you for your reply.

 

a) if we have a single record for an ID it will be written as it is.

 

b) If we are having more than two, the output for your example will be :

 

ID             Start_Date         End_Date    Action_Date             Action          Cost 

ID1           Date_S1            Date_E1       Date_S2                 Action2        1000

ID1           Date_S2+1        Date_E2       Date_S3                 Action3         1200

ID1           Date_S3+1        Date_E3                                                           1500

(PS : I have some actions with cost <>0 that I shouldn't add to the cost so my solution is to change its cost to 0. Is there a smarter solution for this?)

 

c) The same ID can have duplication with same action and they're considered as different action since they have different dates

 

ID             Start_Date         End_Date    Action_Date             Action          Cost 

ID1           Date_S1            Date_E1       Date_S2                 Action2        1000

ID1           Date_S2+1        Date_E2       Date_S3                 Action2         1200

ID1           Date_S3+1        Date_E3                                                           1500

 

I used the "tMemorizeRows" in my job but it's too long since I have to use an asc sorting for the "cost" and a desc sorting for the "Action_date" and I'm still not sure if I got the right results.

I really appreciate your help.

 

Best Regards

Nihel Haddad

Employee

Re: CSV data manipulation

Hi,

 

Could you please share the current job screenshots and the component screenshots of how you are handling the tmemorize?

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved :-)


Warm Regards,
Nikhil Thampi
Please appreciate our members by giving Kudos for spending their time for your query. If your query is answered, please mark the topic as resolved :-)
Four Stars

Re: CSV data manipulation

Hi,

 

I created two jobs : 

1- First job : to calculate the cost  

tfileInputDelimited--> tmap-->tmemorizeRow-->tfileOutputDelimited

  • tMemorizeRow : to memorize one row (ID and the cost)
  • tmap : to calculate the new cost

 

2- Second job : to update the action_date

tfileInputDelimited-->tSortRow1-->tmap-->tmemorizeRow-->tSortRow2-->tfileOutputDelimited

  • tSortRow1 : to inverse the order of the actions for each ID
  • tMemorizeRow : to memorize one row (ID and Start_Date)
  • tmap : to calculate the new "Action_Date
  • tSortRow2 : to get the order back

 

Best Regards

Nihel Haddad

 

What’s New for Talend Spring ’19

Watch the recorded webinar!

Watch Now

Agile Data lakes & Analytics

Accelerate your data lake projects with an agile approach

Watch

Definitive Guide to Data Quality

Create systems and workflow to manage clean data ingestion and data transformation.

Download

Tutorial

Introduction to Talend Open Studio for Data Integration.

Watch