cumulative sum values group by column

One Star

cumulative sum values group by column

Hi All,
i am newbie to Talend
i am not able to find out Cumulative sum values. Any idea how to get output?
Example data is:
we have one input like
No|name|salary|Job
1|aaa|3000|Manager
2|bbb|4000|sales
3|sven|2500|marketing
4|joe|3250|Manager
5|jim|2340|sales
6|karl|1500|Manager
7|mac|3000|sales
we have another input as
Job|cum_salary
Manager|45000
sales|0
marketing|25000

output like
No|name|salary|Job|Cum_salary
1|aaa|3000|Manager|48000
4|joe|3250|Manager|51250
6|karl|1500|Manager|52750
2|bbb|4000|sales|4000
5|jim|2340|sales|6340
7|mac|3000|sales|9340
3|sven|2500|marketing|27500
Ex:
Job:Manager,
First row cum_salary is 45000 + 3000 = 48000
second row cum_salary is 48000 + 3250 = 51250
third row cum_salary is 51250 + 1500 = 52750
Job:sales,
First row cum_salary is 0 + 4000 = 4000
second row cum_salary is 4000 + 2340 = 6340
third row cum_salary is 6340 + 3000 = 9340
Job:Marketing
First row cum_salary is 25000 + 2500 = 27500
like that.......
we have cumulative salary based on Job
and like incremental loading
Can you please suggest on this.
cheers,
Raju
Seven Stars

Re: cumulative sum values group by column

I think simplest is to read your cum_salary table and store the cum_salary for each job in a globalMap variable:
tInput --> tJavaRow => globalMap.put(input_row.Job,input_row.cum_salary)
Then read your individuals table and add the salary:
tInput --> tJavaRow
Integer cum_salary = (Integer)globalMap.get(input_row.Job) + input_row.salary;
globalMap.put(input_row.Job,cum_salary);
output_row.No = input_row.No;
output_row.name = input_row.name;
output_row.salary = input_row.salary;
output_row.Job = input_row.Job;
output_row.Cum_salary= cum_salary;
Community Manager

Re: cumulative sum values group by column

Hi
Do a left outer join on tMap base on the Job column, here I create a demo job to show you how to achieve this request, please see my screenshots for details.
tFixedFlowInput_1:
1|aaa|3000|Manager
2|bbb|4000|sales
3|sven|2500|marketing
4|joe|3250|Manager
5|jim|2340|sales
6|karl|1500|Manager
7|mac|3000|sales
tFixedFlowInput_2:
Manager|45000
sales|0
marketing|25000

Cum_salary expression of out1 table on tMap:
(Integer)globalMap.get(""+row1.Job+"")==null?row1.salary+row2.Cum_salary:(Integer)globalMap.get(""+row1.Job+"")+row1.salary

// if (Integer)globalMap.get(""+row1.Job+"")==null, means it is the first row for one Job type.
Result:
Starting job jobA at 19:16 31/07/2013.

connecting to socket on port 3791
connected
.--+----+------+---------+----------.
| tLogRow_1 |
|=-+----+------+---------+---------=|
|No|name|salary|Job |Cum_salary|
|=-+----+------+---------+---------=|
|1 |aaa |3000 |Manager |48000 |
|2 |bbb |4000 |sales |4000 |
|3 |sven|2500 |marketing|27500 |
|4 |joe |3250 |Manager |51250 |
|5 |jim |2340 |sales |6340 |
|6 |karl|1500 |Manager |52750 |
|7 |mac |3000 |sales |9340 |
'--+----+------+---------+----------'
disconnected
Job jobA ended at 19:16 31/07/2013.

Shong
----------------------------------------------------------
Talend | Data Agility for Modern Business

Re: cumulative sum values group by column

i have a requirement like
zoneid loss id(pk) month
z1 10 1 jan
z1 20 3 feb
z1 30 5 march
z2 100 2 jan
z2 200 4 feb
z2 300 6 march
now i need cumalative output like
zoneid loss id(pk) month cumalativeloss
z1 10 1 jan 10
z1 20 3 feb 30
z1 30 5 march 60
z2 100 2 jan 100
z2 200 4 feb 300
z2 300 6 march 600
Thanks,
raviteja
One Star

Re: cumulative sum values group by column

Hi,
Any update on the above give problem cause i m having the same issue as mentioned by raviteja.talend

Regards,
Ojas
Community Manager

Re: cumulative sum values group by column

Hi
@raviteja and Ojas, the tMemorizeRow component can fit your need, refer to the component manual and learn this component.
Best regards
Shong
----------------------------------------------------------
Talend | Data Agility for Modern Business
Five Stars

Re: cumulative sum values group by column

@shong

Thank you for your detailed solution!

Could you please explain this part:

 

(Integer)globalMap.get(""+row1.Job+"")

 

2019 GARNER MAGIC QUADRANT FOR DATA INTEGRATION TOOL

Talend named a Leader.

Get your copy

OPEN STUDIO FOR DATA INTEGRATION

Kickstart your first data integration and ETL projects.

Download now

What’s New for Talend Summer ’19

Watch the recorded webinar!

Watch Now

Best Practices for Using Context Variables with Talend – Part 4

Pick up some tips and tricks with Context Variables

Blog

How Media Organizations Achieved Success with Data Integration

Learn how media organizations have achieved success with Data Integration

Read

Agile Data lakes & Analytics

Accelerate your data lake projects with an agile approach

Watch