How do I aggregate data, conditionally?

Seven Stars

How do I aggregate data, conditionally?

I've a raw file which has data which looks like this:

 

account_code|payment_period|payment_date|transaction_type_code|fund_type_code|amount|next_date|payment_status

7H0068T1|201901|2019-01-15|RFP|C|2521.40000000|2019-06-14|DONE
7H0068T1|201901|2019-01-15|RFX|C|678.60000000|2019-06-14|DONE
7H0068T1|201901|2019-01-15|WHT|C|800.00000000|2019-06-14|DONE

7J0004T1|201912|2019-12-13|RFP|C|985.31000000|2019-06-14|PENDING
7J0004T1|201912|2019-12-13|RFX|C|614.69000000|2019-06-14|PENDING
7J0004T1|201912|2019-12-13|WHT|C|400.00000000|2019-06-14|PENDING

 

I want to sum up the values from the amount column, for each account (account_code), for each day (payment_date) -> only for the records where transaction_type is RFP or RFX. So in the above data, for 7H0068T1, I need to sum up 2521.4 and 678.6 (and exclude WHT/ 800.0)

 

The iAggregate node doesn't seem to work for me or I am missing something.

 

Technically it's a combo of 'group by' and 'where' and the tAggregate only provides me the option to group by 

 

Eight Stars

Re: How do I aggregate data, conditionally?

Hello,

 

Please add a tFilterRow or tMap Column before tAggregateRow. tAggregateRow does not support Group By with Where.

 

Thanks,

Subhadip

Seven Stars

Re: How do I aggregate data, conditionally?

That's exactly what I started doing. Can't the tFilter row output to multiple streams? Based on my data type, I'd want to output to three streams, for three types of data values.

Eight Stars

Re: How do I aggregate data, conditionally?

Hello,

 

If you want to divide data in three output streams along with filter, please use tMap instead of tFilterRow.

 

Thanks,

Subhadip

Employee

Re: How do I aggregate data, conditionally?

Hi @sushantV 

 

    Could you please explain more when you are saying output to three streams? Right now, the information provided is partial so the solutions provided also is based on the inputs from your end.

 

     If your objective is to filter the data to multiple groups and then take the aggregation, you can do that way. What you will have to do is to filter the unwanted records out (you can send the unwanted types to different stream by Reject condition) and then send to different flow. 

 

     Now if your objective is to take aggregate for each transaction type and then send it to different flow, that is also possible. After taking the aggregation, you can send the result set to a tMap where you can add the output expression filter to pick necessary transaction types.

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved :-)

Seven Stars

Re: How do I aggregate data, conditionally?

@nikhilthampi 

Nikhil, there are three subtypes in the data and I need to aggregate just two of them. So I split up the data into three streams (one for each subtype), using the tMap. Let's call these three streams A, B and C. I intend to sum up the A and B, and ignore the C for now. I thought I'd be able to combine them using the tMap but I gues I am missing something,

 

Here is how the flow looks like: Screenshot from 2019-06-06 14-23-18.png

 

I inted to combine the RFP and RFX data so that I can aggregate the amounts for each account, on each day. That can be done using tAggregate but I need to combine the data, before I can attempt the tAggregate node.

Employee

Re: How do I aggregate data, conditionally?

Hi,

 

    Let us first do the segregation of data for the two types you want.

image.png

 

If you do like above, it will pick both types and remove any other types. Now, send this data to the aggregation component. Do not add the transaction type as a field for aggregation so that your sum will be based on only key fields like account number.

 

In this way you should be able to get your results.

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved :-)

Seven Stars

Re: How do I aggregate data, conditionally?

@nikhilthampi I am dumb. Don't have to over-complicate things. Your solution is elegant. Thanks.

 

BTW I still need a way to combine flows. A further requirement wants me to sum up for all the subtypes. I can find my way around it but I am curious about knowing how could I combine two subtypes (as in the screen I shared), into a tMap component.

 

 

Employee

Re: How do I aggregate data, conditionally?

Hi,

 

    No need to worry. You will soon become a master of Talend :-)

 

    Coming to your followup query, you can do that before segregation of data using tMap. I would suggest you to use a tReplicate to duplicate the data to two streams. One stream can be used for total sum where you will send the output to a taggregaterow directly. Another stream can be passed to tMap where you will separate the data based on transaction types followed by taggregaterow.

 

    Hope I have answered all your queries for this post. Please spare a second to mark the topic as resolved :-)

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved :-)

 

 

Seven Stars

Re: How do I aggregate data, conditionally?

Almost there. Even after I split the flow in two and process them separately, I need to combine them all, before putting that info in the database. How can I do that?

Employee

Re: How do I aggregate data, conditionally?

Hi,

 

    If you are using a DB, why you need to combine the dataset? You can add a tDBOutput to each stream so that each dataset will be loaded independently.

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved :-)

Seven Stars

Re: How do I aggregate data, conditionally?

The reason is that I wish to hit the DB once. As in, combine everything into one stream and then update the DB using tDBOutput

Employee

Re: How do I aggregate data, conditionally?

Hi,

 

    I don't think DB will have any serious impact if we hit 3 times instead of 1 especially in batch mode.

 

    If you want to use a single session, you can use a tDBConnection at the beginning and use the same connection for all 3 DB objects. End of day, you will have to use DB resources to push based on the number of records.

 

    But if you really want to collect the data to one dataset and then push to DB (definitely not my preferred method), you can first collect the data to three tHashOutput components (where the second and third hash will be in append mode to first one). Then read this Hash using a tHashInput and then push to DB (this means memory overhead due to usage of Hash).

 

    You can use files instead of Hash to do the same operation but this means you are doing additional IO and CPU operations. So my view is that the optimum way is to push through three DB components.

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved :-)

2019 GARNER MAGIC QUADRANT FOR DATA INTEGRATION TOOL

Talend named a Leader.

Get your copy

OPEN STUDIO FOR DATA INTEGRATION

Kickstart your first data integration and ETL projects.

Download now

What’s New for Talend Summer ’19

Watch the recorded webinar!

Watch Now

Best Practices for Using Context Variables with Talend – Part 1

Learn how to do cool things with Context Variables

Blog

Migrate Data from one Database to another with one Job using the Dynamic Schema

Find out how to migrate from one database to another using the Dynamic schema

Blog

Best Practices for Using Context Variables with Talend – Part 4

Pick up some tips and tricks with Context Variables

Blog