How to count the number of similar lines

One Star

How to count the number of similar lines

Here is my problem:
I've got a flow of record similar to this schema:
Ref      Open Date    Description  Type   SubType
AAA     15/09/2014  xxxxx           A1      a2
AER     16/10/2014   wyyxxx        B2      ac2
I need to add a column that give for each record, the amount of records that avec the same type and subtype for the last 30 days based on the open date.
I tried with a Tmap, passing the table in main and in lookup, filtering the lookup table on the type/SubType being equal to the Type/Subtype given in the main link, and having the lookup open date between the main open date and main open - 30 days. My problem is that I dont know how to count the number of rows that are then given in the lokkup table after being filtered.
I don't konw if it's the correct way to handle this problem.
One Star

Re: How to count the number of similar lines

Use tAggregate component , and use CoUNT function ( grouping Type & Scheme ) , or filter the last 30 days before you aggregate
One Star

Re: How to count the number of similar lines

Thanks for the quick answer!
I thought about your solution and tried it, but I cant figure how to filter the data for the last 30 days.
I've got rougthly 70 000 records over 2 years of data to parse, and I need to parse all the data each time because some old records may appear from time to time...
With tAggregate, I am able to count the number of similar records, but I have not been able to find a way to restrict the count to the last 30 days according the record I am checking. Any idea on how to do it?
I was thinking of tMap since it's parsing data after data, and I can compare it to a lookup table that may be filtered acording value of my main data... But I have not been able in this case to find a way to get the number of data in the lookup table that are returned after being filtered.
So, in both way I am currently stuck.
One Star

Re: How to count the number of similar lines

Ok, I tried those past days, but I am unable to group an aggregate on a condition (last 30 days), neither am I able to filter the data for the past 30 days for a specific record before processing it inside a tAggragete...
Anyone with a solution? I am stuck and my current solution is to perform the calculation inside Excel with Macro or Formula, which is not the goal...
Four Stars

Re: How to count the number of similar lines

Hi John,
try following flow
input----> tFilter(OpenDate-30 days) -->tJavaRow (just sync columns)--> toutput
tjava - you can get the total number of records from system variable based on tJavaRow
In tFilter you can use talend date functions to substract 30 days from open date
Try this.
One Star

Re: How to count the number of similar lines

Hi Sanvaibhav
Ok, I get the logic from what you present. I did not thought the way of processing the data into 2 subjob.
Now, to be able to implement it, I need to understand how I pass the input in it. Let me explain (I am rather a newbie in Talend):
In input I have an Excel file (but it doesn't matter the format):
Each line represent a reference to parse, with the different colums giving the information for the reference.
The whole records must be parsed, and all these records are also the base in which I must compare the different date.
That means that for each row, I must compare the date (to the last 30 days compare to the start date of the data I am looking at), with the same description, type and subtype.
With your suggestion, I dont know how to go from reading my Excel file, to go thru the Tfilter (I think I can do the rest)...
If I do a tfilter base on the open date, will Talend process each line one after one, sending to the tJavaRow with the correct selected lines?
Four Stars

Re: How to count the number of similar lines

Hi John,
- To re-confirm again (in case of confusion), two subjobs are not two different talend jobs, they are two blocks connected with OnSubJobOk connector in single job
- tFileInputExcel--->tFilterRow-->tLogRow
tJavaRow (You can get filtered count using system variable for Filter

You can use following expression in tFilterRow
Try this
One Star

Re: How to count the number of similar lines

Hi Sanvaibhav,
sorry for the late answer.
Thanks for the example, but you explanation but it was what I understood. But it shall not work, let me explain:
it will work for one raw of my imported excel file, which means:
I check the raw, I check it's closure date, then with your two subjob will allow me to count the number of line that are similar within one month. That's ok, but I have approximativedly 80 000 lines (records). I need to give for each line the number of similar records withing one month. I dont see how, with your example, I can parse each line one after one and store the result in the record.
Here is the current Job I have done. The result I need to parse is currently stored in the tFileOutputExcel_2.

What I want to build is the last column: reccurence:

What I can do with your example is build only the value for 1 line...Not for each line...Or I am confused somewhere with how Talend works.
Note: I dont want to know how many similar value for the last month compared to the current date, but compared to the close date of the record I am looking the values for. In my example, for the ref 749611, which close date is 16/10/14 18:48, I have one more similar record between the 16/09 and the 18/10...


Talend named a Leader.

Get your copy


Kickstart your first data integration and ETL projects.

Download now

What’s New for Talend Summer ’19

Watch the recorded webinar!

Watch Now

Best Practices for Using Context Variables with Talend – Part 4

Pick up some tips and tricks with Context Variables


How Media Organizations Achieved Success with Data Integration

Learn how media organizations have achieved success with Data Integration


APIs for Dummies

View this on-demand webinar about APIs....

Watch Now