Aggregation on large Table Data

One Star

I have a table containing 33 million records in a MySQL database, and I need to aggregate the rows. Could you suggest the most efficient approach?
1. Extract all 33 million records into Talend, then sort and aggregate in the Talend tool.
2. Use tMySQLInput with the following extract query:
   SELECT A, B, C, SUM(D), SUM(E)
   FROM table1
   GROUP BY A, B, C
3. Any other way that is better than the options above?
Thanks,
Srini
Moderator

Re: Aggregation on large Table Data

Hi,
There is a component, TalendHelpCenter:tAggregateRow, which receives a flow and aggregates it based on one or more columns. Each output line carries the aggregation key and the result of the relevant set operations.
Please take a look at the related scenario in this component reference.
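To see what this style of aggregation does conceptually, here is a rough Python sketch (not Talend's actual implementation): hash-style grouping keeps one running entry per distinct (A, B, C) key, so memory usage grows with the number of distinct keys, which is worth keeping in mind at 33 million input rows. The sample rows are made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows: (A, B, C, D, E) tuples, mimicking table1 from the question.
rows = [
    ("x", 1, "p", 10.0, 1.0),
    ("x", 1, "p", 5.0, 2.0),
    ("y", 2, "q", 3.0, 4.0),
]

# Group on (A, B, C) and sum D and E. One accumulator is kept per
# distinct key, so memory scales with the number of groups.
sums = defaultdict(lambda: [0.0, 0.0])
for a, b, c, d, e in rows:
    sums[(a, b, c)][0] += d
    sums[(a, b, c)][1] += e

for (a, b, c), (sum_d, sum_e) in sums.items():
    print(a, b, c, sum_d, sum_e)
```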
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
Eight Stars

Re: Aggregation on large Table Data

Hi,

A combination of tMySQLInput (with an ORDER BY clause) and the tAggregateSortedRow component will work. Make sure you set "Input rows count" correctly, which means you will probably need to stage the data in an intermediate file first.

tCreateTemporaryFile
|
tMySQLInput -> tFileOutputDelimited
|
tFileInputDelimited -> tAggregateSortedRow -> (whatever you want to do with the aggregated data)
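The reason sorted input helps can be sketched in a few lines of Python (an illustration of the idea, not Talend's code): when rows arrive ordered by the grouping key, each group is contiguous, so a single pass needs to hold only one group's totals in memory at a time, no matter how many rows there are. The sample rows below are made up.

```python
from itertools import groupby

# Rows already sorted by the grouping key (A, B, C) -- e.g. produced by
# the ORDER BY in tMySQLInput and staged in the delimited file.
sorted_rows = [
    ("x", 1, "p", 10.0, 1.0),
    ("x", 1, "p", 5.0, 2.0),
    ("y", 2, "q", 3.0, 4.0),
]

# Because the input is sorted, each group is contiguous: one pass,
# one set of running totals at a time, constant memory per group.
results = []
for key, group in groupby(sorted_rows, key=lambda r: (r[0], r[1], r[2])):
    sum_d = sum_e = 0.0
    for _, _, _, d, e in group:
        sum_d += d
        sum_e += e
    results.append((*key, sum_d, sum_e))

print(results)
```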

 

Thanks,

David

Don't forget to give kudos when an answer is helpful, or to mark it as the solution.