I'm trying to achieve the equivalent of a SQL COUNT(*) with a GROUP BY on two fields within a Talend flow. At the moment we read a table and write a sorted, de-duplicated (sort/uniq'd) output to a MySQL table, then a second subjob reads that output back and counts the number of rows per ID.
Any idea whether it would be possible to run this in a single subjob flow, or should we keep these jobs separate and run the count in raw SQL?
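Split out the data you want to GROUP BY with a tReplicate (so the aggregation gets its own feed), then use either tSortRow followed by tAggregateSortedRow, or tAggregateRow on its own, depending on data volumes. tAggregateRow groups unsorted input in memory, while tAggregateSortedRow expects input that is already sorted on the grouping keys.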
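To make the trade-off concrete, here is a minimal plain-Java sketch of the two strategies for a COUNT(*) grouped by two fields. It is not Talend-generated code. The `Row`/`Key` records and the `id`, `category` and `name` fields are hypothetical, and Java 16+ is assumed for records. `countHashed` mirrors the tAggregateRow idea: it keeps one counter per distinct key in memory and accepts any input order. `countSorted` mirrors tSortRow + tAggregateSortedRow: it assumes the input is already sorted on both keys and emits each group when the key changes, so only the current group's counter is held.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupCount {
    // Hypothetical input row: two grouping fields plus another column the count ignores.
    record Row(String id, String category, String name) {}

    // Composite key for "GROUP BY id, category".
    record Key(String id, String category) {}

    // Hash-based aggregation: one counter per distinct key, input order irrelevant.
    static Map<Key, Long> countHashed(List<Row> rows) {
        Map<Key, Long> counts = new LinkedHashMap<>();
        for (Row r : rows) {
            counts.merge(new Key(r.id(), r.category()), 1L, Long::sum);
        }
        return counts;
    }

    // Sorted aggregation: input MUST be sorted by (id, category); a group is
    // emitted as soon as the key changes, so only one running counter is kept.
    static List<Map.Entry<Key, Long>> countSorted(List<Row> sortedRows) {
        List<Map.Entry<Key, Long>> out = new ArrayList<>();
        Key current = null;
        long n = 0;
        for (Row r : sortedRows) {
            Key k = new Key(r.id(), r.category());
            if (!k.equals(current)) {
                if (current != null) {
                    out.add(Map.entry(current, n)); // close the previous group
                }
                current = k;
                n = 0;
            }
            n++;
        }
        if (current != null) {
            out.add(Map.entry(current, n)); // close the last group
        }
        return out;
    }

    public static void main(String[] args) {
        // Already sorted by (id, category), so both strategies apply.
        List<Row> rows = List.of(
            new Row("1", "a", "x"),
            new Row("1", "a", "y"),
            new Row("1", "b", "z"),
            new Row("2", "a", "w"));
        System.out.println(countHashed(rows));
        System.out.println(countSorted(rows));
    }
}
```

In the job itself, the tReplicate would feed one copy of the rows to the existing sort/unique output and the other copy to the aggregation, so both happen in the same subjob.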