1) Do you understand the logic, i.e. how to select the proper row from each set of duplicates?
2) What is your source of data?
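This is a common problem among MySQL "programmers": by default, old MySQL versions allow GROUP BY without handling every column in the SELECT list (each selected column should either appear in GROUP BY or be wrapped in an aggregate function).
The result is then unpredictable, because the server is free to pick the value from any row in the group for the unhandled columns.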
If your data comes from a database, you can do a proper GROUP BY on the source database.
If not, and the data sets are not huge (so the extra time for the export and import is acceptable), load both original flows into a database, do a proper GROUP BY there, and continue the Job with the clean data.
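As a rough illustration of the staging idea above, here is a minimal Python sketch using the standard sqlite3 module as a stand-in database. The table name, the column names, the sample rows and the rule "keep the most recent row per key" are all assumptions made for the example, not something from the original Job. It also assumes there are no ties on updated_at within a key.

```python
import sqlite3

# Hypothetical rows from two input flows: (customer_id, email, updated_at).
flow_a = [(1, "a@old.example", "2020-01-01"), (2, "b@example.com", "2020-03-01")]
flow_b = [(1, "a@new.example", "2021-06-01")]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staging (customer_id INTEGER, email TEXT, updated_at TEXT)"
)
conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", flow_a + flow_b)

# The inner query is a proper GROUP BY: customer_id is the grouping key and
# updated_at is aggregated with MAX. Joining back on both columns returns the
# whole latest row instead of an arbitrary one.
deduplicated = conn.execute(
    """
    SELECT s.customer_id, s.email, s.updated_at
    FROM staging AS s
    JOIN (
        SELECT customer_id, MAX(updated_at) AS updated_at
        FROM staging
        GROUP BY customer_id
    ) AS latest
      ON latest.customer_id = s.customer_id
     AND latest.updated_at = s.updated_at
    ORDER BY s.customer_id
    """
).fetchall()
conn.close()

print(deduplicated)
```

The same query shape should work on most SQL databases. Only the rule in the inner query (here MAX on a date column) needs to change to match your own logic for choosing between duplicates.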
The correct approach is to use taggregator or GROUP BY and decide explicitly what you want to do with the columns that are not part of the duplicate key. This gives you deterministic output.
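To make "decide what to do with the other columns" concrete, here is a small plain-Python sketch of the same idea outside any database or component. The aggregate helper, the column names and the chosen rules (sum for amount, alphabetical minimum for city) are hypothetical and only illustrate the principle of one explicit rule per non-key column.

```python
from collections import defaultdict

# Hypothetical rows with a duplicated key; column names are illustrative only.
rows = [
    {"customer_id": 1, "amount": 10, "city": "Paris"},
    {"customer_id": 1, "amount": 5, "city": "Lyon"},
    {"customer_id": 2, "amount": 7, "city": "Rome"},
]

# One explicit rule per non-key column, so nothing is left to chance.
rules = {"amount": sum, "city": min}


def aggregate(rows, key, rules):
    """Group rows by `key` and apply one rule to each non-key column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    result = []
    for key_value in sorted(groups):  # sorted keys give a stable output order
        out = {key: key_value}
        for column, rule in rules.items():
            out[column] = rule(r[column] for r in groups[key_value])
        result.append(out)
    return result


aggregated = aggregate(rows, "customer_id", rules)
print(aggregated)
```

Because both rules here do not depend on the order of the input rows, the output stays the same however the input is sorted. A rule such as "first" or "last" would only be deterministic if the input is sorted first.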