I cannot switch to a Java project because Perl is a requirement for my project.
I understand your approach but unfortunately it doesn't work very well because:
- in your example, the data is read from CSV files, so it is simple to replicate the input.
- in my case, the data is the result of several other transformations and mappings.
If I wanted to apply your approach, I would have to duplicate my transformations.
The final job would then be very complex and no longer easy to read.
OK, I managed to fix it by duplicating the tMySqlOutput component and using "update or insert" for "action on data".
The first branch is responsible for denormalizing the entries (CONCAT).
The second branch is responsible for aggregating the entries (COUNT).
At the end, the second branch updates the entries previously inserted by the first branch.
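To make the two-branch flow concrete, here is a minimal sketch of the idea, written in Python with an in-memory SQLite table standing in for MySQL. It is not what Talend generates: the table `summary`, its columns `labels` and `entry_count`, the sample entries and the helper `update_or_insert` are all hypothetical names chosen for illustration. The helper mimics "update or insert" by trying an UPDATE first and inserting only when no row matched. The sketch assumes the two branches run one after the other.

```python
import sqlite3


def update_or_insert(conn, key, column, value):
    """Try an UPDATE first; INSERT only if no row matched the key."""
    # Hypothetical column names; restricted because they are interpolated into SQL.
    if column not in ("labels", "entry_count"):
        raise ValueError("unexpected column: " + column)
    cur = conn.execute(
        f"UPDATE summary SET {column} = ? WHERE id = ?", (value, key)
    )
    if cur.rowcount == 0:
        conn.execute(
            f"INSERT INTO summary (id, {column}) VALUES (?, ?)", (key, value)
        )


def run_demo():
    conn = sqlite3.connect(":memory:")
    # The primary key on id is what lets the second branch find the first branch's rows.
    conn.execute(
        "CREATE TABLE summary ("
        "id INTEGER PRIMARY KEY, labels TEXT, entry_count INTEGER)"
    )

    # Made-up input rows: (key, label).
    entries = [(1, "a"), (1, "b"), (2, "c")]
    grouped = {}
    for key, label in entries:
        grouped.setdefault(key, []).append(label)

    # Branch 1: denormalize the entries (CONCAT).
    for key, labels in grouped.items():
        update_or_insert(conn, key, "labels", ",".join(labels))

    # Branch 2: aggregate the entries (COUNT); updates the rows branch 1 inserted.
    for key, labels in grouped.items():
        update_or_insert(conn, key, "entry_count", len(labels))

    conn.commit()
    return conn.execute(
        "SELECT id, labels, entry_count FROM summary ORDER BY id"
    ).fetchall()
```

Because each branch writes only its own column, the order of the two branches does not matter in this sketch as long as they do not run at the same time.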
My only fear is that collisions could happen: if the two branches are executed concurrently, both could insert new entries rather than one updating the other's. To avoid that, I added a tSleep component, but I'm not sure that it works in every case.