I have a job with a tMatchGroup that matches customers from various source systems on name and address, using the Levenshtein algorithm.
It works fine.
When a new customer is created in a source system, I currently have to rerun the job, and the whole data set is processed by the tMatchGroup again.
With 10,000 customers, the job processes all 10,000 records together. That is wasted work: I only need to match the new customer against the existing ones.
Do I have to use another component? tMatchGroup seems to compare the whole input table against itself (like a Cartesian join).
Besides, I would like to keep the GIDs from the first run. If I reprocess the entire data set, I will lose them, because they will be replaced by new ones.
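To make it concrete, here is a rough Python sketch of the behaviour I am after. It is not Talend code and says nothing about how tMatchGroup works internally. The function names, the 0.8 threshold, scoring a pair by the weaker of its name and address similarities, and the UUID used as a new GID are all assumptions for illustration.

```python
import uuid


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (dynamic programming, two rows)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalised similarity in [0, 1]; 1.0 means identical after cleanup."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def assign_gid(new_customer, matched_customers, threshold=0.8):
    """Compare ONE new record with the already matched records.

    Returns the GID of the best existing match, or a fresh GID when
    nothing reaches the (assumed) threshold. Existing GIDs are never
    modified.
    """
    best_gid, best_score = None, 0.0
    for existing in matched_customers:
        # Assumption: a pair is only as good as its weaker attribute.
        score = min(similarity(new_customer["name"], existing["name"]),
                    similarity(new_customer["address"], existing["address"]))
        if score >= threshold and score > best_score:
            best_gid, best_score = existing["gid"], score
    return best_gid if best_gid is not None else str(uuid.uuid4())
```

With this approach a new customer costs one pass over the existing records instead of a full rematch of all pairs, and the GIDs assigned during the first run stay untouched.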
Or maybe I am not using the appropriate method...
Thanks for your help.