I have a fairly straightforward requirement: I need to allocate the nearest delivery centre based on each customer's postcode. I can generate 2 files
Joe Bloggs, Some Road in Reading, RG1 4XX
Tim John, Some Road in London, NW10 1AA
I want to match on '%PostCode%', i.e. the customer's postcode should contain the PostCode from Delivery.csv
Identifying the nearest delivery centre with just a %POST_CODE% match on the customer postcode is a pretty risky proposition. The match will be wrong in many cases, and you would be reinventing the wheel when mature SaaS solutions are already available.
In a nutshell, I would suggest using the standardization and matching features of one of the above components, whichever you are most comfortable with. They use Royal Mail's Postcode Address File (PAF) as their source and report accuracy percentages by comparing against the input addresses.
Another problem: even if the postal addresses match, what guarantees that you are delivering to exactly the right person when the address match is only a hard match on postcode? There are many common names in the UK, such as Adam Smith, and people with the same name may well live on the same street.
Then there is the scenario where a father and son share the same name, distinguished only by a Junior or II suffix. Another is a divorced husband and wife still living at the same location. If your address match is wrong and a letter goes to the wrong member of a divorced couple, you could end up paying a hefty GDPR fine :-)
So my suggestion is to use the address standardization features of the Talend Data Quality components rather than building something yourself. Otherwise you will put a lot of effort into reinventing the wheel, while those components already do the job with mature algorithms in a plug-and-play fashion.
So do you still want to try this yourself?
I appreciate your advice. However, all our delivery centres have a postcode list, i.e. just the first two letters. So I am really looking for a wildcard match in tMap or similar.
That is my fallback plan. Is there any way to do a wildcard match? The next phase of the load will use 3-digit postcodes. Alternatively, I will upload them to a staging database and use a wildcard match in the join; however, I wanted to see whether Talend has anything available for this.
Appreciate your answer.
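For anyone landing on this thread later, the prefix lookup discussed above could be sketched in plain Java roughly as follows. This is only a minimal sketch under some assumptions, not a built-in Talend feature: the class name PostcodePrefixMatcher, its methods, and the centre names are all hypothetical, and it assumes the delivery list holds either a two-letter area such as RG or a full outward code such as RG1. It compares against the outward code (everything before the last three characters) rather than doing a raw "starts with", so a list entry of NW1 does not wrongly capture NW10 1AA.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: maps a customer postcode to a delivery centre by prefix.
public class PostcodePrefixMatcher {

    // Prefix -> centre name; keys are stored upper-case with no spaces.
    private final Map<String, String> centresByPrefix = new HashMap<>();

    public void addCentre(String prefix, String centre) {
        centresByPrefix.put(normalise(prefix), centre);
    }

    // Upper-case the value and strip all whitespace ("rg1 4xx" -> "RG14XX").
    static String normalise(String s) {
        return s == null ? "" : s.replaceAll("\\s+", "").toUpperCase(Locale.ROOT);
    }

    // Outward code: everything before the 3-character inward code ("RG1 4XX" -> "RG1").
    static String outwardCode(String postcode) {
        String p = normalise(postcode);
        return p.length() > 3 ? p.substring(0, p.length() - 3) : p;
    }

    // Area: the leading letters of the outward code ("NW10" -> "NW").
    static String area(String postcode) {
        String outward = outwardCode(postcode);
        int i = 0;
        while (i < outward.length() && Character.isLetter(outward.charAt(i))) {
            i++;
        }
        return outward.substring(0, i);
    }

    // Try the most specific key first (outward code), then fall back to the area.
    public Optional<String> findCentre(String postcode) {
        String centre = centresByPrefix.get(outwardCode(postcode));
        if (centre == null) {
            centre = centresByPrefix.get(area(postcode));
        }
        return Optional.ofNullable(centre);
    }

    public static void main(String[] args) {
        PostcodePrefixMatcher matcher = new PostcodePrefixMatcher();
        // Made-up delivery list entries, for illustration only.
        matcher.addCentre("RG", "Reading Centre");
        matcher.addCentre("NW", "London Centre");
        matcher.addCentre("NW1", "Camden Centre");

        System.out.println(matcher.findCentre("RG1 4XX").orElse("no match"));  // Reading Centre
        System.out.println(matcher.findCentre("NW10 1AA").orElse("no match")); // London Centre
        System.out.println(matcher.findCentre("SW1A 1AA").orElse("no match")); // no match
    }
}
```

The same idea should carry over to a job by deriving the outward code and area as extra columns on the customer flow and then using an ordinary equality lookup on them, which avoids needing a true wildcard join. Unmatched customers come back empty, so they can be routed to a reject flow and reviewed rather than silently assigned.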
That will be a good fallback method, but add the necessary caveats about the possible data-match risk to your design documents and development handover guide, so that the issue does not snowball onto you later.
If the project's architect decides this is the best method, then the onus of handling any data problems or GDPR issues should ultimately rest with him or her. They should not throw mud at you or the testing team for any setback.