Data standardization and cleansing using Talend Open Studio

Hi Guys!
I am quite new to Talend and I have run into a small problem.
I have 2 tables, stored as CSV files:
Customer.csv and Billing.csv
Customer.csv columns: Customer_ID|Title|First_Name|Last_Name|Status|Email|Date_of_Birth
Billing.csv columns: Customer_ID|Phone_No|Address_Line_1|City|Region|Country|Zip

My task is to standardize the data in these tables.
In Customer.csv, the accepted values for the "Title" column are: 1.) Mr. 2.) Mrs. and 3.) Ms.
In Billing.csv, the accepted format for "Phone_No" is: XXX-XXX-XXXX
My question is whether it's possible to standardize and cleanse the data using Talend Open Studio for Data Integration (not the enterprise edition of Talend, and not Talend Studio for Data Quality either). And if it's not possible, how can I filter the data so that rows that don't follow the right format are the only ones that don't pass (a little bit of error handling)? Is it possible to standardize data using tMap or tJava?
Anyway, any kind of help would be greatly appreciated.
Sincerely, Locke

Re: Data standardization and cleansing using Talend Open Studio

Hi,
There are Talend DQ components for this, such as tStandardizePhoneNumber and tRecordMatching, plus some name-parsing routines that ship with TDQ. Check out DataQuality in the expression builder to pull out first name, last name, title, etc.
Please take a look at Talend Data Quality Product:
http://www.talend.com/products/data-quality
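
If the DQ components are not available in your edition, one possible workaround is a small plain-Java helper that you call from a tMap expression. The sketch below is only an illustration under assumptions: the class name StandardizeRoutines and both method names are made up for this example, and the choice to return null for anything that cannot be standardized is just one convention. In Talend it would normally live in a user routine (in the routines package); the package line is left out here so the class compiles on its own.

```java
import java.util.Locale;

// Hypothetical helper class; the class and method names are illustrative only.
public class StandardizeRoutines {

    // Maps a raw title to "Mr.", "Mrs." or "Ms." (the three accepted values).
    // Returns null when the input is missing or is not one of those titles.
    public static String standardizeTitle(String raw) {
        if (raw == null) {
            return null;
        }
        // Ignore case, surrounding whitespace and dots: " MR. " becomes "mr".
        String key = raw.trim().toLowerCase(Locale.ROOT).replace(".", "");
        switch (key) {
            case "mr":
                return "Mr.";
            case "mrs":
                return "Mrs.";
            case "ms":
                return "Ms.";
            default:
                return null;
        }
    }

    // Reformats a phone number to XXX-XXX-XXXX.
    // Returns null unless the input contains exactly 10 digits.
    public static String standardizePhone(String raw) {
        if (raw == null) {
            return null;
        }
        // Drop everything that is not a digit (spaces, dots, parentheses, dashes).
        String digits = raw.replaceAll("[^0-9]", "");
        if (digits.length() != 10) {
            return null;
        }
        return digits.substring(0, 3) + "-"
                + digits.substring(3, 6) + "-"
                + digits.substring(6);
    }
}
```

In a tMap output column you could then write something like StandardizeRoutines.standardizePhone(row1.Phone_No), where row1 is just a placeholder for your input flow name. Rows for which the helper returns null could be sent to a separate reject output with a filter expression, so only the badly formatted records are held back.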
Best regards
Sabrina