Six Stars

tReplace not removing comma in Talend workflow for multiple columns

I posted a question earlier about the same problem, but that was for a single column. Now I have 3 columns instead. I have a workflow that I am using to remove commas (,) from the data, but it is not working. I also tried unchecking the "Whole word" field, to no avail. Below is how the data looks when opened in Notepad++.

 

id,Introduction,Vertical
"Jinshan District, Shanghai, China","international saler at Shanghai Triowin Automation Machinery Co.,Ltd","Machinery, Transportation"
"Pudongxin District, Shanghai, China","Market Communication Specialist, Shanghai JiaoTong University - Kedge Business School",Education Management
"Nanjing City, Jiangsu, China","System Tester at Jiangsu HopeRun Software Co., Ltd.",Computer Software

 

 Capture.JPG

 

The output comes out as shown below, without any delimiter and not split into individual columns.

 

Capture1.JPG

 

 

9 REPLIES
Nine Stars TRF

Re: tReplace not removing comma in Talend workflow for multiple columns

Because you're reading the input file with tFileInputFullRow, only the first field (id) is populated. Try tExtractRegexFields instead of tReplace to cut the input row into 3 parts, or use tFileInputDelimited instead of tFileInputFullRow. In that case, use \t as the field separator (that's the character in your sample), then remove all "," from the fields you need to.
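As a rough illustration of the regex idea, here is a plain-Java sketch that cuts one line into 3 parts with three capture groups, one per column. It assumes the fields are comma-separated and optionally wrapped in double quotes (as in the Notepad++ sample in this thread), with no escaped quotes inside a field. The class name `RegexFieldsDemo` and the method `extract` are made up for the example; whether tExtractRegexFields accepts this exact pattern is an assumption to verify in your own job.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexFieldsDemo {
    // One column: either a double-quoted run, or a run that contains no comma.
    private static final String FIELD = "(\"[^\"]*\"|[^,]*)";

    // Three capture groups separated by commas, one group per column.
    private static final Pattern ROW =
            Pattern.compile("^" + FIELD + "," + FIELD + "," + FIELD + "$");

    // Returns the 3 column values with enclosing quotes removed,
    // or null when the line does not have exactly this shape.
    static String[] extract(String line) {
        Matcher m = ROW.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] parts = new String[3];
        for (int i = 0; i < 3; i++) {
            parts[i] = unquote(m.group(i + 1));
        }
        return parts;
    }

    // Drops one pair of enclosing double quotes, if present.
    private static String unquote(String value) {
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            return value.substring(1, value.length() - 1);
        }
        return value;
    }

    public static void main(String[] args) {
        String line = "\"Pudongxin District, Shanghai, China\","
                + "\"Market Communication Specialist, Shanghai JiaoTong University - Kedge Business School\","
                + "Education Management";
        for (String part : extract(line)) {
            System.out.println("[" + part + "]");
        }
    }
}
```

The commas inside quoted values stay within their own group, so they can be removed afterwards without shifting columns.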

TRF
Six Stars

Re: tReplace not removing comma in Talend workflow for multiple columns

\t is not the delimiter. The tab was generated because I pasted the data directly from the Excel file into the post and inserted it as code.

 

In fact, the delimiter is the whole problem. My data is in CSV, which means the comma is the delimiter, but some fields contain commas in their values. This results in data loss: wherever a comma appears inside a value, the rest of the data is shifted into the next column instead of staying in its own column.

 

This is why I have been looking for a way to remove the commas, or do whatever it takes to get rid of this problem. I have been trying to solve this for a week and am stuck here.
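To make the column shift concrete, here is a small Java sketch (Talend jobs compile to Java) of what happens when a line is split on every comma with no regard for the quotes. The class and method names are made up for illustration and are not Talend APIs.

```java
import java.util.Arrays;
import java.util.List;

class NaiveSplitDemo {
    // Splits on every comma and ignores quotes, which is roughly what a
    // delimited reader does when no text enclosure is configured.
    static List<String> naiveSplit(String line) {
        return Arrays.asList(line.split(",", -1));
    }

    public static void main(String[] args) {
        String row = "\"Nanjing City, Jiangsu, China\","
                + "\"System Tester at Jiangsu HopeRun Software Co., Ltd.\","
                + "Computer Software";
        List<String> fields = naiveSplit(row);
        // The row has 3 logical columns, but the split yields 6 pieces.
        System.out.println(fields.size() + " pieces");
        for (String field : fields) {
            System.out.println("[" + field + "]");
        }
    }
}
```

With a 3-column schema, only the first three pieces would land in columns, which matches the shifted and lost data described above.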

Six Stars

Re: tReplace not removing comma in Talend workflow for multiple columns

One solution: when the source CSV file is created, ask the provider to enable the double-quote (") text qualifier for all fields, in this example for all 3 fields.

Then, when reading, you can set the Text Qualifier property of your delimited input file component to """ so that you neither lose data nor have values shifted into the next columns.

I will keep you posted on another solution. Meanwhile, can you open the source file in Notepad and copy the first 3 to 5 rows here?

Thanks,
Sid
Please like the post if it is useful
Please put to resolved if it resolves your issue.
Six Stars

Re: tReplace not removing comma in Talend workflow for multiple columns

We get the data from an external source, so I am not sure we can get that done. I will try, though. As for the second part, the original post at the top has 4 rows: the first is the column names, and the other 3 are data. Thanks!!
Ten Stars

Re: tReplace not removing comma in Talend workflow for multiple columns

Excel stomps all over the formatting when a csv file is opened.  Open the data file with a text editor and paste from there so we can see how the raw data looks.

Six Stars

Re: tReplace not removing comma in Talend workflow for multiple columns

This is how it looks after I open it in Notepad++:

id,Introduction,Vertical
"Jinshan District, Shanghai, China","international saler at Shanghai Triowin Automation Machinery Co.,Ltd","Machinery, Transportation"
"Pudongxin District, Shanghai, China","Market Communication Specialist, Shanghai JiaoTong University - Kedge Business School",Education Management
"Nanjing City, Jiangsu, China","System Tester at Jiangsu HopeRun Software Co., Ltd.",Computer Software
Ten Stars

Re: tReplace not removing comma in Talend workflow for multiple columns

A simple tFileInputDelimited should read this just fine. Configure the component schema with three String columns, set the Field Separator to "," and the Text Enclosure to "\""
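For anyone curious what the text enclosure setting does for this file, below is a rough plain-Java sketch of quote-aware splitting, followed by the comma removal the original question asked about. It is not Talend's actual parser: `splitQuoted` and `stripCommas` are hypothetical helper names, and the sketch assumes each record sits on a single line.

```java
import java.util.ArrayList;
import java.util.List;

class QuotedCsvDemo {
    // Splits one CSV line on commas that are outside double quotes.
    // A doubled quote ("") inside a quoted field is read as a literal quote.
    static List<String> splitQuoted(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote inside a quoted field
                    i++;
                } else {
                    inQuotes = !inQuotes; // opening or closing enclosure
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString()); // field boundary
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field has no trailing comma
        return fields;
    }

    // Removes every comma from a single, already separated field value.
    static String stripCommas(String value) {
        return value.replace(",", "");
    }

    public static void main(String[] args) {
        String row = "\"Jinshan District, Shanghai, China\","
                + "\"international saler at Shanghai Triowin Automation Machinery Co.,Ltd\","
                + "\"Machinery, Transportation\"";
        for (String field : splitQuoted(row)) {
            System.out.println("[" + stripCommas(field) + "]");
        }
    }
}
```

Once the fields are separated correctly, removing the commas is an ordinary per-column string operation; in a job it could be an expression such as `row1.Introduction.replace(",", "")`, assuming the incoming row is named `row1` and the value is not null.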
Six Stars

Re: tReplace not removing comma in Talend workflow for multiple columns

Not working! I just tried. Can you post a screenshot if it worked for you?

Ten Stars

Re: tReplace not removing comma in Talend workflow for multiple columns

Talend azimulh input.png

[statistics] connecting to socket on port 3934
[statistics] connected
.-----------------------------------+-------------------------------------------------------------------------------------+-------------------------.
|                                                                     tLogRow_1                                                                     |
|=----------------------------------+-------------------------------------------------------------------------------------+------------------------=|
|id                                 |Introduction                                                                         |Vertical                 |
|=----------------------------------+-------------------------------------------------------------------------------------+------------------------=|
|Jinshan District, Shanghai, China  |international saler at Shanghai Triowin Automation Machinery Co.,Ltd                 |Machinery, Transportation|
|Pudongxin District, Shanghai, China|Market Communication Specialist, Shanghai JiaoTong University - Kedge Business School|Education Management     |
|Nanjing City, Jiangsu, China       |System Tester at Jiangsu HopeRun Software Co., Ltd.                                  |Computer Software        |
'-----------------------------------+-------------------------------------------------------------------------------------+-------------------------'

[statistics] disconnected