File Delimited input processing ignores escape characters

I'm using DI 5.2.1 and trying to read a simple .csv file.
I used the Metadata Wizard to specify "," as the field delimiter and tried more than one escape character. Whichever escape character I specify is ignored when the file is processed.
I've tried two of the choices in the wizard's drop-down box, "\"" and "'". Both are ignored, and fields containing escaped ","s are split.
With ' specified as the escape character, the input string:
... ,Pembroke', MA 02359,Pembroke,Plymouth,MA, ...
is parsed as follows (as shown in the file viewer in step 2 of 3):
... |Pembroke'|MA 02359|Pembroke|Plymouth|MA| ...
when it should be parsed as:
... |Pembroke, MA 02359|Pembroke|Plymouth|MA| ...

The specified escape character, " ' " , is not processed as an escape.
What's going wrong?
This looks like a bug in tFileInputDelimited.
Moderator

Re: File Delimited input processing ignores escape characters

Hi,
Is your requirement to split the "Address" into separate columns? I mean, should the string ",Pembroke', MA 02359,Pembroke,Plymouth,MA," be split at each ","? Is the result in the following screenshot what you want?
Best regards
Sabrina

Re: File Delimited input processing ignores escape characters

Thanks for responding.
My requirement is that the input string:
,Pembroke', MA 02359,Pembroke,Plymouth,MA,
be parsed into four fields:
|Pembroke, MA 02359|Pembroke|Plymouth|MA|
The " ' " after the first occurrence of "Pembroke" is the escape character. It is supposed to prevent the comma following it from separating the field and preserve that comma as part of the string, "Pembroke, MA 02359".
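To make the expected behaviour concrete, here is a minimal sketch using Python's standard csv module rather than Talend. It only illustrates the escape semantics I am describing; the function name parse_with_escape is made up for this example, and it says nothing about how tFileInputDelimited is implemented.

```python
import csv

def parse_with_escape(line, escape_char="'"):
    """Split one line on commas, treating escape_char as an escape character.

    QUOTE_NONE turns off text-enclosure handling, so only the escape
    character can protect a comma from acting as a delimiter.
    """
    reader = csv.reader(
        [line],
        delimiter=",",
        quoting=csv.QUOTE_NONE,
        escapechar=escape_char,
    )
    return next(reader)

fields = parse_with_escape(",Pembroke', MA 02359,Pembroke,Plymouth,MA,")
print(fields)
# ['', 'Pembroke, MA 02359', 'Pembroke', 'Plymouth', 'MA', '']
```

The empty strings at both ends come from the leading and trailing commas of the excerpt; the escaped comma stays inside the second field.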
The problem is that, even though I specified the escape character in the wizard and in the metadata (note the screenshot in the first post), it is not used when the job is executed.
Is the .csv escape character functionality broken, or have I done something wrong?
Any thoughts?

Re: File Delimited input processing ignores escape characters

Hi holberger,
I have come across this problem as well, and I think this is a bug within Talend. You cannot escape the commas properly.
The only options available in the Escape Char and Text Enclosure drop-down menus are:
- Empty
- "\""
- "'"
- "\\"
If both the Escape Char and the Text Enclosure are set to "'", then to have your fields parsed as:
|Pembroke, MA 02359|Pembroke|Plymouth|MA|
you would need the following input string:
,'Pembroke, MA 02359',Pembroke,Plymouth,MA,
But then, if a field needed to contain an apostrophe in the data, it could not be escaped properly. For example, the following input would not be parsed correctly, even if you added escape characters before the apostrophe:
,'Pembroke's Hills Street, MA 02359',Pembroke,Plymouth,MA,
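The same effect can be reproduced outside Talend. The sketch below uses Python's csv module with "'" as the text enclosure. It is only an analogy for the behaviour I saw, not Talend's code, and parse_enclosed is a name invented for the example.

```python
import csv

def parse_enclosed(line, enclosure="'"):
    """Split one line on commas, using `enclosure` as the text enclosure."""
    reader = csv.reader([line], delimiter=",", quotechar=enclosure)
    return next(reader)

# An enclosed field containing a comma is kept in one piece.
ok = parse_enclosed(",'Pembroke, MA 02359',Pembroke,Plymouth,MA,")
print(ok)

# An apostrophe inside the data ends the enclosure early, so the
# address is no longer returned as a single field.
broken = parse_enclosed(",'Pembroke's Hills Street, MA 02359',Pembroke,Plymouth,MA,")
print(broken)
```

With a single character doing double duty as enclosure and as data, the parser cannot tell where the field really ends.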
My conclusion is that the Metadata Wizard cannot be used because it is faulty. However, you can define a single tFileInputDelimited and tick the "CSV options" checkbox (accepting the defaults for Escape char and Text enclosure). With this configuration, your field should be parsed correctly from:
,"Pembroke, MA 02359",Pembroke,Plymouth,MA,
to:
|Pembroke, MA 02359|Pembroke|Plymouth|MA|
And you would also be able to escape apostrophes and double quotes, so that this input:
,"Pembroke's Hills is ""the"" house, MA 02359",Pembroke,Plymouth,MA,
would be parsed as:
|Pembroke's Hills is "the" house, MA 02359|Pembroke|Plymouth|MA|
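For comparison, this is how the same line behaves under conventional CSV quoting (double quote as enclosure, a doubled double quote for a literal one). The sketch uses Python's csv module with its default settings. I am assuming, not claiming, that Talend's "CSV options" defaults follow the same convention, which is what the result above suggests.

```python
import csv

line = ',"Pembroke\'s Hills is ""the"" house, MA 02359",Pembroke,Plymouth,MA,'

# Default dialect: delimiter ",", enclosure '"', and "" inside an
# enclosed field is read as one literal double quote.
row = next(csv.reader([line]))
print("|".join(row))
# |Pembroke's Hills is "the" house, MA 02359|Pembroke|Plymouth|MA|
```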
This workaround works, but it would be better if this were possible through the Metadata Wizard.
I have tried this with versions 5.2.2 and 5.1.1, so I guess this is something that has not been tackled yet.
Does anyone have any news regarding this?
Many thanks.