Text enclosure delimiter in CSV data

One Star

Text enclosure delimiter in CSV data

I have a set of CSV data with comma as the field separator and double quote(") as the text enclosure.  In some of the fields the text enclosure character is in the data.
The behavior of the tfileInputDelimited is to truncate the field at the first " in the field.  I do not see any errors from this truncation.
e.g. "This is "a" test field" is truncated to "This is " in the output data.  The interesting thing is that the parser is smart enough to recognize the start of the next field correctly so the rest of the data looks fine.
I could change delimiters but this is a large export and I cannot be certain in any case that the text enclosure character will not be in the field.
At a minimum I need to know that the parser truncated the data on that field with some kind of exception log or I need to determine a way to handle the anomaly correctly.
Any suggestions?
Jim
Moderator

Re: Text enclosure delimiter in CSV data

Hi,

Is your expected result: This is a test field?  If so, you can use tReplace component to remove " in your output column.
Please see component reference TalendHelpCenter:tReplace


Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
One Star

Re: Text enclosure delimiter in CSV data

My expected result is: This is "a" test field (actually it is xml so I expect <myfield>This is "a" test field</myfield>)
Since this is XML I could also live with a replacement to an entity ($quote)
The replace solution does not work - it seems that when used against a field the parser has already truncated the field data at the first quote.
I would need to be able to do the replace against the whole record before it is parsed into fields - I don't see how to do that.

This feels like a bug.  All the other edge cases work great (like record ends in the middle of data fields).
This is very important for me to solve soon.
Thanks - Jim
One Star

Re: Text enclosure delimiter in CSV data

One way of working around this might be to use tFileInputFullRow and use a replace on those records, then pipe that into the tFileInputdelimited.  Firstly I don't know how to pipe that output so if there is a way to do that it would help.  If not I can probably write the data to a temp file and then process that by tFileInputdelimited but that double my disk space requirements...
Jim
Moderator

Re: Text enclosure delimiter in CSV data

Hi,
The replace solution does not work - it seems that when used against a field the parser has already truncated the field data at the first quote.p

Did you set tReplace component as screenshot shown?

Best regards

Sabrina

--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
One Star

Re: Text enclosure delimiter in CSV data

Yes - the issue seems to be the field is truncated before replace gets to act on it.
Jim
Moderator

Re: Text enclosure delimiter in CSV data

Hi,
Could you please share your current job design screenshot with us?
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
One Star

Re: Text enclosure delimiter in CSV data

Sorry this reply is so late - I got move to another job - now I am back... Smiley Happy
Here is a simple test case with screen shots and data.
Screen1.png shows the job and that it ran without errors.
Screen2.png shows the setup of the inputfile delimited
Screen3.png shows the replace parameters
Screen4.png shows the XML Output Parameters
Screen5.png shows the schema mapping
Screen6.png shows the XML output
EventDetails-bad.txt is the input file.  The second record has the word text quoted with the delimiter quotes.
test_csv_conversion_0.1.zip is an executable build of the job.  You will need to place the data in the correct folder for it to run.
Note there is no error message on the run.
Note that in the XML output the <DETAIL_VALUE> content for the second record is truncated where the embedded quote is.
Note also that the subsequent field <FILENAME> is correct so the data was parsed correctly but not output correctly.
I hope this helps isolate the issue.
There may be duplicates uploaded as I did not see the confirmation message for more than 4 files.
EventDetails-bad.txt.txttest_csv_conversion_0.1.zip.zip
One Star

Re: Text enclosure delimiter in CSV data

Anybody from Talend looking at this?
Thanks
Jim
One Star

Re: Text enclosure delimiter in CSV data

Anyone??? Smiley Sad