I have a set of CSV data with comma as the field separator and double quote(") as the text enclosure. In some of the fields the text enclosure character is in the data. The behavior of the tfileInputDelimited is to truncate the field at the first " in the field. I do not see any errors from this truncation. e.g. "This is "a" test field" is truncated to "This is " in the output data. The interesting thing is that the parser is smart enough to recognize the start of the next field correctly so the rest of the data looks fine. I could change delimiters but this is a large export and I cannot be certain in any case that the text enclosure character will not be in the field. At a minimum I need to know that the parser truncated the data on that field with some kind of exception log or I need to determine a way to handle the anomaly correctly. Any suggestions? Jim
My expected result is: This is "a" test field (actually it is xml so I expect <myfield>This is "a" test field</myfield>) Since this is XML I could also live with a replacement to an entity ($quote) The replace solution does not work - it seems that when used against a field the parser has already truncated the field data at the first quote. I would need to be able to do the replace against the whole record before it is parsed into fields - I don't see how to do that.
This feels like a bug. All the other edge cases work great (like record ends in the middle of data fields). This is very important for me to solve soon. Thanks - Jim
One way of working around this might be to use tFileInputFullRow and use a replace on those records, then pipe that into the tFileInputdelimited. Firstly I don't know how to pipe that output so if there is a way to do that it would help. If not I can probably write the data to a temp file and then process that by tFileInputdelimited but that double my disk space requirements... Jim
Sorry this reply is so late - I got move to another job - now I am back... Here is a simple test case with screen shots and data. Screen1.png shows the job and that it ran without errors. Screen2.png shows the setup of the inputfile delimited Screen3.png shows the replace parameters Screen4.png shows the XML Output Parameters Screen5.png shows the schema mapping Screen6.png shows the XML output EventDetails-bad.txt is the input file. The second record has the word text quoted with the delimiter quotes. test_csv_conversion_0.1.zip is an executable build of the job. You will need to place the data in the correct folder for it to run. Note there is no error message on the run. Note that in the XML output the <DETAIL_VALUE> content for the second record is truncated where the embedded quote is. Note also that the subsequent field <FILENAME> is correct so the data was parsed correctly but not output correctly. I hope this helps isolate the issue. There may be duplicates uploaded as I did not see the confirmation message for more than 4 files. EventDetails-bad.txt.txttest_csv_conversion_0.1.zip.zip