tExtractDelimitedFields on variable location field

I am wondering why there is no option to specify CSV Parameters for tExtractDelimitedFields.

 

My use case is to parse a field using my schema definition and to allow quotes embedded in double quotes in the incoming field, so that the row parsing is not confused. To my amazement, I cannot find any option to specify this.

 

Even the tNormalize component lets you specify CSV Parameters:

Escape mode: doubled | backslash

Text enclosure """

 

I ended up having to replace quoted commas with the ASCII NUL character (character code 0) using the code below, and then writing another similar component to switch the NUL characters back to commas after the tExtractDelimitedFields component.

 

There must be a better way. Any ideas?

 

tJavaRow - ReplaceQuotedCommas

// Sentinel: the NUL character (code 0) temporarily stands in for quoted commas
int iSentinel = 0;
String sentinel = Character.toString((char) iSentinel);

String csvString = input_row.returnVal;

// Matches a comma that sits between an opening and a closing double quote
String rePattern = "(\"[^\",]+),([^\"]*\")";

// First replacement
String oldString = csvString;
String resultString = csvString.replaceAll(rePattern, "$1" + sentinel + "$2");

// Additional replacements until there are no more changes,
// because each pass replaces at most one comma per quoted value
while (!resultString.equals(oldString)) {
    oldString = resultString;
    resultString = resultString.replaceAll(rePattern, "$1" + sentinel + "$2");
}

System.out.println("Before: " + csvString);
//System.out.println("Result: " + resultString);

output_row.returnVal = resultString;
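
For completeness, here is a rough standalone sketch of the full round trip, including the reverse step that runs after tExtractDelimitedFields. The class name QuotedCommaRoundTrip and the method names maskQuotedCommas and restoreCommas are made up for illustration, and the plain split(",") only stands in for what tExtractDelimitedFields does in the job. In the real job the restore would presumably go in a second tJavaRow, applied to each extracted column.

```java
public class QuotedCommaRoundTrip {

    // Same sentinel as in the tJavaRow: the NUL character (code 0)
    static final String SENTINEL = Character.toString((char) 0);

    // Same pattern as in the tJavaRow: a comma between two double quotes
    static final String QUOTED_COMMA = "(\"[^\",]+),([^\"]*\")";

    // Replace every comma inside a quoted value with the sentinel.
    // Each pass replaces at most one comma per quoted value, so loop until stable.
    public static String maskQuotedCommas(String line) {
        String previous;
        String current = line;
        do {
            previous = current;
            current = previous.replaceAll(QUOTED_COMMA, "$1" + SENTINEL + "$2");
        } while (!current.equals(previous));
        return current;
    }

    // Reverse step: turn the sentinel back into a comma in an extracted field
    public static String restoreCommas(String field) {
        return field.replace(SENTINEL, ",");
    }

    public static void main(String[] args) {
        String line = "1,\"Smith, John\",\"a,b,c\"";

        // Stand-in for tExtractDelimitedFields: a plain split on the delimiter
        String[] fields = maskQuotedCommas(line).split(",");

        for (String field : fields) {
            System.out.println(restoreCommas(field));
        }
    }
}
```

Note that the surrounding double quotes are still part of each extracted value, and this only works as long as the incoming data never contains a NUL character itself.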