tExtractDelimitedFields on variable location field

I am wondering why there is no option to specify CSV parameters for tExtractDelimitedFields.

My use case is to parse a field against my schema definition while honoring double-quoted values in the incoming field (including commas embedded in them), so that the row parsing is not thrown off. To my surprise, I cannot find any option to specify this.

Even the tNormalize component lets you specify CSV parameters:

Escape mode: doubled | backslash

Text enclosure: "\""
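To make concrete what these two parameters control, here is a minimal Java sketch of a field splitter that honors a text enclosure and either escape mode. It is not Talend code: the class `EnclosedSplitter`, its `split` method and the `EscapeMode` enum are hypothetical names. The sketch assumes a single-character delimiter and enclosure, and it drops the enclosure characters from the extracted values.

```java
import java.util.ArrayList;
import java.util.List;

public class EnclosedSplitter {

    /** Hypothetical enum mirroring the two escape modes listed above. */
    public enum EscapeMode { DOUBLED, BACKSLASH }

    /** Splits one line on the delimiter, except where the delimiter sits inside the enclosure. */
    public static List<String> split(String line, char delimiter, char enclosure, EscapeMode mode) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (mode == EscapeMode.BACKSLASH && c == '\\' && i + 1 < line.length()) {
                    current.append(line.charAt(++i));   // \x -> x, taken literally
                } else if (c == enclosure) {
                    if (mode == EscapeMode.DOUBLED && i + 1 < line.length()
                            && line.charAt(i + 1) == enclosure) {
                        current.append(enclosure);      // "" -> one literal "
                        i++;
                    } else {
                        inQuotes = false;               // closing enclosure
                    }
                } else {
                    current.append(c);                  // delimiters inside quotes are kept
                }
            } else if (c == enclosure) {
                inQuotes = true;                        // opening enclosure
            } else if (c == delimiter) {
                fields.add(current.toString());         // end of field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());                 // last field
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("1,\"Smith, John\",\"say \"\"hi\"\"\"", ',', '"', EscapeMode.DOUBLED));
        System.out.println(split("1,\"Smith, John\",\"say \\\"hi\\\"\"", ',', '"', EscapeMode.BACKSLASH));
    }
}
```

Both calls in `main` should yield the same three fields; the only difference is whether a quote inside a quoted value is written as `""` (doubled) or `\"` (backslash).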

 

I ended up replacing the commas inside quoted values with ASCII 0 (the NUL character) using the code below, and then adding another, similar component after tExtractDelimitedFields to switch the ASCII 0 characters back to commas.

There must be a better way. Any ideas?

tJavaRow - ReplaceQuotedCommas

// Mask every comma inside a double-quoted value with ASCII 0 (NUL)
// so that tExtractDelimitedFields does not split on it.
String sentinel = String.valueOf((char) 0);

String csvString = input_row.returnVal;

// Group 1: opening quote plus the text before a comma; group 2: the text after it up to the closing quote.
String rePattern = "(\"[^\",]+),([^\"]*\")";

String resultString = csvString;
if (csvString != null) {
    // Each pass masks one comma per quoted value, so repeat until nothing changes.
    String oldString;
    do {
        oldString = resultString;
        resultString = oldString.replaceAll(rePattern, "$1" + sentinel + "$2");
    } while (!resultString.equals(oldString));
}

// Uncomment to debug:
//System.out.println("Before: " + csvString);
//System.out.println("After:  " + resultString);

output_row.returnVal = resultString;
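For comparison, the mask, extract, unmask round trip can also be done with a single left-to-right scan that tracks whether it is inside quotes, instead of repeating the regex until nothing changes. This is a swapped-in technique, not the code above: the class `QuotedCommaMasker` and its methods are hypothetical, the sketch assumes `"` is the only enclosure and that ASCII 0 never appears in the data, and, like the regex version, it leaves the quotes in the extracted fields.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedCommaMasker {

    // Same sentinel as the tJavaRow code: ASCII 0 (NUL), assumed absent from the data.
    static final char SENTINEL = (char) 0;

    /** Replaces every comma that sits between double quotes with SENTINEL. */
    public static String mask(String line) {
        StringBuilder sb = new StringBuilder(line.length());
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;   // a doubled "" toggles twice, leaving the state unchanged
            }
            sb.append(inQuotes && c == ',' ? SENTINEL : c);
        }
        return sb.toString();
    }

    /** Reverses mask() on one extracted field (the second tJavaRow in the workaround). */
    public static String unmask(String field) {
        return field.replace(SENTINEL, ',');
    }

    /** mask, then split on commas (the tExtractDelimitedFields step), then unmask each field. */
    public static List<String> extract(String line) {
        List<String> fields = new ArrayList<>();
        for (String field : mask(line).split(",", -1)) {   // -1 keeps trailing empty fields
            fields.add(unmask(field));
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(extract("42,\"Main St, Suite 5\",\"Springfield, IL\""));
    }
}
```

The scan also masks a comma that directly follows the opening quote (for example `",a"`), a case the regex misses because `[^\",]+` needs at least one character before the comma.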

 

Moderator

Re: tExtractDelimitedFields on variable location field

Hello,

We have raised a new feature request in JIRA on the Talend bug tracker:

https://jira.talendforge.org/browse/TDI-30555

Could you please try the following approach and see if it works for your use case?

tFileInputRaw -> tConvertType -> tNormalize -> tExtractDelimitedFields -> tLogRow
tFileInputRaw reads the whole CSV file in one piece. Its schema has one column, content, of type Document.
tConvertType converts the content column from type Document to type String.
tNormalize normalizes on "\n", so the single input row becomes a flow with one row per line.
tExtractDelimitedFields extracts the columns from each of those rows.
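The flow above can be approximated in plain Java, which may help when checking what each step receives. This is a minimal sketch, not the code Talend generates: `RawToRowsSketch` and its methods are hypothetical, the file path comes from the first command-line argument, it needs Java 11+ (`Files.readString`), and it assumes UTF-8 and "\n" line endings, skips empty lines, and uses a plain split on the delimiter for the extraction step.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class RawToRowsSketch {

    /** tNormalize step: one input value becomes one row per "\n"-separated line. */
    public static List<String> normalize(String content) {
        List<String> rows = new ArrayList<>();
        for (String line : content.split("\n")) {
            if (!line.isEmpty()) {      // assumption: empty lines are skipped
                rows.add(line);
            }
        }
        return rows;
    }

    /** tExtractDelimitedFields step: split one row into columns on a fixed delimiter. */
    public static List<String> extract(String row, String delimiter) {
        return Arrays.asList(row.split(Pattern.quote(delimiter), -1));
    }

    public static void main(String[] args) throws IOException {
        // tFileInputRaw + tConvertType: the whole file read into a single String.
        String content = Files.readString(Path.of(args[0]), StandardCharsets.UTF_8);
        for (String row : normalize(content)) {
            System.out.println(extract(row, ","));   // tLogRow
        }
    }
}
```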

Best regards

Sabrina

