tNormalize and enclosed field

One Star

Hello,
I use a tNormalize component to split lines, with a space as the field separator. Some of my columns are text fields delimited with double quotes.

Jan  1 00:01:44 qid=tBVN1fmH035294 subject="Delivery Status Notification (Delay)" virusname= duration=0.037 elapsed=0.266



I would like this text field not to be split, so in the advanced tab of tNormalize I enabled the "Use CSV parameters" option with """ as the text enclosure, but it does not work...:-/
Could anyone help me get this working properly?
Thanks
Regards

Moderator

Re: tNormalize and enclosed field

Hi,

Have you tried checking the "CSV Options" checkbox on the tFileInputDelimited component and typing """ in both the "escape char" and "text enclosure" fields, to see whether your text field is read correctly before you use tNormalize to split the lines?
Best regards
Sabrina
One Star

Re: tNormalize and enclosed field

Hello,
Yes, I tried checking the "CSV Options" checkbox and typing """ in the "escape char" and "text enclosure" fields on the tFileInputDelimited component, and it does not work.
Clearly these components do not do what they should; they are full of bugs.
Regards
Ten Stars

Re: tNormalize and enclosed field

There is a difference between a "bug" and software not doing what you "think" it should do. Your format does not suit the CSV options because it does not conform to that standard. 
First of all, let's look at this logically. Why would you choose a space to normalize the data when one or more of your columns legitimately contain spaces?
Your example text is below:
Jan  1 00:01:44 qid=tBVN1fmH035294 subject="Delivery Status Notification (Delay)" virusname= duration=0.037 elapsed=0.266


The first thing that comes to mind is that (ignoring the date at the beginning) your data appears to arrive in the format "{field name}={field value}". So the first thing you need to do is remove the date section from the problem. This can be done by using Java String manipulation methods (substring with indexOf, for example) to either search for "qid" if it will always be the first non-date value or for an unbroken String with a "=" at the end. This is entirely possible in Java.
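As a rough illustration of the substring/indexOf idea, here is a minimal sketch. It assumes the date part never contains a "=" and that the first key is preceded by a space. The class and method names are made up for this example; in a Job the method could live in a user routine or be adapted for a tJavaRow.

```java
final class DatePrefixStripper {

    // Returns the line from the first "key=" token onward.
    // Assumption: the date prefix itself never contains '='.
    static String stripDatePrefix(String line) {
        int eq = line.indexOf('=');
        if (eq < 0) {
            return line; // no key=value pairs found, leave the line untouched
        }
        // Walk back from the '=' to the space that precedes the key name.
        // lastIndexOf returns -1 when there is no space, so start becomes 0.
        int start = line.lastIndexOf(' ', eq) + 1;
        return line.substring(start);
    }

    public static void main(String[] args) {
        String line = "Jan  1 00:01:44 qid=tBVN1fmH035294 "
                + "subject=\"Delivery Status Notification (Delay)\" "
                + "virusname= duration=0.037 elapsed=0.266";
        System.out.println(stripDatePrefix(line));
    }
}
```

The same `start` index can also be used with `line.substring(0, start).trim()` if the date itself needs to be kept in its own column.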
Once you are left with the rest of the String, you can use a variation on the approach used to remove the date to split the remaining data into its fields.
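One possible way to do that split is with a regular expression rather than repeated indexOf calls. The sketch below is only an assumption about what fits this data: keys are word characters, values are either wrapped in double quotes (with no escaped quotes inside) or run up to the next whitespace. The class and method names are invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeyValueParser {

    // key, then '=', then either "quoted text" (group 2) or a run of
    // non-whitespace characters, possibly empty (group 3).
    private static final Pattern PAIR =
            Pattern.compile("(\\w+)=(?:\"([^\"]*)\"|(\\S*))");

    // Returns the pairs in the order they appear in the line.
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            // Group 2 is non-null only when the value was quoted.
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            fields.put(m.group(1), value);
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "qid=tBVN1fmH035294 "
                + "subject=\"Delivery Status Notification (Delay)\" "
                + "virusname= duration=0.037 elapsed=0.266";
        parse(line).forEach((k, v) -> System.out.println(k + " -> [" + v + "]"));
    }
}
```

Because the quoted value is consumed as a single match, the spaces inside the subject are never treated as separators, and an empty value such as virusname= comes back as an empty string.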
This is not a standard problem and as such there isn't necessarily a Talend component for precisely solving this. However, with a bit of Java knowledge and a logical approach to the problem, it is not that difficult to solve.
