
Problematic delimiter

Hi,

 

I need to solve a problem with stray delimiters and duplicated double quotes in the input file of my job.

 

Here is an example of my input file:

baba;”bibi”;”bobo”;bubu;

tata; ”titi;titi”; ”toto”;tutu;

dada ; ” ”didi”didi”;”dodo”;dudu;

 

In this example, we have 3 lines with 4 columns.

  • In the first line, everything is clear; this is the proper format. Note: the double quotes are not needed in the output file.
  • In the second line, I have my first problem. There is a “;” in the middle of my string, and since the delimiter is also “;”, Talend thinks I have 5 columns instead of 4.
  • In the third line, I have double quotes (”) inside a column that is already enclosed in double quotes.
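
To make the column count concrete, here is a minimal sketch in plain Java. It is not Talend's parser: it only shows what a naive split on “;” does with the three sample lines. The class and method names are hypothetical, and the sample is assumed to use straight double quotes in the real file.

```java
final class NaiveSplitDemo {

    // Counts columns the naive way: split on the delimiter and ignore quotes.
    // String.split drops trailing empty strings, so the final ";" adds no column.
    static int countColumns(String line) {
        return line.split(";").length;
    }

    public static void main(String[] args) {
        String line1 = "baba;\"bibi\";\"bobo\";bubu;";
        String line2 = "tata; \"titi;titi\"; \"toto\";tutu;";
        String line3 = "dada ; \" \"didi\"didi\";\"dodo\";dudu;";

        System.out.println(countColumns(line1)); // 4
        System.out.println(countColumns(line2)); // 5, the ";" inside the quotes splits the column
        System.out.println(countColumns(line3)); // 4
    }
}
```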

 

So, what exactly is my problem? It is the “;” on my 2nd line. I need Talend to see 4 columns, not 5. So I use a tMap in which I replace this “;” with “” (nothing), because losing the “;” inside the column is not critical. To do so, I enabled the “CSV options” option in my “Basic settings”. Without this option, I cannot remove the “;” from my column.

 

However, with “CSV options” enabled, the double quotes at both ends of my columns disappear on every line. The “;” problem is indeed gone, but I have a new one: on line 3, my second column is read as empty. Instead of selecting “didi”didi, Talend selects only the space just before it.

 

I tried a few things, but I am running out of options. I also tried setting the “length” column in my schema, but it seems useless, since tFileInputDelimited doesn’t check the length of the data when it reads it.

 

Does anyone have a solution for dealing with those “;” and double quotes (”)? The ideal solution would be to remove the “;” without removing any double quote.

I have seen this solution, which closely matches my actual problem. But the multiple quotes inside my columns prevent me from using it...
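
The ideal behaviour described above (dropping a “;” inside a quoted column while keeping every double quote) could be sketched as a pre-processing step in plain Java, run on each raw line before the delimited reader sees it. Everything here is an assumption rather than a Talend feature: the class and method names are hypothetical, the file is assumed to use straight double quotes, and the heuristic treats a quote as opening only at the start of a column and as closing only when it is followed by the delimiter or the end of the line. A column whose content itself contains a quote directly followed by “;” would still be misread.

```java
final class QuotedDelimiterCleaner {

    // Hypothetical helper: removes the delimiter when it occurs inside a quoted
    // column and leaves every quote character untouched.
    static String stripDelimiterInsideQuotes(String line, char delimiter, char quote) {
        StringBuilder out = new StringBuilder(line.length());
        boolean inQuotes = false;
        boolean atFieldStart = true; // only whitespace seen since the last delimiter

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == delimiter) {
                    continue; // delimiter inside a quoted column: drop it
                }
                if (c == quote && closesField(line, i + 1, delimiter)) {
                    inQuotes = false; // closing quote of the column
                }
            } else {
                if (c == quote && atFieldStart) {
                    inQuotes = true; // opening quote of the column
                }
                if (c == delimiter) {
                    atFieldStart = true;
                } else if (!Character.isWhitespace(c)) {
                    atFieldStart = false;
                }
            }
            out.append(c);
        }
        return out.toString();
    }

    // True when only whitespace separates position 'from' from the next
    // delimiter or from the end of the line.
    private static boolean closesField(String line, int from, char delimiter) {
        int i = from;
        while (i < line.length() && Character.isWhitespace(line.charAt(i))) {
            i++;
        }
        return i == line.length() || line.charAt(i) == delimiter;
    }

    public static void main(String[] args) {
        String line2 = "tata; \"titi;titi\"; \"toto\";tutu;";
        String line3 = "dada ; \" \"didi\"didi\";\"dodo\";dudu;";

        // prints: tata; "titititi"; "toto";tutu;
        System.out.println(stripDelimiterInsideQuotes(line2, ';', '"'));
        // prints line 3 unchanged, inner quotes included
        System.out.println(stripDelimiterInsideQuotes(line3, ';', '"'));
    }
}
```

After this step each sample line splits into 4 columns on “;”, and the inner quotes of line 3 are still present; trimming the outer quotes and spaces would remain a separate step.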

1 REPLY

Re: Problematic delimiter

Only one solution is possible: fix the source of the information.

 

Everything else is not a solution, only tricks that may or may not work; and even if a trick works now, that does not mean it will keep working on the next malformed case.

 

The usual objection is “we cannot change the source of the information”, but in most cases that is not true, and fixing the source is the only proper way.
