I need to solve a problem with duplicated quotes and punctuation in my job.
Here is an example of my input file:
tata; ”titi;titi”; ”toto”;tutu;
dada ; ” ”didi”didi”;”dodo”;dudu;
In this example, we have 3 lines with 4 columns.
So, what is my problem? It is the “;” located inside a column on my 2nd line. I need to make Talend see 4 columns and not 5. So I use a tMap in which I replace this “;” with “” (nothing), because losing the “;” inside the column is not critical. To do so, I enabled “CSV options” in my “Basic settings”. Without this CSV option, I cannot remove the “;” inside my column.
But with “CSV options” enabled, the double quotes at both ends of my columns disappear on all my lines! The “;” problem is indeed gone, but I have a new one: on my line 3, the second column is seen as empty. Instead of selecting “didi”didi, Talend selects only the space just before it.
I tried a few things, but I am running out of options. I also tried using the “length” column in my schema, but it seems useless, since tFileInputDelimited doesn’t check the length of the data when it reads it.
Does anyone have a solution for dealing with those “;” and “”” ? The ideal solution would be to remove the “;” without removing any “”” (double quote).
I have seen this solution, which really looks like my actual problem. But the multiple quotes inside my columns prevent me from using it...
There is only one real solution: fix the source of the information.
Everything else is not a solution, only a trick, which may or may not work. And even if it works, that does not mean it will keep working on the next malformed case.
The usual objection is "we cannot change the source of information", but in most cases that is not true, and fixing the source is the only proper way.
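For illustration only, here is a minimal Java sketch of the kind of trick discussed above: removing the “;” inside quoted columns while keeping every double quote. It is not a Talend API. The class `SemicolonCleaner` and its methods are hypothetical names, and it rests on assumptions that may not hold for your file: the file uses straight ASCII double quotes, a quoted column starts with a quote (after optional spaces), and it ends at the first quote that is followed by optional spaces and then “;” or the end of the line. A column whose content itself contains a quote directly followed by “;” would defeat it, which is exactly why fixing the source remains the safer route.

```java
// Hypothetical helper, not part of Talend: pre-cleans one raw line so that
// ';' characters inside quoted columns are removed and all quotes are kept.
final class SemicolonCleaner {

    static String clean(String line) {
        StringBuilder out = new StringBuilder(line.length());
        boolean inQuotes = false;     // currently inside a quoted column
        boolean atFieldStart = true;  // only whitespace seen since the last ';'

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == ';') {
                    continue;         // drop separators embedded in the column
                }
                if (c == '"' && closesField(line, i)) {
                    inQuotes = false; // assumed closing quote of the column
                }
                out.append(c);
            } else {
                if (c == ';') {
                    atFieldStart = true;
                } else if (c == '"' && atFieldStart) {
                    inQuotes = true;  // assumed opening quote of the column
                    atFieldStart = false;
                } else if (!Character.isWhitespace(c)) {
                    atFieldStart = false;
                }
                out.append(c);
            }
        }
        return out.toString();
    }

    // Assumption: a quote closes the column only when it is followed by
    // optional whitespace and then ';' or the end of the line.
    private static boolean closesField(String line, int quoteIndex) {
        int j = quoteIndex + 1;
        while (j < line.length() && Character.isWhitespace(line.charAt(j))) {
            j++;
        }
        return j == line.length() || line.charAt(j) == ';';
    }

    public static void main(String[] args) {
        // The two sample lines from the question, with straight quotes.
        System.out.println(clean("tata; \"titi;titi\"; \"toto\";tutu;"));
        System.out.println(clean("dada ; \" \"didi\"didi\";\"dodo\";dudu;"));
    }
}
```

One possible wiring, not verified here: read each line as a single column (for example with tFileInputFullRow), apply `clean` in a tMap or tJavaRow, then split on “;” with “CSV options” left off so the quotes are not stripped.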