Parsing CSV - Commas within double quotes

One Star

Parsing CSV - Commas within double quotes

I'm evaluating TOS. I created a new project and added a delimited file under Metadata -> File Delimited.
Going thru the wizard I get to step 3 where you define the encoding, file separator, and a few other parameters. This screen reads the file you specified in step 2 and parses it according to the delimiter. In my case it is a CSV file. But the file has double quotes around all the field values and some of those contain commas within them.
Is it possible to tell TOS to ignore commas that are between double quotes? This is a very common situation and is handled by default in spreadsheet programs such as Open Office or Excel.
This seems like very elementary functionality and the reason I started looking for an ETL tool in the first place is because the COPY command in Postgres cannot handle this situation.
My other alternative is to use dbf file downloads of the same data. Is it possible in TOS to define the dbf delimiiter?
One Star

Re: Parsing CSV - Commas within double quotes

Never mind. I figured it out with the Text Enclosure setting.
One Star jpn
One Star

Re: Parsing CSV - Commas within double quotes

I would be interest in how you over came this problem as I experienced a similar problem with my delimiter files.
I didn't spend too much time to find a solution, i simply changed my delimiter character.
Employee

Re: Parsing CSV - Commas within double quotes

Hello,
This is the difference between tFileInputDelimited (no text enclosure) and tFileInputCSV (text enclosure).

Regards,
One Star

Re: Parsing CSV - Commas within double quotes

JPN - When you are adding File Delimited Metadata the wizard wiill give you the opportunity to set the "text enclosure". Select "\"" and it reads the file correctly.
One Star

Re: Parsing CSV - Commas within double quotes

Hello! thank you for your hint. with this option (using Text Enclosure "\"") it is possible to import google-contact exports (CSV).
--Robert
Seventeen Stars

Re: Parsing CSV - Commas within double quotes

hi all,
a few more explanation about enclosure
the back slash "\" is the way to escape any character following it which can be a reserved one for the code (\', \\ , ..)
I tell the program : "don't use it like a code character but like any other one "
like in a regex "\\d" : d is reserved pattern so put "\" before , but '\' is reserved too ..so also reserved it => \\d
++
One Star

Re: Parsing CSV - Commas within double quotes

My Text Enclosure character is ~... I can edit this property on the tFileDelimited properties but I can't edit it under the Schema definition in the Repository - I have to choose from the drop list and none of them apply.
Employee

Re: Parsing CSV - Commas within double quotes

Currently running 4.0.2 r43696 and it looks like a bug has crept in. (There isn't a tFileInputCSV for this version and appears deprecated)
When setting up the metadata/file_delimited and select a CSV similar to vclark's file. (EX)
"type","location","setting","complexity",'prod_type","category","stock","website"
"admin","base","default","simple","hardware","56,51","0","www.example.com"
"admin","base","default","simple","hardware","32,61","1","www.example.com"
should read in the following values with Set Heading Row as Column Names checked.
Type Location Setting Complexity Prod_Type Category Stock Website
admin base default simple hardware 56,51 0 www.example.com
admin base default simple hardware 32,61 1 www.example.com
However upon setting the CSV (Escape Char Settings) - Text Enclosure to "\"" and Comma - Field Separator as "," yields the following incorrect results:
"Type" "Location" "Setting" "Complexity" "Prod_Type" "Category" "Stock" "Website"
"admin" "base" "default" "simple" "hardware" "56 51" "0"
"admin" "base" "default" "simple" "hardware" "32 61" "1"
Changing the Field Separator to Semicolon - ";" will read the data fields in correctly but will read the column names as single column and therefore drop all of the fields into a single column (quotes and all).
As you can see the Category field is not parsed correctly.
Anthony
Talend Certified
One Star

Re: Parsing CSV - Commas within double quotes

I think this bug is still in the 4.0.3 release: I've been evaluating talend with some very simple CSV transforms and it turns out this is the thing that's been frustrating me! I guess I can preprocess my CSVs before they hit talend. :-/
Steve

Currently running 4.0.2 r43696 and it looks like a bug has crept in. (There isn't a tFileInputCSV for this version and appears deprecated)
...
Changing the Field Separator to Semicolon - ";" will read the data fields in correctly but will read the column names as single column and therefore drop all of the fields into a single column (quotes and all).
As you can see the Category field is not parsed correctly.
Anthony
Talend Certified
One Star

Re: Parsing CSV - Commas within double quotes

Sorry, I meant this bug still exists in the TOS-All-r49467-V4.1.0 release.
Steve
One Star

Re: Parsing CSV - Commas within double quotes

I am experiencing the same problem as anthonyp. Is there a solution or work around? I am using the open version.
Seven Stars

Re: Parsing CSV - Commas within double quotes

The bug in v4.0.2 only affected the New Delimited File dialog and was corrected in v4.0.3 and v4.1.0.
An actual job in v4.0.2 reading the file worked fine; see screen-print.