One of the column has many separators.

Six Stars

Hi all,

I receive data in the format below in a .txt file:

"a","b","c","d","e","f,g,h,i,j,k","l","m","n","o","p","q","r","s"

We want "f,g,h,i,j,k" to be passed to the next component as a single column. When we use "," as the separator, the value is split into many parts. The field "f,g,h,i,j,k" contains a varying number of commas. How do we pass "f,g,h,i,j,k" to the next component as one column?

Your response would be appreciated.


Accepted Solutions
Four Stars

Re: One of the column has many separators.

Here's a job that does exactly what is required.

 


All Replies
Employee

Re: One of the column has many separators.

Hello, are you using the Talend Data Streams AMI? I ran a quick check, and your example came back as one record with 14 fields.
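
For anyone who wants to reproduce that count outside Talend, here is a minimal sketch using Python's standard csv module. It is not a Talend job, only an illustration of quote-aware parsing under the assumption that the file uses "," as the field separator and double quotes as the text enclosure. The variable names are made up for the example.

```python
import csv
import io

# The sample line from the question, exactly as it appears in the text file.
line = '"a","b","c","d","e","f,g,h,i,j,k","l","m","n","o","p","q","r","s"'

# csv.reader treats commas inside double quotes as data, not as delimiters.
reader = csv.reader(io.StringIO(line), delimiter=",", quotechar='"')
fields = next(reader)

print(len(fields))  # 14
print(fields[5])    # f,g,h,i,j,k

# A plain split ignores the quotes and breaks the sixth column apart.
naive = line.split(",")
print(len(naive))   # 19
```

The difference between 14 and 19 is the five commas inside the quoted field, which is the behaviour described in the question when the separator is applied without a text enclosure.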

 

To be clear, CSV processing is not entirely well documented. It should follow the format in https://tools.ietf.org/html/rfc4180, except that record delimiters are not permitted inside quotes. Is the comma a field delimiter or a record delimiter? This matters: most big data text files forbid record delimiters inside fields (even within quotes), since that makes the file unsplittable across nodes.
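
To illustrate the splittability point, here is a small Python sketch with made-up sample data, assuming the newline is the record delimiter. It only shows why a record delimiter inside quotes is awkward for a reader that starts in the middle of a file; it does not reflect how any Talend product is implemented.

```python
import csv
import io

# Hypothetical two-record file. The second field of the first record
# contains a newline (the record delimiter) inside quotes.
data = '"1","first line\nsecond line","x"\n"2","plain","y"\n'

# A quote-aware parser that reads from the start recovers two records.
records = list(csv.reader(io.StringIO(data)))
print(len(records))  # 2

# Cutting on every newline gives three lines. A worker handed only the
# middle line cannot tell whether it begins inside or outside quotes.
chunks = data.splitlines()
print(len(chunks))   # 3
```

A quote-aware parser only gets the right answer because it reads from the beginning of the file, which is why such a file has to be processed as a single unit.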

 

We have work in progress to add configurable quote enclosures.  Does your use case require record delimiters inside quotes?  In this case, would it be acceptable if each file was unsplittable?

Six Stars

Re: One of the column has many separators.

Yes, it is an important field in the row. We are using Talend AWS Cloud Integration.

Six Stars

Re: One of the column has many separators.

@rskraba, could you please provide me the code/job here?

Employee

Re: One of the column has many separators.

Ah, my apologies -- we are not speaking of the same Talend product!  This forum is for Talend Data Streams (see https://community.talend.com/t5/Data-Streams/Introducing-Talend-Data-Streams/td-p/120373), which is where I tested.

 

For Talend AWS Cloud Integration, you should probably raise the question at https://community.talend.com/t5/Design-and-Development/bd-p/integrating -- it would be useful to include whether you are running in Big Data (such as Spark) or DI.  My comment above about "record delimiters inside fields" only applies to big data.

 

Best regards, Ryan
