I get data in the format below in a txt file.
We want "f,g,h,i,j,k" to be passed to the next component as a single column. When we use "," as the separator, it produces many lines. Within "f,g,h,i,j,k" there is an arbitrary number of commas. How do we pass "f,g,h,i,j,k" to the next components as one column?
Your response would be appreciated.
Hello, are you using the Talend Data Streams AMI? I gave this a quick check and it returned one record with 14 fields for your given example.
To be clear, CSV processing is not entirely well-documented. It should follow the format described in https://tools.ietf.org/html/rfc4180, except that record delimiters are not permitted inside quotes. In your data, is the comma a field delimiter or a record delimiter? This is an important point: most big data text files forbid the use of record delimiters inside fields (even with quotes), since it makes the file unsplittable across nodes.
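As a rough illustration of the RFC 4180 quoting behaviour mentioned above, here is a minimal Python sketch using the standard csv module. It is not Talend code, and the sample line is an assumption, since the original data was not included in the post. It only shows why splitting on every comma breaks the quoted value apart, while a quote-aware parser keeps "f,g,h,i,j,k" as one field.

```python
import csv
import io

# Hypothetical sample line (the data from the original post is not shown).
# The value f,g,h,i,j,k is enclosed in double quotes, as in RFC 4180.
sample = 'a,b,c,d,e,"f,g,h,i,j,k",l\n'


def parse_quoted_csv(text):
    """Parse CSV text, treating commas inside double quotes as data."""
    reader = csv.reader(io.StringIO(text), delimiter=",", quotechar='"')
    return [row for row in reader]


# Naive approach: split on every comma, ignoring the quotes.
naive_fields = sample.strip().split(",")

# Quote-aware approach: the comma is a field delimiter only outside quotes.
rows = parse_quoted_csv(sample)

print(len(naive_fields))  # 12 pieces, the quoted value is broken apart
print(len(rows[0]))       # 7 fields
print(rows[0][5])         # f,g,h,i,j,k
```

If the values in your file are not enclosed in quotes at all, a parser like this cannot tell which commas are delimiters, so the file would need to be produced with quoting or with a different separator.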
We have work in progress to add configurable quote enclosures. Does your use case require record delimiters inside quotes? In this case, would it be acceptable if each file was unsplittable?
Ah, my apologies -- we are not speaking of the same Talend product! This forum is for Talend Data Streams (see https://community.talend.com/t5/Data-Streams/Introducing-Talend-Data-Streams/td-p/120373), which is where I tested.
For Talend AWS Cloud Integration, you should probably raise the question at https://community.talend.com/t5/Design-and-Development/bd-p/integrating -- it would be useful to include whether you are running in Big Data (such as Spark) or DI. My comment above about "record delimiters inside fields" only applies to big data.
Best regards, Ryan