Deliminator for Input Files

One Star

I am having difficulty getting an input file to read correctly. The file is delimited by pipes "|", and for the most part this works. However, there is a description column which occasionally contains commas ",", and these are picked up as separators as well. I have double-checked my schema to make sure it is set to separate by pipes "|" only, as well as the input expression in my mapping.
Is there anything I could have missed? I also went into the advanced options and selected the CSV option to put quotation marks around each field between the pipes to see if this would mitigate the problem, but so far it has not.
Suggestions? Having the source files sent ahead of time with quotation marks around each field between pipes cannot be done, unless someone knows a way to use sed or some other Linux approach to clean up the file before it enters the Talend job.
Community Manager

Re: Deliminator for Input Files

Hi
Can you please give us an example of your data and the result you expect?
Shong
One Star

Re: Deliminator for Input Files

Sure. I am unable to figure out how to post a screenshot to explain this better. The first 4 rows are as expected for the three columns. Product Width and Product Height should be "0".
The bottom 4 rows are similar, but the columns have shifted midway through Product Description. The raw source data looks like " ...Handle, inside...". Where the comma occurs between Handle and inside, it appears to be treated as the start of a new column, causing the output to shift over for that row.
My delimiter is set to pipes -- "|" so I am not sure why a comma would be read as a delimiter as well.
Output Result
Product Description | Product Width | Product Height
####### REAR DOOR Lock & hardware Actuator Toyota Camry 1997-2001 | 0 | 0
####### COWL Cowl Front insulator Toyota Camry 2010-2011 | 0 | 0
########FENDER Structural components & rails Seal Toyota Camry 2010-2011 | 0 | 0
####### FENDER Structural components & rails Seal Toyota Camry 2010-2011 | 0 | 0
--------------------------------------------------------------------------------------------------------------
######## FRONT DOOR Lock & hardware Handle | inside Toyota Camry 2005-2006 | 0
######## REAR DOOR Lock & hardware Handle | inside Toyota Camry 2005-2006 | 0
######## REAR DOOR Lock & hardware Handle | inside Toyota Camry 2005-2006 | 0
####### FRONT DOOR Lock & hardware Handle | inside Toyota Camry 2005-2006 | 0
Seventeen Stars

Re: Deliminator for Input Files

I am working nearly every day with the component tFileInputDelimited and I have never seen this component mix up the separators. Could you please provide a screenshot of the basic settings of the input component?
To be honest, I cannot believe this. I am pretty sure there is a misconfiguration.
One Star

Re: Deliminator for Input Files

Hey jlolling,
I'm not allowed to post images or screen shots as my account is too new I believe (10 posts minimum first). Hopefully the info below will be enough.
This was designed as a Metadata Schema > File Delimited
------------------------------------------------------------------
File: "input_file"
Format: Unix
Encoding: US-ASCII
Field Separator: Custom ANSI > "|"
Escape Char Settings: {checked} Delimited
File_Delimited object in the ETL Job
---------------------------------------------
Basic Settings:
CSV Row Separator: LF ("\n")
Field Separator: "|"
Text enclosure: """
Escape Char: """
CSV Options: {checked}
Schema: DELIM: {schema I created and listed above}
Advanced Settings:
Advanced Separator (for numbers) Thousands separator: "," <---- The description field is varchar so I do not expect this setting would be the cause

My solution for now is to run a Linux script that removes all commas up front, after which everything runs as expected. I would prefer not to move forward with this method, so any input would be much appreciated.
Seventeen Stars

Re: Deliminator for Input Files

Could you please provide an example of your input data? Shong has already asked for it. So far we have only seen the wrong result. I will then try to reproduce it and give you feedback on my test results.
BTW what version of Talend do you use?
One Star

Re: Deliminator for Input Files

The current placement of quotation marks is a bit sporadic, and I am unable to force the source file provider to correct this. I am unable to get the data to read correctly, so my first step was a Linux sed command that removed every (") in the file. This would affect "Handle, inside" type situations, where the quotation marks would originally have been placed correctly around the characters.
Make Name|Model Name|Year|Section Name|Category Name| Sub Category Name|Part#|item Name|item Price|item Description,,,,,
"Toyota|Camry|""2005""-""2006""|""FRONT DOOR""|""Lock & hardware""|""Handle"," inside""|""XXXXXXXXXXX""|""All"," Charcoal Left""|00.00|"" HANDLE SUB-ASSY- DOO""",,,
"Toyota|Camry|""2005""-""2006""|""REAR DOOR""|""Lock & hardware""|""Handle"," inside""|""XXXXXXXXXXXX""|""All"," Charcoal Left""|00.00|""""",,,
"Toyota|Camry|""2005""-""2006""|""REAR DOOR""|""Lock & hardware""|""Handle"," inside""|""XXXXXXXXXXXX""|""All"," Charcoal Left""|00.00|""""",,,
"Toyota|Camry|""2005""-""2006""|""FRONT DOOR""|""Lock & hardware""|""Handle"," inside""|""XXXXXXXXXXXX""|""All"," Charcoal Right""|00.00|"" HANDLE SUB-ASSY- DOO""",,,
Seventeen Stars

Re: Deliminator for Input Files

OK, I have a solution without external tools like sed:
tFileInputDelimited ---> tJavaRow ---> tExtractDelimitedFields ---> .....
In tFileInputDelimited:
Set the field delimiter to anything that never occurs in your content, to prevent the line from being split; we want the whole line.
Set the schema to one column: line
In tJavaRow:
output_row.line = input_row.line.replaceAll("\"", "");

In tExtractDelimitedFields:
Set your target schema (your 10 columns) and take care that the input schema with only the line column is unchanged (you will have different schemas for input and output here).
Set the delimiter to "|"
This way you will receive an output like this (the pipe is from my tLogRow):
make model year section category subcategory part item item_price item_desc
Toyota|Camry|2005-2006|FRONT DOOR|Lock & hardware|Handle, inside|XXXXXXXXXXX|All, Charcoal Left|00.00|HANDLE SUB-ASSY- DOO,,,
Toyota|Camry|2005-2006|REAR DOOR|Lock & hardware|Handle, inside|XXXXXXXXXXXX|All, Charcoal Left|00.00|,,,
Toyota|Camry|2005-2006|REAR DOOR|Lock & hardware|Handle, inside|XXXXXXXXXXXX|All, Charcoal Left|00.00|,,,
Toyota|Camry|2005-2006|FRONT DOOR|Lock & hardware|Handle, inside|XXXXXXXXXXXX|All, Charcoal Right|00.00|HANDLE SUB-ASSY- DOO,,,
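For anyone who wants to check the logic outside of Talend, here is a minimal standalone Java sketch of the same two steps: strip every double quote from the raw line, then split on the pipe. The class name PipeLineCleaner and the helpers stripQuotes and splitFields are made up for illustration and are not Talend APIs; inside a job, the tJavaRow line above plus the extract component do this work. The sample string is the first data row posted earlier in the thread.

```java
import java.util.regex.Pattern;

public class PipeLineCleaner {

    // Hypothetical helper: same effect as the tJavaRow step, removes every double quote.
    static String stripQuotes(String line) {
        return line.replace("\"", "");
    }

    // Hypothetical helper: strips quotes, then splits on a literal pipe.
    // Pattern.quote is needed because "|" is a regex metacharacter;
    // the -1 limit keeps trailing empty fields instead of dropping them.
    static String[] splitFields(String line) {
        return stripQuotes(line).split(Pattern.quote("|"), -1);
    }

    public static void main(String[] args) {
        // First data row from the sample file, escaped as a Java string literal.
        String raw = "\"Toyota|Camry|\"\"2005\"\"-\"\"2006\"\"|\"\"FRONT DOOR\"\"|"
                + "\"\"Lock & hardware\"\"|\"\"Handle\",\" inside\"\"|\"\"XXXXXXXXXXX\"\"|"
                + "\"\"All\",\" Charcoal Left\"\"|00.00|\"\" HANDLE SUB-ASSY- DOO\"\"\",,,";

        String[] fields = splitFields(raw);
        System.out.println(fields.length); // prints 10
        System.out.println(fields[5]);     // prints Handle, inside
    }
}
```

Note that the commas inside a field survive, because only the pipe is treated as a separator. The trailing ",,," at the end of each row stays attached to the last field, just as in the tLogRow output above.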
One Star

Re: Deliminator for Input Files

Thanks jlolling, this should help a lot. I had not really considered a tJavaRow approach, but I like the way you send the data through.
I'll keep you posted if I have any more questions, and thank you for the help!