Variable Number of Delimited Fields

One Star

Hi,
I need help with a task in Talend.
I have a tab-delimited file, but the number of columns varies from row to row.
Here is a sample:
john 23 productx productx add-info
jack 25 productx add-info
july 33 productx productx productx productx add-info
There's no reserved space for the products, and I need the add-info to come after them in a fixed position.
Example (output that I need to generate):
john 23 productx productx (blank) (blank) add-info
jack 25 productx (blank) (blank) (blank) add-info
july 33 productx productx productx productx add-info
Don't know if I made myself clear enough.... But thanks for any help.
One Star

Re: Variable Number of Delimited Fields

Hope someone from Team Talend takes a look at your case :-) It's pretty hard :-)
These are the things that make the input file complicated:
1. the number of productx values is unknown
2. there is no identifier or key for each column (which would help us determine the header value)
3. the "add-info" has no permanent place
4. it is not a typical flat input file :-)
But still, your case is very interesting :-)
Seven Stars

Re: Variable Number of Delimited Fields

If you read across each row, how do you tell which value is a "productx" and which is "add-info"? Is it just that there is always exactly one "add-info" as the last value in a row?
One Star

Re: Variable Number of Delimited Fields

Hi alevy,
It seems that the add-info is always at the end. Assuming so, how can this be done?
Seven Stars

Re: Variable Number of Delimited Fields

Well, assuming:
-- we know only that there is always exactly one "add-info" as the last value in a row
-- we do not know the maximum number of "productx" there can be on any row
-- the output is also to a delimited file
-- the "add-info" must remain the last value in the row
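Then we need to first read the file to find the maximum number of "productx" across all rows. Use tFileInputFullRow and send to tMap. There define a new field ProductCount = StringHandling.COUNT(row1.line,"\t")-2. (A row with n tabs has n+1 fields, and three of them, the name, the number and the "add-info", are not products, so the product count is n-2.) The output of tMap goes to tAggregateRow, which calculates the max of all ProductCounts. The output of tAggregateRow goes to tSetGlobalVar, which stores the max under the key "MaxProductCount".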
Then link the first tFileInputFullRow to another identical tFileInputFullRow using OnSubjobOK. The flow from the second tFileInputFullRow goes to tJavaRow, which contains the following code:
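// Position of the tab that precedes the trailing add-info field
int lastDelimiter = input_row.line.lastIndexOf('\t');
// Empty product fields to add: max product count minus this row's product count (tab count - 2)
int padding = (Integer)globalMap.get("MaxProductCount") - (StringHandling.COUNT(input_row.line,"\t") - 2);
output_row.line = input_row.line.substring(0,lastDelimiter)
+ StringHandling.STR('\t',padding)
+ input_row.line.substring(lastDelimiter);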

The flow from tJavaRow should be what you need to write to tFileOutputDelimited.
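
For anyone who wants to check the logic outside Talend, the two passes described above can be sketched in plain Java. This is only an illustration under the same assumptions (tab delimiter, two leading fields, exactly one "add-info" at the end of every row). The class and method names (ProductPadder, productCount, maxProductCount, pad, padAll) are made up for this example and are not Talend routines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ProductPadder {
    // Assumption: every row has name, age and a trailing add-info field.
    static final int FIXED_FIELDS = 3;

    // Counts occurrences of a character in a string.
    static int countChar(String s, char c) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == c) {
                n++;
            }
        }
        return n;
    }

    // A row with n tabs has n + 1 fields; subtract the non-product fields.
    static int productCount(String line) {
        return countChar(line, '\t') + 1 - FIXED_FIELDS;
    }

    // First pass: the largest product count across all rows.
    static int maxProductCount(List<String> lines) {
        int max = 0;
        for (String line : lines) {
            max = Math.max(max, productCount(line));
        }
        return max;
    }

    // Second pass: insert empty fields just before the last field.
    static String pad(String line, int max) {
        int lastDelimiter = line.lastIndexOf('\t');
        int missing = max - productCount(line);
        StringBuilder sb = new StringBuilder(line.substring(0, lastDelimiter));
        for (int i = 0; i < missing; i++) {
            sb.append('\t');
        }
        // The tail still starts with its own tab, so add-info stays last.
        return sb.append(line.substring(lastDelimiter)).toString();
    }

    static List<String> padAll(List<String> lines) {
        int max = maxProductCount(lines);
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            out.add(pad(line, max));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "john\t23\tproductx\tproductx\tadd-info",
            "jack\t25\tproductx\tadd-info",
            "july\t33\tproductx\tproductx\tproductx\tproductx\tadd-info");
        for (String row : padAll(rows)) {
            // Show tabs as '|' so the empty fields are visible.
            System.out.println(row.replace('\t', '|'));
        }
    }
}
```

The sketch does not guard against rows with fewer than three fields; in a real job those would need to be rejected or handled separately.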
One Star

Re: Variable Number of Delimited Fields

Hi alevy,
Confirmed, it works! :-) (assuming it is delimited) You are great ;-)
One Star

Re: Variable Number of Delimited Fields

Thanks alevy and lovely, will try this solution asap.
One Star

Re: Variable Number of Delimited Fields

Hi, sorry for taking so long to do the test, but I was really busy.
I tried your solution alevy, but it just output the same file that was input. Did I miss something?
Thanks.
Seven Stars

Re: Variable Number of Delimited Fields

I'd have to say: probably :-) Did you set MaxProductCount correctly in tSetGlobalVar? Add a tLogRow after tAggregateRow and a tJava with the following code after tSetGlobalVar to test that the max has been correctly stored.
System.out.println((Integer)globalMap.get("MaxProductCount"));

If both print the same result then put up screenprints of your job.
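
A slightly more defensive version of that check can be sketched as follows. It is only an illustration: a plain java.util.Map stands in for Talend's globalMap, and the GlobalVarCheck class and requireMax method are hypothetical names, not part of Talend. The idea is to fail with a clear message when the key is missing, misspelled (keys are case-sensitive) or holds something other than an Integer.

```java
import java.util.HashMap;
import java.util.Map;

class GlobalVarCheck {
    // Returns the stored max, or throws if the key is absent or not an Integer.
    static int requireMax(Map<String, Object> globalMap, String key) {
        Object value = globalMap.get(key);
        if (!(value instanceof Integer)) {
            throw new IllegalStateException(
                "Expected an Integer under key '" + key + "' but found: " + value);
        }
        return (Integer) value;
    }

    public static void main(String[] args) {
        // Stand-in for the map that tSetGlobalVar would populate.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("MaxProductCount", 4);
        System.out.println(requireMax(globalMap, "MaxProductCount"));
    }
}
```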
One Star

Re: Variable Number of Delimited Fields

Geeez, it's so nice to be back in Talend (got hooked on a Silverlight project recently and it's driving me nuts!!!) :-)
MaxProductCount -> This did the trick, right alevy? :-) You must know the maximum count of productx so that the add-info can be placed right after the nth (MaxProductCount) product column.