One Star

Help! How to denormalise arbitrary numbers of multiple columns

Hi,
I'm struggling to get one of our ETL jobs working as flexibly as is needed.
We have a data file like so:
A,B,C,D,F1,F2,G1,G2,H1,H2 and so on (In reality there are 9 columns per "set" rather than 2 in this example)
There are an arbitrary number of column pairs on the end, and the number is different on every row.
We essentially want to produce something like this:
A,B,C,D,F1,F2
A,B,C,D,G1,G2
A,B,C,D,H1,H2
I've tried using tFileInputRegex to produce A,B,C,D,pair_string columns and then using tDenormalize, but the tDenormalize component only appears to work on one column, so we'd end up with:
A,B,C,D,F1
A,B,C,D,F2
Using a really wide table and then using split row is fine, except that we have to "hard-code" a maximum number of pairs.
It feels like I need some sort of tNormalizeMultipleColumns or tExtractRegexFields_Recursive component to do this.
I'm hoping I've missed something obvious and someone can just go "you need to whack together X,Y,Z and it'll work fine!"
Help!
Thanks,
Rob
9 REPLIES
Seven Stars

Re: Help! How to denormalise arbitrary numbers of multiple columns

I don't think there's an easy way to do this. I'd be looking at reading the file using tFileInputFullRow, then in tJavaRow splitting the fields into an array and looping through the results, putting them back together in groups but with a different separator between the groups, so that you can then use tNormalize to put each group on a separate row.
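Something along these lines might do it; here's a rough sketch of a user routine you could call from the tJavaRow (the class name RowRegroup, the group size of 2 and the idea of an output schema A,B,C,D plus a single "pairs" column are just assumptions for illustration):

// Sketch of a Talend user routine (hypothetical name: RowRegroup).
// It skips the first fixedCols fields (A,B,C,D keep their own columns)
// and re-joins the remaining fields in groups of groupSize, separated
// by ';' so a downstream tNormalize (item separator ";") can put each
// group on its own row.
public class RowRegroup {

    public static String regroup(String line, int fixedCols, int groupSize) {
        String[] fields = line.split(",", -1);   // -1 keeps empty trailing fields
        StringBuilder pairs = new StringBuilder();
        for (int i = fixedCols; i + groupSize - 1 < fields.length; i += groupSize) {
            if (pairs.length() > 0) {
                pairs.append(';');               // separator between groups
            }
            for (int j = 0; j < groupSize; j++) {
                if (j > 0) {
                    pairs.append(',');           // separator inside a group
                }
                pairs.append(fields[i + j]);
            }
        }
        return pairs.toString();
    }
}

In the tJavaRow you would map the first four fields to A..D and set something like output_row.pairs = RowRegroup.regroup(input_row.line, 4, 2); after the tNormalize on the pairs column, each group can be split back into separate columns (e.g. with tExtractDelimitedFields).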
Seventeen Stars

Re: Help! How to denormalise arbitrary numbers of multiple columns

You could try the Talend Exchange component tSchemaNormalize.
First you need a schema that reflects the full width of your input data; this schema can then be normalized into the schema you need.
http://www.talendforge.org/exchange/index.php?eid=817&product=tos&action=view&nav=1,1,1
One Star

Re: Help! How to denormalise arbitrary numbers of multiple columns

I don't think there's an easy way to do this. I'd be looking at reading the file using tFileInputFullRow, then in tJavaRow splitting the fields into an array and looping through the results, putting them back together in groups but with a different separator between the groups, so that you can then use tNormalize to put each group on a separate row.

This is pretty much the conclusion I was coming to; however, I was trying to find a way to "produce" rows from the tJava or tJavaFlex components, but it seems to be impossible/not well documented. I'll try switching every n-th comma to a semicolon for now and use tNormalize; thanks for the suggestion.
You could try the Talend Exchange component tSchemaNormalize.

Thanks! I'd not seen that; very useful! Unfortunately, as you said, I'd have to define a schema covering a maximum number of columns, so this won't quite fit the bill.
Many thanks for your input!
Rob
Seven Stars

Re: Help! How to denormalise arbitrary numbers of multiple columns

You could try coding a loop into tJavaFlex to create multiple rows (I did get this to work once but then couldn't make it work the next time I tried it), but this is very non-standard and liable to break in later versions.

Re: Help! How to denormalise arbitrary numbers of multiple columns

To make a tJavaFlex produce data, you have a few options (there's a sketch of the first one below):
As the "start" component:
This is the easy way: start a loop in the "Begin" section, assign values to the output row in the "Main" section, and close the loop in the "End" section.
As part of a flow:
When the flex is part of a flow, the Main section will be called once per input row and will always produce at least one output row per input row (so filtering rows is not really possible). To add extra rows mid-flow, you would need a loop in your Main section that assigns to the output row; this loop will get processed once per input row.
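For what it's worth, here's a minimal sketch of the first option; the output connection name row1, its columns A,B,C,D,V1,V2, the hard-coded file path and the pairs-of-two layout are all assumptions for illustration (exception handling omitted):

// --- "Begin" section: open the file and start the loops ---------------------
java.io.BufferedReader reader =
        new java.io.BufferedReader(new java.io.FileReader("/tmp/input.csv"));
String line;
while ((line = reader.readLine()) != null) {
    String[] f = line.split(",", -1);            // -1 keeps empty trailing fields
    for (int i = 4; i + 1 < f.length; i += 2) {  // walk the trailing pairs

// --- "Main" section: fill one output row per loop iteration -----------------
        row1.A  = f[0];
        row1.B  = f[1];
        row1.C  = f[2];
        row1.D  = f[3];
        row1.V1 = f[i];
        row1.V2 = f[i + 1];

// --- "End" section: close the loops and the reader --------------------------
    }
}
reader.close();

The generated code pushes a row on the row1 connection after the Main section, so every pass through the inner loop produces one output row.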

Edit: on second thought, adding rows when the flex is part of a flow is probably not possible. Use a tMap to multiply and then a flex to process. :)
Seven Stars

Re: Help! How to denormalise arbitrary numbers of multiple columns

Like I said, I did manage to get the flex to filter or multiply my rows by putting a for-loop declaration in the Main section and closing it in the End section, but it depends on which other components are in your flow. It worked following tSort but not following tFileInput (because my closing brace was put after the code closing the input).
One Star

Re: Help! How to denormalise arbitrary numbers of multiple columns

Edit: on second thought, adding rows when the flex is part of a flow is probably not possible. Use a tMap to multiply and then a flex to process. :)

Hiya,
Sorry, I'm totally confused by that; how can you configure a tMap to produce rows?
Thanks,
Seventeen Stars

Re: Help! How to denormalise arbitrary numbers of multiple columns

I have no clue what prevents you from using my suggestion. The mentioned component tNormalizeSchema does EXACTLY what you need.
This component has PDF documentation, and if you take a look at the examples, you will find exactly your use case.
I created this component because I had a use case with 44 columns in 23 sets. You can probably imagine I was totally unable to handle this with the built-in possibilities.
One Star

Re: Help! How to denormalise arbitrary numbers of multiple columns

I have no clue what prevents you from using my suggestion. The mentioned component tNormalizeSchema does EXACTLY what you need.
This component has PDF documentation, and if you take a look at the examples, you will find exactly your use case.

Hi jlolling,
Apologies, I might not have explained the situation correctly. I have actually investigated this component (after your suggestion); what's stopping me is that the number of columns in the input schema varies wildly. Some of these files are 60 columns, others are 700; the number of columns is expected to increase over time and *needs* to be dynamic (a requirement set on me by the data provider ¬_¬; I'm currently trying to get them to change to a more sensible schema, but that's another issue).
Therefore (sadly, I might add!) I can't create the input schema in a generic way; I was hoping I might be able to team the component with a dynamic schema (one that takes the column names from the headings). However, the headings don't extend across all of the columns!
Thanks,
Rob