tExtractDelimited, tUnpivot, etc...

One Star

tExtractDelimited, tUnpivot, etc...

Hi,
I have an input file like this:
ID;First_Name;Last_Name;Team
835;Lionel|Cristano;Messi|Ronaldo;Barca|Real
836;Andres|Lionel;Iniesta|Messi;Barca|Barca
And I'd like to have something like this:
ID;First_Name;Last_Name;Team
835;Lionel;Messi;Barca
835;Cristiano;Ronaldo,Real
836;Andres;Iniesta;Barca
836;Lionel;Messi;Barca
(Just for information, the final goal is to have two tables:
-one with the referential of person:
ID_Person;Name;Last_Name;Team
1;Lionel;Messi;Barca
2;Cristiano;Ronaldo,Real
3;Andres;Iniesta;Barca
-and one with the link ID-ID_Person:
ID;ID_Person
835;1
835;2
836;3
836;1)
I use Talend all the time for my job but I don't manage to do this, I try with tExtractDelimitedFields, tDenormalize and tUnpivot... Anyone could help me?
Thank you,
Romain
One Star

Re: tExtractDelimited, tUnpivot, etc...

For information, with 3 tExtractDelimited and 1 tUnpivotRow, I manage to have something like this, but it's not really what I want... :
ID;Column1;Column2
835;First_Name;Lionel
835;First_Name;Cristiano
835;Last_Name;Messi
835;Last_Name;Ronaldo
835;Team;Barca
835;Team;Real
836;First_Name;Andres
836;First_Name;Lionel
836;Last_Name;Iniesta
836;Last_Name;Messi
836;Team;Barca
836;Team;Barca
One Star

Re: tExtractDelimited, tUnpivot, etc...

I would use simple tFileInputDelimited using ";" as a delimiter.
so the schema returns 4 columns
you can now connect the schema to a tMap with a lookup of fixed values (0 and 1)
The lookup must have Load Once and All rows settings with no real join to the main flow.. I am using hardcoded values tFixedFlowInput with schema: Idx, integer
csv.First_Name.split("\\|") - gives me first persons fname for the (0) value of Idx
csv.First_Name.split("\\|") - gives me second persons fname for the (1) value of Idx
apply the same function for Last_Name and Team columns
The lookup will act as a multiplier so if the tMap has 2 main rows from csv and 2 rows from your lookup the output will be 2x2 = 4
if 3 rows from csv the output will be 3x2 = 6 and so on ...
Now it should be easy if you have output with the normalised values:
ID;First_Name;Last_Name;Team
835;Lionel;Messi;Barca
835;Cristiano;Ronaldo,Real
836;Andres;Iniesta;Barca
836;Lionel;Messi;Barca
you can do an insert update on a Person table and than your link table...
Hope you got the idea.
One Star

Re: tExtractDelimited, tUnpivot, etc...

Hello Lubod,
thank you very much for your answer, it works! Very good trick.
In fact, sometime I have one person per ID, sometime 2, sometime 3 (or more).
The input could be like that:
ID;First_Name;Last_Name;Team
835;Lionel|Cristano;Messi|Ronaldo;Barca|Real
836;Andres|Lionel|Franck;Iniesta|Messi|Ribery;Barca|Barca|Bayern
837;Lionel;Messi;Barca
So I use one tFixedFlowInput (name=two_id) with (0,1), on another (name=three_id) with (0,1,2)
And I choose which one to use with a StringHandling.COUNT.
In the tmap, the function is:
StringHandling.COUNT(First_Name,"|")==0?First_Name:
StringHandling.COUNT(First_Name,"|")==1?First_Name.split("\\|") :
StringHandling.COUNT(row10.First_Name,"|")==2?First_Name.split("\\|") :"more than 3 persons"
I use a tUniqRow after that because I have two many rows (due to the two lookup inputs), but it works.
The job doesn't look so bad, but if I have 10 persons per ID, I have to put 10 lookup outputs, and it will generate a lot of rows before the tUniqRow component!
If you have a better idea, let me know!
thank you again,
Romain
One Star

Re: tExtractDelimited, tUnpivot, etc...

What I ment was just one lookup with multiple rows.
You could either have tFixedFlow with fixed number of rows or use tRowGenerator to generate required number of rows.

This transformation would be executed for all values/numbers returned by your lookup.
csv.First_Name.split("\\|")