One Star

[resolved] Transitivity Problem

Hi there,
I have a MySQL table with potential duplicates in it.
Each record is a pair of Ids, like:
Cp1 Cp2
Cp1 Cp3
Cp2 Cp4
Cp5 Cp6
( ... )
I'm assured that each pair is unique.
I would like to have, as output, a table with a unique Id for each duplicate group... and a row for each Cp.
So with the example above, I would like to have as output:
Group1 Cp1
Group1 Cp2
Group1 Cp3
Group1 Cp4
Group2 Cp5
Group2 Cp6
I sorted my first table by Cp1 then Cp2 and tried to work with variables to create my groups...
The problem I have is that I would like to be able to do a lookup on the same table I'm writing to... just to be sure I don't write the same Cp twice...
Is this possible at all?
Or do you have a better way to solve this problem?
Thanks for reading and for any input on this case.
Regards,
Amaranthe.
7 REPLIES
Community Manager

Re: [resolved] Transitivity Problem

Hi
One quick question: based on the input example, why are Cp1, Cp2, Cp3 and Cp4 grouped into Group1, and Cp5 and Cp6 into Group2?
Shong
One Star

Re: [resolved] Transitivity Problem

I want to have in output Groups of potential Duplicates...
So if
Cp1 Cp2
Cp1 Cp3
Cp2 Cp4
Cp5 Cp6
Cp1 is a Dup of Cp2
Cp1 is a Dup of Cp3
So Cp1, Cp2 & Cp3 are to be in the same potential duplicates group...
And so on...
Hope that i answered your question.
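
The transitive grouping described above (if Cp1 is a duplicate of Cp2 and Cp2 of Cp4, all of them belong together) is commonly computed with a union-find (disjoint-set) structure. The following is only a minimal plain-Java sketch of that idea, not the method used in this thread; the class name DuplicateGroups and its methods are hypothetical, and reading the pairs from MySQL is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups Ids that are linked directly or transitively.
class DuplicateGroups {
    // parent.get(x) is the link from x toward its group's representative
    private final Map<String, String> parent = new LinkedHashMap<>();

    // Find the representative of id's group, registering id on first sight
    String find(String id) {
        parent.putIfAbsent(id, id);
        String root = id;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        // Path compression: point every visited Id straight at the root
        String cur = id;
        while (!cur.equals(root)) {
            String next = parent.get(cur);
            parent.put(cur, root);
            cur = next;
        }
        return root;
    }

    // Record that a and b are duplicates, merging their groups if needed
    void union(String a, String b) {
        String ra = find(a);
        String rb = find(b);
        if (!ra.equals(rb)) {
            parent.put(rb, ra);
        }
    }

    // One {group, id} row per Id; groups are numbered in order of first appearance
    List<String[]> rows() {
        Map<String, String> groupNames = new LinkedHashMap<>();
        List<String[]> out = new ArrayList<>();
        for (String id : new ArrayList<>(parent.keySet())) {
            String root = find(id);
            String name = groupNames.get(root);
            if (name == null) {
                name = "Group" + (groupNames.size() + 1);
                groupNames.put(root, name);
            }
            out.add(new String[] {name, id});
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] pairs = {{"Cp1", "Cp2"}, {"Cp1", "Cp3"}, {"Cp2", "Cp4"}, {"Cp5", "Cp6"}};
        DuplicateGroups groups = new DuplicateGroups();
        for (String[] pair : pairs) {
            groups.union(pair[0], pair[1]);
        }
        for (String[] row : groups.rows()) {
            System.out.println(row[0] + " " + row[1]);
        }
    }
}
```

With the four example pairs, this puts Cp1 to Cp4 in Group1 and Cp5, Cp6 in Group2. Because groups are merged on every pair, the result does not depend on the order in which the pairs arrive.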
Seven Stars

Re: [resolved] Transitivity Problem

I had fun with this one and learnt a bit more Java myself!
tFixedFlowInput_2 is just there to create the flow of ID pairs. You would replace it with tMysqlInput.
tJavaFlex Start code
// Each list entry holds one group as a comma-separated string of IDs
java.util.ArrayList<String> list1 = new java.util.ArrayList<String>();
String group = "";
Boolean incl1 = false;
Boolean incl2 = false;
tJavaFlex Main code
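// Look for an existing group that already holds either ID of this pair
boolean found = false;
for (int i = 0; i < list1.size(); i++) {
    group = list1.get(i);
    // Compare whole IDs, not substrings, so "Cp1" does not match "Cp10"
    java.util.List<String> ids = java.util.Arrays.asList(group.split(","));
    incl1 = ids.contains(row4.ID1);
    incl2 = ids.contains(row4.ID2);
    if (incl1 || incl2) {
        found = true;
        // Append whichever ID of the pair is still missing from the group
        if (!incl2) {
            group += "," + row4.ID2;
        } else if (!incl1) {
            group += "," + row4.ID1;
        }
        list1.set(i, group);
        break;
    }
}
// Neither ID is known yet: start a new group
// Limitation: a pair whose IDs already sit in two different groups does not merge those groups
if (!found) {
    list1.add("" + row4.ID1 + "," + row4.ID2);
}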
tJavaFlex End code
globalMap.put("list", list1);
tLoop
From: 0
To: ((java.util.ArrayList)globalMap.get("list")).size()-1
Step: 1
tFixedFlowInput_3
GroupID = "Group"+(Integer)globalMap.get("tLoop_1_CURRENT_ITERATION")
GroupValue = ((java.util.ArrayList<String>)globalMap.get("list")).get((Integer)globalMap.get("tLoop_1_CURRENT_VALUE"))
One Star

Re: [resolved] Transitivity Problem

i'm flabbergasted !
It's exactly what i asked for !
Or... almost.
You provided me with :
Group1 Cp1;Cp2;Cp3;Cp4
Group2 Cp5;Cp6
Group3 Cp7;Cp8
And i would like to have
Group1 Cp1
Group1 Cp2
Group1 Cp3
Group1 Cp4
Group2 Cp5
Group2 Cp6
Group3 Cp7
Group3 Cp8
I will try to modify the stage to get what I want.
The problem is that I'm VERY new to Talend. I had a Talend developer working on this project last week, but he has now moved on to another one, and I have to work this out for myself with what he left me.
I'm a DataStage expert, so I'm not new to ETLs... but I don't know Java at all, and he explained how Talend works to me in only a couple of hours...
So I'm learning... and I'm learning the hard way ;)
So many, many thanks for your most welcome help!
Regards,
Amaranthe.
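
Going from one row per group to one row per Cp means splitting the delimited value column and repeating the group Id on each resulting row. The sketch below shows that step in plain Java; it is an illustration of the idea only, not Talend's implementation, and the class name NormalizeGroups and the method normalize are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: turns one (groupId, "Cp1,Cp2,...") row into one row per value.
class NormalizeGroups {
    static List<String[]> normalize(String groupId, String groupValue, String delimiter) {
        List<String[]> rows = new ArrayList<>();
        // Pattern.quote treats the delimiter literally rather than as a regex
        for (String value : groupValue.split(Pattern.quote(delimiter))) {
            // Skip empty pieces, e.g. those caused by a stray leading delimiter
            if (!value.isEmpty()) {
                rows.add(new String[] {groupId, value});
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String[] row : normalize("Group1", "Cp1,Cp2,Cp3,Cp4", ",")) {
            System.out.println(row[0] + " " + row[1]);
        }
    }
}
```

The split has to be applied to the value column; applying it to the group Id column leaves every row unchanged, because the Id contains no delimiter.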
Seven Stars

Re: [resolved] Transitivity Problem

It seems you've not followed exactly what I suggested, apparently using semi-colons to append the values in tJavaFlex and comma in tNormalize, so tNormalize has not done anything. It works exactly as you want for me.
What you want is quite a complex requirement, requiring a significant amount of Java. Normally, most Talend jobs should not need a tJavaFlex or ArrayLists.
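
To illustrate the delimiter point above: when the string was joined with one separator and is split on another, the split finds nothing to cut and hands back the whole string as a single item. This is a small sketch using plain Java String.split, assumed here as a stand-in for what a normalize step does; the class name DelimiterMismatch is hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo: splitting on the wrong delimiter leaves the value in one piece.
class DelimiterMismatch {
    static String[] splitOn(String value, String delimiter) {
        // Pattern.quote treats the delimiter literally rather than as a regex
        return value.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String joinedWithSemicolons = "Cp1;Cp2;Cp3;Cp4";
        // Wrong delimiter: prints [Cp1;Cp2;Cp3;Cp4]
        System.out.println(Arrays.toString(splitOn(joinedWithSemicolons, ",")));
        // Matching delimiter: prints [Cp1, Cp2, Cp3, Cp4]
        System.out.println(Arrays.toString(splitOn(joinedWithSemicolons, ";")));
    }
}
```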
One Star

Re: [resolved] Transitivity Problem

Thanks for answering so quickly.
I'm using a comma in the tJavaFlex. I only modified your code to replace ID1 & ID2 with the real names of the fields in my table.
I'm also using a "," delimiter in the tNormalize.
What I get in the tLogRow is:
GroupID GroupValue
Group1|CMP-000030306-EV2310,CMP-000026251-FR3897
Group2|CMP-000026250-FR6731,CMP-000029469-EL0010,CMP-000029469-EL2007,CMP-000029469-EL0011,CMP-000029469-EL0012,CMP-000029469-EL0014,
Group3|CMP-000026264-EV0253,CMP-000006790-FR0111
Group4|CMP-000029469-EL0020,CMP-000029469-EL0021,CMP-000029469-EL0022
Group5|CMP-000262549-FR4853,CMP-000032238-JVFDOV
Group6|CMP-000002092-D01853,CMP-000030794-RQO6V1
I'm sorry if I don't understand something obvious, but as I explained, it's my first development on Talend... the developer who left handed me the very last job to develop on my own, and yes, I know it's not the easiest one to start with.
Regards
Amaranthe.
One Star

Re: [resolved] Transitivity Problem

Sorry i understood my mistake !!!!
I normalized on the Id and not on the Value !
It now works exactly as i want !!
Many Many Thanks.
Regards,
Amaranthe.