[resolved] Transitivity Problem


Hi there,
I have a MySQL table with potential duplicates in it.
Each record is a pair of IDs, like:
Cp1 Cp2
Cp1 Cp3
Cp2 Cp4
Cp5 Cp6
( ... )
I'm sure that each pair is unique.
As output, I would like a table with a unique ID for each duplicate group and one row for each Cp.
So with the example above, I would like the output to be:
Group1 Cp1
Group1 Cp2
Group1 Cp3
Group1 Cp4
Group2 Cp5
Group2 Cp6
I sorted my first table by Cp1, then Cp2, and tried to use variables to create my groups...
The problem I have is that I would like to be able to do a lookup on the same table I'm writing to, just to be sure I don't write the same Cp twice...
Is this possible at all?
Or do you have a better way to solve this problem?
Thanks for reading, and for any input on this case.
Regards,
Amaranthe.

All Replies
Community Manager

Re: [resolved] Transitivity Problem

Hi,
One quick question: based on the input example, why are Cp1, 2, 3 and 4 grouped into Group1, and Cp5 and 6 into Group2?
Shong
----------------------------------------------------------
Talend | Data Agility for Modern Business

Re: [resolved] Transitivity Problem

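I want the output to be groups of potential duplicates...
So if I have
Cp1 Cp2
Cp1 Cp3
Cp2 Cp4
Cp5 Cp6
Cp1 is a dup of Cp2,
Cp1 is a dup of Cp3,
so Cp1, Cp2 and Cp3 belong in the same potential duplicates group...
And since Cp2 is a dup of Cp4, Cp4 goes into that group too, while Cp5 and Cp6 are only linked to each other, so they form a second group.
I hope that answers your question.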
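For reference, the grouping described here is the set of connected components of a graph whose nodes are the IDs and whose edges are the pairs: any two IDs joined by a chain of pairs end up in the same group. A standard way to compute this is a union-find (disjoint-set) structure. Below is a minimal plain Java sketch of that technique, outside Talend; the class and method names (`DuplicateGroups`, `group`) are hypothetical, and the input is assumed to be an in-memory list of ID pairs rather than rows from tMysqlInput.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateGroups {
    // Disjoint-set forest: maps each ID to its parent; a root maps to itself.
    private final Map<String, String> parent = new HashMap<>();

    // Returns the root of id's set, compressing the path along the way.
    private String find(String id) {
        String root = id;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        String cur = id;
        while (!cur.equals(root)) {
            String next = parent.get(cur);
            parent.put(cur, root);
            cur = next;
        }
        return root;
    }

    // Merges the sets containing a and b (both must already be registered).
    private void union(String a, String b) {
        String ra = find(a);
        String rb = find(b);
        if (!ra.equals(rb)) {
            parent.put(rb, ra);
        }
    }

    // Returns one {GroupN, id} row per distinct id. Groups are numbered in the
    // order their first member appears, and rows of a group are kept together.
    public static List<String[]> group(List<String[]> pairs) {
        DuplicateGroups uf = new DuplicateGroups();
        List<String> order = new ArrayList<>(); // distinct ids, first-seen order
        for (String[] p : pairs) {
            for (String id : p) {
                if (!uf.parent.containsKey(id)) {
                    uf.parent.put(id, id);
                    order.add(id);
                }
            }
            uf.union(p[0], p[1]);
        }
        // Collect members under their root; LinkedHashMap keeps first-seen group order.
        Map<String, List<String>> members = new LinkedHashMap<>();
        for (String id : order) {
            members.computeIfAbsent(uf.find(id), r -> new ArrayList<>()).add(id);
        }
        List<String[]> rows = new ArrayList<>();
        int n = 0;
        for (List<String> ids : members.values()) {
            n++;
            for (String id : ids) {
                rows.add(new String[] {"Group" + n, id});
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> pairs = Arrays.asList(
            new String[] {"Cp1", "Cp2"},
            new String[] {"Cp1", "Cp3"},
            new String[] {"Cp2", "Cp4"},
            new String[] {"Cp5", "Cp6"});
        for (String[] row : group(pairs)) {
            System.out.println(row[0] + " " + row[1]);
        }
    }
}
```

Running `main` prints the six rows from the question (Group1 with Cp1 to Cp4, Group2 with Cp5 and Cp6). Because groups are merged as pairs arrive, this sketch does not rely on the input being sorted.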

Re: [resolved] Transitivity Problem

I had fun with this one and learnt a bit more Java myself!
tFixedFlowInput_2 is just there to create the flow of ID pairs. You would replace it with tMysqlInput.
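tJavaFlex Start code
// One comma-separated string of IDs per group, e.g. "Cp1,Cp2,Cp3"
java.util.ArrayList<String> list1 = new java.util.ArrayList<String>();
String group = "";
boolean incl1 = false;
boolean incl2 = false;
boolean placed = false;
tJavaFlex Main code
placed = false;
for (int i = 0; i < list1.size(); i++) {
    // Wrap the group in commas so that "Cp1" does not match inside "Cp10"
    group = "," + list1.get(i) + ",";
    incl1 = group.contains("," + row4.ID1 + ",");
    incl2 = group.contains("," + row4.ID2 + ",");
    if (incl1 || incl2) {
        // Append whichever ID is missing (nothing if both are already there)
        group = list1.get(i);
        if (!incl1) {
            group += "," + row4.ID1;
        }
        if (!incl2) {
            group += "," + row4.ID2;
        }
        list1.set(i, group);
        placed = true;
        break;
    }
}
// Neither ID is in an existing group: start a new one
if (!placed) {
    list1.add("" + row4.ID1 + "," + row4.ID2);
}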
tJavaFlex End code
globalMap.put("list", list1);
tLoop
From: 0
To: ((java.util.ArrayList)globalMap.get("list")).size()-1
Step: 1
tFixedFlowInput_3
GroupID = "Group"+(Integer)globalMap.get("tLoop_1_CURRENT_ITERATION")
GroupValue = ((java.util.ArrayList<String>)globalMap.get("list")).get((Integer)globalMap.get("tLoop_1_CURRENT_VALUE"))
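One limitation of this append-to-the-first-matching-group approach is worth knowing before running it on real data: it never merges two groups that already exist. If a later pair links an ID from one group to an ID from another, the missing ID is appended to the first group only, so the same ID can end up in two groups, even with input sorted by ID1 then ID2. A plain Java sketch of the same strategy reproduces this outside Talend; the class and method names are hypothetical, and `pairs` stands in for the rows that would come from tMysqlInput.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FlexGroupingCheck {
    // Same strategy as the tJavaFlex Main code: add a pair to the first group
    // that already holds one of its IDs, otherwise start a new group.
    static List<String> groupPairs(List<String[]> pairs) {
        List<String> groups = new ArrayList<>();
        for (String[] p : pairs) {
            boolean placed = false;
            for (int i = 0; i < groups.size(); i++) {
                // Wrap in commas so "Cp1" does not match inside "Cp10".
                String wrapped = "," + groups.get(i) + ",";
                boolean has1 = wrapped.contains("," + p[0] + ",");
                boolean has2 = wrapped.contains("," + p[1] + ",");
                if (has1 || has2) {
                    String group = groups.get(i);
                    if (!has1) group += "," + p[0];
                    if (!has2) group += "," + p[1];
                    groups.set(i, group);
                    placed = true;
                    break;
                }
            }
            if (!placed) {
                groups.add(p[0] + "," + p[1]);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Sorted by ID1 then ID2, yet the last pair links two existing groups.
        List<String[]> pairs = Arrays.asList(
            new String[] {"Cp1", "Cp4"},
            new String[] {"Cp2", "Cp3"},
            new String[] {"Cp3", "Cp4"});
        System.out.println(groupPairs(pairs));
        // prints [Cp1,Cp4,Cp3, Cp2,Cp3]: Cp3 ends up in two groups
    }
}
```

Cp1 to Cp4 are all connected here, so the expected result is a single group. Handling this needs a merge step, for instance a union-find (disjoint-set) structure instead of a list of strings.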

Re: [resolved] Transitivity Problem

I'm flabbergasted!
It's exactly what I asked for!
Or... almost.
You provided me with:
Group1 Cp1;Cp2;Cp3;Cp4
Group2 Cp5;Cp6
Group3 Cp7;Cp8
And I would like to have:
Group1 Cp1
Group1 Cp2
Group1 Cp3
Group1 Cp4
Group2 Cp5
Group2 Cp6
Group3 Cp7
Group3 Cp8
I will try to modify the stage to do what I want.
The problem is I'm VERY new to Talend. A Talend developer worked on this project last week, but he has now moved on to another one, and I have to work this out for myself with what he left me.
I'm a DataStage expert, so I'm not new to ETL tools... but I don't know Java at all, and he explained how Talend works in only a couple of hours...
So I'm learning... and I'm learning the hard way ;)
So many, many thanks for your most welcome help!
Regards,
Amaranthe.

Re: [resolved] Transitivity Problem

It seems you haven't followed exactly what I suggested: you appear to be using semicolons to append the values in tJavaFlex but a comma in tNormalize, so tNormalize has nothing to split. It works exactly as you want for me.
What you want is quite a complex requirement that needs a significant amount of Java. Normally, most Talend jobs should not need a tJavaFlex or ArrayLists.
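To see why the delimiter matters, here is a plain Java sketch of what the normalize step does conceptually (hypothetical names; this is not the code Talend generates): it splits the GroupValue string on the separator and emits one row per value, each carrying the GroupID of its source row. If the string was built with ';' but split on ',', there is nothing to split and the whole string comes back as a single row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class NormalizeSketch {
    // One output row {groupId, value} per separated value in groupValue.
    static List<String[]> normalize(String groupId, String groupValue, String separator) {
        List<String[]> rows = new ArrayList<>();
        // Pattern.quote makes a separator such as "|" or "." match literally.
        for (String value : groupValue.split(Pattern.quote(separator))) {
            rows.add(new String[] {groupId, value});
        }
        return rows;
    }

    public static void main(String[] args) {
        // Separator matches the one used to build the string: four rows.
        for (String[] r : normalize("Group1", "Cp1,Cp2,Cp3,Cp4", ",")) {
            System.out.println(r[0] + " " + r[1]);
        }
        // Built with ";" but split on ",": the whole string stays one row.
        System.out.println(normalize("Group1", "Cp1;Cp2;Cp3;Cp4", ",").size()); // prints 1
    }
}
```

The column being split also has to be GroupValue; splitting GroupID, which contains no separator, leaves every row unchanged.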

Re: [resolved] Transitivity Problem

Thanks for answering so quickly.
I'm using a comma in the tJavaFlex. I only modified your code to replace ID1 and ID2 with the real field names from my table.
I'm also using a "," delimiter in the tNormalize.
What I get in tLogRow is:
GroupID GroupValue
Group1|CMP-000030306-EV2310,CMP-000026251-FR3897
Group2|CMP-000026250-FR6731,CMP-000029469-EL0010,CMP-000029469-EL2007,CMP-000029469-EL0011,CMP-000029469-EL0012,CMP-000029469-EL0014,
Group3|CMP-000026264-EV0253,CMP-000006790-FR0111
Group4|CMP-000029469-EL0020,CMP-000029469-EL0021,CMP-000029469-EL0022
Group5|CMP-000262549-FR4853,CMP-000032238-JVFDOV
Group6|CMP-000002092-D01853,CMP-000030794-RQO6V1
I'm sorry if I'm missing something obvious, but as I explained, this is my first development in Talend... The developer who left gave me the very last job to develop on my own, and yes, I know it's not the easiest one to start with.
Regards
Amaranthe.

Re: [resolved] Transitivity Problem

Sorry, I found my mistake!
I normalized on the ID and not on the value!
It now works exactly as I want!
Many, many thanks.
Regards,
Amaranthe.
