@xdshi Thank you for taking your time to reply.
Yes, you are correct to a certain degree. I want to only insert new data in the person telephone table if the duplicated candidate contains unique values. So, I will first need to identify the duplicate records based on telephone numbers, then from this duplication list I will need to establish which record from the duplication will be master and which will be child- depending on various scenarios, but as an example, let say latest activity against the record is classed as master. If the record contains same value in this case telephone number, then skip insert, otherwise insert new data in Master record (person telephone table), then remove child data or perhaps overwrite one of the values to "merged" in order to carry out deletion at later stage.
In order to identify duplicates I would have written query in the talend, however, I do not think this will help, as this will only pull through records where the numbers are same, I want duplication but also to see additional numbers against the record.
select
if(act.actDate >b.duplicateDate,'Master',
if(b.duplicateDate > act.actDate,'Master','Child')) Classification
,if(act.actDate >b.duplicateDate,a.PersonID,
if(b.duplicateDate > act.actDate,b.duplicateID,'Unsure')) ClassificationID
,concat(p.firstname,' ',p.surname) name
,a.PersonID
,a.CleanedTelephoneNumber
,b.duplicateName
,b.duplicateID
,b.duplicateTele
, act.actDate
,b.duplicateDate
from person_telephone a
join person p on p.PersonID = a.PersonID
join candidate c on c.Candidate_ID = p.PersonID
join
(
select
t.PersonID duplicateID,CleanedTelephoneNumber duplicateTele,concat(p.firstname,' ',p.surname) duplicateName,p.firstname,p.surname, act.actDate duplicateDate
from person_telephone t
join person p on p.PersonID = t.PersonID
join candidate c on c.Candidate_ID = p.PersonID
left join
(
select
j.JournalItemId
,j.ItemCreationDate actDate
,mx.can
from journal_item j
join
(
select max(je.journalitemid) ID
,je.actionid can
from journal_entity je
join journal_item j on j.JournalItemId = je.JournalItemId
where EntityType = 1
#and je.ActionId = 87009
group by je.ActionId
) mx on mx.id = j.JournalItemId
) act on act.can = c.candidate_id
) b
on b.duplicateTele = a.TelephoneNumber and b.duplicateID <> a.PersonID and b.surname = p.Surname and b.firstname = p.FirstName
left join
(
select
j.JournalItemId
,j.ItemCreationDate actDate
,mx.can
from journal_item j
join
(
select max(je.journalitemid) ID
,je.actionid can
from journal_entity je
join journal_item j on j.JournalItemId = je.JournalItemId
where EntityType = 1
#and je.ActionId = 87009
group by je.ActionId
) mx on mx.id = j.JournalItemId
) act on act.can = c.candidate_id
limit 10
;
Now from that I want only the unique phone number from child to go into master record in the person telephone table.
**Note: all our tables have auto incremented PK therefore the two duplicate records will contain different ID**
I hope this makes sense.