tUniqrow: Sending all duplicate rows to Duplicates destination

One Star

I am attempting to use a tUniqRow transform for its intended purpose -- row deduplication. However, tUniqRow sends the "First" duplicate row to the Uniques output path and each subsequent duplicate row to the Duplicates output path.
The requirements for my current project dictate that ALL rows which match one another (based on defined keys) are passed to the Duplicates output path and only rows which are truly unique be passed to the Uniques path.
To comply with this requirement, I am attempting to outer join (via tMap or tFuzzyMatch) the Uniques output with the Duplicates output from the same tUniqRow transform. However, Talend does not allow me to connect the two data paths from tUniqRow, even when those paths are "interrupted" by another transform. The designer refuses to connect the 2nd (lookup) row path to the tMap or tFuzzyMatch component.
The described section of this job is designed as follows:
         --> tSortRow -->
tUniqRow                  tMap or tFuzzyMatch
         --> tSortRow -->

Is this a bug or is there an inherent reason why the Uniques and Duplicates data streams can't be connected?
Community Manager

Re: tUniqrow: Sending all duplicate rows to Duplicates destination

Hello
Talend doesn't allow a cycle flow in a job. See 1468.
Best regards
shong
----------------------------------------------------------
Talend | Data Agility for Modern Business
One Star

Re: tUniqrow: Sending all duplicate rows to Duplicates destination

Talend doesn't allow a cycle flow in a job. see 1468

So, the solution is to manually "cache" (by creating interim file or table outputs) the data between steps?
Also, do you know of a better way to approach this problem?
Community Manager

Re: tUniqrow: Sending all duplicate rows to Duplicates destination

Hello
Also, do you know of a better way to approach this problem?

Can you give an example describing your input data and the expected result?
Best regards

shong
One Star

Re: tUniqrow: Sending all duplicate rows to Duplicates destination

Can you give an example describing your input data and the expected result?

Example input csv file:
Name, Address
John Smith, 111 Main Street
Bob Dole, 1234 Pine Street
John Smith, 111 Main Street

The tUniqRow component processes this file as follows:
--Uniques--
John Smith, 111 Main Street
Bob Dole, 1234 Pine Street

--Duplicates--
John Smith, 111 Main Street

But the project requirements dictate that we produce output as follows:
--Uniques--
Bob Dole, 1234 Pine Street

--Duplicates--
John Smith, 111 Main Street
John Smith, 111 Main Street

I suspect this can be done by joining the 2 tUniqRow output streams to each other (several times), but I am still working through the proof-of-concept. I am hoping there is a better way to approach this problem.
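For reference, the required split can be sketched in plain Java as a two-pass count by key. This is only an illustration of the logic, not Talend-generated code: the class `DuplicateSplitter`, the method `split`, and the `keyOf` extractor are hypothetical names, and the whole trimmed line is assumed to be the key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class DuplicateSplitter {
    /** Holds the two output groups (hypothetical helper type). */
    public static final class Split {
        public final List<String> uniques = new ArrayList<>();
        public final List<String> duplicates = new ArrayList<>();
    }

    /** Routes every row whose key occurs more than once to duplicates. */
    public static Split split(List<String> rows, Function<String, String> keyOf) {
        // Pass 1: count how many rows share each key.
        Map<String, Integer> counts = new HashMap<>();
        for (String row : rows) {
            counts.merge(keyOf.apply(row), 1, Integer::sum);
        }
        // Pass 2: a row is truly unique only if its key was seen exactly once.
        Split out = new Split();
        for (String row : rows) {
            if (counts.get(keyOf.apply(row)) == 1) {
                out.uniques.add(row);
            } else {
                out.duplicates.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "John Smith, 111 Main Street",
                "Bob Dole, 1234 Pine Street",
                "John Smith, 111 Main Street");
        Split s = split(rows, String::trim); // assumption: the whole line is the key
        System.out.println("--Uniques--");
        s.uniques.forEach(System.out::println);
        System.out.println("--Duplicates--");
        s.duplicates.forEach(System.out::println);
    }
}
```

Two passes are needed because a row cannot be classified until every other row with the same key has been seen, which is presumably why a single streaming component cannot produce this output directly.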
Community Manager

Re: tUniqrow: Sending all duplicate rows to Duplicates destination

Hello
You need to split your job into two subjobs, as Talend doesn't allow a cycle flow in a job. Please see my scenario:
in.csv:

John Smith, 111 Main Street
Bob Dole, 1234 Pine Street
John Smith, 111 Main Street
shong, 222 main Street

Result:
Starting job forum6139 at 09:45 13/04/2009.
.--------+-----------------.
|        tLogRow_1         |
|=-------+----------------=|
|name    |address          |
|=-------+----------------=|
|Bob Dole| 1234 Pine Street|
|shong   | 222 main Street |
'--------+-----------------'
Job forum6139 ended at 09:45 13/04/2009.

Best regards

shong
One Star

Re: tUniqrow: Sending all duplicate rows to Duplicates destination

Hi,
It seems a bit of a misnomer for the tUniqRow component to return non-unique rows (I've also just found out that the tMap Unique Match is not a unique match but a last match).
Perhaps an option to not return rows that have duplicates could be added to the advanced options for tUniqRow?
Cheers Andy
Edit -----------------------------
My Solution
1. Store the Uniques and Duplicates outputs in hash maps
2. Inner join these back together on the same keys (in this case First/Last Name)
3. Catch the inner join rejects
These rejects are then your truly unique rows.
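
The three steps above can be sketched in plain Java roughly as follows. This is a minimal illustration rather than the actual job: `JoinRejects`, `uniqRow` and `innerJoinRejects` are hypothetical names, `uniqRow` only mimics the tUniqRow behaviour described in this thread, and the whole row is assumed to be the join key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class JoinRejects {
    /** Mimics tUniqRow as described here: first occurrence to uniques, later ones to duplicates. */
    public static void uniqRow(List<String> rows, List<String> uniques, List<String> duplicates) {
        Set<String> seen = new HashSet<>();
        for (String row : rows) {
            if (seen.add(row)) {
                uniques.add(row);      // key not seen before
            } else {
                duplicates.add(row);   // key already seen
            }
        }
    }

    /** Steps 2 and 3: inner join uniques against duplicates and keep only the rejects. */
    public static List<String> innerJoinRejects(List<String> uniques, List<String> duplicates) {
        Set<String> duplicateKeys = new HashSet<>(duplicates); // step 1: lookup held in a hash set
        List<String> rejects = new ArrayList<>();
        for (String row : uniques) {
            if (!duplicateKeys.contains(row)) {
                rejects.add(row);      // no match in duplicates, so the row is truly unique
            }
        }
        return rejects;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "John Smith, 111 Main Street",
                "Bob Dole, 1234 Pine Street",
                "John Smith, 111 Main Street",
                "shong, 222 main Street");
        List<String> uniques = new ArrayList<>();
        List<String> duplicates = new ArrayList<>();
        uniqRow(rows, uniques, duplicates);
        innerJoinRejects(uniques, duplicates).forEach(System.out::println);
    }
}
```

With the four-row in.csv from earlier in the thread, the rejects are the Bob Dole and shong rows, which matches the tLogRow result posted there.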
Edit ---------------------------
See https://jira.talendforge.org/browse/TDI-28405 for the tUniqRow bug and
https://jira.talendforge.org/browse/TDI-28406 for the tMap bug
One Star

Re: tUniqrow: Sending all duplicate rows to Duplicates destination

I agree with Andy; it is odd that we cannot really filter / identify the duplicates directly with tUniqRow.
On my side, I have a requirement to identify all duplicate lines. I used a variation of the previous solution: a tJoin between the initial file and the duplicate rows that tUniqRow gives as a result.
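
Roughly, the logic of that variation looks like the sketch below in plain Java. It is only an approximation of the idea, not the job itself: `AllDuplicates` and `allDuplicateRows` are hypothetical names, and the whole row is assumed to be the join key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AllDuplicates {
    /** Returns every input row whose key also appears in the Duplicates output. */
    public static List<String> allDuplicateRows(List<String> input) {
        // Collect the keys that a tUniqRow-style pass would send to Duplicates.
        Set<String> seen = new HashSet<>();
        Set<String> duplicateKeys = new HashSet<>();
        for (String row : input) {
            if (!seen.add(row)) {
                duplicateKeys.add(row);
            }
        }
        // Join the initial input against those keys: matches include the first occurrence too.
        List<String> matched = new ArrayList<>();
        for (String row : input) {
            if (duplicateKeys.contains(row)) {
                matched.add(row);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "John Smith, 111 Main Street",
                "Bob Dole, 1234 Pine Street",
                "John Smith, 111 Main Street");
        allDuplicateRows(input).forEach(System.out::println);
    }
}
```

Joining against the initial file rather than the Uniques output means the first occurrence of each duplicated key is picked up as well, so all matching rows end up together.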
Community Manager

Re: tUniqrow: Sending all duplicate rows to Duplicates destination

Hi,
It seems a bit of a misnomer for the tUniqRow component to return non-unique rows (I've also just found out that the tMap Unique Match is not a unique match but a last match).
Perhaps an option to not return rows that have duplicates could be added to the advanced options for tUniqRow?
Cheers Andy

Hi Andy
Yes, a new feature could be added to this component to return only the rows that are really unique in the source data. This is a common request; can you please report a feature request in our bugtracker?
Shong
One Star

Re: tUniqrow: Sending all duplicate rows to Duplicates destination

Hi, I added the bugs to the tracker (links above). I consider these to be bugs because this behaviour contradicts the expected results and is not documented.
Community Manager

Re: tUniqrow: Sending all duplicate rows to Duplicates destination

Hi moinerus
Can you please add the issue number here? It will be convenient and helpful for others to trace and monitor the issue you have reported.
Shong
One Star RSA
One Star

Re: tUniqrow: Sending all duplicate rows to Duplicates destination

TDI-28405 tUniqRow returns first duplicate for uniques and not all duplicates for duplicates
One Star

Re: tUniqrow: Sending all duplicate rows to Duplicates destination

Hi folks, can anyone help? I need to use tUniqRow but have no idea how. I need to do exactly what is shown below:
Name, Address
John Smith, 111 Main Street
Bob Dole, 1234 Pine Street
John Smith, 111 Main Street

The tUniqRow component processes this file as follows:
--Uniques--
John Smith, 111 Main Street
Bob Dole, 1234 Pine Street

--Duplicates--
John Smith, 111 Main Street

Thanks for your time
Regards