[resolved] Difference between "unique" and "Distinct"


When I run the profiler, the 'simple' stats include two different counts: one for "unique" values and the other for "Distinct" values.
I don't understand the difference between them.
Can someone please explain how I should interpret the case when:
Row Count = 658,645 (OK, I think I understand this one :) )
Null Count = 0 (I understand this one too :) )
Distinct Count = 5,097
Unique Count = 541
Duplicate Count = 4,556


Re: [resolved] Difference between "unique" and "Distinct"

"Distinct count" counts the number of distinct values in your column (SELECT DISTINCT ...).
"Unique count" counts the number of distinct values that occur exactly once. It is necessarily less than or equal to "Distinct count".
"Duplicate count" counts the number of distinct values that occur more than once.
So you have the relation:
"Duplicate count" + "Unique count" = "Distinct count"
For example,
a,a,a,a,b,b,c,d,e => 9 values, 5 distinct values (a, b, c, d, e), 3 unique values (c, d, e), 2 duplicate values (a, b).
In your case, 4,556 + 541 = 5,097: of the 5,097 distinct values, 541 occur exactly once and 4,556 occur more than once.
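
If it helps to see the definitions above in code, here is a minimal Python sketch that reproduces the counts for the example column. It is not how the profiler computes them internally. The function name `profile_counts` and the result keys are made up for illustration, and leaving nulls (`None`) out of the distinct/unique/duplicate counts is an assumption on my part.

```python
from collections import Counter


def profile_counts(values):
    """Hypothetical helper: simple profiler-style counts for a list of values."""
    # Frequency of each non-null value (excluding nulls is an assumption).
    freq = Counter(v for v in values if v is not None)
    return {
        "row_count": len(values),
        "null_count": sum(1 for v in values if v is None),
        # Number of different values present.
        "distinct_count": len(freq),
        # Distinct values that occur exactly once.
        "unique_count": sum(1 for n in freq.values() if n == 1),
        # Distinct values that occur more than once.
        "duplicate_count": sum(1 for n in freq.values() if n > 1),
    }


column = ["a", "a", "a", "a", "b", "b", "c", "d", "e"]
counts = profile_counts(column)
print(counts)
```

For this column the sketch gives 9 rows, 5 distinct, 3 unique and 2 duplicate values, so "Duplicate count" + "Unique count" = "Distinct count" holds.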
