How to do Profile on multiple tables (profiling is on single table only but data set is from multiple table)

Five Stars

How to do Profile on multiple tables (profiling is on single table only but data set is from multiple table)

Hi Experts,

 

I am exploring how best Talend DQ can be best fit into my clients req ... I have a business scenario as follows I have 3 tables customer,customer_email,email_status (common column across 3 tables is emailid(an int column)) below is the structure and data

 

 

FIND THE SCRIPTS for mysql 
 
drop table customer;
 
drop table customer_email;
 
drop table email_status;
 
 
 
create table customer(custid int,cname varchar(50),cemailid int,country varchar(5));
create table customer_email(emailid int,email varchar(100));
create table email_status(emailid int,emailstatus varchar(1));
 
 
insert into customer(custid  ,cname  ,cemailid  ,country)values(1,'raj',11,'US');
 
insert into customer(custid  ,cname  ,cemailid  ,country)values(2,'anil',12,'US');
 
insert into customer(custid  ,cname  ,cemailid  ,country)values(3,'tarun',13,'US');
 
insert into customer(custid  ,cname  ,cemailid  ,country)values(4,'kanth',14,'US');
 
insert into customer(custid  ,cname  ,cemailid  ,country)values(5,'pavan',15,'UK');
 
insert into customer(custid  ,cname  ,cemailid  ,country)values(6,'abhi',16,'UK');
 
insert into customer(custid  ,cname  ,cemailid  ,country)values(7,'vikky',17,'UK');
 
insert into customer(custid  ,cname  ,cemailid  ,country)values(8,'srihas',18,'UK');
 
insert into customer(custid  ,cname  ,cemailid  ,country)values(9,'rithwik',19,'CA');
 
insert into customer(custid  ,cname  ,cemailid  ,country)values(10,'tej',20,'CA');
 
 
insert into customer_email(emailid  ,email)values(11,'Raj+123@gmail.com' );
 
insert into customer_email(emailid  ,email)values(12,'anil+anil@gmail.com' );
 
insert into customer_email(emailid  ,email)values(13,'tarun@gmail.com' );
 
insert into customer_email(emailid  ,email)values(14,'kanth+123@gmail.com' );
 
insert into customer_email(emailid  ,email)values(15,'papvan+234@gmail.com' );
 
insert into customer_email(emailid  ,email)values(16,'abhi@gmail.com' );
 
insert into customer_email(emailid  ,email)values(17,'vikky+vikky@gmail.com' );
 
insert into customer_email(emailid  ,email)values(18,'srihas@gmail.com' );
 
insert into customer_email(emailid  ,email)values(19,'rithwik+rithwik@gmail.com' );
 
insert into customer_email(emailid  ,email)values(20,'tej@gmail.com' );
 
 
insert into email_status(emailid  ,emailstatus )values(11,'A' );
 
insert into email_status(emailid  ,emailstatus )values(12,'I' );
 
insert into email_status(emailid  ,emailstatus )values(13,'A' );
 
insert into email_status(emailid  ,emailstatus )values(14,'I' );
 
insert into email_status(emailid  ,emailstatus )values(15,'A' );
 
insert into email_status(emailid  ,emailstatus )values(16,'I' );
 
insert into email_status(emailid  ,emailstatus )values(17,'A' );
 
insert into email_status(emailid  ,emailstatus )values(18,'I' );
 
insert into email_status(emailid  ,emailstatus )values(19,'A' );
 
insert into email_status(emailid  ,emailstatus )values(20,'A' );
 
 
 
select * from customer;
 
select * from customer_email;
 
select * from email_status;
 
 
 
select c.cemailid,ce.email,c.country,es.emailstatus from  customer c, customer_email ce, email_status es
where c.cemailid=ce.emailid and ce.emailid=es.emailid and es.emailstatus='A' and  email NOT REGEXP '^[a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,4}$';
 
 
 
I want to profile on email column(varchar column)  in customer_email
(basically want to apply a regex to find invalid emails) 
 
But i have some different conditions like (want to profile on country "US" only and want to profile only for active email's) 
because of which I have to join all those 3 tables
 
I have following 2 queries
 
  1. How can I join those 3 tables and do profiling on one table column(What type of analysis I have to do)
  2. I want the o/p of regex on email column should be country wise 
how can I achieve this ??
expected o/p
 
 
Country US
label      Match %            Not Match %            Match#         Not Match#
email       50%                        50%                          1                    1
 
 
Country UK
label      Match %            Not Match %            Match#         Not Match#
email         0%                        100%                          0                  2 
 
 
Country CA
label      Match %            Not Match %            Match#         Not Match#
email       50%                        50%                          1                    1
 
 
Country US+UK+CA
label      Match %            Not Match %            Match#         Not Match#
email       33.33%                  66.67%                    2                       4        
 
 
here total records for US country with status "A" are 2
 
here total records for UK country with status "A" are 2
 
here total records for CA country with status "A" are 2 
 

The other general queries as follows

 

  1. how to generate PDF for DQ analysis report
  2. what is talend_DQ data mart will it be created automatically ?
  3. how to view the reports (generated PDF's) from where can we view these reports? do we need to make any separate configuration for this I have downloaded big data version ...
  4. Do we have any training material or videos for reference ?
  5. How can I see the data quality over a period of time(how it improved )

15TH OCTOBER, COUNTY HALL, LONDON

Join us at the Community Lounge.

Register Now

2019 GARNER MAGIC QUADRANT FOR DATA INTEGRATION TOOL

Talend named a Leader.

Get your copy

OPEN STUDIO FOR DATA INTEGRATION

Kickstart your first data integration and ETL projects.

Download now

What’s New for Talend Summer ’19

Watch the recorded webinar!

Watch Now

Enabling Data Governance

Learn how to enable Data Governance

Watch Now

The Definitive Guide to Government Data Quality

Take a peek at the definitive guide to Government Data Quality

Read