Listing values missing from lookup tables while loading?

One Star

Listing values missing from lookup tables while loading?

Hello.

I'm loading rows from a csv file, transforming them a bit and
loading into a DB, so on a high level my process looks like this:

tFileInputDelimited_1 --(row1)-> tMap_1 --(out1)-> tPostgresqlOutput_1


Once the data is in the DB, some of the columns are joined against
lookup tables to get more readable values in reports and what not.
Well, every now and then now values which aren't in the lookup
tables appear and I'm trying to work out a way to list the new values
from Talend while loading the data. I added a new output out2 to
tMap_1 to which I INNER JOINED with catch lookup inner join rejects
the values that are missing from the lookup table:


tPostgresqlInput_1
|
(row2)
|
v
tFileInputDelimited_1 --(row1)-> tMap_1 --(out1)-> tPostgresqlOutput_1

|
(rejects out2)
|
v
tLogRow

This produced all the missing values to tLogrow all right
(with duplicates, though but guess I can handle that with
Insert or update action in tPostgresqlOutput_1 later) but
it also removed the rows in question from out1.

I would like the missing values to be outputed to out2
but also to output the rows with missing values to out1.
How would you handle described situation?
One Star

Re: Listing values missing from lookup tables while loading?

                         tPostgresqlInput_1
|
(row2)
|
v
tFileInputDelimited_1 --(row1)-> tMap_1 --(out1)-> tPostgresqlOutput_1
|
(rejects out2)
|
v
tLogRow

OK, that latter "image" didn't come out right. Sorry about that.
Four Stars

Re: Listing values missing from lookup tables while loading?

Hi James,

I think two step process is required...
1 - Insert data from flat file to database
OnSubjobOk
2 - Read input file once again, lookup with tMap using Inner join and get the reject output

Will this help?

Thanks
Vaibhav
One Star

Re: Listing values missing from lookup tables while loading?

sanvaibhav wrote:
Read input file once again, lookup with tMap using Inner join and get the reject output

Hi Vaibnav,

and thanks for your reply.

I guess that would work but the destination table, due to
reasons not in my power, is a dump with lots of rows and
no indexes so I was looking for some other kind of solution.

I guess I could try to use tReplicate and process them in
two different branches in lack of more elegant solutions.
I know, it's only ETL.

2019 GARNER MAGIC QUADRANT FOR DATA INTEGRATION TOOL

Talend named a Leader.

Get your copy

OPEN STUDIO FOR DATA INTEGRATION

Kickstart your first data integration and ETL projects.

Download now

What’s New for Talend Summer ’19

Watch the recorded webinar!

Watch Now

Agile Data lakes & Analytics

Accelerate your data lake projects with an agile approach

Watch

Definitive Guide to Data Quality

Create systems and workflow to manage clean data ingestion and data transformation.

Download

Tutorial

Introduction to Talend Open Studio for Data Integration.

Watch