Only Load Files not in the LoadFiles Table (in case of duplicates)

Six Stars


Hi, I have been able to figure out most of what I needed in Talend so far. However, I'm at a point where I'm not sure how to make Talend do what I need. I have a SQL table into which I load file names and counts as the files are processed. I need to add a step at the beginning that checks that table to see whether a file with the same name has been loaded before, and excludes the file if it has. I've tried to work out a join for this, and a tMap. You can see my setup below. At the very end (circled), you can see where I load the files into the table. I have indicated where I want to put a component, or a series of components, that will check that Files Loaded table before loading.

 

Capture.JPG


Accepted Solutions
Employee

Re: Only Load Files not in the LoadFiles Table (in case of duplicates)

Hi,

 

Please try the snippet below before the flat file read.

 

image.png

Read the list of files using tFileList and pass the current file name as a parameter in the WHERE clause of the database query.

Connect it to a tFlowToIterate component with the "Use the default (key, value) in global variables" option turned on (under Basic settings).

 

Use a Run if connection and add the condition below to it:

 

!Relational.ISNULL(((String)globalMap.get("row1.file_name")))
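
To make the condition above easier to reason about, here is a minimal plain-Java sketch of what it evaluates. It assumes that `globalMap` behaves like an ordinary `java.util.Map`, that tFlowToIterate stores the column under the key `row1.file_name` (as in the expression above), and that `Relational.ISNULL` amounts to a plain null check. The class name `RunIfSketch` and the sample file name are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class RunIfSketch {
    // Plain-Java stand-in for: !Relational.ISNULL((String) globalMap.get("row1.file_name"))
    static boolean runIfCondition(Map<String, Object> globalMap) {
        return ((String) globalMap.get("row1.file_name")) != null;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for Talend's globalMap
        Map<String, Object> globalMap = new HashMap<>();

        // No row has been published yet, so the key is absent
        System.out.println(runIfCondition(globalMap)); // false

        // The lookup returned a row and tFlowToIterate published its file_name
        globalMap.put("row1.file_name", "sales_2019.csv");
        System.out.println(runIfCondition(globalMap)); // true

        // Clearing the key before the next file avoids reusing a stale value
        globalMap.remove("row1.file_name");
        System.out.println(runIfCondition(globalMap)); // false
    }
}
```

The condition is true only while a non-null value is stored under that key. Since a map entry stays in place until it is overwritten or removed, it may be worth checking that a value left over from a previous file cannot influence the next iteration.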

 

image.png

 

Please mark the topic as solution provided if the answer has helped you. Kudos are also welcome :-)

 

Warm Regards,

 

Nikhil Thampi



All Replies

Five Stars

Re: Only Load Files not in the LoadFiles Table (in case of duplicates)

Hi,

Maybe this will help to load each file only once (in case duplicate files arrive).

First pipeline: We store the data together with the file name using a tHashOutput component, appending the data.

Second pipeline: We store that data in a temp table.

Third pipeline: We join the target table and the temp table on the file name, so that only unique files are loaded into the target table.
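
The join in the third pipeline is essentially an anti-join on the file name. As a rough plain-Java sketch of that logic (not Talend component code), assuming the file names already in the target table and the candidate names from the temp table are available as collections; the class and method names here are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class UniqueFileFilter {
    // Returns the candidate names that are not in alreadyLoaded,
    // keeping only the first occurrence of each candidate.
    static List<String> notYetLoaded(List<String> candidates, Set<String> alreadyLoaded) {
        Set<String> seen = new HashSet<>(alreadyLoaded);
        List<String> result = new ArrayList<>();
        for (String name : candidates) {
            // add() returns false if the name was already loaded or already seen
            if (seen.add(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data: names in the target table and in the temp table
        Set<String> target = new HashSet<>(Arrays.asList("b.csv"));
        List<String> temp = Arrays.asList("a.csv", "b.csv", "a.csv", "c.csv");
        System.out.println(notYetLoaded(temp, target)); // [a.csv, c.csv]
    }
}
```

In a Talend job the same effect would typically come from the join itself, for example by keeping only the rows that find no match in the target table.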

 

Regards,

Akash

Six Stars

Re: Only Load Files not in the LoadFiles Table (in case of duplicates)

This looks great. I have it all set up, but for some reason it's reading the file name as a column.

I'm getting an "Invalid column name" error, and it lists the file name as the column. It fails on tDBInput_1 and kicks out the error.
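
One common cause of this kind of error is that the file name ends up in the query without single quotes, so the database reads it as an identifier rather than a string literal. The thread does not show the actual query, so the following is only a sketch under assumptions: the table name `LoadFiles` and column `file_name` are taken from the thread title and the Run if expression, and the class and method names are hypothetical.

```java
class LookupQuery {
    // Likely problematic: the file name is concatenated without quotes,
    // so the database may try to resolve it as a column name.
    static String unquoted(String fileName) {
        return "SELECT file_name FROM LoadFiles WHERE file_name = " + fileName;
    }

    // Wraps the file name in single quotes and doubles any embedded
    // single quote so the value stays one string literal.
    static String quoted(String fileName) {
        return "SELECT file_name FROM LoadFiles WHERE file_name = '"
                + fileName.replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        String fileName = "sales_2019.csv"; // made-up example value
        System.out.println(unquoted(fileName));
        System.out.println(quoted(fileName));
    }
}
```

In the tDBInput query field, the equivalent would be to put the single quotes inside the Java string on both sides of the concatenated file name value. Where the database component supports it, a parameterized query would avoid the manual quoting altogether.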

 
