How to perform data insertion in an already filled postgresql table without duplication ?

Highlighted
Five Stars

How to perform data insertion in an already filled postgresql table without duplication ?

Hi 

I want to insert rows of data in a Postgresql table via Talend, how can i make sure that if a row already exist in the database it won't get inserted ? I don't have a specific key to differ each row .

 

Here's the job im working on 


Accepted Solutions
Forteen Stars

Re: How to perform data insertion in an already filled postgresql table without duplication ?

of course, you can use a set of columns as key

 

you do not need to define all these columns in a database as a primary key (it good but not mandatory)

but definitely, you need to create indexes for these columns (could be 1 combined)

 

after, you could use "insert or update" in tPostgreSQLOutput component

 

add. if you need "insert-ignore" which is not available, you have a choice from 3:

- insert all data into temp table on target server and use tSQLRow for run INSERT SELECT WHERE NOT EXISTS

- use tSQLRow with prepared statements instead of tDBOutput - and run INSERT INTO TABLE (a,b,c) values (?,?,?) - check docs or forum about prepared statements in Talend

- if data relativly small - use tMap with catch rejected by INNER JOIN (tMap also can join by many columns) 

-----------

All Replies
Forteen Stars

Re: How to perform data insertion in an already filled postgresql table without duplication ?

Hi,

 

not depending from the database and used tools (not only for PostgreSQL + Talend)

 

for prevent duplicates - you MUST have a unique key or combination of columns

 

in other case you will always have a risk of duplicates

 

what possible to do, always depend from environment and you real goals and needs

-----------
Five Stars

Re: How to perform data insertion in an already filled postgresql table without duplication ?

hello vapukov and thank you for your answer ,

The dataset im working on are tickets extracted from JIRA. The table basically have the name of the ticket, date of creation, status of the ticket ( open, in progress, in review , closed...) . Now my new question is can i have a set of column as a key? for exemple those three columns at once as a key ( because obviously the same ticket can have different updated status in one day ) 

Thank you for your time 

Forteen Stars

Re: How to perform data insertion in an already filled postgresql table without duplication ?

of course, you can use a set of columns as key

 

you do not need to define all these columns in a database as a primary key (it good but not mandatory)

but definitely, you need to create indexes for these columns (could be 1 combined)

 

after, you could use "insert or update" in tPostgreSQLOutput component

 

add. if you need "insert-ignore" which is not available, you have a choice from 3:

- insert all data into temp table on target server and use tSQLRow for run INSERT SELECT WHERE NOT EXISTS

- use tSQLRow with prepared statements instead of tDBOutput - and run INSERT INTO TABLE (a,b,c) values (?,?,?) - check docs or forum about prepared statements in Talend

- if data relativly small - use tMap with catch rejected by INNER JOIN (tMap also can join by many columns) 

-----------
Five Stars

Re: How to perform data insertion in an already filled postgresql table without duplication ?

Thank you!

2019 GARNER MAGIC QUADRANT FOR DATA INTEGRATION TOOL

Talend named a Leader.

Get your copy

OPEN STUDIO FOR DATA INTEGRATION

Kickstart your first data integration and ETL projects.

Download now

What’s New for Talend Summer ’19

Watch the recorded webinar!

Watch Now

Definitive Guide to Data Quality

Create systems and workflow to manage clean data ingestion and data transformation.

Download

Tutorial

Introduction to Talend Open Studio for Data Integration.

Watch

Downloads and Trials

Test drive Talend's enterprise products.

Downloads