Five Stars

tmap -Single transaction delete before insert

We are trying to figure out how to do the following within a tMap mapping:
Scenario:
For each record read:
Start transaction
     Identify existing records
     Delete the existing records
     Insert new records from source
End transaction
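For reference, the intended transaction semantics can be sketched in plain Java (no database involved; the integer keys and row strings are hypothetical stand-ins for the real records). The whole delete+insert batch is applied to a copy, and the copy is only published if nothing throws, mimicking BEGIN ... COMMIT / ROLLBACK:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

public class SingleTxnSketch {
    // Apply the whole delete+insert batch to a copy of the target and
    // publish the copy only on success (commit). On failure, return the
    // untouched original (rollback).
    static Map<Integer, String> runInTxn(Map<Integer, String> target,
                                         Consumer<Map<Integer, String>> work) {
        Map<Integer, String> copy = new HashMap<>(target);
        try {
            work.accept(copy);
            return copy;      // commit
        } catch (RuntimeException e) {
            return target;    // rollback
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> target = new HashMap<>(Map.of(1, "old-1", 2, "old-2"));
        Map<Integer, String> source = Map.of(1, "new-1", 3, "new-3");
        Map<Integer, String> result = runInTxn(target, t -> {
            for (Integer id : source.keySet()) t.remove(id); // delete existing
            t.putAll(source);                                // insert from source
        });
        System.out.println(result.size()); // 3
    }
}
```

In a real job the copy/publish step is replaced by the database connection's commit/rollback, but the ordering constraint is the same: all deletes happen before the inserts inside one unit of work.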
Job flow:
                                                      DeleteFlow
                                                          |
read from SQL Server (source) --> write into HDFS --> tMap --> InsertFlow --> tGreenplumOutput
                                                          |
                                                   Lookup target file

Source-to-target lookup model (screenshot):
source table left join target table --> output flows --> DeleteFlow deletes the complete matched record; InsertFlow inserts all the source data again (a row is new when the matched ID is null)
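As a sanity check on the lookup model, here is a minimal sketch (plain Java, hypothetical integer keys) of how the left join should route rows into the two output flows:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class LookupRoutingSketch {
    // DeleteFlow: source rows whose key already exists in the target
    // (the left join matched, i.e. the matched ID is NOT null)
    static List<Integer> deleteFlow(List<Integer> sourceIds, Set<Integer> targetIds) {
        List<Integer> out = new ArrayList<>();
        for (Integer id : sourceIds) {
            if (targetIds.contains(id)) out.add(id);
        }
        return out;
    }

    // InsertFlow: every source row is inserted again, matched or not
    static List<Integer> insertFlow(List<Integer> sourceIds) {
        return new ArrayList<>(sourceIds);
    }
}
```

With 6 rows already in the target and 7 in the source, DeleteFlow should carry 6 rows and InsertFlow 7, which is the behaviour the job was expected to show in the test below.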
 
Tested:
When I pull 6 records from the source, it inserts 6 as expected. But if I add 1 more record to the source, the job goes into an infinite loop without even throwing an error. It should first delete the 6 already-inserted records and then insert all 7 again, but it is not working as expected.
Any help on this would be much appreciated.
Job flow screenshot:


If we did not have the single-transaction requirement, I would build two separate jobs: one to delete, the other to insert. But the requirement for a single transaction makes this hard. We also need to guarantee that we delete before we insert.
8 REPLIES
Community Manager

Re: tmap -Single transaction delete before insert

Hi,
Do an inner join to get the existing records, and cache the new (or old) records in memory for use in the next subjob, e.g.:
t...Connection
   |
source data--...-->tMap---existing records-->tGreenplumOutput (for deleting)
                       ---new records-->tHashOutput
   |
OnSubjobOk
   |
tHashInput--main-->tGreenplumOutput (for inserting)
   -OnSubjobOk-->t...Commit
   -OnSubjobError-->t...Rollback
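The two-subjob pattern above can be sketched in plain Java, with an in-memory list playing the role of tHashOutput/tHashInput (keys and data are hypothetical):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TwoSubjobSketch {
    public static void main(String[] args) {
        Set<Integer> target = new HashSet<>(List.of(1, 2, 3));
        List<Integer> source = List.of(2, 3, 4, 5);

        // Subjob 1: the inner join identifies existing rows, which are
        // deleted from the target, while the full source is cached
        // in memory (the tHashOutput step)
        List<Integer> cache = new ArrayList<>();
        for (Integer id : source) {
            target.remove(id);   // existing records -> delete flow
            cache.add(id);       // -> tHashOutput
        }

        // OnSubjobOk -- Subjob 2: tHashInput -> insert flow, then commit
        target.addAll(cache);

        System.out.println(target.size()); // 5
    }
}
```

The key point is the sequencing: the delete subjob must fully finish (OnSubjobOk) before the cached rows are replayed into the insert, and the commit/rollback at the end keeps both subjobs in one transaction.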

Regards
Shong
----------------------------------------------------------
Talend | Data Agility for Modern Business
Five Stars

Re: tmap -Single transaction delete before insert

Hi Shong,
First of all, thanks for the data flow; it worked for me.
However, when I executed it exactly as you described for ~90k records, the job execution time was 18 minutes.
Goal:
How do I get an incremental load into the Greenplum data warehouse? Please guide me, because the current two-flow approach is not improving performance as expected.
It is our protocol to write the source data into Hadoop first: source data --> HDFS (for archive purposes) --> tGreenplumOutput.
Any help on this would be much appreciated.

 
Community Manager

Re: tmap -Single transaction delete before insert

Do you really need to check whether the same records already exist in the target table, or could you just delete all the data in the target table and insert the source data? Cache small lookup data in memory; if the lookup data set is somewhat large, cache it on disk instead of in memory on tMap.
Five Stars

Re: tmap -Single transaction delete before insert

Thanks in advance, Shong!
I would like to just delete everything from the target and then insert it all again. However, the lookup contains a huge amount of data. Even when I turned on "Store data on disk = true", I still get the same execution time, because the job monitor clearly shows that the batched output into tGreenplumOutput is the slow part. It uses the default batch size of 10000.
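For intuition on what the Batch Size setting changes, here is a small sketch of chunking a row stream into fixed-size batches (plain Java; 10000 is the default mentioned above):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSketch {
    // Split the row stream into fixed-size batches; each batch becomes
    // one round trip to the database instead of one per row.
    static <T> List<List<T>> batches(List<T> rows, int batchSize) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            out.add(rows.subList(i, Math.min(i + batchSize, rows.size())));
        }
        return out;
    }
}
```

At ~90k rows and a batch size of 10000, that is only 9 round trips, so if the output is still slow, the per-batch write path (network, constraints, indexes) is a more likely bottleneck than the batch count itself.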
Community Manager

Re: tmap -Single transaction delete before insert

If you want to delete all data in the target table before inserting, you can use a tGreenplumRow to execute a DELETE statement that removes all the data.

t...Connection
   |
tGreenplumRow (delete statement)--OnComponentOk-->source data--...write into HDFS--OnComponentOk-->tGreenplumOutput (for inserting)
   -OnSubjobOk-->t...Commit
   -OnSubjobError-->t...Rollback
Five Stars

Re: tmap -Single transaction delete before insert

Thanks, Shong!
I have tried this flow. It looks better, but again it is a full load.
I am also aiming for an incremental load. For that, I am not able to identify changed/new rows, and I cannot perform CDC on the source system because I have read-only access for extracting data. Is it possible to perform an incremental load with the given pipeline? It is difficult for me to track new/updated rows and output only those into Greenplum instead of doing a full load.
I'm hoping you can guide me on this.
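Without CDC access on the source, one common workaround is to compare a hash of each source row against the target snapshot and load only the new or changed keys. A minimal sketch, assuming you can compute a stable per-row hash on both sides (the map shapes and integer hashes here are hypothetical):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class DeltaSketch {
    // target and source map each key to a hash of the full row;
    // emit only keys that are new or whose row content changed.
    static List<Integer> delta(Map<Integer, Integer> target,
                               Map<Integer, Integer> source) {
        List<Integer> out = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : source.entrySet()) {
            Integer old = target.get(e.getKey());
            if (old == null || !old.equals(e.getValue())) {
                out.add(e.getKey());
            }
        }
        Collections.sort(out);  // deterministic order for the load
        return out;
    }
}
```

This still requires reading both sides once per run, but the write volume drops from the full table to only the delta, which is usually where the time goes.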
Community Manager

Re: tmap -Single transaction delete before insert

Without CDC, you have to compare the source data and the target data to find the rows that need to be updated and inserted, i.e., do an inner join between source and target as you did. Obviously, the performance of this approach is much lower than CDC when the target data volume is big.
One Star

Re: tmap -Single transaction delete before insert

Hi Shong,
I was not able to find how to start a new topic, so I have to ask this way. I am currently working on a CDC job that should run in one of two sequences depending on the timestamp of the changes/logs: one run order is insert, update, delete; the other is delete, update, insert. The two orders exist to avoid constraint errors, because deleting a row and reinserting it with the same primary key can throw an error or lose records. I am also passing many context variables to those jobs. I have not been able to make a conditional run work using tJavaRow or tJava with if statements on a string. Is there a way to do an if/else condition on tRunJob, and also pass values into the tMap from those conditions?
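In Talend you can usually put such a condition on a "Run if" trigger, which takes a Java boolean expression (e.g. `"IUD".equals(context.loadOrder)`), with each of the two tRunJob components behind its own trigger, rather than branching inside tJava/tJavaRow. A minimal sketch of the decision itself; the context variable name `loadOrder` and the "IUD" value are hypothetical examples:

```java
public class LoadOrderSketch {
    // The same boolean you would put on each "Run if" trigger:
    // one tRunJob fires for the I-U-D order, the other for D-U-I.
    static String[] order(String loadOrder) {
        if ("IUD".equalsIgnoreCase(loadOrder)) {
            return new String[]{"insert", "update", "delete"};
        }
        return new String[]{"delete", "update", "insert"};
    }
}
```

Context values can still be passed down to each child job via the context parameters configured on the tRunJob component, independently of which trigger fired.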
Thanks,
kmv