Four Stars

Retrieve primary key Id for each inserted row

Hi,

 

I have 2 DB tables (a source and target in MSSQL).

For each row in the source table I would like to do the following:

 

- Insert the row into the target table

- Retrieve the primary key Id from the target table and put it in the source table

 

Is there any straightforward way of retrieving the Id from tMSSQLOutput component or is an alternative technique required whilst ensuring good performance (since source table has over 1 million rows)? 

  • Data Integration
1 ACCEPTED SOLUTION

Accepted Solutions
Ten Stars

Re: Retrieve primary key Id for each inserted row

The idea you suggest in your second paragraph would be my next suggestion. Not ideal given what you have said, but you can easily mitigate for other tables loading data into this table. Your id does not need to be a number, you can use a String. As such you can have control on uniqueness using a table prefix to the id, for example. So your data could look like below....

 

tableA_1
tableA_2
tableA_3
tableB_1
tableC_1
tableB_2
tableA_4
tableC_2

Since this will only ever be a temporary key used for updating the source and target with the correct autoincrement id, it shouldn't matter if it is a little inelegant.

Rilhia Solutions
7 REPLIES
Four Stars

Re: Retrieve primary key Id for each inserted row

I was able to get a solution to this by using tFlowToIterate followed by tMSSQLRow.

 

Insert_For_Each_and_Get_Id_1.PNG

 

In the tMSSQLRow I had an insert to the target table followed by an update to the source table using the SCOPE_IDENTITY() function.

 

tMSSQLRow_Query.PNG

 

However this is quite slow and takes several hours for 1 million rows.

Employee

Re: Retrieve primary key Id for each inserted row

Hi,

 

This design is effectively very slow since you are doing a tFlowToIterate, which means it will execute a statement for every record.

What you want to do:

- User one or more tMSSQLConnection components to enable batch commit

- Use batch commit, and batch lookup to push the ids into the source table

- Depending on whether the whole process should commit or rollback, you may get even more performance through the TestStaging by using NoLock on your query to MSSQL.

- Do not use any tFlowToIterate (that will slow your job significantly)

- Use tMSSQLOutput instead of tMSSQLRow

 

Can you provide an export of the job so that I can look at the columns in each components and what you are doing?  Is the ID in the target table an autoincrement?  What other columns are you inserting into the target?  Are there other operations happening on the target, i.e. can any other systems be inserting data into it and incrementing the auto-increment keys?  If not, then you can actually reserve keys at front. For example if you ar batching by 50K records, you know that your batch will you 50K ids, hence you actually know from where your next increment will start.

 

Yes, it gets a bit complex, but that job can go from several hours to few mins if designed further.

 

Four Stars

Re: Retrieve primary key Id for each inserted row

Hi @iburtally

 

My objective is to have a single entry in the target table for each row in the source table and then link them (by using the target table Id).

 

To answer some of your questions:

- The target table Id is auto-incremented

- There are other operations that are writing to the target table hence I cannot reserve any Ids

- The other column being inserted from source to target is Name. However there could be more than 1 row in the source table with a particular name so it does not serve as an effective lookup.  

Employee

Re: Retrieve primary key Id for each inserted row

You can achieve this using tMSSQLOutput and tMSSQLLastInsertId

But can't guarantee the peformance if you are doing autocommit.

 

There may be ways to reserve ids before hand, but need to research it as it depends on the database.

Ten Stars

Re: Retrieve primary key Id for each inserted row

I've read through this and what @iburtally has suggested is sound, but as he says it may be slow. If you want something that will perform a little better you might want to consider doing this in a different way.

 

In the data you are inserting, is there a column or a collection of columns you could use as a key or composite key before your new id is generated on insert? In a lot of cases given the scenario you've shown, there will be. If there is, you can solve this problem with another subjob that will run after the first. That one will update the source table with your new id using the record or records that act as a key. 

 

A lot of people try and solve problems like this in one process. Sometimes it is easier and quicker to break them down into multiple tasks.

Rilhia Solutions
Four Stars

Re: Retrieve primary key Id for each inserted row

Hi @rhall_2_0 @iburtally

 

Thanks for the suggestions.

 

The issue is that there is no column or collection of columns that could be used as a composite key i.e. duplicates can and will be present.

So for example there may be 2 TestStaging rows with Name = 'abc' and this results in 2 rows created in TestOutput. However when looking up on Name (which is the only column apart from Id) both rows in TestStaging will end up with the same TestOutput.Id. I would like each TestStaging row to link to a unique TestOutput row.

 

Is there any other way to speed this up? One thing I have considered is to add a temporary column in TestOutput called TestStagingId and use that to update TestStaging with the TestOutput.Id, however this is not ideal as there may be other tables that may enter data into TestOutput and it will become more chellenging to sustain this approach due to Ids clashing etc.

Ten Stars

Re: Retrieve primary key Id for each inserted row

The idea you suggest in your second paragraph would be my next suggestion. Not ideal given what you have said, but you can easily mitigate for other tables loading data into this table. Your id does not need to be a number, you can use a String. As such you can have control on uniqueness using a table prefix to the id, for example. So your data could look like below....

 

tableA_1
tableA_2
tableA_3
tableB_1
tableC_1
tableB_2
tableA_4
tableC_2

Since this will only ever be a temporary key used for updating the source and target with the correct autoincrement id, it shouldn't matter if it is a little inelegant.

Rilhia Solutions