Four Stars

Keep track of inserts in a DB

Hi,

I'm kind of a newbie and I'm facing a really basic problem: I need to keep track of the number of records inserted in a DB (with tMSSqlOutput) DURING the job. Since the variable NB_LINE_INSERTED is available only at the end of the subjob itself I cannot read it while the insertion is still going on: can you suggest me the correct variable or component?

 

Thanks so much

Giuseppe

1 ACCEPTED SOLUTION

Accepted Solutions
Six Stars

Re: Keep track of inserts in a DB

Hi Giuseppe,

 

Unfortunately, as SQL Server may potentially reject your inserts after you've calculated the ID, any counting you're doing could easily get messed up. Even if you could identify when an insert had failed for each row, you'd need to know whether it was rejected because of a duplicate ID, or some other reason, and then adjust your count accordingly before optionally retrying the insert. In short, it's messy.

 

Even with the approach you've taken, if at some point in the future (I know you say that current this isn't the case) other records are being inserted into the table from elsewhere, duplicate IDs can still arise, due to the slight delay between your query returning ID + 1 and inserting the new record into the database.

 

Assuming you can't just use an Integer IDENTITY column, the only way to guarantee unique, sequential numbers for newly added records is an AFTER INSERT trigger on the database table, which will simply set your ID field to ID + 1 from a sub-query.

 

The plus with doing this of course, is that your Talend job becomes much simpler, and faster, as the processing can be done in batches, rather than line-by-line.

 

Regards,

 

 

Chris

5 REPLIES
Six Stars

Re: Keep track of inserts in a DB

Hi Giuseppe,

 

Within the sub-job, this is not as easy as it at first sounds, because some inserts may be rejected by the server, which means we can't simple count the number of records in the flow which are being sent to the tMSSqlOutput component.

 

Whilst it's possible to turn off batch insert mode on the component, which will enable the rejected rows output, all records would then be inserted in one go, and you'd only get the reject output after it had finished. As presumably the database insert is the final step in your sub-job, at this point the NB_LINE_INSERTED variable is available anyway, there's no point in doing this.

 

I've personally not come across a case when it would have been useful to know the count of inserted records during a sub-job, so are you able to provide more details of what you'll be using this count for, and I/we may be able to suggest another way to achieve this?

 

Regards,

 

 

Chris

Four Stars

Re: Keep track of inserts in a DB

Hi,

thanks for your quick response, the context of my work is the following:

- I take data from a CSV, i filter them and manipulate them on tMap using as lookup some tables from a MSSql database

- then I need to update/insert the records on a table of that MSSql DB: the action to be performed is based on a  "CODE" field present both in the  CSV and the destination table. If a record with the CODE of my input is not present in the DB I insert a new record, otherwise I update.

 

So far, so good. Up to this point everything works fine.

 

But in the destination MSSql table I need to have a unique "ID" field and I'd like to avoid jumps between IDs. To generate this new field in tMap, up to now I constantly (at every row) query the table to get the last ID in order to insert the new one as the lastID +1 by asking to lookup at each record. Of course this is really slowing my jobs down, although it correctly generate unique IDs without jumps (and I know there are no concurrent things happening on the DB while I run the job).

 

Having a variable to keep track of the successful insertions would help me because I would query the DB at the beginning and then work with the variable alone: that's the reason why I tried to work with NB_LINE_INSERTED and alike in a tJavaRow inside subjobs but with no luck.

 

Is there a better way to keep track of the insertions in tMap or in the Output component?

 

Giuseppe

 

Six Stars

Re: Keep track of inserts in a DB

Hi Giuseppe,

 

Unfortunately, as SQL Server may potentially reject your inserts after you've calculated the ID, any counting you're doing could easily get messed up. Even if you could identify when an insert had failed for each row, you'd need to know whether it was rejected because of a duplicate ID, or some other reason, and then adjust your count accordingly before optionally retrying the insert. In short, it's messy.

 

Even with the approach you've taken, if at some point in the future (I know you say that current this isn't the case) other records are being inserted into the table from elsewhere, duplicate IDs can still arise, due to the slight delay between your query returning ID + 1 and inserting the new record into the database.

 

Assuming you can't just use an Integer IDENTITY column, the only way to guarantee unique, sequential numbers for newly added records is an AFTER INSERT trigger on the database table, which will simply set your ID field to ID + 1 from a sub-query.

 

The plus with doing this of course, is that your Talend job becomes much simpler, and faster, as the processing can be done in batches, rather than line-by-line.

 

Regards,

 

 

Chris

Four Stars

Re: Keep track of inserts in a DB

Hi,

I see your point.

So the plan now, is to enforce the rule on the DB instead.

The problem is that many of this DBs are already in place,with many customisations. Anyway I will try, thanks.

 

Giuseppe

Six Stars

Re: Keep track of inserts in a DB

It's always tempting to try and do everything within a Talend job, but for performance and/or practical reasons, sometimes minor tweaks to the databases or systems we're connecting to make more sense, assuming this is an option.