
Commit Every... again.

Here is the discussion we had with plegall before. I would like to return to this issue and discuss it further.
To give you some background: I would like to add the "Commit Every..." option to the tXXXOutput components even when a shared connection is used. It may cause some problems, but I believe users are smart enough to understand that they should create a separate connection if they want to use the "commit every..." option.
Could you please review how the "commit every..." option appears on the tXXXOutput components? We only use an existing connection and still need the "commit every..." functionality in our Talend script. If we don't want to commit data on the existing connection, we'll just create a new one. I think it is really important to show "commit every..." on all tXXXOutput components. Maybe we could add a warning message that it may cause problems with the existing connection if that connection is used for multiple tables simultaneously.

Yes, yesterday while reviewing your forum posts and bugtracker requests, I thought about this issue. I was about to implement it (in tOracleCommit), but then I saw your last note about having the option back in tOracleOutput.
I admit I don't know what the right choice is. I find it weird to let tOracleOutput manage the commit while the transaction was opened in tOracleConnection. I have the following case in mind: you have 2 tOracleOutput components in 2 output branches of a tMap. If tOracleOutput_1 has "commit every" set to 1000, it will also commit the rows of tOracleOutput_2. On the other hand, putting a tOracleCommit ("commit every" set to 1000) right after tOracleOutput_1 doesn't make the design much cleaner.
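The effect plegall describes can be modelled without a database. Below is a minimal Java sketch of a hypothetical in-memory stand-in for one transaction (the class `SharedConnectionModel` and its methods are invented for illustration; they are not Talend or JDBC API). It assumes only the rule that a commit makes every pending row on the connection permanent, whichever component wrote it, so an interval set on tOracleOutput_1 also commits tOracleOutput_2's rows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of ONE transaction shared by two output components.
// Not Talend or JDBC code: it only mimics the rule that a commit makes every
// pending row on the connection permanent, whichever component wrote it.
public class SharedConnectionModel {
    private final List<String> pending = new ArrayList<>();   // written, not yet committed
    private final List<String> committed = new ArrayList<>(); // made permanent by a commit

    void insert(String row) {
        pending.add(row);
    }

    void commit() {
        committed.addAll(pending); // all pending rows, from every table/output
        pending.clear();
    }

    int pendingCount() {
        return pending.size();
    }

    long committedFrom(String prefix) {
        return committed.stream().filter(r -> r.startsWith(prefix)).count();
    }

    // A tMap sends each input row to both branches; only tOracleOutput_1
    // ("out1") has a "commit every" interval.
    static SharedConnectionModel simulate(int rows, int commitEvery) {
        SharedConnectionModel conn = new SharedConnectionModel();
        int out1Count = 0;
        for (int i = 1; i <= rows; i++) {
            conn.insert("out1:" + i);
            conn.insert("out2:" + i); // tOracleOutput_2: no interval of its own
            out1Count++;
            if (out1Count % commitEvery == 0) {
                conn.commit(); // commits out2's rows as well
            }
        }
        return conn;
    }

    public static void main(String[] args) {
        SharedConnectionModel conn = simulate(1500, 1000);
        System.out.println("out2 rows committed by out1's interval: " + conn.committedFrom("out2:"));
        System.out.println("rows still pending: " + conn.pendingCount());
    }
}
```

With 1500 input rows and an interval of 1000, the first 1000 rows of tOracleOutput_2 get committed by tOracleOutput_1's interval, and 500 rows from each branch stay pending until a final commit.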

I think I understand your concern - and I feel your pain.
But think of a Talend program as a regular Perl script: you can easily use 1 connection in Perl, run 2 queries, and commit at an arbitrary point. You just never do it, because you understand what you are doing. If something is well documented and the user is aware of what is going on, then it should be OK. And this way you get more flexibility in how you construct your ETL. Maybe we can add a warning if somebody decides to turn this option on?
The other thing is that I don't really like the tCommit component. I think everything should be done within the tXXXOutput. There are several reasons why I don't like it: first, I always forget to put the commit after the output, and that is sometimes very confusing; second, why do something in 2 places when you can do everything in one?
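The warning suggested here (and in the first post) could take the form of a design-time check. A minimal sketch, assuming a hypothetical `OutputConfig` description of each output's settings (the class, field, and method names are illustrative, not Talend's): it flags every output that has "commit every" set on a connection that other outputs also use.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical design-time check; names are illustrative, not Talend API.
public class CommitEveryWarning {

    // Minimal, assumed description of one output component's settings.
    static final class OutputConfig {
        final String component;  // e.g. "tOracleOutput_1"
        final String connection; // shared connection name, or null for a private one
        final int commitEvery;   // 0 means "commit every" is not set

        OutputConfig(String component, String connection, int commitEvery) {
            this.component = component;
            this.connection = connection;
            this.commitEvery = commitEvery;
        }
    }

    static List<String> warnings(List<OutputConfig> outputs) {
        // Count how many outputs use each shared connection.
        Map<String, Integer> usage = new HashMap<>();
        for (OutputConfig o : outputs) {
            if (o.connection != null) {
                usage.merge(o.connection, 1, Integer::sum);
            }
        }
        // Flag outputs whose interval would also commit other outputs' rows.
        List<String> result = new ArrayList<>();
        for (OutputConfig o : outputs) {
            if (o.connection != null && o.commitEvery > 0 && usage.get(o.connection) > 1) {
                result.add(o.component + ": \"commit every " + o.commitEvery
                        + "\" also commits rows written by other components on " + o.connection);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<OutputConfig> job = List.of(
                new OutputConfig("tOracleOutput_1", "tOracleConnection_1", 1000),
                new OutputConfig("tOracleOutput_2", "tOracleConnection_1", 0),
                new OutputConfig("tOracleOutput_3", null, 500));
        warnings(job).forEach(System.out::println);
    }
}
```

For the job in `main`, only tOracleOutput_1 is flagged; tOracleOutput_3 uses its own connection, so its interval affects only its own rows.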
So, a question to the community: any thoughts? Ideas? I would appreciate your input.

Re: Commit Every... again.

Anybody? Ideas? Thoughts?
This is an important conversation. We can probably make our lives (mine for sure) much easier if we convince Talend that this functionality is good for us :) Is anybody interested? I can give a more detailed explanation of my point.

Re: Commit Every... again.

Hi Timson,
I'm also having trouble with commits. I have one process that reads client data from one table; it gets split into five separate sub-streams, and each of these sub-streams needs to write data to the same table. If I use one connection for all database actions, I can only put the commit at the end of the process. Given that the process generates roughly 5 million records, the database temp tablespace overflows, so I would like to set a commit interval for my common connection, but this apparently isn't possible. The workaround is to have 11 database connections, each with its own commit interval. Since some branches return more rows than others, I'm still having problems with my tablespaces.
I would like to be able to set a commit interval when I open a connection. Each tXXXOutput that uses that connection should check whether its counter has reached the interval and, if so, perform a commit. I don't mind having to perform an extra commit at the end of a job using the post-job component.
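A minimal sketch of this proposal, under stated assumptions: the class `ConnectionCommitInterval` and its methods are hypothetical, and the commit is a plain `Runnable` so the example runs without a database (in real JDBC code it would wrap `java.sql.Connection.commit()`, which throws `SQLException`). The interval is set once on the connection, each output keeps its own counter, and whichever output reaches the interval commits the whole connection.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the commit interval lives on the shared connection,
// each output keeps its own row counter, and whichever output reaches the
// interval commits the whole connection. Class and method names are illustrative.
public class ConnectionCommitInterval {
    private final int commitEvery;
    // Stand-in for the real commit; in JDBC code this would wrap
    // java.sql.Connection.commit(), which throws SQLException.
    private final Runnable commitAction;
    private final Map<String, Integer> counters = new HashMap<>(); // rows since last commit, per output

    ConnectionCommitInterval(int commitEvery, Runnable commitAction) {
        if (commitEvery <= 0) {
            throw new IllegalArgumentException("commitEvery must be positive");
        }
        this.commitEvery = commitEvery;
        this.commitAction = commitAction;
    }

    // Called by an output component after it writes one row.
    void rowWritten(String output) {
        int n = counters.merge(output, 1, Integer::sum);
        if (n >= commitEvery) {
            commit();
        }
    }

    // Commits everything pending on the connection; also usable as the final
    // commit at the end of the job (e.g. from a post-job step).
    void commit() {
        commitAction.run();
        counters.clear(); // rows from every output are now committed
    }

    public static void main(String[] args) {
        int[] commits = {0};
        ConnectionCommitInterval conn = new ConnectionCommitInterval(1000, () -> commits[0]++);
        for (int i = 0; i < 1500; i++) {
            conn.rowWritten("tOracleOutput_1");
            if (i % 3 == 0) {
                conn.rowWritten("tOracleOutput_2"); // a smaller branch
            }
        }
        conn.commit(); // final commit at the end of the job
        System.out.println("commits issued: " + commits[0]);
    }
}
```

Resetting every counter after any commit reflects the fact that the commit covers all outputs' rows. With per-output counters, up to (number of outputs) × (interval − 1) rows can still be pending at once; a single counter shared by all outputs would cap that at the interval itself, which may matter for the temp tablespace problem described above.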

Re: Commit Every... again.

Anything new on this important subject?
I have the same problem: I have one tMap that feeds two Oracle outputs. These outputs must be written in the same connection, due to constraints between these tables.
But I process a lot of rows, so I get an Oracle temp tablespace error.
An interval commit on the connection would be a great solution to this problem.
In the case of a common connection, I think the "commit every ..." option should be in the tConnection component. All tOutput components using this connection would then have the same "commit every ...".
The tCommit component is useful for closing the connection.