Using cursor (tOracleInput)

One Star

Using cursor (tOracleInput)


Hi everybody,
I'm wondering several things about this option :
1. Is it the size of the array of rows returned by the "tOracleInput" query at one time, so that we can perform operations on several rows at once? (Is it an Oracle functionality?)
2. I've noticed that if I set the cursor to 1 row, I get lower performance. What's the default value?
3. There is no such option on file components. Does it mean TOS treats only one row at a time?

Many thanks in advance.
Best regards,

Nicolas
One Star

Re: Using cursor (tOracleInput)

Hi everyone,
No answer?
Yet this is something that concerns everyone.
It would be nice to get some help.
Regards,
Nicolas
One Star

Re: Using cursor (tOracleInput)

The Oracle cursor option PROBABLY makes use of Oracle cursors, which is why you won't find it on file components.
I don't know how it's implemented; the component documentation doesn't help much.
I think I'll evaluate the performance with different options and then choose the best one.
One Star

Re: Using cursor (tOracleInput)

Hi shine007,
I've been looking for more information since posting this message.
You're right, it is based on an Oracle feature (the fetch array size in the JDBC driver), so there is no equivalent for files. I don't know how it works for files for the moment.
Anyway, this is the number of rows you fetch at one time, which is why you get a nice improvement in the time spent loading lookups.
However, there is no such noticeable improvement in a data flow (as you still won't perform operations on more than one row at a time, I guess).
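For reference, here is a minimal JDBC sketch of what the cursor size corresponds to outside Talend; the connection URL, credentials and table name are placeholders, not anything from this thread:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FetchSizeDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details.
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@localhost:1521:ORCL", "user", "password");
        Statement stmt = conn.createStatement();

        // Rough equivalent of the tOracleInput "Cursor" setting: ask the
        // driver to transfer 1000 rows per round trip to the server.
        stmt.setFetchSize(1000);

        ResultSet rs = stmt.executeQuery("SELECT id, name FROM my_table");
        while (rs.next()) {
            // Rows are still consumed one by one here; only the network
            // transfer happens in batches of 1000.
            System.out.println(rs.getInt(1) + " " + rs.getString(2));
        }
        rs.close();
        stmt.close();
        conn.close();
    }
}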
Best regards,
Nicolas
One Star

Re: Using cursor (tOracleInput)

However, there is no such noticeable improvement in a data flow (as you still won't perform operations on more than one row at a time, I guess).
Nicolas

I tried removing the computations so that Talend only reads from the tables and... surprisingly, nothing changes. I mean that changing the cursor size doesn't affect the speed at which data is extracted very much.
We're using Oracle 9, and with some tables (especially lookup ones) we even reach thousands of rows per second (4,000-5,000), while with other tables (300k rows, 10 columns) it's impossible to go beyond 250 rows/sec.

I really can't understand the issue.
One Star

Re: Using cursor (tOracleInput)

Hi Shine,
The problem is that it only lets you load more data at the same time.
You should notice an improvement in your lookups (try high values: 2500, for instance), because a lookup only loads data into memory (I tried it locally and it ran almost twice as fast...). However, I admit a lookup is never the longest step in a job.
Tell me if I'm wrong, but in a main data flow you will perform operations row by row whatever the cursor size, which is why it won't affect the row speed very much.
If you just have a job like "tOracleInput" ---> "tOracleOutput", you may notice a little improvement, but not that much.
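If you want to reproduce that kind of comparison outside Talend, here is a rough, hypothetical sketch that reads the same table with two different fetch sizes and times the pure extraction (the connection string and table name are made up for the example):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CursorSizeBenchmark {

    // Reads the whole table with the given fetch size and prints the elapsed
    // time; no per-row work is done, so this measures pure extraction speed.
    static void readAll(Connection conn, int fetchSize) throws Exception {
        long start = System.currentTimeMillis();
        long rows = 0;
        Statement stmt = conn.createStatement();
        stmt.setFetchSize(fetchSize);
        ResultSet rs = stmt.executeQuery("SELECT * FROM big_table");
        while (rs.next()) {
            rows++;
        }
        rs.close();
        stmt.close();
        System.out.println("fetchSize=" + fetchSize + ": " + rows + " rows in "
                + (System.currentTimeMillis() - start) + " ms");
    }

    public static void main(String[] args) throws Exception {
        // Placeholder connection details.
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@localhost:1521:ORCL", "user", "password");
        readAll(conn, 10);    // small cursor: one round trip per 10 rows
        readAll(conn, 2500);  // large cursor: fewer round trips, more memory
        conn.close();
    }
}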
Best regards,
Nicolas
One Star

Re: Using cursor (tOracleInput)

Thank you for your quick reply!
Well, the answer is no. Like you, I thought the reason was in tMap and other computations (I use tUnique and tUnite, for example), but then I created another job with every computation removed, just to compare speeds across different tables while changing the cursor size and other options...
I had to admit that the very different read speeds (250 r/s vs. > 1000 r/s) mainly depend on the source table, and not on the read/cursor options.
The problem I can't understand is that smaller tables (fewer than 5 columns) seem to be slower: every time the flowmeter counters are updated (every second, I think) only 200 rows have been read, while on "bigger" tables with more columns, and even a second lookup table, about 1000 rows are read at each counter refresh...
I'm beginning to think it's not a Talend issue. But I'm not a DBA, so my Oracle tuning skills are limited...
One Star

Re: Using cursor (tOracleInput)

Hi again shine007,
You told me that the tables reaching high speeds are especially the lookup ones... I guess those are the only ones, aren't they?
It is normal that lookup tables take less time, because you do not perform any computation on them: you just load the data into memory.
For the others, you first load some data into memory and then you perform your computations (by computations I mean all the operations (mapping, Java/Perl methods) and output actions (delete, insert, update,...)).
So, with lookups you get the speed of data loading; in the other case, you get the speed of your computations (+ loading your data).
And the cursor only has an impact on loading the data.
Hope I'm clear and that's what you need.
And hope I'm not mistaken ;)
Best regards,
Nicolas
One Star

Re: Using cursor (tOracleInput)

You are very kind, thank you.

You're right: using the same table as a lookup is 10x faster, but that's not what I need to do... I must speed up the main flow, because the SAME processing gives me 1000 r/s on some tables but only 250 r/s on others... I have to understand why.
I'll let you know about my progress.
Thank you very much again.
Alessandro
One Star

Re: Using cursor (tOracleInput)

Well, I think I found the bottleneck.

In fact, I use a simple cache mechanism implemented inside my custom Java classes. It is used on fields with few possible values, i.e. to avoid executing a SELECT statement for each row. It is a pity I can't use it everywhere, as it causes thrashing...
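Roughly, the idea is something like this sketch (the class, table and column names here are made up for illustration, not the real ones): look the value up in an in-memory map first, and only hit the database the first time a given key is seen.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.HashMap;
import java.util.Map;

public class LabelLookup {
    private final Connection conn;
    private final Map<Integer, String> cache = new HashMap<Integer, String>();

    public LabelLookup(Connection conn) {
        this.conn = conn;
    }

    // Returns the label for a code, querying the database only the first time
    // a given code is seen; later rows are served from memory.
    public String findLabel(int code) throws Exception {
        if (!cache.containsKey(code)) {
            String label = null;
            PreparedStatement ps =
                    conn.prepareStatement("SELECT label FROM codes WHERE code = ?");
            ps.setInt(1, code);
            ResultSet rs = ps.executeQuery();
            if (rs.next()) {
                label = rs.getString(1);
            }
            rs.close();
            ps.close();
            // Cache misses too, so an unknown code only hits the database once.
            cache.put(code, label);
        }
        return cache.get(code);
    }
}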

IMHO it is difficult to get more than 1000 rows per second if, for each row, you have to query the database for several fields...
As for the main output tables, I use tOraclebulk. I think it's a good solution for having two different levels of logging; moreover, SQL*Loader is fast and rock-solid.
One Star

Re: Using cursor (tOracleInput)

Hi everybody,
3. There is no such option on file components. Does it mean TOS treats only one row at a time?
Nicolas

I think you should use a "iterate" link and activate parallel execution. I also think parallelism in iterate links is different from job parallel execution.
One Star

Re: Using cursor (tOracleInput)

Hi shine007,
In fact, I use a simple cache mechanism implemented inside my custom Java classes. It is used on fields with few possible values, i.e. to avoid executing a SELECT statement for each row. It is a pity I can't use it everywhere, as it causes thrashing...

What's this cache system?
Hi everybody,
3. There is no such option on file components. Does it mean TOS treats only one row at a time?
Nicolas

I think you should use a "iterate" link and activate parallel execution. I also think parallelism in iterate links is different from job parallel execution.
The iterate link is not usable everywhere. It's useful for performing one computation for each result returned by the action of a component. However, you cannot perform certain operations on a data flow with an iterate link.
But I think you meant at the top of my job. I'm experimenting with it, but I'm not getting much improvement for the moment... Trying to find out why.
Best regards,
Nicolas