tOracleTableTransfer different parameters

One Star

tOracleTableTransfer different parameters

Hi,
I'm using tOracleTableTransfer to load ~90 big tables and noticed different speeds for 2 tables with similar row counts; it turned out the one that took longer was simply a bigger table!
I tried playing with the "Log interval in seconds", "Source select fetch size" and "Insert Batch size" parameters, hoping for the best results, but it still takes a very long time for my tables to load (5 hours for about 90 tables with an average of 50K rows per table)!
Does anyone have more in-depth knowledge of how this custom-made component works?
I even divided my table list in two, relatively large and small tables, and set the "Source select fetch size" and "Insert Batch size" parameters to 50,000 for the small ones and 500,000 for the large ones, but it didn't help with the speed.
Any other ideas on how to speed up the load and improve performance?
Any help is appreciated!
Moderator

Re: tOracleTableTransfer different parameters

Hi,
tOracleTableTransfer is a custom component written by a Talend community user and shared on the Talend Exchange portal.
Please feel free to contact the author, jlolling, directly.
Best regards
Sabrina
Seventeen Stars

Re: tOracleTableTransfer different parameters

This component is a bit outdated now. I have worked a lot on its counterparts for MySQL and PostgreSQL. I will update the Oracle version very soon and keep you posted.
One Star

Re: tOracleTableTransfer different parameters

Thank you xdshi!
Thank you jlolling, and I'm very much looking forward to it!
Thank you very much!
Seventeen Stars

Re: tOracleTableTransfer different parameters

I have updated the tOracleTableTransfer component now.
Because Talend Exchange is extremely slow in approving updates, I have provided this component on my personal web space:
http://jan-lolling.de/talend/components/tOracleTableTransfer.zip
Could you please check this version?
One word about large tables: the transfer can slow down as a table grows. This is an effect caused by the database itself. Check whether you have any indexes; they are updated as well and can cause serious performance issues while writing the data.
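To see which indexes would be maintained during the load, a small JDBC check like this can help (just a sketch; the connection URL, credentials, schema and table names are placeholders):

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class ListIndexes {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details - adjust to your environment.
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "password");
        DatabaseMetaData meta = conn.getMetaData();
        // Lists unique and non-unique indexes on the target table.
        ResultSet rs = meta.getIndexInfo(null, "MYSCHEMA", "MYTABLE", false, false);
        while (rs.next()) {
            System.out.println(rs.getString("INDEX_NAME")
                    + " on column " + rs.getString("COLUMN_NAME"));
        }
        rs.close();
        conn.close();
    }
}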
One Star

Re: tOracleTableTransfer different parameters

Thank you so much for the quick response!
I'm running my job right now with the updated version of the component. However, I do have some questions regarding the component:
1. I now see the following log output being generated by the component; I was wondering if there is a way to suppress it for some jobs?

2. This component commits the data even if I don't put a commit at the end. So I can't have "all or nothing" behavior if I'm loading multiple tables within 1 job?
3. Is it correct to assume that the larger the table, the larger the fetch size should be for higher performance, and the smaller the table, the smaller the fetch size?
4. Does "Log interval in seconds" have any impact on performance? I've tried 0, 1 and 5, and 0 gave me better performance, but I'm not sure if there is any correlation!
5. I used the new version of this component and ran the same amount of data and number of tables; with the old version it took 10 minutes, with the new one 19 minutes. Is it something I'm not setting up correctly that explains this difference?
My job logic is to have 1 tOracleTableTransfer component; I loop through my table list and plug each table into the component to load the data from source to target. Is there something wrong with doing that in terms of speed? Of course my tables are not all the same size, but all of them are over 40K records. Also, the data in my source is growing, but I have a where clause that only grabs data that is not changing (not getting any updates), and my target is not changing, so basically I'm just loading data into a repository-like database once a day. My source has primary keys, foreign keys and indexes, but my target has only primary keys with no foreign keys.
Thank you again!
Seventeen Stars

Re: tOracleTableTransfer different parameters

OK, the speed actually should not become slower. I am surprised that this happens in your job.
1. With the setting log interval in seconds = 0 you can switch off the logging.
I will change the logging to log4j. In the past Talend had no support for log4j; now it is the default.
2. Yes, the component commits the data (after every batch). I can add an option to switch off the commit and let you do it yourself.
3. The fetch size and the batch size depend heavily on the database and the table structure. The fetch size works the same way as the cursor size (which is actually the fetch size) in tOracleInput, and the batch size is exactly the same as the batch size in the normal tOracleOutput components. You should play with the parameters and find the best values. If you set the batch size too high, you will fill too much space in the Oracle SGA, and this can lead to reduced performance. I can tell you, much more than 10,000 is useful only in seldom situations. (See the sketch after this list.)
4. Writing to the standard output can cause performance issues. Switching it off by setting the log interval to 0 is a good choice.
5. In my projects the update had no negative impact on performance. Take care: every run of a table transfer will use a new database connection (never reuse a connection for parallel runs!).
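To make points 3 and 5 a bit more concrete, here is a rough JDBC sketch of what the two parameters control (only an illustration, not the component's actual code; connection details and table names are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class FetchBatchSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connections - adjust URLs and credentials.
        Connection source = DriverManager.getConnection(
                "jdbc:oracle:thin:@//sourcehost:1521/SRC", "user", "password");
        Connection target = DriverManager.getConnection(
                "jdbc:oracle:thin:@//targethost:1521/TGT", "user", "password");
        target.setAutoCommit(false);

        // "Source select fetch size": rows the driver pulls per network round trip.
        Statement select = source.createStatement();
        select.setFetchSize(10000);
        ResultSet rs = select.executeQuery("select ID, NAME from SRC_TABLE");

        // "Insert Batch size": inserts sent to the target database in one call.
        PreparedStatement insert = target.prepareStatement(
                "insert into TGT_TABLE (ID, NAME) values (?, ?)");
        final int batchSize = 10000;
        int count = 0;
        while (rs.next()) {
            insert.setLong(1, rs.getLong(1));
            insert.setString(2, rs.getString(2));
            insert.addBatch();
            if (++count % batchSize == 0) {
                insert.executeBatch(); // one network round trip for the whole batch
                target.commit();       // the component commits after every batch
            }
        }
        insert.executeBatch(); // flush the remainder
        target.commit();
        rs.close();
        source.close();
        target.close();
    }
}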
You can usually increase the performance drastically if you disable the primary key beforehand:
ALTER TABLE table_name
DISABLE CONSTRAINT constraint_name;
-- and re-enable it after the load has finished:
ALTER TABLE table_name
ENABLE CONSTRAINT constraint_name;


One hint: this new release is not limited to tOracleConnection as the source. You can also use PostgreSQL or any other database as the source to feed your Oracle database this way.
One Star

Re: tOracleTableTransfer different parameters

Again thank you so much jlolling!
I have set the log interval in seconds to 0 but I'm still getting log output in my Talend output console!
Adding the optional commit would be an awesome feature to have!
Regarding the batch size, you mentioned more than 10,000 is rarely useful, but the default there is 50,000; is this correct? So it's better to change it to less than 10K for better performance in my job?
Not being limited to Oracle as the data source is a big breakthrough for our projects. We will incorporate it soon.
Thank you sooooo much!
Seventeen Stars

Re: tOracleTableTransfer different parameters

Regarding the log output: I will change all of it to the DEBUG level, so you have the INFO level free for any business information.
Regarding the batch size: please keep in mind, it is about increasing network efficiency through better-filled TCP packets, and it is a matter of the database using a kind of bulk load facility.
If you want to find out the maximum (too large a batch size can lead to a decrease in performance), try it out for yourself. Let it run over some time and check the overall performance.
Using other database types as the source can sometimes be a bit difficult because of the different data types. Because of that, I have added the option to map database types to Java types. The Java type will be used to get the data from the source and to send the data to the target. (See the sketch below.)
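Conceptually the mapping looks something like this (a hypothetical sketch of the idea, not the component's actual implementation):

import java.sql.Timestamp;
import java.util.HashMap;
import java.util.Map;

public class TypeMappingSketch {
    public static void main(String[] args) {
        // Map source database type names to the Java classes used as transport types.
        Map<String, Class<?>> typeMapping = new HashMap<>();
        typeMapping.put("NVARCHAR", String.class);
        typeMapping.put("DATETIME", Timestamp.class);
        typeMapping.put("BIT", Boolean.class);
        // The transfer reads each value from the source result set, coerces it
        // to the mapped class, and binds it on the target insert statement.
        System.out.println(typeMapping.get("DATETIME").getSimpleName()); // Timestamp
    }
}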
But please keep in mind, a transaction should not be too big; there are usually limitations on the size of a transaction. Check that with your database administrator.
It will take a bit - I guess until Monday - to change the component. Stay tuned.
Seventeen Stars

Re: tOracleTableTransfer different parameters

I have finished my work on the tOracleTableTransfer component. If you want, you can try it:
http://jan-lolling.de/talend/components/tOracleTableTransfer.zip
In the advanced settings you will find the option "Use a separate tOracleCommit component...", which prevents the component from committing on its own. Please use this option carefully, because transactions usually have a maximum size.
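In plain JDBC terms, the option corresponds to something like this (a sketch, not the component's code; connection details and the table are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class SingleCommitSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details.
        Connection target = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "password");
        target.setAutoCommit(false); // nothing is committed until we say so
        PreparedStatement insert = target.prepareStatement(
                "insert into TGT_TABLE (ID) values (?)");
        try {
            for (int i = 1; i <= 100000; i++) {
                insert.setInt(1, i);
                insert.addBatch();
                if (i % 10000 == 0) {
                    insert.executeBatch(); // sends the rows but does NOT commit
                }
            }
            insert.executeBatch();
            target.commit();   // "all": a single commit at the very end
        } catch (Exception e) {
            target.rollback(); // "nothing": undo everything on failure
            throw e;
        } finally {
            target.close();
        }
    }
}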
Cheers
Jan
One Star

Re: tOracleTableTransfer different parameters

Thank you SOOOOOO MUCH jlolling again!
The log is VERY clean and neat, and having the option for committing is a big plus for this project of ours.
I will be testing the component more and monitoring the performance with/without the commit, and will report back on which works better for us.
BTW, I changed my "Source select fetch size" to 25K and "Insert Batch size" to 10K and I see better performance. I will keep playing with those numbers to see what works best!
One other good thing I noticed about this component is that it does not truncate the 6 fractional digits in timestamps (even though Java dates are limited to milliseconds and would truncate the last 3 digits).
Thanks!
Seventeen Stars

Re: tOracleTableTransfer different parameters

You are welcome!
Thanks for keeping me updated.
One Star

Re: tOracleTableTransfer different parameters

jlolling, I'm testing the latest component version to check the performance. What I have been doing is giving the component different values for "Source select fetch size" and "Insert Batch size".
One thing I noticed is that if I run the component multiple times with the same values for "Source select fetch size" and "Insert Batch size" (using the same table and the same records), I get different performance. My table has 21 columns and 46,000 records; one time it took 3,510 milliseconds, and the next time 6,208 milliseconds.
On average the best speed I got for this table was with "Source select fetch size" = 50,000 and "Insert Batch size" = 50,000, but it varies each time I try.
Is there something I'm doing wrong, or a step that I'm missing? I even run the garbage collector between runs!
One Star

Re: tOracleTableTransfer different parameters

Here is the screenshot of my job and my settings.

Seventeen Stars

Re: tOracleTableTransfer different parameters

The time differences are not caused by the component but by caching or load situations in the database. The code of the component always does the work in the same way; there is no possible root cause in it for different durations.
The garbage collection is actually not necessary, because there is a huge amount of time (from the perspective of the JVM) between the runs and the GC has already done whatever is necessary.
I suggest you check the mode (options: "Backup in file" + "Only backup in file, no inserts into target table") in which you only write the file and not the table, and check whether this read-only run also varies in duration and how long it takes.
You are causing a huge transaction size, and this requires a lot of space in the SGA.
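You can also isolate the read side with a few lines of plain JDBC (a sketch; connection details and the query are placeholders) and see whether the duration already varies there:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ReadTimingSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details.
        Connection source = DriverManager.getConnection(
                "jdbc:oracle:thin:@//sourcehost:1521/SRC", "user", "password");
        Statement stmt = source.createStatement();
        stmt.setFetchSize(50000);
        long start = System.currentTimeMillis();
        ResultSet rs = stmt.executeQuery("select * from MYSCHEMA.MYTABLE");
        int rows = 0;
        while (rs.next()) {
            rows++; // only iterate; no writes, so the time reflects the read side
        }
        System.out.println(rows + " rows read in "
                + (System.currentTimeMillis() - start) + " ms");
        rs.close();
        source.close();
    }
}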
One Star

Re: tOracleTableTransfer different parameters

I applied your suggestion to check the backup-in-file mode without inserting into the target table, and ran the job a few times for the same table with the same settings. These are the durations in milliseconds:
25693,2574,2558,2480,3120,2808,14118,3417,14758,3307,2479,2583
If it only depends on the DB and not the component, then I think we are good. It was just an observation I had, and I wanted to make sure I'm using this component correctly!
Thanks!
One Star

Re: tOracleTableTransfer different parameters

We also tested the component with MSSQL as the source and were not able to run the job. This is the error we got:
: sqlrunner.datamodel.SQLDataModel - loadSchemas (schemas) for catalog: Infra failed: The server principal "username" is not able to access the database "Infra" under the current security context.
ERROR: Write failed in line number 35 message:Invalid column type
java.sql.SQLException: Invalid column type
.....
ERROR: Read has been failed. Stop write in table.
tOracleTableTransfer_1 read:23641 inserted:35
disconnected
Nothing was written into the DB, even though the log says 35 inserted.
We tested the connection separately using tMSSQLInput, and it connects and returns results just fine, but when we use tOracleTableTransfer we get the above error.
We tried using the "Self defined source query" and, separately, the "source table where clause"; both cases failed with that error.
SELECT  
MRN,
col1,
col2,
col3,
col4
FROM SchemaName.TableNameMSSQL
where SchemaName.TableNameMSSQL.col2 is not null
One Star

Re: tOracleTableTransfer different parameters

Any update on the source connection issue?
Also, I was wondering if instead of a DB connection for the source we could use a flat file? I believe the process does that already internally, but there is no option to select a file as the source data.
Thanks!
Seventeen Stars

Re: tOracleTableTransfer different parameters

Could you please switch on the debug mode?
I agree with you, the log output is confusing and needs a review.
One Star

Re: tOracleTableTransfer different parameters

jLolling,
This is the error that we get. One note: LAST_SERVICE_DATE is of type date, and there is no way to force tOracleTableTransfer to recognize it as a date; it does not read the schema.
We believe the date type is being treated/processed as a varchar type!
Starting job tOracleTabletransfer at 11:32 10/11/2016.
 
connecting to socket on port 3817
connected
: sqlrunner.datamodel.SQLDataModel - loadSchemas (schemas) for catalog: Infra failed: The server principal "sqlaaccusr.prompt.pr" is not able to access the database "Infra" under the current security context.
DEBUG: createSourceSelectStatement SQL:
SELECT
mrn as MEMBER_ID,
Carepath_Code,
Carepath_Display_Name as Carepath_Name,
Last_Service_Date,
active
FROM  AACC.vwGetCarepaths
where AACC.vwGetCarepaths.Status_Message_Patient is not null and MRN is not null
 
DEBUG: No MySQL class loaded.
DEBUG: createTargetInsertStatement SQL:insert into CD_ETLSB1.T_PHP_IMMUNIZATION (MEMBER_ID,CAREPATH_CODE,CAREPATH_NAME,LAST_SERVICE_DATE,LAST_SERVICE_DATE_STRING,NEXT_DUE_DATE,NEXT_DUE_DATE_STRING,STATUS_MESSAGE_1,STATUS_MESSAGE_2,STATUS_MESSAGE_BADGE,ACTIVE,STATUS,UPDATE_TS,CREATE_TS)
values(?,?,?,?,?,?,?,?,?,?,?,?,?,?)
Starting....
DEBUG: Start writing data into target table CD_ETLSB1.T_PHP_IMMUNIZATION
DEBUG: Start fetch data from source query
SELECT
mrn as MEMBER_ID,
Carepath_Code,
Carepath_Display_Name as Carepath_Name,
Last_Service_Date,
active
FROM  AACC.vwGetCarepaths
where AACC.vwGetCarepaths.Status_Message_Patient is not null and MRN is not null
DEBUG: Analyse result set...
DEBUG: Name: member_id,  Type: CHAR
DEBUG: Name: carepath_code,  Type: VARCHAR
DEBUG: Name: carepath_name,  Type: VARCHAR
DEBUG: Name: last_service_date,  Type: NVARCHAR
DEBUG: Name: active,  Type: BIT
DEBUG: Start fetching data...
DEBUG: Output class mapping: #1 (MEMBER_ID) use: String
DEBUG: Output class mapping: #2 (CAREPATH_CODE) use: String
DEBUG: Output class mapping: #3 (CAREPATH_NAME) use: String
DEBUG: Output class mapping: #4 (LAST_SERVICE_DATE) use: String
DEBUG: Output class mapping: #11 (ACTIVE) use: Boolean
DEBUG: Write execute insert batch
ERROR: Write failed in line number 10000 message:ORA-01861: literal does not match format string
 
java.sql.BatchUpdateException: ORA-01861: literal does not match format string
 
      at oracle.jdbc.driver.OraclePreparedStatement.executeBatch(OraclePreparedStatement.java:11190)
      at oracle.jdbc.driver.OracleStatementWrapper.executeBatch(OracleStatementWrapper.java:244)
      at de.jlo.talendcomp.tabletransfer.TableTransfer.writeTable(TableTransfer.java:482)
      at de.jlo.talendcomp.tabletransfer.TableTransfer.access$100(TableTransfer.java:58)
      at de.jlo.talendcomp.tabletransfer.TableTransfer$2.run(TableTransfer.java:223)
ERROR: Next exception:ORA-01861: literal does not match format string
 
java.sql.SQLDataException: ORA-01861: literal does not match format string
 
      at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:450)
      at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:399)
      at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:1017)
      at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:655)
      at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:249)
      at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:566)
      at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:215)
      at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:58)
      at oracle.jdbc.driver.T4CPreparedStatement.executeForRows(T4CPreparedStatement.java:943)
      at oracle.jdbc.driver.OraclePreparedStatement.executeForRowsWithTimeout(OraclePreparedStatement.java:10932)
      at oracle.jdbc.driver.OraclePreparedStatement.executeBatch(OraclePreparedStatement.java:11100)
      at oracle.jdbc.driver.OracleStatementWrapper.executeBatch(OracleStatementWrapper.java:244)
      at de.jlo.talendcomp.tabletransfer.TableTransfer.writeTable(TableTransfer.java:482)
      at de.jlo.talendcomp.tabletransfer.TableTransfer.access$100(TableTransfer.java:58)
      at de.jlo.talendcomp.tabletransfer.TableTransfer$2.run(TableTransfer.java:223)
ERROR: Read has been failed. Stop write in table.
DEBUG: Finished write data into target table schemaName.tableName, count inserts:10000
DEBUG: Write table finished.
tOracleTableTransfer_2 read:20315 inserted:10000
DEBUG: Close source connection...
Job tOracleTabletransfer ended at 11:34 10/11/2016.
Seventeen Stars

Re: tOracleTableTransfer different parameters

I think it has to do with the fact that String is used as the transport data type for the column LAST_SERVICE_DATE.
A String must match specific patterns to be accepted as a date value by Oracle.
Why are you not able to use Date as the type? What happens if you set the data type for this column to Date?
What is the original data type of this column in the source database?
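As a workaround, you could cast the value to a real datetime in the self-defined source query, so the driver reports a temporal type instead of NVARCHAR (a sketch; the view and column names are taken from your log, and the CONVERT on the MSSQL side is an assumption):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Timestamp;

public class DateTransportSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder MSSQL connection; view and column names from the log above.
        Connection source = DriverManager.getConnection(
                "jdbc:sqlserver://mssqlhost:1433;databaseName=Infra",
                "user", "password");
        // CONVERT forces the NVARCHAR column to a real datetime on the MSSQL side,
        // so the JDBC driver delivers java.sql.Timestamp instead of String.
        ResultSet rs = source.createStatement().executeQuery(
                "SELECT CONVERT(datetime, Last_Service_Date) AS Last_Service_Date"
                + " FROM AACC.vwGetCarepaths");
        while (rs.next()) {
            Timestamp ts = rs.getTimestamp("Last_Service_Date"); // temporal, not text
            System.out.println(ts);
        }
        rs.close();
        source.close();
    }
}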
One Star

Re: tOracleTableTransfer different parameters

We did 2 rounds of tests.
1. We didn't define any data type in the component and used the default settings, where no data type is defined. (Can we define the column type explicitly in this component?)
2. In the advanced settings we mapped "Date" to "timestamp" under custom type mapping.
In both cases we got the same error.
The data type for this column in both the source and the target is Date.