Issue Clarification on TOS vs (TIS / TDQ)

One Star

Good day,
I am writing to understand more about the Talend products mentioned above. I ask mainly because the webinars I've attended did not use the open source edition; they were on TIS / TDQ. Can TOS achieve the same transfer rate in both heterogeneous and homogeneous environments?
We experienced slow transfer rates using JasperETL Pro for large datasets, i.e. more than 2 million records. See below:

Heterogeneous environment benchmark testing (oracle to mysql)
Total number of rows: 1095934
Rows per second: 617.03
Average time in minutes: 29.6

Total number of rows: 2025100
Rows per second: 159.12
Average time in minutes: 212.11 (3.53 hours)

Homogeneous environment benchmark testing (mysql to mysql)
Total number of rows: 1095934
Rows per second: 4345.24
Average time in minutes: 4.20

Total number of rows: 2020000
Rows per second: 595.88
Average time in minutes: 56.49
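
For reference, the times above follow from time = rows / rate. A minimal Java sketch that reproduces the reported minutes from the row counts and rates; the class and method names are illustrative only and are not part of Talend or JasperETL:

```java
public class TransferTime {
    /** Minutes needed to move `rows` rows at a steady rate of `rowsPerSecond`. */
    public static double minutes(long rows, double rowsPerSecond) {
        return rows / rowsPerSecond / 60.0;
    }

    public static void main(String[] args) {
        // Row counts and rates copied from the four benchmark runs above.
        String[] label = {"oracle->mysql", "oracle->mysql", "mysql->mysql", "mysql->mysql"};
        long[] rows = {1095934L, 2025100L, 1095934L, 2020000L};
        double[] rate = {617.03, 159.12, 4345.24, 595.88};
        for (int i = 0; i < rows.length; i++) {
            System.out.printf("%-14s %8d rows at %8.2f rows/s = %7.2f min%n",
                    label[i], rows[i], rate[i], minutes(rows[i], rate[i]));
        }
    }
}
```

Note that in both environments the rate itself drops by roughly a factor of four to seven when the row count doubles, so the slowdown is not simply proportional to volume.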

Does Talend have benchmark test results to share with us? I am afraid we may have missed one or more settings, which results in poor performance.
I seek clarification on this to clear our doubts. Furthermore, would it be possible for us to get an evaluation copy of TIS and TDQ?
Your input would be highly appreciated and would help guide our direction.

Regards,
Yoke Yew
Employee

Re: Issue Clarification on TOS vs (TIS / TDQ)

Hi,
We experienced slow transfer rates using JasperETL Pro for large datasets, i.e. more than 2 million records.

Do you use the bulk capabilities? Do you handle some transformations in memory? Can you post a screenshot of your job?
Does Talend have benchmark test results to share with us? I am afraid we may have missed one or more settings, which results in poor performance.

You can find some benchmarks on the internet. For example, see http://www.manapps.tm.fr/manapps/images/documents/ETLBenchmarks_Manapps-090203.pdf

Furthermore, would it be possible for us to get an evaluation copy of TIS and TDQ?

Have you been in contact with a salesperson since your last post?
HTH,
-cedric
One Star

Re: Issue Clarification on TOS vs (TIS / TDQ)

Hi Cedric Carbone,
Thanks for your reply. Please find attached a screenshot of the job prepared for benchmark testing.
Yes. For the source table we selected "enable stream". For the target table we selected "extend insert". We are using JasperETL Professional, which is the same as Talend TOS + AMC.
Regards,
Yoke Yew
Employee

Re: Issue Clarification on TOS vs (TIS / TDQ)

Hi,
Your performance is very bad.
What about the "commit every x rows" field? Can you try with a bulk load?
Are the 2 DBs on the same server? If not, what kind of network do you use?
One Star

Re: Issue Clarification on TOS vs (TIS / TDQ)

Hi Cedric Carbone,
There are 2 environments mentioned. The homogeneous test (mysql -> mysql) runs on a single machine.
The heterogeneous test (oracle -> mysql) runs on 2 different servers. We are using a 100 Mbps LAN. Please comment on this further.
ToDo:
1) We will provide you with another result for the "commit every x rows" field.
2) Regarding bulk load, is it "t000BulkExec" where 000 = mysql, oracle...? I understand that it loads data from a file, but our design loads data from one RDBMS to another RDBMS.
Regards,
Yoke Yew
Employee

Re: Issue Clarification on TOS vs (TIS / TDQ)

2) Regarding bulk load, is it "t000BulkExec" where 000 = mysql, oracle...? I understand that it loads data from a file, but our design loads data from one RDBMS to another RDBMS.

Try t000OutputBulkExec
One Star

Re: Issue Clarification on TOS vs (TIS / TDQ)

Hi Cedric Carbone,
Regarding the items discussed above, please find the detailed results in the diagram below.
1) We are testing on 2 different servers (Oracle -> Mysql) using a 100 Mbps LAN. The new setting is commit every 1 row.
2) We are not using Load Data Infile, so we did not test t000OutputBulkExec. In most situations the datasets we transfer involve 2 different RDBMS servers (source, target).
Please advise further.
Employee

Re: Issue Clarification on TOS vs (TIS / TDQ)

1) We are testing on 2 different servers (Oracle -> Mysql) using a 100 Mbps LAN. The new setting is commit every 1 row.

Can you set the "commit every X rows" field to 50000 or 100000?
2) We are not using Load Data Infile, so we did not test t000OutputBulkExec. In most situations the datasets we transfer involve 2 different RDBMS servers (source, target).

I don't understand your answer. Our tMySQLOutputBulkExec runs well even if you are on 2 different RDBMS servers. Have you tried this method? I think it's the best method for performance.
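
To illustrate why the "commit every X rows" value matters, here is a minimal plain-JDBC sketch. It is only an illustration under assumptions: it is not the code Talend generates, the class and method names are made up, and the two-column copy is hypothetical. Committing every 1 row means one transaction per row, while a larger interval groups many inserts into one batch and one commit.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class CommitEvery {
    /** Number of commits issued for `rows` rows when committing every `commitEvery` rows. */
    public static long commitCount(long rows, int commitEvery) {
        if (commitEvery <= 0) throw new IllegalArgumentException("commitEvery must be positive");
        return (rows + commitEvery - 1) / commitEvery; // ceiling division: the last batch may be partial
    }

    /** Copies a (hypothetical) two-column result set to the target, committing every `commitEvery` rows. */
    public static long copy(ResultSet source, Connection target, String insertSql, int commitEvery)
            throws SQLException {
        if (commitEvery <= 0) throw new IllegalArgumentException("commitEvery must be positive");
        target.setAutoCommit(false); // otherwise every statement commits on its own
        long copied = 0;
        try (PreparedStatement insert = target.prepareStatement(insertSql)) {
            while (source.next()) {
                insert.setObject(1, source.getObject(1));
                insert.setObject(2, source.getObject(2));
                insert.addBatch();
                if (++copied % commitEvery == 0) {
                    insert.executeBatch(); // send the accumulated rows
                    target.commit();
                }
            }
            if (copied % commitEvery != 0) { // flush the final partial batch
                insert.executeBatch();
                target.commit();
            }
        }
        return copied;
    }

    public static void main(String[] args) {
        long rows = 2025100L; // row count from the heterogeneous benchmark
        for (int every : new int[] {1, 10000, 50000, 100000}) {
            System.out.println("commit every " + every + " rows: " + commitCount(rows, every) + " commits");
        }
    }
}
```

A larger interval reduces the number of round trips and commits but makes each transaction bigger, so the best value depends on the target database and is worth measuring rather than assuming.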
One Star

Re: Issue Clarification on TOS vs (TIS / TDQ)

Hi Cedric Carbone,
Regarding the items discussed above:
1) We tried setting the "commit every x rows" field to 50,000, and it is slower than the default value of 10,000 (see the first result prepared above).
2) tMySQLOutputBulkExec only allows output to a CSV file. We are unable to output to the MySQL DB. Could you provide a sample for us to refer to?
Please advise further.
Regards,
Yoke Yew
Community Manager

Re: Issue Clarification on TOS vs (TIS / TDQ)

Hello
2) tMySQLOutputBulkExec only allows output to a CSV file. We are unable to output to the MySQL DB. Could you provide a sample for us to refer to?

tMySQLOutputBulkExec applies the 'bulk insert' method when loading records into the DB; you can read more about it in the user documentation. Here is a simple demo; see my screenshot.
Best regards

shong
----------------------------------------------------------
Talend | Data Agility for Modern Business
One Star

Re: Issue Clarification on TOS vs (TIS / TDQ)

Hi Shong,
As I replied earlier, our situation involves 2 RDBMSs, which means the source and target are always databases (either the same or a different RDBMS). Can you show me a component that can transfer datasets faster from one DB to another DB?
Thanks & regards,
Yoke Yew
Community Manager

Re: Issue Clarification on TOS vs (TIS / TDQ)

Hello
In your case, you just need to replace tMysqlOutput with tMysqlOutputBulkExec, e.g.:
tOracleInput---tMysqlOutputBulkExec.
Best regards
shong
One Star

Re: Issue Clarification on TOS vs (TIS / TDQ)

Hi Shong,
We tried that option earlier, but the job fails to execute and returns the following error:
Starting job mysqlbulk at 15:44 23/11/2009.
connecting to socket on port 4115
Exception in thread "main" java.lang.Error: Unresolved compilation problems:
The constructor File() is undefined
Syntax error on token ";", delete this token
at etl_performance_poc.mysqlbulk_0_1.mysqlbulk.tOracleInput_2Process(mysqlbulk.java:1872)
at etl_performance_poc.mysqlbulk_0_1.mysqlbulk.runJobInTOS(mysqlbulk.java:3441)
at etl_performance_poc.mysqlbulk_0_1.mysqlbulk.main(mysqlbulk.java:3350)
connected
Job mysqlbulk ended at 15:44 23/11/2009.

The FILE_NAME field is shown as mandatory.
Thanks & regards,
Yoke Yew
Community Manager

Re: Issue Clarification on TOS vs (TIS / TDQ)

Hello
The FILE_NAME field is shown as mandatory.

There is a compilation error in the generated code because the file name is empty; you must specify the file path.
Best regards
shong
One Star

Re: Issue Clarification on TOS vs (TIS / TDQ)

Hi Shong,
As I indicated earlier, we are not working with files. We are extracting datasets from one database and loading them into another database.
Regards,
Yoke Yew
Community Manager

Re: Issue Clarification on TOS vs (TIS / TDQ)

As I indicated earlier, we are not working with files. We are extracting datasets from one database and loading them into another database.

The file is an intermediate output file: the component first writes all records to the file and then bulk inserts them into the target DB from that file.
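
The two-step idea can be sketched in plain Java. This is an illustration under assumptions, not the code the component generates: the class and method names, the ';' delimiter, and the table name are made up, and values are assumed to contain no delimiter, quote, backslash, or newline characters. The only real statement used is MySQL's LOAD DATA LOCAL INFILE.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class BulkViaFile {
    /** Step 1: write every row to an intermediate delimited file, one row per line. */
    public static void writeDelimited(Path file, List<String[]> rows) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String[] row : rows) {
                out.write(String.join(";", row));
                out.write('\n'); // fixed terminator so it matches the LOAD DATA clause below
            }
        }
    }

    /** Step 2: build the single statement that bulk loads the whole file into the target table. */
    public static String loadStatement(Path file, String table) {
        // Forward slashes avoid backslash escaping issues inside the SQL string literal.
        String path = file.toAbsolutePath().toString().replace('\\', '/');
        return "LOAD DATA LOCAL INFILE '" + path + "' INTO TABLE " + table
                + " FIELDS TERMINATED BY ';' LINES TERMINATED BY '\\n'";
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("bulk_rows", ".csv");
        List<String[]> rows = Arrays.asList(
                new String[] {"1", "alice"},
                new String[] {"2", "bob"});
        writeDelimited(file, rows);
        // In a real job this statement would be executed on the MySQL connection.
        System.out.println(loadStatement(file, "target_table"));
        Files.delete(file);
    }
}
```

This is why a file path is required even when both source and target are databases: the file is only a staging area between the read and the bulk load.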
Best regards
shong
One Star

Re: Issue Clarification on TOS vs (TIS / TDQ)

Hi Shong, Cedric,
Thanks for the hint provided. Yes, we have successfully redone our benchmark testing:
- 2 million records transferred from the Oracle DB server to the Mysql DB server
- 1 Gbps LAN
The time has dropped from the earlier 2hr 30mins to 20mins.
I have one question: would TDQ/TIS give a better transfer rate than TOS?
Regards,
Yoke Yew
One Star

Re: Issue Clarification on TOS vs (TIS / TDQ)

Hi Shong, Cedric, all,
Would TDQ/TIS give a better transfer rate than TOS?
Regards,
Yoke Yew
Community Manager

Re: Issue Clarification on TOS vs (TIS / TDQ)

Hello Yoke Yew
Yes, the commercial subscription products add a number of dedicated features. Some of them are:
*Grid Computing* :
This is grid computing at the *project* level (the project is split across several execution servers; it's not grid computing inside a job). For each job, the JobConductor selects the best available execution server to deploy and run the job. It optimizes the scalability and availability of the integration processes by ensuring optimal use of the execution grid, automatically distributing jobs across the execution servers grouped in a virtual server.
*tParallelize* :
tParallelize helps you parallelize and synchronize the execution of numerous subjobs in your main job (see the third screenshot).

*ParallelizingDataFlows*:
Parallel processing of data refers to speeding up the execution of a job by dividing the data flow into multiple fragments that can execute simultaneously. Data processed across N fragments might execute up to N times faster than it would if processed as a single fragment (see the second screenshot).

*FileScale* :
It's a new big project (started more than 1 year ago) to allow high parallelization on "big" computers (with a lot of CPUs). See more information at http://www.talend.com/products-data-integration/talend-integration-suite-mpx.php
*SOA Manager*:
*Run the same job several times* :
In the SOA Manager, you can set some parameters to fork the JVM for each call (very useful in an EAI architecture when you have 10 calls per second).
More information at http://www.talend.com/products-data-integration/talend-integration-suite-rtx.php
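
As a rough illustration of the data-flow parallelization idea described above (splitting one flow into N fragments that run simultaneously), here is a minimal Java sketch. It is an assumption-laden example and not how TIS implements the feature: the class name, the transform() stand-in, and the range-based split are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FragmentedFlow {
    /** Hypothetical stand-in for the per-row work of a job. */
    static long transform(long rowId) {
        return rowId * 2;
    }

    /** Splits rows [0, rows) into `fragments` contiguous ranges and processes them concurrently. */
    public static long processInFragments(long rows, int fragments)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(fragments);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int f = 0; f < fragments; f++) {
                final long from = rows * f / fragments;       // inclusive start of this fragment
                final long to = rows * (f + 1) / fragments;   // exclusive end of this fragment
                Callable<Long> task = () -> {
                    long sum = 0;
                    for (long id = from; id < to; id++) {
                        sum += transform(id);
                    }
                    return sum;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // wait for each fragment and combine the results
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(processInFragments(2025100L, 4));
    }
}
```

The N-times speedup is a best case: it assumes the fragments are independent and that neither the source, the target, nor the network becomes the bottleneck.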
Best regards
shong