One Star

Performance on a job

Hi,
Talend seems like a very good solution to me, but now I am running performance tests, and my results are disastrous.
I attach a screenshot of my job, which describes my business rules.
My project is to replace a loader built with Access.
With my current loader, the execution time of this job is 4 minutes and 30 seconds.
With Talend, the execution time of the same job is 1 hour and 28 minutes.
How can I improve the performance?
Thx.
29 REPLIES
One Star

Re: Performance on a job

Hi,
Some questions:
- How many rows does your job process?
- Do you call routines in your tMap?
- Are your databases local or on the network?
And a tip from experience:
Before launching the job, check the "Statistics" box to see where the dataflow is slow.
One Star

Re: Performance on a job

There are 421,000 rows in my job.
Yes, I call some routines in my two tMaps, but nothing complex.
My Oracle database is on the network and Access is local.
It's exactly the same configuration as my other loader.
One more detail: in my job, the tAccessInput and the tAccessOutput use the same Access database.
One Star

Re: Performance on a job

Before launching the job, check the "Statistics" box to see where the dataflow is slow
With this, you'll be able to see where things are slow (is it Oracle or Access?).
Once you have checked the statistics, try separating the Access input and output into two files.
One Star

Re: Performance on a job

Showing the statistics is not very informative, because the stats are identical across the whole job: about 120 rows/second.
One Star

Re: Performance on a job

It is informative: it shows that the dataflow is slow from the entry point, your Oracle database on the network.
Could you show a screenshot of your tOracleInput component's configuration?
One Star

Re: Performance on a job

To clarify:
Oracle is connected via ODBC. I tried JDBC and the performance is about the same.
Access is local: the file is on my computer.
Oracle is remote, on my network.
I added a screenshot of my parameters to my first post. See above.
Employee

Re: Performance on a job

Talend Open Studio generates Java or Perl code.
Neither of these languages handles Access databases natively.
The Java Access components communicate through a JDBC-ODBC bridge.
It will never be as fast as a native connection or as fast as a real JDBC connection.
Generally speaking, input is not the problem; writing always takes longer...
That's why switching your Oracle input between ODBC and JDBC doesn't change the results.
You can try tweaking the Advanced Settings / Autocommit value in your tAccessOutput: increase this value.
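To make the bridge and autocommit points concrete, here is a minimal JDBC sketch of what that write path involves; this is not Talend's generated code, and the DSN and table names are invented for illustration:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class AccessBridgeSketch {
    public static void main(String[] args) throws Exception {
        // Access has no pure-Java driver here, so everything goes through
        // Sun's JDBC-ODBC bridge: each statement crosses the Java/native
        // boundary into the Windows ODBC layer, which costs time per row.
        Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
        Connection conn = DriverManager.getConnection("jdbc:odbc:MyAccessDSN");

        // With autocommit on (the JDBC default), every INSERT is its own
        // transaction -- one flush of the .mdb file per row. Raising the
        // Autocommit / "commit every" value batches rows per transaction.
        conn.setAutoCommit(false);
        PreparedStatement ps =
                conn.prepareStatement("INSERT INTO target_table VALUES (?)");
        for (int i = 0; i < 1000; i++) {
            ps.setInt(1, i);
            ps.executeUpdate();
        }
        conn.commit(); // one commit for all 1000 rows instead of 1000 commits
        ps.close();
        conn.close();
    }
}
```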
HTH,
One Star

Re: Performance on a job

For me, it's much faster to read/write an SQL database than a flat file (I can reach 40,000 rows/sec versus 3,000 for a flat file).
All my jobs use Java.
Employee

Re: Performance on a job

I only said that reading is faster than writing.
As for writing to a file, on my laptop with a fairly slow hard disk I can easily reach 75,000 rows per second.
My input file has 1,000,000 rows and 11 columns with different data types (Integer, String, and Date).
I write to a simple tFileOutputDelimited without any CSV options...
My only "special" configuration is to temporarily disable my antivirus.
Can you give me more details about your own tests?
One Star

Re: Performance on a job

Maybe I'm missing something - sorry to intrude.
However, it was mentioned earlier that one of the databases is Oracle, yet I see no tOracle components in the job. How come?
One Star

Re: Performance on a job

You don't see a tOracleInput because I use the ODBC component to connect to the Oracle data warehouse. It is the input component labeled DWH in my screenshot.
For mhirt: on what type of database can you load at 75,000 rows/s? Oracle? And do you display the statistics to measure the performance, or not?
Thx
Employee

Re: Performance on a job

Sorry suzchr, I get 75,000 rows/s file to file. My message was for Maverick (he is limited to only 3,000 rows per second and I don't understand why).
For databases, the best performance is obtained with the bulk components (not available for Access).
Otherwise, it mainly comes down to autocommit tuning.
In Java you can show statistics; it doesn't affect performance much.
In Perl, it has more impact.
HTH,
One Star

Re: Performance on a job

I checked again.
I was exaggerating with 2,000 rows/sec :s
It's 7,000 rows/sec when reading from a delimited flat file with the following specs:
- Number of rows: 7,000,000
- Number of columns: 12
- My job writes to an Excel file; if I write to a delimited file instead, the rate rises to 15,000 rows/sec
- HDD speed: 7,200 RPM
- I have an antivirus, but I can't disable it (it's behind SBS :))
But never mind, I don't have any problem with this :)
One Star

Re: Performance on a job

mhirt, I have a question for you! I see that your status is Talend Team. Does that mean you work for the Talend company?
One Star

Re: Performance on a job

I have another question about my job. To improve performance, I need to adjust the commit on the tAccessOutput. However, I don't know whether it's better with a large commit (every 20,000 rows, for example) or a small one (every 10 rows, for example).
I use a computer with 1 GB of RAM, and my process writes 422,188 rows into my Access database.
One Star

Re: Performance on a job

Does somebody know how the commit is handled if I put 0 as the "commit every" value?
Employee

Re: Performance on a job

suzchr,
I have a question for you! I see that your status is Talend Team. Does that mean you work for the Talend company?

Yes, I'm working for Talend! :-)
However, I don't know whether it's better with a large commit (every 20,000 rows, for example) or a small one (every 10 rows, for example).

In general, it's better with a large "commit every" value, but it's not as simple as that.
You may get better performance with a commit every 40,000 than with a commit every 50,000.
You have to run tests to find the best value.
Does somebody know how the commit is handled if I put 0 as the "commit every" value?

With 0 or empty, there won't be any commit at all.
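If you want to benchmark intervals outside the Studio, here is a rough sketch of what a "commit every N" loop amounts to in plain JDBC; it is not the code Talend generates, and the DSN, table, and interval are invented:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class CommitEverySketch {
    static final int COMMIT_EVERY = 20000; // the value to tune

    public static void main(String[] args) throws Exception {
        Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
        Connection conn = DriverManager.getConnection("jdbc:odbc:MyAccessDSN");
        conn.setAutoCommit(false);

        PreparedStatement ps =
                conn.prepareStatement("INSERT INTO target_table VALUES (?)");
        long start = System.currentTimeMillis();
        for (int row = 1; row <= 422188; row++) {
            ps.setInt(1, row);
            ps.executeUpdate();
            if (row % COMMIT_EVERY == 0) {
                conn.commit(); // one transaction per COMMIT_EVERY rows
            }
        }
        conn.commit(); // commit the remainder; in Talend, a "commit every"
                       // of 0 or empty means no commit is ever issued
        System.out.println((System.currentTimeMillis() - start) + " ms");
        ps.close();
        conn.close();
    }
}
```

Timing several COMMIT_EVERY values this way shows why the sweet spot has to be found empirically.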
HTH,
One Star

Re: Performance on a job

Thank you for all your answers!
I am running benchmarks on my job and will post my results afterwards.
My first impression is that with Access, the most efficient setting is to commit every single row. That's unusual, but it's how it works out in my case.
One Star

Re: Performance on a job

So I am running my benchmarks, and the results are not good...
In fact, I ran two types of benchmark. The first is the complete job: the best time is obtained with a commit value of 10 on the tAccessOutput, namely 22 minutes, versus 9 minutes with my Access loader.
Then I created the same job without writing to Access (I deleted the output component). The time is 4 minutes 40 seconds. This is very good.
Then I created a job that only writes to Access: 500,000 rows generated by a tRowGenerator. The best time is obtained with a commit value of 125,000: 5 minutes 39 seconds. This is also efficient.
All in all, I created two jobs: one that only extracts the data and ends with a tBufferOutput component, and another that takes that job's data and writes to Access. But the performance is bad: after two hours, only 125,000 rows had been written.

What can I do to improve the performance? Does someone have a good idea?
One Star

Re: Performance on a job

Hi suzchr,
Is there a special reason for using a tBufferOutput? Can't you load directly into Access instead of loading into the buffer first?
One Star

Re: Performance on a job

Yes, I've done that, but the best time obtained is 22 minutes. I am trying to use two jobs to improve on it.
After my tests I can say that when you only write to Access, the best commit value is 125,000 rows; but if I write to Access in the same job that reads from my data warehouse, with a commit of 125,000 rows the time is 6 hours 40 minutes...
One Star

Re: Performance on a job

To help, I attach two screenshots showing the two jobs.
Four Stars

Re: Performance on a job

Hi suzchr,
Try testing with only one tMap; it accepts multiple inputs and outputs.
You may also set a better Expression key to improve performance.
Regards
One Star

Re: Performance on a job

I don't have a key defined on my schema. Do you think that defining a key would improve the performance?
Employee

Re: Performance on a job

Be careful not to confuse the Key column in a schema with the Expression key column in tMap.
Checking the Key column in a schema will not improve performance in tMap, but setting an Expression key on one of your lookups will.
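To see why the Expression key matters, here is a rough Java sketch of what a keyed lookup amounts to; the schema and values are invented, and this is not Talend's actual generated code:

```java
import java.util.HashMap;
import java.util.Map;

public class LookupKeySketch {
    public static void main(String[] args) {
        // Hypothetical lookup flow: customerId -> customerName
        String[][] lookupRows = { { "C1", "Alice" }, { "C2", "Bob" } };
        // Hypothetical main flow: customerId -> order label
        String[][] mainRows = { { "C2", "order-42" }, { "C1", "order-7" } };

        // With an Expression key, the lookup flow is indexed once...
        Map<String, String> index = new HashMap<String, String>();
        for (String[] row : lookupRows) {
            index.put(row[0], row[1]); // key = the expression key value
        }

        // ...and each main row is matched with a single hash probe,
        // O(main + lookup), instead of comparing every main row against
        // every lookup row, O(main * lookup), when no key is set.
        for (String[] row : mainRows) {
            System.out.println(row[1] + " belongs to " + index.get(row[0]));
        }
    }
}
```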
One Star

Re: Performance on a job

OK, I defined a key on my second tMap. I will post the results afterwards.
Thx.
One Star

Re: Performance on a job

Have you tried loading into a temporary table as the output of the first job?
Then you could map it to the output table in the second job.
One Star

Re: Performance on a job

Hi,
Make sure that the metadata schemas in your tMap and in your tAccessOutput match the schema in your final database exactly (including data types).
One Star

Re: Performance on a job

Hi, can someone tell me how to increase the speed at which rows pass from source to destination? For example, when a CSV file contains more than 100,000 rows, the rows flow at only 200 to 300 rows/s while the job runs. How can I increase that to more than 1,000 rows/s? Please, anyone, reply to me.