Five Stars

Create multple workseets in excel dynamically - Performace

I am trying to create multiple worksheets in an excel workbook and the number of worksheets is not known at the start of the job. This is dependent on the values of  a field from my Oracle input. I followed the post to design the same and this works wonders when there is very less data. Even with 18k records from my toracleinput, the iterations completed when run for whole night are just about 500.

https://community.talend.com/t5/Design-and-Development/Create-Multiple-Tabs-with-single-excel-file-b...

I tried to modify the approach using toraclerow (Prepared statement setting) and tparserecordset and couldn't see anything better. I have attached both the approaches design.

The rest of the job completes in about 10 minutes and just the part of final excel creation is the problem now. How can I improve the performance. My toracleinput has a simple SELECT query that return 18k records.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Fifteen Stars

Re: Create multple workseets in excel dynamically - Performace

You only iterate over the groups. The main bulk of your data rows should be supplied by the queries that are triggered by each iteration. For example, given the dataset I gave you earlier.....

 

Iteration 1: You will have a query for class A, which will return rows for every student in class A.

Iteration 2: You will have a query for class B, which will return rows for every student in class B.

Iteration 3: You will have a query for class C, which will return rows for every student in class C.

Iteration 4: You will have a query for class D, which will return rows for every student in class D.

 

Iterating is a little slower, but you will only be iterating over the classes (in this example). All of the data will be returned per class as a normal flow. You are unlikely to notice any performance issues.

Rilhia Solutions
5 REPLIES
Five Stars

Re: Create multple workseets in excel dynamically - Performace

Fifteen Stars

Re: Create multple workseets in excel dynamically - Performace

There may be a component in the Exchange which will do this for you. I believe @jlolling has built a couple of Excel components. Maybe he can point you in the right direction. 

 

Alternatively, if you want to use standard Talend components, you can do this by grouping your data by prospective "sheet" in your query and then running a query for each group and loading the Excel worksheet in an iterative fashion. For example, lets say I have some data that looks like this....

 

ClassNameAge
ABen12
AJohn12
AJane12
AGemma12
CMohammad13
DAbi13
BMike11
BSafron13
BJade14
BAbdul12
CHenry13
CSam13
CMikela12
DSeb13
DGiles12
DJess13
DRachel12

 

Now, if I want a sheet per group, I can do it this way.

 

tDBInput_1-----> tFlowToIterate_1--iterate--->tDBInput_2------>tFileOutputExcel

 

The tDBInput just performs a query on the data above and returns a distinct list of the classes. That is it. It passes that to the tFlowToIterate.

The tFlowToIterate allows us to iterate over the classes for the next step AND saves the current class to the globalMap. This can be used in our where clause for the tDBInput_2 which will return all records for that class. It can also be used in the Excel file sheet name. 

 

When this jobs runs, it will go through an iteration for each class and it is this which will allow you to dynanically add your sheets.

Rilhia Solutions
Five Stars

Re: Create multple workseets in excel dynamically - Performace

Thank you for your response.

For very smaller data sets, tflowtoiterate is working absolutely great. But even with 18k records, tDBinput1-->tflowtoiterate is doing around 1-2 rows/second. Is this the expected performance? I have used "Use the default (Key,value) in global variables" in tflowtoiterate basic settings.

 

Fifteen Stars

Re: Create multple workseets in excel dynamically - Performace

You only iterate over the groups. The main bulk of your data rows should be supplied by the queries that are triggered by each iteration. For example, given the dataset I gave you earlier.....

 

Iteration 1: You will have a query for class A, which will return rows for every student in class A.

Iteration 2: You will have a query for class B, which will return rows for every student in class B.

Iteration 3: You will have a query for class C, which will return rows for every student in class C.

Iteration 4: You will have a query for class D, which will return rows for every student in class D.

 

Iterating is a little slower, but you will only be iterating over the classes (in this example). All of the data will be returned per class as a normal flow. You are unlikely to notice any performance issues.

Rilhia Solutions
Five Stars

Re: Create multple workseets in excel dynamically - Performace

This worked as a charm. Thank you so much Smiley Happy
Kudos!!