One Star

generating multiple excel files

Hi,
I need to export a db table into a xls (excel2003) file, but as my table contains more than 65535 rows, i want to generate as many excel file as needed to not be blocked by the excel sheet rows limit.
I know i could use xlsx file (excel2007) in talend to bypass this limit, but the rest of the export process accept only xls file, so i can't use xlsx...
I made a job like this :
tDBInput --> tMap --> tJavaRow --> tFileOutputExcel
The tMap add a sequence to count the number of rows extracted from the table and output a field (named "seq") with this expression :
DataOperation.FIX(Var.seq/65000)+1
Where Var.seq is simply a Numeric.sequence("s1",1,1) 
This field is use in the tJavaRow to produce :
globalMap.put("filename_no", new java.io.FileOutputStream("filename_"+input_row.seq+".xls",false));
and i set my tFileOutputExcel to use outputStream like this :
(java.io.OutputStream)globalMap.get("filename_no")
When i launch my job, it create a first file named "filename_1.xls", and i can see that when the read pass the 65000 rows, a second file named "filename_2.xls" is created, but the first file is empty and when the read reach the 65535 rows, the job crash due to the excel sheet limit.
Even if the wanted files are created, the component still try to write all the rows in the last file.
Can anyone tell me what i did wrong, and how i can get the component to write the bunch of 65000 rows in the correspondant file ?
Thanks
3 REPLIES
Moderator

Re: generating multiple excel files

Hi,
How about spliting a file in several rows?
Here is an option 'Split output in several files' in the advanced setting tab of tFileOutputDelimited. You can generate N rows in each file.
Could you please take a look at a forum:https://www.talendforge.org/forum/viewtopic.php?id=44266 to see if it is satisfying your need.
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
One Star

Re: generating multiple excel files

Hi,
Yes, that work, thanks Sabrina.
But can't I generate directly xls files ? the job i did can't work the way i did it ?
If not i'll stick with the solution you give, but i really want to understand why my job don't work, I thougt it was just a matter of component configuration...
One Star

Re: generating multiple excel files

Hi,
Yes, that work, thanks Sabrina.
But can't I generate directly xls files ? the job i did can't work the way i did it ?
If not i'll stick with the solution you give, but i really want to understand why my job don't work, I thougt it was just a matter of component configuration...

jmarle,
I'm new to Talend Open Studio so I'm sorry that I don't have a solution for you, but you said you really want to understand why your job doesn't work, and I can give you a bit of insight on that.  I was working on a job where I wanted to perform different actions based on the incoming rows as well, and I had the same issue.  The problem is based on the type of row connections used.  You're using the Flow type of connection correct?  You're probably using the Main flow connection to connect everything, so it's using a Flow, or passing everything over to the next component at once.  So how I understand what happens is your tDBInput sends all rows to tMap at once, and your tMap does ALL of its processing and once it's done, it passes everything to tJavaRow.  Now during tJavaRow's execution, it does set the "filename_no" variable correctly I'm sure, but your issue is because tJavaRow sends all data at once to your tFileOutputExcel since it is a Flow connection.  This means that even though tJavaRow sets the variable correctly while it is executing, the global variable "filename_no" is going to be set to the value of the very last value in tJavaRow, in this case "filename_2.xls".  So when your tFileOutputExcel receives the data flow, it only sees the global variable "filename_no" is set to "filename_2" and only writes to that workbook.
Using the Row->Main flow seems to always lead the following components seeing the global variables as only the very last value.  In order to get this to work, I used the tFlowToIterate component, which takes the Row->Main Flow connection and turns it into a list of individual rows, where it iterates each individual row and sends that row to the next component.  So in this case the following components see the global variable EACH TIME it is set rather than only the very last value it was set to.
In your case though, I'm not sure that this would be a good option, since you have a very large amount of data.  You could definitely try it though.  Since the Output components only take the Row->Main Flow type, I ended up using tFlowToIterate and then using tRowGenerator to convert that individual row to a Flow for the output.  It worked for me and I'm using tFileOutputExcel too, so it might work for you!
Thanks,
Paul