tMap with lookup table for large Excel file

One Star

tMap with lookup table for large Excel file

Dear community,
I am reading Excel files with the tFileInputExcel, tMap and tJavaRow components. If the Excel files get too large, I get an Out of Memory Error concerning the Java Heap Space. I've read in several forums and internet pages that in this case I can use a Lookup table in my tMap component, where I can set the property "Store on disk" to true and also the path, where the input data will be saved on the disk. To make this work I added a second tFileInputExcel setting the Lookup link to the tMap, which I use to map all incoming rows to the output table. The initial tFileInputExcel with the Main link to the tMap now hasn't got any mapping columns to the output table anymore. When I start the job, I still get the Java Heap Space error. Did I forget something? Or is there an example somewhere how to design the job if I only need the Lookup table for this purpose?
Kind regards
Sylvia
One Star

Re: tMap with lookup table for large Excel file

Update: Now that I've created an account I add some screenshots to my problem.
Seven Stars

Re: tMap with lookup table for large Excel file

If your problem is reading the Excel file, this won't help you. "Store temp data on disk" is to overcome the fact that otherwise Talend tries to hold the whole lookup file in memory. But except for lookups (and some other components like tAggregateRow and tSortRow) which hold all the rows in memory, mostly rows are read and processed one at a time.
You need to increase the memory allocated to your job - see https://community.talend.com/t5/Migration-Configuration-and/OutOfMemory-Exception/ta-p/21669?content....
One Star

Re: tMap with lookup table for large Excel file

Thank you for your answer, alevy! Increasing the memory is in our case not an option, as the maximum heap space we can allocate on our machine is 1024MB.
For this problem I openend a ticket for the Talend Support Team and they gave me instructions to use the "Store on disk" feature with two tFileInputExcel components. But I don't know how to adjust the design of my job so the "Store on disk" is used, and they wrote that it's a job design issue and I have to post it here. But they already told me that using "Store on disk" is the correct way.
I now need instructions on how to use two tFileInputExcel components, where one of them uses the Main link and the other the Lookup link to the tMap, when reading one Excel file (see screenshots attached in my second post).
Kind regards
Sylvia
Seven Stars

Re: tMap with lookup table for large Excel file

I'm dubious about their suggestion in general but, specifically, why read the Excel file twice? I can't see how that's going to improve your situation since you're not joining the data. Try making your main flow just a tFixedFlowInput with only one row and one empty column.
One Star

Re: tMap with lookup table for large Excel file

I inserted the tFixedFlowInput and now the job obviously uses the feature "Store on disk", because in my temp folder, where I configured the data to be stored, I get for each job start a new .bin file. But the file size remains 0kb and I still get the Out of Memory error with my Excel file. Do I have to adjust something else? I tried to solve the problem by changing the "Max buffer size (nb of rows)" to a smaller number inside my tMap configuration, but this didn't change anything in the result. Do you have any idea what is missing?
Seven Stars

Re: tMap with lookup table for large Excel file

Just keep reducing the "Max buffer size (nb of rows)" until it works but you really need to be able to allocate more memory to the job.
One Star

Re: tMap with lookup table for large Excel file

I entered "1" as a value in "Max buffer size (nb of rows)", allocated 4096MB max Heap Space to my job and I still get the Out of Memory error, because the .bin files in my temp folder don't grow after their creation. And my Excel file is only 8.2MB in size! If I remove some sheets inside the Excel file and when it has about 6-6.5MB I can read the file without problems, although the reading is very slow. And it is also not some specific sheet which causes the problem, because I tried it with different ones... could this be a bug? I use TOS version 5.0.2...
Seven Stars

Re: tMap with lookup table for large Excel file

Doesn't sound right. Why don't you try with a later version of the studio?
One Star

Re: tMap with lookup table for large Excel file

I am getting the same issue with TOS 5.3.1. Similar scenario.. tMap with 8MB delimited file! The temp files doesnt grow beyond their size during creation........
Seventeen Stars

Re: tMap with lookup table for large Excel file

Probably the problem is reading the large Excelfile. For the main flow you can try the component tFileExcelWorkbookOpen (with memory saving mode if you use xlsx files) and tFileExcelSheetInput (and here read only the columns you need).
This way I guess you can decrease the memory usage significant.
tFileExcelWorkbookOpen --OnComponentOK--> tFileExcelSheetInput_1 -- main flow --> tMap_1
tFileExcelSheetInput_2 -- lookup flow --> tMap_1
If you read for lookup the same excel file, you need only one tFileExcelWorkbookOpen.
One Star

Re: tMap with lookup table for large Excel file

Hi jlolling, I could not find the tFileExcelWorkbookOpen component in TOS 5.3.1. Guess it might be available only in enterprise edition..
Is there anything else I can do to make tMap work with files that have high number of rows..?
Community Manager

Re: tMap with lookup table for large Excel file

This is not a native talend component, but it was developed by Mister Lolling: http://www.talendforge.org/exchange/?eid=623&product=tos&action=view&nav=1,1,1