Hello, I have an excel spreadsheet which has 5 Lac records and of size 37 MB ( It has roughly 78 columns). I am unable to create metadata for that file. Talend is throwing "Java heap space exception" I need to process that file in couple of jobs. Please help.
Create a smaller file to read the metadata and use the original file for your import. In case of you experiences java heap space errors while processing your file in the job, try alternative components to read Excel files (tFileExcelWorkbookOpen and tFileExcelSheetInput). The component tFileExcelWorkbookOpen has an memory saving mode (works only for the newer XLSX format). tFileExcelSheetInput needs tFileExcelWorkbookOpen.
Yes, as mentioned by jlolling, Use a smaller file for creating metadata. If you do not have smaller file then you can probably open the XML file through some editor like Notepad++ or Edit plus and select only few records to create an example or sample file. One you have created metadata then I do not think that you should be having any issues with reading the large XML file.
Hi Jilolling and Vikram, Yes am able to create metadata with a smaller file. Now the problem is with operations with the large excel files. As suggested by you, I have used tFileExcelSheetInput and tFileExcelWorkbookOpen to open the excel files and able to read only two files. When the result of those two files is tried to join with third file, again am facing "java heap space exception".Please help.
Did you switched on the memory saving mode or do you have to deal with the old OLE based format? If you have very large xls files, yes there is no way without adding more memory or you have to splitt the input file. You could also decide to load the data files without look ups into staging tables and do the job in the database.
Hi, Am using tjoin to join the xlsx file. Actually we are migrating old legacy data ( data in spread sheets) into our new oracle 10g database. Here I need to join the spreadsheets (xlsx files) and load the necessary columns into a single staging table and from there actual business logic is to be applied. My problem is that I am uable to join those xlsx files only. Am getting "java heap space exception" while reading/joining a few of them. Can u please suggest me a way to getting the source data (data in xlsx) files into the staging table.
You can select the required fields from both the excel sheet load then to different csv files. As delimited files are lighter and easier to read. Now read delimited files and join using tMap. Also do not forget to switch on store temporary data to disc setting in tMap. Let me know if it helps.
That is the part which I do not understand. To save the content from a large Excel file into a database you should not be a big deal! I suggest loading all files - WITHOUT joining to anything else - at first in the database and THAN with these staging tables (one file == one staging table) doing your lookups in the database not between files.
Hi, Your suggestion helped me.. Thanks a lot. I have loaded individual files on to individual tables first and later on joined them later on. Thanks once again. Just for information, file with large data got loaded only from UNIX box and only after increasing heap size in shell script file.