I'm just getting started loading Excel files into Talend. When adding an Excel sheet to the Metadata, it finds the Excel sheet, but it errors when pulling up a preview. I added all the parameters to the tFileInputExcel component as built-in. All the job does is read from an Excel file into tLogRow. I get this error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/dom4j/DocumentException
    at org.apache.poi.openxml4j.opc.OPCPackage.init(OPCPackage.java:154)
    at org.apache.poi.openxml4j.opc.OPCPackage.<init>(OPCPackage.java:141)
    at org.apache.poi.openxml4j.opc.Package.<init>(Package.java:54)
    at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:99)
    at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:207)
    at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:186)
    at org.apache.poi.POIXMLDocument.openPackage(POIXMLDocument.java:67)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:232)
    at lists.loadmasterfile_0_1.LoadMasterFile.tFileInputExcel_1Process(LoadMasterFile.java:763)
    at lists.loadmasterfile_0_1.LoadMasterFile.runJobInTOS(LoadMasterFile.java:1665)
    at lists.loadmasterfile_0_1.LoadMasterFile.main(LoadMasterFile.java:1530)
Caused by: java.lang.ClassNotFoundException: org.dom4j.DocumentException
    at java.net.URLClassLoader$1.run(Unknown Source)
disconnected
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    ... 11 more
I checked under Modules and everything related to poi.*.java is installed. Not sure what I'm missing.

UPDATE: Forgot to say that I am using Talend Open Studio 5.4.1.
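One way to narrow down a NoClassDefFoundError like this is to ask the JVM directly whether the missing class is visible and, if so, which jar it comes from. The following is only a minimal sketch: ClasspathCheck and locate are hypothetical names, not part of Talend or POI, and the result is only meaningful if the class is run with the same classpath as the generated job.

```java
import java.security.CodeSource;

// Hypothetical helper for illustration; not part of Talend or Apache POI.
final class ClasspathCheck {

    /** Returns where the class was loaded from, or null if it cannot be found. */
    static String locate(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class<?> c = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // JDK classes have no code source
            return src == null ? "(bootstrap/JDK)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the stack trace above
        String name = args.length > 0 ? args[0] : "org.dom4j.DocumentException";
        String where = locate(name);
        System.out.println(where == null
                ? name + " is NOT visible on this classpath"
                : name + " loaded from " + where);
    }
}
```

If the class is reported as not visible, the dom4j jar is missing from the job's classpath even though the POI jars themselves are present.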
This should not happen. Check the plugin org.talend.libraries.excel_5.4.1.r111943 in the plugin dir; you should find these files:

poi-3.6.jar
poi-3.8-20120913_modified_talend.jar
poi-3.9-20121203.jar
poi-ooxml-3.8-20121127_modified_talend.jar
poi-ooxml-3.9-20121203.jar
poi-ooxml-schemas-3.8-20120326.jar
poi-ooxml-schemas-3.9-20121203.jar
poi-scratchpad-3.8-20120326.jar
simpleexcel.jar
talendExcel.jar

I guess it is necessary to load modules.
dom4j.jar was not in the jdk/lib folder (the destination of my classpath), so I copied it into that location and tried again, but got the same errors.

I just downloaded the Talend Open Studio installation file. The path is TOS_DI-Win32-r111943-V5.4.1\, but I am actually using the TOS_DI-win-x86_64.exe application to run the program. My understanding was that this executable runs it in 64-bit mode. I have Windows 8.1 64-bit and I installed 64-bit Java. Is there a different installation file I can use to install a 64-bit version of Talend?
Correction on my last entry. I was not getting the same errors today. Sorry, I didn't look that closely when I got in today. I restarted my machine today and when I got in, I was no longer getting java.lang.NoClassDefFoundError. (In my previous post I said I copied the dom4j.jar into the classpath. I removed it so I could test with my settings as of this morning.)

Now I was getting this error:

java.lang.OutOfMemoryError: GC overhead limit exceeded

I am trying to read data from a .xlsx file. I opened the Excel file and saved it as a .xls and as a .csv to see what the difference was.

Using .xlsx file: Original file, size 14,600 KB
tFileInputExcel -> row1(main) -> tLogRow
tFileInputExcel settings: Read excel2007 file format(xlsx) = CHECKED, All sheets = CHECKED, Header = 1, Limit = 20 (for testing)
This resulted in the error java.lang.OutOfMemoryError: GC overhead limit exceeded

Using .xls file: Copy of the original saved from Excel, size 16,569 KB
tFileInputExcel -> row1(main) -> tLogRow
tFileInputExcel settings: Read excel2007 file format(xlsx) = NOT checked, All sheets = CHECKED, Header = 1, Limit = 20 (for testing)
This results in a warning: Cannot read name ranges for _FilterDatabase - setting to empty
But the 20 rows are written to the output anyway.

Using .csv file: Copy of the original saved from Excel, size 19,239 KB
tFileInputDelimited -> row1(main) -> tLogRow
tFileInputDelimited settings: Row Separator = "\n", Field Separator = ",", Header = 1, Limit = 100 (for testing)
The result is that the 100 rows are written to the output.

In Talend, I then went to Window -> Preferences -> Talend -> Run/Debug. I changed the Job Run VM argument from -Xmx1024M to -Xmx10G (I have 16G RAM in my 64-bit machine).
Reran the .xls file. Same result as above.
Reran the .xlsx file. Result is that the 20 rows are written to the output.

No idea what was going on with my original error, java.lang.NoClassDefFoundError. But it seems to be working now, and after increasing the memory limit it reads the Excel files just fine.
Wow, 10G for a job. OK, that is by far too much. There must be something in your job that "eats" memory. The class loader also needs some memory: if it only starts working when the next classes have to be loaded, and the memory is already full at that moment, you will probably get this misleading error message.
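One likely candidate for what "eats" the memory: an .xlsx file is a zip container of XML parts, and the stack trace earlier in the thread shows the job opening it through POI's XSSFWorkbook, which, as far as I know, builds the whole workbook in memory before the first row is emitted. If that is right, Limit = 20 would not reduce the memory needed. A rough way to see how much XML hides inside the 14,600 KB file is to sum the uncompressed entry sizes. This is a sketch only; XlsxSize and uncompressedBytes are hypothetical names, and the in-memory object model is usually several times larger than the raw XML.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper for illustration; uses only the JDK, not POI.
final class XlsxSize {

    /** Sums the uncompressed sizes of all entries in a zip container (an .xlsx is one). */
    static long uncompressedBytes(String path) throws IOException {
        long total = 0;
        try (ZipFile zip = new ZipFile(path)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                long size = entries.nextElement().getSize(); // -1 when the size is unknown
                if (size > 0) {
                    total += size;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Pass the path to the .xlsx file as the first argument
        long bytes = uncompressedBytes(args[0]);
        System.out.println("Uncompressed content: " + (bytes / 1024) + " KB");
    }
}
```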
Hi,

>> Job Run VM argument from -Xmx1024M to -Xmx10G

10G is very unusual, because your file is very small.

1) Are there any other memory-intensive processes running on the machine?
2) Can you check in Task Manager how much memory Talend is using?
3) Can you monitor memory usage during execution?
4) Can you restart the machine, restore the original memory size for Talend, and execute the job again?

I think we must either reproduce the issue or resolve it.

Thanks
Vaibhav
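For points 2) and 3), besides Task Manager, the JVM itself can report the heap limit it actually received and how much of it is in use. The sketch below uses only the standard Runtime API; HeapReport and its method names are hypothetical. The two println lines could, for example, be pasted into a tJava component at the start and end of the job to confirm that the -Xmx value from the Run/Debug preferences really reaches the job's JVM.

```java
// Hypothetical helper for illustration; uses only java.lang.Runtime.
final class HeapReport {

    /** Maximum heap the JVM will attempt to use (reflects -Xmx), in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Heap currently in use (allocated minus free), in MiB. */
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap:  " + maxHeapMiB() + " MiB");
        System.out.println("Used heap: " + usedHeapMiB() + " MiB");
    }
}
```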