Error running tFileInputExcel

One Star

I'm just getting started loading Excel files into Talend.
When I add an Excel file to the Metadata, Talend finds the sheet, but it throws an error when I pull up a preview.
I set all the parameters of the tFileInputExcel component as built-in. The job only reads from the Excel file into tLogRow, and I get this error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/dom4j/DocumentException
at org.apache.poi.openxml4j.opc.OPCPackage.init(OPCPackage.java:154)
at org.apache.poi.openxml4j.opc.OPCPackage.<init>(OPCPackage.java:141)
at org.apache.poi.openxml4j.opc.Package.<init>(Package.java:54)
at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:99)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:207)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:186)
at org.apache.poi.POIXMLDocument.openPackage(POIXMLDocument.java:67)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:232)
at lists.loadmasterfile_0_1.LoadMasterFile.tFileInputExcel_1Process(LoadMasterFile.java:763)
at lists.loadmasterfile_0_1.LoadMasterFile.runJobInTOS(LoadMasterFile.java:1665)
at lists.loadmasterfile_0_1.LoadMasterFile.main(LoadMasterFile.java:1530)
Caused by: java.lang.ClassNotFoundException: org.dom4j.DocumentException
at java.net.URLClassLoader$1.run(Unknown Source)
disconnected
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
... 11 more

I checked under Modules and everything related to poi.*.java is installed. I'm not sure what I'm missing.
UPDATE: Forgot to say that I am using Talend Open Studio 5.4.1
Seventeen Stars

Re: Error running tFileInputExcel

This should not happen.
Check the plugin directory for the plugin org.talend.libraries.excel_5.4.1.r111943; you should find these files in it:
poi-3.6.jar
poi-3.8-20120913_modified_talend.jar
poi-3.9-20121203.jar
poi-ooxml-3.8-20121127_modified_talend.jar
poi-ooxml-3.9-20121203.jar
poi-ooxml-schemas-3.8-20120326.jar
poi-ooxml-schemas-3.9-20121203.jar
poi-scratchpad-3.8-20120326.jar
simpleexcel.jar
talendExcel.jar
I guess it is necessary to load modules.
One Star

Re: Error running tFileInputExcel

All of these files exist under this folder path:
C:\Talend\TOS_DI-Win32-r111943-V5.4.1\plugins\org.talend.libraries.excel_5.4.1.r111943\lib\
Do they need to be moved up to the plugins folder?
One Star

Re: Error running tFileInputExcel

I read something about the classpath variable. I have that set. Would that cause a problem?
Four Stars

Re: Error running tFileInputExcel

Make sure dom4j.jar is in the classpath.
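One way to verify this is to ask the JVM directly whether the class named in the stack trace can be loaded. The following is only a minimal sketch: the class name `ClasspathCheck` and the helper `isOnClasspath` are made up for illustration, and it only tells you something useful if it is run with the same classpath as the job (for example from a tJava component or from the exported job's launch script).

```java
public class ClasspathCheck {

    /** Returns true if the named class can be loaded by this class's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // 'false' means: locate the class but do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The effective classpath of this JVM, which may differ from the CLASSPATH variable.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));

        String[] names = {
            "org.dom4j.DocumentException",                 // class missing in the stack trace
            "org.apache.poi.xssf.usermodel.XSSFWorkbook"   // class that triggered the lookup
        };
        for (String name : names) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

If `org.dom4j.DocumentException` is reported as missing, the dom4j jar is not visible to the job, regardless of what the system CLASSPATH variable contains.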
Vaibhav
Moderator

Re: Error running tFileInputExcel

Hi,
>> I read something about the classpath variable. I have that set. Would that cause a problem?

Did your studio work properly before? It looks like you are using the 32-bit Talend Studio. Are you also using a 32-bit JDK?
Best regards
Sabrina
One Star

Re: Error running tFileInputExcel

dom4j.jar was not in the jdk/lib folder (the destination of my classpath). So I copied it into that location and tried it again, but got the same errors.
I just downloaded the Talend Open Studio installation file. The path is TOS_DI-Win32-r111943-V5.4.1\, but I am actually using the TOS_DI-win-x86_64.exe application to run the program. My understanding was that this executable runs it in 64-bit mode.
I have Windows 8.1 64 bit and I installed java 64 bit.
Is there a different installation file I can use to install a 64 bit version of Talend?
One Star

Re: Error running tFileInputExcel

Correction to my last entry: I was not getting the same errors today. Sorry, I didn't look closely enough when I got in.
I restarted my machine today and when I got in, I was no longer getting java.lang.NoClassDefFoundError. (In my previous post I said I copied the dom4j.jar into the classpath. I removed it so I could test with my settings as of this morning.)
Now I was getting error: java.lang.OutOfMemoryError: GC overhead limit exceeded
I am trying to read data from a .xlsx file. I opened the Excel file and saved it as a .xls and as a .csv to see what the difference was.
Using .xlsx file: Original file, size 14,600 KB
tFileInputExcel -> row1(main) -> tLogRow
tFileInputExcel settings: Read excel2007 file format(xlsx) = CHECKED, All sheets = CHECKED, Header = 1, Limit = 20 (for testing)
This resulted in the error java.lang.OutOfMemoryError: GC overhead limit exceeded
Using .xls file: Copy of the original saved from Excel, size 16,569 KB
tFileInputExcel -> row1(main) -> tLogRow
tFileInputExcel settings: Read excel2007 file format(xlsx) = NOTchecked, All sheets = CHECKED, Header = 1, Limit = 20 (for testing)
This results in a warning: Cannot read name ranges for _FilterDatabase - setting to empty
But the 20 rows are written to the output anyway.
Using .csv file: Copy of the original saved from Excel, size 19,239 KB
tFileInputDelimited -> row1(main) -> tLogRow
tFileInputDelimited settings: Row Separator = "\n", Field Separator = ",", Header = 1, Limit = 100 (for testing)
The result is that the 100 rows are written to the output
In Talend, I then went to Window -> Preferences -> Talend -> Run/Debug.
I changed the Job Run VM argument from -Xmx1024M to -Xmx10G (I have 16G ram in my 64 bit machine).
Reran the .xls file. Same result as above.
Reran the .xlsx file. Result is that the 20 rows are written to the output.
No idea what was going on with my original error: java.lang.NoClassDefFoundError.
But it seems to be working now and after increasing the memory usage, reads the Excel files just fine.
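To confirm that a changed -Xmx value actually reaches the job's JVM, the heap limits can be printed from inside the job. This is a small sketch using only standard JDK calls; the class name `HeapCheck` and the helper `toMiB` are illustrative, and the code would have to run inside the job itself (for example in a tJava component) to report that job's settings.

```java
import java.lang.management.ManagementFactory;

public class HeapCheck {

    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects the -Xmx limit (or the JVM default if none was passed).
        System.out.println("max heap (MiB):       " + toMiB(rt.maxMemory()));
        System.out.println("allocated heap (MiB): " + toMiB(rt.totalMemory()));
        System.out.println("free in heap (MiB):   " + toMiB(rt.freeMemory()));
        // The arguments this JVM was started with, e.g. -Xms256M -Xmx1024M.
        System.out.println("JVM arguments:        "
                + ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```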
Seventeen Stars

Re: Error running tFileInputExcel

Wow, 10G for a job. OK, that is far too much. There must be something in your job that "eats" memory. The class loader also needs some memory; if it has to load the next classes at the moment the heap is already full, you will probably get this misleading error message.
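One thing worth checking is how large the workbook really is once unpacked. An .xlsx file is a ZIP archive of XML parts, and the stack trace above shows the job opening it through XSSFWorkbook, which parses the workbook into memory before any row limit is applied. The sketch below only sums the uncompressed sizes recorded in the archive; the class name `XlsxSize` and the method `uncompressedBytes` are made up for illustration, and the result is only a rough lower bound, since the in-memory object model is generally larger than the raw XML.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class XlsxSize {

    /** Sums the uncompressed size of every entry in a ZIP-based file such as .xlsx. */
    static long uncompressedBytes(String path) throws IOException {
        long total = 0;
        try (ZipFile zip = new ZipFile(path)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                long size = entries.nextElement().getSize(); // -1 if unknown
                if (size > 0) {
                    total += size;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java XlsxSize <file.xlsx>");
            return;
        }
        long bytes = uncompressedBytes(args[0]);
        System.out.println("uncompressed XML: " + (bytes / 1024) + " KB");
    }
}
```

Comparing this number with the size on disk gives a first idea of how much data the job has to hold when it reads all sheets at once.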
Four Stars

Re: Error running tFileInputExcel

Hi,
>> Job Run VM argument from -Xmx1024M to -Xmx10G
10G is very unusual, because your file is very small.
1) Are there any other memory-intensive processes running on the machine?
2) Can you check in Task Manager how much memory Talend is using?
3) Can you monitor memory usage during execution?
4) Can you restart the machine, restore the original memory setting for Talend, and execute the job again?
I think we need to either pin down the issue or resolve it.
Thanks
Vaibhav