Four Stars

Big excel files as input

Hello and greetings from Germany Smiley Wink,

 

this is my routine:

tFileInputExcel -> tMap -> tMysqlOutput

 

My problem is that the Excel file contains 190000 rows.

I read a lot and understand that talend can't process that large Excel files.

I tried to convert it to csv but it's complicated, because there are semicolons as data in columns.

Also my file uses commas as decimal seperators which throws errors by talend (or mysql or both).

 

All in all I need a different solution. Any suggestions?

I read about spliting the Excel in 10000 row parts but I can't reconstruct it without tFileOutputDelimited and this is not working because of the semicolons in columns and my missing knowledge in talend Smiley Frustrated Smiley Very Happy . I need a solution that works fully self-sufficient.

 

Thanks for your support!

 

[Using: Talend DataIntegration 6.4.0]

  • Data Integration
1 ACCEPTED SOLUTION

Accepted Solutions
Ten Stars

Re: Big excel files as input

On the Advanced Settings tab of your tFileInputExcel component, there's a Generation Mode setting with two choices: User mode and Event mode.  Event mode consumes less memory and may allow you to read directly from your Excel file.

6 REPLIES
Nine Stars TRF
Nine Stars

Re: Big excel files as input

The problem is probably not the numer of lines in the input file, but the size of the memory to deal with the content.

Do you have any error when the job runs? If yes, which one?


TRF
Ten Stars

Re: Big excel files as input

On the Advanced Settings tab of your tFileInputExcel component, there's a Generation Mode setting with two choices: User mode and Event mode.  Event mode consumes less memory and may allow you to read directly from your Excel file.

Four Stars

Re: Big excel files as input

@TRF:

Spoiler
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.apache.xmlbeans.impl.store.Cur.createElementXobj(Cur.java:260)
    at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.startElement(Cur.java:2997)
    at org.apache.xmlbeans.impl.store.Locale.loadNode(Locale.java:1420)
    at org.apache.xmlbeans.impl.store.Locale.loadNodeChildren(Locale.java:1403)
    at org.apache.xmlbeans.impl.store.Locale.loadNode(Locale.java:1445)
    at org.apache.xmlbeans.impl.store.Locale.loadNodeChildren(Locale.java:1403)
    at org.apache.xmlbeans.impl.store.Locale.loadNode(Locale.java:1445)
    at org.apache.xmlbeans.impl.store.Locale.loadNodeChildren(Locale.java:1403)
    at org.apache.xmlbeans.impl.store.Locale.loadNode(Locale.java:1445)
    at org.apache.xmlbeans.impl.store.Locale.loadNodeChildren(Locale.java:1403)
    at org.apache.xmlbeans.impl.store.Locale.loadNode(Locale.java:1445)
    at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1385)
    at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1370)
    at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:370)
    at org.apache.poi.POIXMLTypeLoader.parse(POIXMLTypeLoader.java:144)
    at org.openxmlformats.schemas.spreadsheetml.x2006.main.WorksheetDocument$Factory.parse(Unknown Source)
    at org.apache.poi.xssf.usermodel.XSSFSheet.read(XSSFSheet.java:183)
    at org.apache.poi.xssf.usermodel.XSSFSheet.onDocumentRead(XSSFSheet.java:175)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.parseSheet(XSSFWorkbook.java:438)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.onDocumentRead(XSSFWorkbook.java:403)
    at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:190)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:266)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:336)
    at local_project.wwi_excel_0_1.wwi_excel.tFileInputExcel_1Process(wwi_excel.java:1313)
    at local_project.wwi_excel_0_1.wwi_excel.runJobInTOS(wwi_excel.java:2485)
    at local_project.wwi_excel_0_1.wwi_excel.main(wwi_excel.java:2334)

@cterenziI think it worked!!! Wow thank you Heart I will check the data and reply tomorrow!

Four Stars

Re: Big excel files as input

@cterenziIt works like a charm (less than 1 minute for 200000 rows).

 

But I get another error and maybe you could help me again? Smiley Happy Or should I start a new topic?

 

I get this error:

Spoiler
Exception in component tFileInputExcel_2 (wwi_excel)
java.lang.RuntimeException: The cell format is not Date in ( Row. 182369 and ColumnNum. 7 )
    at local_project.wwi_excel_0_1.wwi_excel.tFileInputExcel_2Process(wwi_excel.java:1379)
    at local_project.wwi_excel_0_1.wwi_excel.runJobInTOS(wwi_excel.java:2272)
    at local_project.wwi_excel_0_1.wwi_excel.main(wwi_excel.java:2121)

It's very weird because all other rows are loaded correctly and this field has the same formatation. (see screenshot attachment)

Little be sad that the whole file has only 182396 rows (27 more) Smiley Frustrated

Ten Stars

Re: Big excel files as input

Cell appearance and cell format are different things in Excel. If you right-click on that date and choose Format cells... it will likely be formatted General or Text instead of Date.

Also, I'm not sure whether that row number means the actual Excel row or the row number after excluding headers. You may want to check both places.
Moderator

Re: Big excel files as input

But I get another error and maybe you could help me again? Smiley Happy Or should I start a new topic?

 I get this error:

Spoiler
Exception in component tFileInputExcel_2 (wwi_excel)
java.lang.RuntimeException: The cell format is not Date in ( Row. 182369 and ColumnNum. 7 )
    at local_project.wwi_excel_0_1.wwi_excel.tFileInputExcel_2Process(wwi_excel.java:1379)
    at local_project.wwi_excel_0_1.wwi_excel.runJobInTOS(wwi_excel.java:2272)
    at local_project.wwi_excel_0_1.wwi_excel.main(wwi_excel.java:2121)

It's very weird because all other rows are loaded correctly and this field has the same formatation. (see screenshot attachment)

Little be sad that the whole file has only 182396 rows (27 more) Smiley Frustrated

 
 
Tags (0)
 


Hello hw 123,

This topic has been set as resolved. Could you please create a new topic? Which will

 get the best support from the community. Many thanks

Best regards

Sabrina

--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.