Again, I REALLY need your help
I'm having the error "Exception in thread "main" java.lang.OutOfMemoryError: Java heap space".
My OS is Windows 7 Professional Service Pack 1, 64-bit, and I have 8 GB of RAM. I'm using Talend Open Studio for Data Integration, version 7.0.1.
Right now, I'm trying to read an Excel file with approx. 1 million rows and 34 columns, full of data, using tFileInputExcel and a tLogRow, but my job only reads the first row (header) and then I get the error.
If I can read all the data (I hope you can help me with this), I'll process the information with components such as tMap, tAggregateRow, tPivotToColumnsDelimited, tFilterRow, tHashInput and tHashOutput, sending the entire results to a tFileOutputExcel & tFileOutputDelimited.
My advanced settings are:
Any suggestions? If somebody thinks it would help for me to send the file I'm trying to read (to check whether the problem is my computer), just tell me (it's an Excel .xlsx file, 153 MB).
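Since the error is about Java heap space, it can help to confirm which heap limit the JVM really runs with after changing a setting such as -Xmx. The following is only a minimal sketch in plain Java, not something Talend generates; the class and method names are made up for illustration. The same call could be pasted into a tJava component to print the value at the start of a job.

```java
public class HeapCheck {
    /** Returns the maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // If this prints far less than expected, the -Xmx value is probably not being applied.
        System.out.println("Max heap (MB): " + maxHeapMb());
    }
}
```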
You can reduce the memory required by replacing the tHash components with files.
You can also store temporary data required by tMap components on disk.
To do this, click the third icon in the upper left corner of the tMap, then set the "Temp data directory path" and the buffer size.
The data stored in the tHash components isn't large. Also, since that data is produced by the same job and then used as input, Talend apparently forces me to create metadata to reprocess it with a tMap (and this could be a problem when I run the .bat file on another PC).
On the other hand, I tried to reduce the use of memory with your suggestion but I get the same error:
My job should look like this, and I'm pretty sure that my problem is reading the Excel file with the source data (because it works just fine with a small amount of data in the Excel file; by the way, @vapukov, I was using the tLogRow just to try the reading part, but you are right, it's not the best idea).
Please tell me that you have any other idea.
I don't know if this is crazy, but maybe with Talend I can split the file and then read the data separately? Or what do you think I should try?
Thank you in advance!
Change all sets of tAggregateRow into a tSortRow (sort by all group by criteria) and tAggregateSortedRow. Ensure you set the Use disk option on all components where possible (giving a more sensible buffer size of 100,000), including tMaps.
Also consider splitting it into 2 subjobs around the tFileOutputDelimited_3 (make the lookup a tFileInputDelimited of what tFileOutputDelimited_3 has just output).
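To illustrate why sorting first and then aggregating the sorted rows needs less memory: once rows arrive ordered by the group-by key, only the group currently being read has to be kept, instead of a map of every group. This is a rough sketch in plain Java, not Talend's generated code; the class name, method name, and the {key, value} row layout are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortedAggregateSketch {
    /**
     * Sums values per key for rows that are ALREADY sorted by key.
     * Each row is {key, value}. Only one running group is held at a time.
     */
    static List<String> sumSorted(List<String[]> sortedRows) {
        // Results are collected here for the demo; a real flow would pass
        // each finished group straight to the next component.
        List<String> out = new ArrayList<>();
        String currentKey = null;
        long sum = 0;
        for (String[] row : sortedRows) {
            // A change of key means the previous group is complete.
            if (currentKey != null && !currentKey.equals(row[0])) {
                out.add(currentKey + "=" + sum);
                sum = 0;
            }
            currentKey = row[0];
            sum += Long.parseLong(row[1]);
        }
        // Emit the last group, if there was any input at all.
        if (currentKey != null) {
            out.add(currentKey + "=" + sum);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"A", "1"},
            new String[] {"A", "2"},
            new String[] {"B", "5"});
        System.out.println(sumSorted(rows)); // prints [A=3, B=5]
    }
}
```

The sort itself still has to handle all the rows, which is why the "Use disk" option on the sort step matters for a file of this size.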
Hi @david_beaty . Thanks for your answer. I'm sorry to bother you, but before making the changes you suggest, I wanted to tell you that even without those components (tMap or tAggregateRow) the job generates the error. I've even tried to just read the Excel file, filter the columns I need (with tFilterColumns) and then filter the rows I need (with tFilterRow) to save this data to a new Excel file (for example), and the error also appeared. With this context, do you still think that I should replace the tAggregateRow with the components you mention? Thank you!
I am really grateful for your answer because so far it is the only way that the memory error has not appeared and the job has read the data from the source excel. I had not tried that alternative (because I did not know it existed), I did it and the reading part worked.
However, a new problem has appeared: the source file (Excel .xlsx) has a date column in the format "14-10-2017 01:42:12" (dd-MM-yyyy hh:mm:ss). When I select Event Mode in the tFileInputExcel, the job generates an error saying that this data is not a date and forces me to change the column to String (so the date becomes something like "05:15.3"). This column is very important for the calculations I must do in the job, because after some filters I have to use the date (the time is irrelevant) to calculate statistical information such as frequency and repetition. Is there any way, when using Event Mode, for the date column to remain a Date type? Thank you!
I am not able to replicate this. I see the same output as in Excel. It also reads the date with pattern "dd-MM-yyyy hh:mm:ss" as a string.
Can you attach one sample input?
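If the column does arrive as a String that looks the way it does in Excel, one possible workaround is to convert it to a Date in a later step (for example in a tJavaRow, or in a tMap expression, where I believe the built-in TalendDate.parseDate(pattern, text) routine does the equivalent). Below is a minimal sketch in plain Java, assuming the text has the form "14-10-2017 01:42:12"; the class and method names are hypothetical. Note that in Java date patterns "HH" is the 24-hour clock and "hh" the 12-hour clock, while "MM" is the month and "mm" the minutes.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class EventTimeParseSketch {
    /** Parses text such as "14-10-2017 01:42:12" into a Date (JVM default time zone). */
    static Date parseEventTime(String text) {
        SimpleDateFormat fmt = new SimpleDateFormat("dd-MM-yyyy HH:mm:ss");
        fmt.setLenient(false); // reject values such as month 13 instead of rolling them over
        try {
            return fmt.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a dd-MM-yyyy HH:mm:ss value: " + text, e);
        }
    }

    /** Formats only the date part, since the time is irrelevant for the statistics. */
    static String dateOnly(Date d) {
        return new SimpleDateFormat("yyyy-MM-dd").format(d);
    }

    public static void main(String[] args) {
        Date d = parseEventTime("14-10-2017 01:42:12");
        System.out.println(dateOnly(d));
    }
}
```

This only helps if the String still contains the full date; a value like "05:15.3" has already lost it, so the cell format in the file would have to be fixed first.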
I have attached a very small sample of the input file and how it looks once it passes through column and row filters (which I require in the job). The interesting thing as I mentioned before is the change of the content in the column 'EventTime' that corresponds to a date (select one of the cells in the input file so you can see how it is inside, not only in the preliminary view).
Let me know if you can notice in the output file what I mean about the change when I have to configure the column as a string.
It is because of the custom formatting of that Excel column. By any chance, could you request that the custom format in the INPUT file be changed to "dd/MM/yyyy hh:mm:ss"?
I had already tried, but I get this error:
Using this schema (it's important that you know that using the User Mode, I had not had any error with this schema):
And previously setting this format on that column in the Excel file:
What do you think?
@akumar2301 I just did it (date pattern in Talend like "dd/MM/yyyy HH:mm"; I tried it with quotation marks and without quotation marks, with the month and the hour in uppercase and in lowercase) and I'm getting the same error.
I know almost nothing about Java, but could it be that the JRE installed in Talend says it is 1.8.0_171 and in Java I have 1.8.0_191? And if it is this, what should I change?
@MayTorres I'm attaching a simple job which reads your sample input Excel file in String and Date format (after the custom format change).
Well @akumar2301, this is embarrassing because your help has been amazing, but I tried your job with the file that I'm attaching here (after the custom format change) and I still get the error. I really do not know what I could be doing wrong.
Update: I just used the excel file 'Input' that you loaded with the job, and the same error comes out. I think definitely the problem is not the file but some unwanted behavior between Talend and Java when using the Event Mode.
Open the job in Studio and click the "Code" tab at the bottom of the screen. You should see a section highlighted in red, and the slider bar on the right-hand side will show you where the problem is.