One Star

Big files (tFileInputPositional)

Dear Talend Support Team,
We have a huge input file with more than 4 million rows. The file is read by tFileInputPositional, and its data flow is then linked
to a tMap. There are also lookups against database tables, but these tables don't contain many rows. The problem is the
enormous memory consumption. We need a way to keep memory usage moderate. Is there a way to read the huge input file in parts,
process each part, and then read the rest?

Kind regards,
Hilderich
13 REPLIES
Four Stars

Re: Big files (tFileInputPositional)

Hi Hilderich,
To solve the memory problem, you can have tMap store its records on the file system. In any case, when the tFileInput component reads the file, it does not read all the rows at once. It reads the records in chunks and passes them on to tMap. Your tMap component is the one that collects all the records in memory or on the file system, performs the join, and passes the result to the next component. Storing the intermediate records on the file system will help with the memory problem.
This option is available in the property settings in the input section of tMap (the third icon from the left at the top of the input side).
Thanks
Vaibhav
One Star

Re: Big files (tFileInputPositional)

Hello Vaibhav,
Thanks for your answer. I forgot to mention that this option (store temp data to file) is already in use. Unfortunately the memory consumption has not improved.
While the job is running I can see the temp files being written to disk, but memory consumption is still at its maximum. The problem might be the last tMap component before
the data is stored in the database. But this final tMap has no lookup defined, so I cannot store the flow temporarily on disk again. Any other ideas?
Kind regards,
Hilderich
Four Stars

Re: Big files (tFileInputPositional)

Hi,
You can try disabling parts of the job, which will help you find out which component or section of the job is consuming the memory. You can also try breaking the job into smaller subjobs and passing data from parent to child, or using files between the processing steps. Performing all tasks in a single job is not an efficient way to deal with large amounts of data and joins. You could even distribute the join processing over several stages, if possible.
Vaibhav
One Star

Re: Big files (tFileInputPositional)

Are you sure it makes a difference to split it into two different jobs? In the end, the second job still has to process the 4 million rows passed on from the first job.
One Star

Re: Big files (tFileInputPositional)

The bottleneck is the tDenormalize component. Without it, memory consumption does not reach its limit. Any suggestions on how to replace it with a more efficient approach?
By the way: the image attachment function here is broken. I cannot attach any images anymore.
Four Stars

Re: Big files (tFileInputPositional)

Yes, what are you trying to do with tDenormalize?
One Star

Re: Big files (tFileInputPositional)

We need to group the data, but we exclude the field "LKZ" from the grouping. That way we get the values of "LKZ" comma-separated, which is what we want.
All of this is already implemented with tDenormalize in the job above.
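
To make the grouping concrete, here is a minimal Java sketch of the behaviour described above. It is not Talend's actual implementation. The row layout (grouping key at index 0, "LKZ" at index 1) and the class and method names are assumptions made for illustration. It also suggests why memory grows with unsorted input: every group has to stay in the map until the last row has been read.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only, not code generated by Talend.
class InMemoryDenormalize {

    // Groups rows by row[0] and joins all row[1] ("LKZ") values with commas.
    // All groups are kept in memory until the whole input has been consumed.
    static Map<String, String> denormalize(List<String[]> rows) {
        Map<String, StringBuilder> groups = new LinkedHashMap<>();
        for (String[] row : rows) {
            StringBuilder joined = groups.get(row[0]);
            if (joined == null) {
                groups.put(row[0], new StringBuilder(row[1]));
            } else {
                joined.append(',').append(row[1]);
            }
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, StringBuilder> e : groups.entrySet()) {
            result.put(e.getKey(), e.getValue().toString());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"A", "DE"},
                new String[] {"B", "AT"},
                new String[] {"A", "FR"});
        System.out.println(denormalize(rows)); // prints {A=DE,FR, B=AT}
    }
}
```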
Four Stars

Re: Big files (tFileInputPositional)

Just an idea:
You can put a tFilterRow component before tDenormalize and split the rows on a particular key value, chosen so that it does not interfere with the grouping tDenormalize has to do. You can then have two tDenormalize components, one in the main flow and one in the reject flow, thereby dividing the memory usage between two components. You can also use a sort component before tDenormalize to give it sorted data so that it can process the rows more quickly.
One Star

Re: Big files (tFileInputPositional)

Thank you for your help and your suggestions. As far as I know tSortRow is also a memory killer. I could imagine tSortRow in combination with tDenormalize would blow up the memory. :-)
Seven Stars

Re: Big files (tFileInputPositional)

tSortRow can sort on disk under advanced settings. You can then use the tAggregateSortedRow and the list function to denormalize it and reduce the memory consumption.
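
Why this combination can need less memory can be sketched in plain Java. This is an illustration under assumptions, not the code Talend generates: the row layout (grouping key at index 0, "LKZ" at index 1), the class name, and the `sink` callback are all hypothetical. Once the rows arrive sorted by the grouping key, only the current group has to be held in memory, and it can be emitted as soon as the key changes.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.BiConsumer;

// Hypothetical illustration only, not code generated by Talend.
class SortedStreamDenormalize {

    // Expects rows sorted by row[0] (assumed non-null). Emits one
    // (key, comma-separated values) pair per group via the sink.
    static void denormalizeSorted(Iterator<String[]> sortedRows,
                                  BiConsumer<String, String> sink) {
        String currentKey = null;
        StringBuilder values = new StringBuilder();
        while (sortedRows.hasNext()) {
            String[] row = sortedRows.next();
            if (currentKey == null || !currentKey.equals(row[0])) {
                // Key changed: flush the finished group and start a new one.
                if (currentKey != null) {
                    sink.accept(currentKey, values.toString());
                }
                currentKey = row[0];
                values.setLength(0);
            } else {
                values.append(',');
            }
            values.append(row[1]);
        }
        // Flush the last group, if there was any input at all.
        if (currentKey != null) {
            sink.accept(currentKey, values.toString());
        }
    }

    public static void main(String[] args) {
        Iterator<String[]> rows = Arrays.asList(
                new String[] {"A", "DE"},
                new String[] {"A", "FR"},
                new String[] {"B", "AT"}).iterator();
        denormalizeSorted(rows, (key, lkz) -> System.out.println(key + " -> " + lkz));
    }
}
```

The sort itself still has to be done somewhere. In this sketch the input is simply assumed to be sorted already, which corresponds to letting tSortRow do the work on disk.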
One Star

Re: Big files (tFileInputPositional)

Hello rbaldwin,
That sounds good. I am going to try it tomorrow and give you feedback right here. I am going home now.
Kind regards,
Hilderich
Moderator

Re: Big files (tFileInputPositional)

Hi hilderich,

Is there any feedback for your issue?
Best regards
Sabrina
One Star

Re: Big files (tFileInputPositional)

This approach was helpful and it is in use.