tStandardizeRow Usage?

One Star

tStandardizeRow Usage?

Hello, I have a file that is not delimited and I would like to parse it.
Would it be possible to split my file (according to field lengths) by using a RegEx?
For example, I want to say:
the 1st field is from character 1 to 7, the 2nd is from 8 to 12...
Is that possible? Where can I configure it? (Something like the sketch below is what I have in mind.)
Thank you in advance.
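A minimal Java sketch of that idea, one capturing group per fixed-width field (the sample line and the widths 7 and 5 are only examples):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FixedWidthRegex {
    public static void main(String[] args) {
        // One capturing group per field: chars 1-7, then chars 8-12.
        Pattern rowPattern = Pattern.compile("^(.{7})(.{5})");
        Matcher m = rowPattern.matcher("ABC1234XY890rest-of-line");
        if (m.find()) {
            System.out.println("field1=" + m.group(1)); // ABC1234
            System.out.println("field2=" + m.group(2)); // XY890
        }
    }
}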
Moderator

Re: tStandardizeRow Usage?

Hi,
Regarding your previous post, it seems you have to use a MapReduce job.
If so, note that TalendHelpCenter:tFileInputRegex is not supported for MapReduce yet.
Here is a solution for your use case: put your file into Hadoop first, then use tHDFSInput ---> tMap (or tHDFSInput ---> tJavaMR).
Best regards
Sabrina
One Star

Re: tStandardizeRow Usage?

Hi Sabrina,
Thank you for your attention.
So, I will use tHDFSInput (with a single-column schema, raw string) -> a tJavaMR (with my real CSV columns) -> tLogRow.

Do you see anything wrong with that? (A rough sketch of the splitting logic follows.)
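The logic inside the tJavaMR step would be roughly equivalent to a hand-written Hadoop mapper like this (a sketch only: tJavaMR generates its own scaffolding, and the field offsets and names below are assumed examples):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Cut each fixed-width line into fields and emit them.
public class FixedWidthMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        if (line.length() < 12) {
            return; // skip short or malformed rows
        }
        String field1 = line.substring(0, 7).trim();   // chars 1-7
        String field2 = line.substring(7, 12).trim();  // chars 8-12
        context.write(new Text(field1), new Text(field2));
    }
}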
One Star

Re: tStandardizeRow Usage?

Finally,
I used a tHDFSInput followed by a tMap.
The tMap does a substring on the input rows.
Do you think this is a good solution?
I am working with a very big file (90 GB).
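For reference, the tMap output expressions for this approach are plain Java and would look something like this (a sketch; the flow and column names row1/line are assumptions, and the offsets are the example widths from the first post):

// One expression per fixed-width output column:
row1.line.substring(0, 7).trim()    // 1st field, chars 1-7
row1.line.substring(7, 12).trim()   // 2nd field, chars 8-12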

Best regards
Moderator

Re: tStandardizeRow Usage?

Hi,
In case any memory issue is caused by the big file in your job, please take a look at the online KB article TalendHelpCenter:ExceptionoutOfMemory.
Best regards
Sabrina
One Star

Re: tStandardizeRow Usage?

Thank you, Sabrina.
Can you confirm one last thing for me?
MapReduce jobs run on my cluster, don't they?

So the memory exception should happen because of the tLogRow? If I insert the data directly into a database, it shouldn't happen, should it?

Thank you very much for your help, Sabrina.
Moderator

Re: tStandardizeRow Usage?

Hi,
The tMap component is a cache component and can consume too much memory; you'd better store its temp data on disk.

"If I insert the data directly into a database, it shouldn't happen, should it?"

That depends on your input data and your job design.
There are several possible reasons for an OutOfMemory Java exception to occur. The most common ones include:
1. Running a Job which contains a number of buffer components such as tSortRow, tFilterRow, tMap, tAggregateRow, or tHashOutput.
2. Running a Job which processes a very large amount of data.
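If the exception occurs in the job's own JVM rather than on the cluster, a common mitigation (beyond storing tMap temp data on disk) is raising the JVM memory allocation, for example via the JVM arguments in the Studio's Run view (the values below are illustrative and depend on available RAM):

-Xms1024M
-Xmx4096M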

Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
