split a file containing a big string by fixed length.

Hi,
I have a file containing rows without any delimiter, but the rows are of fixed length. How can I do this job in Talend?

If the file contains:
abcdefghijklmno

I need the output below if the column length is 3:
abc
def
ghi
jkl
mno

Can we do this in Talend (Java)?
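For reference, here is the splitting logic itself as a minimal plain-Java sketch, outside Talend. The class and method names (`FixedLengthSplit`, `chunk`) are hypothetical, and the sketch assumes the whole string fits in memory.

```java
import java.util.ArrayList;
import java.util.List;

public class FixedLengthSplit {
    // Splits s into consecutive pieces of n characters; the last piece may be shorter
    static List<String> chunk(String s, int n) {
        if (n <= 0) throw new IllegalArgumentException("n must be positive");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < s.length(); i += n) {
            out.add(s.substring(i, Math.min(i + n, s.length())));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints abc, def, ghi, jkl, mno on separate lines
        for (String piece : chunk("abcdefghijklmno", 3)) {
            System.out.println(piece);
        }
    }
}
```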

Re: split a file containing a big string by fixed length.

I think you can use the tFileInputPositional or tFileInputMSPositional component.

Re: split a file containing a big string by fixed length.

Hi Neth,
I have a 300 MB file that is one continuous string. I want to split it into records of a constant size, say 35 characters. I don't know how many records the file contains, so I don't think tFileInputPositional or tFileInputMSPositional will help here. Any other ideas?
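One way to avoid both the memory problem and the unknown record count is to stream the file and hold only one record at a time. The following is a minimal standalone-Java sketch (not a Talend component), assuming record length is measured in characters, the file contains no line breaks of its own, and it is UTF-8 encoded; the class name `StreamingRecordSplitter` is hypothetical.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class StreamingRecordSplitter {
    // Copies in to out, writing a line break after every recordLength characters.
    // Only one record is held in memory, so the file size does not matter, and the
    // number of records need not be known in advance: the loop stops at end of input.
    // Returns the number of records written (the last one may be shorter).
    static long split(Reader in, Writer out, int recordLength) throws IOException {
        char[] record = new char[recordLength];
        long count = 0;
        while (true) {
            int filled = 0;
            // read() may return fewer characters than requested, so keep reading
            // until the record is full or the input is exhausted
            while (filled < recordLength) {
                int r = in.read(record, filled, recordLength - filled);
                if (r == -1) break;
                filled += r;
            }
            if (filled == 0) break;            // clean end of input
            out.write(record, 0, filled);
            out.write('\n');
            count++;
            if (filled < recordLength) break;  // short final record
        }
        out.flush();
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.err.println("usage: java StreamingRecordSplitter <in> <out> <recordLength>");
            return;
        }
        int len = Integer.parseInt(args[2]);
        // Assumption: UTF-8 input and output; change the charset to match the real file
        try (Reader in = new BufferedReader(new InputStreamReader(
                     new FileInputStream(args[0]), StandardCharsets.UTF_8));
             Writer out = new BufferedWriter(new OutputStreamWriter(
                     new FileOutputStream(args[1]), StandardCharsets.UTF_8))) {
            System.out.println(split(in, out, len) + " records written");
        }
    }
}
```

For the case above this would be run as, for example, `java StreamingRecordSplitter big.txt records.txt 35` (file names hypothetical); the resulting file has one record per line and could then be read by an ordinary delimited or positional input component.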

Re: split a file containing a big string by fixed length.

How many fields do you have?

Re: split a file containing a big string by fixed length.

When handling large database-type files, it is sometimes necessary to split the file into "records" of known length, because the file has been output without any delimiters/separators between records.
Is there any feature in Talend that allows a user-specified string to be inserted at a constant, user-specified increment in the file, from some start point in the file to some end point?
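Independently of whether Talend offers such a feature, the operation itself is small in plain Java. The following is a minimal sketch under stated assumptions: offsets are 0-based character positions (not bytes), a separator is inserted after every `every` characters counted from `start`, and only at positions strictly before `end`, so no separator is appended at the very end. The names `PeriodicInserter` and `insertEvery` are hypothetical.

```java
import java.io.*;

public class PeriodicInserter {
    // Copies in to out, inserting sep after every `every` characters, but only
    // inside the character range (start, end); text outside it is copied unchanged.
    // Reads one character at a time, so memory use does not depend on file size
    // (wrap file streams in BufferedReader/BufferedWriter for speed).
    static void insertEvery(Reader in, Writer out, String sep, int every, long start, long end)
            throws IOException {
        if (every <= 0) throw new IllegalArgumentException("every must be positive");
        long pos = 0;  // number of characters copied so far
        int c;
        while ((c = in.read()) != -1) {
            out.write(c);
            pos++;
            // Insert after each full increment counted from start, strictly before end
            if (pos > start && pos < end && (pos - start) % every == 0) {
                out.write(sep);
            }
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        insertEvery(new StringReader("abcdefghijklmno"), out, "|", 3, 0, 15);
        System.out.println(out);  // abc|def|ghi|jkl|mno
    }
}
```

Passing `"\n"` as the separator turns this into the record splitter asked about in the original question; restricting `start`/`end` leaves a header or trailer untouched.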

Re: split a file containing a big string by fixed length.

Try this code in a tJavaRow, which breaks a string into sets of no more than 66 characters, preserving whole words, delimiting each set by "~|~" and including a set number delimited by ":::".
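Integer LineNumber = 1;
String DelimitedLine = "";
String RemainingLine = input_row.InputLine;
Integer EndOfLineIndex;
while (RemainingLine.length() > 66) {
    // Last space at or before index 66, so each set is at most 66 characters
    EndOfLineIndex = RemainingLine.lastIndexOf(" ", 66);
    if (EndOfLineIndex <= 0) {
        // No usable space (e.g. one very long word): hard-split at 66 characters instead of throwing
        DelimitedLine = DelimitedLine + "~|~" + LineNumber + ":::" + RemainingLine.substring(0, 66);
        RemainingLine = RemainingLine.substring(66);
    } else {
        DelimitedLine = DelimitedLine + "~|~" + LineNumber + ":::" + RemainingLine.substring(0, EndOfLineIndex);
        RemainingLine = RemainingLine.substring(EndOfLineIndex + 1); // skip the space itself
    }
    LineNumber = LineNumber + 1;
}
// Append the final set and drop the leading "~|~"
output_row.OutputLine = (DelimitedLine + "~|~" + LineNumber + ":::" + RemainingLine).substring(3);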

You can then follow the tJavaRow with a tNormalize to separate the sets into rows (don't forget to escape the item separator, since it is a regular expression, i.e. use "~\\|~"), and a tExtractDelimitedFields to separate the set numbers from the actual input subsets.
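The same two steps can be checked in plain Java, which also shows why the escaping matters: `String.split` takes a regular expression, and without the backslashes `~|~` means "`~` or `~`", so the line would be split at every tilde. This is a minimal sketch of what the tNormalize and tExtractDelimitedFields steps are meant to produce, not how those components are implemented; the names `SplitSets` and `parse` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SplitSets {
    // Splits a line like "1:::text~|~2:::more" into set number -> text,
    // keeping the sets in their original order
    static Map<Integer, String> parse(String outputLine) {
        Map<Integer, String> sets = new LinkedHashMap<>();
        // The pipe must be escaped, exactly as in the tNormalize item separator
        for (String set : outputLine.split("~\\|~")) {
            // Separate the set number from the text (the tExtractDelimitedFields step)
            String[] parts = set.split(":::", 2);
            sets.put(Integer.parseInt(parts[0]), parts[1]);
        }
        return sets;
    }

    public static void main(String[] args) {
        // Prints "1 -> first set of words", "2 -> second set", "3 -> last"
        parse("1:::first set of words~|~2:::second set~|~3:::last")
                .forEach((n, text) -> System.out.println(n + " -> " + text));
    }
}
```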

Re: split a file containing a big string by fixed length.

Hi alevy,
Can you provide an example of this? I am reading the data from a file and I always get an "Out of Memory" exception. Could you provide a job that does this?

Re: split a file containing a big string by fixed length.

The example I provided works with strings of a "reasonable" length, essentially assuming that your input still has some sort of row/record delimiter and that you just need to break the strings down into smaller chunks. If your entire file is one string, then I'm not surprised you get an out-of-memory error.
Other than increasing the memory allocated to the run-time environment (see lots of other posts about this), you might have to write your own code to handle the file.
Sorry I can't suggest anything else.

Re: split a file containing a big string by fixed length.

Hi alevy,
Thanks for the reply. As my files are over 10 GB, I am unable to handle them in Talend because of the out-of-memory problem. I am able to get the first record of the desired length, but after that the rest of the file, minus the extracted string, has to be written to another file. That is where the out-of-memory error occurs, because in my case about 20 GB of data has to be held.
For now I am using the UltraEdit tool to split the DB file into records, since it provides a dedicated function for that. I hope the same functionality will be provided in Talend soon.

Re: split a file containing a big string by fixed length.

Hello, may I bump this topic? Is there any solution to the original problem? I have the same task: a big file consisting of one chunk of data that has to be divided into defined rows. There is no EOL delimiter; a row is defined only by its number of characters.