Process a text file to pull multiple lines putting them into one row.

One Star


I have a text file that describes one row across multiple lines. The columns are delimited by tabs. Each line ends with a newline character. Data elements vary in length, and the lines do not all contain the same number of data elements. A row is always made up of three lines.
The file takes the following form:
1. data data data data data
2. datadata data datadata
3. data data
4. data data data data data
5. datadata data datadata
6. data data
7. data data data data data
8. datadata data datadata
9. data data
.
.
.
Row one is made up of lines 1, 2 and 3 above.
Row two is made up of lines 4, 5 and 6 above.
Each following row follows the same pattern.
How can I process the file so that it reads the input in three-line chunks and writes each chunk out as one row?
Thanks,
Justin
Employee

Re: Process a text file to pull multiple lines putting them into one row.

Are you sure line 3 belongs to row 1 and row 2 at the same time?
I advise you to take a look at 1337 which should provide a way to solve your problem.
One Star

Re: Process a text file to pull multiple lines putting them into one row.

You are right. Thanks for that. Each line is only used once. I will take a look at the link you provided. thanks
One Star

Re: Process a text file to pull multiple lines putting them into one row.

I took a look at the link you provided. I am not sure how to apply that to my issue.
Is there an easy way to pull input in multi-line chunks and then treat each chunk as a row? Right now I am using the tFileInputDelimited component. The issue I am having is that I cannot delimit on the true end of "my" row, since each line ends with a newline character. Is there a component that will allow me to delimit on line count or something like that?
-Justin
One Star

Re: Process a text file to pull multiple lines putting them into one row.

I have the same need, any advice will be highly appreciated.
thanks.
One Star

Re: Process a text file to pull multiple lines putting them into one row.

When this question was originally raised, I tried to use a tFileInputRegex with Regex = ^(.+\n.+\n.+\n)$ and Row Separator = '' but could not get it to work. Unfortunately, I gave up after that (sorry mrbaggio).
The same expression does work in the RegExp view - see my screenshot below. Therefore I can only conclude that this is a bug. Can someone from the Talend team confirm please?
Cheers,
c0utta
One Star

Re: Process a text file to pull multiple lines putting them into one row.

I found a way to get it to work using a row generator. The idea was to add a number to each "line" then later group on that line number and strip off that extra number in the end.
It would look like this:
1 data data data data data
1 datadata data datadata
1 data data
2 data data data data data
2 datadata data datadata
2 data data
3 data data data data data
3 datadata data datadata
3 data data

If you need more info let me know.
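
The grouping step described above can be sketched in plain Java, outside Talend. This is only an illustration under stated assumptions: the class and method names are hypothetical, the helper number is assumed to be separated from the line by a single space as in the listing, and the three parts of a row are joined with a tab.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a Talend component.
final class TaggedLineGrouper {
    // Groups lines of the form "<groupNumber> <payload>" by group number,
    // joins the payloads of each group with a tab, and drops the helper number.
    // Assumes every line contains at least one space after the number.
    static List<String> group(List<String> taggedLines) {
        Map<String, StringBuilder> groups = new LinkedHashMap<>();
        for (String line : taggedLines) {
            int sep = line.indexOf(' ');
            String key = line.substring(0, sep);
            String payload = line.substring(sep + 1);
            StringBuilder row = groups.get(key);
            if (row == null) {
                groups.put(key, new StringBuilder(payload));
            } else {
                row.append('\t').append(payload);
            }
        }
        List<String> rows = new ArrayList<>();
        for (StringBuilder row : groups.values()) {
            rows.add(row.toString());
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> tagged = Arrays.asList(
            "1 data data data data data",
            "1 datadata data datadata",
            "1 data data",
            "2 data data data data data",
            "2 datadata data datadata",
            "2 data data");
        for (String row : group(tagged)) {
            System.out.println(row);
        }
    }
}
```

A LinkedHashMap is used so that the rows come out in the order in which their group numbers first appear.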
One Star

Re: Process a text file to pull multiple lines putting them into one row.

It seems that RegExp doesn't work. I tried several other expressions and I am definitely giving up on this approach.
Unfortunately, I cannot add a line number, as the files come from outside the company => bank files ;-)
Bank files often contain multi-line rows, so I think this is a basic need, and the Talend team surely knows how to make it work, don't they?
Thanks for your help.
Regards
One Star

Re: Process a text file to pull multiple lines putting them into one row.

I was not able to modify the original file either. I did not have control of the input, but added the number as I processed the file. I just created a temp file with the extra line numbers.
One Star

Re: Process a text file to pull multiple lines putting them into one row.

Sorry, but I'm a newbie. Could you please describe the steps?
thanks a lot.
One Star

Re: Process a text file to pull multiple lines putting them into one row.

Hi

Using a tRowGenerator is an excellent idea as well. As per the Use Case, you could also use int(sequence("line",0,1)/3) to achieve the same result.
I would still like one of the RegExp "gurus" from Talend to answer my question about the tFileInputRegex and why it won't handle multiple lines.
Cheers,
c0utta
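
To make the arithmetic behind int(sequence("line",0,1)/3) concrete, here is a small Java sketch. The class is a hypothetical stand-in for a sequence that starts at 0 and steps by 1; it is not Talend's own sequence routine. In Java, dividing one int by another already truncates, so no explicit int() is needed.

```java
// Hypothetical stand-in for a sequence starting at 0 with step 1.
final class GroupKeyDemo {
    private int next = 0;

    // Returns the group key for the next line.
    // With linesPerRow = 3 the keys are 0,0,0,1,1,1,2,2,2,...
    int nextGroupKey(int linesPerRow) {
        return next++ / linesPerRow; // integer division truncates
    }

    public static void main(String[] args) {
        GroupKeyDemo seq = new GroupKeyDemo();
        for (int line = 1; line <= 9; line++) {
            System.out.println("line " + line + " -> group " + seq.nextGroupKey(3));
        }
    }
}
```

Starting the sequence at 0 matters: starting at 1 would put only two lines in the first group.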
One Star

Re: Process a text file to pull multiple lines putting them into one row.

Hi c0utta,
concerning your regex question:
If you work on Windows you have to take the newline encoding into account.
In this case try your expression again but use \r\n instead of \n. It works for me (Windows, TOS 2.3.0).
By the way: the RegExp view is based on EPIC, and its behaviour is more like Perl than Java. I'm not sure, but it may even be based on a Perl interpreter.
Bye
Volker
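
The newline point can be checked outside Talend with plain java.util.regex. The sketch below is an assumption-laden illustration, not the way tFileInputRegex works internally: it applies a three-line pattern to the whole file content held in one string, accepts either \r\n or \n as the line break, and joins the three captured lines with a tab. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a Talend component.
final class ThreeLineRegex {
    // Three non-empty lines; each line break may be \r\n (Windows) or \n (Unix).
    // The last line break is optional so that a file without a trailing
    // newline still matches. A bare \r as line break is not covered.
    private static final Pattern CHUNK =
        Pattern.compile("(.+)\\r?\\n(.+)\\r?\\n(.+)(?:\\r?\\n|$)");

    // Returns one tab-joined row per three-line chunk found in content.
    static List<String> rows(String content) {
        List<String> rows = new ArrayList<>();
        Matcher m = CHUNK.matcher(content);
        while (m.find()) {
            rows.add(m.group(1) + "\t" + m.group(2) + "\t" + m.group(3));
        }
        return rows;
    }

    public static void main(String[] args) {
        String windowsStyle = "1. data\r\n2. datadata\r\n3. data\r\n"
                            + "4. data\r\n5. datadata\r\n6. data\r\n";
        for (String row : rows(windowsStyle)) {
            System.out.println(row);
        }
    }
}
```

By default the dot in a Java regex does not match line terminators, so each (.+) stays within a single line.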
One Star

Re: Process a text file to pull multiple lines putting them into one row.

Hi Volker,
I spent a couple of hours trying (\r\n, \r or \n) to get the regexp working, but it still doesn't work. Would you mind posting some screenshots?
I also tried adding a line number, but I didn't find a way to give two lines the same number.
Thanks for your time.
Is a Talend guru there? Heeeeeeeeeelp ;-)
One Star

Re: Process a text file to pull multiple lines putting them into one row.

Hello everyone,
here is a solution for Justin's original problem:
1. Define two context variables, actualLine and actualLineCounter (default for actualLineCounter = 0).
2. Read each line as a whole with tFileInputRegex.
3. Concatenate the lines in tJavaRow.
4. Take out the empty values with tFilterRow.
This is the code of tJavaRow:
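// Count the incoming line and append it to the row being built.
context.actualLineCounter++;
context.actualLine = context.actualLine + input_row.line;
if (context.actualLineCounter == 3) {
    // Third line: emit the merged row and reset the state.
    output_row.line = context.actualLine;
    context.actualLine = "";
    context.actualLineCounter = 0;
} else {
    // Row not complete yet: emit an empty value to be filtered out later.
    output_row.line = "";
}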

I hope this is the needed solution.
Result of the job:
.--------------------------------------------------------------------.
| tLogRow_1 |
|=------------------------------------------------------------------=|
|line |
|=------------------------------------------------------------------=|
|null1. data data data data data2. datadata data datadata3. data data|
|4. data data data data data5. datadata data datadata6. data data |
|7. data data data data data8. datadata data datadata9. data data |
'--------------------------------------------------------------------'
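
For anyone who wants to try the tJavaRow logic above outside Talend, here is a minimal standalone Java sketch. The class is hypothetical and its two fields merely stand in for the context variables. They start from "" and 0; starting actualLine from "" rather than null avoids the literal "null" prefix visible in the first row of the output above, because Java string concatenation turns a null reference into the text "null".

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the tJavaRow step; not Talend-generated code.
final class ThreeLineAccumulator {
    // Stand-ins for context.actualLine and context.actualLineCounter.
    private String actualLine = "";
    private int actualLineCounter = 0;

    // Returns the merged row on every third call, otherwise null.
    String accept(String line) {
        actualLineCounter++;
        actualLine = actualLine + line;
        if (actualLineCounter == 3) {
            String row = actualLine;
            actualLine = "";
            actualLineCounter = 0;
            return row;
        }
        return null;
    }

    public static void main(String[] args) {
        ThreeLineAccumulator acc = new ThreeLineAccumulator();
        List<String> input = Arrays.asList(
            "1. data data data data data", "2. datadata data datadata", "3. data data",
            "4. data data data data data", "5. datadata data datadata", "6. data data");
        for (String line : input) {
            String row = acc.accept(line);
            if (row != null) {
                System.out.println(row);
            }
        }
    }
}
```

As in the tJavaRow code, the lines are concatenated without a separator; appending a tab between them would keep the columns of the three lines apart.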

@a_vaio: Will this help you too?
If I find some time, I will make a feature request for a new component...
Bye
Volker
One Star

Re: Process a text file to pull multiple lines putting them into one row.

Hi volker,
I've tried the tJavaRow, but as I'm using TOS 2.2.4 I have some issues with using context variables.
However, your idea is helpful and I'm still working on it.
Will keep you updated.
Cheers.
One Star

Re: Process a text file to pull multiple lines putting them into one row.

Hello a_vaio,
alternatively, the same should work with tJavaFlex by defining the variables in the start code section.
Bye
Volker
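
As a rough illustration of that idea, the plain Java sketch below mirrors the three code sections of a tJavaFlex (start, main, end) with comments. It is not Talend-generated code: the class name, the method and the parameter n (lines per row) are hypothetical, and the lines are joined with a tab.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the start/main/end structure; not a Talend component.
final class FlexStyleMerge {
    static List<String> merge(List<String> lines, int n) {
        List<String> rows = new ArrayList<>();

        // "Start code": runs once, before the first row.
        StringBuilder buffer = new StringBuilder();
        int count = 0;

        // "Main code": runs for every incoming row.
        for (String line : lines) {
            if (count > 0) {
                buffer.append('\t');
            }
            buffer.append(line);
            count++;
            if (count == n) {
                rows.add(buffer.toString());
                buffer.setLength(0);
                count = 0;
            }
        }

        // "End code": runs once, after the last row.
        if (count > 0) {
            rows.add(buffer.toString()); // incomplete trailing chunk, kept as-is
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("l1", "l2", "l3", "l4", "l5", "l6");
        for (String row : merge(input, 3)) {
            System.out.println(row);
        }
    }
}
```

Because the buffer and the counter are declared before the loop, their values survive from one row to the next without any context variable.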
One Star

Re: Process a text file to pull multiple lines putting them into one row.

Volker,
I'd suggest that you create a Use Case for your response, as tJavaRow/tPerlRow is very flexible and can be used to solve many issues. I'm sure this would help many new users. The Talend team will need to set you up so you can contribute to the wiki.
A second point: I concur with a_vaio in that I cannot get the RegExp to work either with \r\n, \r or \n using either Perl or Java jobs. Therefore I assume this is related to the encoding on the file that I'm using since you are able to get it to work. I am also using TOS 2.3.0.r8623, so I'll contact you offline to discuss.
Cheers,
c0utta
One Star

Re: Process a text file to pull multiple lines putting them into one row.

Hello c0utta,
I agree with you that this information (and much more) would be very helpful. About half the questions are related to the same things...
This week I even thought about compiling a list of "TOS patterns".
I registered on the wiki some time ago, but I could not change or add pages.
Bye
Volker
Employee

Re: Process a text file to pull multiple lines putting them into one row.

Volker,
You should now be able to edit Wiki Use case pages.
Regards,
One Star

Re: Process a text file to pull multiple lines putting them into one row.

Thanks, I'll give it a try on one of the next nights.
One Star

Re: Process a text file to pull multiple lines putting them into one row.

I tried to optimize my example to merge n rows into one. For this I tried to use tJavaFlex with different input and output columns. In this case I get a compile error because the component generates (by default) mappings between input and output. Is it possible to change this behavior?
I need a variable that is initialized once, that I can add to or change on each row, and whose value is not lost between rows. Also, using a context variable is not a good solution. Using tJava does not work because the variable would be local to that component and can't be used in tJavaRow.
Bye
Volker
Employee

Re: Process a text file to pull multiple lines putting them into one row.

Here comes my solution in a Perl job (soon will come the Java solution). Based on the same trick as in 1337.
My slightly modified input file is:
1. A1 B1 C1 D1 E1
2. F1 G1 H1
3. I1 J1
4. A2 B2 C2 D2 E2
5. F2 G2 H2
6. I2 J2
7. A3 B3 C3 D3 E3
8. F3 G3 H3
9. I3 J3

the tMap $Var corresponding expression is:
sequence('topic2017', 1, 1)

tPerlRow1 code is:
$_globals{currentA} = $input_row;
$_globals{currentB} = $input_row;
$_globals{currentC} = $input_row;
$_globals{currentD} = $input_row;
...

tPerlRow2 code is:
$_globals{currentF} = $input_row;
$_globals{currentG} = $input_row;
$_globals{currentH} = $input_row;

The output of my job is:
Starting job topic2017 at 09:33 05/03/2008.
.-------------------------------------------------.
| tLogRow_3 |
+----+----+----+----+----+----+----+----+----+----+
| a | b | c | d | e | f | g | h | i | j |
+----+----+----+----+----+----+----+----+----+----+
| A1 | B1 | C1 | D1 | E1 | F1 | G1 | H1 | I1 | J1 |
| A2 | B2 | C2 | D2 | E2 | F2 | G2 | H2 | I2 | J2 |
| A3 | B3 | C3 | D3 | E3 | F3 | G3 | H3 | I3 | J3 |
'----+----+----+----+----+----+----+----+----+----'
Job topic2017 ended at 09:33 05/03/2008.
One Star

Re: Process a text file to pull multiple lines putting them into one row.

Here comes my Java solution
My input file is :
1. A1 B1 C1 D1 E1
2. F1 G1 H1
3. I1 J1
4. A2 B2 C2 D2 E2
5. F2 G2 H2
6. I2 J2
7. A3 B3 C3 D3 E3
8. F3 G3 H3
9. I3 J3
the tMap Var.row_number corresponding expression is:
Numeric.sequence("s1",1,1)
tJavaRow_1 :
globalMap.put("f",input_row.f);
globalMap.put("g",input_row.g);
globalMap.put("h",input_row.h);
tJavaRow_2 :
globalMap.put("a",input_row.a);
globalMap.put("b",input_row.b);
globalMap.put("c",input_row.c);
globalMap.put("d",input_row.d);
globalMap.put("e",input_row.e);
The output of my job is:
Starting job topic2017 at 11:27 05/03/2008.
A1|B1|C1|D1|E1|F1|G1|H1|I1|J1
A2|B2|C2|D2|E2|F2|G2|H2|I2|J2
A3|B3|C3|D3|E3|F3|G3|H3|I3|J3
Job topic2017 ended at 11:27 05/03/2008.
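
The post shows only the tMap expression and the two tJavaRow bodies, so the following plain Java sketch fills in the rest with labeled assumptions. It assumes that lines are routed by row number modulo 3 (that routing is not shown in the post), that the leading "1." style label is dropped, and that fields are separated by whitespace. A HashMap stands in for Talend's globalMap, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the modulo-3 routing is an assumption, not from the post.
final class GlobalMapStyleMerge {
    private static final String[] FIRST = {"a", "b", "c", "d", "e"};
    private static final String[] SECOND = {"f", "g", "h"};

    static List<String> merge(List<String> lines) {
        Map<String, Object> globalMap = new HashMap<>(); // stand-in for Talend's globalMap
        List<String> rows = new ArrayList<>();
        int rowNumber = 0; // stand-in for Numeric.sequence("s1",1,1)
        for (String line : lines) {
            rowNumber++;
            String[] cols = line.trim().split("\\s+"); // cols[0] is the "1." label
            switch (rowNumber % 3) {
                case 1: // first line of a row: remember a..e
                    for (int k = 0; k < FIRST.length; k++) {
                        globalMap.put(FIRST[k], cols[k + 1]);
                    }
                    break;
                case 2: // second line of a row: remember f..h
                    for (int k = 0; k < SECOND.length; k++) {
                        globalMap.put(SECOND[k], cols[k + 1]);
                    }
                    break;
                default: // third line: i and j arrive, emit the full row
                    List<String> out = new ArrayList<>();
                    for (String key : FIRST) out.add((String) globalMap.get(key));
                    for (String key : SECOND) out.add((String) globalMap.get(key));
                    out.add(cols[1]);
                    out.add(cols[2]);
                    rows.add(String.join("|", out));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "1. A1 B1 C1 D1 E1", "2. F1 G1 H1", "3. I1 J1",
            "4. A2 B2 C2 D2 E2", "5. F2 G2 H2", "6. I2 J2");
        for (String row : merge(input)) {
            System.out.println(row);
        }
    }
}
```

The values stored for the first two lines are simply overwritten when the next group starts, so the map never needs to be cleared.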