tjavaFlex instead of tjavaRow ?


Hi,
my input is:
date;type;name;value
2009-07-10;X;A;1.2
2009-07-10;X;B;2.1
2009-07-10;Y;A;3.1
2009-07-11;X;B;1.5
Output is expected:
date;type;A;B
2009-07-10;X;1.2;2.1
2009-07-10;Y;3.1;null
2009-07-11;X;null;1.5
This transformation can easily be handled with tPivotOutputDelimited, but the performance is lousy (50.000 records take 30 minutes).
I have then used tjavaRow to gather the counters per date/type-key in variables and write the output after all counters are gathered. This can be done after the date/type-key changes. In the case of my example it means that the first line
2009-07-10;X;1.2;2.1
will be written when the third input line is processed and the 2nd line
2009-07-10;Y;3.1;null
will be written when the fourth input line is processed.
This works much faster than the pivot function, but one issue is still open:
The last output line never gets written, since tjavaRow is not called again after the last input row.
Hence I need an exit-routine to write out the stored values for last date/type-key.
How can I carry out this task?
Can I use tjavaFlex instead of tjavaRow for processing input rows? (tjavaFlex has the "End Code" section)
I tried it and got the following error:
The schema from the input link "row1" is different from the schema defined inside the component
Thanks for help!

Re: tjavaFlex instead of tjavaRow ?

I really got stuck with this tjavaFlex-issue. To simplify the question:
Is it possible with tjavaFlex to
1. use code like output_row = input_row as Main Code ?
2. define an output schema different to the input schema?
Please advise!
Are there more examples of tjavaFlex anywhere?
Regards,
Guenter

Re: tjavaFlex instead of tjavaRow ?

Hello,
I just tried, and no, it doesn't seem possible to have an output schema that differs from the input one.
Plus, I noticed a bug in my version, TOS 3.1.1: http://www.talendforge.org/bugs/view.php?id=8315
A workaround could be to write your code in a routine and call it from a tMap where you can change the output schema.

Re: tjavaFlex instead of tjavaRow ?

Yes, I have noticed the same behavior with the code overriding.
What I need is a tjavaRow with an "End Code" section, and tjavaFlex really looks like it could fulfill this request.
Can anybody from the Talend Team comment please?
Thanks
BTW, I'm using TOS 3.1.3.r26090 on Ubuntu 8.10

Re: tjavaFlex instead of tjavaRow ?

Hi,
I have an idea. The code generation for a row stream works this way (see the code generation model: https://help.talend.com/search/all?query=Component+code+generation+model):
* 3 blocks: all begin parts, all main parts and finally all end parts
* in the begin block, components come in a reverse order compared to the designer tab
* in the main block, all components come in the same order as in the designer tab
* in the end block, all components come in the same order as in the designer tab

So maybe you could use a tJavaRow followed by a tJavaFlex. You could use the start part (for variable initialization) and the end part of the tJavaFlex, and keep the tJavaRow for the main code.
Do you think it could fit your needs?
Regards,
Karine
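
The ordering rules listed above can be pictured with a toy trace. This is only a sketch of the described ordering, not Talend's generated code; the class name `CodeGenOrderSketch` and the `trace` method are hypothetical, and the component names are plain labels.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the begin/main/end ordering described in the post (hypothetical names). */
final class CodeGenOrderSketch {

    /** Returns the order in which the parts of each component would run. */
    static List<String> trace(List<String> components, int rowCount) {
        List<String> log = new ArrayList<>();

        // Begin block: components appear in reverse designer order.
        for (int i = components.size() - 1; i >= 0; i--) {
            log.add("begin " + components.get(i));
        }
        // Main block: executed once per row, components in designer order.
        for (int row = 1; row <= rowCount; row++) {
            for (String c : components) {
                log.add("main " + c + " row" + row);
            }
        }
        // End block: executed once after the last row, components in designer order.
        for (String c : components) {
            log.add("end " + c);
        }
        return log;
    }

    public static void main(String[] args) {
        trace(List.of("tJavaRow", "tJavaFlex"), 2).forEach(System.out::println);
    }
}
```

The point of the suggestion is visible in the last entries of the trace: the end part of the tJavaFlex runs after every row has passed through the tJavaRow, which is where a final flush could live.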

Re: tjavaFlex instead of tjavaRow ?

Hi Karine,
thanks for your tips. I have just tried the attached chain. Seems that it can solve my task.
Unfortunately I have to leave now (for vacation :-). Will figure it out in 3 weeks.
tjava sets some init variables
tjavaRow processes all input lines
tjavaFlex executes the final code

Regards,
Guenter

Re: tjavaFlex instead of tjavaRow ?

Hi Guenter,
is the number of your columns variable? If so, you could not write any line before the last row has been processed, and your solution wouldn't work: if you wrote your rows in tJavaRow, you wouldn't know the structure at that point. What would you expect if you had three values for one date in your example?
Bye
Volker

Re: tjavaFlex instead of tjavaRow ?

Hi Volker,
the number of counters I have to process is fixed (in my example there are 2, A and B), regardless of which counters are present in the file. Every object can have a flexible number of counters (from 0 to any number), but only counters A and B are of interest. That means:
missing counters are set to null (empty) and
unknown counters are ignored (e.g. a counter with the name C).
Since I don't know how many counters are actually present in a file, I have no indicator for the last counter of one object (see example). But I know the file is sorted, so all counters for one object (date/type) arrive together in one block.
My approach is to store the date/type and the counters A and B in variables as long as the date/type does not change. As soon as it changes I write out the stored values in one record, reset the store and start again with gathering the counters.
At the end, tjavaFlex is called once and writes the remaining stored counters as one row.
tjava defines the counters in the store and sets them to empty strings
tjavaRow gets an input line and compares its date/type with the store. If the date/type differs, it writes the stored values as one row; otherwise it stores the counter from the input and writes an empty row, which is filtered out afterwards.
tjavaFlex writes the remaining stored counters.
Did I make myself clear?
Regards
Guenter
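
The approach described above is a classic control-break loop with a final flush. A minimal sketch in plain Java follows, outside of any Talend component; the class name `ControlBreakPivot` and the `pivot` method are hypothetical. It assumes the input is sorted by date/type and has no header row, and it uses null for missing counters (as in the expected output) instead of empty strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Control-break pivot over input sorted by date/type (hypothetical names). */
final class ControlBreakPivot {

    static List<String> pivot(List<String> lines) {
        List<String> out = new ArrayList<>();
        String key = null; // current "date;type" group; null before the first row
        String a = null;   // stored counter A of the current group
        String b = null;   // stored counter B of the current group

        for (String line : lines) {
            String[] f = line.split(";", -1); // date;type;name;value
            String rowKey = f[0] + ";" + f[1];

            // Key change: write the finished group, then reset the store.
            if (key != null && !key.equals(rowKey)) {
                out.add(key + ";" + a + ";" + b);
                a = null;
                b = null;
            }
            key = rowKey;

            // Only counters A and B are of interest; others (e.g. C) are ignored.
            if ("A".equals(f[2])) {
                a = f[3];
            } else if ("B".equals(f[2])) {
                b = f[3];
            }
        }

        // Final flush: the step that needs an "end" hook after the last row.
        if (key != null) {
            out.add(key + ";" + a + ";" + b);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
            "2009-07-10;X;A;1.2",
            "2009-07-10;X;B;2.1",
            "2009-07-10;Y;A;3.1",
            "2009-07-11;X;B;1.5");
        pivot(input).forEach(System.out::println);
    }
}
```

In the component chain, the loop body would correspond to the tjavaRow code and the final flush to the end part of the tjavaFlex.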

Re: tjavaFlex instead of tjavaRow ?

Hi Guenter,
thanks. Now it is clear to me.
What about the following solution:
a) Define your output structure with all columns you need (date, type, a, b in your example).
b) Process each input row and move the data into the right column. You could use a tJavaRow or a tMap for this. The first two values (date, type) are moved 1:1; the value is moved into column a or b depending on the name.
c) Now you have as many output rows, in the right layout, as input rows, and you need to merge them. Use a tAggregateRow for this job. For date and type use first; for a, b, and so on use max (or anything else that gives you the right value).
The performance of tAggregateRow has been improved since 3.x, so I think this should not be the bottleneck. Additionally, you could use tAggregateSortedRow.
Hope this helps.
Bye
Volker
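
Steps a) to c) above can be sketched in plain Java as a spread step followed by a group-by. This is only an illustration of the idea, not what tMap or tAggregateRow do internally; the class and method names are hypothetical. The merge keeps the non-null value per column, which gives the same result as max under the assumption that each counter occurs at most once per date/type.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Spread each row into wide columns, then merge rows per date/type (hypothetical names). */
final class SpreadThenAggregate {

    /** Step b: one wide row {date, type, a, b} per input row "date;type;name;value". */
    static String[] spread(String line) {
        String[] f = line.split(";", -1);
        String a = "A".equals(f[2]) ? f[3] : null;
        String b = "B".equals(f[2]) ? f[3] : null;
        return new String[] {f[0], f[1], a, b};
    }

    /** Step c: group by date/type and keep the non-null value of each column. */
    static List<String> aggregate(List<String[]> wideRows) {
        // LinkedHashMap keeps groups in order of first appearance.
        Map<String, String[]> groups = new LinkedHashMap<>();
        for (String[] r : wideRows) {
            String[] g = groups.computeIfAbsent(r[0] + ";" + r[1], k -> new String[2]);
            if (r[2] != null) g[0] = r[2];
            if (r[3] != null) g[1] = r[3];
        }
        List<String> out = new ArrayList<>();
        groups.forEach((k, g) -> out.add(k + ";" + g[0] + ";" + g[1]));
        return out;
    }

    public static void main(String[] args) {
        List<String[]> wide = new ArrayList<>();
        for (String line : List.of(
                "2009-07-10;X;A;1.2",
                "2009-07-10;X;B;2.1",
                "2009-07-10;Y;A;3.1",
                "2009-07-11;X;B;1.5")) {
            wide.add(spread(line));
        }
        aggregate(wide).forEach(System.out::println);
    }
}
```

Unlike the control-break approach, this version does not depend on the input being sorted, because the map collects rows of the same key wherever they appear.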

Re: tjavaFlex instead of tjavaRow ?

Hi Volker,
thank you very much for advising a beginner.
I'm gonna try this after my return from holiday.
Cheers,
Günter

Re: tjavaFlex instead of tjavaRow ?

Have a good holiday!

Re: tjavaFlex instead of tjavaRow ?

Hi Volker,
seems that tAggregateRow solves my task perfectly. Performance is high (150.000 lines within 1 second).
Since my input is already sorted I understand that tAggregateSortedRow could even be better (I guess it saves the sorting step).
tAggregateSortedRow has an additional field "Input rows count" (__ROW_COUNT__) which I don't understand. What is it for?
Regards
Günter
OK, I read the many posts regarding this issue. Seems that I should stay with the tAggregateRow.

Re: tjavaFlex instead of tjavaRow ?

Hi Günter,
I don't know what "Input rows count" is used for. Maybe you will find some information in the documentation.
Bye
Volker