suggestion - tUnion, tNormalize and tDenormalize

Four Stars

suggestion - tUnion, tNormalize and tDenormalize

Hi,
maybe is a tUnion a usefull addon? A tUnionAll function in combination with the tUniqRow would achieve the same.
The tNormalize and tDenormalize could be usefull. With a string concat aggregate function you could simulate a denormalize.
But maybe there are easy Perl solutions for this.
I think a sorted input option in the tAggregate component could be usefull. If the input is already sorted (by the database input) then the tAggregate can be processed row-wise.
Tags (1)

Accepted Solutions
Employee

Re: suggestion - tUnion, tNormalize and tDenormalize

Hi
Yes, you will receive email notification once the jira issue is updated.

Best Regards,
Michaël.


All Replies
Employee

Re: suggestion - tUnion, tNormalize and tDenormalize

maybe is a tUnion a usefull addon? A tUnionAll function in combination with the tUniqRow would achieve the same.

Can you give some details about the way you would like a tUnion to work? Several input, one output? First we output the first input data, then the second and so on? Would you manage the order of the input?
The tNormalize and tDenormalize could be usefull. With a string concat aggregate function you could simulate a denormalize.
But maybe there are easy Perl solutions for this.

Please also give some details and/or and example.
I think a sorted input option in the tAggregate component could be usefull. If the input is already sorted (by the database input) then the tAggregate can be processed row-wise.

The way the sort is currently done with tSortRow would globally make the job slower (or at least memory usage would be higher) if you add a tSortRow before a tAggregateRow (which is what would be done if we add a "sort input" option in the tAggregateRow). So sorting the input in the tAggregateRow wouldn't be efficient in my opinion.
Now if we could say to the tAggregate that its input is already sorted, we could indeed output the first aggregated line before the last input line. But it won't be faster, taking your job as the whole. Would you find it useful for your own usage?
Four Stars

Re: suggestion - tUnion, tNormalize and tDenormalize

A tUnion would function the same way as in SQL. Yes, several input and one output and duplicate elimination.
Yes, if you would simulate the SQL Union ALL then you could return the fist input and then the second and so on, the order is not relevant.
Simulation of the SQL Union is the same as the above with extra duplicate elimination.
input1
id|name
--+--------
01 delostilos
02 plegall
input2
id|name
--+--------
02 plegall
03 cantoine
tUnion_ALL output
id|name
--+--------
01 delostilos
02 plegall
02 plegall
03 cantoine
tUnion output
id|name
--+--------
01 delostilos
02 plegall
03 cantoine

A tNormalize would normalize a 'flat' record.
input1
group|names
-----+--------
01 delostilos
02 plegall,cantoine
tNormalize output
group|names
-----+--------
01 delostilos
02 plegall
02 cantoine

The tDenormalize would do the reverse action. A sort of a string concatenate aggregation function.
If the sorted input option for a tAggregate isn't faster then I don't need it. I thought it could be faster if you have a tDBInput that sorts the data in the database and afterwards the tAggregate in your Process it could make a difference with lots of records.
Thanks for your reply.
One Star

Re: suggestion - tUnion, tNormalize and tDenormalize

And why not a tMinus or a tIntersect ?
Four Stars

Re: suggestion - tUnion, tNormalize and tDenormalize

And why not a tMinus or a tIntersect ?

Yes indeed. That would complete the 'set' components.
A 'real' joiner would also be nice. The tMap component does only a lookup, which is the equivalent of an scalar correlated subquery in the select part in SQL. When you have multiple matches in your lookup, then the tMap will only find the first value.
Employee

Re: suggestion - tUnion, tNormalize and tDenormalize

delostilos and favre, can you please create 6 feature requests : tUnion, tMinus, tIntersect, tJoin, tNormalize, tDenormalize.
Four Stars

Re: suggestion - tUnion, tNormalize and tDenormalize

Hi plegall,
Where do we have to create those requests and how?
Employee

Re: suggestion - tUnion, tNormalize and tDenormalize

Hi plegall,
Where do we have to create those requests and how?


You can create new feature request or report bugs on our bugtracker.

 

Four Stars

Re: suggestion - tUnion, tNormalize and tDenormalize

Hi,
I've made the entries. Know I get mails for feedback. Wat is the procedure to follow?
Employee

Re: suggestion - tUnion, tNormalize and tDenormalize

Hi
Yes, you will receive email notification once the jira issue is updated.

Best Regards,
Michaël.

One Star

Re: suggestion - tUnion, tNormalize and tDenormalize

Hi Team,
Please let me know if tNormalize or tDeNormalize work with flat file source only.
If not then for database sources how we can use that. Or is there any other component with the same functionality as the tNormalize/tDeNormalize for relational sources?
Thanks & Regds,
wntsm
One Star

Re: suggestion - tUnion, tNormalize and tDenormalize

Hi wntsm,
tNormalie and tDenormalize are working on a row. The source depends on your input component and doesn't matter.
This is the same for all components (expect ELT).
Bye
Volker
One Star

Re: suggestion - tUnion, tNormalize and tDenormalize

Hi Volker,
Thanks for your reply.
Could you please let me know where I can find a good help document on Talend.
Basically, being a new user with resepect to talend, I want to explore the various functionality required in ETL and want to go through some help documents on how to achieve that using Talend.
Thanks & Regds,
wntsm
One Star

Re: suggestion - tUnion, tNormalize and tDenormalize

Hi,
you should download the official documentation. You can find it here.
There is one documentation for Talend Open Studio and one which will explain the components (with examples).
Bye
Volker
One Star

Re: suggestion - tUnion, tNormalize and tDenormalize

Hi Volker,
Thanks a lot again for directing me to the proper place for the documentation.
While going through this thread, I came across some discusssion regarding a tUnion or tUnionAll component.
Currently,is there any such component in talend which takes care of the set operations like union ,unionall,minus etc?
Thanks & Regds,
wntsm
One Star

Re: suggestion - tUnion, tNormalize and tDenormalize

Hi wntsm,
first: If you work with a database (tInput) you are free to use any SQL statement (also UNION and so on).
If you want to handle different connections you could:
UNION ALL => tUnite
UNION => tUnite --(row)--> tUniquRow
MINUS => tMap (with lookup to a second connection and inner join reject => rejected rows are your result).
Bye
Volker
One Star

Re: suggestion - tUnion, tNormalize and tDenormalize

Hi Volker,
Many Thanks again for your help.
Regds,
Suman
One Star

Re: suggestion - tUnion, tNormalize and tDenormalize

Hi Volker,
Need your help once again.
The scenario is like this:
Suppose I have the EMP table as source. Now depending upon the SAL I want to populate some records in Table1 and Table2.
e.g if SAL > 2000, populate Table1 else Populate Table2.
Which component should I use?
I tried with tMap and tFilterrow. But it was giving some error.
If some setting is needed in tMap or tFileterrow in order to achieve the desired result, can you please let me know.
A screenshot would be very helpful.
Thanks & Regds,
wntsm
One Star

Re: suggestion - tUnion, tNormalize and tDenormalize

Hi wntsm,
if I don't misunderstand anything I would say your job should something like the following:
tInput(EMP) --(row)--> --(out1)--> tOut(Table1)
tMap
tInput(SAL) --(lookup)--> --(out2)--> tOut(Table2)
In tMap you have to do a lookup from EMP to SAL (based on your foreign key).
For the output you define two output streams and define as filter for the first one rowLookup.SAL > 2000 for the second one define "output reject".
If you mapping is complex you could also define only one output and use a tFilterRow with your filter constraint. You could than connect tFilterRow with the first ouptut component and connect the second output with a reject link.
Bye
Volker
One Star

Re: suggestion - tUnion, tNormalize and tDenormalize

Hi Volker,
Thanks for your reply.
In my case SAl is not a table. It is a column in the EMP table.
I want to populate say Table A with records from EMP where SAL> 2000 and if SAL < 2000 I want to populate Table B.
Any help is appreciated.

Thanks & Regds,
wntsm
One Star

Re: suggestion - tUnion, tNormalize and tDenormalize

In this case you do not need the lookup table. You just define two outputs with a condition rowN.SAL > 2000.
One Star

Re: suggestion - tUnion, tNormalize and tDenormalize

Hi Volker,
Thanks for the input.
I already tried that but getting some error.
Please find the error details and the image.
Starting job Test at 21:01 01/02/2009.
Exception in thread "main" java.lang.Error: Unresolved compilation problem:
The operator > is undefined for the argument type(s) BigDecimal, int
at my_project.test_0_1.Test.tOracleInput_1Process(Test.java:1248)
at my_project.test_0_1.Test.runJobInTOS(Test.java:1662)
at my_project.test_0_1.Test.main(Test.java:1576)
Job Test ended at 21:01 01/02/2009.
Regds,
wntsm
One Star

Re: suggestion - tUnion, tNormalize and tDenormalize

Hi,
because your column is defined as BigDecimal you should use:
row1.SAL.compareTo(java.math.BigDecimal.valueOf(20000)) > 0

Bye
Volker
One Star

Re: suggestion - tUnion, tNormalize and tDenormalize

Hi Volker,
Many thanks as usual.
Whatever you advised is working fine. Though I need to test with some few more cases but I think it will work for these cases too.
But while doing that, one question came to my mind. That is, do I need to have java knowledge also too exploit the functionality of this tool or some work around is there?

Another question is, I have installed perl 5.8.8 so that I can work with perl projects too. But here the problem is, Oracle client package for perl 5.8.8 is not provided by perl. Then how shall I work with Oracle sources/targets in perl 5.8.8?
I tried with ppm install for oracle . But it says could not find the package.
Do you have any idea how to take care of this situation?
Thanks & Regds,
wntsm
Employee

Re: suggestion - tUnion, tNormalize and tDenormalize

Another question is

Another question, another topic :-)
One Star

Re: suggestion - tUnion, tNormalize and tDenormalize

Hi WNTSM!
Try looking on-line for CPAN - the perl repository. Look for Oracle & DBI. From that you should be able to figure what to download using PPM.
Dave Venus
One Star

Re: suggestion - tUnion, tNormalize and tDenormalize

Hi,
first concerning your perl-problem: maybe you need to add an additional repository to PPM. As Dave said, start a CPAN to get the name of the component. Then try to find a repository. Google often help me out in this situation.
About knowledge:
I think yes, Perl or Java knowledge is absolutely needed to use Talend Open Studio. The question is more how much knowledge.
If you only use Talend Open Studio to design simple jobs with no special demands performing performance. I think this would be no problem for all users. You should know how to enter strings (escape special chars), cast variables, do some simple operations (?:-operator for example) and so on. I think reading a chapter of a java book about the basics would be enough.
If you would like to go a step further and create your own routine or do some complex operations in tMap you need more knowledge. You should be firm with Java.
The last step would be something like creating your own component, using WebService, think about performance and so on you should be familiar with Java programming, design and architecture.
But I think this would be the same for every tool you use. But you may need to learn a proprietary syntax or will not have the flexibility you would have with Talend.
Bye
Volker
One Star

Re: suggestion - tUnion, tNormalize and tDenormalize

delostilos and favre, can you please create 6 feature requests : tUnion, tMinus, tIntersect, tJoin, tNormalize, tDenormalize.

Any news about tMinus and tIntersect ?