Using tJoin. Bugs are crawling all around!

Hello,

I'm new to Talend and have been trying to perform the same task using different approaches, in order to better learn how to use ETL tools in general.

I had a rather simple problem that needed about 6 joins and two sum aggregations.
- I first wrote the SQL "INSERT ... SELECT ..." code without any problem.
- Then I tried using the bricks that Talend provides, so I had multiple cascaded tJoins, followed by a final tAggregate.
- Finally I tried connecting all my tables with a single tMap, followed by a final tAggregate.
The first and last solutions worked great in Talend (tMap seems to be a powerful component).
On the other hand, tJoin is a nightmare for beginners!

I used the simplest possible test (2 tables, a tJoin, and an output) to experiment with tJoin.
In short, I noticed the following:
- Edit the schema like this:
* Select all the columns from the main table. You may trim some columns from the bottom, but don't leave holes and don't change their order!
* Add some columns from the lookup table (I didn't experiment enough to give more advice).
- Check the first checkbox and add the mapping for the columns of the lookup table.
(The "++" button adds too many useless rows; I didn't understand this step very well.)
If you don't follow the above, you will see that:
- Assignments are shifted, resulting in incorrect data or, in the best case, in compilation errors (so at least you can spot the problem).
- You won't have any data in the joined columns.
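One plausible explanation for the shifted assignments (an assumption, not verified against the code Talend actually generates) is that output columns get filled by position rather than by name. The hypothetical Java sketch below (`MappingDemo`, `copyByPosition` and `copyByName` are invented names) shows how reordering the output schema, or leaving a hole in it, misplaces values under positional copying but not under name-based copying:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, NOT Talend's generated code: it contrasts filling
// an output row by column position versus by column name.
public class MappingDemo {

    // Positional copy: output column i receives input value i, whatever its name.
    static Map<String, Object> copyByPosition(Object[] inRow, List<String> outCols) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (int i = 0; i < outCols.size(); i++) {
            out.put(outCols.get(i), i < inRow.length ? inRow[i] : null);
        }
        return out;
    }

    // Name-based copy: each output column takes the input value with the same name.
    static Map<String, Object> copyByName(List<String> inCols, Object[] inRow, List<String> outCols) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String col : outCols) {
            int idx = inCols.indexOf(col);
            out.put(col, idx >= 0 ? inRow[idx] : null);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> mainCols = Arrays.asList("id", "name", "city");
        Object[] row = {1, "Alice", "Paris"};

        // Output schema with "name" and "city" swapped.
        List<String> reordered = Arrays.asList("id", "city", "name");
        System.out.println(copyByPosition(row, reordered));       // {id=1, city=Alice, name=Paris}
        System.out.println(copyByName(mainCols, row, reordered)); // {id=1, city=Paris, name=Alice}

        // Output schema with a "hole": "name" was removed from the middle.
        List<String> withHole = Arrays.asList("id", "city");
        System.out.println(copyByPosition(row, withHole));        // {id=1, city=Alice}
    }
}
```

With typed columns (e.g. an Integer column receiving a String), the same shift would surface as a type mismatch, which would be consistent with the compilation errors mentioned above.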

Such behavior makes me think of a bug in the code generation (especially the first item).
I also can't see the purpose of the first checkbox. Why do a join if not to fill the output columns that may have been added from the lookup table?
I think schema editing could be improved in this regard by keeping track of the source table of each column.


I realize that the tMap brick is more powerful and extends tJoin's functionality, but there must be a reason why tJoin is still available.
Moreover, tJoin is a beginner's natural first choice: it corresponds well to the notion of a join (hopefully) and does nothing more.
I think it should be kept, even if it looks abandoned (the only answer I see to any problem with tJoin is "Try tMap").


What am I not understanding about tJoin? Or is it really poorly/counter-intuitively implemented?
Thanks!
Community Manager

Re: Using tJoin. Bugs are crawling all around!

Hello TeKa
First, thanks very much for your interest in Talend and your feedback!!
Yes, tMap is the most powerful component in Talend: you can do 'all' types of data transformation with it, so whether you are a newbie or an expert, you should become very familiar with the tMap component. The tJoin component only does an inner or outer join, while tMap can do all types of transformation; personally, I like to use tMap every time.
> Select all the columns from the main table, you may trim some columns from the bottom, but don't make holes and don't change their order!

Which version of TOS are you using? I use TOS 3.2.3 and it can change the order of columns. Maybe it was a bug in an old version.
> - Check the first checkbox and add the mapping for the columns of the lookup table.
> (The "++" button adds too many useless rows, I didn't understand this step very well.)

Sometimes you also want to output some columns from the lookup schema; for that, you need to check this checkbox and define the mapping. The '++' button adds all the output columns at once: if there are many output columns and most of them come from the lookup schema, it lets you add them all in one go instead of one by one.
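To make the checkbox semantics concrete, here is a minimal Java sketch of an in-memory join. It is not Talend's generated code: `JoinSketch`, `join`, `innerJoin`, `includeLookup` and `lookupMapping` are hypothetical names standing in for the join type, the "include lookup columns in output" checkbox and its mapping table. It assumes rows are maps keyed by column name and a single, unique join key. When `includeLookup` is false, output columns that exist only in the lookup stay null, which matches the "no data in the joined columns" symptom reported above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of the join semantics discussed in this thread, NOT Talend's
// generated code. Hypothetical parameters:
//   innerJoin     - drop main rows that have no lookup match
//   includeLookup - stands in for the "include lookup columns in output" checkbox
//   lookupMapping - output column -> lookup column (the mapping table)
public class JoinSketch {

    static List<Map<String, Object>> join(List<Map<String, Object>> mainRows,
                                          List<Map<String, Object>> lookupRows,
                                          String key,
                                          boolean innerJoin,
                                          boolean includeLookup,
                                          Map<String, String> lookupMapping,
                                          List<String> outputCols) {
        // Index lookup rows by join key (assumes unique keys; a later row wins).
        Map<Object, Map<String, Object>> index = new HashMap<>();
        for (Map<String, Object> l : lookupRows) {
            index.put(l.get(key), l);
        }
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> m : mainRows) {
            Map<String, Object> match = index.get(m.get(key));
            if (match == null && innerJoin) {
                continue; // inner join: unmatched main rows are rejected
            }
            Map<String, Object> out = new LinkedHashMap<>();
            for (String col : outputCols) {
                out.put(col, m.get(col)); // lookup-only columns start out null
            }
            if (includeLookup && match != null) {
                for (Map.Entry<String, String> e : lookupMapping.entrySet()) {
                    out.put(e.getKey(), match.get(e.getValue()));
                }
            }
            result.add(out);
        }
        return result;
    }

    // Helper: builds a row from alternating column names and values.
    static Map<String, Object> row(Object... kv) {
        Map<String, Object> r = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            r.put((String) kv[i], kv[i + 1]);
        }
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> customers = new ArrayList<>();
        customers.add(row("id", 1, "name", "Alice"));
        customers.add(row("id", 2, "name", "Bob"));
        List<Map<String, Object>> cities = new ArrayList<>();
        cities.add(row("id", 1, "city", "Paris"));
        Map<String, String> mapping = Collections.singletonMap("city", "city");
        List<String> cols = Arrays.asList("id", "name", "city");

        // Checkbox unchecked: "city" is in the schema but stays empty.
        System.out.println(join(customers, cities, "id", true, false, mapping, cols));
        // prints [{id=1, name=Alice, city=null}]
        // Checkbox checked: "city" is filled from the matching lookup row.
        System.out.println(join(customers, cities, "id", true, true, mapping, cols));
        // prints [{id=1, name=Alice, city=Paris}]
    }
}
```

Passing `innerJoin = false` keeps Bob in the result with a null city, i.e. a left outer join.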

Best regards

shong

Re: Using tJoin. Bugs are crawling all around!

I'm using the new TOS-Win32-4.0.0M2 version.

Being familiar with SQL, I fully understand wanting to include columns from the lookup table (and for an inner join, not doing so would amount to a kind of filter); at first I simply couldn't see why one would not do so.
The fact is that by editing your schema you specify which columns to add, but you still have to check the "include lookup into output" checkbox to say "I want those additional columns to be filled" (otherwise they would be zeroed or nulled).
Shouldn't editing the schema be enough to say "I want these columns from these lookup tables inserted in the result"? The revealed list would then say "I'd like these additional columns I've defined in the schema (newly created, not imported from a source table) to be equal to those source columns" (which would define an alias or a copy of some columns).
In fact, these controls and fields, i.e. the properties of tJoin, aren't self-explanatory to me.


I think the tJoin implementation has regressed since the version you're talking about.
Moreover, it is not described in the reference manual for my version.
The above makes me think of a brick being progressively abandoned.


Best,
TeKa
