[resolved] How to create an id mapping with id shift

One Star

[resolved] How to create an id mapping with id shift

Dear Sir or Madam,
We are currently looking into merging two databases into one. Both share the same schema but with different data. The output schema will be the same as the input schema. Currently, we are evaluating Talend Open Studio as a solution for this task.
Let's assume the following simple scenario: There are two tables, one with a list of cities and one with customers:
Cities: ID (PK, AI), Name
Customer: ID, First Name, Last Name, City (FK, references Cities.ID)
Now, let's look at the following two cases:
1. DB2 contains a customer / city not contained in DB1 or vice versa. 
2. DB2 contains a customer / city contained in DB1 or vice versa. 
In case 2, if a city is already in DB1.cities, we could use DB.cities.ID as key for this city. Then we would have to replace the old foreign key in the field DB2.customers.city (and propably a lot more tables) with the one from DB1. Or we could assign a new key to all fields.
In any case, I think we do have to create a mapping for oldKey -> newKey or (if all keys get reassigned) oldKeyDB1, oldKeyDB2 -> newKey. 
I already tried to perform this task by using the tMap function, but I am not able to create a mapping as the tMap component does not support a full outer join. Thus cities contained in DB2 but not in DB1 are lost. 
Can you give me a few suggestions about which components to use and how?

Thank you very much. Smiley Happy

Accepted Solutions
One Star

Re: [resolved] How to create an id mapping with id shift

Ok, i don't see the exact point of your request but let's try to have your matching table
Ok so you want to have a full outer, then i think you would do a left outer DB1 on DB2 with Old_ID_DB1 | Old_ID_DB2 | city
then an inner DB2 to DB1 catching the reject with the same schema.
Then you will have in one hand the matching and the none matching DB1 to DB2 from the outer and the none matching DB2 to DB1 in the inner.
I hope this is clear.

All Replies
One Star

Re: [resolved] How to create an id mapping with id shift

You can use Tunite component to unite your flux from the 2 Databases,
a schema like (DBname, ID, City)
Then aggregate your data to prioritize DB1 or DB2 as you prefer, or to generate ID if you want.
One Star

Re: [resolved] How to create an id mapping with id shift

Hi Fwandz,
Thanks for your response.
When I use the tUnite component to unite two tables, I get the result I expect from an SQL union all operator: A large table which starts with all rows from table 1 and ends with all rows from table 2. How do I get tUnite to include an additional column with the database name?
As I am a new Talend user, the "aggregate" step is more confusing to me. I tried the tAggregateRow component, but this seems not quite suited for the purpose. Isn't it more of a "transform" step? Could you please give me a more detailed insight about how to do this step?
Thank you.
One Star

Re: [resolved] How to create an id mapping with id shift

You can add it in your DB component in the request and ad as String Varchar the new line in the schema
One Star

Re: [resolved] How to create an id mapping with id shift

The problem is that i dont get exactly what you want to do.
You want to take the data from DB 2 and to add the cities from DB1 adding a new ID?
One Star

Re: [resolved] How to create an id mapping with id shift

Basically yes, but there is a problem: There are cities in DB2 that are also contained in DB1 but have a different ID in DB2. When we merge the tables, the city must exist only one with one id. But since the city id is referred from the customers tables, we also need to change the id in the customer output table.
Example:
DB1:
Cities (ID | Name)
1 | London
2 | San Francisco
Customers (ID | First Name | Last Name | CityID)
1 | Max | Power | 1
2 | John | Smith | 1
3 | Juergen | Mueller | 2
DB2:
Cities:
1 | San Francisco
2 | Berlin
Customers:
1 | Eric | White | 1
2 | Marc | Black | 2

Output:
Cities:
1 | San Francisco
2 | Berlin
3 | London
Customers:
1 | Eric | White | 1
2 | Marc | Black | 2
3 | Max | Power | 3
4 | John | Smith | 3
5 | Juergen | Mueller | 1

I hope this example makes clarifies the problem. San Francisco is contained in both tables, but has different id. In the output table, it must be contained only once. For instance, it gets assigned the id "1". Now all customers from DB1 need to have their foreign key updated (see the bold figures). The same applies for London, which now has a different id as well. 
As we propably need to join more tables, I intend to create a mapping table which could look like this:
Old_ID_DB1 | Old_ID_DB2 | NEW_ID
1 | null | 3 (London)
2 | 1 | 1 (San Francisco)
null | 2 | 2 (Berlin)
For each old table, I would be able to join the old ID with this mapping table and get the new one which we could then write in the output table in order to get the output above.

Does that make it clear? 
One Star

Re: [resolved] How to create an id mapping with id shift

Ok, here's what i would do in you re case :
First select distinct City from Db1 and DB2, Tunite and TUniq them,
Then use a sequence to make a new ID for the new DB.
In a second time (for exemple another job, or the same if you don t want to request for the data you already have), you do the same for the Customers with their cities, then you output your new city table and join it on the NAME matching then the new IDs.
That's the simplest way to do your job.
I don't really see the point of keeping old ID, as they won't mean anything in your new DB, but if you really wanna do this, it's possible too. It might be a good idea if you want to keep trace for old DB, but you ll need to add a column for DB type, but that d be a little more complicated.
One Star

Re: [resolved] How to create an id mapping with id shift

Thanks for your suggestions. It helped me to get the output I wanted. Smiley Happy I attached my job to the post. 
What is still missing is the mapping table. Just for clarification: This table is only used temporarily and will not be part of the new database. My point is that this might be very useful so that I do not have to perform joins on a NAME in order to get the new id (this also may be a problem with duplicate names). What I would do in SQL was to perform a full outer join on both "cities" tables and add an additional column with the new id. But tMap and tJoin only support an left outer join or an inner join. Do you have any other suggestion how to implement the mapping table or how to perform a full outer join in Talend?
In any case, thanks very much for your time! Smiley Happy
One Star

Re: [resolved] How to create an id mapping with id shift

Ok, i don't see the exact point of your request but let's try to have your matching table
Ok so you want to have a full outer, then i think you would do a left outer DB1 on DB2 with Old_ID_DB1 | Old_ID_DB2 | city
then an inner DB2 to DB1 catching the reject with the same schema.
Then you will have in one hand the matching and the none matching DB1 to DB2 from the outer and the none matching DB2 to DB1 in the inner.
I hope this is clear.
One Star

Re: [resolved] How to create an id mapping with id shift

Yes, that is clear and a good workaround. Thank you very much!