process multiple files with different schema

Five Stars

process multiple files with different schema

Hi there,

 

I am new to Talend and am stuck on an issue when trying to upload data into a database. I invoke multiple REST services which generate files in different formats. Most of the columns in the files are the same, but a few have different column names and some of the files are missing columns. Here is an example of the schema structure of two of the generated files.

 

Example - tFileInputDelimited

tFileInputDelimited_1 - Columns (FirstName, Company, Account_Country)

tFileInputDelimited_2 - Columns (FirstName, Country)

(Here Account_Country and Country hold the same kind of data, but the two files use different column names. The Company column is missing from the 2nd file.)

 

I want to merge both files together and produce a tFileOutputDelimited like this:

Columns (FirstName, Company, CountryOfAccount)

 

Attached is a PNG file of the example Talend job.

 

Can someone please help?

 


All Replies
Eight Stars

Re: process multiple files with different schema

You can merge two flows using tUnite; the only requirement is that the schemas are identical. For that, you need to add a third column, Account_Country, to tFileInputDelimited_2 using either a tMap or a tJavaRow (let me know if you don't know how to do that).

Keep in mind that column names don't matter, only the order does.
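
For example, here is a rough sketch of what the tJavaRow code could look like (not taken from a real job, so adjust the names; it assumes the tJavaRow output schema is defined as FirstName, Company, Account_Country to match the first file, and that the incoming flow carries the two columns of tFileInputDelimited_2):

output_row.FirstName = input_row.FirstName;        // passed through unchanged
output_row.Company = null;                         // column missing from the 2nd file, filled with null
output_row.Account_Country = input_row.Country;    // same data, just a different column name

After that, both flows have the same three-column schema and can go into the tUnite.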

 

Five Stars

Re: process multiple files with different schema

Hi navds,

 

Thanks for the reply. I tried to add a tMap but I am not able to assign the values properly.

 

So 

tFileInputDelimited_1 columns - FIRSTNAME, COMPANY, ACCOUNT_COUNTRY, LASTNAME

tFileInputDelimited_2 columns - FIRSTNAME, COUNTRY, LASTNAME

 

I used a tMap to add a COMPANY column to tFileInputDelimited_2 between FIRSTNAME and COUNTRY, so the schema is in sync with tFileInputDelimited_1.

The value of COMPANY will be null for the tFileInputDelimited_2 file.

 

But the output result I am getting is weird.

In the tFileOutputDelimited file, the "Country" value is getting assigned to the COUNTRY_NM column. Here is the column and value structure of the new output file I am getting:

 

tFileOutputDelimited

column -- value
FIRSTNAME -- FIRSTNAME
COMPANY -- COUNTRY
COUNTRY -- LASTNAME
LASTNAME -- COMPANY

 

So the values are not shifting to account for the newly added column; each value is still getting assigned to its old column position.

 

Please help!

 

Attached is a PNG file of the tMap I have created.

 

 

 

 

 

Eight Stars

Re: process multiple files with different schema

I'm a bit lost in the explanation; I don't know what happened to the "company" column. I advise you to use tLogRow frequently at each stage of your flow to identify what causes the mismatch. I suspect it comes from the input (maybe even a mistake at the tExtractJsonFields).

Anyway, here is a simple demo of how it should be:

(Screenshots: the job structure, tMap_1, tMap_2)
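
The tMap_2 output expressions are roughly the following (a sketch reconstructed from the output below, with row2 as the assumed name of the incoming flow from the 2nd file):

firstname : row2.FirstName
company   : null            (no Company column in the 2nd file)
country   : row2.Country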

 

This gives the following output. Note the value of each column.

.------------------+----------------+----------------.
|                     tLogRow_1                      |
|=-----------------+----------------+---------------=|
|FirstName         |Company         |Account_Country |
|=-----------------+----------------+---------------=|
|A firstname from 1|A Company from 1|A country from 1|
'------------------+----------------+----------------'

.------------------+----------------.
|             tLogRow_2             |
|=-----------------+---------------=|
|FirstName         |Country         |
|=-----------------+---------------=|
|A firstname from 2|A country from 2|
'------------------+----------------'

.------------------+----------------+----------------.
|                     tLogRow_3                      |
|=-----------------+----------------+---------------=|
|firstname         |company         |country         |
|=-----------------+----------------+---------------=|
|A firstname from 1|A Company from 1|A country from 1|
|A firstname from 2|null            |A country from 2|
'------------------+----------------+----------------'

I hope this helps.

 

Regards,

Navds


Five Stars

Re: process multiple files with different schema

Hi Navds,

 

The tLogRow_3 output doesn't come out in the format shown in PIC-1. It comes out like PIC-2.

 

It is not assigning values to the proper columns, and that's one of my issues.

The other one is that I am using an Iterator to process multiple files (more than 2), as shown in PIC-3. How do I use tMap and tUnite there to merge the files together?

 

PIC-1

.------------------+----------------+----------------.
|                     tLogRow_3                      |
|=-----------------+----------------+---------------=|
|firstname         |company         |country         |
|=-----------------+----------------+---------------=|
|A firstname from 1|A Company from 1|A country from 1|
|A firstname from 2|null            |A country from 2|
'------------------+----------------+----------------'

 

PIC-2

.------------------+----------------+----------------.
|                     tLogRow_3                      |
|=-----------------+----------------+---------------=|
|firstname         |company         |country         |
|=-----------------+----------------+---------------=|
|A firstname from 1|A Company from 1|A country from 1|
|A firstname from 2|A country from 2                 |
'------------------+----------------+----------------'

PIC-3

Capture_talend.PNG

Eight Stars

Re: process multiple files with different schema

The iteration should not pose a problem; you just need to check "Append" in the tFileOutputDelimited. If needed, clear the content of this file in the PreJob so you don't keep values from previous runs.
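
For example, a tJava on the PreJob could delete the previous output before the iteration starts (just a sketch; context.outputFile is a made-up context variable holding the path of your tFileOutputDelimited target, so adapt it to your job):

// Assumption: context.outputFile points to the same file as the tFileOutputDelimited.
java.io.File out = new java.io.File(context.outputFile);
if (out.exists()) {
    out.delete();   // remove the previous run's output so the appended file starts empty
}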
