Six Stars

Merge columns of a file

Morning Guys,

 

I have a file looks like bellow and i want to merge all rows which have the same ID (first column) in a single row.

ID	Test_1	Date_test_1	Test_2	Date_test_2	Test_3	Date_test_3
1	yes	06/09/2017	 	 	 	 
4	no	 	yes	20/04/2004	 	 
4	 	01/10/2000	 	 	yes	17/05/1982
1	 	 	no	02/02/2015	 	 

 

I try to use tAggregateRow component but it does not help in my situation; because colums can contain or not data and i have not function to extract it.

I want to get an output like bellow.

 

I also see on this link : https://help.talend.com/reader/ZndcSsDNtKg8FpNIRCdjag/jPZNgHRF5ad4KUp4Mpvzkg

Which suggest to use component tSurviveFields but it does not present on Studio and not on talend Exchange

 

ID	Test_1	Date_test_1	Test_2	Date_test_2	Test_3	Date_test_3
1	yes	06/09/2017	no	02/02/2015	 	 
4	no	01/10/2000	yes	20/04/2004	yes	17/05/1982

 

Thank you for help,

  • Talend Integration Cloud
1 ACCEPTED SOLUTION

Accepted Solutions
Eleven Stars TRF
Eleven Stars

Re: Merge columns of a file

The reason is that fields are empty ("") when they come from tFileInputDelimited instead of null.

Try to include a tJavaRow after tFileInputDelimited to replace empty strings by null using this example:

output_row.Organization_Name = input_row.Organization_Name.equals("") ? null : input_row.Organization_Name.equals("")

TRF
20 REPLIES
Eleven Stars TRF
Eleven Stars

Re: Merge columns of a file

Hi,

Using a tMap with the same input file for the main row and the lookup you can do what you expect.

tSurviveFields is a DQ component (not available in DI).

 

Hope this helps.


TRF
Six Stars

Re: Merge columns of a file

Hello TRF,

I am sorry but i don't see how to do it.

The example i put here is limit to few columns but in reality i have more than 30 colums.

And concern the same ID, it can be repeat 20 times like record.

 

Thank for reply

Eleven Stars TRF
Eleven Stars

Re: Merge columns of a file

As a quick response you can start with the following design:

Capture.PNG

Both input (tFixedFlowInput) are identical and contain the following records (I know, there is only 2 fields):

100,1000
100,2000
200,
200,4000
200,5000
300,6000
400,
400,
400,7000

On the main flow, tUniqRow is used to keep only 1 record for each id value.

The same input is used as a lookup for the tMap whch is configured like this:

Capture.PNG

So, you have to repeat the operation shown for the 2nd output field for all the desired fields (this is the bad part, nothing is magic).

And finally, the result is the following:

Starting job test at 16:22 06/09/2017.

[statistics] connecting to socket on port 3779
[statistics] connected
100|1000
200|5000
300|6000
400|7000
[statistics] disconnected
Job test ended at 16:22 06/09/2017. [exit code=0]

AS you can see, if you have more than 2 records for the same id, the final value for the output fields is taken from the last record (look at id = 200).

If this is not what you expect (you want the 1st not null value as the final value), you need a more complex answer derived from this solution I've proposed for an other subject (https://stackoverflow.com/questions/46050056/talend-avoid-duplicate-external-id-with-salesforce-outp...). The idea is to iterate over all the duplicates for the same id 1 by 1 (instead of getting all the records at once), and do the same operation as proposed above. Like that, as soon as a field is populated, we will stop to try to fill it again.

 


TRF
Six Stars

Re: Merge columns of a file

Hello TRF,

Your solution on Stackoverflow is helpful because i want the 1st not null value as the final value.

I my case, i need to insert record on MySql database, so write records 1 by 1 will take many time.

So i should merge all records before to push it on MySql.

 

I am trying to adapt your job on Stackoverflow to solve my problem.

Any help will be usefull

Thanks,

 

Eleven Stars TRF
Eleven Stars

Re: Merge columns of a file

Simple solutions are often the best, and here the it seems to be tAggregateRow.

Capture.PNG

tFixedFlowInput with the data sample and the schema:

 

Capture.PNG

 

In tAggregateRow, I use "first" function with "ignore null values" option:

Capture.PNG

Here is the result:

Starting job test at 22:40 06/09/2017.

[statistics] connecting to socket on port 3413
[statistics] connected
100|1000|a
200|5000|a
300|6000|c
400|7000|a
[statistics] disconnected
Job test ended at 22:40 06/09/2017. [exit code=0]

I think that's what you need.


TRF
Six Stars

Re: Merge columns of a file

Morning TRF,

 

It is exactly what i expect, thank you very much for your help.

I know your suggestion is the solution, but i try to test it with an exemple of input file in my Job and i don't get what i expect.

 

I attach my input file into this post, and bellow the screenshot of my job.Talend_job.png

 

 

 

 

 

I show you also config of tAggregateRow component with the screenshot bellow.

I active "ignore null value" option for all fields without the 6 first one.tAggregateRow.png

 

 

 

 

 

 

 

 

 

Thank you for help

Eleven Stars TRF
Eleven Stars

Re: Merge columns of a file

Here is the result I get.

Seems OK from my point of view.

Job attached with the complete schema.

 

Edit: + the result


TRF
Eleven Stars TRF
Eleven Stars

Re: Merge columns of a file

@idembele2 let me know if the result is ok for.

If it is, don't forget to mark the topic as solved (Kudo also accepted).


TRF
Six Stars

Re: Merge columns of a file

Hello TRF,

Since my last post, i didn't get time to test your job. I do it tomorrow and let you know.

Thank you very much for help.

Six Stars

Re: Merge columns of a file

Hello TRF,

Trying to import your job in my studio, i get this messageImport_job.png

 

 

Eleven Stars TRF
Eleven Stars

Re: Merge columns of a file

Hi,

You probably use a version prior mine.

I use Talend Open Studio for Data Integration Version 6.4.1.

Let me know your version?


TRF
Six Stars

Re: Merge columns of a file

Hello TRF,

 

Yes you are right, i have a prior vesion than yours.

I use 6.3.1

 

Thanks,

Eleven Stars TRF
Eleven Stars

Re: Merge columns of a file

Here is the job redesigned with version 6.3.1.

I got the same result as with 6.4.1.


TRF
Six Stars

Re: Merge columns of a file

Hi TRF,

 

I confirm that your job work fine.

I know now why it does not work, it is very strange but i mean it regards the component tFileInputDelimited.

You know i just replace component tFixedFlowInput to tFileInputDelimited and i don't get good result.

 

I post this job here, you can check it. I use the same input file that i put in my post

And you need to change file path. 

 

Thanks,

Eleven Stars TRF
Eleven Stars

Re: Merge columns of a file

The reason is that fields are empty ("") when they come from tFileInputDelimited instead of null.

Try to include a tJavaRow after tFileInputDelimited to replace empty strings by null using this example:

output_row.Organization_Name = input_row.Organization_Name.equals("") ? null : input_row.Organization_Name.equals("")

TRF
Six Stars

Re: Merge columns of a file

Hello TRF,

 

You are a great guy, you are a professor.

Big respect and thank you very much for help.

It resolves.

Eleven Stars TRF
Eleven Stars

Re: Merge columns of a file

You're welcome.
Based on the schema for your file it seems you speak french, don't you? Where're yoy from?

TRF
Six Stars

Re: Merge columns of a file

Hello TRF,

 

Yes you saw correctly, i  leave in Paris and you ?

 

 

Highlighted
Eleven Stars TRF
Eleven Stars

Re: Merge columns of a file

Same as you Smiley Wink

TRF
Six Stars

Re: Merge columns of a file

Hello TRF,

 

Really, i am very happy to know that, so it is possible perhaps one day to meet you