unique row and converting row data to column

Seven Stars


We have an input file in the following format.

 

empno  name  add
1      abc   address1
1      abc   address2
1      abc   address3
1      abc   address4
...
2      mno   address7
2      mno   address8
2      mno   address9
...

The output should look like the following, because the values in the first two columns repeat:

 

1  abc  address1 address2 address3 address4....
2  mno  address7 address8 address9....
...

 

Please suggest how to create a file with the above output.

 

Regards,

Vivek

 

 


Accepted Solutions
Eight Stars

Re: unique row and converting row data to column

Hello,

 

This is an out-of-memory error for tUniqRow.

 

First of all, to process 400M rows you need to increase the heap space.

 

Secondly, in the tUniqRow advanced settings, check the "Use Of Disk" option and set "Buffer Size in Memory" to "Medium (1 Million)".

 

This ensures that only 1 million rows are held in memory at a time and the rest are processed on disk.

 

Try running the job with the maximum heap space set to -Xmx5G.
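
To confirm that the heap setting actually reached the JVM that runs the job, a small standalone check can help. The sketch below is not part of Talend and the class name HeapCheck is made up for illustration; it only uses the standard Runtime.maxMemory() call, whose value roughly tracks -Xmx (the exact figure can vary a little with the garbage collector).

```java
public class HeapCheck {

    // Converts a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() reports the most heap this JVM will attempt to use.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap available to this JVM: " + toMiB(maxBytes) + " MiB");
    }
}
```

On Java 11 or later this can be run as `java -Xmx5G HeapCheck.java`. If the printed figure is far below 5120 MiB, the -Xmx argument is probably not being applied to the process you think it is.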

 

Thanks and Regards,

Subhadip


All Replies
Employee

Re: unique row and converting row data to column

@vivek_u 

 

Hi Vivek,

 

I believe the following solution will help you.

[Two attached screenshots, image.png, not reproduced here]

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved :-)

Seven Stars

Re: unique row and converting row data to column

Thanks. However, we have multiple columns under the tDenormalize component, and the output is a little different from what we need.

For example, suppose we also have a Pincode column alongside Address, so we listed both Address and Pincode under the 'To Denormalize' section. The output then shows all the addresses followed by all the pincodes:

address1;address2;address3;address4,54678,54890,58765,52345

 

But our requirement is to display each row's complete data, followed by the next row's data:

address1;54678;address2;54890;address3;58765;address4;52345

 

Please advise how to achieve this in Talend.

 

Regards,

Vivek

 

 

Eight Stars

Re: unique row and converting row data to column

Hello Vivek,

 

Before passing the data to tDenormalize, concatenate the values of Address and PinCode in a tMap.

 

The flow will be tMap ( Address+";"+PinCode ) -> tDenormalize.
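
For readers who want to see what this flow does to the data, here is a minimal plain-Java sketch of the same idea: concatenate Address and PinCode per row (the tMap step), then merge the concatenated values per (empno, name) key (the tDenormalize step). It is only an illustration, not the code Talend generates. The class and method names are made up, and the pincode for empno 2 is a hypothetical value, since the thread only gives pincodes for the first group.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

public class DenormalizeSketch {

    // Each input row is {empno, name, address, pincode}.
    // Returns one line per (empno, name) key, in first-seen order.
    static List<String> denormalize(List<String[]> rows) {
        Map<String, StringJoiner> groups = new LinkedHashMap<>();
        for (String[] r : rows) {
            String key = r[0] + " " + r[1];
            // tMap step: Address + ";" + PinCode
            String merged = r[2] + ";" + r[3];
            // tDenormalize step: append to the value list for this key
            groups.computeIfAbsent(key, k -> new StringJoiner(";")).add(merged);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, StringJoiner> e : groups.entrySet()) {
            out.add(e.getKey() + " " + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"1", "abc", "address1", "54678"});
        rows.add(new String[] {"1", "abc", "address2", "54890"});
        rows.add(new String[] {"1", "abc", "address3", "58765"});
        rows.add(new String[] {"1", "abc", "address4", "52345"});
        // Hypothetical pincode, not taken from the thread.
        rows.add(new String[] {"2", "mno", "address7", "60001"});
        for (String line : denormalize(rows)) {
            System.out.println(line);
        }
    }
}
```

Because the pair is joined before the grouping happens, each address stays next to its own pincode in the merged value. Note that this sketch keeps every group in memory, so it shows the logic only and is not a recipe for very large inputs.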

 

Thanks and Regards,

Subhadip

Employee

Re: unique row and converting row data to column

Yes. Append Pincode to the address using a tMap and then pass the value to tDenormalize.

 

I hope we have answered your query. Could you please spare a second to mark the post as resolved? Often members ignore this part when they get the solution and overlook the contribution made by authors to Talend community in between their own busy schedules :-(

 

Warm Regards,
Nikhil Thampi


 

 

Seven Stars

Re: unique row and converting row data to column

Thanks All.

That worked for us. However, we have more than 400 million records in total. My graph is laid out as follows.

tOracleInput -> tFileOutput -> tMap -> tDenormalize -> tFileOutput

 

This graph cannot process 400 million records and throws the error below.

 

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
at java.lang.StringBuilder.append(StringBuilder.java:136)
at artemis_dev.test3_0_1.Test3.tOracleInput_1Process(Test3.java:4781)
at artemis_dev.test3_0_1.Test3.runJobInTOS(Test3.java:5971)

 

The error occurred after processing around 200 million records, and the process is also too slow.

 

Please suggest how to fix the above error and how we can improve the execution time. We want to tune this graph as much as possible and reduce the total run time.

 

Regards,

Vivek

Employee

Re: unique row and converting row data to column

Hi,

 

Please increase the Java memory settings in the Run tab according to the volume of your input data.

 

[Attached screenshot, image.png, not reproduced here]

 

I would also recommend using disk space for tMap operations. Please refer to the link below for this step.

 

https://help.talend.com/reader/EJfmjmfWqXUp5sadUwoGBA/J4xg5kxhK1afr7i7rFA65w

 

Once it's resolved, I humbly request you to mark the solution as closed, since we have answered your initial and supplementary questions. If you have a new issue, please ask it as a new post instead of putting all the queries into a single post, thereby diluting the focus.

 

Some community members often overlook this aspect once they get the solution ignoring the time spent by contributors to answer the query :-(

 

Warm Regards,
Nikhil Thampi


