One Star

convert columns to rows with no aggregation

Hi
I've got this kind of data table (Excel or csv):

name joe peter Kelly
surname smith warton hartford
age 23 28 34
occupation professor bartender manager

and I want to use the first column as names for my table fields like this:
name surname age occupation
joe smith 23 professor
peter warton 28 bartender
kelly hartford 34 manager
I've used several components but to no avail. Take into account I don't need aggregation whatsoever, just transpose and convert rows to columns using first column as header.

Thanks
21 REPLIES
Moderator

Re: convert columns to rows with no aggregation

Hi,
Could you please have a look at a custom component TalendExchange:tTurnRow to see if it is what you're looking for?
Here is a related forum with use case http://www.talendforge.org/forum/viewtopic.php?id=16973.
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
One Star

Re: convert columns to rows with no aggregation

Hi xdshi
No such component on my Talend installation. My version is 5.4.1r

Thanks for your quick response and please forgive my duplicate post due to browser error
Moderator

Re: convert columns to rows with no aggregation

Hi,
This is a custom component developed by our community user and shared on Talend Exchange, you can download it from Talend Exchange freely, read the below link to learn how to install custom component into Talend Studio.
TalendHelpCenter:Installing a custom component.
So far, there is no a proper talend component for your requirement.
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
One Star

Re: convert columns to rows with no aggregation

Hi Sabrina
TalendExchange:tTurnRow component does not work with 5.4 version

any other hint would be much appreciated

Regards
Moderator

Re: convert columns to rows with no aggregation

Hi,
This component is old, it is not compatible with the new releases of Talend Studio, and mlange do not update it!
To get this component work with the new releases of Talend Studio, you have to modify two places.
1. Modify the tTurnRow_java.xml file, add the family element, for example:
<FAMILIES>
<FAMILY>Processing</FAMILY>
</FAMILIES>
<DOCUMENTATION>
<URL/>
</DOCUMENTATION>
2. Modify the tTurnRow_messages.properties file, remove this line from the file:
FAMILY=Processing
Feel free let me know if it is OK with you.
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
One Star

Re: convert columns to rows with no aggregation

Hi Sabrina

Followed your instructions but component does not show on palette.
any other suggestion?

regards
Moderator

Re: convert columns to rows with no aggregation

Hi,
Have you checked "Troubleshooting" section in KB article TalendHelpCenter:Installing a custom component? Is it working after restarting studio?
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
One Star

Re: convert columns to rows with no aggregation

Hi Sabrina

Solved the issue deleting the Components Caché.

Thanks a lot !!!
Moderator

Re: convert columns to rows with no aggregation

Hi,
Is this custom component working well for your use case? Don't hesitate to post your issue on forum.
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
One Star

Re: convert columns to rows with no aggregation

Hi Sabrina
I'm testing my ETL with a limited set of data. (see output before and after TurnRow)
Custom component seems to work fine but......(there's always a "but") it only processed 4 columns.
Taking into account that my first colum (the future column names) has 14 records, how can I force tTurnRow to process all data?
Thanks for being so helpful

OUTPUT BEFORE COMPONENT (INPUT DATA)
.---------------------+------------------------+------------------------+------------------------.
| tLogRow_1 |
|=--------------------+------------------------+------------------------+-----------------------=|
|Column0 |Column1 |Column2 |Column3 |
|=--------------------+------------------------+------------------------+-----------------------=|
|NAS-Port-Type |Wireless-802.11 |Wireless-802.11 |Wireless-802.11 |
|Event-Timestamp |Apr 21 2014 00:00:13 UTC|Apr 21 2014 00:00:13 UTC|Apr 21 2014 00:00:13 UTC|
|Acct-Session-Id |80203e90 |80267e90 |80203e90 |
|Acct-Output-Gigawords|0 |0 |0 |
|Acct-Input-Octets |2745995 |2746795 |2745995 |
|Acct-Input-Gigawords |0 |0 |0 |
|Timestamp |1398038413 |1345668413 |1398038413 |
|User-Name |user@name.com |user@name.com |user@name.com |
|NAS-Port |2149596816 |2149596816 |2149596816 |
|Acct-Session-Time |7022 |70 |7022 |
|Acct-Status-Type |Interim-Update |stop |Interim-Update |
|Acct-Output-Packets |20912 |23342 |20912 |
|Acct-Output-Octets |17889908 |17885408 |17889908 |
|Acct-Input-Packets |19376 |19345 |19376 |
'---------------------+------------------------+------------------------+------------------------'

OUTPUT AFTER COMPONENT (OUTPUT DATA)
.---------------+------------------------+---------------+---------------------.
| tLogRow_2 |
|=--------------+------------------------+---------------+--------------------=|
|Column0 |Column1 |Column2 |Column3 |
|=--------------+------------------------+---------------+--------------------=|
|NAS-Port-Type |Event-Timestamp |Acct-Session-Id|Acct-Output-Gigawords|
|Wireless-802.11|Apr 21 2014 00:00:13 UTC|80203e90 |0 |
|Wireless-802.11|Apr 21 2014 00:00:13 UTC|80267e90 |0 |
|Wireless-802.11|Apr 21 2014 00:00:13 UTC|80203e90 |0 |
'---------------+------------------------+---------------+---------------------'
One Star

Re: convert columns to rows with no aggregation

Hi
in addition to this issue, the real production input file may have thousand columns (see that every column is a record for each field)
Maybe I'm doing it all wrong from the beginning.....

Regards
One Star

Re: convert columns to rows with no aggregation

just for the record this is my original file contents to process (maybe I'm doing it all wrong from the beginning)

Mon Apr 21 00:00:13 2014
Acct-Status-Type = Interim-Update
NAS-Port-Type = Wireless-802.11
User-Name = "user@name.com"
NAS-Port = 2149596816
Acct-Session-Id = "80203e90"
Event-Timestamp = "Apr 21 2014 00:00:13 UTC"
Acct-Input-Octets = 2745995
Acct-Output-Octets = 17889908
Acct-Input-Gigawords = 0
Acct-Output-Gigawords = 0
Acct-Input-Packets = 19376
Acct-Output-Packets = 20912
Acct-Session-Time = 7022
Timestamp = 1398038413
Mon Apr 21 00:00:14 2014
Acct-Status-Type = stop
NAS-Port-Type = Wireless-802.11
User-Name = "user@name.com"
NAS-Port = 2149596816
Acct-Session-Id = "80267e90"
Event-Timestamp = "Apr 21 2014 00:00:13 UTC"
Acct-Input-Octets = 2746795
Acct-Output-Octets = 17885408
Acct-Input-Gigawords = 0
Acct-Output-Gigawords = 0
Acct-Input-Packets = 19345
Acct-Output-Packets = 23342
Acct-Session-Time = 70
Timestamp = 1345668413
REPEATED THOUSAND TIMES........

I've processed the file this way:
fileInputDelimited ("=" as field separator key/value)
> tMap (trimming and deleting empty lines)
> tDenormalize (column: values, Delimiter: ";")
> tFileOutputDelimited

So I get this:
---------------------+---------------------------------------------------------------------------------------------------.
| tLogRow_1 |
|=--------------------+--------------------------------------------------------------------------------------------------=|
|keys |values |
|=--------------------+--------------------------------------------------------------------------------------------------=|
|NAS-Port-Type |Wireless-802.11;Wireless-802.11;Wireless-802.11;Wireless-802.11 |
|Event-Timestamp |Apr 21 2014 00:00:13 UTC;Apr 21 2014 00:00:13 UTC;Apr 21 2014 00:00:13 UTC;Apr 21 2014 00:00:13 UTC|
|Acct-Session-Id |80203e90;80267e90;80203e90;80267e90 |
|Acct-Output-Gigawords|0;0;0;0 |
|Acct-Input-Octets |2745995;2746795;2745995;2746795 |
|Acct-Input-Gigawords |0;0;0;0 |
|Timestamp |1398038413;1345668413;1398038413;1345668413 |
|User-Name |user@name.com;user@name.com;user@name.com;user@name.com |
|NAS-Port |2149596816;2149596816;2149596816;2149596816 |
|Acct-Session-Time |7022;70;7022;70 |
|Acct-Status-Type |Interim-Update;stop;Interim-Update;stop |
|Acct-Output-Packets |20912;23342;20912;23342 |
|Acct-Output-Octets |17889908;17885408;17889908;17885408 |
|Acct-Input-Packets |19376;19345;19376;19345 |
'---------------------+---------------------------------------------------------------------------------------------------'
Four Stars

Re: convert columns to rows with no aggregation

Does this solves your problem?
If not what was your expected output?
Vaibhav
One Star

Re: convert columns to rows with no aggregation

Hi sanvaibhav
This did not solve my problem.
As you can see TurnRow component just turned 4 columns but I could have thousands of them.
regards
One Star

Re: convert columns to rows with no aggregation

Hi sanvaibhav
This did not solve my problem.
As you can see TurnRow component just turned 4 columns but I could have thousands of them.
regards

Not really thousands. The actual example you can see below has got 14 rows to pivot into 14 columns. The result transposed 4 columns only.
BEFORE COMPONENT (INPUT DATA)
.---------------------+------------------------+------------------------+------------------------.
| tLogRow_1 |
|=--------------------+------------------------+------------------------+-----------------------neutral
|Column0 |Column1 |Column2 |Column3 |
|=--------------------+------------------------+------------------------+-----------------------neutral
|NAS-Port-Type |Wireless-802.11 |Wireless-802.11 |Wireless-802.11 |
|Event-Timestamp |Apr 21 2014 00:00:13 UTC|Apr 21 2014 00:00:13 UTC|Apr 21 2014 00:00:13 UTC|
|Acct-Session-Id |80203e90 |80267e90 |80203e90 |
|Acct-Output-Gigawords|0 |0 |0 |
|Acct-Input-Octets |2745995 |2746795 |2745995 |
|Acct-Input-Gigawords |0 |0 |0 |
|Timestamp |1398038413 |1345668413 |1398038413 |
|User-Name |user@name.com |user@name.com |user@name.com |
|NAS-Port |2149596816 |2149596816 |2149596816 |
|Acct-Session-Time |7022 |70 |7022 |
|Acct-Status-Type |Interim-Update |stop |Interim-Update |
|Acct-Output-Packets |20912 |23342 |20912 |
|Acct-Output-Octets |17889908 |17885408 |17889908 |
|Acct-Input-Packets |19376 |19345 |19376 |
'---------------------+------------------------+------------------------+------------------------'

OUTPUT AFTER COMPONENT (OUTPUT DATA)
.---------------+------------------------+---------------+---------------------.
| tLogRow_2 |
|=--------------+------------------------+---------------+--------------------neutral
|Column0 |Column1 |Column2 |Column3 |
|=--------------+------------------------+---------------+--------------------neutral
|NAS-Port-Type |Event-Timestamp |Acct-Session-Id|Acct-Output-Gigawords|
|Wireless-802.11|Apr 21 2014 00:00:13 UTC|80203e90 |0 |
|Wireless-802.11|Apr 21 2014 00:00:13 UTC|80267e90 |0 |
|Wireless-802.11|Apr 21 2014 00:00:13 UTC|80203e90 |0 |
'---------------+------------------------+---------------+---------------------'
Four Stars

Re: convert columns to rows with no aggregation

Whether your columns in each block are varying?
Whether the order of the column is changing?
If the answer is Yes for above, then how do you plan to arrange the data in the output? Any logical or business suggestion?
Vaibhav
One Star

Re: convert columns to rows with no aggregation

I feel like this is a complete hack, but it works if you're ok hard coding the column names. I couldn't figure out how to make it more dynamic.
tFileInputDelimited -> tJavaFlex -> tMap
tFileInputDelimited schema only has 1 column... Column0. It's set to read the file for line breaks, but doesn't parse any of the data in each row.
tJavaFlex splits the column into a key/value pair and assigns them to variables. It then compares the Key to each column name and if it's a match, assigns the value to that column.
Because it's a tJavaFlex, once a value is assigned to a column, it stays assigned through subsequent rows until it is replaced. So when it finds column1 in the first row, it keeps that value all the way through row15. You get a growing table as it finds more values. 15 lines in your original file means 15 rows of output, but only the last row has all the rows filled in.
tMap has a filter so it only shows rows where Column14 is not null.
Like I said, it's a hack and all of your key names are hard coded into the code. But it accomplishes what you wanted to do. You could then loop through all your files or if this is all 1 big file, you can do something at the top of your Main code that looks to see if Column14 != null and if that's true, reassign null to all the column rows to empty them and start over.
Main code part of tJavaFlex
if (row1.Column0.indexOf("=")!=-1)
{
String key = row1.Column0.substring(0,row1.Column0.indexOf("=")-1).trim();
String value = row1.Column0.substring(row1.Column0.indexOf("=")+2).replace("\"","").trim();

if (key.equals("Acct-Status-Type"))
{
row2.Column1 = value;
}
else if (key.equals("NAS-Port-Type"))
{
row2.Column2 = value;
}
else if (key.equals("User-Name"))
{
row2.Column3 = value;
}
else if (key.equals("NAS-Port"))
{
row2.Column4 = value;
}
else if (key.equals("Acct-Session-Id"))
{
row2.Column5 = value;
}
else if (key.equals("Event-Timestamp"))
{
row2.Column6 = value;
}
else if (key.equals("Acct-Input-Octets"))
{
row2.Column7 = value;
}
else if (key.equals("Acct-Output-Octets"))
{
row2.Column8 = value;
}
else if (key.equals("Acct-Input-Gigawords"))
{
row2.Column9 = value;
}
else if (key.equals("Acct-Output-Gigawords"))
{
row2.Column10 = value;
}
else if (key.equals("Acct-Input-Packets"))
{
row2.Column11 = value;
}
else if (key.equals("Acct-Output-Packets"))
{
row2.Column12 = value;
}
else if (key.equals("Acct-Session-Time"))
{
row2.Column13 = value;
}
else if (key.equals("Timestamp"))
{
row2.Column14 = value;
}
}
One Star

Re: convert columns to rows with no aggregation

Last post didn't show the output.
One Star

Re: convert columns to rows with no aggregation

Hi Sbaer
Thanks a lot for your suggestion but the issue is that nobody can assure me the fields are coming the same order every set of data. As sanvaibhav asked:
Whether your columns in each block are varying? Yes
Whether the order of the column is changing? Yes
My initial idea was to use TurnRow component forcing it to pivot all columns not only 4

Regards
Four Stars

Re: convert columns to rows with no aggregation

Hi,
Whether your columns in each block are varying? Yes
Whether the order of the column is changing? Yes
How do you plan to arrange the data in the output? Any logical or business suggestion?
Vaibhav
One Star

Re: convert columns to rows with no aggregation

I agree that my solution is not the best, but there has to be some structure to what is coming in. With the possible solution I provided, the names would have to be consistent but the arrangement of data doesn't matter.
As long as it always come in as "Acct-Status-Type = Interim-Update", where it shows up in the file won't matter.
I've used a similar tJavaFlex solution in my own jobs when the order and number of columns varies so greatly that I couldn't create a consistent set of metadata. By using the Java code to find the header row and loop through the header names, I was able to apply an index to each column that I wanted and ignore everything else.
But true, for something like this to work, at least the name of the column, or in your case the structure of the key=value pair, has to be consistent.