Four Stars

Max value of each row across multiple columns and the corresponding name of the source column

Hi,

I have a CSV file and I need to get the max value for each row between c1, c2, c3, c4 and c5.

The maximum value must be stored in the a new column as the following example:

 

c1c2c3c4c5Max_Value
87.283.681.4838387.2
85.984828483.387.2
84.982.7868382.386
85.684.682.784.684.386.2
85.283.68283.38385.6
86.884.382.784.383.686.8
86.28482.3848787
88.184.682.384.383.688.1

 

Anyone can give me a hint or a way to achieve this ?

 

Kind regards,

 

Mohammad 

1 ACCEPTED SOLUTION

Accepted Solutions
Twelve Stars

Re: Max value of each row across multiple columns and the corresponding name of the source column

You can achieve this pretty easily using a bit of Java. 

 

Assume you have a tMap and your input columns are....

row1.newColumn

row1.newColumn1

row1.newColumn2

row1.newColumn3

row1.newColumn4

 

If you want to find the largest of these numbers for every row, create an output called "largestValue" and assign this expression....

Math.max(row1.newColumn,Math.max(row1.newColumn1,Math.max(row1.newColumn2,Math.max(row1.newColumn3,row1.newColumn4)))) 

What is happening here is that a Java Math method called "max" is being used to assess two numbers, the first and second number. In this example I am calling this method inside calls to the same method repeatedly so that the resulting value is passed on as an input to another call.

Rilhia Solutions
3 REPLIES
Twelve Stars

Re: Max value of each row across multiple columns and the corresponding name of the source column

You can achieve this pretty easily using a bit of Java. 

 

Assume you have a tMap and your input columns are....

row1.newColumn

row1.newColumn1

row1.newColumn2

row1.newColumn3

row1.newColumn4

 

If you want to find the largest of these numbers for every row, create an output called "largestValue" and assign this expression....

Math.max(row1.newColumn,Math.max(row1.newColumn1,Math.max(row1.newColumn2,Math.max(row1.newColumn3,row1.newColumn4)))) 

What is happening here is that a Java Math method called "max" is being used to assess two numbers, the first and second number. In this example I am calling this method inside calls to the same method repeatedly so that the resulting value is passed on as an input to another call.

Rilhia Solutions
Twelve Stars TRF
Twelve Stars

Re: Max value of each row across multiple columns and the corresponding name of the source column

Hi,

@rhall_2_0 got the best solution as for a limited and fixed number of value.

If you have much more values per row or a variable number of value, you need a more complete solution:

Capture.PNG

- tFixedInputRow is used as the input - if you input is a CSV file, access it using tFileInputRaw to ignore the number of fields at this stage

- tMap add a sequence number for each row (as a key)

- tReplicate replicates the flow if you don't want to access again the original data source

- tNormalize generates a key/value flow (one row for each value - here 41 rows cause I add 99 on 1rst record)

- tAggregate search forn the max value for each key

- tHashOutput memorize the result 

OnSubjobOk

- tHashInput_1 the original data unchanged but with the unique key

- tHashInput_2 to retrieve the max value for each key

- tMap to join both input using the key and add the max value as a new field

- tLogRow to print the result on the console

Starting job test at 15:01 15/11/2017.

[statistics] connecting to socket on port 3981
[statistics] connected
.--+------------------------+----.
|           tLogRow_15           |
|=-+------------------------+---=|
|id|line                    |max |
|=-+------------------------+---=|
|1 |87.2	83.6	81.4	83	83	99 |99  |
|2 |85.9	84	82	84	83.3      |85.9|
|3 |84.9	82.7	86	83	82.3    |86  |
|4 |85.6	84.6	82.7	84.6	84.3|85.6|
|5 |85.2	83.6	82	83.3	83    |85.2|
|6 |86.8	84.3	82.7	84.3	83.6|86.8|
|7 |86.2	84	82.3	84	87      |87  |
|8 |88.1	84.6	82.3	84.3	83.6|88.1|
'--+------------------------+----'

[statistics] disconnected
Job test ended at 15:01 15/11/2017. [exit code=0]

As you can see I've added the value 99 on the first record and it is find as the max value for the row 1.

Hope this helps.


TRF
Four Stars

Re: Max value of each row across multiple columns and the corresponding name of the source column

Thanks guys for your answers I appreciate it.