I have a column that has a value as["a1","a2.e3","b3","c4","xyz\/ss"] (including .). My requirement is to split rows each one having one value (a1, a2.e3, b3, c4, xyz/ss) based on the below with the logic
1st row having 1st value (i.e. a1), 2nd row having 2nd value (i.e a2.e3) and so on. Can any one suggest what is the regex to be used for the same in the split row column that is to be populated?
Note: The programming should calculate no. of values based on number of commas, if there are 3 commas then there are 4 values and each value should be extracted based on the commas.
There can be a max of 10 values in each column and min of 1 value, the split row should be like; if you have only 4 values the output should contain only 4 rows.
tNormalize will do this. You will probably want to clean up the string a bit with a tMap or tReplace if you want to drop the brackets and quotes from the values. I did this in a tMap as:
Feed that into tNormalize, tell it the column you want to split and the separator character, and you should be good for any number of items in the list.
After removing the brackets, I am getting the column value as
May I know what should be put on ColA's "newline" in each row to have the col values as
Row 1 : a1
Row 2 : a2.e3
Row 3 : c4
Just add a tMap after tSplitRow to filter empty rows, then use a sequence on the tMap right part to populate the output field:
"Row " + Numeric.sequence("Row", 1, 1) + ": " + row1.colA
Here is the whole job based on tNormalize as suggested by @cterenzi:
tMap to clean the record:
tNormalize to split the field ColA with separator ","
tMap to add "Row #:" before the split result:
And finally the result: