Six Stars

Need help on regex in tSplit rows

I have a column that has a value as["a1","a2.e3","b3","c4","xyz\/ss"] (including [].). My requirement is to split rows each one having one value (a1, a2.e3, b3, c4, xyz/ss) based on the below with the logic
1st row having 1st value (i.e. a1), 2nd row having 2nd value (i.e a2.e3) and so on. Can any one suggest what is the regex to be used for the same in the split row column that is to be populated?

Note: The programming should calculate no. of values based on number of commas, if there are 3 commas then there are 4 values and each value should be extracted based on the commas.
There can be a max of 10 values in each column and min of 1 value, the split row should be like; if you have only 4 values the output should contain only 4 rows.


Input :
col A
["a1","a2.e3","b3","c4, "xyz\/ss"]


output
Col A
a1
a2.e3
b3
c4
xyz/ss

  • Data Integration
Tags (1)
5 REPLIES
Nine Stars

Re: Need help on regex in tSplit rows

tNormalize will do this.  You will probably want to clean up the string a bit with a tMap or tReplace if you want to drop the brackets and quotes from the values.  I did this in a tMap as:

row1.newColumn.replaceAll("\\[","").replaceAll("]","").replaceAll("\"","")

Feed that into tNormalize, tell it the column you want to split and the separator character, and you should be good for any number of items in the list.

Six Stars

Re: Need help on regex in tSplit rows

After removing the brackets, I am getting the column value as
Col A
a1,"a2.e3","b3","c4","xyz\/ss"

 

May I know what should be put on ColA's "newline" in each row to have the col values as

Col A
Row 1 : a1
Row 2 : a2.e3
Row 3 : c4
Row 4:xyz/ss
tsplitrow.jpg

Nine Stars TRF
Nine Stars

Re: Need help on regex in tSplit rows

Just add a tMap after tSplitRow to filter empty rows, then use a sequence on the tMap right part to populate the output field:

"Row " + Numeric.sequence("Row", 1, 1) + ": " + row1.colA

 


TRF
Six Stars

Re: Need help on regex in tSplit rows

If you please don't mind, can you send me the zip file of the job which you are talking about as I find it difficult to imagine how exactly you propose the solution.


Nine Stars TRF
Nine Stars

Re: Need help on regex in tSplit rows

Here is the whole job based on tNormalize as suggested by @cterenzi:

Capture.PNG

tMap to clean the record:

Capture.PNG

tNormalize to split the field ColA with separator ","

Capture.PNG

tMap to add "Row #:" before the split result:

Capture.PNG

 

And finally the result:

Capture.PNG

 

 


TRF