[resolved] Ignoring repeated delimiters


Hi all,
(I must be having a "bad Google search day" because I can't find anything that I'm looking for, but I don't believe that I'm the first person with the following requirement!)
I have a file containing data of the following sort:
abc 100 def
ab   97   x
I'm trying to split each record into three fields with tExtractDelimitedFields, using a blank space as the delimiter. However, where there are multiple spaces (i.e. repeated delimiters) between two consecutive fields, the output is not what I want.
Using the example above I'll get the following output (as displayed by tLogRow):
|abc|100|def
|ab|||97|||x
Note that in the second record there are two empty output columns between the 'ab' and the '97'; I only want a single column boundary there.
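The output above matches what plain Java's String.split does with a single-space delimiter. The following is only a minimal sketch of that behaviour, not the component's own code; it assumes the second record holds three spaces between fields (which is what two empty columns imply), and the class and method names are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical demo class: shows how a single-space delimiter
// turns every extra space into an empty field.
class SingleSpaceSplitDemo {

    // Splits on exactly one space; consecutive spaces yield empty strings.
    static String[] splitOnSingleSpace(String record) {
        return record.split(" ");
    }

    public static void main(String[] args) {
        // One space between fields: three fields.
        System.out.println(Arrays.toString(splitOnSingleSpace("abc 100 def")));
        // Three spaces between fields (assumed): two empty fields per gap.
        System.out.println(Arrays.toString(splitOnSingleSpace("ab   97   x")));
    }
}
```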
MS Excel has a useful "treat multiple delimiters as one" option. Does anything like that exist for the tExtractDelimitedFields component?
Is there something better to use ?
I'm using TOS 3.2.3.r35442 generating Java code.
Cheers,
Dave

Accepted Solutions

Re: [resolved] Ignoring repeated delimiters

Hello
You need to write a custom routine that removes the redundant spaces before the line is split.
in.csv:

abc 100 def
ab 97 x
ab 33 s

Go to Repository --> Routines and create a new routine called f10515:
// template routine Java
package routines;

public class f10515 {

    // Collapses runs of spaces into a single space and trims both ends.
    public static String deleteRedundantSpace(String line) {
        line = line.trim();
        String newLine = "";
        // Consecutive spaces produce empty strings in the array; skip them.
        String[] s = line.split(" ");
        for (int i = 0; i < s.length; i++) {
            if (!s[i].equals("")) {
                newLine = newLine + " " + s[i];
            }
        }
        return newLine.trim();
    }
}

Result:
Starting job forum10515 at 14:23 08/04/2010.
connecting to socket on port 3948
connected
.---------+----------+----------.
|           tLogRow_1           |
|=--------+----------+---------=|
|newColumn|newColumn1|newColumn2|
|=--------+----------+---------=|
|abc      |100       |def       |
|ab       |97        |x         |
|ab       |33        |s         |
'---------+----------+----------'
disconnected
Job forum10515 ended at 14:23 08/04/2010.
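For comparison, the same cleanup can probably be done with a regular expression instead of a loop. This is a hedged sketch in plain Java rather than the routine used in the job above; the class name WhitespaceNormalizer and the method name collapseSpaces are hypothetical.

```java
// Hypothetical helper: collapses every run of spaces into one space.
class WhitespaceNormalizer {

    // Trims the line, then replaces one-or-more spaces with a single space.
    static String collapseSpaces(String line) {
        if (line == null) {
            return null;
        }
        return line.trim().replaceAll(" +", " ");
    }

    public static void main(String[] args) {
        // The input spacing here is an assumption made for the example.
        System.out.println(collapseSpaces("  ab   97   x "));
    }
}
```

Once the spaces are collapsed, a single space works as the field delimiter again.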

Best regards
Shong

All Replies

Re: [resolved] Ignoring repeated delimiters

Hi shong,
Many thanks for that, it would have taken me a long time to get to that idea.
Regards,
Dave
P.S. Sorry for the delayed response, I've been onsite with a customer.

Re: [resolved] Ignoring repeated delimiters

I just did this by specifying my Field Separator as "\\s+", which tells the regex to match one or more whitespace characters.
Is there any way to get the final field to grab whatever is left? My final field is an error string of variable length which includes many spaces between the words in the error. I do not want to define a large number of fields and then splice them back together.
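In plain Java, one way to let the last field take whatever is left is the two-argument form of String.split, which stops splitting once the requested number of fields is reached. The thread does not say whether tExtractDelimitedFields exposes such a limit, so treat this as a sketch for custom code (a routine, for example); the class name, method name, and sample record are invented for illustration.

```java
import java.util.Arrays;

// Hypothetical demo: split on runs of whitespace, but keep the
// remainder of the line intact in the final field.
class RemainderSplitDemo {

    // fieldCount is the total number of fields wanted; the last one
    // receives the rest of the line, inner spaces included.
    static String[] splitKeepingRemainder(String line, int fieldCount) {
        return line.trim().split("\\s+", fieldCount);
    }

    public static void main(String[] args) {
        // Made-up record: a code, a date, then a free-text error message.
        String record = "E42   2010-04-08   Connection  refused by host";
        System.out.println(Arrays.toString(splitKeepingRemainder(record, 3)));
    }
}
```

The error text lands in the third field unchanged, so there is no need to define extra fields and join them back together.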

Re: [resolved] Ignoring repeated delimiters

Hi,
I have a column in Excel whose values contain more than one kind of delimiter. I want to split each string on one specific delimiter only (the dot, not the underscore).
Example: the values below are in one column
AAA.BBB_B.CCC
AAA.BBB.CCC
AAA.BBB_B
AAA..CCC
Result:
Column1|Column2|Column3
AAA|BBB_B|CCC
AAA|BBB|CCC
AAA|BBB_B
AAA||CCC
Regards,
Sathiyapriya
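A possible approach in plain Java, offered as a sketch only: split on a literal dot (escaped, because "." is a regex wildcard) and pass a negative limit so that empty fields are kept. The class and method names are hypothetical, and how this would be wired into a job is not covered here.

```java
// Hypothetical demo: split on a literal dot only, leaving underscores alone
// and preserving empty fields such as the middle one in "AAA..CCC".
class DotSplitDemo {

    // The -1 limit keeps trailing empty strings instead of dropping them.
    static String[] splitOnDot(String value) {
        return value.split("\\.", -1);
    }

    public static void main(String[] args) {
        String[] values = {"AAA.BBB_B.CCC", "AAA.BBB.CCC", "AAA.BBB_B", "AAA..CCC"};
        for (String value : values) {
            // Join with '|' to mirror the layout of the requested result.
            System.out.println(String.join("|", splitOnDot(value)));
        }
    }
}
```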