XML file with carriage return

One Star

Hi there. I am trying to parse an XML file into a Postgres database. Everything seems to work going from the tFileInputXML to the tMap and finally on to the tPostgresqlOutput. The only problem is that, because of the way my XML files are set up, the job does not take in all the details. For example, here is what the XML file contains:
<id>1234</id>
<name> John Smith</name>
<address> 123 Main Street
SomeTown
SomeCountry</address>
The problem is that only the id (1234), the name (John Smith) and the first part of the address (123 Main Street) get entered. Because of the carriage return I am missing the rest of the address. Is there any way to rectify this in Talend, or will all of the XML files need to be edited to remove the carriage returns?
Sixteen Stars

Re: XML file with carriage return

I think I have heard of this before with regard to Postgres databases, though I may be wrong. Have you tried using a tLogRow after the tFileInputXML component to see if Talend is actually reading the full address? I suspect it will be. If it is not, this could be a Talend bug. What version are you using?
One Star

Re: XML file with carriage return

Hi there, thanks for the quick response. I just set it up with a tLogRow component and yes, it does seem to take the full address after the tFileInputXML component. However, it still puts the second line of the address onto a new line, like so:
1234|John Smith|123 Main Street
SomeTown
SomeCountry|

I am using Talend Open Studio v 5.1.1
Sixteen Stars

Re: XML file with carriage return

That is entirely expected (that it would keep the carriage return and/or line feed when displaying the record with the tLogRow). I suspect that this is an issue with either Postgres or the Postgres Talend component. Would it be a deal breaker to remove the carriage returns?
One Star

Re: XML file with carriage return

Hi rhall_2.0, unfortunately yes, it would be a deal breaker, as you put it, to remove all the carriage returns: I am dealing with around 20,000 separate XML files, and judging by manual checking most have the problem and only a few do not. I cannot seem to find a setting anywhere within Talend that lets it recognize a \n, carriage return or line feed.
Does this mean I will need to implement another kind of solution to rectify this error?
Sixteen Stars

Re: XML file with carriage return

Maybe try this method. Create a routine with this method in it...
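// Replaces every occurrence of the character with the given ASCII code in value.
public static String removeChar(int ascii, String value, String replaceVal) {
    String returnVal = null;

    if (value != null) {
        char asciiChar = (char) ascii;
        String target = String.valueOf(asciiChar);
        // String.replace matches the text literally, whereas replaceAll would
        // interpret the character as a regular expression (a problem for
        // characters such as '.' or '*').
        returnVal = value.replace(target, replaceVal);
    }
    return returnVal;
}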

You need to supply the ASCII character number that represents the character causing you an issue. You can find these in any ASCII table (carriage return is 13, line feed is 10).
If you call this method in a tXMLMap (for example) for every element that you get this problem with, it will edit the string as you process it. You will therefore not have to worry about doing it manually.
You will need to try a few things out, but I think this should work.
Note: I wrote this method in this post and it is not tested. You will need to test it first.
One Star

Re: XML file with carriage return

Hi, thanks for the response again. Which component should I use to implement this piece of code: tJava, tJavaFlex or tJavaRow? And where in my job flow should I place it?
Sixteen Stars

Re: XML file with carriage return

Create a routine (under the "Code" section in your project tree) and add this method. If you create a routine called MyRoutine, then you can use this method anywhere that you wish by using the following code.....
routines.MyRoutine.removeChar(13,row1.column1, "")

The above would replace carriage returns (13) in the column called "column1" from row1 with an empty String.
I would recommend using it in a tMap or tXMLMap for the columns/entities that need to use it. If you are dealing with a carriage return and a line feed you may need to edit the method to deal with both or call it twice with different ascii numbers.
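For the "edit the method to deal with both" option, here is a minimal sketch of what a single-pass version might look like. The class name LineBreakRoutines and the method name replaceLineBreaks are made up for illustration, and the class is written as a standalone file so it can be run outside Talend; inside Talend the method would go into your own routine instead.

```java
import java.util.regex.Matcher;

class LineBreakRoutines {

    /**
     * Replaces every line break (CRLF, lone CR, or lone LF) with replaceVal.
     * Returns null for a null input so nullable columns pass through unchanged.
     * Hypothetical helper, not part of Talend.
     */
    static String replaceLineBreaks(String value, String replaceVal) {
        if (value == null) {
            return null;
        }
        // "\r\n" is listed first so a Windows line break counts as one break, not two.
        // quoteReplacement keeps '$' and '\' in replaceVal from being read as group references.
        return value.replaceAll("\\r\\n|\\r|\\n", Matcher.quoteReplacement(replaceVal));
    }

    public static void main(String[] args) {
        String address = "123 Main Street\nSomeTown\nSomeCountry";
        // prints: 123 Main Street, SomeTown, SomeCountry
        System.out.println(replaceLineBreaks(address, ", "));
    }
}
```

Passing ", " or a single space rather than an empty String keeps the address lines from running together, which may or may not be what you want in the database column.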
One Star

Re: XML file with carriage return

Hi, thanks again. I managed to work that out after a bit of googling around Talend routines. I applied it using just the carriage return value (13) and unfortunately it did not work that way. I am going to try editing the Java code to include both a carriage return and a line feed. I will return if I run into further problems and still cannot retrieve the full data.
Sixteen Stars

Re: XML file with carriage return

As an experiment to find out what combination of characters you need to remove, you can try something like this....
routines.MyRoutine.removeChar(13,routines.MyRoutine.removeChar(10,row1.column1, ""), "")

The method returns a String so you can nest it inside another call to the same method. It will work from the inside out, so in the example above the ascii 10 chars are removed first, then the ascii 13 ones.
This may need some experimentation until you nail the characters you need to find and remove.
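If you would rather see which characters are present than guess, something like the following might help. It is only a sketch: the class name ControlCharInspector and the method controlCodes are invented for this post, and it simply lists the numeric codes of any control characters found in a value.

```java
import java.util.ArrayList;
import java.util.List;

class ControlCharInspector {

    /**
     * Returns the numeric codes of all control characters in value,
     * in order of appearance. Hypothetical helper, not part of Talend.
     */
    static List<Integer> controlCodes(String value) {
        List<Integer> codes = new ArrayList<Integer>();
        if (value == null) {
            return codes;
        }
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            // isISOControl is true for codes 0-31 and 127-159, which covers CR (13) and LF (10).
            if (Character.isISOControl(c)) {
                codes.add((int) c);
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        // prints: [10, 13, 10]
        System.out.println(controlCodes("123 Main Street\nSomeTown\r\nSomeCountry"));
    }
}
```

Printing the result for the address column (for example from a tJavaRow placed after the tFileInputXML) should show whether you are dealing with 13, 10, or both.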
One Star

Re: XML file with carriage return

Hi rhall_2.0, it ended up being just the line feed ASCII value, so I was able to fix the problem using the method you provided. Thanks very much for your help. Do you have much knowledge of the tFileList component? I can iterate through all the XML files in one folder by using a tFileList leading into my tFileInputXML component, i.e. Data Folder1 -> xml files.
However, my data is nested two folders deep, i.e. Home Folder -> Data Folder1 -> xml files, and there are 12 different data folders. Is there any way to set up one or two tFileList components to iterate from the home folder and then take the current directory? I have set it up with two tFileList components, the first reading directories (of which 12 are found), but when I set the second tFileList to ((String)globalMap.get("tFileList_1_CURRENT_FILEDIRECTORY")) and to retrieve files, it does not seem to work as expected.
EDIT: I have just managed to rectify the problem mentioned above, please disregard.