Seven Stars

Using HTML tag on xml file from txmlmap

Hello, I have an XML file with many rows, and in my tXMLMap I need one row that contains HTML.

ligne.PNG

For this row, I use the html tag in my tXMLMap, but it stops reading at the first line and Talend returns this error:

ORA-01400: Cannot insert NULL into ("DB"."table"."column")

But with other XML files that have several HTML lines, it works. For example, if I press Enter after my <p> Hello, </p> to start a new line, it works.

 

Edit: I tried to use this

StringHandling.EREPLACE(row2.html,"</p>","</p><br>") 

but it made no difference.

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Fifteen Stars

Re: Using HTML tag on xml file from txmlmap

OK. I have a bit of a hack you can use. It is relatively convoluted, but it works. This is how you do it.

 

1) Read the data in as a String using a tFileInputRaw component.

2) The schema of the tFileInputRaw component will be a column called "content" of type Object. Connect this to a tConvertType and convert it to a String.

3) Connect a tJavaFlex and use the code below.....

row11.content = row12.content.replaceAll("<p>", "").replaceAll("</p>", "").replaceAll("\\<\\?xml(.+?)\\?\\>", "").replaceAll("\\<\\?mso(.+?)\\?\\>", "").replaceAll("\\s{2,}", "").replaceAll("[^\\x20-\\x7e]", "").trim();

This removes all of the rubbish that Talend does not like and creates a reasonably well formatted piece of XML. It also removes all <p> and </p> tags.

4) Connect this output to a tConvertType and convert the String to a Document.

5) Connect to a tExtractXMLField and use the XPaths you used before.

 

This works. I've tried it with your sample file with and without unmatched <p> tags.
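For anyone who wants to see what that replacement chain does outside Talend, here is a minimal standalone sketch. The class name RawXmlCleaner, the clean method, the null check and the sample input are illustrative assumptions, not part of the job; only the replaceAll chain comes from the post above. Note that the last pattern drops every character outside printable ASCII, so line breaks and accented letters such as é disappear as well.

```java
final class RawXmlCleaner {

    // Same replacement chain as the tJavaFlex line, split up for readability.
    static String clean(String content) {
        if (content == null) {
            return null; // assumption: pass nulls through instead of throwing
        }
        return content
            .replaceAll("<p>", "")                   // drop opening paragraph tags
            .replaceAll("</p>", "")                  // drop closing paragraph tags
            .replaceAll("\\<\\?xml(.+?)\\?\\>", "")  // drop the XML declaration
            .replaceAll("\\<\\?mso(.+?)\\?\\>", "")  // drop <?mso...?> processing instructions
            .replaceAll("\\s{2,}", "")               // drop runs of two or more whitespace characters
            .replaceAll("[^\\x20-\\x7e]", "")        // drop everything outside printable ASCII
            .trim();
    }

    public static void main(String[] args) {
        // Hypothetical input with one unmatched <p>, loosely modelled on the thread.
        String raw = "<?xml version=\"1.0\"?>\n<WebFORM>\n  <Description><html><p>Bonjour,</p>\n"
            + "<p>Cordialement</html></Description>\n</WebFORM>";
        System.out.println(clean(raw));
    }
}
```

In the job itself the equivalent line sits in the tJavaFlex and reads from the incoming row rather than from a hard-coded string.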

Rilhia Solutions
42 REPLIES
Fifteen Stars

Re: Using HTML tag on xml file from txmlmap

I'm afraid your explanation of your problem is not very clear. Are you trying to interrogate HTML using a tXMLMap?

Rilhia Solutions
Seven Stars

Re: Using HTML tag on xml file from txmlmap

Hello (sorry for my bad English :s)

So I have an XML file, and from it I need the html tag, so I put it in my tXMLMap. But my tLogRow shows nothing, even though the tag contains one row (see the screenshot and the returned error at the top of the page).

And when I try to insert this data, I get the "NULL" error, but the value in my XML file is not null.

 

Fifteen Stars

Re: Using HTML tag on xml file from txmlmap

Can you post an example of the XML and the element that you want to return? There may be a different component you can use as I suspect the tXMLMap will not be suitable.

Rilhia Solutions
Seven Stars

Re: Using HTML tag on xml file from txmlmap

This is my XML tag that contains the HTML information (I need all the information in all the <p> tags):

row1.PNG

 

 

This is my tLogRow; it shows nothing:

row1_logrow.PNG

 

These are my tXMLMap columns (main row):

row1_map.PNG

And in my tFileInputXML, in the XPath query for my html column, I added //* at the end to read all the lines. Now it shows me the row, BUT only the first <p> tag and not all of the <p> tags.

Fifteen Stars

Re: Using HTML tag on xml file from txmlmap

That is because your <html> element contains only another element as far as the tXMLMap is concerned. Therefore it is quite right to return null. Try using the tExtractXMLField to get this HTML tag. You will need to use XPaths for this. You will also need to tick the "Get Nodes" box for the column you want to receive the value in.

Rilhia Solutions
Seven Stars

Re: Using HTML tag on xml file from txmlmap

Oh okay, I see, but how should I use the tExtractXMLField? I get "Error on line 1 of document, Nested exception".


rhall_2_0 wrote:

That is because your <html> element contains only another element as far as the tXMLMap is concerned. Therefore it is quite right to return null. Try using the tExtractXMLField to get this HTML tag. You will need to use XPaths for this. You will also need to tick the "Get Nodes" box for the column you want to receive the value in.


 

Fifteen Stars

Re: Using HTML tag on xml file from txmlmap

You need to be working with a Document. If it is a String, you need to convert it to a Document using a tConvertType component. If you are using HTML and NOT an XML Document, it won't work in many cases. It MUST be XML.

Rilhia Solutions
Seven Stars

Re: Using HTML tag on xml file from txmlmap

So I'm using a tHttpRequest to get my file and a tFileInputXML to read it.

helpos.PNG

Should I replace my tFileInputXML with the tExtractXMLField?

Seven Stars

Re: Using HTML tag on xml file from txmlmap

The problem is the <p> tag.

Fifteen Stars

Re: Using HTML tag on xml file from txmlmap

You're not using XML. XML and HTML are similar but are not interchangeable. If you want to parse HTML, you will need to use third party Java APIs. I have written a tutorial on how this can be done here: https://www.rilhia.com/tutorials/using-third-party-java-library-scrape-content-table-web-page

 

 

Rilhia Solutions
Seven Stars

Re: Using HTML tag on xml file from txmlmap

When I add a space between my html tag and the <p> tag, like this:

<my:Description><html xmlns="http://www.w3.org/1999/xhtml" xml:space="preserve"> <p>

it works fine. But the file arrives without the space, and I can't edit the file (the job inserts the data into the DB).

 

Fifteen Stars

Re: Using HTML tag on xml file from txmlmap

What error are you getting? The actual error in the output window in full.

Rilhia Solutions
Seven Stars

Re: Using HTML tag on xml file from txmlmap

Here is my XML file.

Seven Stars

Re: Using HTML tag on xml file from txmlmap

nothing.PNG

This is from the normal XML file, without the space between > and <p>.

ok.PNG

And this is when I put a space between > and <p>.

Fifteen Stars

Re: Using HTML tag on xml file from txmlmap

This is a nasty file to work with. I don't believe it is well formed at all....but then Microsoft seldom conform to standards. The "my" namespace prefix is not bound....was the file you sent me edited? Anyway, after removing the broken XML I was able to get the following XPath to work with or without a space between html and <p> ....

 

"./DATA/Description/html/p"

 

I was using just a tFileInputXML and had "/WebFORM" as the Loop XPath query.
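To illustrate what "not bound" means here, the sketch below parses a tiny document with the standard JAXP parser, once with the "my" prefix undeclared and once with a dummy declaration added. The class name, the sample XML and the urn:example:dummy namespace are made up for illustration; this is plain Java, not what Talend runs internally.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class UnboundPrefixCheck {

    // Returns true when the parser accepts the document, false when it rejects it.
    static boolean parses(String xml, boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical documents: the first uses "my:" without declaring it.
        String unbound = "<WebFORM><my:DATA>x</my:DATA></WebFORM>";
        String bound = "<WebFORM xmlns:my=\"urn:example:dummy\"><my:DATA>x</my:DATA></WebFORM>";

        System.out.println("unbound, namespace-aware: " + parses(unbound, true));
        System.out.println("unbound, not namespace-aware: " + parses(unbound, false));
        System.out.println("bound, namespace-aware: " + parses(bound, true));
    }
}
```

A namespace-aware parser rejects the undeclared prefix, which is why adding any declaration for "my" on an enclosing element is enough to get past that particular error.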

 

Rilhia Solutions
Seven Stars

Re: Using HTML tag on xml file from txmlmap

Hi, and thanks for the reply. Yes, I have uploaded a new file; sorry, I had edited the old one.

 

I don't use the XPath "./DATA/Description/html/p" because I have other XML files with many tags like this, and it shows me only the first <p> tag, "Bonjour,", and not all of them. So I used "./DATA/Description//*" to read every line in the description.

Fifteen Stars

Re: Using HTML tag on xml file from txmlmap

Using your new file (it is still missing a namespace for "my", but I added a dummy one), I was able to return .....

<html xmlns="http://www.w3.org/1999/xhtml" xml:space="preserve"><p>​Bonjour,</p>
<p>Je vous remercie de bien vouloir renommer le répertoire Solidarités par Actions Solidaires qui se trouve sur H, direction des Solidarités et de la Santé Publique.</p>
<p>Cordialement</p>
<p>Stest </p></html>

....using an XPath of "./my:DATA/my:Description/html" and keeping "Get Nodes" ticked. By unticking "Get Nodes" I get .....

Bonjour,
Je vous remercie de bien vouloir renommer le répertoire Solidarités par Actions Solidaires qui se trouve sur H, direction des Solidarités et de la Santé Publique.
Cordialement
Stest

Is this not what you want? 
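The difference between the two results above is the difference between serialising the selected node and reading its text value. The sketch below shows that distinction with the standard JAXP XPath API. It is only an analogy for the "Get Nodes" option, not Talend's implementation, and the class name, the namespace-free sample XML and the XPath are assumptions chosen to keep the example short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class NodeVersusText {

    // Hypothetical sample: same shape as the thread's file, without namespaces.
    static final String SAMPLE =
        "<WebFORM><DATA><Description><html><p>Bonjour,</p><p>Cordialement</p></html>"
        + "</Description></DATA></WebFORM>";

    // Selects a single node with an XPath expression.
    static Node select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return (Node) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Serialises the node with its markup (comparable to keeping the node).
    static String asMarkup(Node node) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns only the text inside the node, with all tags dropped.
    static String asText(Node node) {
        return node.getTextContent();
    }

    public static void main(String[] args) {
        Node html = select(SAMPLE, "/WebFORM/DATA/Description/html");
        System.out.println(asMarkup(html));
        System.out.println(asText(html));
    }
}
```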

Rilhia Solutions
Fifteen Stars

Re: Using HTML tag on xml file from txmlmap

Did the above work?

Rilhia Solutions
Seven Stars

Re: Using HTML tag on xml file from txmlmap

Hey, sorry for the late reply (time difference, maybe). So I tried the XPath again:

 "/my:DATA/my:Description/html"

It does return all of the HTML in my XML file, so that is OK.

 

But it does not return the HTML for a file with only one <p> tag, like this file:

error2.PNG

Fifteen Stars

Re: Using HTML tag on xml file from txmlmap

I don't understand what you are saying as there is ambiguity in what you're telling me and no examples given. This is likely because of a language barrier (....and your English is much better than my ....any other language :-) ). So, let's try this. I'll write out what I think you are asking, and could you post exactly the XML you are trying and failing with? You say it doesn't work with ONE <p>, but I believe every piece of XML you have sent has had TWO <p>.

 

Do you want the text between the <html> tags with NO html code OR do you want all of the text and code?

 

Rilhia Solutions
Seven Stars

Re: Using HTML tag on xml file from txmlmap

Yeah, sorry, I'm very bad at explaining hard problems.

 

So, it doesn't work with only one <p> tag, as in this XML file, but it works with 2 or more.

I'm using this XPath query:

"my:DATA/my:Description/html"

I just want the data between the html tags, because I can have a <span> inside a <p> tag, or a <table>, and I insert all of the data into an Oracle DB.

 

Fifteen Stars

Re: Using HTML tag on xml file from txmlmap

OK, this is because the XML parser assumes that the <html> and <p> are part of the XML. Since there is only 1 <p>, it sees the XML as broken. This is actually badly formatted XML. XML or HTML within XML should be held in a <![CDATA[ ... ]]> section or maybe encoded to base64.

 

Do you have any control over the content of these XML files? If so, ensure that ALL opening tags have closing tags. If not, read the file in as a String, search for <p> and </p>, then remove them. After that convert the String to XML using a tConvertType component and process using the XPath you have.

 

I know this sounds like a pain, but XML parsers are quite strict with this. Microsoft are bending the rules again.....
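As a rough illustration of the CDATA point, the sketch below parses the same HTML fragment twice with the standard JAXP parser: once embedded directly with an unclosed <p>, and once wrapped in a CDATA section. The class name, the method names and both sample strings are assumptions made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class CdataDemo {

    // Hypothetical fragment with an unclosed <p>, embedded directly in the XML.
    static final String EMBEDDED =
        "<Description><html><p>Bonjour,</html></Description>";

    // The same fragment wrapped in CDATA, so the parser treats it as plain text.
    static final String WRAPPED =
        "<Description><![CDATA[<html><p>Bonjour,</html>]]></Description>";

    // Parses the XML and returns the root element's text, or null if it is not well formed.
    static String rootTextOrNull(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (SAXException e) {
            return null; // the parser rejected the document
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("embedded: " + rootTextOrNull(EMBEDDED));
        System.out.println("wrapped:  " + rootTextOrNull(WRAPPED));
    }
}
```

With the CDATA wrapper the HTML arrives as one text value, tags included, and whether those tags are balanced no longer matters to the XML parser.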

Rilhia Solutions
Seven Stars

Re: Using HTML tag on xml file from txmlmap

So yes, it's badly formatted. I have no control over it because I'm using tSOAP and get the file from Microsoft SharePoint...

How can I remove the <p>?

I have tried using these in my tXMLMap:

StringHandling.EREPLACE(row2.html,"<p>","<p></p><p>")

and

StringHandling.EREPLACE(row2.html,"</p>","<br/></p>") 

But it's still the same problem, and the HTML is not shown.

 

Seven Stars

Re: Using HTML tag on xml file from txmlmap

I'm sorry, I'm lost.

Do you have a screenshot of the job you have created? ^^'

Fifteen Stars

Re: Using HTML tag on xml file from txmlmap

The file is read by the tFileInputRaw. The code goes in the tJavaFlex. The rest is explained by my last post.
XMLJob.png

Rilhia Solutions
Seven Stars

Re: Using HTML tag on xml file from txmlmap

teet.PNG

I get an error on the tJavaFlex:

Exception in component tJavaFlex_1
java.lang.NullPointerException
Fifteen Stars

Re: Using HTML tag on xml file from txmlmap

Your tConvertType is the problem. Either tick "Auto Cast" (catches me out all the time) or configure the manual cast.

Rilhia Solutions
Fifteen Stars

Re: Using HTML tag on xml file from txmlmap

Did this work for you?

Rilhia Solutions