One Star

how to find and replace in xml

Hi,
I need to convert some XML files but cannot figure out how.
The target is the same XML, except that for one particular element deep in the tree, the element's content is replaced by some other string.
<?xml version="1.0" encoding="UTF-8"?>
<ADI>
<Metadata>
<AMS Product="MOD" Asset_Name="xxx" Description="yyy" Provider_ID="zzz" Creation_Date="2012-06-21" Asset_ID="0001" Version_Major="6" Version_Minor="0" />
<App_Data Value="xxx" Name="Metadata_Spec_Version" App="MOD"/>
<App_Data Value="" Name="Provider_Content_Tier" App="MOD"/>
</Metadata>
<Asset>
<Metadata>
<AMS Product="MOD" Asset_Name="xxx" Description="ppp" Provider_ID="01" Creation_Date="2012-06-21" Asset_ID="0001" Version_Major="6" Version_Minor="0" />
<App_Data Value="xxx" Name="Metadata_Spec_Version" App="MOD"/>
<App_Data ... />
<App_Data ... />
<App_Data ... />
...
</Metadata>
<Asset>
<Metadata>
<AMS Product="MOD" Asset_Name="xxx" Description="ppp" Provider_ID="01" Creation_Date="2012-06-21" Asset_ID="0001" Version_Major="6" Version_Minor="0" />
<App_Data ... />
<App_Data ... />
...
</Metadata>
<Content Value="aaa.mpg"/>
</Asset>
<Asset>
<Metadata>
<AMS Product="MOD" Asset_Name="yyy" Description="qqq" Provider_ID="02" Creation_Date="2012-06-21" Asset_ID="0002" Version_Major="6" Version_Minor="0" />
<App_Data ... />
...
</Metadata>
<Content Value="bbb.mpg"/>
</Asset>
<Asset>
<Metadata>
<AMS Product="MOD" Asset_Name="zzz" Description="rrr" Provider_ID="03" Creation_Date="2012-06-21" Asset_ID="0003" Version_Major="6" Version_Minor="0" />
<App_Data ... />
...
</Metadata>
<Content Value="ccc.jpg"/>
</Asset>
</Asset>
</ADI>

The target XML file will be identical, except that "aaa.mpg", "bbb.mpg" and "ccc.jpg" will be replaced by, say, "new_aaa.mpg", "new_bbb.mpg", etc.
Note that in this XML structure, Asset can be nested recursively.
I first thought of reading the XML line by line and doing a string find/replace, but that is not robust: the file is XML, and the search string could in theory appear anywhere in it.
So would the job look something like this?
tFileInputXML --> tXMLMap --> tFileOutputCSV
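
To illustrate why an XML-aware replace is more robust than a line-by-line string replace, here is a minimal sketch outside Talend, using Python's standard xml.etree.ElementTree. It assumes the string to change lives in the Value attribute of Content elements, as in the sample above. The function name rename_content, the "new_" prefix and the trimmed sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed-down, hypothetical version of the sample. Note that "aaa.mpg"
# also appears in an App_Data attribute, which must NOT be changed.
SAMPLE = """<ADI>
  <Metadata>
    <App_Data Value="aaa.mpg" Name="Note" App="MOD"/>
  </Metadata>
  <Asset>
    <Asset>
      <Content Value="aaa.mpg"/>
    </Asset>
    <Asset>
      <Asset>
        <Content Value="ccc.jpg"/>
      </Asset>
    </Asset>
  </Asset>
</ADI>"""


def rename_content(root, prefix="new_"):
    """Prefix the Value attribute of every Content element, at any depth."""
    changed = 0
    # iter() walks the whole tree, so nested Asset levels need no recursion.
    for content in root.iter("Content"):
        old = content.get("Value")
        if old is not None:
            content.set("Value", prefix + old)
            changed += 1
    return changed


root = ET.fromstring(SAMPLE)
count = rename_content(root)
values = [c.get("Value") for c in root.iter("Content")]
untouched = root.find("Metadata/App_Data").get("Value")
```

Because only Content elements are visited, the same file name appearing elsewhere (here in App_Data) is left alone, which a plain text replace could not guarantee. The modified tree can then be written back out with ET.ElementTree(root).write(...).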

Re: how to find and replace in xml

Hi
If you think simple string replacement is not robust enough, the best way is to create a job like the one below.
tFileInputXML-->tReplace-->tAdvancedFileOutputXML
Use tReplace to replace strings in the specified column.
Then use tAdvancedFileOutputXML to recreate the XML file.
Regards,
Pedro

Re: how to find and replace in xml

Hi Pedro,
Thank you very much for the answer.
May I ask a follow-up question to understand things better?
In my proposed solution, at the first tFileInputXML, I was able to extract the whole root element into a "Document" type column and pass it to tXMLMap. From your answer, I understand that this incurs a second XML parse, which is inefficient. Still, do you think it is possible to do a find/replace with tXMLMap?
I am trying to understand the use cases for tXMLMap, because my case will get more complex in the future: it will need to replace the value of a specific element by calculating it from sibling elements or from attributes of its parent element. Would tXMLMap be practical in those cases?
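
As a rough illustration of the "calculate from siblings" case (again outside Talend, in plain Python with xml.etree.ElementTree), the sketch below rebuilds each Content Value from the Asset_ID found in the sibling Metadata/AMS element. The naming rule "<Asset_ID>_<old value>", the function name retag_content and the trimmed sample are assumptions made only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample with one level of Asset nesting.
SAMPLE = """<ADI>
  <Asset>
    <Metadata><AMS Asset_ID="0001" Asset_Name="xxx"/></Metadata>
    <Asset>
      <Metadata><AMS Asset_ID="0002" Asset_Name="yyy"/></Metadata>
      <Content Value="bbb.mpg"/>
    </Asset>
    <Asset>
      <Metadata><AMS Asset_ID="0003" Asset_Name="zzz"/></Metadata>
      <Content Value="ccc.jpg"/>
    </Asset>
  </Asset>
</ADI>"""


def retag_content(root):
    """Set each Content Value from its sibling AMS element's Asset_ID."""
    # Visit every Asset at any depth; work from the parent downwards,
    # since ElementTree elements do not keep a pointer to their parent.
    for asset in root.iter("Asset"):
        content = asset.find("Content")    # direct child of this Asset only
        ams = asset.find("Metadata/AMS")   # sibling branch of Content
        if content is None or ams is None:
            continue                       # e.g. a container Asset without Content
        new_value = "{}_{}".format(ams.get("Asset_ID"), content.get("Value"))
        content.set("Value", new_value)


root = ET.fromstring(SAMPLE)
retag_content(root)
result = [c.get("Value") for c in root.iter("Content")]
```

The design choice here is to iterate over the parent (Asset) rather than over Content, so that both the target and its siblings are reachable with simple relative paths at every nesting level.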

Re: how to find and replace in xml

Hi
You can use tXMLMap.
Just add tReplace between tFileInputXML and tXMLMap.
Regards,
Pedro

Re: how to find and replace in xml

Thank you, Pedro, but I think my problem is not really the replace part, as I could do that within tXMLMap as well by using an intermediate variable. After spending some time on the components, I think it is more about handling a recursive XML structure in Talend. I will open a new topic on that.
Cheers.