Four Stars PaS
Four Stars

Serialize XML to String

Hi Guys,


I am trying to parse an XML document with the tFileInputXML and tXMLMap components into a flat data structure, which I need for a REST request.

Here's an example of what I would like to achieve:

 

My Input (demo data):

 

<?xml version="1.0" encoding="UTF-8"?>

<Customers xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:noNamespaceSchemaLocation="customer4.xsd">
	<Customer id="1">
		<CustomerName>Griffith Paving and Sealcoatin</CustomerName>
		<CustomerAdresses>
			<CustomerAddress>talend apres 91</CustomerAddress>
			<CustomerAddress>511 Maple Ave. Apt. 1B</CustomerAddress>
			<CustomerAddress>1799 Rosemary Way</CustomerAddress>
			<CustomerAddress>1859 Green Bay Rd.1</CustomerAddress>
		</CustomerAdresses>
		<LabelState>Connecticut</LabelState>
		<RegTime>03-11-2006</RegTime>
		<Fresh>67852.0</Fresh>
		<Frozen>61521.4852</Frozen>
	</Customer>
	<Customer id="2">
		<CustomerName>Bill's Dive Shop</CustomerName>
		<CustomerAdresses>
			<CustomerAddress>310 Walker Ave.</CustomerAddress>
			<CustomerAddress>844 Spruce St.</CustomerAddress>
			<CustomerAddress>965 Marion Place Apt. 65C</CustomerAddress>
			<CustomerAddress>511 Hill</CustomerAddress>
		</CustomerAdresses>
		<LabelState>zona</LabelState>
		<RegTime>19-11-2004</RegTime>
		<Fresh>88792.0</Fresh>
		<Frozen>15434.1</Frozen>
	</Customer>

</Customers>

And this is what I would like to achieve, a short 2-column schema:

 

Name | Addresses
Griffith Paving and Sealcoatin | talend apres 91, 511 Maple Ave. Apt. 1B, 1799 Rosemary Way, 1859 Green Bay Rd.1
Bill's Dive Shop | 310 Walker Ave., 844 Spruce St., 965 Marion Place Apt. 65C, 511 Hill

So the addresses should be serialized as a single string in a text field, separated by commas. I've already set up this example https://makina-corpus.com/blog/metier/2014/advanced-xml-transformation-2013-part-1-xml-to-table-rows but I have no idea how to join the separate addresses and serialize them.

 

Maybe somebody has a hint for me?

 

Thanks in advance & best

Eleven Stars

Re: Serialize XML to String

There are many ways to deal with this problem. This is my preferred way because, in my opinion, it is easy to extend. It also helps you get to grips with some cool (but often overlooked) functionality of the tMap.

 

First of all you need to get your rows. It looks like you have done that. Next, you need to concatenate the addresses. To do this you can use tMap variables. The cool thing about these is that they store values between rows and are processed from top to bottom. So in essence, what you will be doing is simply appending the address section from each row, as the rows pass through. Easy.

 

But you will need to know when to start a new address string. I am assuming your data will come through as multiple rows per record, each row holding one component of that record's address. To handle this you will need to make sure your data is grouped by record key and that the address sections are in the correct order. Once the data is sorted (tSortRow), you can make use of the technique I describe here (https://www.rilhia.com/quicktips/quick-tip-compare-row-value-against-value-previous-row) to detect where one group ends and the next begins. That example is not exactly what you are trying to achieve, but it uses the same technique: finding the last row of a group and then resetting the tMap variable at the beginning of a new group.
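
To make the variable logic concrete, here is a minimal plain-Java sketch of what the tMap variables would be doing, assuming the rows arrive sorted by id. The class and method names are hypothetical and this is not Talend-generated code; in a real job each assignment would be a tMap variable expression, evaluated top to bottom.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of stateful tMap variables (hypothetical names).
final class RunningAddressConcat {

    // State that survives between rows, as tMap variables do.
    private String previousId = null;
    private String addresses = "";

    // Called once per incoming row; returns the address string built so far for this id.
    String process(String id, String address) {
        // 1) Compare the current key with the key of the previous row.
        boolean newGroup = !id.equals(previousId);
        // 2) Reset on a new group, otherwise append with a comma separator.
        addresses = newGroup ? address : addresses + ", " + address;
        // 3) Remember the key for the next row (must happen last).
        previousId = id;
        return addresses;
    }

    public static void main(String[] args) {
        // Demo rows (id, CustomerAddress), already sorted by id.
        List<String[]> rows = Arrays.asList(
            new String[] {"1", "talend apres 91"},
            new String[] {"1", "511 Maple Ave. Apt. 1B"},
            new String[] {"2", "310 Walker Ave."},
            new String[] {"2", "844 Spruce St."}
        );
        RunningAddressConcat state = new RunningAddressConcat();
        for (String[] row : rows) {
            System.out.println(row[0] + " | " + state.process(row[0], row[1]));
        }
    }
}
```

Every input row still produces an output row; only the last row of each id carries the complete string.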

 

The other thing to consider with this is that if your address is made up of 4 rows, you will output 4 rows with the complete address arriving in the last row of the group. To handle that you can use a tAggregateRow to group by your address key and use the "Last" function to ensure you only get the complete address for the group.
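
As a rough illustration of the "keep only the last row per group" step, here is a plain-Java sketch. It assumes the input rows are (key, running address string) pairs in arrival order; the class and method names are hypothetical, and in a job this would be the tAggregateRow with the "Last" function rather than hand-written code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java illustration of grouping by key and keeping the last value (hypothetical names).
final class LastPerGroup {

    // rows[i][0] is the group key, rows[i][1] is the running address string.
    static Map<String, String> lastPerId(String[][] rows) {
        // LinkedHashMap keeps the keys in the order they were first seen.
        Map<String, String> last = new LinkedHashMap<>();
        for (String[] row : rows) {
            // A later row for the same key overwrites the earlier, shorter value.
            last.put(row[0], row[1]);
        }
        return last;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"1", "talend apres 91"},
            {"1", "talend apres 91, 511 Maple Ave. Apt. 1B"},
            {"2", "310 Walker Ave."},
            {"2", "310 Walker Ave., 844 Spruce St."}
        };
        for (Map.Entry<String, String> e : lastPerId(rows).entrySet()) {
            System.out.println(e.getKey() + " | " + e.getValue());
        }
    }
}
```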

 

As I said, there are other ways to achieve this, but this one is easy to extend and will also introduce you to hidden functionality that is massively useful. Hope it helps.

Rilhia Solutions
Four Stars PaS

Re: Serialize XML to String

Hello!

 

Thanks for the detailed explanation. I will try to follow your suggested solution. For context, here are the two data flows I currently extract from the XML shown above:

 

.--+-------------------------.
|         tLogRow_4          |
|=-+------------------------=|
|id|CustomerAddress          |
|=-+------------------------=|
|1 |talend apres 91          |
|1 |511 Maple Ave. Apt. 1B   |
|1 |1799 Rosemary Way        |
|1 |1859 Green Bay Rd.1      |
|2 |310 Walker Ave.          |
|2 |844 Spruce St.           |
|2 |965 Marion Place Apt. 65C|
|2 |511 Hill                 |
'--+-------------------------'

.--+------------------------------+-----------+----------+-------+---------.
|                                tLogRow_5                                 |
|=-+------------------------------+-----------+----------+-------+--------=|
|id|CustomerName                  |LabelState |RegTime   |Fresh  |Frozen   |
|=-+------------------------------+-----------+----------+-------+--------=|
|1 |Griffith Paving and Sealcoatin|Connecticut|03-11-2006|67852.0|61521.484|
|2 |Bill's Dive Shop              |zona       |19-11-2004|88792.0|15434.1  |
'--+------------------------------+-----------+----------+-------+---------'
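
Combining these two flows into the target Name | Addresses layout amounts to a lookup on id. Below is a minimal plain-Java sketch of that step, assuming the addresses have already been concatenated per id. The class and method names are made up for illustration; in a job this would typically be a lookup join in a tMap.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of joining the customer flow with the address flow (hypothetical names).
final class NameAddressJoin {

    // namesById: id -> CustomerName; addressesById: id -> concatenated addresses.
    static List<String> join(Map<String, String> namesById, Map<String, String> addressesById) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> customer : namesById.entrySet()) {
            // Look up the addresses by id; use an empty string when there is no match.
            String addresses = addressesById.get(customer.getKey());
            lines.add(customer.getValue() + " | " + (addresses == null ? "" : addresses));
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, String> names = new LinkedHashMap<>();
        names.put("1", "Griffith Paving and Sealcoatin");
        names.put("2", "Bill's Dive Shop");

        Map<String, String> addresses = new LinkedHashMap<>();
        addresses.put("1", "talend apres 91, 511 Maple Ave. Apt. 1B, 1799 Rosemary Way, 1859 Green Bay Rd.1");
        addresses.put("2", "310 Walker Ave., 844 Spruce St., 965 Marion Place Apt. 65C, 511 Hill");

        for (String line : join(names, addresses)) {
            System.out.println(line);
        }
    }
}
```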

Thanks & best