Combining data from delimited files to produce xml doc for posting to solr

Highlighted
Four Stars

Combining data from delimited files to produce xml doc for posting to solr

Hi All,

New to Talend

I am attempting to combine 2 delimited files to produce a document that can be submitted to solr
My inputs are approx. as follows (simplified, but hopefully communicates the requirement)
I would eventually expect this requirement to increase to cover more than just x2 delimited files

prod.csv
prod_id  name

sku.csv
sku_id  prod_id  name

For each row in prod.csv, I would like to generate a document structure that can be posted to solr in the following format

<doc>
  <field name="prod_uid">prod.csv_prod_id</field>
  <field name="prod_skus">[sku.csv.sku_id, sku.csv.sku_id,sku.csv.sku_id,]</field>
</doc>

 

I have spent time trying to work with tMAP and tXMLMap (and various other components) and whilst I have got somewhere, I typically end up posting (viatRESTClient) individual documents for every product+sku combination, rather than grouping the sku_ids in a single element for a product
There are lots of examples in the forum, but nothing I have been table to take to solve this (what I am assuming is) simple problem


All help welcomed, thanks

Nine Stars

Re: Combining data from delimited files to produce xml doc for posting to solr

The key component you need here is tAggregateRow.  If you group by prod_id and aggregate the sku_ids into a list it should be straightforward:

Here is my test input and output:

Starting job TestXmlArray at 00:17 07/07/2019.

[statistics] connecting to socket on port 3925
[statistics] connected
.--------------------.
|   #1. tLogRow_1    |
+-----------+--------+
| key       | value  |
+-----------+--------+
| prod_id   | 1      |
| prod_name | Orange |
+-----------+--------+

.-------------------.
|   #2. tLogRow_1   |
+-----------+-------+
| key       | value |
+-----------+-------+
| prod_id   | 2     |
| prod_name | Apple |
+-----------+-------+

.-------------------.
|   #3. tLogRow_1   |
+-----------+-------+
| key       | value |
+-----------+-------+
| prod_id   | 3     |
| prod_name | Pear  |
+-----------+-------+

.-----------------.
|  #1. tLogRow_3  |
+---------+-------+
| key     | value |
+---------+-------+
| sku_id  | 10    |
| prod_id | 1     |
+---------+-------+

.-----------------.
|  #2. tLogRow_3  |
+---------+-------+
| key     | value |
+---------+-------+
| sku_id  | 21    |
| prod_id | 2     |
+---------+-------+

.-----------------.
|  #3. tLogRow_3  |
+---------+-------+
| key     | value |
+---------+-------+
| sku_id  | 22    |
| prod_id | 2     |
+---------+-------+

.-----------------.
|  #4. tLogRow_3  |
+---------+-------+
| key     | value |
+---------+-------+
| sku_id  | 31    |
| prod_id | 3     |
+---------+-------+

.-----------------.
|  #5. tLogRow_3  |
+---------+-------+
| key     | value |
+---------+-------+
| sku_id  | 32    |
| prod_id | 3     |
+---------+-------+

.-----------------.
|  #6. tLogRow_3  |
+---------+-------+
| key     | value |
+---------+-------+
| sku_id  | 33    |
| prod_id | 3     |
+---------+-------+

.--------------------------------------------------------------------------------------------------------------------------------------------------------.
|                                                                     #1. tLogRow_2                                                                      |
+-----+--------------------------------------------------------------------------------------------------------------------------------------------------+
| key | value                                                                                                                                            |
+-----+--------------------------------------------------------------------------------------------------------------------------------------------------+
| xml | <?xml version="1.0" encoding="ISO-8859-15"?>

<doc>
  <prod_id>1</prod_id>
  <prod_name>Orange</prod_name>
  <prod_skus>[10]</prod_skus>
</doc>
 |
+-----+--------------------------------------------------------------------------------------------------------------------------------------------------+

.-----------------------------------------------------------------------------------------------------------------------------------------------------------.
|                                                                       #2. tLogRow_2                                                                       |
+-----+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| key | value                                                                                                                                               |
+-----+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| xml | <?xml version="1.0" encoding="ISO-8859-15"?>

<doc>
  <prod_id>2</prod_id>
  <prod_name>Apple</prod_name>
  <prod_skus>[21, 22]</prod_skus>
</doc>
 |
+-----+-----------------------------------------------------------------------------------------------------------------------------------------------------+

.--------------------------------------------------------------------------------------------------------------------------------------------------------------.
|                                                                        #3. tLogRow_2                                                                         |
+-----+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| key | value                                                                                                                                                  |
+-----+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| xml | <?xml version="1.0" encoding="ISO-8859-15"?>

<doc>
  <prod_id>3</prod_id>
  <prod_name>Pear</prod_name>
  <prod_skus>[31, 32, 33]</prod_skus>
</doc>
 |
+-----+--------------------------------------------------------------------------------------------------------------------------------------------------------+

[statistics] disconnected

Job TestXmlArray ended at 00:17 07/07/2019. [exit code=0]

See attached screenshot of the test job with the details of the tAggregateRow.  prod_sku_ids is of type List in the schema.

xml-array.png

In the tMap you do an inner join on the prod id:
xml-array-map.png

--
Please give Kudos and mark topics as solved where appropriate.

2019 GARNER MAGIC QUADRANT FOR DATA INTEGRATION TOOL

Talend named a Leader.

Get your copy

OPEN STUDIO FOR DATA INTEGRATION

Kickstart your first data integration and ETL projects.

Download now

What’s New for Talend Summer ’19

Watch the recorded webinar!

Watch Now

Migrate Data from one Database to another with one Job using the Dynamic Schema

Find out how to migrate from one database to another using the Dynamic schema

Blog

Best Practices for Using Context Variables with Talend – Part 3

Read about some useful Context Variable ideas

Blog

Best Practices for Using Context Variables with Talend – Part 4

Pick up some tips and tricks with Context Variables

Blog