Import data into MongoDB - Bug with tMongoDBOutput, tExtractJSONFields

Dear friends,
I have a problem when importing data from a flat file into MongoDB using tMongoDBOutput, tExtractJSONFields and tWriteJSONField, as described in the scenario below.
Ref: the example comes from the internet. You can find it by searching for some words from the data set in my example; sorry, I cannot attach a URL.
Here is a simple example of how to read a CSV file and convert the data into JSON format.
CSV data (original data):
WDCi,Lean,KL
WDCi,Kai Herng,KL
WDCi,Walter,Sydney
WDCi,Deborah,Sydney
WDCi,Terry,US
FPT,Minh-Hieu,Hanoi
FPT,Anthony,Paris
FPT,Luis,Paris
JSON data (result wanted) in Mongo:
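{
   "_id" : ObjectId("554b7fdcb42309e07933f70f"),
   "name" : "WDCi",
   "locations" : [
       {
           "location" : "KL",
           "staffs" : [ ... ]
       },
       {
           "location" : "Sydney",
           "staffs" : [ ... ]
       },
       {
           "location" : "US",
           "staffs" : {
               "staff" : "Terry"
           }
       }
   ]
}
{
   "_id" : ObjectId("554b7fdcb42309e07933f710"),
   "name" : "FPT",
   "locations" : [
       {
           "location" : "Hanoi",
           "staffs" : {
               "staff" : "Minh-Hieu"
           }
       },
       {
           "location" : "Paris",
           "staffs" : [ ... ]
       }
   ]
}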
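
For comparison, the grouping I am after can be sketched outside Talend in plain Python. This is only an illustration of the target structure, not part of the job: the function name `group_rows` is made up, the column order (company, staff, location) is read off the CSV above, and "staffs" is always written as a list of {"staff": ...} documents, which is an assumption and differs from the single-object form shown for "US".

```python
import csv
import io
import json

# The CSV data from the example above: company, staff, location.
CSV_TEXT = """\
WDCi,Lean,KL
WDCi,Kai Herng,KL
WDCi,Walter,Sydney
WDCi,Deborah,Sydney
WDCi,Terry,US
FPT,Minh-Hieu,Hanoi
FPT,Anthony,Paris
FPT,Luis,Paris
"""


def group_rows(rows):
    """Group (company, staff, location) rows into one document per company."""
    # company name -> {location -> [staff names]}; dicts keep insertion order.
    companies = {}
    for company, staff, location in rows:
        companies.setdefault(company, {}).setdefault(location, []).append(staff)

    # Build one document per company, with "locations" as a top-level array.
    return [
        {
            "name": name,
            "locations": [
                {"location": loc, "staffs": [{"staff": s} for s in staff_names]}
                for loc, staff_names in locations.items()
            ],
        }
        for name, locations in companies.items()
    ]


documents = group_rows(csv.reader(io.StringIO(CSV_TEXT)))
print(json.dumps(documents, indent=2))
```

The important point is that "locations" is an array directly under the company document, with no extra wrapper element around it.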

My job:

First, I use a tFileInputDelimited component to read the CSV file:
http://blog.wdcigroup.net/wp-content/uploads/2012/07/fileschema.png
Figure 1.1
Then I drop a tWriteJSONField component from the palette and link the output row of the tFileInputDelimited component to it. After that, I define the schema for the tWriteJSONField component. In this scenario, I create one column named “company” and set it as the output column of the component. The output row of the tWriteJSONField component is linked to a tExtractJSONFields component so that I can use its output to insert into MongoDB later.

I configure the JSON tree structure.

I add the “company” column as the “Group By” column to the tWriteJSONField component as shown below.

After the tWriteJSONField, I work with tExtractJSONFields. First, I edit the schema by adding new columns:

After that, I configure the Mapping: 

Finally, I insert the data into MongoDB by adding a tMongoDBOutput component. The basic settings of tMongoDBOutput are:

In the advanced settings, the configured JSON tree is:

The result I see in MongoDB is:
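{
   "_id" : ObjectId("554b7fdcb42309e07933f70f"),
   "name" : "WDCi",
   "locations" : {
       "locations" : [
           {
               "location" : "KL",
               "staffs" : [ ... ]
           },
           {
               "location" : "Sydney",
               "staffs" : [ ... ]
           },
           {
               "location" : "US",
               "staffs" : {
                   "staff" : "Terry"
               }
           }
       ]
   }
}
/* 2 */
{
   "_id" : ObjectId("554b7fdcb42309e07933f710"),
   "name" : "FPT",
   "locations" : {
       "locations" : [
           {
               "location" : "Hanoi",
               "staffs" : {
                   "staff" : "Minh-Hieu"
               }
           },
           {
               "location" : "Paris",
               "staffs" : [ ... ]
           }
       ]
   }
}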


The problem is that the array of embedded location documents is wrapped inside an extra "locations" element. How can I get the wanted result?
Thanks very much.  
P.S. I tried unchecking the "Get nodes" box for location in tExtractJSONFields, but then the result in MongoDB was:
{
    "_id" : ObjectId("554b7ba4b42394748949a3e3"),
    "name" : "WDCi",
    "locations" : ""
}
{
    "_id" : ObjectId("554b7ba4b42394748949a3e4"),
    "name" : "FPT",
    "locations" : ""
}
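
To make the difference between the two shapes concrete, here is a small Python sketch of the transformation I need: it takes a document with the extra "locations" wrapper and lifts the inner array up one level. It is not a Talend fix, only an illustration of a possible clean-up step outside the job; the function name `unwrap_locations` is made up, and wrapping a single inner value in a list is my assumption.

```python
def unwrap_locations(doc):
    """Return a copy of doc with the extra "locations" wrapper removed."""
    fixed = dict(doc)  # shallow copy so the input document is left untouched
    inner = fixed.get("locations")
    # Only unwrap when "locations" is an object that itself holds "locations".
    if isinstance(inner, dict) and "locations" in inner:
        value = inner["locations"]
        # Assumption: a single embedded document is wrapped in a list.
        fixed["locations"] = value if isinstance(value, list) else [value]
    return fixed


wrapped = {
    "name": "WDCi",
    "locations": {
        "locations": [
            {"location": "US", "staffs": {"staff": "Terry"}},
        ]
    },
}
unwrapped = unwrap_locations(wrapped)
print(unwrapped)
```

Documents that already have the wanted shape, or whose "locations" is an empty string as in the P.S. output, pass through unchanged.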
2 REPLIES
Community Manager

Re: Import data into MongoDB - Bug with tMongoDBOutput, tExtractJSONFields

Hi 
You can generate the desired result directly with tMongoDBOutput. For the configuration of the XML tree on tMongoDBOutput, please see the following screenshots.
BRS
Shong


Re: Import data into MongoDB - Bug with tMongoDBOutput, tExtractJSONFields

Dear Friends,
How can I prevent keys with empty values from being created by the tMongoDBOutput component?
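
To show what I mean, here is a minimal Python sketch of the behaviour I am looking for: keys whose value is empty are removed from the document before it is written. This is only an illustration, not a tMongoDBOutput setting; the function name `drop_empty` and the choice of what counts as empty (None, "", {} and []) are my assumptions.

```python
def drop_empty(value):
    """Recursively remove dict keys whose value is None, "", {} or []."""
    if isinstance(value, dict):
        # Clean nested values first, then drop keys that ended up empty.
        cleaned = {k: drop_empty(v) for k, v in value.items()}
        return {k: v for k, v in cleaned.items() if v not in (None, "", {}, [])}
    if isinstance(value, list):
        return [drop_empty(v) for v in value]
    return value  # scalars such as 0 or False are kept as they are


example = {"name": "WDCi", "locations": "", "count": 0, "extra": {"note": None}}
cleaned_example = drop_empty(example)
print(cleaned_example)
```

Numeric zero and False are deliberately kept, since they are real values rather than missing ones.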