Six Stars

How to convert multiple rows into one JSON array

Hi,

 

I'd like to convert data coming from a row link (with, let's say, a schema consisting of three fields: field1, field2 and field3) into a JSON array, like this one:

[{"field1":"a1","field2":"a2","field3":"a3"},{"field1":"b1","field2":"b2","field3":"b3"},{"field1":"c1","field2":"c2","field3":"c3"},{"field1":"d1","field2":"d2","field3":"d3"},...]

 

Any idea?

tWriteJSONField does not seem to be enough on its own, right?

 

Thanks in advance, best regards,

Lorenzo

 

Six Stars

Re: How to convert multiple rows into one JSON array

I would also add that the "group by" feature in tWriteJSONField could perhaps be useful for my purpose, but it is not clearly explained.

Could you please explain? Also, what do the 'loop element' (the one I'm forced to set up, but which apparently has no effect) and the 'group element' mean?

 

Sometimes, dealing with Talend components is quite frustrating: a very basic use case, with features that are not well described...

 

Thank you,

Lorenzo

Seven Stars

Re: How to convert multiple rows into one JSON array

Hi @Lorenzo,

 

Take a look at tFileOutputJSON, which does what I think you're looking for if you tick "Generate an array json".

 

As the output from this component (and indeed from anything generating JSON) is a single string, I'm guessing you won't be doing any further processing after the JSON has been generated. If necessary, though, you could read the file back in using tFileInputRaw with the default "Read the file as a string" option, which will give you the JSON as a string.
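(If it helps to see what that read-back amounts to: a minimal sketch in plain Java, assuming Java 11+ for Files.readString; the file name "rows.json" is illustrative, not something the component mandates.)

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadJsonBack {
    public static void main(String[] args) throws IOException {
        // Equivalent of tFileInputRaw with "Read the file as a string":
        // the whole file comes back as one JSON string.
        String json = Files.readString(Path.of("rows.json"));
        System.out.println(json.length() + " characters of JSON read");
    }
}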

 

Regards,

 

 

Chris

Six Stars

Re: How to convert multiple rows into one JSON array

Hi @ciw1973

 

I select rows from a DB, not from a file.

 

 

Seven Stars

Re: How to convert multiple rows into one JSON array

Yup. You feed the rows from the component that queries the database into the tFileOutputJSON component to generate your JSON.

 

Quick example here, although for speed I'm using a tRowGenerator instead of a database input:

 

[Screenshots: Talend JSON from Rows.png, Talend JSON from Rows 2.png]

Six Stars

Re: How to convert multiple rows into one JSON array

Hi @ciw1973

 

thanks for your quick reply. It seems to me like a workaround and, more importantly, a waste of resources. I have to load a huge number of rows, which means the data will pass through a very twisted path: DB --a--> TalendJobServer --b--> FileSystem --c--> TalendJobServer --d--> DB

where 'b' and 'c' seem redundant.

 

Btw, thanks for your help.

 

Regards,

Lorenzo

Seven Stars

Re: How to convert multiple rows into one JSON array

Yeah, it definitely feels like there should be a way to route the output from tFileXXX components through the job without having to use the file system as temporary storage, but in practice, the issue for even modest data sets is one of memory usage.

 

Whilst Talend jobs will often handle large sets of records, they're processed in small batches as rows, and having e.g. 1,000 rows in memory at once will rarely be a problem. Scale that up to millions of records, with hundreds of columns, and keeping the resultant JSON string in memory just isn't going to be practical.
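(A rough back-of-the-envelope illustration: 10 million rows with 100 columns, at roughly 25 bytes per "key":"value" pair, already comes to about 25 GB of JSON held in one string, far beyond a comfortable JVM heap.)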

 

By streaming the output to disk, we ensure that as long as there is enough disk space available (and disk space is cheap and, with SSDs and caching, pretty fast these days), your job will scale to any size. Streaming data in this way is all but essential when implementing robust enterprise system integrations.

 

Bearing in mind the above, if you still want to take the file system out of the equation (although you'll still need the same components in the Talend job), then just use an in-memory file system/RAM disk like /dev/shm on Linux or ImDisk for Windows, but keep a close eye on your memory usage. Exceed the available memory, and your OS will page to disk anyway, which will be much slower than just writing everything to disk in the first place. In all likelihood, this will manifest as an unattended job that has been working fine for months suddenly taking ten times longer to execute.
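(For example, on Linux you could simply point the tFileOutputJSON file name at something like "/dev/shm/rows.json" and read it back from the same path, keeping the whole round trip in RAM; the path is illustrative.)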

Six Stars

Re: How to convert multiple rows into one JSON array

Hi @ciw1973

thanks for your explanation; it makes sense and helps me gain experience. I appreciate it.

 

Since we have to send data to a Flume endpoint (via a REST call), we will probably handle a small number of records (1000-5000) in each loop. We are therefore managing the SELECT from the DB in batches of, let's say, 1000 records. And since the number of columns is quite low, we'd like to avoid the filesystem (or at least I would, as a development challenge), especially since our job server has up to 512 GB of RAM.

 

So, could you please suggest ideas to design without writing/reading from file?

 

I also have to say that the JSON structure we need has some nested levels, so tFileOutputJSON is probably not useful, because it requires an already-nested structure as input (right?). We are actually building the nested structure in this way:

tPostgresqlInput --> tWriteJSONField (with a nested structure defined in the JSON Tree) --> ....

and then, in your solution (slightly changed), it would go to a tFileOutputDelimited.

But I would like to keep the flow without writing a file, and the first problem I face is that tWriteJSONField converts to JSON row by row ( {"aaa":"111","bbb":"222",...} ), without producing a JSON array ( [{"aaa":"111","bbb":"222",...},{"aaa":"111","bbb":"222",...}] ).

 

So, the basic and main question (from the beginning) is:

how do you convert a resultset (a very common flow in Talend) into a JSON array like [{"field1":"value1","field2":"value2"},{"field1":"value3","field2":"value4"},...], without going through the filesystem?

 

Thanks in advance,

Lorenzo


Seven Stars

Re: How to convert multiple rows into one JSON array (Accepted Solution)

Cool. That makes your requirements much clearer.

 

First, you'll need to inject a field into your recordset with a constant value to group by later on.

 

If you have a tMap anywhere between the tPostgresqlInput and the tWriteJSONField component, then just add the new field there and give it a constant literal value (e.g. "x"); if not, add a tMap to do this:

 

[Screenshot: tMap.png]

 

Next configure the tWriteJSONField component with a dummy looping element:

 

[Screenshot: tWriteJSONField.png]

 

Now update the output schema for tWriteJSONField so it only has one field:

 

[Screenshot: tWriteJSONField3.png]

 

At this point your output will include the "dummy_loop" element, which you don't want, so group by the looping field to get rid of it, and tick "Remove root node":

 

[Screenshot: tWriteJSONField2.png]
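(To make the mechanics concrete: before grouping, each row serialises on its own, roughly as {"root":{"dummy_loop":{"field1":"a1","field2":"a2","field3":"a3"}}}. Grouping on the constant field collects every row under a single document, so the loop element repeats once per row, and "Remove root node" strips the wrapper, leaving one string containing just the array. The exact element names depend on your JSON tree; "dummy_loop" here simply mirrors the screenshots.)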

 

Your final output will then be a single JSON array string:

 

[Screenshot: Output.png]
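(And if you ever want the same aggregation without tWriteJSONField at all, the logic is simple enough to put in a tJavaFlex. A minimal sketch in plain Java under that assumption — class, field and variable names are illustrative, and a real job should use a JSON library for escaping:)

public class RowsToJsonArray {

    // Naive escaping (backslashes and quotes only); a real job should
    // use a proper JSON library for full escaping.
    static String esc(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        String[] fieldNames = {"field1", "field2", "field3"};
        String[][] rows = {{"a1", "a2", "a3"}, {"b1", "b2", "b3"}};

        StringBuilder json = new StringBuilder("[");    // tJavaFlex "begin": open the array
        for (int r = 0; r < rows.length; r++) {         // tJavaFlex "main": one object per row
            if (r > 0) json.append(',');
            json.append('{');
            for (int i = 0; i < fieldNames.length; i++) {
                if (i > 0) json.append(',');
                json.append('"').append(fieldNames[i]).append("\":\"")
                    .append(esc(rows[r][i])).append('"');
            }
            json.append('}');
        }
        json.append(']');                               // tJavaFlex "end": close the array

        // Prints: [{"field1":"a1","field2":"a2","field3":"a3"},{"field1":"b1","field2":"b2","field3":"b3"}]
        System.out.println(json);
    }
}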