One Star

Parse file containing multiple JSON documents

I need some help devising a strategy to parse JSON docs within a Talend job (Java job, not Perl). I am using Talend version 5.0.2, developing on a Mac, and planning to run on a Linux box.
Unfortunately, I cannot use the tFileInputJSON component because of the format of my files -- each file contains several hundred JSON docs, with each complete JSON doc taking up one line in the file. I think the right solution is to read the file line by line, pass each line to a JSON parser, and send the results on to the rest of the job.
As I see it my options are:
a) send each line to some sort of Java JSON parser. If that's the strategy I need to take, I'd like some advice on how to handle the output and pass it into my tMap/other parts of the job (a rough sketch of this approach is just below this list).
b) find a Talend component that parses JSON docs but doesn't require the input file to be a single valid JSON document.
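For option (a), here is a minimal sketch of what I have in mind in plain Java (e.g. inside a routine or a tJavaFlex), assuming the org.json jar is available to the job (for instance loaded with tLibraryLoad). The file name and the "id"/"name" fields are just placeholders for whatever the documents actually contain:

import java.io.BufferedReader;
import java.io.FileReader;
import org.json.JSONObject;

public class LineDelimitedJsonReader {
    public static void main(String[] args) throws Exception {
        BufferedReader reader = new BufferedReader(new FileReader("docs.json")); // placeholder path
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            // Each line is a complete JSON document, so it can be parsed on its own
            JSONObject doc = new JSONObject(line);
            // Pull out whatever fields the rest of the job needs,
            // e.g. to feed a tMap from a tJavaRow/tJavaFlex
            System.out.println(doc.optString("id") + " -> " + doc.optString("name"));
        }
        reader.close();
    }
}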
I've searched around for such a component (option b) but can't seem to find one. From my search, it seems even the tFileInputJSON component is relatively new.
Anyone have some advice on where I should turn next?
Thanks in advance.

This post closely mirrors a previous, unanswered post: http://www.talendforge.org/forum/viewtopic.php?id=18291
9 REPLIES
One Star

Re: Parse file containing multiple JSON documents

Hi
TOS doesn't yet have a JSON component comparable to tExtractXMLField.
To request a new feature or component, please report it on the Bug Tracker.
So the workaround is to create a job as follows.
No.1: Read the file line by line.
No.2: Save the current line into a new delimited file called temp.txt.
No.3: Use tFileInputJSON to parse temp.txt and run your job logic.
No.4: Use tFileDelete to delete temp.txt and start a new iteration for the next line.
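For reference, this is roughly what the four steps amount to in plain Java (only a sketch of the logic; in the actual job each step is a component such as tFileInputFullRow, tFileOutputDelimited, tFileInputJSON and tFileDelete connected with an Iterate link, and org.json just stands in for tFileInputJSON here):

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import org.json.JSONObject;

public class PerLineTempFileLoop {
    public static void main(String[] args) throws Exception {
        // Step 1: read the source file line by line
        BufferedReader reader = new BufferedReader(new FileReader("docs.json")); // placeholder path
        String line;
        while ((line = reader.readLine()) != null) {
            // Step 2: save the current line into a temporary file
            File temp = new File("temp.txt");
            FileWriter writer = new FileWriter(temp);
            writer.write(line);
            writer.close();

            // Step 3: parse temp.txt (this is where tFileInputJSON would run)
            BufferedReader tempReader = new BufferedReader(new FileReader(temp));
            JSONObject doc = new JSONObject(tempReader.readLine());
            tempReader.close();
            System.out.println(doc.toString()); // the rest of the job logic would go here

            // Step 4: delete temp.txt and loop to the next line
            temp.delete();
        }
        reader.close();
    }
}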
Regards,
Pedro
One Star

Re: Parse file containing multiple JSON documents

Thanks, Pedro -- I think that will have to work for now. I've also opened a question on Stack Overflow for anyone who'd like to follow along there. I'll keep you posted if we end up building a component/routine. I've also filed a feature request!
http://stackoverflow.com/questions/10003100/json-parser-for-talend
One Star

Re: Parse file containing multiple JSON documents

Alright, Pedro -- I am about to show you how bad I am with Talend...
How do I accomplish No. 1 and No. 2 on your list? I know tFileInputFullRow reads line by line, but I am having trouble getting it to write just one of those rows. It seems to read each line and then write each line, so if I have a two-line file, I cannot figure out how to split off a single line to write.
Care to give me another push?
Thanks!
One Star

Re: Parse file containing multiple JSON documents

Hi
You can create a job as shown in the following images.
Regards,
Pedro
One Star

Re: Parse file containing multiple JSON documents

Pedro! Thanks so much. Learned a ton from your example and got it to work. Really really appreciate it.
One Star

Re: Parse file containing multiple JSON documents

Hi,
Can anyone help me with this?
I am new to Talend and I want to load data from a MongoDB source. In the basic settings I have the option to edit the schema -- how do I specify a schema for a nested document in MongoDB?
Thanks in advance
Moderator

Re: Parse file containing multiple JSON documents

Hi snigdha224,
A schema is a row description, i.e. it defines the number of fields (columns) that will be processed and passed on to the next component.
I have replied to your related forum topic, Forum 29488.
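Since a schema only describes flat columns, a nested MongoDB document generally has to be flattened into individual columns before it flows through the job. Purely as an illustration (the document and field names below are made up), the mapping looks like this in plain Java with the org.json classes:

import org.json.JSONObject;

public class FlattenExample {
    public static void main(String[] args) {
        // A nested document as it might come from MongoDB (made-up fields)
        String doc = "{\"name\":\"Alice\",\"address\":{\"city\":\"Paris\",\"zip\":\"75001\"}}";
        JSONObject json = new JSONObject(doc);

        // Flat schema columns: name, address_city, address_zip
        String name = json.getString("name");
        String addressCity = json.getJSONObject("address").getString("city");
        String addressZip = json.getJSONObject("address").getString("zip");

        System.out.println(name + ", " + addressCity + ", " + addressZip);
    }
}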
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
One Star

Re: Parse file containing multiple JSON documents

Hello Pedro - I'm not seeing any images in your post dated 2012-04-11 04:15:09. I know it's been over 3 years, but is there any chance you still have those images and would be kind enough to post them? I need to do something similar but have limited Talend experience.
Many thanks!
Six Stars

Re: Parse file containing multiple JSON documents

@_AnonymousUser @pedrohuo I have the same requirement. Can you please share the solution so I can implement it?