Mapping file types in Talend Open Studio for Big Data

Hi,
I have installed Talend Open Studio for Big Data 5.3.1, and what I want to achieve is explained below.
Suppose I have three files in different formats: CSV, XML and JSON.
The first time I load and read these files, I will create job components and define the schema for each file. I also want to write an external script so that, the second time a file arrives with the same field structure but different data (CSV, XML or JSON), my script calls Talend and executes the job specific to that file format. In other words, if the file to be read the second time is XML, it should use the schema created for XML the first time, and if the incoming file is CSV, it should use the schema created for CSV the first time.
My script can be a .sh or .bat file. Can I launch Talend and run the jobs based on the file type (CSV, XML, JSON) from such a script? Is that possible?
Note: Talend Open Studio does not provide a Metadata tab under the Repository manager. Do I need to use a context variable in this case?
Please advise what can be done in this scenario.
Thanks,
ShreeCS

Re: Mapping a file types in Talend Open studio for bigdata

There are many ways to accomplish what you've described...
Starting with what you have in mind: yes, you can create a job that has a context variable for the file type, then call the default .sh or .bat launcher that Talend generates for the job and override that context variable, like this:
./MyTalendJob_run.sh --context_param TODAYDATE=2014-03-24 --context_param FileExt=json
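As a sketch of the script-driven option, here is a small Java wrapper that derives FileExt from the incoming file name and builds the same command line as above. The class and method names are hypothetical, and it assumes the exported launcher is ./MyTalendJob_run.sh with the TODAYDATE and FileExt context variables from the example; a .sh or .bat script could do the same thing.

```java
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical wrapper: picks the FileExt context value from the incoming file
// and builds the command line for the exported job's launcher script.
public class TalendLauncher {

    // Lower-case extension of the file name (without the dot), or "" if none.
    static String extensionOf(String path) {
        String name = Paths.get(path).getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Same shape as: ./MyTalendJob_run.sh --context_param TODAYDATE=... --context_param FileExt=...
    static List<String> buildCommand(String script, String todayDate, String inputFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add(script);
        cmd.add("--context_param");
        cmd.add("TODAYDATE=" + todayDate);
        cmd.add("--context_param");
        cmd.add("FileExt=" + extensionOf(inputFile));
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand("./MyTalendJob_run.sh", "2014-03-24", "incoming/orders.JSON");
        // prints: ./MyTalendJob_run.sh --context_param TODAYDATE=2014-03-24 --context_param FileExt=json
        System.out.println(String.join(" ", cmd));
        // To actually launch the job (the exported script must exist):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Keeping the extension lower-case means the job only has to handle one spelling of each FileExt value.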
I'm not sure what you mean by TOS not having Metadata in the Repository. It does, at least in my Studio :)
Instead of worrying about passing a particular context value at all, how about designing your job with three flows inside it? Each flow starts by checking the file's extension (.csv, .xml, .json), and depending on the extension, the job takes the path that uses the matching schema. You could design the job to simply read all files in a starting directory, pick them one at a time, and process each one.
tFileList --> tJava (decide extension) --> OnSubjobOk --> tFileInputDelimited (for .csv) --> read contents with schema --> do something...
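The single-job design can be illustrated in plain Java. This is not Talend-generated code and uses no Talend API: it only mimics what tFileList (iterate over a start directory), tJava (decide the extension) and the per-format branches would do. The Branch values are placeholders for the real input components and schemas.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Plain-Java illustration of the job design, not Talend code.
public class ExtensionRouter {

    enum Branch { CSV, XML, JSON, UNKNOWN }

    // The "decide extension" step: maps a file to the branch that should read it.
    static Branch route(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String ext = dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "csv":  return Branch.CSV;   // e.g. tFileInputDelimited with the CSV schema
            case "xml":  return Branch.XML;   // e.g. tFileInputXML with the XML schema
            case "json": return Branch.JSON;  // e.g. tFileInputJSON with the JSON schema
            default:     return Branch.UNKNOWN;
        }
    }

    // The "iterate" step: one regular file at a time from the start directory.
    public static void main(String[] args) throws IOException {
        Path startDir = Paths.get(args.length > 0 ? args[0] : ".");
        try (DirectoryStream<Path> files = Files.newDirectoryStream(startDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    System.out.println(file.getFileName() + " -> " + route(file));
                }
            }
        }
    }
}
```

Files with an unknown extension fall into their own branch instead of being fed to a parser that cannot read them.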

Re: Mapping a file types in Talend Open studio for bigdata

Hi,
I'm using TOS for Big Data, which doesn't have Metadata under the Repository manager.

Re: Mapping a file types in Talend Open studio for bigdata

Hi willm,
I'm using TOS 5.3.1 for Big Data, and I can't see the OnSubjobOk trigger. I have also attached a screenshot below.
What should I do in this case?

Re: Mapping a file types in Talend Open studio for bigdata

Hi,
How do I use the output of tFileList in the tJava component?
Thanks,
ShreeCS

Re: Mapping a file types in Talend Open studio for bigdata

Hi Willm,
I followed the job design you suggested.
The job now finds the file type (extension) and reads the file. For now I have used only CSV and XML files; later I'll add JSON as well. However, I'm getting the error "Content is not allowed in prolog. Nested exception: Content is not allowed in prolog", and the XML file is not being read properly.
After the tJava component I have two flows, one for CSV and the other for XML. I have used tFileInputDelimited and tFileInputXML and connected them to tJava using OnComponentOk. I'm not sure what I should use here, since I don't get an OnSubjobOk option among the triggers.
I have also attached screenshots of my job and the error.
Thanks,
ShreeCS

Re: Mapping a file types in Talend Open studio for bigdata

Hi,
I was able to resolve the XML error by changing the XML type to Document.
I need help with one more thing: how can I write the expression in the If clause that checks the file, so that a CSV file goes to tFileInputDelimited and an XML file goes to tFileInputXML? I tried writing an expression in the If clause in Talend but ended up with errors.
In my case, the If link is between tJava and tFileInputDelimited.

Thanks,
ShreeCS

Re: Mapping a file types in Talend Open studio for bigdata

See attached for a way to do this...
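The attachment is not reproduced here. As a sketch, assuming the list component is named tFileList_1 and a Run if link goes from tJava to each input component, each link's condition can be a Java boolean expression on the current file's extension. A HashMap stands in for the job's globalMap so the expressions can be tried outside the Studio; only the returned expression would go into the Run if condition.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for globalMap; the Run if condition is just the boolean expression.
public class RunIfConditions {

    // Condition on the Run if link tJava -> tFileInputDelimited.
    static boolean isCsv(Map<String, Object> globalMap) {
        // Literal first, so a missing value yields false instead of a NullPointerException.
        return "csv".equalsIgnoreCase((String) globalMap.get("tFileList_1_CURRENT_FILEEXTENSION"));
    }

    // Condition on the Run if link tJava -> tFileInputXML.
    static boolean isXml(Map<String, Object> globalMap) {
        return "xml".equalsIgnoreCase((String) globalMap.get("tFileList_1_CURRENT_FILEEXTENSION"));
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tFileList_1_CURRENT_FILEEXTENSION", "XML");
        System.out.println("csv branch: " + isCsv(globalMap)); // csv branch: false
        System.out.println("xml branch: " + isXml(globalMap)); // xml branch: true
    }
}
```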

Re: Mapping a file types in Talend Open studio for bigdata

Hi,
Thanks for the help and guidance. Now the job is working fine.
Thanks,
ShreeCS

Re: Mapping a file types in Talend Open studio for bigdata

Another good way is to do it directly in tJava (with File_Ext defined as a String context variable):
context.File_Ext = ((String)globalMap.get("tFileList_1_CURRENT_FILEEXTENSION"));
context.File_Ext = StringHandling.UPCASE(context.File_Ext);
This is much simpler...
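To try the two tJava lines outside the Studio, the sketch below uses stand-ins: a small Context class for the job's context (File_Ext as a String), a HashMap for globalMap, and toUpperCase in place of StringHandling.UPCASE (this stand-in returns null for null input). After normalizing, each Run if condition only needs to compare against a fixed upper-case value such as "CSV".

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ContextExtDemo {

    // Stand-in for the job's context object.
    static class Context {
        String File_Ext;
    }

    // Stand-in for StringHandling.UPCASE.
    static String upcase(String s) {
        return s == null ? null : s.toUpperCase(Locale.ROOT);
    }

    // The two tJava lines from the post, with globalMap and context passed in explicitly.
    static void tJavaBody(Map<String, Object> globalMap, Context context) {
        context.File_Ext = ((String) globalMap.get("tFileList_1_CURRENT_FILEEXTENSION"));
        context.File_Ext = upcase(context.File_Ext);
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tFileList_1_CURRENT_FILEEXTENSION", "csv");
        Context context = new Context();
        tJavaBody(globalMap, context);

        // Run if conditions can now compare against fixed upper-case values:
        System.out.println("CSV".equals(context.File_Ext)); // true
        System.out.println("XML".equals(context.File_Ext)); // false
    }
}
```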
Vaibhav

Re: Mapping a file types in Talend Open studio for bigdata

Agreed, sanvaibhav :-). Thx

Re: Mapping a file types in Talend Open studio for bigdata

Hi,
One more thing: for CSV files I defined the schema (field structure) the first time. The second time, a CSV file with the same field structure is read using the schema already defined. But for an XML file, when reading it the first time I have to specify the Loop XPath query, where I give the root tag of the XML file. The second time, an XML file with a different root tag is not read. What can I do in this case, and how can I achieve this?
Also, I want to save the files after reading them. Here I'm using tLogRow to see the output in the console, but I want to use tFileOutputDelimited for CSV and tFileOutputXML for XML files. If I read only one CSV file and one XML file, I can save them. If I read more than one CSV or XML file, I don't get the result. Do I again need a tJava component and code for the different output files? How can this be done?
Please guide me on this.
Thanks,
ShreeCS

Re: Mapping a file types in Talend Open studio for bigdata

Hi Shree,
To your first question: a change in the root node means a change in metadata, so the existing schema can't read the file.
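The sketch below does not make an existing schema read a file with a different root; as noted, that needs different metadata. What it can do is detect the root element before processing, so files whose root does not match the Loop XPath query can be routed to a separate flow or set aside. It uses the JDK's StAX API (javax.xml.stream); the expected root name "orders" is a hypothetical value.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Reads only as far as the first element to find the XML root tag.
public class RootTagCheck {

    static String rootElementName(Reader in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // no DTD processing needed
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        try {
            // Skips the declaration, comments and whitespace before the root element.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getLocalName();
                }
            }
            throw new XMLStreamException("No root element found");
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String expectedRoot = "orders"; // hypothetical root used in the Loop XPath query, e.g. "/orders/order"
        String xml = "<?xml version=\"1.0\"?><customers><customer id=\"1\"/></customers>";
        String root = rootElementName(new StringReader(xml));
        if (root.equals(expectedRoot)) {
            System.out.println("Root <" + root + "> matches: read with the existing XML schema");
        } else {
            System.out.println("Root <" + root + "> differs: needs other metadata, set the file aside");
        }
    }
}
```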
To your second question: the answer is earlier in this thread, in the screenshot by willm. Refer to it and use similar logic: use an Iterate link from tFileList to read multiple files one at a time, and change the flow with an If clause based on the extension.
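On saving multiple files: assuming the problem is that every iteration writes to the same output file name (so each file overwrites the previous one), one option is to build the output file name from the current input file, for example with an expression such as "C:/out/" + ((String)globalMap.get("tFileList_1_CURRENT_FILE")) in the File Name field of tFileOutputDelimited or tFileOutputXML, where C:/out/ is a hypothetical directory. The Java sketch below shows a hypothetical naming rule (<name>_out.<ext> in an output directory) as a plain method.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class OutputNaming {

    // Builds a per-file output path such as out/orders_out.csv from the current input file name.
    static Path outputFileFor(Path outDir, String currentFile) {
        int dot = currentFile.lastIndexOf('.');
        String base = dot < 0 ? currentFile : currentFile.substring(0, dot);
        String ext = dot < 0 ? "" : currentFile.substring(dot); // keeps the dot
        return outDir.resolve(base + "_out" + ext);
    }

    public static void main(String[] args) {
        Path outDir = Paths.get("out");
        for (String f : new String[] {"orders.csv", "customers.xml", "prices.csv"}) {
            System.out.println(f + " -> " + outputFileFor(outDir, f));
        }
    }
}
```

Because each input file gets its own output name, two CSV files read in the same run no longer write to the same target.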
Thanks
Vaibhav