Unarchive each single file at a time

Two Stars tpk

Unarchive each single file at a time

Hi all,
I have some 15-20 zip files in one folder. As part of my job, I have to unarchive each zip file, process the extracted .tsv file, and load it into an Oracle table. After the processed data has been loaded successfully, all the extracted .tsv files should be deleted, and the zip file that was just processed should be moved to another folder (say, for example, a Processed folder in the same directory). Then the next zip file should be extracted and the same process repeated until all the zip files in the directory have been processed and moved to the other folder.
Right now I am running my job 15 times, once for each file, which is very time-consuming. Can anyone tell me how to loop the entire process 15 times?

For example, let us say I have 15 zipped files in my C:/Test directory, named as below:
1. file1_2012-04-06
2. file2_2012-04-06
.
.
.
15. file15_2012-04-06

Right now I am running the job below 15 times, once for each file:
tFileUnarchive --------> tOracleBulkExec ---------> tFileCopy --------> tFileDelete

But running this 15 times is not a professional way of doing it, so can anyone help me do the entire process in a single run? Please tell me the settings I need to configure, and share a sample job if you have one.
Thanks and Regards,
Pavan
One Star

Re: Unarchive each single file at a time

Try using the tWaitForFile component and link it to your job/subjob with an Iterate link. Set 'Trigger action when' to 'a file is created' and 'Then' to 'Continue loop'. If you want to run the main job using tRunJob, you can load the file name into a context variable and pass it to the child job.
Two Stars tpk

Re: Unarchive each single file at a time

Hi Kelebek,
I have a few questions:
1. Why should I use tWaitForFile? All my files are already in one folder, so I don't see why I need it.
2. What should I run in tRunJob? I mean, what process should be executed in the main job?
3. Can you post an image showing how to pass the file name as a context variable, how to use it, and in which component it should be used?
Thank you for the information you provided. It was helpful, but I could not understand it clearly. Can you explain or show it more clearly?
Thanks and Regards,
Pavan
One Star

Re: Unarchive each single file at a time

1. tWaitForFile works in a loop. If you set 'Max number of iterations' to 15, it will be executed 15 times and will pick up your files one by one. In the component configuration you set the file mask, which could be '*.txt', and then it will pick up all your files with this extension.
2. The main job contains everything you have implemented so far (unarchiving, loading, copying, deleting).
3. A screenshot of tMap_2 from the picture in my previous post is attached. On the left is the standard tWaitForFile schema. On the right is the standard tContextLoad schema. 'FileName' is a context variable that you have to create in your jobs. When this is done, tick the 'Transmit whole context' box in the tRunJob component. And that's it: now you can use the variable in your source file name as 'context.FileName'.
Cheers
One Star

Re: Unarchive each single file at a time

If you wanted to, you could also set the number of tWaitForFile iterations automatically.
I would do it this way:
1. Use tSystem component to run a UNIX/Windows command to count the number of files in your folder and write it to a file.
2. Create a context variable to store the number of files (let's say 'FilesCount').
3. Use the file generated in step 1 as a source and load the number into the variable created in step 2 using tContextLoad.
4. Use the variable in the tWaitForFile component as the 'Max. number of iterations': Integer.parseInt(context.FilesCount)
And your process is fully automated.
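
As a rough illustration of steps 1 and 4 above, here is a minimal plain-Java sketch of the same idea: count the files matching a mask, keep the count as text (as a String context variable would), and parse it back to an integer. This is not Talend-generated code; the class name `FileCountSketch`, the method `countFiles`, and the `*.zip` mask are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FileCountSketch {

    // Counts the regular files directly inside dir whose names match the glob, e.g. "*.zip".
    static int countFiles(Path dir, String glob) throws IOException {
        int count = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path entry : stream) {
                if (Files.isRegularFile(entry)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Folder to scan; defaults to the current directory (hypothetical default).
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");

        // A String context variable such as FilesCount would hold the count as text ...
        String filesCount = String.valueOf(countFiles(dir, "*.zip"));

        // ... and step 4 converts it back to an int for the iteration limit.
        int maxIterations = Integer.parseInt(filesCount.trim());
        System.out.println("Max. number of iterations: " + maxIterations);
    }
}
```

The `trim()` call is a precaution: a count written to a file by a shell command often carries a trailing newline or spaces, which `Integer.parseInt` would reject.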
Two Stars tpk

Re: Unarchive each single file at a time

Hi Kelebek,
Your idea is mind-blowing. Can you show me how to use the tSystem component and what code should be written in it? I am very weak at coding. Can you post images of the process below, step by step, with the settings that need to be configured in each component? I am still learning, and I am sure that one day I will excel with guidance from all of you. You are doing a great job for a novice user like me.
Waiting for your reply!
Thanks and Regards,
Pavan

One Star

Re: Unarchive each single file at a time

It's easier to use tFileList to get all the files and iterate over each of them.
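
For readers who think in code, the "list the files, then iterate" pattern can be sketched in plain Java as below. This only mirrors the idea behind tFileList with an Iterate link; it is not what Talend generates, and the names `ArchiveListSketch` and `listArchives` and the default folder are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ArchiveListSketch {

    // Returns the regular files in dir that match the glob, sorted by path.
    static List<Path> listArchives(Path dir, String glob) throws IOException {
        List<Path> archives = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path entry : stream) {
                if (Files.isRegularFile(entry)) {
                    archives.add(entry);
                }
            }
        }
        Collections.sort(archives);
        return archives;
    }

    public static void main(String[] args) throws IOException {
        // Folder holding the zip files; C:/Test is the example directory from the question.
        Path dir = Paths.get(args.length > 0 ? args[0] : "C:/Test");

        // One pass of the loop per archive, so no iteration count is needed up front.
        for (Path archive : listArchives(dir, "*.zip")) {
            System.out.println("Processing " + archive.getFileName());
            // The unarchive, load, cleanup, and move steps would run here for this file.
        }
    }
}
```

Because the loop is driven by the directory listing itself, it handles 15 files or 20 without any separate counting step.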
One Star

Re: Unarchive each single file at a time

You're right, janhess, I didn't think of this! It is much, much easier...
Two Stars tpk

Re: Unarchive each single file at a time

Hi Kelebek,
I have posted images of the jobs I built based on your ideas for my requirement. I created a job called Sample_File_Copy_1, which does all the work (unarchiving, loading, copying, and deleting) for a single file. You advised me to define contexts and use them in particular components. This is where my problem is: I am totally confused about how to define contexts and in which component to configure them. I have posted some images of the context group I created and used in Sample_File_Copy_1. I have a few questions:
1. How can we define the file name as a context variable? I will have some 15 zip files in the same directory with different names, so how can I get each file's name passed as a context value at each iteration?
2. Will the file name passed as a context value be the name of the zip file being unarchived in the tFileUnarchive component in my job?
3. After I copy the first processed file, I delete it. There is a problem here that I forgot to mention earlier: each time a zip file is unarchived, it extracts some 8-10 *.tsv files, so after copying the zip file to a different directory I should delete the zip file along with all the *.tsv files extracted from it. How can this be accomplished? Any ideas, please?
Kindly post images of the job and the configuration properties that should be set, step by step. Please bear with me.
Sorry for troubling you.
Thanks and Regards,
Pavan
Two Stars tpk

Re: Unarchive each single file at a time

Hi all,
Any ideas or suggestions?
Kindly help me out. I urgently need to complete the above using Talend.
Thanks and Regards,
Pavan
One Star

Re: Unarchive each single file at a time

Hi Pavan,
I think you should use tFileList, as janhess suggested. Example attached.
In tFileCopy you should use the file name variable coming from the tFileList before it (these are the unzipped files, I suppose): ((String)globalMap.get("tFileList_1_CURRENT_FILE"))
The same goes for the tFileDelete component.
But what I can see wrong in your job is the links between components. Between tFileList and tFileCopy there should be an Iterate link, which will run the copy and delete for all unzipped files. (You don't need copy and delete as separate components; there is a 'Remove source file' tick box in the tFileCopy component.)
The key in tContextLoad is your context variable name; the value is the file name in this case (different for each iteration). This value should be used as the zip file name in your tFileUnarchive component.
Cheers
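
To make the per-archive cleanup concrete, here is a hedged plain-Java sketch of the same sequence: extract one zip, delete every file it produced, then move the zip to a processed folder. It illustrates the logic only and is not Talend-generated code; `ArchiveCleanupSketch`, `extract`, and `finish` are hypothetical names, and the load step is left as a comment.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ArchiveCleanupSketch {

    // Extracts every file entry of the zip into workDir and returns the extracted paths.
    static List<Path> extract(Path zip, Path workDir) throws IOException {
        Path base = workDir.toAbsolutePath().normalize();
        Files.createDirectories(base);
        List<Path> extracted = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(zip))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                Path target = base.resolve(entry.getName()).normalize();
                // Refuse entries whose names would place files outside workDir.
                if (!target.startsWith(base)) {
                    throw new IOException("Unsafe zip entry: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                    continue;
                }
                Files.createDirectories(target.getParent());
                // Copies only the current entry; the zip stream stays open for the next one.
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                extracted.add(target);
            }
        }
        return extracted;
    }

    // Deletes the extracted files, then moves the zip into processedDir and returns its new path.
    static Path finish(Path zip, List<Path> extracted, Path processedDir) throws IOException {
        for (Path file : extracted) {
            Files.deleteIfExists(file);
        }
        Files.createDirectories(processedDir);
        return Files.move(zip, processedDir.resolve(zip.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.err.println("usage: ArchiveCleanupSketch <zip> <workDir> <processedDir>");
            return;
        }
        Path zip = Paths.get(args[0]);
        List<Path> files = extract(zip, Paths.get(args[1]));
        // The load into Oracle (tOracleBulkExec in this thread) would run here.
        finish(zip, files, Paths.get(args[2]));
    }
}
```

Tracking the extracted paths in a list means the cleanup removes exactly the 8-10 files that came out of the current archive, rather than everything matching *.tsv in the folder.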
Two Stars tpk

Re: Unarchive each single file at a time

Hi Kelebek,
I didn't follow you. Can you explain more clearly? The second image you attached in your post above, for tFileUnarchive, is confusing. You have defined context.filepath in the Archive file field and context.filename in the Extraction directory field. I think it should be the other way round, am I right?
Can you show me an image of the values that should be defined for each context variable? And where should I add tFileList? I mean, in which job and at which position in the job should the tFileList component be placed?
Please give me a clear image of the entire job, showing which part should run after which and the context variable values to be used.
I am very embarrassed to keep asking you, but I have no other option: I have to get this job done, and worst of all, I am completely new to Talend with no experience.
Thanks and Regards,
Pavan
One Star

Re: Unarchive each single file at a time

Reading the documentation on components will give you a better understanding of how they work.
Also look at the tutorials and other downloadable documentation.
Two Stars tpk

Re: Unarchive each single file at a time

Hi janhess,
I am confused, and I am afraid that if I change my existing job I will end up in a big mess. I don't understand how to define the context value for the file name. I have attached images of my jobs.
1. Sample_File_Copy_1.png ---> This is my main job, which does all the unarchiving, loading, copying, and deletion. My big question is: what context value should be defined for the FileName context variable, and how, so that it receives each file name in each iteration?
2. Sample_File_Copy_2.png -----> This job will use the context variables defined in the first job and pass them to the tRunJob (Sample_File_Copy_1) component, with a new file name for each iteration. But what Kelebek and you advised is to use tFileList_1. My question is: in which job should I use it, what is the sequence, and what parameters should be passed or configured?

Kindly lay it out in sequence so that I can understand it easily.

Thanks and Regards,
Pavan
One Star

Re: Unarchive each single file at a time

Put a tFileList at the beginning of the job to get all the archive files. Iterate from it for each archive and follow Kelebek's instructions.