I have a scenario for which I need your input and, if possible, a sample job showing how to implement it.
Here is the scenario.
I have a file in a particular location, say
c:/inbound/srcfiles/aabbcc123/filename_20022018_aaabbcc123.csv. The location will be different each time: c:/inbound/srcfiles remains the same, but whenever a file arrives, a folder named after a passphrase (in this case "aabbcc123") is created and the file is put inside that folder.
What I want is this:
I need to duplicate this file, with its content unchanged, under 8 different file names, say
filename8_20022018_aaabbcc123.csv, in the same location the original file is sourced from (i.e. the folder of c:/inbound/srcfiles/aabbcc123/filename_20022018_aaabbcc123.csv, where aabbcc123 is the passphrase created when the source file is generated). The duplicated files are to be placed inside the folder named after the passphrase, which changes every time a source file is created.
Remember that the datetimestamp of all the duplicated files should be exactly the same as in the source. aabbcc123 is the passphrase, which will be unique every day along with the actual file, and all the duplicated files should carry the same passphrase as the actual source file.
My intention is not to split the flow into 8 different branches in a single job (one branch per file), but rather to do it in a single flow. Can somebody help with a sample job that I can use to implement this scenario?
Thanks in advance.
OK, this is how you need to do this. I won't do it for you, because you won't learn anything. But I am happy to talk you through the process.
1) You need to acquire the filename. Is it supplied through a context variable, or is it found using a tFileList component? Either way, you need to find the filename first.
2) You then need to split it into its components. You start with a file path and a file name, and you also have to split the filename so that you can add to the name easily. Your filepath is "c:/inbound/srcfiles/filename_20022018_aaabbcc123.csv", so your path is "c:/inbound/srcfiles/". You can get this by searching for the last "/" in the filepath and using the String substring method (it is Java, Google it; there are lots of examples). The next part you need is the filename ("filename_20022018_aaabbcc123.csv"). You can get this by finding the last "/" and selecting the rest of the String. Look at this page if you are not used to Java (https://docs.oracle.com/javase/8/docs/api/java/lang/String.html). You then need to extract the first part of the filename, which is the part that will be changed (by adding a number to it). We know this is followed by an underscore ("_"). Search for the first underscore and return the part of the filename before it. Everything from the underscore onwards is the end of the filename. You should then have....
Path = "c:/inbound/srcfiles/"
Filename = "filename_20022018_aaabbcc123.csv"
FilenamePart1 = "filename"
FilenamePart2 = "_20022018_aaabbcc123.csv"
3) The above is the hardest part. Once that is done, it is easy. Just set up a tLoop component. Look here https://help.talend.com/reader/jomWd_GKqAmTZviwG_oxHQ/iL2h45sTpz~InS1_0iOj5w
I assume that you will always need to loop 8 times.
4) After the tLoop, link to a tJava and prepare the copy-to filename using the values you created above. The tLoop makes its current iteration available as a global variable, which you can read in the tJava and append to the FilenamePart1 variable to number each copy.
5) The next and final component is the tFileCopy component. Connect to this using an OnComponentOK link. Use the original filename variable for the source file and the newly created filename variable for the output file.
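For anyone not used to the Java String methods, steps 2 and 4 can be sketched in plain Java roughly as follows. This is only an illustration and not the Talend job itself: the class name FilenameParts and the method targetFor are made up for this example, the plain for loop stands in for the tLoop, and numbering from 1 to 8 is an assumption based on the "filename8" example in the question.

```java
class FilenameParts {
    final String path;      // e.g. "c:/inbound/srcfiles/"
    final String filename;  // e.g. "filename_20022018_aaabbcc123.csv"
    final String part1;     // e.g. "filename"
    final String part2;     // e.g. "_20022018_aaabbcc123.csv"

    // Assumes "/" separators and at least one "_" in the filename.
    FilenameParts(String fullPath) {
        int slash = fullPath.lastIndexOf('/');
        path = fullPath.substring(0, slash + 1);    // up to and including the last "/"
        filename = fullPath.substring(slash + 1);   // everything after the last "/"
        int underscore = filename.indexOf('_');     // first underscore in the filename
        part1 = filename.substring(0, underscore);  // text before the underscore
        part2 = filename.substring(underscore);     // underscore and everything after it
    }

    // Builds the copy-to filename for one loop iteration.
    String targetFor(int iteration) {
        return path + part1 + iteration + part2;
    }

    public static void main(String[] args) {
        FilenameParts parts = new FilenameParts(
                "c:/inbound/srcfiles/aabbcc123/filename_20022018_aaabbcc123.csv");
        for (int i = 1; i <= 8; i++) {              // stands in for the tLoop
            System.out.println(parts.targetFor(i));
        }
    }
}
```

Because the path is taken from the incoming filename, the passphrase folder is carried along automatically and the copies land next to the source file.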
That is all it will take. It seems like a lot (and you will have some things to work out), but it really is not that hard once you get to grips with the flow.
Unfortunately I do not have a job that does precisely this. This is just how I would build one if I had to.
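To make the copy step concrete outside of Talend, here is a rough plain-Java equivalent of what the tFileCopy does on each iteration. It is a sketch under assumptions, not a job: FileDuplicator and duplicate are hypothetical names, and the COPY_ATTRIBUTES option is only relevant if the "datetimestamp" requirement also covers the file's last-modified time (the date inside the filename is preserved simply by keeping that part of the name).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

class FileDuplicator {
    // Copies the source file to each target name inside the source's own folder.
    static List<Path> duplicate(Path source, List<String> targetNames) throws IOException {
        List<Path> created = new ArrayList<>();
        for (String name : targetNames) {
            // resolveSibling keeps the copy in the same (passphrase) folder as the source
            Path target = source.resolveSibling(name);
            Files.copy(source, target,
                    StandardCopyOption.REPLACE_EXISTING,  // overwrite a copy left by an earlier run
                    StandardCopyOption.COPY_ATTRIBUTES);  // carry over last-modified time where supported
            created.add(target);
        }
        return created;
    }
}
```

Which attributes COPY_ATTRIBUTES actually carries over depends on the platform and file system, so it is worth checking the result on the real server.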
Because the folder named after the passphrase changes each time, you can use a tFileList to iterate over the files in the directory c:/inbound/srcfiles, including its subdirectories. If several files exist, take only the latest one (assuming filename_20022018_aaabbcc123.csv is always the latest file), and then loop to copy the file as many times as needed, as suggested by rhall_2_0.
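The "latest file under the base directory" idea can be illustrated in plain Java as below. This only mirrors the logic and is not how tFileList is configured: LatestFileFinder and latestCsv are hypothetical names, and both the ".csv" filter and the use of last-modified time to decide which file is "latest" are assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

class LatestFileFinder {
    // Walks baseDir and all of its subdirectories and returns the most recently modified .csv file.
    static Optional<Path> latestCsv(Path baseDir) throws IOException {
        try (Stream<Path> paths = Files.walk(baseDir)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".csv"))
                    .max(Comparator.comparingLong(LatestFileFinder::lastModified));
        }
    }

    // Helper so the checked IOException can be used inside the stream comparator.
    private static long lastModified(Path p) {
        try {
            return Files.getLastModifiedTime(p).toMillis();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that once the duplicates have been written into the same folder, they may themselves count as the latest files, so this lookup should run before the copy loop.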